runtime: stack allocator, separate from mallocgc
In order to move malloc to Go, we need to have a
separate stack allocator. If we run out of stack
during malloc, malloc will not be available
to allocate a new stack.
Stacks are the last remaining FlagNoGC objects in the
GC heap. Once they are out, we can get rid of the
distinction between the allocated/blockboundary bits.
(This will be in a separate change.)
Fixes issue 7468
Fixes issue 7424
On Thu, Jun 19, 2014 at 8:51 PM, <dvyukov@google.com> wrote: > > https://codereview.appspot.com/104200047/diff/90001/src/ > pkg/runtime/malloc.h ...
10 years, 9 months ago
(2014-06-20 04:50:29 UTC)
#3
On Thu, Jun 19, 2014 at 8:51 PM, <dvyukov@google.com> wrote:
>
> https://codereview.appspot.com/104200047/diff/90001/src/
> pkg/runtime/malloc.h
> File src/pkg/runtime/malloc.h (right):
>
> https://codereview.appspot.com/104200047/diff/90001/src/
> pkg/runtime/malloc.h#newcode257
> src/pkg/runtime/malloc.h:257: uint64 stacks_sys; // always 0
> you actually set it in mgc0.c
>
>
Sorry, a bit unclear. We never set runtime.h:MStats.stacks_sys. We do
set mem.go:MemStats.StackSys. They have identical layouts and mgc0.c uses
the C field to set what is really the Go field. The type aliasing is ugly
and should go away at some point. Probably more accurate is to say we
never set the instance of the field mstats.stacks_sys.
> https://codereview.appspot.com/104200047/diff/90001/src/
> pkg/runtime/malloc.h#newcode315
> src/pkg/runtime/malloc.h:315: struct StackFreeList {
> { on next line
>
> https://codereview.appspot.com/104200047/diff/90001/src/pkg/runtime/mgc0.c
> File src/pkg/runtime/mgc0.c (right):
>
> https://codereview.appspot.com/104200047/diff/90001/src/
> pkg/runtime/mgc0.c#newcode1280
> src/pkg/runtime/mgc0.c:1280: if(s->state != MSpanInUse)
> does the following condition fail for non MSpanInUse spans?
> it was a useful sanity check
>
>
Yes, for MSpanStack spans the sweepgen is not updated. It only makes sense
for GC-tracked spans.
> https://codereview.appspot.com/104200047/diff/90001/src/
> pkg/runtime/mgc0.c#newcode2152
> src/pkg/runtime/mgc0.c:2152: c = p->mcache;
> do we want to call runtime·stackcache_clear(c) here?
> one one hand, the stack cache is bounded; but on the other hand is we
> flush it, then that spans can be released to OS later
> I would flush it here, and then if we see that it negatively affects
> performance remove it
>
>
Sure, I'll add it.
> https://codereview.appspot.com/104200047/diff/90001/src/
> pkg/runtime/mheap.c
> File src/pkg/runtime/mheap.c (right):
>
> https://codereview.appspot.com/104200047/diff/90001/src/
> pkg/runtime/mheap.c#newcode294
> src/pkg/runtime/mheap.c:294:
> remove
>
> https://codereview.appspot.com/104200047/diff/90001/src/
> pkg/runtime/mheap.c#newcode421
> src/pkg/runtime/mheap.c:421: //runtime·printf("spannew %p\n", v);
> remove
>
> https://codereview.appspot.com/104200047/diff/90001/src/
> pkg/runtime/mheap.c#newcode476
> src/pkg/runtime/mheap.c:476: s->types.compression = MTypes_Empty;
> why do you move it?
> it won't do harm in MHeap_FreeSpanLocked
>
>
Type info only applies to GC spans. Stack spans don't have and don't care
about types.
> https://codereview.appspot.com/104200047/diff/90001/src/pkg/runtime/proc.c
> File src/pkg/runtime/proc.c (right):
>
> https://codereview.appspot.com/104200047/diff/90001/src/
> pkg/runtime/proc.c#newcode2442
> src/pkg/runtime/proc.c:2442: #pragma textflag NOSPLIT
> why does it need to be nosplit?
>
>
I get "no mcache" errors without it. The problem is that the stack may
split at the start of acquirep, and when acquirep is called from procresize
m->mcache is nil.
procresize:
...
m->mcache = nil;
p = runtime·allp[0];
p->m = nil;
p->status = Pidle;
acquirep(p);
acquirep:
...
m->mcache = p->mcache;
Between the m->mcache = nil and m->mcache = p->mcache, we can't grow the
stack.
Long term, I think procresize should run on the M stack not the G stack.
But not this change.
And see acquirep comment below.
https://codereview.appspot.com/104200047/diff/90001/src/
> pkg/runtime/proc.c#newcode2449
> src/pkg/runtime/proc.c:2449: //runtime·printf("acquirep: p->m=%p(%d)
> p->status=%d\n", p->m, p->m ? p->m->id : 0, p->status);
> why do you comment it?
>
>
Because we overflow the nosplit region with that print in there. Yuck, but
it's the easiest fix.
> https://codereview.appspot.com/104200047/diff/90001/src/
> pkg/runtime/stack.c
> File src/pkg/runtime/stack.c (right):
>
> https://codereview.appspot.com/104200047/diff/90001/src/
> pkg/runtime/stack.c#newcode41
> src/pkg/runtime/stack.c:41: for(i = 0; i < NumStackOrders; i++) {
> drop {}
>
> https://codereview.appspot.com/104200047/diff/90001/src/
> pkg/runtime/stack.c#newcode187
> src/pkg/runtime/stack.c:187: if(m->mcache == nil)
> hummm... I am sure that exitsyscall was splitting stack w/o P very
> recently
> it such case we are allocating fixed size stack from sysalloc
> are you sure it is working? why exitsyscall is not splitting stack
> anymore?
>
>
I'm not sure, but maybe it has to do with acquirep now being nosplit.
That's the only routine that used to be splittable that I can see in the
exitsyscall path.
> https://codereview.appspot.com/104200047/diff/90001/src/
> pkg/runtime/stack.c#newcode219
> src/pkg/runtime/stack.c:219: p = c->stackcache[order].list;
> p is almost always P* in runtime
> please name it differently
>
> https://codereview.appspot.com/104200047/diff/90001/src/
> pkg/runtime/stack.c#newcode729
> src/pkg/runtime/stack.c:729: if(false || StackDebug >= 1)
> remove "false ||"
>
> https://codereview.appspot.com/104200047/diff/90001/src/
> pkg/runtime/stack.c#newcode905
> src/pkg/runtime/stack.c:905: if(false || StackDebug >= 1) {
> remove "false ||"
> you can optionally bump log level
>
>
Old debugging cruft, I've removed it.
> https://codereview.appspot.com/104200047/diff/90001/src/
> pkg/runtime/stack.c#newcode997
> src/pkg/runtime/stack.c:997: if(true) return; // TODO: disable for now
> why is it disabled?
> maybe set runtime·copystack=0 instead?
>
>
Sorry, I forgot to fix this up.
> https://codereview.appspot.com/104200047/diff/90001/src/
> pkg/runtime/stack.c#newcode1011
> src/pkg/runtime/stack.c:1011: if(1 || newsize < PageSize/2) {
> remove "1 ||"
>
> https://codereview.appspot.com/104200047/diff/90001/src/
> pkg/runtime/stack_test.go
> File src/pkg/runtime/stack_test.go (right):
>
> https://codereview.appspot.com/104200047/diff/90001/src/
> pkg/runtime/stack_test.go#newcode290
> src/pkg/runtime/stack_test.go:290: var b [1024 / 4]int32 // makes frame
> about 1KB
> [1024]byte?
>
> https://codereview.appspot.com/104200047/diff/90001/src/
> pkg/runtime/stack_test.go#newcode305
> src/pkg/runtime/stack_test.go:305: const R = 4
> make it a const block:
> const (
> ...
> )
> I don't know why, but that's what people told me
>
> https://codereview.appspot.com/104200047/diff/90001/src/
> pkg/runtime/stack_test.go#newcode307
> src/pkg/runtime/stack_test.go:307: const S = 5
> what is the execution time of this test?
> it looks like it can take significant time on arm builders
>
>
It's 71ms on my laptop. I'm not too worried, but I'll check it on arm just
in case.
> https://codereview.appspot.com/104200047/
>
On Thu, Jun 19, 2014 at 9:50 PM, Keith Randall <khr@google.com> wrote: https://codereview.appspot.com/104200047/diff/90001/src/pkg/runtime/stack.c#newcode187 >> src/pkg/runtime/stack.c:187: ...
10 years, 9 months ago
(2014-06-20 05:04:08 UTC)
#4
On Thu, Jun 19, 2014 at 9:50 PM, Keith Randall <khr@google.com> wrote:
https://codereview.appspot.com/104200047/diff/90001/src/pkg/runtime/stack.c#n...
>> src/pkg/runtime/stack.c:187: if(m->mcache == nil)
>> hummm... I am sure that exitsyscall was splitting stack w/o P very
>> recently
>> it such case we are allocating fixed size stack from sysalloc
>> are you sure it is working? why exitsyscall is not splitting stack
>> anymore?
>>
>
> I'm not sure, but maybe it has to do with acquirep now being nosplit.
> That's the only routine that used to be splittable that I can see in the
> exitsyscall path.
There is also runtime.lock/unlock
https://codereview.appspot.com/104200047/diff/110001/src/pkg/runtime/mcentral.c File src/pkg/runtime/mcentral.c (right): https://codereview.appspot.com/104200047/diff/110001/src/pkg/runtime/mcentral.c#newcode267 src/pkg/runtime/mcentral.c:267: runtime·throw("still in list"); could you make the message a ...
10 years, 9 months ago
(2014-06-20 10:28:59 UTC)
#5
*** Submitted as https://code.google.com/p/go/source/detail?r=318b04f28372 *** runtime: stack allocator, separate from mallocgc In order to move ...
10 years, 8 months ago
(2014-07-01 01:59:33 UTC)
#16
*** Submitted as https://code.google.com/p/go/source/detail?r=318b04f28372 ***
runtime: stack allocator, separate from mallocgc
In order to move malloc to Go, we need to have a
separate stack allocator. If we run out of stack
during malloc, malloc will not be available
to allocate a new stack.
Stacks are the last remaining FlagNoGC objects in the
GC heap. Once they are out, we can get rid of the
distinction between the allocated/blockboundary bits.
(This will be in a separate change.)
Fixes issue 7468
Fixes issue 7424
LGTM=rsc, dvyukov
R=golang-codereviews, dvyukov, khr, dave, rsc
CC=golang-codereviews
https://codereview.appspot.com/104200047
This change also increased RSS on garbage-8 from 1.1GB to 1.5GB (+25%): http://goperfd.appspot.com/perfdetail?commit=318b04f28372345dc41cdd79fe86f12921f0c60d&commit0=26db394e3aca97c30ede13804fdd411825610770&kind=builder&builder=linux-amd64&benchmark=garbage Looks like ...
10 years, 8 months ago
(2014-07-01 06:06:29 UTC)
#19
On Tue, Jul 1, 2014 at 10:06 AM, <dvyukov@google.com> wrote: > This change also increased ...
10 years, 8 months ago
(2014-07-16 20:25:01 UTC)
#22
On Tue, Jul 1, 2014 at 10:06 AM, <dvyukov@google.com> wrote:
> This change also increased RSS on garbage-8 from 1.1GB to 1.5GB (+25%):
>
http://goperfd.appspot.com/perfdetail?commit=318b04f28372345dc41cdd79fe86f129...
> Looks like a quite serious degradation.
This can be caused by a memory leak, it's easy to test by running the
benchmark for a longer time. Or by the fact that stacks are somehow
accounted against next_gc, which adds same amount of garbage to the
heap. It's unclear how they must be accounted, because during GC we
compact stacks and potentially recover memory; so they must produce
some GC pressure.
I found a memory leak and fixed it. I haven't tested post-fix though. Hopefully that ...
10 years, 8 months ago
(2014-07-16 21:04:14 UTC)
#23
I found a memory leak and fixed it. I haven't tested post-fix though.
Hopefully that was the only leak.
On Wed, Jul 16, 2014 at 1:24 PM, Dmitry Vyukov <dvyukov@google.com> wrote:
> On Tue, Jul 1, 2014 at 10:06 AM, <dvyukov@google.com> wrote:
> > This change also increased RSS on garbage-8 from 1.1GB to 1.5GB (+25%):
> >
>
http://goperfd.appspot.com/perfdetail?commit=318b04f28372345dc41cdd79fe86f129...
> > Looks like a quite serious degradation.
>
> This can be caused by a memory leak, it's easy to test by running the
> benchmark for a longer time. Or by the fact that stacks are somehow
> accounted against next_gc, which adds same amount of garbage to the
> heap. It's unclear how they must be accounted, because during GC we
> compact stacks and potentially recover memory; so they must produce
> some GC pressure.
>
By the way, how do I reproduce the RSS size measurements you mentioned? On Wed, ...
10 years, 8 months ago
(2014-07-17 04:09:18 UTC)
#24
By the way, how do I reproduce the RSS size measurements you mentioned?
On Wed, Jul 16, 2014 at 2:04 PM, Keith Randall <khr@google.com> wrote:
> I found a memory leak and fixed it. I haven't tested post-fix though.
> Hopefully that was the only leak.
>
>
> On Wed, Jul 16, 2014 at 1:24 PM, Dmitry Vyukov <dvyukov@google.com> wrote:
>
>> On Tue, Jul 1, 2014 at 10:06 AM, <dvyukov@google.com> wrote:
>> > This change also increased RSS on garbage-8 from 1.1GB to 1.5GB (+25%):
>> >
>>
http://goperfd.appspot.com/perfdetail?commit=318b04f28372345dc41cdd79fe86f129...
>> > Looks like a quite serious degradation.
>>
>> This can be caused by a memory leak, it's easy to test by running the
>> benchmark for a longer time. Or by the fact that stacks are somehow
>> accounted against next_gc, which adds same amount of garbage to the
>> heap. It's unclear how they must be accounted, because during GC we
>> compact stacks and potentially recover memory; so they must produce
>> some GC pressure.
>>
>
>
I think it is via the go perf dashboard. You could try running a perf ...
10 years, 8 months ago
(2014-07-17 04:15:08 UTC)
#25
I think it is via the go perf dashboard. You could try running a perf
builder locally, but I don't know the specifics of doing such a thing.
On 17 Jul 2014 14:09, "Keith Randall" <khr@google.com> wrote:
> By the way, how do I reproduce the RSS size measurements you mentioned?
>
>
>
> On Wed, Jul 16, 2014 at 2:04 PM, Keith Randall <khr@google.com> wrote:
>
>> I found a memory leak and fixed it. I haven't tested post-fix though.
>> Hopefully that was the only leak.
>>
>>
>> On Wed, Jul 16, 2014 at 1:24 PM, Dmitry Vyukov <dvyukov@google.com>
>> wrote:
>>
>>> On Tue, Jul 1, 2014 at 10:06 AM, <dvyukov@google.com> wrote:
>>> > This change also increased RSS on garbage-8 from 1.1GB to 1.5GB (+25%):
>>> >
>>>
http://goperfd.appspot.com/perfdetail?commit=318b04f28372345dc41cdd79fe86f129...
>>> > Looks like a quite serious degradation.
>>>
>>> This can be caused by a memory leak, it's easy to test by running the
>>> benchmark for a longer time. Or by the fact that stacks are somehow
>>> accounted against next_gc, which adds same amount of garbage to the
>>> heap. It's unclear how they must be accounted, because during GC we
>>> compact stacks and potentially recover memory; so they must produce
>>> some GC pressure.
>>>
>>
>>
>
go get code.google.com/p/go.benchmarks cd $GOPATH/src/code.google.com/p/go.benchmarks go build -o bench0 ./bench/bench.go GOMAXPROCS=8 ./bench0 -bench=garbage -benchmem=1024 -benchtime=5s ...
10 years, 8 months ago
(2014-07-17 04:36:13 UTC)
#26
go get code.google.com/p/go.benchmarks
cd $GOPATH/src/code.google.com/p/go.benchmarks
go build -o bench0 ./bench/bench.go
GOMAXPROCS=8 ./bench0 -bench=garbage -benchmem=1024 -benchtime=5s -benchnum=3
On Thu, Jul 17, 2014 at 8:09 AM, Keith Randall <khr@google.com> wrote:
> By the way, how do I reproduce the RSS size measurements you mentioned?
>
>
>
> On Wed, Jul 16, 2014 at 2:04 PM, Keith Randall <khr@google.com> wrote:
>>
>> I found a memory leak and fixed it. I haven't tested post-fix though.
>> Hopefully that was the only leak.
>>
>>
>> On Wed, Jul 16, 2014 at 1:24 PM, Dmitry Vyukov <dvyukov@google.com> wrote:
>>>
>>> On Tue, Jul 1, 2014 at 10:06 AM, <dvyukov@google.com> wrote:
>>> > This change also increased RSS on garbage-8 from 1.1GB to 1.5GB (+25%):
>>> >
>>> >
http://goperfd.appspot.com/perfdetail?commit=318b04f28372345dc41cdd79fe86f129...
>>> > Looks like a quite serious degradation.
>>>
>>> This can be caused by a memory leak, it's easy to test by running the
>>> benchmark for a longer time. Or by the fact that stacks are somehow
>>> accounted against next_gc, which adds same amount of garbage to the
>>> heap. It's unclear how they must be accounted, because during GC we
>>> compact stacks and potentially recover memory; so they must produce
>>> some GC pressure.
>>
>>
>
Issue 104200047: code review 104200047: runtime: stack allocator, separate from mallocgc
(Closed)
Created 10 years, 9 months ago by khr
Modified 10 years, 8 months ago
Reviewers: gobot, dave_cheney.net, dvyukov
Base URL:
Comments: 26