runtime: use special records hung off the MSpan to
record finalizers and heap profile info. Enables
removing the special bit from the heap bitmap. Also
provides a generic mechanism for annotating occasional
heap objects.
finalizers
overhead per obj
old 680 B 80 B avg
new 16 B/span 48 B
profile
overhead per obj
old 32KB 24 B + hash tables
new 16 B/span 24 B
https://codereview.appspot.com/13314053/diff/6001/src/pkg/runtime/mgc0.c File src/pkg/runtime/mgc0.c (right): https://codereview.appspot.com/13314053/diff/6001/src/pkg/runtime/mgc0.c#newcode1679 src/pkg/runtime/mgc0.c:1679: // free any special records for this object. This ...
11 years, 9 months ago
(2013-09-24 21:56:17 UTC)
#1
On Tue, Sep 24, 2013 at 2:56 PM, <cshapiro@google.com> wrote: > > https://codereview.appspot.**com/13314053/diff/6001/src/** > pkg/runtime/mgc0.c<https://codereview.appspot.com/13314053/diff/6001/src/pkg/runtime/mgc0.c> ...
11 years, 9 months ago
(2013-09-24 22:56:58 UTC)
#2
On Tue, Sep 24, 2013 at 2:56 PM, <cshapiro@google.com> wrote:
>
> https://codereview.appspot.**com/13314053/diff/6001/src/**
>
pkg/runtime/mgc0.c<https://codereview.appspot.com/13314053/diff/6001/src/pkg/runtime/mgc0.c>
> File src/pkg/runtime/mgc0.c (right):
>
> https://codereview.appspot.**com/13314053/diff/6001/src/**
>
pkg/runtime/mgc0.c#newcode1679<https://codereview.appspot.com/13314053/diff/6001/src/pkg/runtime/mgc0.c#newcode1679>
> src/pkg/runtime/mgc0.c:1679: // free any special records for this
> object.
> This looks quadratic. Why not walk the special list once after all
> objects are processed? If the special record points to an unmarked
> object it can be removed from the linked list. This guarantees one
> traversal of the linked list per span, rather than one traversal per
> each of the n objects.
>
This isn't quadratic - I'm walking the special linked list in parallel with
walking the span. In other words, we never reset to s->specials, traversal
only ever goes forwards.
That said, maybe it would be cleaner to do the special walk separately,
I'll think about it.
>
> https://codereview.appspot.**com/13314053/diff/6001/src/**
>
pkg/runtime/mheap.c<https://codereview.appspot.com/13314053/diff/6001/src/pkg/runtime/mheap.c>
> File src/pkg/runtime/mheap.c (right):
>
> https://codereview.appspot.**com/13314053/diff/6001/src/**
>
pkg/runtime/mheap.c#newcode561<https://codereview.appspot.com/13314053/diff/6001/src/pkg/runtime/mheap.c#newcode561>
> src/pkg/runtime/mheap.c:561: static Special*
> Why not put these objects on the heap rather than in a special area?
>
The tricky part is we have to recursively call malloc/free to do that. Not
impossible, but tricky. I also thought that it would be better to keep the
space for these objects accounted separately. For the profile specials at
least, it's a good idea to keep the space used by the profiler out of the
profile.
>
>
https://codereview.appspot.**com/13314053/<https://codereview.appspot.com/133...
>
On Tue, Sep 24, 2013 at 3:56 PM, Keith Randall <khr@google.com> wrote: > This isn't ...
11 years, 9 months ago
(2013-09-24 23:28:17 UTC)
#3
On Tue, Sep 24, 2013 at 3:56 PM, Keith Randall <khr@google.com> wrote:
> This isn't quadratic - I'm walking the special linked list in parallel
> with walking the span. In other words, we never reset to s->specials,
> traversal only ever goes forwards.
> That said, maybe it would be cleaner to do the special walk separately,
> I'll think about it.
>
Right.
It might not make a difference in this case but I find that the emit order
decisions made by {5,6,8}c for basic blocks is more sensible when the loop
structure is simpler.
> The tricky part is we have to recursively call malloc/free to do that.
> Not impossible, but tricky. I also thought that it would be better to
> keep the space for these objects accounted separately. For the profile
> specials at least, it's a good idea to keep the space used by the profiler
> out of the profile.
>
Okay. You can always preallocate the profile nodes.
On Fri, Dec 27, 2013 at 8:18 AM, <dvyukov@google.com> wrote: > generally looks good, I ...
11 years, 6 months ago
(2013-12-31 20:58:37 UTC)
#8
On Fri, Dec 27, 2013 at 8:18 AM, <dvyukov@google.com> wrote:
> generally looks good, I think it's the right direction
> this flexible "specials" infrastructure will also open some other
> possibilities
>
> I will need to do another pass, here are lots of code
>
> I guess that everything except sweepspan can not affect
> performance/memory consumption, because it's executed very infrequently,
> right?
>
Yes, sweeping is the only performance-critical part. Usually the special
list will be empty so there will be nothing to do during sweep.
>
>
> https://codereview.appspot.com/13314053/diff/19001/src/
> pkg/runtime/malloc.goc
> File src/pkg/runtime/malloc.goc (right):
>
> https://codereview.appspot.com/13314053/diff/19001/src/
> pkg/runtime/malloc.goc#newcode790
> src/pkg/runtime/malloc.goc:790: if(!runtime·addfinalizer(obj.data,
> finalizer.data, nret, fint, ot)) {
> remove trailing tabs
>
> fixed.
> https://codereview.appspot.com/13314053/diff/19001/src/
> pkg/runtime/malloc.goc#newcode791
> src/pkg/runtime/malloc.goc:791: runtime·printf("runtime.SetFinalizer:
> finalizer already set\n");
> remove trailing tabs
>
fixed.
>
> https://codereview.appspot.com/13314053/diff/19001/src/
> pkg/runtime/malloc.h
> File src/pkg/runtime/malloc.h (right):
>
> https://codereview.appspot.com/13314053/diff/19001/src/
> pkg/runtime/malloc.h#newcode344
> src/pkg/runtime/malloc.h:344: enum {
> { on next line
>
> fixed.
> https://codereview.appspot.com/13314053/diff/19001/src/
> pkg/runtime/malloc.h#newcode352
> src/pkg/runtime/malloc.h:352: Special *next; // linked list in span
> please stick * to type in field definitions
> use tabs for indentation of fields
>
> fixed. Although, the majority of the structs in this file use the other
convention (MSpan, MHeap, ...)
> https://codereview.appspot.com/13314053/diff/19001/src/pkg/runtime/mgc0.c
> File src/pkg/runtime/mgc0.c (right):
>
> https://codereview.appspot.com/13314053/diff/19001/src/
> pkg/runtime/mgc0.c#newcode1747
> src/pkg/runtime/mgc0.c:1747: while(special != nil) {
> have you measured performance of this variant vs merged loop in patchset
> 3?
> walking linked lists is always slow, and that other option was kind of
> more amortized and easier to add prefetching. It could be faster.
> I am concerned because the new "mark objects in span freelist" linked
> list walk added 10% to GC pause on some programs.
>
No, I didn't explicitly measure performance. It just made the code
cleaner. Tell me which programs you see a difference on and I can
investigate.
>
> https://codereview.appspot.com/13314053/diff/19001/src/pkg/runtime/mheap.c
> File src/pkg/runtime/mheap.c (right):
>
> https://codereview.appspot.com/13314053/diff/19001/src/
> pkg/runtime/mheap.c#newcode557
> src/pkg/runtime/mheap.c:557: Lock speciallock;
> static?
>
fixed.
>
> https://codereview.appspot.com/13314053/diff/19001/src/
> pkg/runtime/mheap.c#newcode607
> src/pkg/runtime/mheap.c:607: }
> add empty line between functions
fixed.
>
>
https://codereview.appspot.com/13314053/diff/19001/src/
> pkg/runtime/mheap.c#newcode646
> src/pkg/runtime/mheap.c:646: s->fn = f;
> what if I set and remove finalizer for a single block concurrently from
> 2 goroutines?
> as far as I see I can now seriously corrupt internal data structures
> I don't know yet what to do with it
>
True. We should probably delay the release of the specialLock until we set
those fields. I'll think about it.
>
> https://codereview.appspot.com/13314053/diff/19001/src/
> pkg/runtime/mheap.c#newcode683
> src/pkg/runtime/mheap.c:683: runtime·freespecial(Special *s, void *p,
> uintptr size) {
> { on the next line
>
fixed.
>
> https://codereview.appspot.com/13314053/diff/19001/src/
> pkg/runtime/mheap.c#newcode721
> src/pkg/runtime/mheap.c:721: if(offset == s->offset) {
> we need to consider allowing setting finalizers for any byte in a block
> (not necessary beginning of a block)
>
That is not allowed by the spec.
"The argument x must be a pointer to an object allocated by calling new or
by taking the address of a composite literal."
SetFinalizer (malloc.goc) already checks to make sure the pointer is to the
start of an object before calling addfinalizer or removefinalizer.
> https://codereview.appspot.com/13314053/diff/19001/src/
> pkg/runtime/mprof.goc
> File src/pkg/runtime/mprof.goc (right):
>
> https://codereview.appspot.com/13314053/diff/19001/src/
> pkg/runtime/mprof.goc#newcode457
> src/pkg/runtime/mprof.goc:457: proflock.key = 0; // nonempty body to
> keep linker happy.
> remove this function?
>
Sure.
>
> https://codereview.appspot.com/13314053/
>
On Tue, Dec 31, 2013 at 12:58 PM, Keith Randall <khr@google.com> wrote: > > > ...
11 years, 6 months ago
(2013-12-31 21:34:05 UTC)
#9
On Tue, Dec 31, 2013 at 12:58 PM, Keith Randall <khr@google.com> wrote:
>
>
>
> On Fri, Dec 27, 2013 at 8:18 AM, <dvyukov@google.com> wrote:
>
>> generally looks good, I think it's the right direction
>> this flexible "specials" infrastructure will also open some other
>> possibilities
>>
>> I will need to do another pass, here are lots of code
>>
>> I guess that everything except sweepspan can not affect
>> performance/memory consumption, because it's executed very infrequently,
>> right?
>>
> Yes, sweeping is the only performance-critical part. Usually the special
> list will be empty so there will be nothing to do during sweep.
>
>>
>>
>> https://codereview.appspot.com/13314053/diff/19001/src/
>> pkg/runtime/malloc.goc
>> File src/pkg/runtime/malloc.goc (right):
>>
>> https://codereview.appspot.com/13314053/diff/19001/src/
>> pkg/runtime/malloc.goc#newcode790
>> src/pkg/runtime/malloc.goc:790: if(!runtime·addfinalizer(obj.data,
>> finalizer.data, nret, fint, ot)) {
>> remove trailing tabs
>>
>> fixed.
>
>> https://codereview.appspot.com/13314053/diff/19001/src/
>> pkg/runtime/malloc.goc#newcode791
>> src/pkg/runtime/malloc.goc:791: runtime·printf("runtime.SetFinalizer:
>> finalizer already set\n");
>> remove trailing tabs
>>
> fixed.
>
>>
>> https://codereview.appspot.com/13314053/diff/19001/src/
>> pkg/runtime/malloc.h
>> File src/pkg/runtime/malloc.h (right):
>>
>> https://codereview.appspot.com/13314053/diff/19001/src/
>> pkg/runtime/malloc.h#newcode344
>> src/pkg/runtime/malloc.h:344: enum {
>> { on next line
>>
>> fixed.
>
>> https://codereview.appspot.com/13314053/diff/19001/src/
>> pkg/runtime/malloc.h#newcode352
>> src/pkg/runtime/malloc.h:352: Special *next; // linked list in span
>> please stick * to type in field definitions
>> use tabs for indentation of fields
>>
>> fixed. Although, the majority of the structs in this file use the other
> convention (MSpan, MHeap, ...)
>
>
>> https://codereview.appspot.com/13314053/diff/19001/src/pkg/runtime/mgc0.c
>> File src/pkg/runtime/mgc0.c (right):
>>
>> https://codereview.appspot.com/13314053/diff/19001/src/
>> pkg/runtime/mgc0.c#newcode1747
>> src/pkg/runtime/mgc0.c:1747: while(special != nil) {
>> have you measured performance of this variant vs merged loop in patchset
>> 3?
>> walking linked lists is always slow, and that other option was kind of
>> more amortized and easier to add prefetching. It could be faster.
>> I am concerned because the new "mark objects in span freelist" linked
>> list walk added 10% to GC pause on some programs.
>>
>
> No, I didn't explicitly measure performance. It just made the code
> cleaner. Tell me which programs you see a difference on and I can
> investigate.
>
>
>>
>> https://codereview.appspot.com/13314053/diff/19001/src/
>> pkg/runtime/mheap.c
>> File src/pkg/runtime/mheap.c (right):
>>
>> https://codereview.appspot.com/13314053/diff/19001/src/
>> pkg/runtime/mheap.c#newcode557
>> src/pkg/runtime/mheap.c:557: Lock speciallock;
>> static?
>>
> fixed.
>
>>
>> https://codereview.appspot.com/13314053/diff/19001/src/
>> pkg/runtime/mheap.c#newcode607
>> src/pkg/runtime/mheap.c:607: }
>> add empty line between functions
>
> fixed.
>
>>
>>
> https://codereview.appspot.com/13314053/diff/19001/src/
>> pkg/runtime/mheap.c#newcode646
>> src/pkg/runtime/mheap.c:646: s->fn = f;
>> what if I set and remove finalizer for a single block concurrently from
>> 2 goroutines?
>> as far as I see I can now seriously corrupt internal data structures
>> I don't know yet what to do with it
>>
> True. We should probably delay the release of the specialLock until we
> set those fields. I'll think about it.
>
Ok, I reorg'd a bit to fix this case.
>
>> https://codereview.appspot.com/13314053/diff/19001/src/
>> pkg/runtime/mheap.c#newcode683
>> src/pkg/runtime/mheap.c:683: runtime·freespecial(Special *s, void *p,
>> uintptr size) {
>> { on the next line
>>
> fixed.
>
>>
>> https://codereview.appspot.com/13314053/diff/19001/src/
>> pkg/runtime/mheap.c#newcode721
>> src/pkg/runtime/mheap.c:721: if(offset == s->offset) {
>> we need to consider allowing setting finalizers for any byte in a block
>> (not necessary beginning of a block)
>>
> That is not allowed by the spec.
> "The argument x must be a pointer to an object allocated by calling new or
> by taking the address of a composite literal."
> SetFinalizer (malloc.goc) already checks to make sure the pointer is to
> the start of an object before calling addfinalizer or removefinalizer.
>
>
>> https://codereview.appspot.com/13314053/diff/19001/src/
>> pkg/runtime/mprof.goc
>> File src/pkg/runtime/mprof.goc (right):
>>
>> https://codereview.appspot.com/13314053/diff/19001/src/
>> pkg/runtime/mprof.goc#newcode457
>> src/pkg/runtime/mprof.goc:457: proflock.key = 0; // nonempty body to
>> keep linker happy.
>> remove this function?
>>
> Sure.
>
>>
>> https://codereview.appspot.com/13314053/
>>
>
>
Hello cshapiro@google.com, khr@google.com, dvyukov@google.com (cc: golang-codereviews@googlegroups.com), I'd like you to review this change to https://khr%40golang.org@code.google.com/p/go/
11 years, 6 months ago
(2013-12-31 21:37:19 UTC)
#10
11 years, 6 months ago
(2014-01-01 11:40:37 UTC)
#12
On 2013/12/31 21:34:05, khr1 wrote:
> >> https://codereview.appspot.com/13314053/diff/19001/src/
> >> pkg/runtime/mheap.c#newcode721
> >> src/pkg/runtime/mheap.c:721: if(offset == s->offset) {
> >> we need to consider allowing setting finalizers for any byte in a block
> >> (not necessary beginning of a block)
> >>
> > That is not allowed by the spec.
> > "The argument x must be a pointer to an object allocated by calling new or
> > by taking the address of a composite literal."
> > SetFinalizer (malloc.goc) already checks to make sure the pointer is to
> > the start of an object before calling addfinalizer or removefinalizer.
Yes, but I mean that with this change we may consider allowing this, right?
There is no more implementation limitation that prohibited that.
On Wed, Jan 1, 2014 at 3:39 AM, <dvyukov@google.com> wrote: > LGTM > > > ...
11 years, 6 months ago
(2014-01-05 05:54:17 UTC)
#13
On Wed, Jan 1, 2014 at 3:39 AM, <dvyukov@google.com> wrote:
> LGTM
>
>
> https://codereview.appspot.com/13314053/diff/19001/src/pkg/runtime/mgc0.c
> File src/pkg/runtime/mgc0.c (right):
>
> https://codereview.appspot.com/13314053/diff/19001/src/
> pkg/runtime/mgc0.c#newcode1747
> src/pkg/runtime/mgc0.c:1747: while(special != nil) {
> On 2013/12/27 16:18:27, dvyukov wrote:
>
>> have you measured performance of this variant vs merged loop in
>>
> patchset 3?
>
>> walking linked lists is always slow, and that other option was kind of
>>
> more
>
>> amortized and easier to add prefetching. It could be faster.
>> I am concerned because the new "mark objects in span freelist" linked
>>
> list walk
>
>> added 10% to GC pause on some programs.
>>
>
>
> I've seen the GC pause increase here:
> http://goperfd.appspot.com/perfgraph?builder=linux-amd64&
> benchmark=widefinder&procs=1&metric=gc-pause-one&
> startcommit=1450&commitnum=100&refresh=Refresh
>
> https://codereview.appspot.com/13314053/diff/79001/src/
> pkg/runtime/malloc.goc
> File src/pkg/runtime/malloc.goc (right):
>
> https://codereview.appspot.com/13314053/diff/79001/src/
> pkg/runtime/malloc.goc#newcode216
> src/pkg/runtime/malloc.goc:216: runtime·freeallspecials(s, v, size);
> please fast-path it as:
> if(s->specials != nil)
>
> Done.
> free is used at least for select descriptors
>
> https://codereview.appspot.com/13314053/diff/79001/src/
> pkg/runtime/malloc.h
> File src/pkg/runtime/malloc.h (right):
>
> https://codereview.appspot.com/13314053/diff/79001/src/
> pkg/runtime/malloc.h#newcode346
> src/pkg/runtime/malloc.h:346: KindSpecialFinalizer = 1, // must be first
> I do not see where "must be first" is important
> either remove the comment or expand the comment
>
> Done.
> https://codereview.appspot.com/13314053/diff/79001/src/
> pkg/runtime/malloc.h#newcode523
> src/pkg/runtime/malloc.h:523: void runtime·setaddrbucket(void *p,
> Bucket
> *b);
> this marks the object as being profiled, setaddrbucket is not very
> indicative for this
> probably a better name would be something like set/markprofiled
>
> I changed it to setprofilebucket.
> https://codereview.appspot.com/13314053/diff/79001/src/
> pkg/runtime/malloc.h#newcode524
> src/pkg/runtime/malloc.h:524: Bucket* runtime·getaddrbucket(void *p);
> does this function exist? I can not find it.
>
Nope. Deleted.
>
> https://codereview.appspot.com/13314053/diff/79001/src/pkg/runtime/mgc0.c
> File src/pkg/runtime/mgc0.c (right):
>
> https://codereview.appspot.com/13314053/diff/79001/src/
> pkg/runtime/mgc0.c#newcode1645
> src/pkg/runtime/mgc0.c:1645: for(spanidx=0; spanidx<runtime·mheap.nspan;
> spanidx++) {
> it's a bit unfortunate that we need to walk all spans in GC pause, there
> can be hundreds of thousands of spans
> but let's leave it as is for now
>
We already walk all the spans to mark the type info arrays. If needed we
could do this in parallel.
>
> https://codereview.appspot.com/13314053/diff/79001/src/pkg/runtime/mheap.c
> File src/pkg/runtime/mheap.c (right):
>
> https://codereview.appspot.com/13314053/diff/79001/src/
> pkg/runtime/mheap.c#newcode557
> src/pkg/runtime/mheap.c:557: static Lock speciallock;
> I think it's better to move it to MHeap struct definition, since the
> variables it protects are there
> sorry that I said to mark it as static first, now I am looking at code
> more closely
>
Right, that sounds reasonable. Fixed.
>
> https://codereview.appspot.com/13314053/diff/79001/src/
> pkg/runtime/mprof.goc
> File src/pkg/runtime/mprof.goc (right):
>
> https://codereview.appspot.com/13314053/diff/79001/src/
> pkg/runtime/mprof.goc#newcode211
> src/pkg/runtime/mprof.goc:211: if(runtime·debug.allocfreetrace) {
> un-indent by 1 tab
>
Done.
>
> https://codereview.appspot.com/13314053/
>
Well, we could do that anyway. In malloc.goc:SetFinalizer, we call mlookup which finds the MSpan. ...
11 years, 6 months ago
(2014-01-05 05:54:31 UTC)
#14
Well, we could do that anyway. In malloc.goc:SetFinalizer, we call mlookup
which finds the MSpan. We could round down to the start of the object at
that point, if we wanted to.
On Wed, Jan 1, 2014 at 3:40 AM, <dvyukov@google.com> wrote:
> On 2013/12/31 21:34:05, khr1 wrote:
>
>> >> https://codereview.appspot.com/13314053/diff/19001/src/
>> >> pkg/runtime/mheap.c#newcode721
>> >> src/pkg/runtime/mheap.c:721: if(offset == s->offset) {
>> >> we need to consider allowing setting finalizers for any byte in a
>>
> block
>
>> >> (not necessary beginning of a block)
>> >>
>> > That is not allowed by the spec.
>> > "The argument x must be a pointer to an object allocated by calling
>>
> new or
>
>> > by taking the address of a composite literal."
>> > SetFinalizer (malloc.goc) already checks to make sure the pointer is
>>
> to
>
>> > the start of an object before calling addfinalizer or
>>
> removefinalizer.
>
>
> Yes, but I mean that with this change we may consider allowing this,
> right? There is no more implementation limitation that prohibited that.
>
> https://codereview.appspot.com/13314053/
>
*** Submitted as https://code.google.com/p/go/source/detail?r=0dee786a73f7 *** runtime: use special records hung off the MSpan to record ...
11 years, 6 months ago
(2014-01-07 21:45:54 UTC)
#17
*** Submitted as https://code.google.com/p/go/source/detail?r=0dee786a73f7 ***
runtime: use special records hung off the MSpan to
record finalizers and heap profile info. Enables
removing the special bit from the heap bitmap. Also
provides a generic mechanism for annotating occasional
heap objects.
finalizers
overhead per obj
old 680 B 80 B avg
new 16 B/span 48 B
profile
overhead per obj
old 32KB 24 B + hash tables
new 16 B/span 24 B
R=cshapiro, khr, dvyukov, gobot
CC=golang-codereviews
https://codereview.appspot.com/13314053
Issue 13314053: code review 13314053: runtime: use special records hung off the MSpan to
(Closed)
Created 11 years, 9 months ago by khr
Modified 11 years, 6 months ago
Reviewers:
Base URL:
Comments: 22