Description: runtime: simplify runtime·settype()
This changeset removes buffering of type information
and removes support for SysAlloc from the code.
Patch Set 1
Patch Set 2 : diff -r 60d35f8bbbf2 https://code.google.com/p/go/
Patch Set 3 : diff -r 60d35f8bbbf2 https://code.google.com/p/go/
Patch Set 4 : diff -r 60d35f8bbbf2 https://code.google.com/p/go/
Patch Set 5 : diff -r 60d35f8bbbf2 https://code.google.com/p/go/
Total comments: 2
Patch Set 6 : diff -r 8c800027d5a6 https://code.google.com/p/go/
Patch Set 7 : diff -r 8c800027d5a6 https://code.google.com/p/go/
Total comments: 2
Patch Set 8 : diff -r 1faca3687fe6 https://code.google.com/p/go/
Patch Set 9 : diff -r 1faca3687fe6 https://code.google.com/p/go/
Patch Set 10 : diff -r e86ab7e59e50 https://code.google.com/p/go/
Total comments: 11
Patch Set 11 : diff -r 1f7fdf4ad92d https://code.google.com/p/go/
Total comments: 10
Patch Set 12 : diff -r baa90b763ecd https://code.google.com/p/go/
Total messages: 35
Hello golang-dev@googlegroups.com (cc: dvyukov@google.com, rsc@golang.org), I'd like you to review this change to https://code.google.com/p/go/
https://codereview.appspot.com/9716045/diff/1007/src/pkg/runtime/malloc.goc File src/pkg/runtime/malloc.goc (right): https://codereview.appspot.com/9716045/diff/1007/src/pkg/runtime/malloc.goc#n... src/pkg/runtime/malloc.goc:572: data2 = runtime·mallocgc(nbytes2, FlagNoProfiling|FlagNoPointers, 0, 1); Don't call malloc while holding a spin lock. Malloc might gc, and everyone else waiting on this lock is spinning. stoptheworld won't be able to stop them => deadlock. Calling malloc is ok. Use a better lock.
https://codereview.appspot.com/9716045/diff/1007/src/pkg/runtime/malloc.goc File src/pkg/runtime/malloc.goc (right): https://codereview.appspot.com/9716045/diff/1007/src/pkg/runtime/malloc.goc#n... src/pkg/runtime/malloc.goc:572: data2 = runtime·mallocgc(nbytes2, FlagNoProfiling|FlagNoPointers, 0, 1); On 2013/05/30 17:34:03, khr wrote: > Don't call malloc while holding a spin lock. Malloc might gc, and everyone else > waiting on this lock is spinning. stoptheworld won't be able to stop them => > deadlock. > > Calling malloc is ok. Use a better lock. mallocgc is called with dogc=0.
I did not see that, sorry. I'm still concerned about the spin lock, though, if the holder gets (OS) descheduled for some reason. Why can't we just use a Lock?

On Thu, May 30, 2013 at 11:25 AM, <0xE2.0x9A.0x9B@gmail.com> wrote:
> https://codereview.appspot.com/9716045/diff/1007/src/pkg/runtime/malloc.goc
> File src/pkg/runtime/malloc.goc (right):
>
> src/pkg/runtime/malloc.goc:572: data2 = runtime·mallocgc(nbytes2,
> FlagNoProfiling|FlagNoPointers, 0, 1);
> On 2013/05/30 17:34:03, khr wrote:
>> Don't call malloc while holding a spin lock. Malloc might gc, and everyone else
>> waiting on this lock is spinning. stoptheworld won't be able to stop them =>
>> deadlock.
>>
>> Calling malloc is ok. Use a better lock.
>
> mallocgc is called with dogc=0.
Hello golang-dev@googlegroups.com, khr@golang.org, khr@google.com (cc: dvyukov@google.com, golang-dev@googlegroups.com, rsc@golang.org), Please take another look.
LGTM. Have Dmitry or Carl take a look also.

On Thu, May 30, 2013 at 12:52 PM, <0xE2.0x9A.0x9B@gmail.com> wrote:
> Hello golang-dev@googlegroups.com, khr@golang.org, khr@google.com (cc:
> dvyukov@google.com, golang-dev@googlegroups.com, rsc@golang.org),
>
> Please take another look.
LGTM

What, if any, is the performance consequence of this change?

https://codereview.appspot.com/9716045/diff/18006/src/pkg/runtime/malloc.goc File src/pkg/runtime/malloc.goc (right): https://codereview.appspot.com/9716045/diff/18006/src/pkg/runtime/malloc.goc#... src/pkg/runtime/malloc.goc:509: uintptr ntypes, nbytes2, nbytes3; I realize the names "nbytes{2,3}" and "data{2,3}" come from the earlier revision, but they are very confusing. Could you rename these values to something more descriptive? A follow-up change is okay. It looks like they are for word type data and byte type data; a better name might reflect that.
https://codereview.appspot.com/9716045/diff/18006/src/pkg/runtime/malloc.goc File src/pkg/runtime/malloc.goc (right): https://codereview.appspot.com/9716045/diff/18006/src/pkg/runtime/malloc.goc#... src/pkg/runtime/malloc.goc:509: uintptr ntypes, nbytes2, nbytes3; On 2013/05/30 21:59:08, cshapiro1 wrote:
> I realize the names "nbytes{2,3}" and "data{2,3}" come from the earlier
> revision, but they are very confusing. Could you rename these values to
> something more descriptive? A follow-up change is okay. It looks like they are
> for word type data and byte type data, a better name might reflect that.

I suggest keeping the names and adding a comment to the C code:

// The suffix 2 (nbytes2, data2) means that the variable is related to MType_Words.
// The suffix 3 (nbytes3, data3) means that the variable is related to MType_Bytes.
The "remove sysalloc" part looks good. But the "remove caching" part: NOT LGTM. There is already a very significant slowdown in the memory subsystem since Go1.1, and this adds another 6%.

On test/bench/garbage/parser:

before: garbage.BenchmarkParser 4 2851583751 ns/op
after: garbage.BenchmarkParser 4 3037543341 ns/op

before:
 9.78%  parser  parser  [.] scanblock
 7.83%  parser  parser  [.] sweepspan
 6.61%  parser  parser  [.] runtime.mallocgc
 6.44%  parser  parser  [.] flushptrbuf
 4.06%  parser  parser  [.] runtime.settype_flush
 3.76%  parser  parser  [.] go/scanner.(*Scanner).next
 2.92%  parser  parser  [.] runtime.gettype
 2.44%  parser  parser  [.] runtime.newstack
 2.29%  parser  parser  [.] runtime.memclr

after:
 9.82%  parser  parser  [.] scanblock
 7.63%  parser  parser  [.] sweepspan
 6.20%  parser  parser  [.] flushptrbuf
 5.77%  parser  parser  [.] runtime.mallocgc
 5.11%  parser  parser  [.] runtime.xchg
 4.10%  parser  parser  [.] runtime.settype
 3.32%  parser  parser  [.] go/scanner.(*Scanner).next
 2.93%  parser  parser  [.] runtime.gettype
 2.45%  parser  parser  [.] runtime.newstack
 2.19%  parser  parser  [.] go/scanner.(*Scanner).Scan
 2.13%  parser  parser  [.] runtime.memclr

Note the 5.11% for runtime.xchg. Malloc really should be just a pop from a freelist (or a bump of a pointer); in no way must it take locks (even local ones).
Patchset 5 has runtime·lock() inlined into runtime·settype(). Could you please rerun the benchmark with patchset 5 and state whether the performance of patchset 5 would be LGTM? I can rewrite the for(;;) loop in assembler if necessary.

On 2013/05/31 06:33:07, dvyukov wrote:
> The "remove sysalloc" part looks good. But the "remove caching" NOT LGTM.
> There is already very significant slowdown in the memory subsystem since Go1.1,
> and this adds another 6%.
> [benchmark numbers and perf profiles quoted above elided]
> Note that 5.11% for runtime.xchg.
> Malloc really should be just a pop from freelist (or bump pointer), in no way it
> must take locks (even if local).
I expect the spinlock variable to be zero on entry to settype() in the majority of cases. If mallocgc is called from N threads at the same time, it is likely that the N mspans will be different from each other. I am unable to prove that those N mspans are distinct in all possible situations, so I was forced to put a lock there.
On 2013/05/31 06:40:27, atom wrote: > Patchset 5 has runtime·lock() inlined into runtime·settype(). Could you please > rerun the benchmark with patchset 5 and state whether the performance of > patchset 5 would be LGTM? Slightly better, but still 4% slowdown: before: garbage.BenchmarkParser 4 2814602888 ns/op garbage.BenchmarkParser 4 2796648863 ns/op garbage.BenchmarkParser 4 2786491624 ns/op after: garbage.BenchmarkParser 4 2961402827 ns/op garbage.BenchmarkParser 4 2882486866 ns/op garbage.BenchmarkParser 4 2899995700 ns/op
I uploaded a new patchset. Please rerun the benchmark.
I updated the settype function and fixed the irrational code introduced in the previous changeset. I hope the code is correct now. Please take another look.
https://codereview.appspot.com/9716045/diff/35001/src/pkg/runtime/malloc.goc File src/pkg/runtime/malloc.goc (right):

https://codereview.appspot.com/9716045/diff/35001/src/pkg/runtime/malloc.goc#... src/pkg/runtime/malloc.goc:564: if(DebugTypeAtBlockEnd) { This seems unrelated to this CL.

https://codereview.appspot.com/9716045/diff/35001/src/pkg/runtime/malloc.goc#... src/pkg/runtime/malloc.goc:577: switch(s->types.compression) { This appears to be an unlocked access to a field that is protected by a lock. Shouldn't it be an atomic load? The locking semantics here are puzzling; is there a comment somewhere, e.g., malloc.h, that explains them?

https://codereview.appspot.com/9716045/diff/35001/src/pkg/runtime/malloc.goc#... src/pkg/runtime/malloc.goc:580: { Why start a new block here?

https://codereview.appspot.com/9716045/diff/35001/src/pkg/runtime/malloc.goc#... src/pkg/runtime/malloc.goc:582: // An other OS thread won the race s/An other/Another/

https://codereview.appspot.com/9716045/diff/35001/src/pkg/runtime/malloc.goc#... src/pkg/runtime/malloc.goc:587: // An other OS thread won the race s/An other/Another/
https://codereview.appspot.com/9716045/diff/35001/src/pkg/runtime/malloc.goc File src/pkg/runtime/malloc.goc (right): https://codereview.appspot.com/9716045/diff/35001/src/pkg/runtime/malloc.goc#... src/pkg/runtime/malloc.goc:577: switch(s->types.compression) { On 2013/05/31 19:04:17, iant wrote:
> This appears to be an unlocked access to a field that is protected by a lock.
> Shouldn't it be an atomic load?
>
> The locking semantics here are puzzling; is there a comment somewhere, e.g.,
> malloc.h, that explains them?

The semantics are based on the fact that MTypes_Words is the final state. Once the final state is reached, the value of s->types.data is a constant. I just realized there is a race condition on s->types.data when going from MTypes_Bytes to MTypes_Words. I will attempt to post a fix tomorrow.
https://codereview.appspot.com/9716045/diff/35001/src/pkg/runtime/malloc.goc File src/pkg/runtime/malloc.goc (right):

https://codereview.appspot.com/9716045/diff/35001/src/pkg/runtime/malloc.goc#... src/pkg/runtime/malloc.goc:564: if(DebugTypeAtBlockEnd) { On 2013/05/31 19:04:17, iant wrote:
> This seems unrelated to this CL.
It comes from the old runtime·settype().

https://codereview.appspot.com/9716045/diff/35001/src/pkg/runtime/malloc.goc#... src/pkg/runtime/malloc.goc:577: switch(s->types.compression) { The code has been updated. It isn't race free, but it should be very close to being race free. The non-zero probability of information loss shouldn't be a problem for the garbage collector. There are now more comments in the code.

https://codereview.appspot.com/9716045/diff/35001/src/pkg/runtime/malloc.goc#... src/pkg/runtime/malloc.goc:580: { On 2013/05/31 19:04:17, iant wrote:
> Why start a new block here?
Done.

https://codereview.appspot.com/9716045/diff/35001/src/pkg/runtime/malloc.goc#... src/pkg/runtime/malloc.goc:582: // An other OS thread won the race On 2013/05/31 19:04:17, iant wrote:
> s/An other/Another/
Done.

https://codereview.appspot.com/9716045/diff/35001/src/pkg/runtime/malloc.goc#... src/pkg/runtime/malloc.goc:587: // An other OS thread won the race On 2013/05/31 19:04:17, iant wrote:
> s/An other/Another/
Done.
https://codereview.appspot.com/9716045/diff/41001/src/pkg/runtime/malloc.goc File src/pkg/runtime/malloc.goc (right):

https://codereview.appspot.com/9716045/diff/41001/src/pkg/runtime/malloc.goc#... src/pkg/runtime/malloc.goc:560: // The suffix 3 (nbytes3, data3) means that the variable is related to MType_Bytes. I also experience cognitive pressure when encountering the 2/3 names. That may be fine for a very local thing. At least I would expect 1 - single, 2 - bytes, 3 - words, because words are larger and the final state. But they are actually the other way around. datas, datab, dataw would give a useful hint.

https://codereview.appspot.com/9716045/diff/41001/src/pkg/runtime/malloc.goc#... src/pkg/runtime/malloc.goc:568: if(DebugTypeAtBlockEnd) { drop {}

https://codereview.appspot.com/9716045/diff/41001/src/pkg/runtime/malloc.goc#... src/pkg/runtime/malloc.goc:580: ofs = ((uintptr)v - (s->start<<PageShift)) / size; this seems to be an index, not an offset

https://codereview.appspot.com/9716045/diff/41001/src/pkg/runtime/malloc.goc#... src/pkg/runtime/malloc.goc:635: if(((uintptr*)data3)[j] == typ) { drop {}

https://codereview.appspot.com/9716045/diff/41001/src/pkg/runtime/malloc.goc#... src/pkg/runtime/malloc.goc:640: // Label1 ?

https://codereview.appspot.com/9716045/diff/41001/src/pkg/runtime/malloc.goc#... src/pkg/runtime/malloc.goc:640: // Label1 Now I see you refer to the Label1 below. Perhaps say something like: // possible race condition, see the "Move contents of data3 to data2" comment below

https://codereview.appspot.com/9716045/diff/41001/src/pkg/runtime/malloc.goc#... src/pkg/runtime/malloc.goc:657: if(((uintptr*)data3)[j] == typ) { drop {}

https://codereview.appspot.com/9716045/diff/41001/src/pkg/runtime/malloc.goc#... src/pkg/runtime/malloc.goc:666: // Another OS thread added typ to data3 it is not necessarily another thread; it can be the current thread

https://codereview.appspot.com/9716045/diff/41001/src/pkg/runtime/malloc.goc#... src/pkg/runtime/malloc.goc:685: // Although we may misread some elements of data3[8..N] due to a race condition with Label1, it can also race with 'Another OS thread added typ to data3'
https://codereview.appspot.com/9716045/diff/41001/src/pkg/runtime/malloc.goc File src/pkg/runtime/malloc.goc (right): https://codereview.appspot.com/9716045/diff/41001/src/pkg/runtime/malloc.goc#... src/pkg/runtime/malloc.goc:687: // The probability of the race condition is extremely low. It's fine to have such races now, but we must aim for a 100% precise GC in the future. At the least we need some plan for how to implement it without the race while keeping it fast. It would be a pity to revert this CL in the future.
Now it's quite debatable whether it's a simplification, as the change description says. I've remeasured the performance; the best out of 8 runs is:

garbage.BenchmarkParser 4 2825486313 ns/op

So it's very close, but ~0.5-1% slower.

Before:
4.06% parser parser [.] runtime.settype_flush
1.11% parser parser [.] runtime.xchg
After:
4.65% parser parser [.] runtime.settype
1.30% parser parser [.] runtime.xchg

My conclusion is still the same: the "remove sysalloc" part is good. But the rest either complicates the code, or slows it down, or both.
Considering what's been written in discussion "Better GC and Memory Allocator" (https://groups.google.com/forum/?fromgroups#!topic/golang-dev/pwUh0BVFpY0) and taking into account the fact that further optimizing the current settype() seems impossible, I am thinking of removing MTypes_Bytes. This would make settype() and mgc0.c both simpler and slightly faster, at the cost of increased memory consumption. The cost per object would initially be 8 bytes on 64-bit platforms, with the option of shrinking it to 4 bytes per object in the future. Is there an agreement that this is the way to go? On 2013/06/03 09:04:17, dvyukov wrote: > Now it's quite debatable whether it's a simplification as the change description > says. > I've remeasured the performance, the best out of 8 runs is: > garbage.BenchmarkParser 4 2825486313 ns/op > So it's very close, but ~0.5-1% slower. > Before: > 4.06% parser parser [.] runtime.settype_flush > 1.11% parser parser [.] runtime.xchg > After: > 4.65% parser parser [.] runtime.settype > 1.30% parser parser [.] runtime.xchg > > My conclusion is still the same: > The "remove sysalloc" part is good. > But the rest is either complicates the code, or slows it down, or both.
On 2013/06/03 12:21:51, atom wrote: > Considering what's been written in discussion "Better GC and Memory Allocator" > (https://groups.google.com/forum/?fromgroups#%21topic/golang-dev/pwUh0BVFpY0) and > taking into account the fact that further optimizing the current settype() seems > impossible, I am thinking of removing MTypes_Bytes. This would make settype() > and mgc0.c both simpler and slightly faster, at the cost of increased memory > consumption. The cost per object would initially be 8 bytes on 64-bit platforms, > with the option of shrinking it to 4 bytes per object in the future. > > Is there an agreement that this is the way to go? Simplifications due to removal of sysalloc and MTypes_Bytes will be great. But you need to measure how it affects performance and memory consumption.
On 2013/06/03 12:29:17, dvyukov wrote: > On 2013/06/03 12:21:51, atom wrote: > > Considering what's been written in discussion "Better GC and Memory Allocator" > > (https://groups.google.com/forum/?fromgroups#%2521topic/golang-dev/pwUh0BVFpY0) > and > > taking into account the fact that further optimizing the current settype() > seems > > impossible, I am thinking of removing MTypes_Bytes. This would make settype() > > and mgc0.c both simpler and slightly faster, at the cost of increased memory > > consumption. The cost per object would initially be 8 bytes on 64-bit > platforms, > > with the option of shrinking it to 4 bytes per object in the future. > > > > Is there an agreement that this is the way to go? > > > Simplifications due to removal of sysalloc and MTypes_Bytes will be great. But > you need to measure how it affects performance and memory consumption. A = with MTypes_Bytes B = without MTypes_Bytes linux/386 test/bench/garbage/parser.go: A: 65.087user 1.870system 0:54.170elapsed 123%CPU (390488maxresident)k B: 63.042user 1.960system 0:52.967elapsed 122%CPU (401664maxresident)k test/bench/garbage/tree2.go: A: 12.246user 0.192system 0:11.578elapsed 107%CPU (148576maxresident)k B: 11.827user 0.213system 0:11.349elapsed 106%CPU (173208maxresident)k
On 2013/06/03 13:11:39, atom wrote: > On 2013/06/03 12:29:17, dvyukov wrote: > > On 2013/06/03 12:21:51, atom wrote: > > > Considering what's been written in discussion "Better GC and Memory > Allocator" > > > > (https://groups.google.com/forum/?fromgroups#%252521topic/golang-dev/pwUh0BVFpY0) > > and > > > taking into account the fact that further optimizing the current settype() > > seems > > > impossible, I am thinking of removing MTypes_Bytes. This would make > settype() > > > and mgc0.c both simpler and slightly faster, at the cost of increased memory > > > consumption. The cost per object would initially be 8 bytes on 64-bit > > platforms, > > > with the option of shrinking it to 4 bytes per object in the future. > > > > > > Is there an agreement that this is the way to go? > > > > > > Simplifications due to removal of sysalloc and MTypes_Bytes will be great. But > > you need to measure how it affects performance and memory consumption. > > A = with MTypes_Bytes > B = without MTypes_Bytes > > linux/386 > > test/bench/garbage/parser.go: > A: 65.087user 1.870system 0:54.170elapsed 123%CPU (390488maxresident)k > B: 63.042user 1.960system 0:52.967elapsed 122%CPU (401664maxresident)k > > test/bench/garbage/tree2.go: > A: 12.246user 0.192system 0:11.578elapsed 107%CPU (148576maxresident)k > B: 11.827user 0.213system 0:11.349elapsed 106%CPU (173208maxresident)k Please also measure memory consumption on linux/amd64. You can do it with: $ TIME="%e %M" time ./parser
On 2013/06/03 13:23:43, dvyukov wrote: > Please also measure memory consumption on linux/amd64. You can do it with: > $ TIME="%e %M" time ./parser This could take half an hour to complete with my resources. I uploaded the new code as patchset 12. Could you please run the benchmarks?
On 2013/06/03 13:28:56, atom wrote: > On 2013/06/03 13:23:43, dvyukov wrote: > > Please also measure memory consumption on linux/amd64. You can do it with: > > $ TIME="%e %M" time ./parser > > This could take half an hour to complete with my resources. I uploaded the new > code as patchset 12. Could you please run the benchmarks? About half an hour, as estimated. The parser benchmark exceeds the 512 MB memory limit of the amd64 virtualized environment I am occasionally using for testing Go.
So your benchmarks show 2.22% speedup and 2.86% memory increase. On linux/amd64 I see 2.19% speedup and 5.27% memory increase (it's probably expected, because a word is twice as big). Plus some code simplification. I can't make up my mind right now; it's a difficult decision.
On 2013/06/03 14:55:26, dvyukov wrote:
> So your benchmarks show 2.22% speedup and 2.86% memory increase.
> On linux/amd64 I see 2.19% speedup and 5.27% memory increase (it's probably
> expected, because a word is twice as big).
> + some code simplification
> I can't make up my mind right now, it's a difficult decision.

There are possibilities for lowering the typeinfo memory consumption in the future:

1. Avoid mallocgc() in settype().
2. Try to allocate hashmaps outside of mheap. The runtime knows the type information of all hashmaps, and the garbage collection of hashmaps is fully precise.
3. (already mentioned) Shrinking the 8-byte typeinfo to 4 bytes.

I plan to deal with (1) in a short time. (2) shouldn't be hard; it shares code with (1). (3) requires changes to the runtime, the compiler, and the reflect package, so it may take some time to implement.

If we have a sufficiently firm belief in the realization of these 3 changes, the 5.27% memory increase seems acceptable.
On 2013/06/03 14:55:26, dvyukov wrote: > So your benchmarks show 2.22% speedup and 2.86% memory increase. > On linux/am64 I see 2.19% speedup and 5.27% memory increase (it's probably > expected, because word is twice as big). > + some code simplification > I can't make my mind right now, it's a difficult decision. I have a prototype implementation of settype() without mallocgc(). It uses a new allocator. The results: A = with MTypes_Bytes, with mallocgc in settype B = without MTypes_Bytes, without mallocgc in settype linux/386 test/bench/garbage/parser.go: A: 65.190user 2.057system 0:54.463elapsed 123%CPU 390488maxresident B: 66.798user 2.015system 0:54.812elapsed 125%CPU 381224maxresident test/bench/garbage/tree2.go: A: 12.243user 0.192system 0:11.563elapsed 107%CPU 148576maxresident B: 12.081user 0.219system 0:11.436elapsed 107%CPU 169152maxresident The user times for parser.go are not comparable because in case B the scanblock() function is consuming about 1 second more time than in case A. The garbage collector also runs more often in case B when running parser.go, but this is to be expected because of smaller mheap.
On 2013/06/05 10:11:30, atom wrote:
> I have a prototype implementation of settype() without mallocgc(). It uses a new
> allocator.
>
> The results:
>
> A = with MTypes_Bytes, with mallocgc in settype
> B = without MTypes_Bytes, without mallocgc in settype
>
> linux/386
>
> test/bench/garbage/parser.go:
> A: 65.190user 2.057system 0:54.463elapsed 123%CPU 390488maxresident
> B: 66.798user 2.015system 0:54.812elapsed 125%CPU 381224maxresident
>
> test/bench/garbage/tree2.go:
> A: 12.243user 0.192system 0:11.563elapsed 107%CPU 148576maxresident
> B: 12.081user 0.219system 0:11.436elapsed 107%CPU 169152maxresident
>
> The user times for parser.go are not comparable because in case B the
> scanblock() function is consuming about 1 second more time than in case A. The
> garbage collector also runs more often in case B when running parser.go, but
> this is to be expected because of smaller mheap.

parser looks good. I don't care too much about tree2 for such changes. What do you mean by "new allocator"?

Also please allocate the type array directly when the span is allocated for small blocks; it will simplify settype() significantly. And free it directly when the span is returned to the heap; it should be possible with a custom allocator.
On 2013/06/05 11:50:05, dvyukov wrote: > On 2013/06/05 10:11:30, atom wrote: > > I have a prototype implementation of settype() without mallocgc(). It uses a > new > > allocator. > > > > The results: > > > > A = with MTypes_Bytes, with mallocgc in settype > > B = without MTypes_Bytes, without mallocgc in settype > > > > linux/386 > > > > test/bench/garbage/parser.go: > > A: 65.190user 2.057system 0:54.463elapsed 123%CPU 390488maxresident > > B: 66.798user 2.015system 0:54.812elapsed 125%CPU 381224maxresident > > > > test/bench/garbage/tree2.go: > > A: 12.243user 0.192system 0:11.563elapsed 107%CPU 148576maxresident > > B: 12.081user 0.219system 0:11.436elapsed 107%CPU 169152maxresident > > > > The user times for parser.go are not comparable because in case B the > > scanblock() function is consuming about 1 second more time than in case A. The > > garbage collector also runs more often in case B when running parser.go, but > > this is to be expected because of smaller mheap. > > > parser looks good. I don't care too much about tree2 for such changes. > What do you mean by "new allocator"? https://codereview.appspot.com/10046043 It isn't prepared for code review yet, but you can comment on it. I published it in advance because this CL (9716045) cannot be LGTMed without seeing the allocator source code. Please ignore the style of the source code for now. > Also please allocate the type array directly when the span is allocated for > small blocks, it will simplify settype() significantly. And free it directly > when the span is returned to heap, it should be possible with custom allocator. I agree. Because of recursion it wasn't possible when settype() was using mallocgc(). I would suggest for this to be a separate code review that should be posted after we are done with CL 10046043. Splitting the process into 3 CLs should make it easier to review.
On Wed, Jun 5, 2013 at 4:29 PM, <0xE2.0x9A.0x9B@gmail.com> wrote:
> https://codereview.appspot.com/10046043
>
> It isn't prepared for code review yet, but you can comment on it. I
> published it in advance because this CL (9716045) cannot be LGTMed
> without seeing the allocator source code. Please ignore the style of the
> source code for now.

Kind of a lot of code... and another allocator in the runtime...

Can we do what we've discussed in golang-dev about improved GC -- embed this type table directly at the end of the span itself? Then it won't require all that code. MSpan will need to embed a few slots for large objects (so that a 4K object can fit into a 4K span).

> I agree. Because of recursion it wasn't possible when settype() was
> using mallocgc(). I would suggest for this to be a separate code review
> that should be posted after we are done with CL 10046043. Splitting the
> process into 3 CLs should make it easier to review.
On 2013/06/05 13:03:01, dvyukov wrote:
> Kind of lot of code... and another allocator in the runtime...

I believe an allocator completely separate from mallocgc() is necessary and can be used from multiple places in the runtime.

> Can we do what we've discussed in golang-dev about improved GC --
> embed this type table directly at the end of the span itself? Then it
> won't require all that code.
> MSpan will need to embed few slots for large objects (so that 4K
> object can fit into 4K span).

The typical size of a memory block holding typeinfos is about 1024 bytes. The list of memory allocation requests for the parser benchmark looks as follows (size in bytes, 32-bit platform):

... 2048 512 1024 512 2048 1024 344 512 144 168 256 128 1024 344 512 2048 1024 256 512 1024 512 2048 1024 32 512 1024 512 1024 2048 512 1024 344 512 2048 1024 16 512 1024 512 344 2048 1024 512 512 1024 2048 512 64 1024 344 512 2048 1024 512 256 1024 2048 512 16 512 1024 2048 512 1024 512 1024 2048 88 1024 8 8 ...

A span is aligned to the page size, which is 4096 bytes. Embedding the type table directly into the span itself would mean that the above numbers have to be rounded to a multiple of 4096. This would increase memory consumption.

I would like to close this CL (9716045) soon if possible. There are a lot of other improvements ahead.
On Wed, Jun 5, 2013 at 5:29 PM, <0xE2.0x9A.0x9B@gmail.com> wrote:
> I believe an allocator completely separate from mallocgc() is necessary
> and can be used from multiple places in the runtime.

Where else?

> The typical size of a memory block holding typeinfos is about 1024
> bytes. [list of allocation request sizes quoted above elided]
>
> A span is aligned to a page size, which is 4096 bytes. Embedding the
> type table directly into the span itself would mean that the above
> numbers have to be rounded to a multiple of 4096.

Why do they have to be rounded to 4096? A single 4096-byte span can hold lots of 8-byte allocations and the associated type info; no rounding is required.

> This would increase memory consumption.
>
> I would like to close this CL (9716045) soon if possible. There is a lot
> of other improvements ahead.
On 2013/06/05 13:36:49, dvyukov wrote:
> On Wed, Jun 5, 2013 at 5:29 PM, <mailto:0xE2.0x9A.0x9B@gmail.com> wrote:
> > A span is aligned to a page size, which is 4096 bytes. Embedding the
> > type table directly into span itself would mean that the above numbers
> > have to be rounded to a multiple of 4096.
>
> Why do they have to be rounded to 4096?
> A single 4096-byte span can hold lots of 8-byte allocations and the
> associated type info, no rounding is required.

I don't have enough vitality. I am closing this CL.