|
|
Description: runtime: simpler and faster GC
Implement the design described in:
https://docs.google.com/document/d/1v4Oqa0WwHunqlb8C3ObL_uNQw3DfSY-ztoA-4wWbKcg/pub
Summary of the changes:
GC uses "2-bits per word" pointer type info embed directly into bitmap.
Scanning of stacks/data/heap is unified.
The old spans types go away.
Compiler generates "sparse" 4-bits type info for GC (directly for GC bitmap).
Linker generates "dense" 2-bits type info for data/bss (the same as stacks use).
Summary of results:
-1680 lines of code total (-1000+ in mgc0.c only)
-25% memory consumption
-3-7% binary size
-15% GC pause reduction
-7% run time reduction
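For illustration, here is a minimal sketch of the dense 2-bit-per-word encoding described above. Some of the constant names (BitsPerPointer, PointersPerByte, BitsScalar) appear later in this review; the other names, the numeric values, and the helper function are assumptions for illustration only, not code from the CL.

package main

import "fmt"

const (
    BitsPerPointer  = 2
    PointersPerByte = 8 / BitsPerPointer // 4 heap words described per bitmap byte

    // Illustrative per-word classifications (values assumed, not from the CL).
    BitsDead      = 0 // unused / dead word
    BitsScalar    = 1 // word contains no pointers
    BitsPointer   = 2 // word is a live pointer
    BitsMultiWord = 3 // first word of a multi-word description
)

// typeBits packs one 2-bit classification per word into the dense form.
func typeBits(words []int) []byte {
    out := make([]byte, (len(words)+PointersPerByte-1)/PointersPerByte)
    for i, w := range words {
        out[i/PointersPerByte] |= byte(w) << uint((i%PointersPerByte)*BitsPerPointer)
    }
    return out
}

func main() {
    // A struct { p *int; x uintptr; q *byte } is pointer, scalar, pointer.
    fmt.Printf("%08b\n", typeBits([]int{BitsPointer, BitsScalar, BitsPointer}))
}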
Messages (64 total)
Hello golang-codereviews@googlegroups.com, I'd like you to review this change to https://dvyukov%40google.com@code.google.com/p/go/
Rietveld does not like large change descriptions, so here are the benchmark results separately (columns: old, new, delta):

garbage-1
allocated       3060956     2965058     -3.13%
allocs          57729       57624       -0.18%
cputime         16178480    16739958    +3.47%
gc-pause-one    133386156   113134986   -15.18%
gc-pause-total  2934495     3394049     +15.66%
rss             300072960   218271744   -27.26%
sys-gc          17670144    11980800    -32.20%
sys-heap        265945088   191496192   -27.99%
sys-other       10015392    8266216     -17.46%
sys-stack       9043968     8912896     -1.45%
sys-total       302674592   220656104   -27.10%
time            16188036    16743835    +3.43%
virtual-mem     491409408   407719936   -17.03%

garbage-2
allocated       3073933     2964383     -3.56%
allocs          57739       57623       -0.20%
cputime         17588468    17919136    +1.88%
gc-pause-one    79779842    75689054    -5.13%
gc-pause-total  1834936     2346360     +27.87%
rss             310652928   227540992   -26.75%
sys-gc          18128896    12316672    -32.06%
sys-heap        273285120   196739072   -28.01%
sys-other       10285144    8522624     -17.14%
sys-stack       9306112     9306112     +0.00%
sys-total       311005272   226884480   -27.05%
time            8881479     9263811     +4.30%
virtual-mem     574124032   423456768   -26.24%

garbage-4
allocated       3071034     2964380     -3.47%
allocs          57735       57623       -0.19%
cputime         18395208    19084593    +3.75%
gc-pause-one    42328695    39775705    -6.03%
gc-pause-total  931231      1153495     +23.87%
rss             325066752   241741824   -25.63%
sys-gc          18980864    13180928    -30.56%
sys-heap        286916608   210370560   -26.68%
sys-other       10613232    8787872     -17.20%
sys-stack       10092544    10223616    +1.30%
sys-total       326603248   242562976   -25.73%
time            4608702     4830480     +4.81%
virtual-mem     547790848   531423232   -2.99%

garbage-8
allocated       3048763     2957203     -3.00%
allocs          57718       57623       -0.16%
cputime         19151198    19957434    +4.21%
gc-pause-one    24585584    22530624    -8.36%
gc-pause-total  491711      608326      +23.72%
rss             356458496   262660096   -26.31%
sys-gc          20750336    14323712    -30.97%
sys-heap        315228160   228196352   -27.61%
sys-other       11164304    9297984     -16.72%
sys-stack       12451840    12320768    -1.05%
sys-total       359594640   264138816   -26.55%
time            2422566     2531798     +4.51%
virtual-mem     707973120   645025792   -8.89%

Since the heap size is reduced, GCs happen more frequently, which increases total GC time. It is possible to trade memory back for performance. Below is a comparison of old vs new with GOGC=165 (which brings memory consumption roughly back to the old level).
garbage-1
allocated       3052052     2965030     -2.85%
allocs          57720       57624       -0.17%
cputime         16621510    15395976    -7.37%
gc-pause-one    136762062   112110308   -18.03%
gc-pause-total  3008765     2017985     -32.93%
rss             299483136   285331456   -4.73%
sys-gc          17670144    15785984    -10.66%
sys-heap        265945088   252313600   -5.13%
sys-other       10015504    9198984     -8.15%
sys-stack       8912896     9043968     +1.47%
sys-total       302543632   286342536   -5.35%
time            16621385    15404268    -7.32%
virtual-mem     557273088   474525696   -14.85%

garbage-2
allocated       3073702     2964237     -3.56%
allocs          57738       57623       -0.20%
cputime         17307008    15874992    -8.27%
gc-pause-one    86934156    85146756    -2.06%
gc-pause-total  1999485     1532641     -23.35%
rss             309473280   295632896   -4.47%
sys-gc          18063360    16244736    -10.07%
sys-heap        272236544   259653632   -4.62%
sys-other       10280824    9464032     -7.94%
sys-stack       9437184     9175040     -2.78%
sys-total       310017912   294537440   -4.99%
time            8936413     8389483     -6.12%
virtual-mem     574255104   491114496   -14.48%

garbage-4
allocated       3065954     2956425     -3.57%
allocs          57731       57621       -0.19%
cputime         18431932    17124132    -7.10%
gc-pause-one    44980995    39581480    -12.00%
gc-pause-total  989581      692675      -30.00%
rss             321982464   314998784   -2.17%
sys-gc          18784256    17375232    -7.50%
sys-heap        283770880   277479424   -2.22%
sys-other       10346984    9989656     -3.45%
sys-stack       10223616    9961472     -2.56%
sys-total       323125736   314805784   -2.57%
time            4685893     4336762     -7.45%
virtual-mem     737247232   595279872   -19.26%

garbage-8
allocated       3041382     2956830     -2.78%
allocs          57712       57622       -0.16%
cputime         19078905    18206227    -4.57%
gc-pause-one    25034791    22108232    -11.69%
gc-pause-total  500695      364785      -27.14%
rss             357806080   345243648   -3.51%
sys-gc          20750336    19107840    -7.92%
sys-heap        315228160   304742400   -3.33%
sys-other       11163856    10510576    -5.85%
sys-stack       12451840    12189696    -2.11%
sys-total       359594192   346550512   -3.63%
time            2419061     2291795     -5.26%
virtual-mem     808402944   727433216   -10.02%
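For reference, the GOGC=165 runs above only raise the heap-growth target; the same knob can be set from inside a program through the standard runtime/debug API. The snippet below is shown purely to illustrate the memory-vs-GC-frequency trade-off being measured; it is not part of this CL.

package main

import (
    "fmt"
    "runtime/debug"
)

func main() {
    // Raise the heap-growth target from the default 100% to 165%,
    // matching the GOGC=165 runs above; returns the previous setting.
    old := debug.SetGCPercent(165)
    fmt.Println("previous GOGC:", old)
}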
What are the benchmarks? It looks like they are the test/bench/garbage benchmarks? Those are not terribly interesting overall. What is the effect on real code like the test/bench/go1 benchmarks?
On 2014/07/10 18:42:35, rsc wrote:
> What are the benchmarks? It looks like they are the test/bench/garbage
> benchmarks? Those are not terribly interesting overall. What is the effect
> on real code like the test/bench/go1 benchmarks?

The benchmark is the garbage benchmark from the go.benchmarks repo (which is roughly the same as test/bench/garbage).

go1 benchmarks:

benchmark                        old ns/op     new ns/op     delta
BenchmarkBinaryTree17            3882774233    3544278312    -8.72%
BenchmarkFannkuch11              2760029132    2618294904    -5.14%
BenchmarkFmtFprintfEmpty         71.6          72.7          +1.54%
BenchmarkFmtFprintfString        231           219           -5.19%
BenchmarkFmtFprintfInt           186           164           -11.83%
BenchmarkFmtFprintfIntInt        256           257           +0.39%
BenchmarkFmtFprintfPrefixedInt   257           242           -5.84%
BenchmarkFmtFprintfFloat         460           354           -23.04%
BenchmarkFmtManyArgs             1364          1072          -21.41%
BenchmarkGobDecode               16333122      13887405      -14.97%
BenchmarkGobEncode               10785547      9556318       -11.40%
BenchmarkGzip                    438134110     406536964     -7.21%
BenchmarkGunzip                  113861506     104544186     -8.18%
BenchmarkHTTPClientServer        92163         89593         -2.79%
BenchmarkJSONEncode              21279690      20524117      -3.55%
BenchmarkJSONDecode              81470709      71832238      -11.83%
BenchmarkMandelbrot200           5294830       4101358       -22.54%
BenchmarkGoParse                 5982219       5655285       -5.47%
BenchmarkRegexpMatchEasy0_32     155           152           -1.94%
BenchmarkRegexpMatchEasy0_1K     303           294           -2.97%
BenchmarkRegexpMatchEasy1_32     144           149           +3.47%
BenchmarkRegexpMatchEasy1_1K     798           826           +3.51%
BenchmarkRegexpMatchMedium_32    232           247           +6.47%
BenchmarkRegexpMatchMedium_1K    63225         75730         +19.78%
BenchmarkRegexpMatchHard_32      3701          4377          +18.27%
BenchmarkRegexpMatchHard_1K      143579        136470        -4.95%
BenchmarkRevcomp                 1059133929    697414142     -34.15%
BenchmarkTemplate                132634561     115347401     -13.03%
BenchmarkTimeParse               389           385           -1.03%
BenchmarkTimeFormat              397           394           -0.76%

benchmark                        old MB/s      new MB/s      speedup
BenchmarkGobDecode               46.99         55.27         1.18x
BenchmarkGobEncode               71.16         80.32         1.13x
BenchmarkGzip                    44.29         47.73         1.08x
BenchmarkGunzip                  170.42        185.61        1.09x
BenchmarkJSONEncode              91.19         94.55         1.04x
BenchmarkJSONDecode              23.82         27.01         1.13x
BenchmarkGoParse                 9.68          10.24         1.06x
BenchmarkRegexpMatchEasy0_32     205.66        209.52        1.02x
BenchmarkRegexpMatchEasy0_1K     2485.81       2260.24       0.91x
BenchmarkRegexpMatchEasy1_32     222.08        214.21        0.96x
BenchmarkRegexpMatchEasy1_1K     930.48        798.08        0.86x
BenchmarkRegexpMatchMedium_32    4.31          4.05          0.94x
BenchmarkRegexpMatchMedium_1K    15.74         11.53         0.73x
BenchmarkRegexpMatchHard_32      8.65          7.31          0.85x
BenchmarkRegexpMatchHard_1K      7.13          7.50          1.05x
BenchmarkRevcomp                 239.98        364.44        1.52x
BenchmarkTemplate                14.63         16.82         1.15x

BenchmarkRegexpMatchHard_32, with its +18.27% degradation, looks like a phantom code-movement issue: the profile does not include anything GC/malloc-related, but the numbers for unrelated functions are slightly different.

before:
43.73%  go1.test  go1.test  [.] regexp.(*machine).add
17.52%  go1.test  go1.test  [.] regexp.(*machine).step
 7.98%  go1.test  go1.test  [.] _/ssd/src/go1/test/bench/go1.fastaRandom
 7.49%  go1.test  go1.test  [.] regexp.(*machine).match
 3.93%  go1.test  go1.test  [.] regexp/syntax.(*Inst).MatchRunePos
 3.57%  go1.test  go1.test  [.] regexp/syntax.EmptyOpContext
 2.63%  go1.test  go1.test  [.] regexp.(*inputBytes).step
 1.94%  go1.test  go1.test  [.] regexp.(*machine).alloc
 1.32%  go1.test  go1.test  [.] compress/flate.(*compressor).findMatch
 1.19%  go1.test  go1.test  [.] regexp/syntax.(*Inst).MatchRune
 1.18%  go1.test  go1.test  [.] compress/flate.(*compressor).deflate
 0.48%  go1.test  go1.test  [.] hash/crc32.update
 0.40%  go1.test  go1.test  [.] sync/atomic.AddUint32
 0.31%  go1.test  go1.test  [.] sync/atomic.CompareAndSwapUint32

after:
46.94%  go1.test  go1.test  [.] regexp.(*machine).add
16.17%  go1.test  go1.test  [.] regexp.(*machine).step
 7.16%  go1.test  go1.test  [.] regexp.(*machine).match
 7.05%  go1.test  go1.test  [.] _/ssd/src/go2/test/bench/go1.fastaRandom
 6.04%  go1.test  go1.test  [.] regexp/syntax.EmptyOpContext
 3.38%  go1.test  go1.test  [.] regexp/syntax.(*Inst).MatchRunePos
 2.14%  go1.test  go1.test  [.] regexp.(*machine).alloc
 2.02%  go1.test  go1.test  [.] regexp.(*inputBytes).step
 1.27%  go1.test  go1.test  [.] compress/flate.(*compressor).deflate
 1.06%  go1.test  go1.test  [.] compress/flate.(*compressor).findMatch
 0.87%  go1.test  go1.test  [.] regexp/syntax.(*Inst).MatchRune
 0.42%  go1.test  go1.test  [.] hash/crc32.update
 0.32%  go1.test  go1.test  [.] sync/atomic.AddUint32
 0.29%  go1.test  go1.test  [.] sync/atomic.CompareAndSwapUint32

The go1.test binary is 6.32% smaller. Still, it's somewhat apples to oranges, because RSS for e.g. BenchmarkGoParse stabilizes at 700m before and 584m after (-16.6%), and the improved memory consumption negatively affects total GC time (GCs are more frequent with the default GOGC setting). I want to emphasize that the main point of this CL is not direct performance benefits but long-term benefits: for example, it opens the road to 1-bit-per-word type info, which is required for concurrent GC, and to other optimizations.
On 2014/07/11 09:49:49, dvyukov wrote:
> go1 benchmarks:

Have you executed those benchmarks with the old GOGC=100 value or with GOGC=165? The latter looks like a fairer comparison given the decreased heap size. I know those numbers mean nothing and the simpler logic is far more important. Anyway, amazing work!
On 2014/07/11 20:23:53, tux21b wrote:
> Have you executed those benchmarks with the old GOGC=100 value or
> with GOGC=165? The latter looks like a fairer comparison given
> the decreased heap size.

Old is always running with GOGC=100. The first set of numbers is versus new with GOGC=100, the second versus new with GOGC=165.

> I know those numbers mean nothing and the simpler logic is far more
> important. Anyway, amazing work!

Thanks!
Thanks for working on this. The doc linked in the CL description is not really a design doc for this change. It is a sketch of a bunch of possible changes, only some of which are in this CL, and it is very light on actual details. I would like to see a separate doc that is more detailed and describes only the changes being made in this CL. It should also discuss the implications of the change. One implication is that the heap dumper will be dramatically less useful because it will not have type information to dump. Another concern is the size of the new tables in the generated object file. I don't see any kind of compression that takes care of large arrays or structs containing large arrays. More importantly, I don't see any discussion of that in the design doc. We need to weigh those sorts of considerations too. Thanks. Russ
On 2014/07/15 20:24:52, rsc wrote:
> The doc linked in the CL description is not really a design doc for this
> change. [...] I would like to see a separate doc that is more detailed and
> describes only the changes being made in this CL.

Here we go: https://docs.google.com/document/d/1v4Oqa0WwHunqlb8C3ObL_uNQw3DfSY-ztoA-4wWbK...
On 2014/07/16 12:07:42, dvyukov wrote:
> Here we go:
> https://docs.google.com/document/d/1v4Oqa0WwHunqlb8C3ObL_uNQw3DfSY-ztoA-4wWbK...

Also replaced the link in the description.
As far as the heap dumper goes, we might be able to reconstruct types. The DWARF info provides types for all the stack frames and globals, and we can propagate types into the heap the way the GC does (or, after this CL, used to). The only trouble is unsafe.Pointer, so there may be some objects that end up with no type, but probably not many. It's a fair amount of work in the dump reader, but probably worth it. As a bonus, propagating types will correctly type no-pointer objects that currently don't have a type in the heap dump.
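A rough sketch of the propagation pass described above, as it might look in a dump reader. Every name here (typeInfo, object, propagate, the maps) is hypothetical scaffolding, not the actual heapdump format or reader; objects reached only through unsafe.Pointer-like fields stay untyped, matching the limitation noted in the message.

package heapdump

type typeInfo struct {
    name   string
    fields []*typeInfo // type referenced by each pointer word, in order; nil for unsafe.Pointer-like fields
}

type object struct {
    ptrs []uint64 // target addresses of the object's pointer words, in the same order
}

// propagate assigns types to reachable untyped objects, breadth-first,
// starting from roots whose types are known from DWARF (globals, stack frames).
func propagate(objects map[uint64]*object, typeOf map[uint64]*typeInfo, roots []uint64) {
    work := append([]uint64(nil), roots...)
    for len(work) > 0 {
        addr := work[0]
        work = work[1:]
        obj, t := objects[addr], typeOf[addr]
        if obj == nil || t == nil {
            continue
        }
        for i, ft := range t.fields {
            if i >= len(obj.ptrs) {
                break
            }
            target := obj.ptrs[i]
            if ft != nil && typeOf[target] == nil {
                typeOf[target] = ft // first writer wins
                work = append(work, target)
            }
        }
    }
}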
Nice, another option.

Btw, Keith, please take a look at the heapdump changes. It compiles, writes something, and does not crash; that's all I can say. I suspect it requires more changes (maybe in a separate CL). For example, as far as I understand, it dumps type descriptors to later associate them with objects, but we don't have a Type* for objects anymore, so the analyzer probably won't be able to do anything useful with the type descriptors...
The heapdump changes look OK for now. They might need a fixup after I implement the type propagation. We'll still need the type entries, just to map Type* (read from Eface.type words) to DWARF types. V1 would use the type name as the key, although I'd like to implement a more robust map at some point.
In the longer term I want to pass less information to new, so that in particular the type information passed to new does not include a method table, to reduce binary bloat. So you can't necessarily assume that the Type*s are in the binary. Using the DWARF information makes sense, and we could at least pass the type string to new. Russ
It's not the Type*s passed to new that matter, it is the Type*s in Eface.type and in Iface.tab.type. Those should always be in the binary. As long as I can map from either of those to a DWARF type, I'm happy.
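For readers following the Eface.type / Iface.tab.type references, a sketch of the interface layouts in question, written as Go-style structs purely for illustration; the runtime of this era defines the equivalent Eface, Iface, and Itab structs in C, and the field names here are chosen to mirror that discussion rather than copied from it.

package layout

import "unsafe"

// _type stands in for the runtime type descriptor (C: Type).
type _type struct{ size uintptr }

// itab stands in for the interface table (C: Itab); itab.typ is the
// Iface.tab.type word referenced above.
type itab struct {
    inter *_type
    typ   *_type
    // ...method table follows in the real layout
}

// eface mirrors an empty-interface value; eface.typ is the Eface.type word.
type eface struct {
    typ  *_type
    data unsafe.Pointer
}

// iface mirrors a non-empty interface value.
type iface struct {
    tab  *itab
    data unsafe.Pointer
}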
On Wed, Jul 16, 2014 at 8:07 AM, <dvyukov@google.com> wrote:
> Here we go:
> https://docs.google.com/document/d/1v4Oqa0WwHunqlb8C3ObL_uNQw3DfSY-ztoA-4wWbKcg/pub

Thanks. My two concerns are:

1) Not losing the heap dumper functionality. If khr is happy, then I'm happy.

2) The generated object file must be at most proportional to the size of the input source file, not to the numbers in the source file. If I write a program that contains but does not execute:

p := new([1<<40]*byte)
q := new(struct{x float64; y [1<<40]*byte; z []string})

then the object file needs to be roughly the same size as if those two lines were not present. The 50000 bytes that are generated for [100000]unsafe.Pointer are an instance of this problem.

Russ
On 2014/07/18 16:35:41, rsc wrote:
> 1) Not losing the heap dumper functionality. If khr is happy, then I'm happy.
> 2) The generated object file must be at most proportional to the size of
> the input source file, not to the numbers in the source file.

Keith, can you confirm that heapdump is under your control after this change? I will handle the second concern.
Yes, I'll handle the heap dump.
Great, thanks!
On 2014/07/18 16:35:41, rsc wrote:
> 2) The generated object file must be at most proportional to the size of
> the input source file, not to the numbers in the source file. [...] The
> 50000 bytes that are generated for [100000]unsafe.Pointer are an instance
> of this problem.

Russ, I've updated the design doc to handle this aspect. See the first "Type info" section; it describes type programs.
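Since these type programs are central to the rest of the thread, here is a minimal sketch of the idea: instead of emitting an expanded pointer mask for a huge type, the compiler emits a small program with repeat instructions and the runtime unrolls it on demand. The opcode names mirror the ones in the CL's runtime (the unrollgcprog1 function that appears in the profiles later in this thread interprets them); the byte layout and helper below are simplifications for illustration, not the CL's actual encoding.

package gcprog

const (
    insData     = 1 // followed by a count n and n one-byte type codes
    insArray    = 2 // followed by a one-byte repeat count and a sub-program
    insArrayEnd = 3 // terminates the sub-program of the innermost insArray
    insEnd      = 4 // terminates the whole program
)

// unroll expands a program into one type code per described word.
func unroll(prog []byte) []byte {
    var out []byte
    pos := 0
    var exec func()
    exec = func() {
        for {
            switch prog[pos] {
            case insData:
                n := int(prog[pos+1])
                out = append(out, prog[pos+2:pos+2+n]...)
                pos += 2 + n
            case insArray:
                count := int(prog[pos+1])
                pos += 2
                start := len(out)
                exec() // unroll one element (runs until insArrayEnd)
                elem := append([]byte(nil), out[start:]...)
                for i := 1; i < count; i++ {
                    out = append(out, elem...)
                }
            case insArrayEnd, insEnd:
                pos++
                return
            default:
                panic("gcprog: bad instruction")
            }
        }
    }
    exec()
    return out
}

For example, unroll([]byte{insArray, 100, insData, 1, <BitsPointer>, insArrayEnd, insEnd}) expands a 7-byte program into 100 pointer entries. In the real encoding the repeat count is wider than one byte, so the program for [100000]unsafe.Pointer stays a handful of bytes while its unrolled mask does not.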
I've also separated a few pieces from this CL to reduce the number of touched files. I think they are positive regardless of this bigger change:

https://codereview.appspot.com/116060043/
https://codereview.appspot.com/117000043/
https://codereview.appspot.com/116940043/
https://codereview.appspot.com/116950043/
I am still concerned about the runtime footprint. If I allocate a big fixed-size array and then free it, I don't want to pay the cost of the cached mask in the bss forever. I asked you about this over chat but apparently I did not understand the answer correctly. The doc says "If the mask fits into these 2 words, then compiler generates mask and stores it directly into Type.gc (no indirection). Otherwise, gc[0] points to a symbol in BSS large enough to store the mask + 1 byte; and gc[1] points to the program in RODATA." I do not believe the BSS symbols should be used. It should decode the mask directly into the GC bitmaps. Russ
There is a copy of the mask in the GC bitmap as well. Even when the object is freed, the bitmap region is still in memory.

I don't want to unwind it every time. On 32-bit it's used for objects >64 bytes (256 for 64-bit), so these allocations can be quite frequent. Now there is a very clear separation between the in-binary format and the in-memory format, which is, I think, very good because each format concentrates on its own thing, and e.g. we don't need to optimize programs for runtime performance (as I have them now, they are quite suboptimal from this point of view).

I think I can do a slightly different thing which will address your concern:
1. When a huge span is passed to SysUnused, also call SysUnused for the corresponding region of the GC bitmap.
2. When unrolling a huge program (>>PageSize), remember the Type in a global list. During GC, walk that list, mark the program as not yet unwound, and mark the mask as SysUnused. This works even if the mask is in BSS.

Note that I need at least a word in BSS either way, because I need a fast mapping from Type to the mask during malloc.

Sounds good?
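For concreteness, a sketch of point 1 above. Everything here, including the function name and the address arithmetic, is an assumption for illustration; it leans only on the fact that the runtime's SysUnused on Linux is implemented with madvise(MADV_DONTNEED), and it elides the page alignment madvise requires.

//go:build linux

package main

import (
    "syscall"
    "unsafe"
)

// Four heap words are described by one bitmap byte with 2-bit entries.
const wordsPerBitmapByte = 4

// sysUnusedBitmap releases the heap-bitmap bytes describing the span
// [spanStart, spanStart+spanBytes), mirroring what SysUnused does for the
// span's own pages.
func sysUnusedBitmap(bitmap []byte, heapBase, spanStart, spanBytes uintptr) error {
    wordSize := unsafe.Sizeof(uintptr(0))
    off := (spanStart - heapBase) / wordSize / wordsPerBitmapByte
    n := spanBytes / wordSize / wordsPerBitmapByte
    return syscall.Madvise(bitmap[off:off+n], syscall.MADV_DONTNEED)
}

func main() {}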
On Tue, Jul 22, 2014 at 9:57 AM, Dmitry Vyukov <dvyukov@google.com> wrote:
> There is a copy of the mask in the GC bitmap as well. Even when the object
> is freed, the bitmap region is still in memory.

I don't understand. When the object memory is reused for some other type, that GC bitmap region is reused. The unrolled per-type copy in the BSS is not.

> I don't want to unwind it every time. On 32-bit it's used for objects
> >64 bytes (256 for 64-bit), so these allocations can be quite frequent.

The program looks absolutely trivial to execute. It should cost no more than a memmove would. In fact it may be faster, because it will use less cache.

> Note that I need at least a word in BSS either way, because I need a
> fast mapping from Type to the mask during malloc.

You don't need any words in BSS if you replay the program every time. I still believe that can be made just as fast as memmove. You're already zeroing the memory at some point, so filling in the bitmap can't be prohibitive. I don't believe that caching+memmove will be appreciably faster than executing the program, which you've made nice and simple and should be very quick to execute. Do you have numbers?

Russ
On Tue, Jul 22, 2014 at 6:10 PM, Russ Cox <rsc@golang.org> wrote:
> I don't understand. When the object memory is reused for some other type,
> that GC bitmap region is reused. The unrolled per-type copy in the BSS is not.

I am thinking about a situation where you allocate 10GB during startup and then use 100MB during normal work. The GC bitmap will still consume another 512MB.

> You don't need any words in BSS if you replay the program every time.

You are right, I just always assumed that it must be stored in unwound form in the runtime.

> I don't believe that caching+memmove will be appreciably faster than
> executing the program, which you've made nice and simple and should be
> very quick to execute. Do you have numbers?

I will measure the difference.

A good enough solution could be to execute programs directly into the GC bitmap when the unwound program would consume > PageSize. However, it still requires special handling of arrays in the compiler, linker, and runtime (so that the unwound program for [1<<20][]byte describes only 1 element -- the runtime knows how to multiply type info for slices anyway).
On Tue, Jul 22, 2014 at 10:37 AM, Dmitry Vyukov <dvyukov@google.com> wrote:
> I am thinking about a situation where you allocate 10GB during startup
> and then use 100MB during normal work. The GC bitmap will still consume
> another 512MB.

Ah, yes. I agree that using SysUnused in this case is worthwhile.

Russ
On the following benchmark:

type LargeStruct struct {
    x [40]*byte
}

func BenchmarkMallocLargeStruct(b *testing.B) {
    var x uintptr
    for i := 0; i < b.N; i++ {
        p := new(LargeStruct)
        x ^= uintptr(unsafe.Pointer(p))
    }
    mallocSink = x // package-level sink; keeps the allocation from being optimized away
}

direct copy is 172 ns/op and program execution is 581 ns/op (about 3.4x).

Profiles are:

39.00%  runtime.test  runtime.test  [.] runtime.markallocated
12.00%  runtime.test  runtime.test  [.] runtime.mallocgc
 9.53%  runtime.test  runtime.test  [.] runtime.MSpan_Sweep
 7.57%  runtime.test  runtime.test  [.] runtime.memclr
 5.52%  runtime.test  runtime.test  [.] runtime.MSpanList_IsEmpty
 3.13%  runtime.test  runtime.test  [.] scanblock
 3.11%  runtime.test  runtime.test  [.] MHeap_AllocSpanLocked

vs:

79.84%  runtime.test  runtime.test  [.] unrollgcprog1
 4.14%  runtime.test  runtime.test  [.] runtime.mallocgc
 2.60%  runtime.test  runtime.test  [.] runtime.MSpan_Sweep
 2.29%  runtime.test  runtime.test  [.] runtime.memclr
 1.87%  runtime.test  runtime.test  [.] runtime.MSpanList_IsEmpty
 1.75%  runtime.test  runtime.test  [.] runtime.markallocated

The code is uploaded (though somewhat dirty for now).
Are you sure it is important to address this right now? For this to happen, the user must (1) have a ridiculously large type (not just a large slice of small objects), (2) allocate it (which is not the case for e.g. the reflect Itab fake), and (3) discard the object and never allocate it again. There is a significant number of issues that affect real programs, e.g. dead G's are never discarded (we actually have user reports about that), defers are slow, and closures cause unnecessary memory allocations.
I would rather have this code and not need it than need it and not have it. I think it is fine to play the program every time and then worry about whether it is too slow. I am sure it is still much faster than old allocations. Russ
On 2014/07/22 17:49:31, rsc wrote:
> I would rather have this code and not need it than need it and not have it.

The downside is not just the additional code. This code is slow, so there is a chance we will still need to look at it and do something about its slowness, and if we start optimizing it, that will be the wrong path. From this point of view, it's better not to have that code. I think the chances of hitting the slowness of that code are higher than the chances of hitting a weird type allocated once. And note that we are not talking about the program crashing due to OOM; we are talking only about residual excessive memory consumption, which is at most 1/32 of the program's maximum past memory consumption.

> I think it is fine to play the program every time and then worry about
> whether it is too slow. I am sure it is still much faster than old
> allocations.

No, it's considerably slower. The current code just stores one word with the type info.
Okay, well then it needs to be made faster. We can't leave this code out. People use Go for big things. Russ
On Tue, Jul 22, 2014 at 10:32 PM, Russ Cox <rsc@golang.org> wrote:
> Okay, well then it needs to be made faster. We can't leave this code out.
> People use Go for big things.

Big slices and maps are not a problem. Also, if you keep the big thing in memory (or even several of them), it's OK too, because the mask overhead is negligible. It's only very weird cases that are not super optimal, and we can't make all weird cases optimal. Have you seen at least one such case?
PTAL. I've added GC programs (still stored in BSS for every allocated type).
On Tue, Jul 22, 2014 at 2:35 PM, Dmitry Vyukov <dvyukov@google.com> wrote:
> It's only very weird cases that are not super optimal, and we can't make
> all weird cases optimal. Have you seen at least one such case?

It doesn't matter whether I've seen the case. This is a compiler for a general language. It needs to generate binaries and code with the right asymptotic properties. That means there are no corner-case blow-ups like this. It is okay to focus on the raw performance too, but not at the cost of the asymptotics.

Russ
On 2014/07/22 20:18:43, rsc wrote:
> It needs to generate binaries and code with the right asymptotic properties.

Done, PTAL.

What asymptotic properties do you mean here? A function of what?
On Wed, Jul 23, 2014 at 4:59 AM, <dvyukov@google.com> wrote:
> What asymptotic properties do you mean here? A function of what?

It's not even a function in this case. The memory reclaimed by garbage collection should eventually be 100% of the unused memory. Caching unrolled programs in the BSS violates that.

Russ
Can you please update the doc? It still talks about caching in the BSS.
On 2014/07/23 12:03:17, rsc wrote:
> Can you please update the doc? It still talks about caching in the BSS.

Done. Note that BSS is still used for normal types; only huge types execute their program directly into the GC bitmap.
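To make that split concrete, here is a sketch of the allocation-time decision, in the same illustrative gcprog package as the earlier unroll sketch. The names markallocated and MaxGCMask echo identifiers mentioned elsewhere in this review, but the structure, the gcType scaffolding, and the threshold value are assumptions, not the CL's code.

package gcprog

// maxGCMask is the "huge type" threshold mentioned in the review; placeholder value.
const maxGCMask = 8 << 10

type gcType struct {
    maskSize int    // size of the unrolled mask in bytes
    mask     []byte // cached unrolled mask (the "BSS" slot); nil until first use
    prog     []byte // compact program emitted by the compiler
}

// markAllocated writes the type's pointer info for one new object into bitmap.
func markAllocated(bitmap []byte, t *gcType) {
    if t.maskSize <= maxGCMask {
        // Normal type: unroll once into the cached mask, then copy on every allocation.
        if t.mask == nil {
            t.mask = unroll(t.prog)
        }
        copy(bitmap, t.mask)
        return
    }
    // Huge type: no cached copy is kept; the program is executed for this
    // allocation (the real code writes the result directly into the heap bitmap).
    copy(bitmap, unroll(t.prog))
}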
Did not read runtime yet. The compiler changes look mostly reasonable.

https://codereview.appspot.com/106260045/diff/880001/src/cmd/gc/reflect.c
File src/cmd/gc/reflect.c (right):

https://codereview.appspot.com/106260045/diff/880001/src/cmd/gc/reflect.c#new...
src/cmd/gc/reflect.c:807: ot = duintptr(s, ot, ((uvlong*)gcmask)[0]);
gcmask is [16]uint8 and this code is interpreting it as [2]uintptr. Effectively the code is assuming that the byte order of the compiler matches the byte order of the target system. That is not a valid assumption. Please correct this. If you are writing bytes, use duint8. If you are writing uintptrs, make gengcmask fill in integers.

https://codereview.appspot.com/106260045/diff/880001/src/cmd/gc/reflect.c#new...
src/cmd/gc/reflect.c:1275: size *= 2; // repeated twice
// repeated
(repeated twice would be 3 times)

https://codereview.appspot.com/106260045/diff/880001/src/cmd/gc/reflect.c#new...
src/cmd/gc/reflect.c:1304: // Unford the mask for the GC bitmap format:
Unfold

https://codereview.appspot.com/106260045/diff/880001/src/cmd/gc/reflect.c#new...
src/cmd/gc/reflect.c:1305: // 4 bits per word, 2 high bits encode pointer info.
For the record, once stack maps are down to 1 bit per word instead of 2, I am going to object to the waste of pre-expanding this 4x. I am willing to stomach 2x for now.

https://codereview.appspot.com/106260045/diff/880001/src/cmd/gc/reflect.c#new...
src/cmd/gc/reflect.c:1310: // If number of words is odd, repeat the mask twice.
s/ twice//

https://codereview.appspot.com/106260045/diff/880001/src/cmd/gc/reflect.c#new...
src/cmd/gc/reflect.c:1313: for(i=0; i<nptr; i+=1) {
i++

https://codereview.appspot.com/106260045/diff/880001/src/cmd/gc/reflect.c#new...
src/cmd/gc/reflect.c:1314: bits = ((uint8*)vec->b)[i*BitsPerPointer/8];
Use the accessor functions, not direct bitmap manipulation. They will translate better. bvset/bvreset/bvget.

bits = bvget(vec, i*2)<<1 | bvget(vec, i*2+1)

https://codereview.appspot.com/106260045/diff/880001/src/cmd/gc/reflect.c#new...
src/cmd/gc/reflect.c:1443: if(size <= MaxGCMask) {
Please make this

if(0) { // if(size <= MaxGCMask) to enable BSS caching

I do not want to use ANY space in the bss until we understand how fast we can make the non-cached version. You have MaxGCMask set to 8 kB which is still much more profligate than I would like. Once this change is in with no BSS caching, we can do a follow-up round trying to improve performance and understanding what caching is worthwhile. But the initial checkin should focus on not using unnecessary memory. Memory is not always as cheap as you think.

https://codereview.appspot.com/106260045/diff/880001/src/cmd/gc/reflect.c#new...
src/cmd/gc/reflect.c:1520: } else if(t1->width<widthptr || !haspointers(t1)) {
Simplify:

} else if(!haspointers(t1)) {
    n = t->width;
    n -= -*xoffset&(widthptr-1); // skip to next ptr boundary
    proggenarray(g, (n+widthptr-1)/widthptr);
    proggendata(g, BitsScalar);
    proggenarrayend(g);
    *xoffset += t->width;
}

Note the variable is named 'n'. o is for offsets, and this isn't one.

https://codereview.appspot.com/106260045/diff/880001/src/pkg/runtime/mgc0.h
File src/pkg/runtime/mgc0.h (right):

https://codereview.appspot.com/106260045/diff/880001/src/pkg/runtime/mgc0.h#n...
src/pkg/runtime/mgc0.h:22: WordsPerByte = 8/BitsPerPointer,
Please call this PointersPerByte. Words is too generic. The name is confusing no matter what, but PointersPerByte sounds more like a reminder of BitsPerPointer.
PTAL https://codereview.appspot.com/106260045/diff/880001/src/cmd/gc/reflect.c File src/cmd/gc/reflect.c (right): https://codereview.appspot.com/106260045/diff/880001/src/cmd/gc/reflect.c#new... src/cmd/gc/reflect.c:807: ot = duintptr(s, ot, ((uvlong*)gcmask)[0]); On 2014/07/23 13:34:02, rsc wrote: > gcmask is [16]uint8 and this code is interpreting it as [2]uintptr. Effectively > the code is assuming that the byte order of the compiler matches the byte order > of the target system. That is not a valid assumption. Please correct this. > If you are writing bytes, use duint8. > If you are writing uintptrs, make gengcmask fill in integers. > Done. https://codereview.appspot.com/106260045/diff/880001/src/cmd/gc/reflect.c#new... src/cmd/gc/reflect.c:1275: size *= 2; // repeated twice On 2014/07/23 13:34:02, rsc wrote: > // repeated > (repeated twice would be 3 times) Done. https://codereview.appspot.com/106260045/diff/880001/src/cmd/gc/reflect.c#new... src/cmd/gc/reflect.c:1304: // Unford the mask for the GC bitmap format: On 2014/07/23 13:34:02, rsc wrote: > Unfold Done. https://codereview.appspot.com/106260045/diff/880001/src/cmd/gc/reflect.c#new... src/cmd/gc/reflect.c:1305: // 4 bits per word, 2 high bits encode pointer info. On 2014/07/23 13:34:02, rsc wrote: > For the record, once stack maps are down to 1 bit per word instead of 2, I am > going to object to the waste of pre-expanding this 4x. I am willing to stomach > 2x for now. ack fwiw now it's 24x https://codereview.appspot.com/106260045/diff/880001/src/cmd/gc/reflect.c#new... src/cmd/gc/reflect.c:1310: // If number of words is odd, repeat the mask twice. On 2014/07/23 13:34:02, rsc wrote: > s/ twice// Done. https://codereview.appspot.com/106260045/diff/880001/src/cmd/gc/reflect.c#new... src/cmd/gc/reflect.c:1313: for(i=0; i<nptr; i+=1) { On 2014/07/23 13:34:02, rsc wrote: > i++ Done. https://codereview.appspot.com/106260045/diff/880001/src/cmd/gc/reflect.c#new... src/cmd/gc/reflect.c:1314: bits = ((uint8*)vec->b)[i*BitsPerPointer/8]; On 2014/07/23 13:34:02, rsc wrote: > Use the accessor functions, not direct bitmap manipulation. They will translate > better. > bvset/bvreset/bvget. > > bits = bvget(vec, i*2)<<1 | bvget(vec, i*2+1) Done. https://codereview.appspot.com/106260045/diff/880001/src/cmd/gc/reflect.c#new... src/cmd/gc/reflect.c:1443: if(size <= MaxGCMask) { On 2014/07/23 13:34:02, rsc wrote: > Please make this > > if(0) { // if(size <= MaxGCMask) to enable BSS caching > > I do not want to use ANY space in the bss until we understand how fast we can > make the non-cached version. You have MaxGCMask set to 8 kB which is still much > more profligate than I would like. > > Once this change is in with no BSS caching, we can do a follow-up round trying > to improve performance and understanding what caching is worthwhile. But the > initial checkin should focus on not using unnecessary memory. > > Memory is not always as cheap as you think. Russ, you are going to extremes. During http serving godoc consumes 25MB of memory. Total size of unwound masks is 1365 bytes. One thousand three hundred sixty five B-Y-T-E-S. There are just 39 types with unwound masks. Even if you have 10 weird types with mask size 4K, that's just 40K of memory. And if you actually allocate them, then the objects occupy some megabytes of memory. I can show you 10 places where we waste 10 times more memory. The masks are negligible and unimportant. https://codereview.appspot.com/106260045/diff/880001/src/cmd/gc/reflect.c#new... 
src/cmd/gc/reflect.c:1520: } else if(t1->width<widthptr || !haspointers(t1)) { On 2014/07/23 13:34:02, rsc wrote: > Simplify: > > } else if(!haspointers(t1)) { > n = t->width; > n -= -*xoffset&(widthptr-1); // skip to next ptr boundary > proggenarray(g, (n+widthptr-1)/widthptr); > proggendata(g, BitsScalar); > proggenarrayend(g); > *xoffset += t->width; > } > > Note the variable is named 'n'. o is for offsets, and this isn't one. Done. https://codereview.appspot.com/106260045/diff/880001/src/pkg/runtime/mgc0.h File src/pkg/runtime/mgc0.h (right): https://codereview.appspot.com/106260045/diff/880001/src/pkg/runtime/mgc0.h#n... src/pkg/runtime/mgc0.h:22: WordsPerByte = 8/BitsPerPointer, On 2014/07/23 13:34:02, rsc wrote: > Please call this PointersPerByte. Words is too generic. The name is confusing no > matter what, but PointersPerByte sounds more like a reminder of BitsPerPointer. Done.
https://codereview.appspot.com/106260045/diff/880001/src/cmd/gc/reflect.c File src/cmd/gc/reflect.c (right): https://codereview.appspot.com/106260045/diff/880001/src/cmd/gc/reflect.c#new... src/cmd/gc/reflect.c:1443: if(size <= MaxGCMask) { No. Waste is waste. And this particular waste is premature optimization. I am okay with leaving the code in, disabled, so that we can revisit the optimization in a second round. But this initial version does not need it. I don't have time to continue this debate in this CL.
https://codereview.appspot.com/106260045/diff/880001/src/cmd/gc/reflect.c File src/cmd/gc/reflect.c (right): https://codereview.appspot.com/106260045/diff/880001/src/cmd/gc/reflect.c#new... src/cmd/gc/reflect.c:1305: // 4 bits per word, 2 high bits encode pointer info. On 2014/07/23 15:35:59, dvyukov wrote: > On 2014/07/23 13:34:02, rsc wrote: > > For the record, once stack maps are down to 1 bit per word instead of 2, I am > > going to object to the waste of pre-expanding this 4x. I am willing to stomach > > 2x for now. > > ack > fwiw now it's 24x How do you get 24x?
I got about halfway through mgc0. I hope Keith will look at the runtime changes too. https://codereview.appspot.com/106260045/diff/920001/src/cmd/gc/reflect.c File src/cmd/gc/reflect.c (right): https://codereview.appspot.com/106260045/diff/920001/src/cmd/gc/reflect.c#new... src/cmd/gc/reflect.c:1319: //bits = ((uint8*)vec->b)[i*BitsPerPointer/8]; Delete debugging dregs. https://codereview.appspot.com/106260045/diff/920001/src/cmd/gc/reflect.c#new... src/cmd/gc/reflect.c:1440: please add g.type = t; https://codereview.appspot.com/106260045/diff/920001/src/cmd/gc/reflect.c#new... src/cmd/gc/reflect.c:1524: } if(t->bound*t1->width < 32*widthptr) { } else if(t->bound <= 1 || t->bound*t1->width < 32*widthptr) { Adding the t->bound <= 1 will avoid generating array instructions for array size 0/1. Making sure that the array instructions only happen for 2 or more elements bounds the possible nesting that the runtime recursion will have to deal with to at most the number of addressable bits. The runtime certainly has room for 64 frames on the stack. But without the condition someone could overflow the runtime stack by creating a [1][1][1][1][1][1]...*byte. https://codereview.appspot.com/106260045/diff/920001/src/pkg/runtime/chan.h File src/pkg/runtime/chan.h (right): https://codereview.appspot.com/106260045/diff/920001/src/pkg/runtime/chan.h#n... src/pkg/runtime/chan.h:33: byte* buf; Can this change to channel representation be separated out? This CL should be about GC type information. The doc in the description says nothing about channels changing. https://codereview.appspot.com/106260045/diff/920001/src/pkg/runtime/heapdump.c File src/pkg/runtime/heapdump.c (right): https://codereview.appspot.com/106260045/diff/920001/src/pkg/runtime/heapdump... src/pkg/runtime/heapdump.c:788: // appear in such an entry. The following routine accomplish that. accomplishes https://codereview.appspot.com/106260045/diff/920001/src/pkg/runtime/malloc.goc File src/pkg/runtime/malloc.goc (right): https://codereview.appspot.com/106260045/diff/920001/src/pkg/runtime/malloc.g... src/pkg/runtime/malloc.goc:301: runtime·unmarkspan(v, s->npages<<PageShift); Why is this changing? https://codereview.appspot.com/106260045/diff/920001/src/pkg/runtime/mgc0.c File src/pkg/runtime/mgc0.c (left): https://codereview.appspot.com/106260045/diff/920001/src/pkg/runtime/mgc0.c#o... src/pkg/runtime/mgc0.c:1762: if(cl == 0) { seems like a gratuitous change https://codereview.appspot.com/106260045/diff/920001/src/pkg/runtime/mgc0.c File src/pkg/runtime/mgc0.c (right): https://codereview.appspot.com/106260045/diff/920001/src/pkg/runtime/mgc0.c#n... src/pkg/runtime/mgc0.c:1355: // This is required while we explicitly free objects and have imprecise GC. I thought that once the select change was in you could take out free. What's the status on that? Please update golang.org/s/goruntime. https://codereview.appspot.com/106260045/diff/920001/src/pkg/runtime/mgc0.h File src/pkg/runtime/mgc0.h (right): https://codereview.appspot.com/106260045/diff/920001/src/pkg/runtime/mgc0.h#n... src/pkg/runtime/mgc0.h:50: #define bitMiddle ((uintptr)0) // middle of an object These no longer need to be macros. Please make this an enum. enum { bitMiddle = 0, // middle of an object bitBoundary = 1, // boundary on a ... ... https://codereview.appspot.com/106260045/diff/920001/src/pkg/runtime/mgc0.h#n... src/pkg/runtime/mgc0.h:56: #define bitPtrMask ((uintptr)12) Explain 12 in the code.
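As background on the 12 while that comment is pending: the discussion implies each heap word gets a 4-bit bitmap entry with the boundary/allocation state in its low 2 bits and the per-word pointer info in bits 2 and 3, so a mask selecting only the pointer-info bits is 3<<2 = 12 (binary 1100). A hedged guess at the requested definition (the 3 here is assumed to be the 2-bit field mask; the exact wording is not from the patch):

    // Pointer-type info occupies bits 2 and 3 of each 4-bit bitmap entry,
    // so the pointer mask is the 2-bit field mask shifted into that position.
    #define bitPtrMask ((uintptr)(3<<2)) // == 12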
https://codereview.appspot.com/106260045/diff/920001/src/cmd/gc/reflect.c File src/cmd/gc/reflect.c (right): https://codereview.appspot.com/106260045/diff/920001/src/cmd/gc/reflect.c#new... src/cmd/gc/reflect.c:1440: On 2014/07/23 16:23:19, rsc wrote: > please add > > g.type = t; Ignore. This was a dreg from a suggestion to detect deeply nested arrays. Checking for >=2 elements makes the detection unnecessary.
PTAL https://codereview.appspot.com/106260045/diff/880001/src/cmd/gc/reflect.c File src/cmd/gc/reflect.c (right): https://codereview.appspot.com/106260045/diff/880001/src/cmd/gc/reflect.c#new... src/cmd/gc/reflect.c:1305: // 4 bits per word, 2 high bits encode pointer info. On 2014/07/23 15:55:59, rsc wrote: > On 2014/07/23 15:35:59, dvyukov wrote: > > On 2014/07/23 13:34:02, rsc wrote: > > > For the record, once stack maps are down to 1 bit per word instead of 2, I > am > > > going to object to the waste of pre-expanding this 4x. I am willing to > stomach > > > 2x for now. > > > > ack > > fwiw now it's 24x > > How do you get 24x? It uses up to 3 words per word (GC_PTR OFF TYPE), so in your terminology it's 24x. https://codereview.appspot.com/106260045/diff/880001/src/cmd/gc/reflect.c#new... src/cmd/gc/reflect.c:1443: if(size <= MaxGCMask) { On 2014/07/23 15:55:13, rsc wrote: > No. Waste is waste. And this particular waste is premature optimization. > I am okay with leaving the code in, disabled, so that we can revisit the > optimization in a second round. > But this initial version does not need it. > I don't have time to continue this debate in this CL. This is insanity. But done. I've set MaxGCMask to 0. https://codereview.appspot.com/106260045/diff/920001/src/cmd/gc/reflect.c File src/cmd/gc/reflect.c (right): https://codereview.appspot.com/106260045/diff/920001/src/cmd/gc/reflect.c#new... src/cmd/gc/reflect.c:1319: //bits = ((uint8*)vec->b)[i*BitsPerPointer/8]; On 2014/07/23 16:23:20, rsc wrote: > Delete debugging dregs. Done. https://codereview.appspot.com/106260045/diff/920001/src/cmd/gc/reflect.c#new... src/cmd/gc/reflect.c:1440: On 2014/07/23 16:24:17, rsc wrote: > On 2014/07/23 16:23:19, rsc wrote: > > please add > > > > g.type = t; > > Ignore. This was a dreg from a suggestion to detect deeply nested arrays. > Checking for >=2 elements makes the detection unnecessary. Acknowledged. https://codereview.appspot.com/106260045/diff/920001/src/cmd/gc/reflect.c#new... src/cmd/gc/reflect.c:1524: } if(t->bound*t1->width < 32*widthptr) { On 2014/07/23 16:23:20, rsc wrote: > } else if(t->bound <= 1 || t->bound*t1->width < 32*widthptr) { > > Adding the t->bound <= 1 will avoid generating array instructions for array size > 0/1. > > Making sure that the array instructions only happen for 2 or more elements > bounds the possible nesting that the runtime recursion will have to deal with > to at most the number of addressable bits. The runtime certainly has room for > 64 frames on the stack. > > But without the condition someone could overflow the runtime stack by creating > a [1][1][1][1][1][1]...*byte. > Done. https://codereview.appspot.com/106260045/diff/920001/src/pkg/runtime/chan.h File src/pkg/runtime/chan.h (right): https://codereview.appspot.com/106260045/diff/920001/src/pkg/runtime/chan.h#n... src/pkg/runtime/chan.h:33: byte* buf; On 2014/07/23 16:23:20, rsc wrote: > Can this change to channel representation be separated out? > This CL should be about GC type information. > The doc in the description says nothing about channels changing. I am deleting GC_CHAN instruction, so w/o this change channel will be w/o type info (scanned conservatively). So it belongs to this change. I can temporary leave chans w/o type info; and then send a follow up CL if you wish. https://codereview.appspot.com/106260045/diff/920001/src/pkg/runtime/heapdump.c File src/pkg/runtime/heapdump.c (right): https://codereview.appspot.com/106260045/diff/920001/src/pkg/runtime/heapdump... 
src/pkg/runtime/heapdump.c:788: // appear in such an entry. The following routine accomplish that. On 2014/07/23 16:23:20, rsc wrote: > accomplishes Done. https://codereview.appspot.com/106260045/diff/920001/src/pkg/runtime/malloc.goc File src/pkg/runtime/malloc.goc (right): https://codereview.appspot.com/106260045/diff/920001/src/pkg/runtime/malloc.g... src/pkg/runtime/malloc.goc:301: runtime·unmarkspan(v, s->npages<<PageShift); On 2014/07/23 16:23:20, rsc wrote: > Why is this changing? Previously we used only the first 4 bits of the GC bitmap for an object, so clearing 1 page was enough. Now we use the whole object region in the GC bitmap for type info, so we need to clear it when the span is freed. https://codereview.appspot.com/106260045/diff/920001/src/pkg/runtime/mgc0.c File src/pkg/runtime/mgc0.c (left): https://codereview.appspot.com/106260045/diff/920001/src/pkg/runtime/mgc0.c#o... src/pkg/runtime/mgc0.c:1762: if(cl == 0) { On 2014/07/23 16:23:20, rsc wrote: > seems like a gratuitous change Reverted. https://codereview.appspot.com/106260045/diff/920001/src/pkg/runtime/mgc0.c File src/pkg/runtime/mgc0.c (right): https://codereview.appspot.com/106260045/diff/920001/src/pkg/runtime/mgc0.c#n... src/pkg/runtime/mgc0.c:1355: // This is required while we explicitly free objects and have imprecise GC. On 2014/07/23 16:23:20, rsc wrote: > I thought that once the select change was in you could take out free. What's the > status on that? Please update golang.org/s/goruntime. I still want to do this. But I am waiting for this CL to land, because the overlap is very large. When I get rid of free, I will remove this as well. https://codereview.appspot.com/106260045/diff/920001/src/pkg/runtime/mgc0.h File src/pkg/runtime/mgc0.h (right): https://codereview.appspot.com/106260045/diff/920001/src/pkg/runtime/mgc0.h#n... src/pkg/runtime/mgc0.h:50: #define bitMiddle ((uintptr)0) // middle of an object It does not work. We shift them beyond 32 bits. The compiler seems to give them uint32 type, so they become zeros after the shift. I've updated the comment as to why they are defines. https://codereview.appspot.com/106260045/diff/920001/src/pkg/runtime/mgc0.h#n... src/pkg/runtime/mgc0.h:56: #define bitPtrMask ((uintptr)12) On 2014/07/23 16:23:20, rsc wrote: > Explain 12 in the code. Done.
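To illustrate the enum problem just described (a sketch under assumed names, not code from the CL): a C enumerator gets a 32-bit integer type, so shifting it past bit 31 while building a 64-bit bitmap word silently loses the value, which is why the bit* constants stay #defines with an explicit uintptr cast:

    enum { bitBoundaryEnum = 1 };         // compiler treats this as a 32-bit integer
    #define bitBoundaryDef ((uintptr)1)   // explicitly pointer-sized

    uintptr bad  = bitBoundaryEnum << 48; // shift happens in 32 bits: zero (or undefined)
    uintptr good = bitBoundaryDef << 48;  // shift happens on uintptr: bit 48 set as intended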
On 2014/07/23 16:23:20, rsc wrote: > I got about halfway through mgc0. I hope Keith will look at the runtime changes > too. Keith, please take a look too.
I did a first pass. I'm really worried that this is too much to go in all at once. I have a hard time keeping it all in my brain. Can we split it up somehow? It would be more work, but probably safer if we can split it into individually reviewable and testable pieces. Some ideas for separable CLs: 1) Split channels into header and array 2) Use just GC programs and then introduce bit vectors in a separate CL. Or the other way around. 3) Save some optimizations for separate CLs, like repeating the bits twice for odd-size objects. https://codereview.appspot.com/106260045/diff/980001/src/cmd/gc/reflect.c File src/cmd/gc/reflect.c (right): https://codereview.appspot.com/106260045/diff/980001/src/cmd/gc/reflect.c#new... src/cmd/gc/reflect.c:1279: size = size*PointersPerByte/8; // 4 bits per word The units of this conversion seem strange to me. I want size * BitsPerPointer / 8, as size is already # of pointers. Or size / PointersPerByte, maybe. https://codereview.appspot.com/106260045/diff/980001/src/cmd/gc/reflect.c#new... src/cmd/gc/reflect.c:1281: // We could use a more elaborate condition, but this seems to work good in practice. s/good/well/ https://codereview.appspot.com/106260045/diff/980001/src/cmd/gc/reflect.c#new... src/cmd/gc/reflect.c:1426: Is the format of these GC programs described anywhere in a comment? I didn't see it. https://codereview.appspot.com/106260045/diff/980001/src/cmd/gc/reflect.c#new... src/cmd/gc/reflect.c:1440: size++; // unroll flag in the beginning Where is the unroll flag set? Update: I found it, but this really needs a comment. https://codereview.appspot.com/106260045/diff/980001/src/pkg/reflect/type.go File src/pkg/reflect/type.go (left): https://codereview.appspot.com/106260045/diff/980001/src/pkg/reflect/type.go#... src/pkg/reflect/type.go:1099: if t.kind&kindNoPointers != 0 { Why is there no GC program? Or is the non-existence of a program imply conservative scanning? https://codereview.appspot.com/106260045/diff/980001/src/pkg/reflect/type.go#... src/pkg/reflect/type.go:1485: ch.gc = unsafe.Pointer(&chanGC{ Channel GC program? https://codereview.appspot.com/106260045/diff/980001/src/pkg/reflect/type.go#... src/pkg/reflect/type.go:1540: mt.hmap = hMapOf(mt.bucket) where did this go? Is there just one of these for all maps? https://codereview.appspot.com/106260045/diff/980001/src/pkg/reflect/type.go#... src/pkg/reflect/type.go:1544: mt.gc = unsafe.Pointer(&ptrGC{ gc for this? https://codereview.appspot.com/106260045/diff/980001/src/pkg/runtime/mgc0.c File src/pkg/runtime/mgc0.c (right): https://codereview.appspot.com/106260045/diff/980001/src/pkg/runtime/mgc0.c#n... src/pkg/runtime/mgc0.c:306: // Check is we have reached end of heap. s/heap/span/ https://codereview.appspot.com/106260045/diff/980001/src/pkg/runtime/mgc0.c#n... src/pkg/runtime/mgc0.c:324: bits = (bits>>2)&BitsMask; BitsMask vs. bitMask? Very confusing. https://codereview.appspot.com/106260045/diff/980001/src/pkg/runtime/mgc0.c#n... src/pkg/runtime/mgc0.c:953: // Garbage! What? https://codereview.appspot.com/106260045/diff/980001/src/pkg/runtime/mgc0.c#n... src/pkg/runtime/mgc0.c:1362: flushallmcaches(); If we have to do this, why only in Debug mode? https://codereview.appspot.com/106260045/diff/980001/src/pkg/runtime/mgc0.c#n... src/pkg/runtime/mgc0.c:1760: prog1 = unrollgcprog1(mask, prog, &pos, inplace, sparse); This is potentially too recursive to run on the M stack. https://codereview.appspot.com/106260045/diff/980001/src/pkg/runtime/mgc0.c#n... 
src/pkg/runtime/mgc0.c:1815: // Mark word after last as bitAllocated. s/bitAllocated/bitDead/? Actually use bitDead below. https://codereview.appspot.com/106260045/diff/980001/src/pkg/runtime/mgc0.c#n... src/pkg/runtime/mgc0.c:1875: return; You don't add bitsDead here. I presume you're assuming bitsDead is 0. You can just add it explicitly, e.g. ((bitAllocated+(bitsDead<<2)) << shift). https://codereview.appspot.com/106260045/diff/980001/src/pkg/runtime/mgc0.h File src/pkg/runtime/mgc0.h (right): https://codereview.appspot.com/106260045/diff/980001/src/pkg/runtime/mgc0.h#n... src/pkg/runtime/mgc0.h:14: insData = 1, GC program description goes here? https://codereview.appspot.com/106260045/diff/980001/src/pkg/runtime/mgc0.h#n... src/pkg/runtime/mgc0.h:20: BitsPerPointer = 2, We need some names for these different types of bitmaps (2 bit/ptr, 4 bit/ptr, maybe soon 1 bit/ptr). BitsPerPointer and friends are no longer clear about which bitmap type they are talking about. https://codereview.appspot.com/106260045/diff/980001/src/pkg/runtime/mgc0.h#n... src/pkg/runtime/mgc0.h:35: MaxGCMask = 0, // disabled because wastes several bytes of memory Why are some of these constants capitalized and some aren't? Same for the bit* constants below. https://codereview.appspot.com/106260045/diff/980001/src/pkg/runtime/type.h File src/pkg/runtime/type.h (right): https://codereview.appspot.com/106260045/diff/980001/src/pkg/runtime/type.h#n... src/pkg/runtime/type.h:30: uintptr gc[2]; This really needs a long comment describing what gc[0] and gc[1] are.
On 2014/07/23 22:42:04, khr wrote: > I did a first pass. > > I'm really worried that this is too much to go in all at once. I have a hard > time keeping it all in my brain. Can we split it up somehow? It would be more > work, but probably safer if we can split it into individually reviewable and > testable pieces. I am open to suggestions, but it's all very tightly coupled. > Some ideas for separable CLs: > 1) Split channels into header and array I can do this if you are fine with temporarily leaving channels w/o type info (conservative scan). I asked Russ the same question yesterday in the chan.h comment. > 2) Use just GC programs and then introduce bit vectors in > a separate CL. Or the other way around. That's how I first mailed it. Patchset 39 contains only masks. But Russ insisted on adding programs in this CL. > 3) Save some optimizations for separate CLs, like repeating the bits twice for > odd-size objects. This was actually intended as a simplification, not an optimization. Otherwise the runtime will have to deal with the case where a target byte needs to be combined from different bytes of the mask.
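A worked example of that simplification (illustrative numbers, not from the CL): with 4 bitmap bits per word, a 3-word type needs 12 bits of mask, which ends mid-byte, so tiling it over consecutive objects would force the runtime to assemble some bitmap bytes from two different mask bytes; repeated once more, the mask covers a whole number of bytes and can be copied byte by byte.

    3 words * 4 bits/word = 12 bits = 1.5 bytes  (mask ends mid-byte)
    6 words * 4 bits/word = 24 bits = 3 bytes    (mask tiles on byte boundaries)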
PTAL https://codereview.appspot.com/106260045/diff/980001/src/cmd/gc/reflect.c File src/cmd/gc/reflect.c (right): https://codereview.appspot.com/106260045/diff/980001/src/cmd/gc/reflect.c#new... src/cmd/gc/reflect.c:1279: size = size*PointersPerByte/8; // 4 bits per word On 2014/07/23 22:42:03, khr wrote: > The units of this conversion seem strange to me. I want size * BitsPerPointer / > 8, as size is already # of pointers. > > Or size / PointersPerByte, maybe. You are right. It actually must be gcBits (4 -- 4 bits per word in GC bitmap) instead of PointersPerByte (which happened to be 4 as well). https://codereview.appspot.com/106260045/diff/980001/src/cmd/gc/reflect.c#new... src/cmd/gc/reflect.c:1281: // We could use a more elaborate condition, but this seems to work good in practice. On 2014/07/23 22:42:03, khr wrote: > s/good/well/ Done. https://codereview.appspot.com/106260045/diff/980001/src/cmd/gc/reflect.c#new... src/cmd/gc/reflect.c:1426: On 2014/07/23 22:42:03, khr wrote: > Is the format of these GC programs described anywhere in a comment? I didn't > see it. Added a description to mgc0.h, where the instruction constants are defined. https://codereview.appspot.com/106260045/diff/980001/src/cmd/gc/reflect.c#new... src/cmd/gc/reflect.c:1440: size++; // unroll flag in the beginning On 2014/07/23 22:42:03, khr wrote: > Where is the unroll flag set? > > Update: I found it, but this really needs a comment. Added "used by runtime (see runtime.markallocated)". https://codereview.appspot.com/106260045/diff/980001/src/pkg/reflect/type.go File src/pkg/reflect/type.go (left): https://codereview.appspot.com/106260045/diff/980001/src/pkg/reflect/type.go#... src/pkg/reflect/type.go:1099: if t.kind&kindNoPointers != 0 { On 2014/07/23 22:42:03, khr wrote: > Why is there no GC program? Or is the non-existence of a program imply > conservative scanning? With the type-punned GC info (it does not encode the type of the pointee), the prototype GC info works just fine. So we don't need to generate new type info. E.g. if you have a Hmap type info, it now works for all maps regardless of type. The same goes for a single pointer or chan. We only need to generate type info when we create a completely new object layout (e.g. map bucket). https://codereview.appspot.com/106260045/diff/980001/src/pkg/reflect/type.go#... src/pkg/reflect/type.go:1485: ch.gc = unsafe.Pointer(&chanGC{ On 2014/07/23 22:42:03, khr wrote: > Channel GC program? the same here https://codereview.appspot.com/106260045/diff/980001/src/pkg/reflect/type.go#... src/pkg/reflect/type.go:1540: mt.hmap = hMapOf(mt.bucket) On 2014/07/23 22:42:03, khr wrote: > where did this go? Is there just one of these for all maps? the same here https://codereview.appspot.com/106260045/diff/980001/src/pkg/reflect/type.go#... src/pkg/reflect/type.go:1544: mt.gc = unsafe.Pointer(&ptrGC{ On 2014/07/23 22:42:03, khr wrote: > gc for this? the same here https://codereview.appspot.com/106260045/diff/980001/src/pkg/runtime/mgc0.c File src/pkg/runtime/mgc0.c (right): https://codereview.appspot.com/106260045/diff/980001/src/pkg/runtime/mgc0.c#n... src/pkg/runtime/mgc0.c:306: // Check is we have reached end of heap. On 2014/07/23 22:42:03, khr wrote: > s/heap/span/ Done. https://codereview.appspot.com/106260045/diff/980001/src/pkg/runtime/mgc0.c#n... src/pkg/runtime/mgc0.c:324: bits = (bits>>2)&BitsMask; On 2014/07/23 22:42:04, khr wrote: > BitsMask vs. bitMask? Very confusing.
agree :) here we extract type info, so it's BitsMask https://codereview.appspot.com/106260045/diff/980001/src/pkg/runtime/mgc0.c#n... src/pkg/runtime/mgc0.c:953: // Garbage! On 2014/07/23 22:42:03, khr wrote: > What? Done. https://codereview.appspot.com/106260045/diff/980001/src/pkg/runtime/mgc0.c#n... src/pkg/runtime/mgc0.c:1362: flushallmcaches(); On 2014/07/23 22:42:03, khr wrote: > If we have to do this, why only in Debug mode? mmm... the comment above talks about it https://codereview.appspot.com/106260045/diff/980001/src/pkg/runtime/mgc0.c#n... src/pkg/runtime/mgc0.c:1760: prog1 = unrollgcprog1(mask, prog, &pos, inplace, sparse); On 2014/07/23 22:42:03, khr wrote: > This is potentially too recursive to run on the M stack. With Russ' fix for program generation (already applied), each insArray encodes array with at least 2 elements and each element is at least 1 word. So max recursion is 34. https://codereview.appspot.com/106260045/diff/980001/src/pkg/runtime/mgc0.c#n... src/pkg/runtime/mgc0.c:1815: // Mark word after last as bitAllocated. On 2014/07/23 22:42:03, khr wrote: > s/bitAllocated/bitDead/? > > Actually use bitDead below. Done. https://codereview.appspot.com/106260045/diff/980001/src/pkg/runtime/mgc0.c#n... src/pkg/runtime/mgc0.c:1875: return; On 2014/07/23 22:42:04, khr wrote: > You don't add bitsDead here. I presume you're assuming bitsDead is 0. You can > just add it explicitly, e.g. ((bitAllocated+(bitsDead<<2)) << shift). Done. https://codereview.appspot.com/106260045/diff/980001/src/pkg/runtime/mgc0.h File src/pkg/runtime/mgc0.h (right): https://codereview.appspot.com/106260045/diff/980001/src/pkg/runtime/mgc0.h#n... src/pkg/runtime/mgc0.h:14: insData = 1, On 2014/07/23 22:42:04, khr wrote: > GC program description goes here? Done. https://codereview.appspot.com/106260045/diff/980001/src/pkg/runtime/mgc0.h#n... src/pkg/runtime/mgc0.h:20: BitsPerPointer = 2, On 2014/07/23 22:42:04, khr wrote: > We need some names for these different types of bitmaps (2 bit/ptr, 4 bit/ptr, > maybe soon 1 bit/ptr). BitsPerPointer and friends are no longer clear about > which bitmap type they are talking about. Yes. We need some consistent scheme and naming. Some of them are capitalized and some are not, because that's what we have now (BitsPerPointer vs. bitMask). Do you think we need to do it in this CL? I am just thinking that if we tackle interfaces, then we can go to 1 bit per word. And that will remove whole bunch of constants. So it will be simpler to come up with consistent scheme. https://codereview.appspot.com/106260045/diff/980001/src/pkg/runtime/mgc0.h#n... src/pkg/runtime/mgc0.h:35: MaxGCMask = 0, // disabled because wastes several bytes of memory On 2014/07/23 22:42:04, khr wrote: > Why are some of these constants capitalized and some aren't? Same for the bit* > constants below. see above https://codereview.appspot.com/106260045/diff/980001/src/pkg/runtime/type.h File src/pkg/runtime/type.h (right): https://codereview.appspot.com/106260045/diff/980001/src/pkg/runtime/type.h#n... src/pkg/runtime/type.h:30: uintptr gc[2]; On 2014/07/23 22:42:04, khr wrote: > This really needs a long comment describing what gc[0] and gc[1] are. Done.
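As a back-of-the-envelope check of the 34 (assuming a 64-bit heap arena of about 128 GB, which is an assumption here, not something stated in this CL): each insArray level describes at least 2 elements of at least 1 word each, so a program nested d levels deep describes at least 2^d words, and 128 GB = 2^37 bytes = 2^34 eight-byte words, giving d <= 34.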
On 2014/07/24 07:57:27, dvyukov wrote: > On 2014/07/23 22:42:04, khr wrote: > > I did a first pass. > > > > I'm really worried that this is too much to go in all at once. I have a hard > > time keeping it all in my brain. Can we split it up somehow? It would be > more > > work, but probably safer if we can split it into individually reviewable and > > testable pieces. > > I am open to suggestions, but it's all very tightly coupled. > > > Some ideas for separable CLs: > > 1) Split channels into header and array > > I can do this if you are find with temporally leaving channels w/o type info > (conservative scan). I've asked Russ the same question yesterday in the chan.h > comment. That's fine with me as a temporary step. > > 2) Use just GC programs and then introduce bit vectors in > > a separate CL. Or the other way around. > > That's how I first mailed it. Patchset 39 contains only masks. But Russ insisted > on adding programs in this CL. I don't want an intermediate state that has bit vectors without any kind of gc program. But you could easily split the compiler/linker and runtime changes (I think maybe this is what Keith was getting at) by doing one step that changes the gc bitmap but keeps using the old gc programs, and then a second CL that introduces the new programs. Russ
I am going to be away from later today until August 4. The compiler changes look good to me now. The runtime changes are overwhelming. I hope you can work with Keith to break the CL up into a few steps. I realize this is slowing you down, but that's what code reviews are for: to make sure we do things deliberately and in a way that multiple people understand the code. It's fine to submit pieces while I'm gone as long as Keith is happy.
ack, thanks
On 2014/07/24 07:57:27, dvyukov wrote: > On 2014/07/23 22:42:04, khr wrote: > > I did a first pass. > > > > I'm really worried that this is too much to go in all at once. I have a hard > > time keeping it all in my brain. Can we split it up somehow? It would be > more > > work, but probably safer if we can split it into individually reviewable and > > testable pieces. > > I am open to suggestions, but it's all very tightly coupled. > > > Some ideas for separable CLs: > > 1) Split channels into header and array > > I can do this if you are find with temporally leaving channels w/o type info > (conservative scan). I've asked Russ the same question yesterday in the chan.h > comment. I've removed channels from this change. Frankly I don't see what else can be factored out. Doing something like the new type info in the runtime but the old programs in the compiler would require lots of throw-away code and a significant amount of debugging to get it correct, which is also wasted effort. Russ reviewed the compiler changes. Most changed files in the runtime have only a few trivial changes. It's really mostly mgc0, malloc.goc and heapdump.c.
Keith, when will you be able to take a look?
Later today. Is there a separate CL for the channel change?
https://codereview.appspot.com/115280043 is the chan change
On Thu, Jul 24, 2014 at 2:35 AM, <dvyukov@google.com> wrote: > PTAL > > > > https://codereview.appspot.com/106260045/diff/980001/src/cmd/gc/reflect.c > File src/cmd/gc/reflect.c (right): > > https://codereview.appspot.com/106260045/diff/980001/src/ > cmd/gc/reflect.c#newcode1279 > src/cmd/gc/reflect.c:1279: size = size*PointersPerByte/8; // 4 bits > per > word > On 2014/07/23 22:42:03, khr wrote: > >> The units of this conversion seem strange to me. I want size * >> > BitsPerPointer / > >> 8, as size is already # of pointers. >> > > Or size / PointersPerByte, maybe. >> > > You are right. It actually must be gcBits (4 -- 4 bits per word in GC > bitmap) instead of PointersPerByte (which happened to be 4 as well). > > > https://codereview.appspot.com/106260045/diff/980001/src/ > cmd/gc/reflect.c#newcode1281 > src/cmd/gc/reflect.c:1281: // We could use a more elaborate condition, > but this seems to work good in practice. > On 2014/07/23 22:42:03, khr wrote: > >> s/good/well/ >> > > Done. > > https://codereview.appspot.com/106260045/diff/980001/src/ > cmd/gc/reflect.c#newcode1426 > src/cmd/gc/reflect.c:1426: > > On 2014/07/23 22:42:03, khr wrote: > >> Is the format of these GC programs described anywhere in a comment? I >> > didn't > >> see it. >> > > Added description to mgc0.h, where instruction constants are defined > > > https://codereview.appspot.com/106260045/diff/980001/src/ > cmd/gc/reflect.c#newcode1440 > src/cmd/gc/reflect.c:1440: size++; // unroll flag in the beginning > On 2014/07/23 22:42:03, khr wrote: > >> Where is the unroll flag set? >> > > Update: I found it, but this really needs a comment. >> > > add "used by runtime (see runtime.markallocated)" > > > https://codereview.appspot.com/106260045/diff/980001/src/ > pkg/reflect/type.go > File src/pkg/reflect/type.go (left): > > https://codereview.appspot.com/106260045/diff/980001/src/ > pkg/reflect/type.go#oldcode1099 > src/pkg/reflect/type.go:1099: if t.kind&kindNoPointers != 0 { > On 2014/07/23 22:42:03, khr wrote: > >> Why is there no GC program? Or is the non-existence of a program >> > imply > >> conservative scanning? >> > > > With the type-punned GC info (it does not encode type of the pointee), > prototype GC info works just fine. So we don't need to generate a new > type info. E.g. if you a Hmap type info, it now works for all maps > regardless of types. The same for a single pointer or chan. > We only need to generate type info when we create completely new object > layout (e.g. map bucket). > > Oh, I see, the gc info is copied from the prototype map/channel/whatever and never overwritten. > > https://codereview.appspot.com/106260045/diff/980001/src/ > pkg/reflect/type.go#oldcode1485 > src/pkg/reflect/type.go:1485: ch.gc = unsafe.Pointer(&chanGC{ > On 2014/07/23 22:42:03, khr wrote: > >> Channel GC program? >> > > the same here > > > https://codereview.appspot.com/106260045/diff/980001/src/ > pkg/reflect/type.go#oldcode1540 > src/pkg/reflect/type.go:1540: mt.hmap = hMapOf(mt.bucket) > On 2014/07/23 22:42:03, khr wrote: > >> where did this go? Is there just one of these for all maps? >> > > the same here > > > https://codereview.appspot.com/106260045/diff/980001/src/ > pkg/reflect/type.go#oldcode1544 > src/pkg/reflect/type.go:1544: mt.gc = unsafe.Pointer(&ptrGC{ > On 2014/07/23 22:42:03, khr wrote: > >> gc for this? 
>> > > the same here > > > https://codereview.appspot.com/106260045/diff/980001/src/ > pkg/runtime/mgc0.c > File src/pkg/runtime/mgc0.c (right): > > https://codereview.appspot.com/106260045/diff/980001/src/ > pkg/runtime/mgc0.c#newcode306 > src/pkg/runtime/mgc0.c:306: // Check is we have reached end of heap. > On 2014/07/23 22:42:03, khr wrote: > >> s/heap/span/ >> > > Done. > > > https://codereview.appspot.com/106260045/diff/980001/src/ > pkg/runtime/mgc0.c#newcode324 > src/pkg/runtime/mgc0.c:324: bits = (bits>>2)&BitsMask; > On 2014/07/23 22:42:04, khr wrote: > >> BitsMask vs. bitMask? Very confusing. >> > > agree :) > here we extract type info, so it's BitsMask > > > https://codereview.appspot.com/106260045/diff/980001/src/ > pkg/runtime/mgc0.c#newcode953 > src/pkg/runtime/mgc0.c:953: // Garbage! > On 2014/07/23 22:42:03, khr wrote: > >> What? >> > > Done. > > > https://codereview.appspot.com/106260045/diff/980001/src/ > pkg/runtime/mgc0.c#newcode1362 > src/pkg/runtime/mgc0.c:1362: flushallmcaches(); > On 2014/07/23 22:42:03, khr wrote: > >> If we have to do this, why only in Debug mode? >> > > mmm... the comment above talks about it > > I see, the check in scanblock that would fail is also guarded by Debug. I think we should do this flush unconditionally. Otherwise it means we're scanning freed memory. Maybe the check doesn't fail without Debug on, but scanning freed memory can lead to leaks. > > https://codereview.appspot.com/106260045/diff/980001/src/ > pkg/runtime/mgc0.c#newcode1760 > src/pkg/runtime/mgc0.c:1760: prog1 = unrollgcprog1(mask, prog, &pos, > inplace, sparse); > On 2014/07/23 22:42:03, khr wrote: > >> This is potentially too recursive to run on the M stack. >> > > With Russ' fix for program generation (already applied), each insArray > encodes array with at least 2 elements and each element is at least 1 > word. So max recursion is 34. > > Ok. > > https://codereview.appspot.com/106260045/diff/980001/src/ > pkg/runtime/mgc0.c#newcode1815 > src/pkg/runtime/mgc0.c:1815: // Mark word after last as bitAllocated. > On 2014/07/23 22:42:03, khr wrote: > >> s/bitAllocated/bitDead/? >> > > Actually use bitDead below. >> > > Done. > > > https://codereview.appspot.com/106260045/diff/980001/src/ > pkg/runtime/mgc0.c#newcode1875 > src/pkg/runtime/mgc0.c:1875: return; > On 2014/07/23 22:42:04, khr wrote: > >> You don't add bitsDead here. I presume you're assuming bitsDead is 0. >> > You can > >> just add it explicitly, e.g. ((bitAllocated+(bitsDead<<2)) << shift). >> > > Done. > > > https://codereview.appspot.com/106260045/diff/980001/src/ > pkg/runtime/mgc0.h > File src/pkg/runtime/mgc0.h (right): > > https://codereview.appspot.com/106260045/diff/980001/src/ > pkg/runtime/mgc0.h#newcode14 > src/pkg/runtime/mgc0.h:14: insData = 1, > On 2014/07/23 22:42:04, khr wrote: > >> GC program description goes here? >> > > Done. > > > https://codereview.appspot.com/106260045/diff/980001/src/ > pkg/runtime/mgc0.h#newcode20 > src/pkg/runtime/mgc0.h:20: BitsPerPointer = 2, > On 2014/07/23 22:42:04, khr wrote: > >> We need some names for these different types of bitmaps (2 bit/ptr, 4 >> > bit/ptr, > >> maybe soon 1 bit/ptr). BitsPerPointer and friends are no longer clear >> > about > >> which bitmap type they are talking about. >> > > Yes. > We need some consistent scheme and naming. > Some of them are capitalized and some are not, because that's what we > have now (BitsPerPointer vs. bitMask). > > Do you think we need to do it in this CL? 
> I am just thinking that if we tackle interfaces, then we can go to 1 bit > per word. And that will remove whole bunch of constants. So it will be > simpler to come up with consistent scheme. > > Yes, this can wait. > > https://codereview.appspot.com/106260045/diff/980001/src/ > pkg/runtime/mgc0.h#newcode35 > src/pkg/runtime/mgc0.h:35: MaxGCMask = 0, // disabled because wastes > several bytes of memory > On 2014/07/23 22:42:04, khr wrote: > >> Why are some of these constants capitalized and some aren't? Same for >> > the bit* > >> constants below. >> > > see above > > > https://codereview.appspot.com/106260045/diff/980001/src/ > pkg/runtime/type.h > File src/pkg/runtime/type.h (right): > > https://codereview.appspot.com/106260045/diff/980001/src/ > pkg/runtime/type.h#newcode30 > src/pkg/runtime/type.h:30: uintptr gc[2]; > On 2014/07/23 22:42:04, khr wrote: > >> This really needs a long comment describing what gc[0] and gc[1] are. >> > > Done. > > https://codereview.appspot.com/106260045/ >
LGTM with those fixes. https://codereview.appspot.com/106260045/diff/1020001/src/pkg/runtime/mgc0.c File src/pkg/runtime/mgc0.c (left): https://codereview.appspot.com/106260045/diff/1020001/src/pkg/runtime/mgc0.c#... src/pkg/runtime/mgc0.c:1273: enqueue1(&wbuf, (Obj){(void*)&spf->fint, PtrSize, 0}); What about these other two fields?
On Tue, Jul 29, 2014 at 3:28 AM, Keith Randall <khr@google.com> wrote: https://codereview.appspot.com/106260045/diff/980001/src/pkg/runtime/mgc0.c#n... >> src/pkg/runtime/mgc0.c:1362: flushallmcaches(); >> On 2014/07/23 22:42:03, khr wrote: >>> >>> If we have to do this, why only in Debug mode? >> >> >> mmm... the comment above talks about it >> > > I see, the check in scanblock that would fail is also guarded by Debug. > I think we should do this flush unconditionally. Otherwise it means we're > scanning freed memory. Maybe the check doesn't fail without Debug on, but > scanning freed memory can lead to leaks. The real problem here is not flushing, but the dangling pointers. If they can point to freed blocks, they can as well point to live blocks. So flushing will not actually fix leaks. The bad pointers must be fixed to fix leaks. Anyway, I am going to get rid of free and this debug flush right when this CL goes in.
https://codereview.appspot.com/106260045/diff/1020001/src/pkg/runtime/mgc0.c File src/pkg/runtime/mgc0.c (left): https://codereview.appspot.com/106260045/diff/1020001/src/pkg/runtime/mgc0.c#... src/pkg/runtime/mgc0.c:1273: enqueue1(&wbuf, (Obj){(void*)&spf->fint, PtrSize, 0}); On 2014/07/28 23:28:55, khr wrote: > What about these other two fields? Both these fields point to Types. Types are retained by other means (compiler-generated types are persistent and reflect types are stored in a global map). So this is unnecessary. We do not queue types when scanning an eface either (which would be a much more serious issue if types could get freed).
*** Submitted as https://code.google.com/p/go/source/detail?r=e1fc05ce4181 *** runtime: simpler and faster GC Implement the design described in: https://docs.google.com/document/d/1v4Oqa0WwHunqlb8C3ObL_uNQw3DfSY-ztoA-4wWbK... Summary of the changes: GC uses "2-bits per word" pointer type info embed directly into bitmap. Scanning of stacks/data/heap is unified. The old spans types go away. Compiler generates "sparse" 4-bits type info for GC (directly for GC bitmap). Linker generates "dense" 2-bits type info for data/bss (the same as stacks use). Summary of results: -1680 lines of code total (-1000+ in mgc0.c only) -25% memory consumption -3-7% binary size -15% GC pause reduction -7% run time reduction LGTM=khr R=golang-codereviews, rsc, christoph, khr CC=golang-codereviews, rlh https://codereview.appspot.com/106260045
Message was sent while issue was closed.
fingers crossed
Message was sent while issue was closed.
This CL appears to have broken the android-arm-crawshaw builder. See http://build.golang.org/log/840ff0ebd0a41f5acfc8505cf8032495b3dcdc8b
not looking good so far ...