|
|
Description: runtime: simpler and faster GC
Implement the design described in:
https://docs.google.com/document/d/1v4Oqa0WwHunqlb8C3ObL_uNQw3DfSY-ztoA-4wWbKcg/pub
Summary of the changes:
GC uses "2-bits per word" pointer type info embed directly into bitmap.
Scanning of stacks/data/heap is unified.
The old spans types go away.
Compiler generates "sparse" 4-bits type info for GC (directly for GC bitmap).
Linker generates "dense" 2-bits type info for data/bss (the same as stacks use).
Summary of results:
-1680 lines of code total (-1000+ in mgc0.c only)
-25% memory consumption
-3-7% binary size
-15% GC pause reduction
-7% run time reduction
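For illustration, here is a minimal sketch of the dense 2-bit-per-word encoding described above. Some of the constant names (BitsPerPointer, PointersPerByte, BitsScalar) appear later in this review; the other names, the numeric values, and the helper function are assumptions for illustration only, not code from the CL.

package main

import "fmt"

const (
    BitsPerPointer  = 2
    PointersPerByte = 8 / BitsPerPointer // 4 heap words described per bitmap byte

    // Illustrative per-word classifications (values assumed, not from the CL).
    BitsDead      = 0 // unused / dead word
    BitsScalar    = 1 // word contains no pointers
    BitsPointer   = 2 // word is a live pointer
    BitsMultiWord = 3 // first word of a multi-word description
)

// typeBits packs one 2-bit classification per word into the dense form.
func typeBits(words []int) []byte {
    out := make([]byte, (len(words)+PointersPerByte-1)/PointersPerByte)
    for i, w := range words {
        out[i/PointersPerByte] |= byte(w) << uint((i%PointersPerByte)*BitsPerPointer)
    }
    return out
}

func main() {
    // A struct { p *int; x uintptr; q *byte } is pointer, scalar, pointer.
    fmt.Printf("%08b\n", typeBits([]int{BitsPointer, BitsScalar, BitsPointer}))
}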
Messages (64 total)
Hello golang-codereviews@googlegroups.com, I'd like you to review this change to https://dvyukov%40google.com@code.google.com/p/go/
Rietveld does not like large change descriptions, so here are the benchmark results separately (columns: old, new, delta):

garbage-1
allocated       3060956     2965058     -3.13%
allocs          57729       57624       -0.18%
cputime         16178480    16739958    +3.47%
gc-pause-one    133386156   113134986   -15.18%
gc-pause-total  2934495     3394049     +15.66%
rss             300072960   218271744   -27.26%
sys-gc          17670144    11980800    -32.20%
sys-heap        265945088   191496192   -27.99%
sys-other       10015392    8266216     -17.46%
sys-stack       9043968     8912896     -1.45%
sys-total       302674592   220656104   -27.10%
time            16188036    16743835    +3.43%
virtual-mem     491409408   407719936   -17.03%

garbage-2
allocated       3073933     2964383     -3.56%
allocs          57739       57623       -0.20%
cputime         17588468    17919136    +1.88%
gc-pause-one    79779842    75689054    -5.13%
gc-pause-total  1834936     2346360     +27.87%
rss             310652928   227540992   -26.75%
sys-gc          18128896    12316672    -32.06%
sys-heap        273285120   196739072   -28.01%
sys-other       10285144    8522624     -17.14%
sys-stack       9306112     9306112     +0.00%
sys-total       311005272   226884480   -27.05%
time            8881479     9263811     +4.30%
virtual-mem     574124032   423456768   -26.24%

garbage-4
allocated       3071034     2964380     -3.47%
allocs          57735       57623       -0.19%
cputime         18395208    19084593    +3.75%
gc-pause-one    42328695    39775705    -6.03%
gc-pause-total  931231      1153495     +23.87%
rss             325066752   241741824   -25.63%
sys-gc          18980864    13180928    -30.56%
sys-heap        286916608   210370560   -26.68%
sys-other       10613232    8787872     -17.20%
sys-stack       10092544    10223616    +1.30%
sys-total       326603248   242562976   -25.73%
time            4608702     4830480     +4.81%
virtual-mem     547790848   531423232   -2.99%

garbage-8
allocated       3048763     2957203     -3.00%
allocs          57718       57623       -0.16%
cputime         19151198    19957434    +4.21%
gc-pause-one    24585584    22530624    -8.36%
gc-pause-total  491711      608326      +23.72%
rss             356458496   262660096   -26.31%
sys-gc          20750336    14323712    -30.97%
sys-heap        315228160   228196352   -27.61%
sys-other       11164304    9297984     -16.72%
sys-stack       12451840    12320768    -1.05%
sys-total       359594640   264138816   -26.55%
time            2422566     2531798     +4.51%
virtual-mem     707973120   645025792   -8.89%

Since the heap size is reduced, GCs happen more frequently, which increases total GC time. It is possible to trade memory back for performance. Below is a comparison of old vs new with GOGC=165 (which brings memory consumption roughly back to the old level).
garbage-1
allocated       3052052     2965030     -2.85%
allocs          57720       57624       -0.17%
cputime         16621510    15395976    -7.37%
gc-pause-one    136762062   112110308   -18.03%
gc-pause-total  3008765     2017985     -32.93%
rss             299483136   285331456   -4.73%
sys-gc          17670144    15785984    -10.66%
sys-heap        265945088   252313600   -5.13%
sys-other       10015504    9198984     -8.15%
sys-stack       8912896     9043968     +1.47%
sys-total       302543632   286342536   -5.35%
time            16621385    15404268    -7.32%
virtual-mem     557273088   474525696   -14.85%

garbage-2
allocated       3073702     2964237     -3.56%
allocs          57738       57623       -0.20%
cputime         17307008    15874992    -8.27%
gc-pause-one    86934156    85146756    -2.06%
gc-pause-total  1999485     1532641     -23.35%
rss             309473280   295632896   -4.47%
sys-gc          18063360    16244736    -10.07%
sys-heap        272236544   259653632   -4.62%
sys-other       10280824    9464032     -7.94%
sys-stack       9437184     9175040     -2.78%
sys-total       310017912   294537440   -4.99%
time            8936413     8389483     -6.12%
virtual-mem     574255104   491114496   -14.48%

garbage-4
allocated       3065954     2956425     -3.57%
allocs          57731       57621       -0.19%
cputime         18431932    17124132    -7.10%
gc-pause-one    44980995    39581480    -12.00%
gc-pause-total  989581      692675      -30.00%
rss             321982464   314998784   -2.17%
sys-gc          18784256    17375232    -7.50%
sys-heap        283770880   277479424   -2.22%
sys-other       10346984    9989656     -3.45%
sys-stack       10223616    9961472     -2.56%
sys-total       323125736   314805784   -2.57%
time            4685893     4336762     -7.45%
virtual-mem     737247232   595279872   -19.26%

garbage-8
allocated       3041382     2956830     -2.78%
allocs          57712       57622       -0.16%
cputime         19078905    18206227    -4.57%
gc-pause-one    25034791    22108232    -11.69%
gc-pause-total  500695      364785      -27.14%
rss             357806080   345243648   -3.51%
sys-gc          20750336    19107840    -7.92%
sys-heap        315228160   304742400   -3.33%
sys-other       11163856    10510576    -5.85%
sys-stack       12451840    12189696    -2.11%
sys-total       359594192   346550512   -3.63%
time            2419061     2291795     -5.26%
virtual-mem     808402944   727433216   -10.02%
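For reference, the GOGC=165 runs above only raise the heap-growth target; the same knob can be set from inside a program through the standard runtime/debug API. The snippet below is shown purely to illustrate the memory-vs-GC-frequency trade-off being measured; it is not part of this CL.

package main

import (
    "fmt"
    "runtime/debug"
)

func main() {
    // Raise the heap-growth target from the default 100% to 165%,
    // matching the GOGC=165 runs above; returns the previous setting.
    old := debug.SetGCPercent(165)
    fmt.Println("previous GOGC:", old)
}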
What are the benchmarks? It looks like they are the test/bench/garbage benchmarks? Those are not terribly interesting overall. What is the effect on real code like the test/bench/go1 benchmarks?
On 2014/07/10 18:42:35, rsc wrote:
> What are the benchmarks? It looks like they are the test/bench/garbage
> benchmarks? Those are not terribly interesting overall. What is the effect
> on real code like the test/bench/go1 benchmarks?

The benchmark is the garbage benchmark from the go.benchmarks repo (which is roughly the same as test/bench/garbage).

go1 benchmarks:

benchmark                        old ns/op     new ns/op     delta
BenchmarkBinaryTree17            3882774233    3544278312    -8.72%
BenchmarkFannkuch11              2760029132    2618294904    -5.14%
BenchmarkFmtFprintfEmpty         71.6          72.7          +1.54%
BenchmarkFmtFprintfString        231           219           -5.19%
BenchmarkFmtFprintfInt           186           164           -11.83%
BenchmarkFmtFprintfIntInt        256           257           +0.39%
BenchmarkFmtFprintfPrefixedInt   257           242           -5.84%
BenchmarkFmtFprintfFloat         460           354           -23.04%
BenchmarkFmtManyArgs             1364          1072          -21.41%
BenchmarkGobDecode               16333122      13887405      -14.97%
BenchmarkGobEncode               10785547      9556318       -11.40%
BenchmarkGzip                    438134110     406536964     -7.21%
BenchmarkGunzip                  113861506     104544186     -8.18%
BenchmarkHTTPClientServer        92163         89593         -2.79%
BenchmarkJSONEncode              21279690      20524117      -3.55%
BenchmarkJSONDecode              81470709      71832238      -11.83%
BenchmarkMandelbrot200           5294830       4101358       -22.54%
BenchmarkGoParse                 5982219       5655285       -5.47%
BenchmarkRegexpMatchEasy0_32     155           152           -1.94%
BenchmarkRegexpMatchEasy0_1K     303           294           -2.97%
BenchmarkRegexpMatchEasy1_32     144           149           +3.47%
BenchmarkRegexpMatchEasy1_1K     798           826           +3.51%
BenchmarkRegexpMatchMedium_32    232           247           +6.47%
BenchmarkRegexpMatchMedium_1K    63225         75730         +19.78%
BenchmarkRegexpMatchHard_32      3701          4377          +18.27%
BenchmarkRegexpMatchHard_1K      143579        136470        -4.95%
BenchmarkRevcomp                 1059133929    697414142     -34.15%
BenchmarkTemplate                132634561     115347401     -13.03%
BenchmarkTimeParse               389           385           -1.03%
BenchmarkTimeFormat              397           394           -0.76%

benchmark                        old MB/s      new MB/s      speedup
BenchmarkGobDecode               46.99         55.27         1.18x
BenchmarkGobEncode               71.16         80.32         1.13x
BenchmarkGzip                    44.29         47.73         1.08x
BenchmarkGunzip                  170.42        185.61        1.09x
BenchmarkJSONEncode              91.19         94.55         1.04x
BenchmarkJSONDecode              23.82         27.01         1.13x
BenchmarkGoParse                 9.68          10.24         1.06x
BenchmarkRegexpMatchEasy0_32     205.66        209.52        1.02x
BenchmarkRegexpMatchEasy0_1K     2485.81       2260.24       0.91x
BenchmarkRegexpMatchEasy1_32     222.08        214.21        0.96x
BenchmarkRegexpMatchEasy1_1K     930.48        798.08        0.86x
BenchmarkRegexpMatchMedium_32    4.31          4.05          0.94x
BenchmarkRegexpMatchMedium_1K    15.74         11.53         0.73x
BenchmarkRegexpMatchHard_32      8.65          7.31          0.85x
BenchmarkRegexpMatchHard_1K      7.13          7.50          1.05x
BenchmarkRevcomp                 239.98        364.44        1.52x
BenchmarkTemplate                14.63         16.82         1.15x

BenchmarkRegexpMatchHard_32, with its +18.27% degradation, looks like a phantom code-movement issue: the profile does not include anything GC/malloc-related, but the numbers for unrelated functions are slightly different.

before:
43.73%  go1.test  go1.test  [.] regexp.(*machine).add
17.52%  go1.test  go1.test  [.] regexp.(*machine).step
 7.98%  go1.test  go1.test  [.] _/ssd/src/go1/test/bench/go1.fastaRandom
 7.49%  go1.test  go1.test  [.] regexp.(*machine).match
 3.93%  go1.test  go1.test  [.] regexp/syntax.(*Inst).MatchRunePos
 3.57%  go1.test  go1.test  [.] regexp/syntax.EmptyOpContext
 2.63%  go1.test  go1.test  [.] regexp.(*inputBytes).step
 1.94%  go1.test  go1.test  [.] regexp.(*machine).alloc
 1.32%  go1.test  go1.test  [.] compress/flate.(*compressor).findMatch
 1.19%  go1.test  go1.test  [.] regexp/syntax.(*Inst).MatchRune
 1.18%  go1.test  go1.test  [.] compress/flate.(*compressor).deflate
 0.48%  go1.test  go1.test  [.] hash/crc32.update
 0.40%  go1.test  go1.test  [.] sync/atomic.AddUint32
 0.31%  go1.test  go1.test  [.] sync/atomic.CompareAndSwapUint32

after:
46.94%  go1.test  go1.test  [.] regexp.(*machine).add
16.17%  go1.test  go1.test  [.] regexp.(*machine).step
 7.16%  go1.test  go1.test  [.] regexp.(*machine).match
 7.05%  go1.test  go1.test  [.] _/ssd/src/go2/test/bench/go1.fastaRandom
 6.04%  go1.test  go1.test  [.] regexp/syntax.EmptyOpContext
 3.38%  go1.test  go1.test  [.] regexp/syntax.(*Inst).MatchRunePos
 2.14%  go1.test  go1.test  [.] regexp.(*machine).alloc
 2.02%  go1.test  go1.test  [.] regexp.(*inputBytes).step
 1.27%  go1.test  go1.test  [.] compress/flate.(*compressor).deflate
 1.06%  go1.test  go1.test  [.] compress/flate.(*compressor).findMatch
 0.87%  go1.test  go1.test  [.] regexp/syntax.(*Inst).MatchRune
 0.42%  go1.test  go1.test  [.] hash/crc32.update
 0.32%  go1.test  go1.test  [.] sync/atomic.AddUint32
 0.29%  go1.test  go1.test  [.] sync/atomic.CompareAndSwapUint32

The go1.test binary is 6.32% smaller. Still, it's somewhat apples to oranges, because RSS for e.g. BenchmarkGoParse stabilizes at 700m before and 584m after (-16.6%), and the improved memory consumption negatively affects total GC time (GCs are more frequent with the default GOGC setting). I want to emphasize that the main point of this CL is not direct performance benefits but long-term benefits: for example, it opens the road to 1-bit-per-word type info, which is required for concurrent GC, and to other optimizations.
On 2014/07/11 09:49:49, dvyukov wrote:
> go1 benchmarks:

Have you executed those benchmarks with the old GOGC=100 value or with GOGC=165? The latter looks like a fairer comparison given the decreased heap size. I know those numbers mean nothing and the simpler logic is far more important. Anyway, amazing work!
On 2014/07/11 20:23:53, tux21b wrote:
> Have you executed those benchmarks with the old GOGC=100 value or
> with GOGC=165? The latter looks like a fairer comparison given
> the decreased heap size.

Old is always running with GOGC=100. The first set of numbers is versus new with GOGC=100, the second versus new with GOGC=165.

> I know those numbers mean nothing and the simpler logic is far more
> important. Anyway, amazing work!

Thanks!
Thanks for working on this. The doc linked in the CL description is not really a design doc for this change. It is a sketch of a bunch of possible changes, only some of which are in this CL, and it is very light on actual details. I would like to see a separate doc that is more detailed and describes only the changes being made in this CL. It should also discuss the implications of the change. One implication is that the heap dumper will be dramatically less useful because it will not have type information to dump. Another concern is the size of the new tables in the generated object file. I don't see any kind of compression that takes care of large arrays or structs containing large arrays. More importantly, I don't see any discussion of that in the design doc. We need to weigh those sorts of considerations too. Thanks. Russ
On 2014/07/15 20:24:52, rsc wrote:
> The doc linked in the CL description is not really a design doc for this
> change. [...] I would like to see a separate doc that is more detailed and
> describes only the changes being made in this CL.

Here we go: https://docs.google.com/document/d/1v4Oqa0WwHunqlb8C3ObL_uNQw3DfSY-ztoA-4wWbK...
On 2014/07/16 12:07:42, dvyukov wrote:
> Here we go:
> https://docs.google.com/document/d/1v4Oqa0WwHunqlb8C3ObL_uNQw3DfSY-ztoA-4wWbK...

Also replaced the link in the description.
As far as the heap dumper goes, we might be able to reconstruct types. The DWARF info provides types for all the stack frames and globals, and we can propagate types into the heap the way the GC does (or, after this CL, used to). The only trouble is unsafe.Pointer, so there may be some objects that end up with no type, but probably not many. It's a fair amount of work in the dump reader, but probably worth it. As a bonus, propagating types will correctly type no-pointer objects that currently don't have a type in the heap dump.
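A rough sketch of the propagation pass described above, as it might look in a dump reader. Every name here (typeInfo, object, propagate, the maps) is hypothetical scaffolding, not the actual heapdump format or reader; objects reached only through unsafe.Pointer-like fields stay untyped, matching the limitation noted in the message.

package heapdump

type typeInfo struct {
    name   string
    fields []*typeInfo // type referenced by each pointer word, in order; nil for unsafe.Pointer-like fields
}

type object struct {
    ptrs []uint64 // target addresses of the object's pointer words, in the same order
}

// propagate assigns types to reachable untyped objects, breadth-first,
// starting from roots whose types are known from DWARF (globals, stack frames).
func propagate(objects map[uint64]*object, typeOf map[uint64]*typeInfo, roots []uint64) {
    work := append([]uint64(nil), roots...)
    for len(work) > 0 {
        addr := work[0]
        work = work[1:]
        obj, t := objects[addr], typeOf[addr]
        if obj == nil || t == nil {
            continue
        }
        for i, ft := range t.fields {
            if i >= len(obj.ptrs) {
                break
            }
            target := obj.ptrs[i]
            if ft != nil && typeOf[target] == nil {
                typeOf[target] = ft // first writer wins
                work = append(work, target)
            }
        }
    }
}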
Nice, another option.

Btw, Keith, please take a look at the heapdump changes. It compiles, writes something, and does not crash; that's all I can say. I suspect it requires more changes (maybe in a separate CL). For example, as far as I understand, it dumps type descriptors to later associate them with objects, but we don't have a Type* for objects anymore, so the analyzer probably won't be able to do anything useful with the type descriptors...
The heapdump changes look OK for now. They might need a fixup after I implement the type propagation. We'll still need the type entries, just to map Type* (read from Eface.type words) to DWARF types. V1 would use the type name as the key, although I'd like to implement a more robust map at some point.
In the longer term I want to pass less information to new, so that in particular the type information passed to new does not include a method table, to reduce binary bloat. So you can't necessarily assume that the Type*s are in the binary. Using the DWARF information makes sense, and we could at least pass the type string to new. Russ
It's not the Type*s passed to new that matter, it is the Type*s in Eface.type and in Iface.tab.type. Those should always be in the binary. As long as I can map from either of those to a DWARF type, I'm happy.
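For readers following the Eface.type / Iface.tab.type references, a sketch of the interface layouts in question, written as Go-style structs purely for illustration; the runtime of this era defines the equivalent Eface, Iface, and Itab structs in C, and the field names here are chosen to mirror that discussion rather than copied from it.

package layout

import "unsafe"

// _type stands in for the runtime type descriptor (C: Type).
type _type struct{ size uintptr }

// itab stands in for the interface table (C: Itab); itab.typ is the
// Iface.tab.type word referenced above.
type itab struct {
    inter *_type
    typ   *_type
    // ...method table follows in the real layout
}

// eface mirrors an empty-interface value; eface.typ is the Eface.type word.
type eface struct {
    typ  *_type
    data unsafe.Pointer
}

// iface mirrors a non-empty interface value.
type iface struct {
    tab  *itab
    data unsafe.Pointer
}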
On Wed, Jul 16, 2014 at 8:07 AM, <dvyukov@google.com> wrote:
> Here we go:
> https://docs.google.com/document/d/1v4Oqa0WwHunqlb8C3ObL_uNQw3DfSY-ztoA-4wWbKcg/pub

Thanks. My two concerns are:

1) Not losing the heap dumper functionality. If khr is happy, then I'm happy.

2) The generated object file must be at most proportional to the size of the input source file, not to the numbers in the source file. If I write a program that contains but does not execute:

p := new([1<<40]*byte)
q := new(struct{x float64; y [1<<40]*byte; z []string})

then the object file needs to be roughly the same size as if those two lines were not present. The 50000 bytes that are generated for [100000]unsafe.Pointer are an instance of this problem.

Russ
On 2014/07/18 16:35:41, rsc wrote:
> 1) Not losing the heap dumper functionality. If khr is happy, then I'm happy.
> 2) The generated object file must be at most proportional to the size of
> the input source file, not to the numbers in the source file.

Keith, can you confirm that heapdump is under your control after this change? I will handle the second concern.
Yes, I'll handle the heap dump.
Great, thanks!
On 2014/07/18 16:35:41, rsc wrote:
> 2) The generated object file must be at most proportional to the size of
> the input source file, not to the numbers in the source file. [...] The
> 50000 bytes that are generated for [100000]unsafe.Pointer are an instance
> of this problem.

Russ, I've updated the design doc to handle this aspect. See the first "Type info" section; it describes type programs.
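Since these type programs are central to the rest of the thread, here is a minimal sketch of the idea: instead of emitting an expanded pointer mask for a huge type, the compiler emits a small program with repeat instructions and the runtime unrolls it on demand. The opcode names mirror the ones in the CL's runtime (the unrollgcprog1 function that appears in the profiles later in this thread interprets them); the byte layout and helper below are simplifications for illustration, not the CL's actual encoding.

package gcprog

const (
    insData     = 1 // followed by a count n and n one-byte type codes
    insArray    = 2 // followed by a one-byte repeat count and a sub-program
    insArrayEnd = 3 // terminates the sub-program of the innermost insArray
    insEnd      = 4 // terminates the whole program
)

// unroll expands a program into one type code per described word.
func unroll(prog []byte) []byte {
    var out []byte
    pos := 0
    var exec func()
    exec = func() {
        for {
            switch prog[pos] {
            case insData:
                n := int(prog[pos+1])
                out = append(out, prog[pos+2:pos+2+n]...)
                pos += 2 + n
            case insArray:
                count := int(prog[pos+1])
                pos += 2
                start := len(out)
                exec() // unroll one element (runs until insArrayEnd)
                elem := append([]byte(nil), out[start:]...)
                for i := 1; i < count; i++ {
                    out = append(out, elem...)
                }
            case insArrayEnd, insEnd:
                pos++
                return
            default:
                panic("gcprog: bad instruction")
            }
        }
    }
    exec()
    return out
}

For example, unroll([]byte{insArray, 100, insData, 1, <BitsPointer>, insArrayEnd, insEnd}) expands a 7-byte program into 100 pointer entries. In the real encoding the repeat count is wider than one byte, so the program for [100000]unsafe.Pointer stays a handful of bytes while its unrolled mask does not.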
I've also separated a few pieces from this CL to reduce the number of touched files. I think they are positive regardless of this bigger change:

https://codereview.appspot.com/116060043/
https://codereview.appspot.com/117000043/
https://codereview.appspot.com/116940043/
https://codereview.appspot.com/116950043/
I am still concerned about the runtime footprint. If I allocate a big fixed-size array and then free it, I don't want to pay the cost of the cached mask in the bss forever. I asked you about this over chat but apparently I did not understand the answer correctly. The doc says "If the mask fits into these 2 words, then compiler generates mask and stores it directly into Type.gc (no indirection). Otherwise, gc[0] points to a symbol in BSS large enough to store the mask + 1 byte; and gc[1] points to the program in RODATA." I do not believe the BSS symbols should be used. It should decode the mask directly into the GC bitmaps. Russ
There is a copy of the mask in the GC bitmap as well. Even when the object is freed, the bitmap region is still in memory.

I don't want to unwind it every time. On 32-bit it's used for objects >64 bytes (256 for 64-bit), so these allocations can be quite frequent. Now there is a very clear separation between the in-binary format and the in-memory format, which is, I think, very good because each format concentrates on its own thing, and e.g. we don't need to optimize programs for runtime performance (as I have them now, they are quite suboptimal from this point of view).

I think I can do a slightly different thing which will address your concern:
1. When a huge span is passed to SysUnused, also call SysUnused for the corresponding region of the GC bitmap.
2. When unrolling a huge program (>>PageSize), remember the Type in a global list. During GC, walk that list, mark the program as not yet unwound, and mark the mask as SysUnused. This works even if the mask is in BSS.

Note that I need at least a word in BSS either way, because I need a fast mapping from Type to the mask during malloc.

Sounds good?
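For concreteness, a sketch of point 1 above. Everything here, including the function name and the address arithmetic, is an assumption for illustration; it leans only on the fact that the runtime's SysUnused on Linux is implemented with madvise(MADV_DONTNEED), and it elides the page alignment madvise requires.

//go:build linux

package main

import (
    "syscall"
    "unsafe"
)

// Four heap words are described by one bitmap byte with 2-bit entries.
const wordsPerBitmapByte = 4

// sysUnusedBitmap releases the heap-bitmap bytes describing the span
// [spanStart, spanStart+spanBytes), mirroring what SysUnused does for the
// span's own pages.
func sysUnusedBitmap(bitmap []byte, heapBase, spanStart, spanBytes uintptr) error {
    wordSize := unsafe.Sizeof(uintptr(0))
    off := (spanStart - heapBase) / wordSize / wordsPerBitmapByte
    n := spanBytes / wordSize / wordsPerBitmapByte
    return syscall.Madvise(bitmap[off:off+n], syscall.MADV_DONTNEED)
}

func main() {}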
On Tue, Jul 22, 2014 at 9:57 AM, Dmitry Vyukov <dvyukov@google.com> wrote:
> There is a copy of the mask in the GC bitmap as well. Even when the object
> is freed, the bitmap region is still in memory.

I don't understand. When the object memory is reused for some other type, that GC bitmap region is reused. The unrolled per-type copy in the BSS is not.

> I don't want to unwind it every time. On 32-bit it's used for objects
> >64 bytes (256 for 64-bit), so these allocations can be quite frequent.

The program looks absolutely trivial to execute. It should cost no more than a memmove would. In fact it may be faster, because it will use less cache.

> Note that I need at least a word in BSS either way, because I need a
> fast mapping from Type to the mask during malloc.

You don't need any words in BSS if you replay the program every time. I still believe that can be made just as fast as memmove. You're already zeroing the memory at some point, so filling in the bitmap can't be prohibitive. I don't believe that caching+memmove will be appreciably faster than executing the program, which you've made nice and simple and should be very quick to execute. Do you have numbers?

Russ
On Tue, Jul 22, 2014 at 6:10 PM, Russ Cox <rsc@golang.org> wrote:
> I don't understand. When the object memory is reused for some other type,
> that GC bitmap region is reused. The unrolled per-type copy in the BSS is not.

I am thinking about a situation where you allocate 10GB during startup and then use 100MB during normal work. The GC bitmap will still consume another 512MB.

> You don't need any words in BSS if you replay the program every time.

You are right, I just always assumed that it must be stored in unwound form in the runtime.

> I don't believe that caching+memmove will be appreciably faster than
> executing the program, which you've made nice and simple and should be
> very quick to execute. Do you have numbers?

I will measure the difference.

A good enough solution could be to execute programs directly into the GC bitmap when the unwound program would consume > PageSize. However, it still requires special handling of arrays in the compiler, linker, and runtime (so that the unwound program for [1<<20][]byte describes only 1 element -- the runtime knows how to multiply type info for slices anyway).
On Tue, Jul 22, 2014 at 10:37 AM, Dmitry Vyukov <dvyukov@google.com> wrote:
> I am thinking about a situation where you allocate 10GB during startup
> and then use 100MB during normal work. The GC bitmap will still consume
> another 512MB.

Ah, yes. I agree that using SysUnused in this case is worthwhile.

Russ
On the following benchmark:

type LargeStruct struct {
    x [40]*byte
}

func BenchmarkMallocLargeStruct(b *testing.B) {
    var x uintptr
    for i := 0; i < b.N; i++ {
        p := new(LargeStruct)
        x ^= uintptr(unsafe.Pointer(p))
    }
    mallocSink = x // package-level sink; keeps the allocation from being optimized away
}

direct copy is 172 ns/op and program execution is 581 ns/op (about 3.4x).

Profiles are:

39.00%  runtime.test  runtime.test  [.] runtime.markallocated
12.00%  runtime.test  runtime.test  [.] runtime.mallocgc
 9.53%  runtime.test  runtime.test  [.] runtime.MSpan_Sweep
 7.57%  runtime.test  runtime.test  [.] runtime.memclr
 5.52%  runtime.test  runtime.test  [.] runtime.MSpanList_IsEmpty
 3.13%  runtime.test  runtime.test  [.] scanblock
 3.11%  runtime.test  runtime.test  [.] MHeap_AllocSpanLocked

vs:

79.84%  runtime.test  runtime.test  [.] unrollgcprog1
 4.14%  runtime.test  runtime.test  [.] runtime.mallocgc
 2.60%  runtime.test  runtime.test  [.] runtime.MSpan_Sweep
 2.29%  runtime.test  runtime.test  [.] runtime.memclr
 1.87%  runtime.test  runtime.test  [.] runtime.MSpanList_IsEmpty
 1.75%  runtime.test  runtime.test  [.] runtime.markallocated

The code is uploaded (though somewhat dirty for now).
Are you sure it is important to address this right now? For this to happen, the user must (1) have a ridiculously large type (not just a large slice of small objects), (2) allocate it (which is not the case for e.g. the reflect Itab fake), and (3) discard the object and never allocate it again. There is a significant number of issues that affect real programs, e.g. dead G's are never discarded (we actually have user reports about that), defers are slow, and closures cause unnecessary memory allocations.
I would rather have this code and not need it than need it and not have it. I think it is fine to play the program every time and then worry about whether it is too slow. I am sure it is still much faster than old allocations. Russ
On 2014/07/22 17:49:31, rsc wrote:
> I would rather have this code and not need it than need it and not have it.

The downside is not just the additional code. This code is slow, so there is a chance we will still need to look at it and do something about its slowness, and if we start optimizing it, that will be the wrong path. From this point of view, it's better not to have that code. I think the chances of hitting the slowness of that code are higher than the chances of hitting a weird type allocated once. And note that we are not talking about the program crashing due to OOM; we are talking only about residual excessive memory consumption, which is at most 1/32 of the program's maximum past memory consumption.

> I think it is fine to play the program every time and then worry about
> whether it is too slow. I am sure it is still much faster than old
> allocations.

No, it's considerably slower. The current code just stores one word with the type info.
Okay, well then it needs to be made faster. We can't leave this code out. People use Go for big things. Russ
On Tue, Jul 22, 2014 at 10:32 PM, Russ Cox <rsc@golang.org> wrote:
> Okay, well then it needs to be made faster. We can't leave this code out.
> People use Go for big things.

Big slices and maps are not a problem. Also, if you keep the big thing in memory (or even several of them), it's OK too, because the mask overhead is negligible. It's only very weird cases that are not super optimal, and we can't make all weird cases optimal. Have you seen at least one such case?
PTAL. I've added GC programs (still stored in BSS for every allocated type).
On Tue, Jul 22, 2014 at 2:35 PM, Dmitry Vyukov <dvyukov@google.com> wrote:
> It's only very weird cases that are not super optimal, and we can't make
> all weird cases optimal. Have you seen at least one such case?

It doesn't matter whether I've seen the case. This is a compiler for a general language. It needs to generate binaries and code with the right asymptotic properties. That means there are no corner-case blow-ups like this. It is okay to focus on the raw performance too, but not at the cost of the asymptotics.

Russ
On 2014/07/22 20:18:43, rsc wrote:
> It needs to generate binaries and code with the right asymptotic properties.

Done, PTAL.

What asymptotic properties do you mean here? A function of what?
On Wed, Jul 23, 2014 at 4:59 AM, <dvyukov@google.com> wrote:
> What asymptotic properties do you mean here? A function of what?

It's not even a function in this case. The memory reclaimed by garbage collection should eventually be 100% of the unused memory. Caching unrolled programs in the BSS violates that.

Russ
Can you please update the doc? It still talks about caching in the BSS.
On 2014/07/23 12:03:17, rsc wrote:
> Can you please update the doc? It still talks about caching in the BSS.

Done. Note that BSS is still used for normal types; only huge types execute their program directly into the GC bitmap.
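To make that split concrete, here is a sketch of the allocation-time decision, in the same illustrative gcprog package as the earlier unroll sketch. The names markallocated and MaxGCMask echo identifiers mentioned elsewhere in this review, but the structure, the gcType scaffolding, and the threshold value are assumptions, not the CL's code.

package gcprog

// maxGCMask is the "huge type" threshold mentioned in the review; placeholder value.
const maxGCMask = 8 << 10

type gcType struct {
    maskSize int    // size of the unrolled mask in bytes
    mask     []byte // cached unrolled mask (the "BSS" slot); nil until first use
    prog     []byte // compact program emitted by the compiler
}

// markAllocated writes the type's pointer info for one new object into bitmap.
func markAllocated(bitmap []byte, t *gcType) {
    if t.maskSize <= maxGCMask {
        // Normal type: unroll once into the cached mask, then copy on every allocation.
        if t.mask == nil {
            t.mask = unroll(t.prog)
        }
        copy(bitmap, t.mask)
        return
    }
    // Huge type: no cached copy is kept; the program is executed for this
    // allocation (the real code writes the result directly into the heap bitmap).
    copy(bitmap, unroll(t.prog))
}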
Did not read runtime yet. The compiler changes look mostly reasonable.

https://codereview.appspot.com/106260045/diff/880001/src/cmd/gc/reflect.c
File src/cmd/gc/reflect.c (right):

https://codereview.appspot.com/106260045/diff/880001/src/cmd/gc/reflect.c#new...
src/cmd/gc/reflect.c:807: ot = duintptr(s, ot, ((uvlong*)gcmask)[0]);
gcmask is [16]uint8 and this code is interpreting it as [2]uintptr. Effectively the code is assuming that the byte order of the compiler matches the byte order of the target system. That is not a valid assumption. Please correct this. If you are writing bytes, use duint8. If you are writing uintptrs, make gengcmask fill in integers.

https://codereview.appspot.com/106260045/diff/880001/src/cmd/gc/reflect.c#new...
src/cmd/gc/reflect.c:1275: size *= 2; // repeated twice
// repeated
(repeated twice would be 3 times)

https://codereview.appspot.com/106260045/diff/880001/src/cmd/gc/reflect.c#new...
src/cmd/gc/reflect.c:1304: // Unford the mask for the GC bitmap format:
Unfold

https://codereview.appspot.com/106260045/diff/880001/src/cmd/gc/reflect.c#new...
src/cmd/gc/reflect.c:1305: // 4 bits per word, 2 high bits encode pointer info.
For the record, once stack maps are down to 1 bit per word instead of 2, I am going to object to the waste of pre-expanding this 4x. I am willing to stomach 2x for now.

https://codereview.appspot.com/106260045/diff/880001/src/cmd/gc/reflect.c#new...
src/cmd/gc/reflect.c:1310: // If number of words is odd, repeat the mask twice.
s/ twice//

https://codereview.appspot.com/106260045/diff/880001/src/cmd/gc/reflect.c#new...
src/cmd/gc/reflect.c:1313: for(i=0; i<nptr; i+=1) {
i++

https://codereview.appspot.com/106260045/diff/880001/src/cmd/gc/reflect.c#new...
src/cmd/gc/reflect.c:1314: bits = ((uint8*)vec->b)[i*BitsPerPointer/8];
Use the accessor functions, not direct bitmap manipulation. They will translate better. bvset/bvreset/bvget.

bits = bvget(vec, i*2)<<1 | bvget(vec, i*2+1)

https://codereview.appspot.com/106260045/diff/880001/src/cmd/gc/reflect.c#new...
src/cmd/gc/reflect.c:1443: if(size <= MaxGCMask) {
Please make this

if(0) { // if(size <= MaxGCMask) to enable BSS caching

I do not want to use ANY space in the bss until we understand how fast we can make the non-cached version. You have MaxGCMask set to 8 kB which is still much more profligate than I would like. Once this change is in with no BSS caching, we can do a follow-up round trying to improve performance and understanding what caching is worthwhile. But the initial checkin should focus on not using unnecessary memory. Memory is not always as cheap as you think.

https://codereview.appspot.com/106260045/diff/880001/src/cmd/gc/reflect.c#new...
src/cmd/gc/reflect.c:1520: } else if(t1->width<widthptr || !haspointers(t1)) {
Simplify:

} else if(!haspointers(t1)) {
    n = t->width;
    n -= -*xoffset&(widthptr-1); // skip to next ptr boundary
    proggenarray(g, (n+widthptr-1)/widthptr);
    proggendata(g, BitsScalar);
    proggenarrayend(g);
    *xoffset += t->width;
}

Note the variable is named 'n'. o is for offsets, and this isn't one.

https://codereview.appspot.com/106260045/diff/880001/src/pkg/runtime/mgc0.h
File src/pkg/runtime/mgc0.h (right):

https://codereview.appspot.com/106260045/diff/880001/src/pkg/runtime/mgc0.h#n...
src/pkg/runtime/mgc0.h:22: WordsPerByte = 8/BitsPerPointer,
Please call this PointersPerByte. Words is too generic. The name is confusing no matter what, but PointersPerByte sounds more like a reminder of BitsPerPointer.
PTAL https://codereview.appspot.com/106260045/diff/880001/src/cmd/gc/reflect.c File src/cmd/gc/reflect.c (right): https://codereview.appspot.com/106260045/diff/880001/src/cmd/gc/reflect.c#new... src/cmd/gc/reflect.c:807: ot = duintptr(s, ot, ((uvlong*)gcmask)[0]); On 2014/07/23 13:34:02, rsc wrote: > gcmask is [16]uint8 and this code is interpreting it as [2]uintptr. Effectively > the code is assuming that the byte order of the compiler matches the byte order > of the target system. That is not a valid assumption. Please correct this. > If you are writing bytes, use duint8. > If you are writing uintptrs, make gengcmask fill in integers. > Done. https://codereview.appspot.com/106260045/diff/880001/src/cmd/gc/reflect.c#new... src/cmd/gc/reflect.c:1275: size *= 2; // repeated twice On 2014/07/23 13:34:02, rsc wrote: > // repeated > (repeated twice would be 3 times) Done. https://codereview.appspot.com/106260045/diff/880001/src/cmd/gc/reflect.c#new... src/cmd/gc/reflect.c:1304: // Unford the mask for the GC bitmap format: On 2014/07/23 13:34:02, rsc wrote: > Unfold Done. https://codereview.appspot.com/106260045/diff/880001/src/cmd/gc/reflect.c#new... src/cmd/gc/reflect.c:1305: // 4 bits per word, 2 high bits encode pointer info. On 2014/07/23 13:34:02, rsc wrote: > For the record, once stack maps are down to 1 bit per word instead of 2, I am > going to object to the waste of pre-expanding this 4x. I am willing to stomach > 2x for now. ack fwiw now it's 24x https://codereview.appspot.com/106260045/diff/880001/src/cmd/gc/reflect.c#new... src/cmd/gc/reflect.c:1310: // If number of words is odd, repeat the mask twice. On 2014/07/23 13:34:02, rsc wrote: > s/ twice// Done. https://codereview.appspot.com/106260045/diff/880001/src/cmd/gc/reflect.c#new... src/cmd/gc/reflect.c:1313: for(i=0; i<nptr; i+=1) { On 2014/07/23 13:34:02, rsc wrote: > i++ Done. https://codereview.appspot.com/106260045/diff/880001/src/cmd/gc/reflect.c#new... src/cmd/gc/reflect.c:1314: bits = ((uint8*)vec->b)[i*BitsPerPointer/8]; On 2014/07/23 13:34:02, rsc wrote: > Use the accessor functions, not direct bitmap manipulation. They will translate > better. > bvset/bvreset/bvget. > > bits = bvget(vec, i*2)<<1 | bvget(vec, i*2+1) Done. https://codereview.appspot.com/106260045/diff/880001/src/cmd/gc/reflect.c#new... src/cmd/gc/reflect.c:1443: if(size <= MaxGCMask) { On 2014/07/23 13:34:02, rsc wrote: > Please make this > > if(0) { // if(size <= MaxGCMask) to enable BSS caching > > I do not want to use ANY space in the bss until we understand how fast we can > make the non-cached version. You have MaxGCMask set to 8 kB which is still much > more profligate than I would like. > > Once this change is in with no BSS caching, we can do a follow-up round trying > to improve performance and understanding what caching is worthwhile. But the > initial checkin should focus on not using unnecessary memory. > > Memory is not always as cheap as you think. Russ, you are going to extremes. During http serving godoc consumes 25MB of memory. Total size of unwound masks is 1365 bytes. One thousand three hundred sixty five B-Y-T-E-S. There are just 39 types with unwound masks. Even if you have 10 weird types with mask size 4K, that's just 40K of memory. And if you actually allocate them, then the objects occupy some megabytes of memory. I can show you 10 places where we waste 10 times more memory. The masks are negligible and unimportant. https://codereview.appspot.com/106260045/diff/880001/src/cmd/gc/reflect.c#new... 
src/cmd/gc/reflect.c:1520: } else if(t1->width<widthptr || !haspointers(t1)) { On 2014/07/23 13:34:02, rsc wrote: > Simplify: > > } else if(!haspointers(t1)) { > n = t->width; > n -= -*xoffset&(widthptr-1); // skip to next ptr boundary > proggenarray(g, (n+widthptr-1)/widthptr); > proggendata(g, BitsScalar); > proggenarrayend(g); > *xoffset += t->width; > } > > Note the variable is named 'n'. o is for offsets, and this isn't one. Done. https://codereview.appspot.com/106260045/diff/880001/src/pkg/runtime/mgc0.h File src/pkg/runtime/mgc0.h (right): https://codereview.appspot.com/106260045/diff/880001/src/pkg/runtime/mgc0.h#n... src/pkg/runtime/mgc0.h:22: WordsPerByte = 8/BitsPerPointer, On 2014/07/23 13:34:02, rsc wrote: > Please call this PointersPerByte. Words is too generic. The name is confusing no > matter what, but PointersPerByte sounds more like a reminder of BitsPerPointer. Done.
https://codereview.appspot.com/106260045/diff/880001/src/cmd/gc/reflect.c File src/cmd/gc/reflect.c (right): https://codereview.appspot.com/106260045/diff/880001/src/cmd/gc/reflect.c#new... src/cmd/gc/reflect.c:1443: if(size <= MaxGCMask) { No. Waste is waste. And this particular waste is premature optimization. I am okay with leaving the code in, disabled, so that we can revisit the optimization in a second round. But this initial version does not need it. I don't have time to continue this debate in this CL.
https://codereview.appspot.com/106260045/diff/880001/src/cmd/gc/reflect.c File src/cmd/gc/reflect.c (right): https://codereview.appspot.com/106260045/diff/880001/src/cmd/gc/reflect.c#new... src/cmd/gc/reflect.c:1305: // 4 bits per word, 2 high bits encode pointer info. On 2014/07/23 15:35:59, dvyukov wrote: > On 2014/07/23 13:34:02, rsc wrote: > > For the record, once stack maps are down to 1 bit per word instead of 2, I am > > going to object to the waste of pre-expanding this 4x. I am willing to stomach > > 2x for now. > > ack > fwiw now it's 24x How do you get 24x?
I got about halfway through mgc0. I hope Keith will look at the runtime changes too. https://codereview.appspot.com/106260045/diff/920001/src/cmd/gc/reflect.c File src/cmd/gc/reflect.c (right): https://codereview.appspot.com/106260045/diff/920001/src/cmd/gc/reflect.c#new... src/cmd/gc/reflect.c:1319: //bits = ((uint8*)vec->b)[i*BitsPerPointer/8]; Delete debugging dregs. https://codereview.appspot.com/106260045/diff/920001/src/cmd/gc/reflect.c#new... src/cmd/gc/reflect.c:1440: please add g.type = t; https://codereview.appspot.com/106260045/diff/920001/src/cmd/gc/reflect.c#new... src/cmd/gc/reflect.c:1524: } if(t->bound*t1->width < 32*widthptr) { } else if(t->bound <= 1 || t->bound*t1->width < 32*widthptr) { Adding the t->bound <= 1 will avoid generating array instructions for array size 0/1. Making sure that the array instructions only happen for 2 or more elements bounds the possible nesting that the runtime recursion will have to deal with to at most the number of addressable bits. The runtime certainly has room for 64 frames on the stack. But without the condition someone could overflow the runtime stack by creating a [1][1][1][1][1][1]...*byte. https://codereview.appspot.com/106260045/diff/920001/src/pkg/runtime/chan.h File src/pkg/runtime/chan.h (right): https://codereview.appspot.com/106260045/diff/920001/src/pkg/runtime/chan.h#n... src/pkg/runtime/chan.h:33: byte* buf; Can this change to channel representation be separated out? This CL should be about GC type information. The doc in the description says nothing about channels changing. https://codereview.appspot.com/106260045/diff/920001/src/pkg/runtime/heapdump.c File src/pkg/runtime/heapdump.c (right): https://codereview.appspot.com/106260045/diff/920001/src/pkg/runtime/heapdump... src/pkg/runtime/heapdump.c:788: // appear in such an entry. The following routine accomplish that. accomplishes https://codereview.appspot.com/106260045/diff/920001/src/pkg/runtime/malloc.goc File src/pkg/runtime/malloc.goc (right): https://codereview.appspot.com/106260045/diff/920001/src/pkg/runtime/malloc.g... src/pkg/runtime/malloc.goc:301: runtime·unmarkspan(v, s->npages<<PageShift); Why is this changing? https://codereview.appspot.com/106260045/diff/920001/src/pkg/runtime/mgc0.c File src/pkg/runtime/mgc0.c (left): https://codereview.appspot.com/106260045/diff/920001/src/pkg/runtime/mgc0.c#o... src/pkg/runtime/mgc0.c:1762: if(cl == 0) { seems like a gratuitous change https://codereview.appspot.com/106260045/diff/920001/src/pkg/runtime/mgc0.c File src/pkg/runtime/mgc0.c (right): https://codereview.appspot.com/106260045/diff/920001/src/pkg/runtime/mgc0.c#n... src/pkg/runtime/mgc0.c:1355: // This is required while we explicitly free objects and have imprecise GC. I thought that once the select change was in you could take out free. What's the status on that? Please update golang.org/s/goruntime. https://codereview.appspot.com/106260045/diff/920001/src/pkg/runtime/mgc0.h File src/pkg/runtime/mgc0.h (right): https://codereview.appspot.com/106260045/diff/920001/src/pkg/runtime/mgc0.h#n... src/pkg/runtime/mgc0.h:50: #define bitMiddle ((uintptr)0) // middle of an object These no longer need to be macros. Please make this an enum. enum { bitMiddle = 0, // middle of an object bitBoundary = 1, // boundary on a ... ... https://codereview.appspot.com/106260045/diff/920001/src/pkg/runtime/mgc0.h#n... src/pkg/runtime/mgc0.h:56: #define bitPtrMask ((uintptr)12) Explain 12 in the code.
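As background on the 12 while that comment is pending: the discussion implies each heap word gets a 4-bit bitmap entry with the boundary/allocation state in its low 2 bits and the per-word pointer info in bits 2 and 3, so a mask selecting only the pointer-info bits is 3<<2 = 12 (binary 1100). A hedged guess at the requested definition (the 3 here is assumed to be the 2-bit field mask; the exact wording is not from the patch):

    // Pointer-type info occupies bits 2 and 3 of each 4-bit bitmap entry,
    // so the pointer mask is the 2-bit field mask shifted into that position.
    #define bitPtrMask ((uintptr)(3<<2)) // == 12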
https://codereview.appspot.com/106260045/diff/920001/src/cmd/gc/reflect.c File src/cmd/gc/reflect.c (right): https://codereview.appspot.com/106260045/diff/920001/src/cmd/gc/reflect.c#new... src/cmd/gc/reflect.c:1440: On 2014/07/23 16:23:19, rsc wrote: > please add > > g.type = t; Ignore. This was a dreg from a suggestion to detect deeply nested arrays. Checking for >=2 elements makes the detection unnecessary.
PTAL https://codereview.appspot.com/106260045/diff/880001/src/cmd/gc/reflect.c File src/cmd/gc/reflect.c (right): https://codereview.appspot.com/106260045/diff/880001/src/cmd/gc/reflect.c#new... src/cmd/gc/reflect.c:1305: // 4 bits per word, 2 high bits encode pointer info. On 2014/07/23 15:55:59, rsc wrote: > On 2014/07/23 15:35:59, dvyukov wrote: > > On 2014/07/23 13:34:02, rsc wrote: > > > For the record, once stack maps are down to 1 bit per word instead of 2, I > am > > > going to object to the waste of pre-expanding this 4x. I am willing to > stomach > > > 2x for now. > > > > ack > > fwiw now it's 24x > > How do you get 24x? It uses up to 3 words per word (GC_PTR OFF TYPE), so in your terminology it's 24x. https://codereview.appspot.com/106260045/diff/880001/src/cmd/gc/reflect.c#new... src/cmd/gc/reflect.c:1443: if(size <= MaxGCMask) { On 2014/07/23 15:55:13, rsc wrote: > No. Waste is waste. And this particular waste is premature optimization. > I am okay with leaving the code in, disabled, so that we can revisit the > optimization in a second round. > But this initial version does not need it. > I don't have time to continue this debate in this CL. This is insanity. But done. I've set MaxGCMask to 0. https://codereview.appspot.com/106260045/diff/920001/src/cmd/gc/reflect.c File src/cmd/gc/reflect.c (right): https://codereview.appspot.com/106260045/diff/920001/src/cmd/gc/reflect.c#new... src/cmd/gc/reflect.c:1319: //bits = ((uint8*)vec->b)[i*BitsPerPointer/8]; On 2014/07/23 16:23:20, rsc wrote: > Delete debugging dregs. Done. https://codereview.appspot.com/106260045/diff/920001/src/cmd/gc/reflect.c#new... src/cmd/gc/reflect.c:1440: On 2014/07/23 16:24:17, rsc wrote: > On 2014/07/23 16:23:19, rsc wrote: > > please add > > > > g.type = t; > > Ignore. This was a dreg from a suggestion to detect deeply nested arrays. > Checking for >=2 elements makes the detection unnecessary. Acknowledged. https://codereview.appspot.com/106260045/diff/920001/src/cmd/gc/reflect.c#new... src/cmd/gc/reflect.c:1524: } if(t->bound*t1->width < 32*widthptr) { On 2014/07/23 16:23:20, rsc wrote: > } else if(t->bound <= 1 || t->bound*t1->width < 32*widthptr) { > > Adding the t->bound <= 1 will avoid generating array instructions for array size > 0/1. > > Making sure that the array instructions only happen for 2 or more elements > bounds the possible nesting that the runtime recursion will have to deal with > to at most the number of addressable bits. The runtime certainly has room for > 64 frames on the stack. > > But without the condition someone could overflow the runtime stack by creating > a [1][1][1][1][1][1]...*byte. > Done. https://codereview.appspot.com/106260045/diff/920001/src/pkg/runtime/chan.h File src/pkg/runtime/chan.h (right): https://codereview.appspot.com/106260045/diff/920001/src/pkg/runtime/chan.h#n... src/pkg/runtime/chan.h:33: byte* buf; On 2014/07/23 16:23:20, rsc wrote: > Can this change to channel representation be separated out? > This CL should be about GC type information. > The doc in the description says nothing about channels changing. I am deleting GC_CHAN instruction, so w/o this change channel will be w/o type info (scanned conservatively). So it belongs to this change. I can temporary leave chans w/o type info; and then send a follow up CL if you wish. https://codereview.appspot.com/106260045/diff/920001/src/pkg/runtime/heapdump.c File src/pkg/runtime/heapdump.c (right): https://codereview.appspot.com/106260045/diff/920001/src/pkg/runtime/heapdump... 
src/pkg/runtime/heapdump.c:788: // appear in such an entry. The following routine accomplish that. On 2014/07/23 16:23:20, rsc wrote: > accomplishes Done. https://codereview.appspot.com/106260045/diff/920001/src/pkg/runtime/malloc.goc File src/pkg/runtime/malloc.goc (right): https://codereview.appspot.com/106260045/diff/920001/src/pkg/runtime/malloc.g... src/pkg/runtime/malloc.goc:301: runtime·unmarkspan(v, s->npages<<PageShift); On 2014/07/23 16:23:20, rsc wrote: > Why is this changing? Previously we used only the first 4 bits of the GC bitmap for an object, so clearing 1 page was enough. Now we use the whole object region in the GC bitmap for type info, so we need to clear it when the span is freed. https://codereview.appspot.com/106260045/diff/920001/src/pkg/runtime/mgc0.c File src/pkg/runtime/mgc0.c (left): https://codereview.appspot.com/106260045/diff/920001/src/pkg/runtime/mgc0.c#o... src/pkg/runtime/mgc0.c:1762: if(cl == 0) { On 2014/07/23 16:23:20, rsc wrote: > seems like a gratuitous change Reverted. https://codereview.appspot.com/106260045/diff/920001/src/pkg/runtime/mgc0.c File src/pkg/runtime/mgc0.c (right): https://codereview.appspot.com/106260045/diff/920001/src/pkg/runtime/mgc0.c#n... src/pkg/runtime/mgc0.c:1355: // This is required while we explicitly free objects and have imprecise GC. On 2014/07/23 16:23:20, rsc wrote: > I thought that once the select change was in you could take out free. What's the > status on that? Please update golang.org/s/goruntime. I still want to do this. But I am waiting for this CL to land, because the overlap is very large. When I get rid of free, I will remove this as well. https://codereview.appspot.com/106260045/diff/920001/src/pkg/runtime/mgc0.h File src/pkg/runtime/mgc0.h (right): https://codereview.appspot.com/106260045/diff/920001/src/pkg/runtime/mgc0.h#n... src/pkg/runtime/mgc0.h:50: #define bitMiddle ((uintptr)0) // middle of an object It does not work. We shift them beyond 32 bits. The compiler seems to give them uint32 type, so they become zeros after the shift. I've updated the comment as to why they are defines. https://codereview.appspot.com/106260045/diff/920001/src/pkg/runtime/mgc0.h#n... src/pkg/runtime/mgc0.h:56: #define bitPtrMask ((uintptr)12) On 2014/07/23 16:23:20, rsc wrote: > Explain 12 in the code. Done.
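To illustrate the enum problem just described (a sketch under assumed names, not code from the CL): a C enumerator gets a 32-bit integer type, so shifting it past bit 31 while building a 64-bit bitmap word silently loses the value, which is why the bit* constants stay #defines with an explicit uintptr cast:

    enum { bitBoundaryEnum = 1 };         // compiler treats this as a 32-bit integer
    #define bitBoundaryDef ((uintptr)1)   // explicitly pointer-sized

    uintptr bad  = bitBoundaryEnum << 48; // shift happens in 32 bits: zero (or undefined)
    uintptr good = bitBoundaryDef << 48;  // shift happens on uintptr: bit 48 set as intended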
On 2014/07/23 16:23:20, rsc wrote: > I got about halfway through mgc0. I hope Keith will look at the runtime changes > too. Keith, please take a look too.
I did a first pass. I'm really worried that this is too much to go in all at once. I have a hard time keeping it all in my brain. Can we split it up somehow? It would be more work, but probably safer if we can split it into individually reviewable and testable pieces. Some ideas for separable CLs: 1) Split channels into header and array 2) Use just GC programs and then introduce bit vectors in a separate CL. Or the other way around. 3) Save some optimizations for separate CLs, like repeating the bits twice for odd-size objects. https://codereview.appspot.com/106260045/diff/980001/src/cmd/gc/reflect.c File src/cmd/gc/reflect.c (right): https://codereview.appspot.com/106260045/diff/980001/src/cmd/gc/reflect.c#new... src/cmd/gc/reflect.c:1279: size = size*PointersPerByte/8; // 4 bits per word The units of this conversion seem strange to me. I want size * BitsPerPointer / 8, as size is already # of pointers. Or size / PointersPerByte, maybe. https://codereview.appspot.com/106260045/diff/980001/src/cmd/gc/reflect.c#new... src/cmd/gc/reflect.c:1281: // We could use a more elaborate condition, but this seems to work good in practice. s/good/well/ https://codereview.appspot.com/106260045/diff/980001/src/cmd/gc/reflect.c#new... src/cmd/gc/reflect.c:1426: Is the format of these GC programs described anywhere in a comment? I didn't see it. https://codereview.appspot.com/106260045/diff/980001/src/cmd/gc/reflect.c#new... src/cmd/gc/reflect.c:1440: size++; // unroll flag in the beginning Where is the unroll flag set? Update: I found it, but this really needs a comment. https://codereview.appspot.com/106260045/diff/980001/src/pkg/reflect/type.go File src/pkg/reflect/type.go (left): https://codereview.appspot.com/106260045/diff/980001/src/pkg/reflect/type.go#... src/pkg/reflect/type.go:1099: if t.kind&kindNoPointers != 0 { Why is there no GC program? Or is the non-existence of a program imply conservative scanning? https://codereview.appspot.com/106260045/diff/980001/src/pkg/reflect/type.go#... src/pkg/reflect/type.go:1485: ch.gc = unsafe.Pointer(&chanGC{ Channel GC program? https://codereview.appspot.com/106260045/diff/980001/src/pkg/reflect/type.go#... src/pkg/reflect/type.go:1540: mt.hmap = hMapOf(mt.bucket) where did this go? Is there just one of these for all maps? https://codereview.appspot.com/106260045/diff/980001/src/pkg/reflect/type.go#... src/pkg/reflect/type.go:1544: mt.gc = unsafe.Pointer(&ptrGC{ gc for this? https://codereview.appspot.com/106260045/diff/980001/src/pkg/runtime/mgc0.c File src/pkg/runtime/mgc0.c (right): https://codereview.appspot.com/106260045/diff/980001/src/pkg/runtime/mgc0.c#n... src/pkg/runtime/mgc0.c:306: // Check is we have reached end of heap. s/heap/span/ https://codereview.appspot.com/106260045/diff/980001/src/pkg/runtime/mgc0.c#n... src/pkg/runtime/mgc0.c:324: bits = (bits>>2)&BitsMask; BitsMask vs. bitMask? Very confusing. https://codereview.appspot.com/106260045/diff/980001/src/pkg/runtime/mgc0.c#n... src/pkg/runtime/mgc0.c:953: // Garbage! What? https://codereview.appspot.com/106260045/diff/980001/src/pkg/runtime/mgc0.c#n... src/pkg/runtime/mgc0.c:1362: flushallmcaches(); If we have to do this, why only in Debug mode? https://codereview.appspot.com/106260045/diff/980001/src/pkg/runtime/mgc0.c#n... src/pkg/runtime/mgc0.c:1760: prog1 = unrollgcprog1(mask, prog, &pos, inplace, sparse); This is potentially too recursive to run on the M stack. https://codereview.appspot.com/106260045/diff/980001/src/pkg/runtime/mgc0.c#n... 
src/pkg/runtime/mgc0.c:1815: // Mark word after last as bitAllocated. s/bitAllocated/bitDead/? Actually use bitDead below. https://codereview.appspot.com/106260045/diff/980001/src/pkg/runtime/mgc0.c#n... src/pkg/runtime/mgc0.c:1875: return; You don't add bitsDead here. I presume you're assuming bitsDead is 0. You can just add it explicitly, e.g. ((bitAllocated+(bitsDead<<2)) << shift). https://codereview.appspot.com/106260045/diff/980001/src/pkg/runtime/mgc0.h File src/pkg/runtime/mgc0.h (right): https://codereview.appspot.com/106260045/diff/980001/src/pkg/runtime/mgc0.h#n... src/pkg/runtime/mgc0.h:14: insData = 1, GC program description goes here? https://codereview.appspot.com/106260045/diff/980001/src/pkg/runtime/mgc0.h#n... src/pkg/runtime/mgc0.h:20: BitsPerPointer = 2, We need some names for these different types of bitmaps (2 bit/ptr, 4 bit/ptr, maybe soon 1 bit/ptr). BitsPerPointer and friends are no longer clear about which bitmap type they are talking about. https://codereview.appspot.com/106260045/diff/980001/src/pkg/runtime/mgc0.h#n... src/pkg/runtime/mgc0.h:35: MaxGCMask = 0, // disabled because wastes several bytes of memory Why are some of these constants capitalized and some aren't? Same for the bit* constants below. https://codereview.appspot.com/106260045/diff/980001/src/pkg/runtime/type.h File src/pkg/runtime/type.h (right): https://codereview.appspot.com/106260045/diff/980001/src/pkg/runtime/type.h#n... src/pkg/runtime/type.h:30: uintptr gc[2]; This really needs a long comment describing what gc[0] and gc[1] are.
On 2014/07/23 22:42:04, khr wrote: > I did a first pass. > > I'm really worried that this is too much to go in all at once. I have a hard > time keeping it all in my brain. Can we split it up somehow? It would be more > work, but probably safer if we can split it into individually reviewable and > testable pieces. I am open to suggestions, but it's all very tightly coupled. > Some ideas for separable CLs: > 1) Split channels into header and array I can do this if you are fine with temporarily leaving channels w/o type info (conservative scan). I asked Russ the same question yesterday in the chan.h comment. > 2) Use just GC programs and then introduce bit vectors in > a separate CL. Or the other way around. That's how I first mailed it. Patchset 39 contains only masks. But Russ insisted on adding programs in this CL. > 3) Save some optimizations for separate CLs, like repeating the bits twice for > odd-size objects. This was actually intended as a simplification, not an optimization. Otherwise the runtime will have to deal with the case where a target byte needs to be combined from different bytes of the mask.
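A worked example of that simplification (illustrative numbers, not from the CL): with 4 bitmap bits per word, a 3-word type needs 12 bits of mask, which ends mid-byte, so tiling it over consecutive objects would force the runtime to assemble some bitmap bytes from two different mask bytes; repeated once more, the mask covers a whole number of bytes and can be copied byte by byte.

    3 words * 4 bits/word = 12 bits = 1.5 bytes  (mask ends mid-byte)
    6 words * 4 bits/word = 24 bits = 3 bytes    (mask tiles on byte boundaries)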
PTAL https://codereview.appspot.com/106260045/diff/980001/src/cmd/gc/reflect.c File src/cmd/gc/reflect.c (right): https://codereview.appspot.com/106260045/diff/980001/src/cmd/gc/reflect.c#new... src/cmd/gc/reflect.c:1279: size = size*PointersPerByte/8; // 4 bits per word On 2014/07/23 22:42:03, khr wrote: > The units of this conversion seem strange to me. I want size * BitsPerPointer / > 8, as size is already # of pointers. > > Or size / PointersPerByte, maybe. You are right. It actually must be gcBits (4 -- 4 bits per word in GC bitmap) instead of PointersPerByte (which happened to be 4 as well). https://codereview.appspot.com/106260045/diff/980001/src/cmd/gc/reflect.c#new... src/cmd/gc/reflect.c:1281: // We could use a more elaborate condition, but this seems to work good in practice. On 2014/07/23 22:42:03, khr wrote: > s/good/well/ Done. https://codereview.appspot.com/106260045/diff/980001/src/cmd/gc/reflect.c#new... src/cmd/gc/reflect.c:1426: On 2014/07/23 22:42:03, khr wrote: > Is the format of these GC programs described anywhere in a comment? I didn't > see it. Added a description to mgc0.h, where the instruction constants are defined. https://codereview.appspot.com/106260045/diff/980001/src/cmd/gc/reflect.c#new... src/cmd/gc/reflect.c:1440: size++; // unroll flag in the beginning On 2014/07/23 22:42:03, khr wrote: > Where is the unroll flag set? > > Update: I found it, but this really needs a comment. Added "used by runtime (see runtime.markallocated)". https://codereview.appspot.com/106260045/diff/980001/src/pkg/reflect/type.go File src/pkg/reflect/type.go (left): https://codereview.appspot.com/106260045/diff/980001/src/pkg/reflect/type.go#... src/pkg/reflect/type.go:1099: if t.kind&kindNoPointers != 0 { On 2014/07/23 22:42:03, khr wrote: > Why is there no GC program? Or is the non-existence of a program imply > conservative scanning? With the type-punned GC info (it does not encode the type of the pointee), the prototype GC info works just fine. So we don't need to generate new type info. E.g. if you have a Hmap type info, it now works for all maps regardless of type. The same goes for a single pointer or chan. We only need to generate type info when we create a completely new object layout (e.g. map bucket). https://codereview.appspot.com/106260045/diff/980001/src/pkg/reflect/type.go#... src/pkg/reflect/type.go:1485: ch.gc = unsafe.Pointer(&chanGC{ On 2014/07/23 22:42:03, khr wrote: > Channel GC program? the same here https://codereview.appspot.com/106260045/diff/980001/src/pkg/reflect/type.go#... src/pkg/reflect/type.go:1540: mt.hmap = hMapOf(mt.bucket) On 2014/07/23 22:42:03, khr wrote: > where did this go? Is there just one of these for all maps? the same here https://codereview.appspot.com/106260045/diff/980001/src/pkg/reflect/type.go#... src/pkg/reflect/type.go:1544: mt.gc = unsafe.Pointer(&ptrGC{ On 2014/07/23 22:42:03, khr wrote: > gc for this? the same here https://codereview.appspot.com/106260045/diff/980001/src/pkg/runtime/mgc0.c File src/pkg/runtime/mgc0.c (right): https://codereview.appspot.com/106260045/diff/980001/src/pkg/runtime/mgc0.c#n... src/pkg/runtime/mgc0.c:306: // Check is we have reached end of heap. On 2014/07/23 22:42:03, khr wrote: > s/heap/span/ Done. https://codereview.appspot.com/106260045/diff/980001/src/pkg/runtime/mgc0.c#n... src/pkg/runtime/mgc0.c:324: bits = (bits>>2)&BitsMask; On 2014/07/23 22:42:04, khr wrote: > BitsMask vs. bitMask? Very confusing.
agree :) here we extract type info, so it's BitsMask https://codereview.appspot.com/106260045/diff/980001/src/pkg/runtime/mgc0.c#n... src/pkg/runtime/mgc0.c:953: // Garbage! On 2014/07/23 22:42:03, khr wrote: > What? Done. https://codereview.appspot.com/106260045/diff/980001/src/pkg/runtime/mgc0.c#n... src/pkg/runtime/mgc0.c:1362: flushallmcaches(); On 2014/07/23 22:42:03, khr wrote: > If we have to do this, why only in Debug mode? mmm... the comment above talks about it https://codereview.appspot.com/106260045/diff/980001/src/pkg/runtime/mgc0.c#n... src/pkg/runtime/mgc0.c:1760: prog1 = unrollgcprog1(mask, prog, &pos, inplace, sparse); On 2014/07/23 22:42:03, khr wrote: > This is potentially too recursive to run on the M stack. With Russ' fix for program generation (already applied), each insArray encodes array with at least 2 elements and each element is at least 1 word. So max recursion is 34. https://codereview.appspot.com/106260045/diff/980001/src/pkg/runtime/mgc0.c#n... src/pkg/runtime/mgc0.c:1815: // Mark word after last as bitAllocated. On 2014/07/23 22:42:03, khr wrote: > s/bitAllocated/bitDead/? > > Actually use bitDead below. Done. https://codereview.appspot.com/106260045/diff/980001/src/pkg/runtime/mgc0.c#n... src/pkg/runtime/mgc0.c:1875: return; On 2014/07/23 22:42:04, khr wrote: > You don't add bitsDead here. I presume you're assuming bitsDead is 0. You can > just add it explicitly, e.g. ((bitAllocated+(bitsDead<<2)) << shift). Done. https://codereview.appspot.com/106260045/diff/980001/src/pkg/runtime/mgc0.h File src/pkg/runtime/mgc0.h (right): https://codereview.appspot.com/106260045/diff/980001/src/pkg/runtime/mgc0.h#n... src/pkg/runtime/mgc0.h:14: insData = 1, On 2014/07/23 22:42:04, khr wrote: > GC program description goes here? Done. https://codereview.appspot.com/106260045/diff/980001/src/pkg/runtime/mgc0.h#n... src/pkg/runtime/mgc0.h:20: BitsPerPointer = 2, On 2014/07/23 22:42:04, khr wrote: > We need some names for these different types of bitmaps (2 bit/ptr, 4 bit/ptr, > maybe soon 1 bit/ptr). BitsPerPointer and friends are no longer clear about > which bitmap type they are talking about. Yes. We need some consistent scheme and naming. Some of them are capitalized and some are not, because that's what we have now (BitsPerPointer vs. bitMask). Do you think we need to do it in this CL? I am just thinking that if we tackle interfaces, then we can go to 1 bit per word. And that will remove whole bunch of constants. So it will be simpler to come up with consistent scheme. https://codereview.appspot.com/106260045/diff/980001/src/pkg/runtime/mgc0.h#n... src/pkg/runtime/mgc0.h:35: MaxGCMask = 0, // disabled because wastes several bytes of memory On 2014/07/23 22:42:04, khr wrote: > Why are some of these constants capitalized and some aren't? Same for the bit* > constants below. see above https://codereview.appspot.com/106260045/diff/980001/src/pkg/runtime/type.h File src/pkg/runtime/type.h (right): https://codereview.appspot.com/106260045/diff/980001/src/pkg/runtime/type.h#n... src/pkg/runtime/type.h:30: uintptr gc[2]; On 2014/07/23 22:42:04, khr wrote: > This really needs a long comment describing what gc[0] and gc[1] are. Done.
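As a back-of-the-envelope check of the 34 (assuming a 64-bit heap arena of about 128 GB, which is an assumption here, not something stated in this CL): each insArray level describes at least 2 elements of at least 1 word each, so a program nested d levels deep describes at least 2^d words, and 128 GB = 2^37 bytes = 2^34 eight-byte words, giving d <= 34.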
On 2014/07/24 07:57:27, dvyukov wrote: > On 2014/07/23 22:42:04, khr wrote: > > I did a first pass. > > > > I'm really worried that this is too much to go in all at once. I have a hard > > time keeping it all in my brain. Can we split it up somehow? It would be > more > > work, but probably safer if we can split it into individually reviewable and > > testable pieces. > > I am open to suggestions, but it's all very tightly coupled. > > > Some ideas for separable CLs: > > 1) Split channels into header and array > > I can do this if you are find with temporally leaving channels w/o type info > (conservative scan). I've asked Russ the same question yesterday in the chan.h > comment. That's fine with me as a temporary step. > > 2) Use just GC programs and then introduce bit vectors in > > a separate CL. Or the other way around. > > That's how I first mailed it. Patchset 39 contains only masks. But Russ insisted > on adding programs in this CL. I don't want an intermediate state that has bit vectors without any kind of gc program. But you could easily split the compiler/linker and runtime changes (I think maybe this is what Keith was getting at) by doing one step that changes the gc bitmap but keeps using the old gc programs, and then a second CL that introduces the new programs. Russ
I am going to be away from later today until August 4. The compiler changes look good to me now. The runtime changes are overwhelming. I hope you can work with Keith to break the CL up into a few steps. I realize this is slowing you down, but that's what code reviews are for: to make sure we do things deliberately and in a way that multiple people understand the code. It's fine to submit pieces while I'm gone as long as Keith is happy.
ack, thanks
On 2014/07/24 07:57:27, dvyukov wrote: > On 2014/07/23 22:42:04, khr wrote: > > I did a first pass. > > > > I'm really worried that this is too much to go in all at once. I have a hard > > time keeping it all in my brain. Can we split it up somehow? It would be > more > > work, but probably safer if we can split it into individually reviewable and > > testable pieces. > > I am open to suggestions, but it's all very tightly coupled. > > > Some ideas for separable CLs: > > 1) Split channels into header and array > > I can do this if you are find with temporally leaving channels w/o type info > (conservative scan). I've asked Russ the same question yesterday in the chan.h > comment. I've removed channels from this change. Frankly I don't see what else can be factored out. Doing something like the new type info in the runtime but the old programs in the compiler would require lots of throw-away code and a significant amount of debugging to get it correct, which is also wasted effort. Russ reviewed the compiler changes. Most changed files in the runtime have only a few trivial changes. It's really mostly mgc0, malloc.goc and heapdump.c.
Keith, when will you be able to take a look?
Later today. Is there a separate CL for the channel change?
https://codereview.appspot.com/115280043 is the chan change
On Thu, Jul 24, 2014 at 2:35 AM, <dvyukov@google.com> wrote: > PTAL > > > > https://codereview.appspot.com/106260045/diff/980001/src/cmd/gc/reflect.c > File src/cmd/gc/reflect.c (right): > > https://codereview.appspot.com/106260045/diff/980001/src/ > cmd/gc/reflect.c#newcode1279 > src/cmd/gc/reflect.c:1279: size = size*PointersPerByte/8; // 4 bits > per > word > On 2014/07/23 22:42:03, khr wrote: > >> The units of this conversion seem strange to me. I want size * >> > BitsPerPointer / > >> 8, as size is already # of pointers. >> > > Or size / PointersPerByte, maybe. >> > > You are right. It actually must be gcBits (4 -- 4 bits per word in GC > bitmap) instead of PointersPerByte (which happened to be 4 as well). > > > https://codereview.appspot.com/106260045/diff/980001/src/ > cmd/gc/reflect.c#newcode1281 > src/cmd/gc/reflect.c:1281: // We could use a more elaborate condition, > but this seems to work good in practice. > On 2014/07/23 22:42:03, khr wrote: > >> s/good/well/ >> > > Done. > > https://codereview.appspot.com/106260045/diff/980001/src/ > cmd/gc/reflect.c#newcode1426 > src/cmd/gc/reflect.c:1426: > > On 2014/07/23 22:42:03, khr wrote: > >> Is the format of these GC programs described anywhere in a comment? I >> > didn't > >> see it. >> > > Added description to mgc0.h, where instruction constants are defined > > > https://codereview.appspot.com/106260045/diff/980001/src/ > cmd/gc/reflect.c#newcode1440 > src/cmd/gc/reflect.c:1440: size++; // unroll flag in the beginning > On 2014/07/23 22:42:03, khr wrote: > >> Where is the unroll flag set? >> > > Update: I found it, but this really needs a comment. >> > > add "used by runtime (see runtime.markallocated)" > > > https://codereview.appspot.com/106260045/diff/980001/src/ > pkg/reflect/type.go > File src/pkg/reflect/type.go (left): > > https://codereview.appspot.com/106260045/diff/980001/src/ > pkg/reflect/type.go#oldcode1099 > src/pkg/reflect/type.go:1099: if t.kind&kindNoPointers != 0 { > On 2014/07/23 22:42:03, khr wrote: > >> Why is there no GC program? Or is the non-existence of a program >> > imply > >> conservative scanning? >> > > > With the type-punned GC info (it does not encode type of the pointee), > prototype GC info works just fine. So we don't need to generate a new > type info. E.g. if you a Hmap type info, it now works for all maps > regardless of types. The same for a single pointer or chan. > We only need to generate type info when we create completely new object > layout (e.g. map bucket). > > Oh, I see, the gc info is copied from the prototype map/channel/whatever and never overwritten. > > https://codereview.appspot.com/106260045/diff/980001/src/ > pkg/reflect/type.go#oldcode1485 > src/pkg/reflect/type.go:1485: ch.gc = unsafe.Pointer(&chanGC{ > On 2014/07/23 22:42:03, khr wrote: > >> Channel GC program? >> > > the same here > > > https://codereview.appspot.com/106260045/diff/980001/src/ > pkg/reflect/type.go#oldcode1540 > src/pkg/reflect/type.go:1540: mt.hmap = hMapOf(mt.bucket) > On 2014/07/23 22:42:03, khr wrote: > >> where did this go? Is there just one of these for all maps? >> > > the same here > > > https://codereview.appspot.com/106260045/diff/980001/src/ > pkg/reflect/type.go#oldcode1544 > src/pkg/reflect/type.go:1544: mt.gc = unsafe.Pointer(&ptrGC{ > On 2014/07/23 22:42:03, khr wrote: > >> gc for this? 
>> > > the same here > > > https://codereview.appspot.com/106260045/diff/980001/src/ > pkg/runtime/mgc0.c > File src/pkg/runtime/mgc0.c (right): > > https://codereview.appspot.com/106260045/diff/980001/src/ > pkg/runtime/mgc0.c#newcode306 > src/pkg/runtime/mgc0.c:306: // Check is we have reached end of heap. > On 2014/07/23 22:42:03, khr wrote: > >> s/heap/span/ >> > > Done. > > > https://codereview.appspot.com/106260045/diff/980001/src/ > pkg/runtime/mgc0.c#newcode324 > src/pkg/runtime/mgc0.c:324: bits = (bits>>2)&BitsMask; > On 2014/07/23 22:42:04, khr wrote: > >> BitsMask vs. bitMask? Very confusing. >> > > agree :) > here we extract type info, so it's BitsMask > > > https://codereview.appspot.com/106260045/diff/980001/src/ > pkg/runtime/mgc0.c#newcode953 > src/pkg/runtime/mgc0.c:953: // Garbage! > On 2014/07/23 22:42:03, khr wrote: > >> What? >> > > Done. > > > https://codereview.appspot.com/106260045/diff/980001/src/ > pkg/runtime/mgc0.c#newcode1362 > src/pkg/runtime/mgc0.c:1362: flushallmcaches(); > On 2014/07/23 22:42:03, khr wrote: > >> If we have to do this, why only in Debug mode? >> > > mmm... the comment above talks about it > > I see, the check in scanblock that would fail is also guarded by Debug. I think we should do this flush unconditionally. Otherwise it means we're scanning freed memory. Maybe the check doesn't fail without Debug on, but scanning freed memory can lead to leaks. > > https://codereview.appspot.com/106260045/diff/980001/src/ > pkg/runtime/mgc0.c#newcode1760 > src/pkg/runtime/mgc0.c:1760: prog1 = unrollgcprog1(mask, prog, &pos, > inplace, sparse); > On 2014/07/23 22:42:03, khr wrote: > >> This is potentially too recursive to run on the M stack. >> > > With Russ' fix for program generation (already applied), each insArray > encodes array with at least 2 elements and each element is at least 1 > word. So max recursion is 34. > > Ok. > > https://codereview.appspot.com/106260045/diff/980001/src/ > pkg/runtime/mgc0.c#newcode1815 > src/pkg/runtime/mgc0.c:1815: // Mark word after last as bitAllocated. > On 2014/07/23 22:42:03, khr wrote: > >> s/bitAllocated/bitDead/? >> > > Actually use bitDead below. >> > > Done. > > > https://codereview.appspot.com/106260045/diff/980001/src/ > pkg/runtime/mgc0.c#newcode1875 > src/pkg/runtime/mgc0.c:1875: return; > On 2014/07/23 22:42:04, khr wrote: > >> You don't add bitsDead here. I presume you're assuming bitsDead is 0. >> > You can > >> just add it explicitly, e.g. ((bitAllocated+(bitsDead<<2)) << shift). >> > > Done. > > > https://codereview.appspot.com/106260045/diff/980001/src/ > pkg/runtime/mgc0.h > File src/pkg/runtime/mgc0.h (right): > > https://codereview.appspot.com/106260045/diff/980001/src/ > pkg/runtime/mgc0.h#newcode14 > src/pkg/runtime/mgc0.h:14: insData = 1, > On 2014/07/23 22:42:04, khr wrote: > >> GC program description goes here? >> > > Done. > > > https://codereview.appspot.com/106260045/diff/980001/src/ > pkg/runtime/mgc0.h#newcode20 > src/pkg/runtime/mgc0.h:20: BitsPerPointer = 2, > On 2014/07/23 22:42:04, khr wrote: > >> We need some names for these different types of bitmaps (2 bit/ptr, 4 >> > bit/ptr, > >> maybe soon 1 bit/ptr). BitsPerPointer and friends are no longer clear >> > about > >> which bitmap type they are talking about. >> > > Yes. > We need some consistent scheme and naming. > Some of them are capitalized and some are not, because that's what we > have now (BitsPerPointer vs. bitMask). > > Do you think we need to do it in this CL? 
> I am just thinking that if we tackle interfaces, then we can go to 1 bit > per word. And that will remove whole bunch of constants. So it will be > simpler to come up with consistent scheme. > > Yes, this can wait. > > https://codereview.appspot.com/106260045/diff/980001/src/ > pkg/runtime/mgc0.h#newcode35 > src/pkg/runtime/mgc0.h:35: MaxGCMask = 0, // disabled because wastes > several bytes of memory > On 2014/07/23 22:42:04, khr wrote: > >> Why are some of these constants capitalized and some aren't? Same for >> > the bit* > >> constants below. >> > > see above > > > https://codereview.appspot.com/106260045/diff/980001/src/ > pkg/runtime/type.h > File src/pkg/runtime/type.h (right): > > https://codereview.appspot.com/106260045/diff/980001/src/ > pkg/runtime/type.h#newcode30 > src/pkg/runtime/type.h:30: uintptr gc[2]; > On 2014/07/23 22:42:04, khr wrote: > >> This really needs a long comment describing what gc[0] and gc[1] are. >> > > Done. > > https://codereview.appspot.com/106260045/ >
LGTM with those fixes. https://codereview.appspot.com/106260045/diff/1020001/src/pkg/runtime/mgc0.c File src/pkg/runtime/mgc0.c (left): https://codereview.appspot.com/106260045/diff/1020001/src/pkg/runtime/mgc0.c#... src/pkg/runtime/mgc0.c:1273: enqueue1(&wbuf, (Obj){(void*)&spf->fint, PtrSize, 0}); What about these other two fields?
On Tue, Jul 29, 2014 at 3:28 AM, Keith Randall <khr@google.com> wrote: https://codereview.appspot.com/106260045/diff/980001/src/pkg/runtime/mgc0.c#n... >> src/pkg/runtime/mgc0.c:1362: flushallmcaches(); >> On 2014/07/23 22:42:03, khr wrote: >>> >>> If we have to do this, why only in Debug mode? >> >> >> mmm... the comment above talks about it >> > > I see, the check in scanblock that would fail is also guarded by Debug. > I think we should do this flush unconditionally. Otherwise it means we're > scanning freed memory. Maybe the check doesn't fail without Debug on, but > scanning freed memory can lead to leaks. The real problem here is not flushing, but the dangling pointers. If they can point to freed blocks, they can as well point to live blocks. So flushing will not actually fix leaks. The bad pointers must be fixed to fix leaks. Anyway, I am going to get rid of free and this debug flush right when this CL goes in.
https://codereview.appspot.com/106260045/diff/1020001/src/pkg/runtime/mgc0.c File src/pkg/runtime/mgc0.c (left): https://codereview.appspot.com/106260045/diff/1020001/src/pkg/runtime/mgc0.c#... src/pkg/runtime/mgc0.c:1273: enqueue1(&wbuf, (Obj){(void*)&spf->fint, PtrSize, 0}); On 2014/07/28 23:28:55, khr wrote: > What about these other two fields? Both these fields point to Types. Types are retained by other means (compiler-generated types are persistent and reflect types are stored in a global map). So this is unnecessary. We do not queue types when scanning an eface either (which would be a much more serious issue if types could get freed).
*** Submitted as https://code.google.com/p/go/source/detail?r=e1fc05ce4181 *** runtime: simpler and faster GC Implement the design described in: https://docs.google.com/document/d/1v4Oqa0WwHunqlb8C3ObL_uNQw3DfSY-ztoA-4wWbK... Summary of the changes: GC uses "2-bits per word" pointer type info embed directly into bitmap. Scanning of stacks/data/heap is unified. The old spans types go away. Compiler generates "sparse" 4-bits type info for GC (directly for GC bitmap). Linker generates "dense" 2-bits type info for data/bss (the same as stacks use). Summary of results: -1680 lines of code total (-1000+ in mgc0.c only) -25% memory consumption -3-7% binary size -15% GC pause reduction -7% run time reduction LGTM=khr R=golang-codereviews, rsc, christoph, khr CC=golang-codereviews, rlh https://codereview.appspot.com/106260045
Message was sent while issue was closed.
fingers crossed
Message was sent while issue was closed.
This CL appears to have broken the android-arm-crawshaw builder. See http://build.golang.org/log/840ff0ebd0a41f5acfc8505cf8032495b3dcdc8b
not looking good so far ...