|
|
Created:
13 years, 6 months ago by rmmh Modified:
12 years, 9 months ago Reviewers:
CC:
dave_cheney.net, minux1, rsc, golang-dev Visibility:
Public. |
Descriptioncompress/flate: shrink decompressor struct for better performance
structs containining pointers get treated as sizeof(struct)/sizeof(void*)
potential pointers by the GC. Replacing embedded arrays with pointers to arrays
shrinks the struct, reducing GC load.
Fixes issue 2703.
Patch Set 1 #Patch Set 2 : diff -r f131dc18714a https://go.googlecode.com/hg/ #Patch Set 3 : diff -r f131dc18714a https://go.googlecode.com/hg/ #
Total comments: 2
Patch Set 4 : diff -r b685816af832 https://go.googlecode.com/hg/ #Patch Set 5 : diff -r b685816af832 https://go.googlecode.com/hg/ #Patch Set 6 : diff -r b685816af832 https://go.googlecode.com/hg/ #MessagesTotal messages: 31
Hello golang-dev@googlegroups.com (cc: golang-dev@googlegroups.com), I'd like you to review this change to https://go.googlecode.com/hg/
Sign in to reply to this message.
wow.
Sign in to reply to this message.
http://codereview.appspot.com/5536078/diff/3001/src/pkg/compress/flate/inflat... File src/pkg/compress/flate/inflate.go (right): http://codereview.appspot.com/5536078/diff/3001/src/pkg/compress/flate/inflat... src/pkg/compress/flate/inflate.go:213: bits []int // size: maxLit + maxDist Has this change been tested on i386?
Sign in to reply to this message.
http://codereview.appspot.com/5536078/diff/3001/src/pkg/compress/flate/inflat... File src/pkg/compress/flate/inflate.go (right): http://codereview.appspot.com/5536078/diff/3001/src/pkg/compress/flate/inflat... src/pkg/compress/flate/inflate.go:217: hist []byte // size: maxHist + maxDist size: maxHist
Sign in to reply to this message.
Hello, I just tried this CL on darwin/386 using the OP's original test case without success.
Sign in to reply to this message.
On 2012/01/20 10:51:59, dfc wrote: > Hello, > > I just tried this CL on darwin/386 using the OP's original test case without > success. I believe issue 2703 cannot be fixed without reprogramming the garbage collector a bit. The first step would be to annotate memory allocations with type information.
Sign in to reply to this message.
On Fri, Jan 20, 2012 at 06:07, <0xE2.0x9A.0x9B@gmail.com> wrote: > I believe issue 2703 cannot be fixed without reprogramming the garbage > collector a bit. The first step would be to annotate memory allocations > with type information. The garbage collector does skip over allocated blocks that do not contain pointers. Thus I would expect this change to have the desired effect. Am I missing something? Russ
Sign in to reply to this message.
I think this change looks good, but could you please add a test that both confirms the fix and makes sure we don't break it again in the future? Thanks. Russ
Sign in to reply to this message.
Woops, this doesn't work completely... It just slows the leak by about 15%. (I wasn't testing with large enough zip files) Gopprof indicates that bytes.(*Buffer).ReadFrom (bytes/buffer.go:136) is the source of the memory that's eventually leaked. I'm not sure where the other false pointers are coming from. The proper fix to this is making Go's GC precise. I know it's going to get replaced eventually, but it might not be a difficult change. Don't objects already carry type information for reflection? Could we put is-a-pointer member information into that?
Sign in to reply to this message.
It is too close to Go 1 to be making fundamental changes to core pieces of the runtime. What is calling ReadFrom? That reads until EOF, so it is easy to believe it would be an unbounded amount of data. It might not be a GC bug, just buggy Go code. Russ
Sign in to reply to this message.
On 2012/01/20 19:04:56, rsc wrote: > It is too close to Go 1 to be making fundamental changes > to core pieces of the runtime. > > What is calling ReadFrom? That reads until EOF, so it > is easy to believe it would be an unbounded amount of > data. It might not be a GC bug, just buggy Go code. > > Russ Locally, I implemented an experimental version of a GC that can (in many cases, not in all cases) track types of structs and arrays allocated on the heap. Conclusions: 1. The modified GC enables the ZIP decompression program (issue 2703) to run "infinitely" on i386. However, the RSS at which memory consumption stabilizes varies from run to run (such as: 180MB, 100MB, ...). The program may still run out of memory - but with a lower probability than in case of the unmodified GC. 2. In the modified GC (as well as in the unmodified GC), large memory allocations (such as: make([]byte, 16*1e6)) have a very high probability of being referred to from integer values in the program's .data and .bss sections. This is caused by http://code.google.com/p/go/source/browse/src/pkg/runtime/mgc0.c?name=weekly....
Sign in to reply to this message.
On Sun, Jan 22, 2012 at 6:50 PM, <0xE2.0x9A.0x9B@gmail.com> wrote: > On 2012/01/20 19:04:56, rsc wrote: > >> It is too close to Go 1 to be making fundamental changes >> to core pieces of the runtime. >> > > What is calling ReadFrom? That reads until EOF, so it >> is easy to believe it would be an unbounded amount of >> data. It might not be a GC bug, just buggy Go code. >> > > Russ >> > > Locally, I implemented an experimental version of a GC that can (in many > cases, not in all cases) track types of structs and arrays allocated on > the heap. > > Conclusions: > > 1. The modified GC enables the ZIP decompression program (issue 2703) to > run "infinitely" on i386. However, the RSS at which memory consumption > stabilizes varies from run to run (such as: 180MB, 100MB, ...). The > program may still run out of memory - but with a lower probability than > in case of the unmodified GC. > > 2. In the modified GC (as well as in the unmodified GC), large memory > allocations (such as: make([]byte, 16*1e6)) have a very high probability > of being referred to from integer values in the program's .data and .bss > sections. This is caused by > http://code.google.com/p/go/**source/browse/src/pkg/runtime/** > mgc0.c?name=weekly.2012-01-20#**635<http://code.google.com/p/go/source/browse/src/pkg/runtime/mgc0.c?name=weekly.2012-01-20#635> > > Ian Taylor proposed to split .data into 2 parts: one that has no pointers (like [1024]float64) and one that can potentially contain pointers. It should improve situation: http://code.google.com/p/go/issues/detail?id=2297&q=386%20gc&colspec=ID%20Sta... How does your patch affects performance? It should not considerably slowdown 64bit.
Sign in to reply to this message.
On 2012/01/23 06:40:19, dvyukov wrote: > Ian Taylor proposed to split .data into 2 parts: one that has no pointers > (like [1024]float64) and one that can potentially contain pointers. It > should improve situation: > http://code.google.com/p/go/issues/detail?id=2297 That is an interesting idea. But it cannot handle structs containing pointers and integer values. In my opinion, the right solution would be for the linker to emit an array of (offset,type) and put the array in .rodata section. > How does your patch affect performance? > It should not considerably slowdown 64bit. I didn't optimize it yet. It has O(N^2) time complexity if the type is a struct (N is the number of fields in the struct). Once optimized, performance should be comparable to the current implementation. A major problem is how would I publish the patch for inclusion in Go.
Sign in to reply to this message.
On Mon, Jan 23, 2012 at 01:40, Dmitry Vyukov <dvyukov@google.com> wrote: > Ian Taylor proposed to split .data into 2 parts: one that has no pointers > (like [1024]float64) and one that can potentially contain pointers. That only helps with static data, not allocated data.
Sign in to reply to this message.
On Mon, Jan 23, 2012 at 3:50 AM, <0xE2.0x9A.0x9B@gmail.com> wrote: > > A major problem is how would I publish the patch for inclusion in Go. > http://golang.org/doc/contribute.html Or is the problem more from your employer's reservations?
Sign in to reply to this message.
On 2012/01/23 19:12:16, bradfitz wrote: > On Mon, Jan 23, 2012 at 3:50 AM, <mailto:0xE2.0x9A.0x9B@gmail.com> wrote: > > > > > A major problem is how would I publish the patch for inclusion in Go. > > > > http://golang.org/doc/contribute.html I am unable to publish my name (psychological issues) for inclusion in the CONTRIBUTORS file. > Or is the problem more from your employer's reservations? Well, if I was employed I would be having a smaller number of psychological issues. I believe my first jobs were a lie. Working for Google (country=Slovakia) to improve golang would be nice, but I believe/suspect it is impossible.
Sign in to reply to this message.
Hello, As you are now a contributor, would you be interested reviving this CL ? Cheers Dave
Sign in to reply to this message.
Hello, I just benchmarked this CL against tip (b855390a295f) and it provides substantial improvement. Do you plan to submit this CL ? Cheers Dave pando(~/go/src/pkg/compress/flate) % ~/go/misc/benchcmp {old,new}.txt benchmark old ns/op new ns/op delta BenchmarkDecodeDigitsSpeed1e4 3384888 2974609 -12.12% BenchmarkDecodeDigitsSpeed1e5 26885680 23783570 -11.54% BenchmarkDecodeDigitsSpeed1e6 262377800 231668100 -11.70% BenchmarkDecodeDigitsDefault1e4 3263000 2858429 -12.40% BenchmarkDecodeDigitsDefault1e5 24915460 22427060 -9.99% BenchmarkDecodeDigitsDefault1e6 231527700 207492000 -10.38% BenchmarkDecodeDigitsCompress1e4 3126282 2861908 -8.46% BenchmarkDecodeDigitsCompress1e5 24910590 22414550 -10.02% BenchmarkDecodeDigitsCompress1e6 231784100 207601900 -10.43% BenchmarkDecodeTwainSpeed1e4 3254090 3028992 -6.92% BenchmarkDecodeTwainSpeed1e5 25270080 22948920 -9.19% BenchmarkDecodeTwainSpeed1e6 241912900 219360300 -9.32% BenchmarkDecodeTwainDefault1e4 2988098 2765716 -7.44% BenchmarkDecodeTwainDefault1e5 20900880 18910220 -9.52% BenchmarkDecodeTwainDefault1e6 194210800 177212500 -8.75% BenchmarkDecodeTwainCompress1e4 2987182 2671386 -10.57% BenchmarkDecodeTwainCompress1e5 20824580 18831790 -9.57% BenchmarkDecodeTwainCompress1e6 193365500 175839200 -9.06% BenchmarkEncodeDigitsSpeed1e4 12754820 11400760 -10.62% BenchmarkEncodeDigitsSpeed1e5 54029540 52534800 -2.77% BenchmarkEncodeDigitsSpeed1e6 475781400 471545400 -0.89% BenchmarkEncodeDigitsDefault1e4 14909680 13540340 -9.18% BenchmarkEncodeDigitsDefault1e5 195452900 192239400 -1.64% BenchmarkEncodeDigitsDefault1e6 2130768000 2127930000 -0.13% BenchmarkEncodeDigitsCompress1e4 14862670 13577580 -8.65% BenchmarkEncodeDigitsCompress1e5 195816100 192391900 -1.75% BenchmarkEncodeDigitsCompress1e6 2130616000 2127899000 -0.13% BenchmarkEncodeTwainSpeed1e4 12985530 11785880 -9.24% BenchmarkEncodeTwainSpeed1e5 47603160 46506960 -2.30% BenchmarkEncodeTwainSpeed1e6 397772200 392651400 -1.29% BenchmarkEncodeTwainDefault1e4 15671080 14472960 -7.65% BenchmarkEncodeTwainDefault1e5 161242700 158978300 -1.40% BenchmarkEncodeTwainDefault1e6 1661224000 1651611000 -0.58% BenchmarkEncodeTwainCompress1e4 15765990 14627990 -7.22% BenchmarkEncodeTwainCompress1e5 204476900 202880800 -0.78% BenchmarkEncodeTwainCompress1e6 2216370000 2210541000 -0.26% benchmark old MB/s new MB/s speedup BenchmarkDecodeDigitsSpeed1e4 2.95 3.36 1.14x BenchmarkDecodeDigitsSpeed1e5 3.72 4.20 1.13x BenchmarkDecodeDigitsSpeed1e6 3.81 4.32 1.13x BenchmarkDecodeDigitsDefault1e4 3.06 3.50 1.14x BenchmarkDecodeDigitsDefault1e5 4.01 4.46 1.11x BenchmarkDecodeDigitsDefault1e6 4.32 4.82 1.12x BenchmarkDecodeDigitsCompress1e4 3.20 3.49 1.09x BenchmarkDecodeDigitsCompress1e5 4.01 4.46 1.11x BenchmarkDecodeDigitsCompress1e6 4.31 4.82 1.12x BenchmarkDecodeTwainSpeed1e4 3.07 3.30 1.07x BenchmarkDecodeTwainSpeed1e5 3.96 4.36 1.10x BenchmarkDecodeTwainSpeed1e6 4.13 4.56 1.10x BenchmarkDecodeTwainDefault1e4 3.35 3.62 1.08x BenchmarkDecodeTwainDefault1e5 4.78 5.29 1.11x BenchmarkDecodeTwainDefault1e6 5.15 5.64 1.10x BenchmarkDecodeTwainCompress1e4 3.35 3.74 1.12x BenchmarkDecodeTwainCompress1e5 4.80 5.31 1.11x BenchmarkDecodeTwainCompress1e6 5.17 5.69 1.10x BenchmarkEncodeDigitsSpeed1e4 0.78 0.88 1.13x BenchmarkEncodeDigitsSpeed1e5 1.85 1.90 1.03x BenchmarkEncodeDigitsSpeed1e6 2.10 2.12 1.01x BenchmarkEncodeDigitsDefault1e4 0.67 0.74 1.10x BenchmarkEncodeDigitsDefault1e5 0.51 0.52 1.02x BenchmarkEncodeDigitsDefault1e6 0.47 0.47 1.00x BenchmarkEncodeDigitsCompress1e4 0.67 0.74 1.10x BenchmarkEncodeDigitsCompress1e5 0.51 0.52 1.02x BenchmarkEncodeDigitsCompress1e6 0.47 0.47 1.00x BenchmarkEncodeTwainSpeed1e4 0.77 0.85 1.10x BenchmarkEncodeTwainSpeed1e5 2.10 2.15 1.02x BenchmarkEncodeTwainSpeed1e6 2.51 2.55 1.02x BenchmarkEncodeTwainDefault1e4 0.64 0.69 1.08x BenchmarkEncodeTwainDefault1e5 0.62 0.63 1.02x BenchmarkEncodeTwainDefault1e6 0.60 0.61 1.02x BenchmarkEncodeTwainCompress1e4 0.63 0.68 1.08x BenchmarkEncodeTwainCompress1e5 0.49 0.49 1.00x BenchmarkEncodeTwainCompress1e6 0.45 0.45 1.00x
Sign in to reply to this message.
I'll submit the CL if desired. It's currently a bandage to hide how broken the GC is on x86, but might be okay with a "TODO: reconsider this when we get a precise GC"
Sign in to reply to this message.
Can you please run hg mail again to bring this CL current with trunk. On Tue, Aug 14, 2012 at 4:55 AM, <hitchmanr@gmail.com> wrote: > I'll submit the CL if desired. It's currently a bandage to hide how > broken the GC is on x86, but might be okay with a "TODO: reconsider this > when we get a precise GC" > > http://codereview.appspot.com/5536078/
Sign in to reply to this message.
Can you please run hg mail again to bring this CL current with trunk.
Sign in to reply to this message.
Hello dave@cheney.net (cc: golang-dev@googlegroups.com), Please take another look.
Sign in to reply to this message.
Hello dave@cheney.net (cc: golang-dev@googlegroups.com), Please take another look.
Sign in to reply to this message.
On 2012/10/25 06:57:36, rmmh wrote: > Hello mailto:dave@cheney.net (cc: mailto:golang-dev@googlegroups.com), > > Please take another look. pando(~/go/src/pkg/compress/flate) % go test -test.bench=. # compress/flate ./inflate.go:12: imported and not used: "fmt" FAIL compress/flate [build failed]
Sign in to reply to this message.
Hello dave@cheney.net (cc: golang-dev@googlegroups.com), Please take another look.
Sign in to reply to this message.
Sorry about the spam, keep forgetting how change/mail works. I don't know how to do benchmarks with deltas, or else I'd include that in the description.
Sign in to reply to this message.
r14401 seems to have stolen most of the performance improvements, I'm seeing fractions of a percentage difference on x64, and ~1% speedup on ARM. Code to exhibit the pathological zip reading: http://play.golang.org/p/ujK9UdqNZt
Sign in to reply to this message.
On Oct 25, 2012 5:56 PM, <hitchmanr@gmail.com> wrote: > I don't know how to do benchmarks with deltas, or else I'd include that > in the description. save two copies of benchmark output that you want to compare, and feed them to misc/benchcmp script.
Sign in to reply to this message.
LGTM Thanks.
Sign in to reply to this message.
*** Submitted as http://code.google.com/p/go/source/detail?r=d6e11ab78b7d *** compress/flate: shrink decompressor struct for better performance Helps with issue 2703. R=dave, minux.ma, rsc CC=golang-dev http://codereview.appspot.com/5536078 Committer: Russ Cox <rsc@golang.org>
Sign in to reply to this message.
|