Created: 11 years, 9 months ago by dfc
Modified: 11 years, 9 months ago
Reviewers:
CC: rsc, 0xe2.0x9a.0x9b_gmail.com, aam, golang-dev, minux1
Visibility: Public.

Description
runtime: use uintptr where possible in malloc stats
linux/arm OMAP4 pandaboard
benchmark old ns/op new ns/op delta
BenchmarkBinaryTree17 68723297000 37026214000 -46.12%
BenchmarkFannkuch11 34962402000 35958435000 +2.85%
BenchmarkGobDecode 137298600 124182150 -9.55%
BenchmarkGobEncode 60717160 60006700 -1.17%
BenchmarkGzip 5647156000 5550873000 -1.70%
BenchmarkGunzip 1196350000 1198670000 +0.19%
BenchmarkJSONEncode 863012800 782898000 -9.28%
BenchmarkJSONDecode 3312989000 2781800000 -16.03%
BenchmarkMandelbrot200 45727540 45703120 -0.05%
BenchmarkParse 74781800 59990840 -19.78%
BenchmarkRevcomp 140043650 139462300 -0.42%
BenchmarkTemplate 6467682000 5832153000 -9.83%
benchmark old MB/s new MB/s speedup
BenchmarkGobDecode 5.59 6.18 1.11x
BenchmarkGobEncode 12.64 12.79 1.01x
BenchmarkGzip 3.44 3.50 1.02x
BenchmarkGunzip 16.22 16.19 1.00x
BenchmarkJSONEncode 2.25 2.48 1.10x
BenchmarkJSONDecode 0.59 0.70 1.19x
BenchmarkParse 0.77 0.97 1.26x
BenchmarkRevcomp 18.15 18.23 1.00x
BenchmarkTemplate 0.30 0.33 1.10x
darwin/386 core duo
benchmark old ns/op new ns/op delta
BenchmarkBinaryTree17 10591616577 9678245733 -8.62%
BenchmarkFannkuch11 10758473315 10749303846 -0.09%
BenchmarkGobDecode 34379785 34121250 -0.75%
BenchmarkGobEncode 23523721 23475750 -0.20%
BenchmarkGzip 2486191492 2446539568 -1.59%
BenchmarkGunzip 444179328 444250293 +0.02%
BenchmarkJSONEncode 221138507 219757826 -0.62%
BenchmarkJSONDecode 1056034428 1048975133 -0.67%
BenchmarkMandelbrot200 19862516 19868346 +0.03%
BenchmarkRevcomp 3742610872 3724821662 -0.48%
BenchmarkTemplate 960283112 944791517 -1.61%
benchmark old MB/s new MB/s speedup
BenchmarkGobDecode 22.33 22.49 1.01x
BenchmarkGobEncode 32.63 32.69 1.00x
BenchmarkGzip 7.80 7.93 1.02x
BenchmarkGunzip 43.69 43.68 1.00x
BenchmarkJSONEncode 8.77 8.83 1.01x
BenchmarkJSONDecode 1.84 1.85 1.01x
BenchmarkRevcomp 67.91 68.24 1.00x
BenchmarkTemplate 2.02 2.05 1.01x
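
For readers unfamiliar with the comparison tables, here is a minimal sketch of how the "delta" and "speedup" columns appear to be derived. The formulas are an assumption that happens to match the figures above; the helper names `percentDelta` and `speedup` are made up for this illustration and are not taken from the comparison tool.

```go
package main

import "fmt"

// percentDelta returns the relative change from old to new, in percent.
// A negative value means the new run took less time per operation.
// Assumed formula: (new - old) / old * 100.
func percentDelta(oldNs, newNs float64) float64 {
	return (newNs - oldNs) / oldNs * 100
}

// speedup returns the throughput ratio new/old (assumed formula).
func speedup(oldMBs, newMBs float64) float64 {
	return newMBs / oldMBs
}

func main() {
	// BenchmarkBinaryTree17, linux/arm OMAP4 pandaboard, from the table above.
	fmt.Printf("%+.2f%%\n", percentDelta(68723297000, 37026214000))
	// BenchmarkGobDecode throughput on the same machine.
	fmt.Printf("%.2fx\n", speedup(5.59, 6.18))
}
```

With the two rows used in `main`, the results round to the -46.12% and 1.11x shown in the tables.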
Patch Set 1
Patch Set 2 : diff -r 9b455eb64690 https://code.google.com/p/go
Patch Set 3 : diff -r 9b455eb64690 https://go.googlecode.com/hg/
Total comments: 12
Patch Set 4 : diff -r b720fc58b147 https://code.google.com/p/go
Patch Set 5 : diff -r b720fc58b147 https://go.googlecode.com/hg/
Patch Set 6 : diff -r b720fc58b147 https://go.googlecode.com/hg/
Patch Set 7 : diff -r b720fc58b147 https://go.googlecode.com/hg/
Patch Set 8 : diff -r b720fc58b147 https://go.googlecode.com/hg/
Total comments: 2
Patch Set 9 : diff -r b720fc58b147 https://go.googlecode.com/hg/
Patch Set 10 : diff -r f33da81baac2 https://go.googlecode.com/hg/
Patch Set 11 : diff -r f33da81baac2 https://go.googlecode.com/hg/

Messages
Total messages: 26
I think you can do this, but it will require a few more checks. They are probably cheaper than the 64-bit math, at least on the ARM.

http://codereview.appspot.com/6297047/diff/4001/src/pkg/runtime/malloc.h
File src/pkg/runtime/malloc.h (right):

http://codereview.appspot.com/6297047/diff/4001/src/pkg/runtime/malloc.h#newc...
src/pkg/runtime/malloc.h:276: uintptr size;
This can be uintptr, since it is the number of active bytes in use.

http://codereview.appspot.com/6297047/diff/4001/src/pkg/runtime/malloc.h#newc...
src/pkg/runtime/malloc.h:277: intptr local_cachealloc; // bytes allocated (or freed) from cache since last lock of heap
You can make this intptr but only if you add code to check for overflow in mcache.c, something like

	static void
	runtime.MCache_ResetStats(MCache *c)
	{
		runtime.lock(&mheap);
		runtime.purgecachedstats(c);
		runtime.unlock(&mheap);
	}

	...
	c->local_cachealloc += size;
	c->local_objects--;
	if(c->local_cachealloc > (1<<30))
		runtime.MCache_ResetStats(c);
	...
	c->local_cachealloc -= size;
	c->local_objects++;
	if(c->local_cachealloc < (-1<<30))
		runtime.MCache_ResetStats(c);

I leave it to you to determine whether that's actually a win.

http://codereview.appspot.com/6297047/diff/4001/src/pkg/runtime/malloc.h#newc...
src/pkg/runtime/malloc.h:278: intptr local_objects; // objects allocated (or freed) from cache since last lock of heap
This can be intptr without any checks, since the total number of objects that can exist is bounded by the address size divide by the minimum block size, which is something like 4 or 8 or 16. So this can never overflow.

http://codereview.appspot.com/6297047/diff/4001/src/pkg/runtime/malloc.h#newc...
src/pkg/runtime/malloc.h:279: intptr local_alloc; // bytes allocated (or freed) since last lock of heap
Same thing applies. The code that manipulates local_alloc and local_total_alloc would have to check for overflow. You should be able to check

	if(c->local_total_alloc > (1<<30) || c->local_alloc < (-1<<30))

You don't need to check c->local_alloc > 1<<30 because that would imply the first half of the if.

http://codereview.appspot.com/6297047/diff/4001/src/pkg/runtime/malloc.h#newc...
src/pkg/runtime/malloc.h:281: uintptr local_nmalloc; // number of mallocs since last lock of heap
This will always be smaller than local_total_alloc, so you don't have to check it explicitly, as long as you do check local_total_alloc.

http://codereview.appspot.com/6297047/diff/4001/src/pkg/runtime/malloc.h#newc...
src/pkg/runtime/malloc.h:282: uintptr local_nfree; // number of frees since last lock of heap
This probably needs an explicit check.

http://codereview.appspot.com/6297047/diff/4001/src/pkg/runtime/malloc.h#newc...
src/pkg/runtime/malloc.h:283: uintptr local_nlookup; // number of pointer lookups since last lock of heap
This changes during garbage collection and during ordinary use. In ordinary use it is bounded by local_nfree, so there's no need for a check. If you call cachestats(nil) after stoptheworld in runtime.gc, then it will be zero on entry to the GC and will be unable to overflow, since each lookup corresponds to an address read from a different location in memory. So if you are already checking local_nfree and add the cachestats call, this can change to uintptr.

http://codereview.appspot.com/6297047/diff/4001/src/pkg/runtime/malloc.h#newc...
src/pkg/runtime/malloc.h:287: uintptr nmalloc;
This is bounded by local_nmalloc, and similarly nfree is bounded by local_nfree. If you make it safe to make those uintptr, these can be uintptr too.
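
The flush-before-overflow idea in the comments above can be sketched outside the runtime. The following is a simplified model in Go, not the runtime's C code: the type `cacheStats`, its fields, and the 64-bit `heapTotal` that stands in for the lock-protected heap statistics are all hypothetical. It also assumes every individual size is below 1<<30, so a single update cannot wrap the 32-bit counter before the check runs.

```go
package main

import "fmt"

// flushThreshold mirrors the (1<<30) bound suggested in the review.
const flushThreshold = 1 << 30

// cacheStats is a hypothetical stand-in for the per-cache counters.
type cacheStats struct {
	localCacheAlloc int32 // narrow signed counter, as on a 32-bit target
	heapTotal       int64 // wide total, stands in for the shared heap stats
	flushes         int
}

// purge folds the narrow counter into the wide total and resets it.
func (c *cacheStats) purge() {
	c.heapTotal += int64(c.localCacheAlloc)
	c.localCacheAlloc = 0
	c.flushes++
}

// alloc records size bytes and flushes once the counter passes the bound.
func (c *cacheStats) alloc(size int32) {
	c.localCacheAlloc += size
	if c.localCacheAlloc > flushThreshold {
		c.purge()
	}
}

// free records size bytes released, checking the negative bound.
func (c *cacheStats) free(size int32) {
	c.localCacheAlloc -= size
	if c.localCacheAlloc < -flushThreshold {
		c.purge()
	}
}

func main() {
	c := &cacheStats{}
	for i := 0; i < 5; i++ {
		c.alloc(1 << 29) // five 512 MB steps would overflow int32 without flushing
	}
	fmt.Println(c.flushes, c.heapTotal+int64(c.localCacheAlloc))
}
```

The narrow counter never leaves the range of int32, while the sum of `heapTotal` and the pending local value still equals the true total.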
http://codereview.appspot.com/6297047/diff/4001/src/pkg/runtime/malloc.h
File src/pkg/runtime/malloc.h (right):

http://codereview.appspot.com/6297047/diff/4001/src/pkg/runtime/malloc.h#newc...
src/pkg/runtime/malloc.h:273: struct MCache
You only need to check these:

- local_total_alloc in function runtime·mallocgc. local_total_alloc is always the maximum value from all values in struct MCache, except local_nlookup.

- local_nlookup in function runtime·mlookup

If any of these two gets greater than (1<<30), call runtime.purgecachedstats(c) like Russ suggested.
I think you do need to check some of the others.

You can have local_nfree == 1 but local_total_alloc == 0 if the stats are purged and then the free happens. It is possible that the frees cannot stack up significantly more than that (maybe just freeing of defers?) but the check should be cheap.

Russ
On Wed, Jun 6, 2012 at 7:25 PM, Russ Cox <rsc@golang.org> wrote:
> I think you do need to check some of the others.
>
> You can have local_nfree == 1 but local_total_alloc == 0 if the stats
> are purged and then the free happens. It is possible that the frees
> cannot stack up significantly more than that (maybe just freeing of
> defers?) but the check should be cheap.
>
> Russ

1. Run runtime.purgecachedstats: local_nfree == local_total_alloc == 0.
2. Free 0x7FFFFFFF objects: local_nfree == 0x7FFFFFFF, memory usage is zero.
3. Allocate 0x7FFFFFFF objects.
4. Free 0x7FFFFFFF objects: local_nfree == 0x7FFFFFFF+0x7FFFFFFF, memory usage is zero.
5. Allocate 10 objects.
6. Free 10 objects: local_nfree overflows uintptr.

However, (3) will always trigger runtime.purgecachedstats(c), so in step (4) local_nfree == 0x7FFFFFFF. Thus local_nfree cannot overflow.
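
The argument above can be illustrated with a small simulation. This is a rough model under stated assumptions, not the runtime's logic: the `stats` type and the `simulate` function are hypothetical, counters are 32 bits wide, a purge resets every counter, and the purge check runs only on the allocation path once `totalAlloc` reaches 1<<30. Object counts are scaled down so the loop finishes quickly.

```go
package main

import "fmt"

// purgeThreshold is the bound checked on the allocation path.
const purgeThreshold = 1 << 30

// stats is a hypothetical stand-in for the per-cache counters.
type stats struct {
	totalAlloc uint32 // bytes allocated since the last purge
	nmalloc    uint32 // allocations since the last purge
	nfree      uint32 // frees since the last purge
}

// simulate runs alloc/free cycles and reports the largest nfree value seen
// and how many purges the allocation path triggered.
func simulate(cycles, perCycle int, objSize uint32) (maxNFree uint32, purges int) {
	var s stats
	for c := 0; c < cycles; c++ {
		// Allocation phase: the only place the threshold is checked.
		for i := 0; i < perCycle; i++ {
			s.totalAlloc += objSize
			s.nmalloc++
			if s.totalAlloc >= purgeThreshold {
				s = stats{} // purge: fold into global stats, reset locals
				purges++
			}
		}
		// Free phase: nfree grows without any check of its own.
		for i := 0; i < perCycle; i++ {
			s.nfree++
			if s.nfree > maxNFree {
				maxNFree = s.nfree
			}
		}
	}
	return maxNFree, purges
}

func main() {
	// 64 cycles of 65536 objects, 4096 bytes each (illustrative numbers).
	maxNFree, purges := simulate(64, 1<<16, 4096)
	fmt.Println("max nfree:", maxNFree, "purges:", purges)
}
```

Because frees have to be preceded by allocations, and allocations keep tripping the byte threshold, `nfree` is reset long before it can approach the 32-bit limit in this model.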
Nice.
Hello. Thank you for your comments. I've attempted to incorporate Atom's fast checking solution based on local_total_alloc and local_nlookup and it appears to have no observable cost penalty (results for amd64 are unaffected, as you would hope).

Please take another look.

Having said that, I'm concerned that I'm punching above my weight so would ask if others with more experience would like to adopt this patch.

http://codereview.appspot.com/6297047/diff/4001/src/pkg/runtime/malloc.h
File src/pkg/runtime/malloc.h (right):

http://codereview.appspot.com/6297047/diff/4001/src/pkg/runtime/malloc.h#newc...
src/pkg/runtime/malloc.h:273: struct MCache
On 2012/06/06 16:58:02, atom wrote:
> You only need to check these:
>
> - local_total_alloc in function runtime·mallocgc. local_total_alloc is always
> the maximum value from all values in struct MCache, except local_nlookup.
>
> - local_nlookup in function runtime·mlookup
>
> If any of these two gets greater than (1<<30), call runtime.purgecachedstats(c)
> like Russ suggested.

Done.

http://codereview.appspot.com/6297047/diff/4001/src/pkg/runtime/malloc.h#newc...
src/pkg/runtime/malloc.h:276: uintptr size;
On 2012/06/06 14:55:35, rsc wrote:
> This can be uintptr, since it is the number of active bytes in use.

Done.

http://codereview.appspot.com/6297047/diff/4001/src/pkg/runtime/malloc.h#newc...
src/pkg/runtime/malloc.h:278: intptr local_objects; // objects allocated (or freed) from cache since last lock of heap
On 2012/06/06 14:55:35, rsc wrote:
> This can be intptr without any checks, since the total number of objects that
> can exist is bounded by the address size divide by the minimum block size, which
> is something like 4 or 8 or 16. So this can never overflow.
>

Done.
http://codereview.appspot.com/6297047/diff/8003/src/pkg/runtime/malloc.goc
File src/pkg/runtime/malloc.goc (right):

http://codereview.appspot.com/6297047/diff/8003/src/pkg/runtime/malloc.goc#ne...
src/pkg/runtime/malloc.goc:76: if (sizeof(void*) == 4 && c->local_total_alloc > (1<<30)) {
s/>/>=/
s/if (/if(/

http://codereview.appspot.com/6297047/diff/8003/src/pkg/runtime/malloc.goc#ne...
src/pkg/runtime/malloc.goc:181: if (sizeof(void*) == 4 && m->mcache->local_nlookup > (1<<30)) {
s/>/>=/
s/if (/if(/
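
The `sizeof(void*) == 4` guard in the lines under review is a compile-time constant in C, which is presumably why 64-bit targets see no cost from the check. A rough Go analogue, offered only as an illustration: `needsPurge` and `is32Bit` are made-up names, and `bits.UintSize` from the standard library stands in for the pointer-width test.

```go
package main

import (
	"fmt"
	"math/bits"
)

// purgeThreshold is the bound used in the reviewed checks.
const purgeThreshold = 1 << 30

// is32Bit is a constant, so on 64-bit targets the compiler can drop the
// whole comparison in needsPurge.
const is32Bit = bits.UintSize == 32

// needsPurge reports whether a word-sized counter has reached the bound.
// It uses >= as requested in the review, and is always false on 64-bit.
func needsPurge(localTotalAlloc uintptr) bool {
	return is32Bit && localTotalAlloc >= purgeThreshold
}

func main() {
	fmt.Println("32-bit target:", is32Bit)
	fmt.Println("purge at 1<<30:", needsPurge(1<<30))
}
```

On a 64-bit build both lines print false; on a 32-bit build both print true.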
On Thu, Jun 7, 2012 at 7:08 AM, <dave@cheney.net> wrote:
> Hello. Thank you for your comments. I've attempted to incorporate Atom's
> fast checking solution based on local_total_alloc and local_nlookup and
> it appears to have no observable cost penalty (results for amd64 are
> unaffected, as you would hope).

Looks good to me.

> Having said that, I'm concerned that I'm punching above my weight so
> would ask if others with more experience would like to adopt this patch.

I appreciate your caution, but the change is still quite small and easy to review. I don't think one of us needs to take it over. Please run hg mail to send it to golang-dev.

Thanks!
Russ
Hello rsc@golang.org, 0xe2.0x9a.0x9b@gmail.com (cc: golang-dev@googlegroups.com, minux.ma@gmail.com), I'd like you to review this change to https://go.googlecode.com/hg/
>= in both tests please.
Hello rsc@golang.org, 0xe2.0x9a.0x9b@gmail.com (cc: golang-dev@googlegroups.com, minux.ma@gmail.com), I'd like you to review this change to https://go.googlecode.com/hg/
linux/arm GOARM=5
benchmark old ns/op new ns/op delta
BenchmarkBinaryTree17 115763682000 69593536000 -39.88%
BenchmarkFannkuch11 64647655000 64645548000 -0.00%
BenchmarkGobDecode 492308180 404399700 -17.86%
BenchmarkGobEncode 178356180 175380790 -1.67%
BenchmarkGzip 12400566000 11852504000 -4.42%
BenchmarkGunzip 2239261800 2240632500 +0.06%
BenchmarkJSONEncode 2333923200 2163537800 -7.30%
BenchmarkJSONDecode 8216951000 6840673400 -16.75%
BenchmarkMandelbrot200 13968008000 14094478000 +0.91%
BenchmarkParse 195200060 168838930 -13.50%
BenchmarkTemplate 11135888000 9484298600 -14.83%
benchmark old MB/s new MB/s speedup
BenchmarkGobDecode 1.56 1.90 1.22x
BenchmarkGobEncode 4.30 4.38 1.02x
BenchmarkGzip 1.56 1.64 1.05x
BenchmarkGunzip 8.67 8.66 1.00x
BenchmarkJSONEncode 0.83 0.90 1.08x
BenchmarkJSONDecode 0.24 0.28 1.17x
BenchmarkParse 0.30 0.34 1.13x
BenchmarkTemplate 0.17 0.20 1.18x
linux/386:
benchmark old ns/op new ns/op delta
BenchmarkBinaryTree17 9404107000 9602360000 +2.11%
BenchmarkFannkuch11 10682824000 10813868000 +1.23%
BenchmarkGobDecode 31227240 31397420 +0.54%
BenchmarkGobEncode 20020010 18816970 -6.01%
BenchmarkGzip 3284496000 3273865000 -0.32%
BenchmarkGunzip 327753400 331800800 +1.23%
BenchmarkJSONEncode 175218800 177365100 +1.22%
BenchmarkJSONDecode 911270800 884113600 -2.98%
BenchmarkMandelbrot200 14305160 14285810 -0.14%
BenchmarkParse 11916440 12743660 +6.94%
BenchmarkRevcomp 3063132000 3127920000 +2.12%
BenchmarkTemplate 798228600 848137400 +6.25%
benchmark old MB/s new MB/s speedup
BenchmarkGobDecode 24.58 24.45 0.99x
BenchmarkGobEncode 38.34 40.79 1.06x
BenchmarkGzip 5.91 5.93 1.00x
BenchmarkGunzip 59.21 58.48 0.99x
BenchmarkJSONEncode 11.07 10.94 0.99x
BenchmarkJSONDecode 2.13 2.19 1.03x
BenchmarkParse 4.86 4.55 0.94x
BenchmarkRevcomp 82.98 81.26 0.98x
BenchmarkTemplate 2.43 2.29 0.94x
i'm very skeptical
> i'm very skeptical

about the tests or about the patch?
The patch seems fine. It's hard for me to believe that a new if statement caused 6% overhead in the linux/386 benchmarks.

Russ
To make sure I wasn't getting wild swings in test results I reran the before and after tests 5 times each and compared all possible 2-combinations of {old, new}. the results seem statistically plausible at first glance. the widest swings are in GobDecode/GobEncode, but there are orders of magnitude differences cropping up elsewhere.

The machine isn't quiescent, the load is 0.2 currently. I can't rerun the tests on my "quiet" machine since it's out of order. One can discard these if they're not representative and I'll rerun the tests if I get a hold of my machine and post the results later. sorry for the noise.

linux/386:
BenchmarkBinaryTree17 9277363000 9582318000 +3.29%
BenchmarkBinaryTree17 9277363000 9752558000 +5.12%
BenchmarkBinaryTree17 9278767000 9582318000 +3.27%
BenchmarkBinaryTree17 9301428000 9581900000 +3.02%
BenchmarkBinaryTree17 9301428000 9582318000 +3.02%
BenchmarkBinaryTree17 9301428000 9585110000 +3.05%
BenchmarkBinaryTree17 9301428000 9752558000 +4.85%
BenchmarkBinaryTree17 9317258000 9581900000 +2.84%
BenchmarkBinaryTree17 9317258000 9582318000 +2.84%
BenchmarkBinaryTree17 9317258000 9752558000 +4.67%
BenchmarkFannkuch11 10685475000 10689056000 +0.03%
BenchmarkFannkuch11 10685475000 10706592000 +0.20%
BenchmarkFannkuch11 10686551000 10689056000 +0.02%
BenchmarkFannkuch11 10686551000 10689520000 +0.03%
BenchmarkFannkuch11 10686551000 10706592000 +0.19%
BenchmarkFannkuch11 10686551000 10706815000 +0.19%
BenchmarkFannkuch11 10687061000 10689056000 +0.02%
BenchmarkFannkuch11 10687061000 10689520000 +0.02%
BenchmarkFannkuch11 10687061000 10706592000 +0.18%
BenchmarkFannkuch11 10705983000 10706592000 +0.01%
BenchmarkGobDecode 29382830 31245560 +6.34%
BenchmarkGobDecode 29539720 31245560 +5.77%
BenchmarkGobDecode 29539720 33493600 +13.38%
BenchmarkGobDecode 29540580 31018220 +5.00%
BenchmarkGobDecode 29540580 31245560 +5.77%
BenchmarkGobDecode 29540580 31871920 +7.89%
BenchmarkGobDecode 29540580 33493600 +13.38%
BenchmarkGobDecode 29672330 31018220 +4.54%
BenchmarkGobDecode 29672330 31245560 +5.30%
BenchmarkGobDecode 29672330 33493600 +12.88%
BenchmarkGobEncode 18533560 18770980 +1.28%
BenchmarkGobEncode 18533560 18813260 +1.51%
BenchmarkGobEncode 18533560 18862960 +1.78%
BenchmarkGobEncode 18533560 20190290 +8.94%
BenchmarkGobEncode 18579000 18813260 +1.26%
BenchmarkGobEncode 19017650 18813260 -1.07%
BenchmarkGobEncode 19017650 20190290 +6.17%
BenchmarkGobEncode 19035020 18770980 -1.39%
BenchmarkGobEncode 19035020 18813260 -1.17%
BenchmarkGobEncode 19035020 20190290 +6.07%
BenchmarkGunzip 326639600 327246200 +0.19%
BenchmarkGunzip 326639600 327850000 +0.37%
BenchmarkGunzip 326643400 327165000 +0.16%
BenchmarkGunzip 326643400 327246200 +0.18%
BenchmarkGunzip 326643400 327850000 +0.37%
BenchmarkGunzip 326643400 329895800 +1.00%
BenchmarkGunzip 326747600 327165000 +0.13%
BenchmarkGunzip 326747600 327246200 +0.15%
BenchmarkGunzip 326747600 327850000 +0.34%
BenchmarkGunzip 326898000 327850000 +0.29%
BenchmarkGzip 3273516000 3281478000 +0.24%
BenchmarkGzip 3273516000 3288598000 +0.46%
BenchmarkGzip 3273516000 3357398000 +2.56%
BenchmarkGzip 3275122000 3288598000 +0.41%
BenchmarkGzip 3275906000 3281478000 +0.17%
BenchmarkGzip 3275906000 3288598000 +0.39%
BenchmarkGzip 3276049000 3277060000 +0.03%
BenchmarkGzip 3276049000 3281478000 +0.17%
BenchmarkGzip 3276049000 3288598000 +0.38%
BenchmarkGzip 3276049000 3357398000 +2.48%
BenchmarkJSONDecode 886322600 874272200 -1.36%
BenchmarkJSONDecode 886322600 875726200 -1.20%
BenchmarkJSONDecode 887249200 874272200 -1.46%
BenchmarkJSONDecode 887281000 874272200 -1.47%
BenchmarkJSONDecode 887281000 875726200 -1.30%
BenchmarkJSONDecode 887281000 876922200 -1.17%
BenchmarkJSONDecode 889470600 870878000 -2.09%
BenchmarkJSONDecode 889470600 874272200 -1.71%
BenchmarkJSONDecode 889470600 875726200 -1.55%
BenchmarkJSONDecode 889470600 876922200 -1.41%
BenchmarkJSONEncode 171035600 173433200 +1.40%
BenchmarkJSONEncode 171224900 173127600 +1.11%
BenchmarkJSONEncode 171224900 173433200 +1.29%
BenchmarkJSONEncode 171224900 178469900 +4.23%
BenchmarkJSONEncode 171224900 179335500 +4.74%
BenchmarkJSONEncode 171502500 173433200 +1.13%
BenchmarkJSONEncode 171502500 178469900 +4.06%
BenchmarkJSONEncode 171502500 179335500 +4.57%
BenchmarkJSONEncode 171614800 173433200 +1.06%
BenchmarkJSONEncode 171614800 178469900 +3.99%
BenchmarkMandelbrot200 14287420 14280060 -0.05%
BenchmarkMandelbrot200 14287420 14284540 -0.02%
BenchmarkMandelbrot200 14287420 14287830 +0.00%
BenchmarkMandelbrot200 14287420 14305550 +0.13%
BenchmarkMandelbrot200 14287470 14284540 -0.02%
BenchmarkMandelbrot200 14287470 14287830 +0.00%
BenchmarkMandelbrot200 14287470 14305550 +0.13%
BenchmarkMandelbrot200 14290430 14287830 -0.02%
BenchmarkMandelbrot200 14290430 14305550 +0.11%
BenchmarkMandelbrot200 14292580 14305550 +0.09%
BenchmarkParse 11959530 12069280 +0.92%
BenchmarkParse 11959530 12075110 +0.97%
BenchmarkParse 11959530 12773940 +6.81%
BenchmarkParse 11986410 11776140 -1.75%
BenchmarkParse 11986410 12069280 +0.69%
BenchmarkParse 11986410 12075110 +0.74%
BenchmarkParse 11986410 12773940 +6.57%
BenchmarkParse 11993180 12069280 +0.63%
BenchmarkParse 12023210 12069280 +0.38%
BenchmarkParse 12023210 12075110 +0.43%
BenchmarkRevcomp 3024124000 3074518000 +1.67%
BenchmarkRevcomp 3024124000 3125252000 +3.34%
BenchmarkRevcomp 3024124000 3127247000 +3.41%
BenchmarkRevcomp 3054404000 3074518000 +0.66%
BenchmarkRevcomp 3054404000 3127247000 +2.38%
BenchmarkRevcomp 3055270000 3066376000 +0.36%
BenchmarkRevcomp 3055270000 3074518000 +0.63%
BenchmarkRevcomp 3055270000 3125252000 +2.29%
BenchmarkRevcomp 3055270000 3127247000 +2.36%
BenchmarkRevcomp 3062357000 3074518000 +0.40%
BenchmarkTemplate 795127000 811485400 +2.06%
BenchmarkTemplate 795127000 842240400 +5.93%
BenchmarkTemplate 795723400 808890200 +1.65%
BenchmarkTemplate 795723400 811485400 +1.98%
BenchmarkTemplate 795723400 842240400 +5.85%
BenchmarkTemplate 795723400 847747400 +6.54%
BenchmarkTemplate 795879200 811485400 +1.96%
BenchmarkTemplate 795879200 842240400 +5.83%
BenchmarkTemplate 795879200 847747400 +6.52%
BenchmarkTemplate 796132000 811485400 +1.93%
On Fri, Jun 8, 2012 at 4:46 PM, andrey mirtchovski <mirtchovski@gmail.com> wrote:
> BenchmarkParse 11959530 12069280 +0.92%
> BenchmarkParse 11959530 12075110 +0.97%
> BenchmarkParse 11959530 12773940 +6.81%
> BenchmarkParse 11986410 11776140 -1.75%
> BenchmarkParse 11986410 12069280 +0.69%
> BenchmarkParse 11986410 12075110 +0.74%
> BenchmarkParse 11986410 12773940 +6.57%
> BenchmarkParse 11993180 12069280 +0.63%
> BenchmarkParse 12023210 12069280 +0.38%
> BenchmarkParse 12023210 12075110 +0.43%

That's interesting.
On an idle Core i7,

benchmark old MB/s new MB/s speedup
BenchmarkGobDecode 37.85 38.00 1.00x
BenchmarkGobEncode 63.15 63.02 1.00x
BenchmarkGzip 13.45 13.61 1.01x
BenchmarkGunzip 74.22 73.98 1.00x
BenchmarkJSONEncode 16.53 16.53 1.00x
BenchmarkJSONDecode 3.19 3.19 1.00x
BenchmarkParse 6.87 6.80 0.99x
BenchmarkRevcomp 127.03 125.38 0.99x
BenchmarkTemplate 3.71 3.67 0.99x

Russ
LGTM
*** Submitted as http://code.google.com/p/go/source/detail?r=d2d54e5b3317 ***

runtime: use uintptr where possible in malloc stats

linux/arm OMAP4 pandaboard
benchmark old ns/op new ns/op delta
BenchmarkBinaryTree17 68723297000 37026214000 -46.12%
BenchmarkFannkuch11 34962402000 35958435000 +2.85%
BenchmarkGobDecode 137298600 124182150 -9.55%
BenchmarkGobEncode 60717160 60006700 -1.17%
BenchmarkGzip 5647156000 5550873000 -1.70%
BenchmarkGunzip 1196350000 1198670000 +0.19%
BenchmarkJSONEncode 863012800 782898000 -9.28%
BenchmarkJSONDecode 3312989000 2781800000 -16.03%
BenchmarkMandelbrot200 45727540 45703120 -0.05%
BenchmarkParse 74781800 59990840 -19.78%
BenchmarkRevcomp 140043650 139462300 -0.42%
BenchmarkTemplate 6467682000 5832153000 -9.83%
benchmark old MB/s new MB/s speedup
BenchmarkGobDecode 5.59 6.18 1.11x
BenchmarkGobEncode 12.64 12.79 1.01x
BenchmarkGzip 3.44 3.50 1.02x
BenchmarkGunzip 16.22 16.19 1.00x
BenchmarkJSONEncode 2.25 2.48 1.10x
BenchmarkJSONDecode 0.59 0.70 1.19x
BenchmarkParse 0.77 0.97 1.26x
BenchmarkRevcomp 18.15 18.23 1.00x
BenchmarkTemplate 0.30 0.33 1.10x

darwin/386 core duo
benchmark old ns/op new ns/op delta
BenchmarkBinaryTree17 10591616577 9678245733 -8.62%
BenchmarkFannkuch11 10758473315 10749303846 -0.09%
BenchmarkGobDecode 34379785 34121250 -0.75%
BenchmarkGobEncode 23523721 23475750 -0.20%
BenchmarkGzip 2486191492 2446539568 -1.59%
BenchmarkGunzip 444179328 444250293 +0.02%
BenchmarkJSONEncode 221138507 219757826 -0.62%
BenchmarkJSONDecode 1056034428 1048975133 -0.67%
BenchmarkMandelbrot200 19862516 19868346 +0.03%
BenchmarkRevcomp 3742610872 3724821662 -0.48%
BenchmarkTemplate 960283112 944791517 -1.61%
benchmark old MB/s new MB/s speedup
BenchmarkGobDecode 22.33 22.49 1.01x
BenchmarkGobEncode 32.63 32.69 1.00x
BenchmarkGzip 7.80 7.93 1.02x
BenchmarkGunzip 43.69 43.68 1.00x
BenchmarkJSONEncode 8.77 8.83 1.01x
BenchmarkJSONDecode 1.84 1.85 1.01x
BenchmarkRevcomp 67.91 68.24 1.00x
BenchmarkTemplate 2.02 2.05 1.01x

R=rsc, 0xe2.0x9a.0x9b, mirtchovski
CC=golang-dev, minux.ma
http://codereview.appspot.com/6297047

Committer: Russ Cox <rsc@golang.org>
just a bit more noise. I did full 25 permutations since i realized that 'old' and 'new' are different sets. each result gives mean and stddev of the diff % for the 25 comparisons between pre- and post- CL, always comparing old vs new:

linux/386, 4-core opteron 2216:

BenchmarkBinaryTree17 5.5224 4.23802
BenchmarkFannkuch11 0.01 0.111319
BenchmarkGobDecode 7.5264 0.889871
BenchmarkGobEncode -0.6116 0.754298
BenchmarkGunzip 0.0108 0.0651718
BenchmarkGzip -0.0452 0.168193
BenchmarkJSONDecode -1.6704 0.28888
BenchmarkJSONEncode 1.672 0.380967
BenchmarkMandelbrot200 0.0204 0.0650218
BenchmarkParse 1.6252 2.74243
BenchmarkRevcomp 0.0352 0.873156
BenchmarkTemplate 2.0496 1.0998

linux/386, core2duo:

BenchmarkBinaryTree17 -7.3452 0.167359
BenchmarkFannkuch11 -0.7848 0.0179154
BenchmarkGobDecode 0.4256 0.743048
BenchmarkGobEncode -0.1888 0.0703886
BenchmarkGunzip 4.1372 2.11495
BenchmarkGzip 0.2104 0.572772
BenchmarkJSONDecode -0.4104 0.230104
BenchmarkJSONEncode -0.1228 0.130369
BenchmarkMandelbrot200 0.0008 0.00271293
BenchmarkParse -1.0472 0.119148
BenchmarkRevcomp -0.0776 0.0667101
BenchmarkTemplate -1.2672 0.0665444
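
A minimal sketch of the pairwise comparison described above: every old run is compared against every new run, and the mean and standard deviation of the percentage deltas are reported. The function name `pairwiseDeltaStats` is hypothetical, the use of the population standard deviation is an assumption (the message does not say which variant was used), and the timings in `main` are invented for illustration rather than taken from the thread.

```go
package main

import (
	"fmt"
	"math"
)

// pairwiseDeltaStats compares every "before" timing with every "after"
// timing and returns the mean and population standard deviation of the
// percentage deltas. With 5 runs on each side this gives 25 comparisons.
func pairwiseDeltaStats(before, after []float64) (mean, stddev float64) {
	var deltas []float64
	for _, b := range before {
		for _, a := range after {
			deltas = append(deltas, (a-b)/b*100)
		}
	}
	if len(deltas) == 0 {
		return 0, 0
	}
	for _, d := range deltas {
		mean += d
	}
	mean /= float64(len(deltas))

	var sumSquares float64
	for _, d := range deltas {
		sumSquares += (d - mean) * (d - mean)
	}
	stddev = math.Sqrt(sumSquares / float64(len(deltas)))
	return mean, stddev
}

func main() {
	// Invented ns/op values, five runs per side.
	before := []float64{1000, 1010, 990, 1005, 995}
	after := []float64{1020, 1030, 1015, 1025, 1010}
	mean, stddev := pairwiseDeltaStats(before, after)
	fmt.Printf("mean %.4f%% stddev %.4f\n", mean, stddev)
}
```

A mean delta that is small relative to its standard deviation, as in several rows above, suggests the difference is within run-to-run noise.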
Thanks for the thorough testing. Would you be able to try with the two overflow checks removed from malloc.goc and all.bash rerun? I agree with rsc that these should not add significantly to the overhead, but it would be nice to know.

Sent from my iPad

On 09/06/2012, at 8:03, andrey mirtchovski <mirtchovski@gmail.com> wrote:
> just a bit more noise. I did full 25 permutations since i realized
> that 'old' and 'new' are different sets. each result gives mean and
> stddev of the diff % for the 25 comparisons between pre- and post- CL,
> always comparing old vs new:
>
> linux/386, 4-core opteron 2216:
>
> BenchmarkBinaryTree17 5.5224 4.23802
> BenchmarkFannkuch11 0.01 0.111319
> BenchmarkGobDecode 7.5264 0.889871
> BenchmarkGobEncode -0.6116 0.754298
> BenchmarkGunzip 0.0108 0.0651718
> BenchmarkGzip -0.0452 0.168193
> BenchmarkJSONDecode -1.6704 0.28888
> BenchmarkJSONEncode 1.672 0.380967
> BenchmarkMandelbrot200 0.0204 0.0650218
> BenchmarkParse 1.6252 2.74243
> BenchmarkRevcomp 0.0352 0.873156
> BenchmarkTemplate 2.0496 1.0998
>
> linux/386, core2duo:
>
> BenchmarkBinaryTree17 -7.3452 0.167359
> BenchmarkFannkuch11 -0.7848 0.0179154
> BenchmarkGobDecode 0.4256 0.743048
> BenchmarkGobEncode -0.1888 0.0703886
> BenchmarkGunzip 4.1372 2.11495
> BenchmarkGzip 0.2104 0.572772
> BenchmarkJSONDecode -0.4104 0.230104
> BenchmarkJSONEncode -0.1228 0.130369
> BenchmarkMandelbrot200 0.0008 0.00271293
> BenchmarkParse -1.0472 0.119148
> BenchmarkRevcomp -0.0776 0.0667101
> BenchmarkTemplate -1.2672 0.0665444
linux/386, core2duo, pre-patch vs CL minus malloc.goc overflow checks:

benchmark old ns/op new ns/op delta
BenchmarkBinaryTree17 7088561000 6787030000 -4.25%
BenchmarkFannkuch11 6707543000 6660017000 -0.71%
BenchmarkGobDecode 24963420 25433580 +1.88%
BenchmarkGobEncode 15112400 15099320 -0.09%
BenchmarkGzip 1824614000 1822942000 -0.09%
BenchmarkGunzip 292055600 291917200 -0.05%
BenchmarkJSONEncode 152174600 151412500 -0.50%
BenchmarkJSONDecode 756780400 750679600 -0.81%
BenchmarkMandelbrot200 18858590 18860390 +0.01%
BenchmarkParse 10527070 10497640 -0.28%
BenchmarkRevcomp 2656891000 2657431000 +0.02%
BenchmarkTemplate 683705400 672919800 -1.58%
benchmark old MB/s new MB/s speedup
BenchmarkGobDecode 30.75 30.18 0.98x
BenchmarkGobEncode 50.79 50.83 1.00x
BenchmarkGzip 10.63 10.64 1.00x
BenchmarkGunzip 66.44 66.47 1.00x
BenchmarkJSONEncode 12.75 12.82 1.01x
BenchmarkJSONDecode 2.56 2.58 1.01x
BenchmarkParse 5.50 5.52 1.00x
BenchmarkRevcomp 95.66 95.64 1.00x
BenchmarkTemplate 2.84 2.88 1.01x

linux/386, core2duo, CL applied vs CL minus malloc.goc overflow checks:

benchmark old ns/op new ns/op delta
BenchmarkBinaryTree17 6552946000 6787030000 +3.57%
BenchmarkFannkuch11 6653561000 6660017000 +0.10%
BenchmarkGobDecode 25376790 25433580 +0.22%
BenchmarkGobEncode 15073620 15099320 +0.17%
BenchmarkGzip 1822452000 1822942000 +0.03%
BenchmarkGunzip 306155400 291917200 -4.65%
BenchmarkJSONEncode 151789000 151412500 -0.25%
BenchmarkJSONDecode 753945000 750679600 -0.43%
BenchmarkMandelbrot200 18859190 18860390 +0.01%
BenchmarkParse 10402120 10497640 +0.92%
BenchmarkRevcomp 2654264000 2657431000 +0.12%
BenchmarkTemplate 675202400 672919800 -0.34%
benchmark old MB/s new MB/s speedup
BenchmarkGobDecode 30.25 30.18 1.00x
BenchmarkGobEncode 50.92 50.83 1.00x
BenchmarkGzip 10.65 10.64 1.00x
BenchmarkGunzip 63.38 66.47 1.05x
BenchmarkJSONEncode 12.78 12.82 1.00x
BenchmarkJSONDecode 2.57 2.58 1.00x
BenchmarkParse 5.57 5.52 0.99x
BenchmarkRevcomp 95.76 95.64 1.00x
BenchmarkTemplate 2.87 2.88 1.00x

linux/386, core2duo, pre-patch vs CL minus malloc.goc overflow checks, 5 tests performed for each. this shows mean and stddev for the difference percentage (the 'delta' value reported by benchcmp) of all the permutations between the two sets, 25 comparisons per test.

BenchmarkBinaryTree17 -4.1684 0.174417
BenchmarkFannkuch11 -0.6772 0.0156256
BenchmarkGobDecode 1.0524 0.519679
BenchmarkGobEncode -0.0972 0.0628344
BenchmarkGunzip 0.1776 0.189658
BenchmarkGzip -0.1232 0.1576
BenchmarkJSONDecode -0.9616 0.172411
BenchmarkJSONEncode -0.6172 0.076185
BenchmarkMandelbrot200 0.008 0.00748331
BenchmarkParse -0.4492 0.0889908
BenchmarkRevcomp -0.016 0.0766812
BenchmarkTemplate -1.7688 0.11194

linux/386, core2duo, CL applied vs CL minus overflow checks:

BenchmarkBinaryTree17 3.4272 0.102392
BenchmarkFannkuch11 0.1084 0.00783837
BenchmarkGobDecode 0.6288 0.629948
BenchmarkGobEncode 0.0912 0.0449284
BenchmarkGunzip -3.762 2.01043
BenchmarkGzip -0.3304 0.576989
BenchmarkJSONDecode -0.5512 0.166405
BenchmarkJSONEncode -0.4952 0.124615
BenchmarkMandelbrot200 0.0052 0.0064
BenchmarkParse 0.6048 0.137
BenchmarkRevcomp 0.062 0.0928224
BenchmarkTemplate -0.5088 0.096263
This smells like more memory layout nonsense. The tweaks are changing which lines of code sit where in memory and causing strange changes in tight loops.

Russ
Thanks Russ and Andrey.

On Sat, Jun 9, 2012 at 8:43 AM, Russ Cox <rsc@golang.org> wrote:
> This smells like more memory layout nonsense. The tweaks are changing
> which lines of code sit where in memory and causing strange changes in
> tight loops.
>
> Russ