Issue 5570044: code review 5570044: math/big: slight improvement to algorithm used for inte...


Created:
13 years, 5 months ago by dga

Modified:
13 years, 5 months ago

Reviewers:
brtzsnr, gri

CC:
golang-dev, dave_cheney.net, eds2, gri

Visibility:
Public.

Description

math/big: slight improvement to algorithm used for internal bitLen function

The bitLen function currently shifts out blocks of 8 bits at a time.
This change replaces this sorta-linear algorithm with a log(N)
one (shift out 16 bits, then 8, then 4, then 2, then 1).
I left the start of it linear at 16 bits at a time so that
the function continues to work with 32 or 64 bit values
without any funkiness.
The algorithm is similar to several of the nlz ("number of
leading zeros") algorithms from "Hacker's Delight" or the
"bit twiddling hacks" pages.

Doesn't make a big difference to the existing benchmarks, but
I'm using the code in a different context that calls bitLen
much more often, so it seemed worthwhile making the existing
codebase faster so that it's a better building block.

Microbenchmark results on a 64-bit Macbook Pro using 6g from weekly.2012-01-20:

benchmark                old ns/op    new ns/op    delta
big.BenchmarkBitLen0             4            6  +50.12%
big.BenchmarkBitLen1             4            6  +33.91%
big.BenchmarkBitLen2             6            6   +3.05%
big.BenchmarkBitLen3             7            6  -19.05%
big.BenchmarkBitLen4             9            6  -30.19%
big.BenchmarkBitLen5            11            6  -42.23%
big.BenchmarkBitLen8            16            6  -61.78%
big.BenchmarkBitLen9             5            6  +18.29%
big.BenchmarkBitLen16           18            7  -60.99%
big.BenchmarkBitLen17            7            6   -4.64%
big.BenchmarkBitLen31           19            7  -62.49%

On an ARM machine (with the previous weekly):

benchmark                old ns/op    new ns/op    delta
big.BenchmarkBitLen0            37           50  +36.56%
big.BenchmarkBitLen1            59           51  -13.69%
big.BenchmarkBitLen2            74           59  -20.40%
big.BenchmarkBitLen3            92           60  -34.89%
big.BenchmarkBitLen4           110           59  -46.09%
big.BenchmarkBitLen5           127           60  -52.68%
big.BenchmarkBitLen8           181           59  -67.24%
big.BenchmarkBitLen9            78           60  -23.05%
big.BenchmarkBitLen16          199           69  -65.13%
big.BenchmarkBitLen17           91           70  -23.17%
big.BenchmarkBitLen31          210           95  -54.43%
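The two schemes named in the description can be sketched as follows. This is a reconstruction from the prose only, not the code in arith.go: the function names bitLenOld and bitLenNew, the use of uint64 in place of the package's word type, and the exact thresholds are all assumptions made for illustration.

```go
package main

// bitLenOld sketches the "sorta-linear" scheme: strip whole bytes first,
// then count the remaining bits one at a time.
func bitLenOld(x uint64) (n int) {
	for ; x >= 0x100; x >>= 8 {
		n += 8
	}
	for ; x > 0; x >>= 1 {
		n++
	}
	return
}

// bitLenNew sketches the log(N) scheme: strip 16-bit blocks in a loop, so
// the same code serves 32- and 64-bit words, then test 8, 4, 2 and 1 bits.
func bitLenNew(x uint64) (n int) {
	for ; x >= 0x10000; x >>= 16 {
		n += 16
	}
	if x >= 0x100 {
		x >>= 8
		n += 8
	}
	if x >= 0x10 {
		x >>= 4
		n += 4
	}
	if x >= 0x4 {
		x >>= 2
		n += 2
	}
	if x >= 0x2 {
		x >>= 1
		n++
	}
	if x != 0 {
		n++ // the one remaining bit
	}
	return
}
```

In the old sketch a value just below a byte boundary (8 or 16 bits long) pays for up to eight single-bit iterations, while a value just above it (9 or 17 bits) pays for only one, which is consistent with the uneven "old ns/op" column above. The new sketch does roughly the same amount of work for every input.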

Patch Set 1 #

Patch Set 2 : diff -r 9f2be4fbbf69 https://go.googlecode.com/hg/ #

Patch Set 3 : diff -r 63a6abde14b1 https://go.googlecode.com/hg/ #

Patch Set 4 : diff -r 63a6abde14b1 https://go.googlecode.com/hg/ #

Total comments: 2

Patch Set 5 : diff -r 63a6abde14b1 https://go.googlecode.com/hg/ #

Created: 13 years, 5 months ago


Stats: +36 lines, -2 lines

M  src/pkg/math/big/arith.go        1 chunk   +14 lines, -2 lines   0 comments
M  src/pkg/math/big/arith_test.go   1 chunk   +22 lines, -0 lines   0 comments

Messages

Total messages: 18


dga

Hello golang-dev@googlegroups.com (cc: golang-dev@googlegroups.com), I'd like you to review this change to https://go.googlecode.com/hg/

13 years, 5 months ago (2012-01-23 06:32:23 UTC) #1

dga

Must be close to bedtime. In the microbenchmark results, Bitl2 is the *old* implementation, obviously.

13 years, 5 months ago (2012-01-23 06:35:57 UTC) #2

dave_cheney.net

Excellent, very pleased to see the results on arm are also improved.

13 years, 5 months ago (2012-01-23 07:37:59 UTC) #3

eds2

If this makes a big difference for you, you could implement an i386/amd64 assembly version ...

13 years, 5 months ago (2012-01-23 09:33:23 UTC) #4

gri

This looks good, but I remember having done exactly the same when I tweaked the ...

13 years, 5 months ago (2012-01-23 17:20:51 UTC) #5

dga

What range of numbers do you mean for "small?" The benchmark results I posted above ...

13 years, 5 months ago (2012-01-23 18:25:44 UTC) #6

gri


13 years, 5 months ago (2012-01-23 18:53:33 UTC) #7

On Mon, Jan 23, 2012 at 10:25 AM,  <dave.andersen@gmail.com> wrote:
> What range of numbers do you mean for "small?"  The benchmark results I
> posted above considered small as 0 - 32, and large as picked at random

Sounds good. Please just add a benchmark so we can track this over time.

> between 0 - 1B.  Both of those ranges were substantially faster on both
> 64bit mac and 32bit ARM.
>
> Note that a lot of the speedup is apparent by examination.  The new code

I am not disagreeing with you.

> executes 5 conditional operations regardless of the number being
> examined, and [0-5]*2 arithmetic ops.  The old code executed between
> 0-12 conditional and arithmetic operations:
>
> Number   Old code ops   New code ops
> 0        2C, 0A         5C, 0A
> 1        3C, 1A         5C, 1A
> 2-3      4C, 2A         5C, 1A   <-- win some, lose some, arch dep.
> 4-7      5C, 3A         5C, 2A   <-- win
> 8-15     6C, 4A         5C, 1A   <-- huge win.
> 16-31    7C, 5A         5C, 2A   <-- as above.
> 2^32-1   14C, 12A       6C, 6A
>
> The only reason I'm asking is that this is a very specific benchmark,
> but it takes a fair bit of evaluation to comprehensively see where the
> differences arise.  People seem to not like having too much time taken
> up by the benchmark results.  I'm happy to contribute the benchmarks,
> since I've already written them. :)

It seems to me that the benchmark should just test one number for each
length; so it's just a loop calling bitLen for 32 or 64 numbers.

- gri
>
>
> On 2012/01/23 17:20:51, gri wrote:
>> This looks good, but I remember having done exactly the same when I
>> tweaked the bitLen implementation a while ago with not so good results
>> for small numbers (it got significantly slower).
>>
>> These kind of benchmarks can show significant differences on different
>> machines.
>>
>> For a start, can you please add a good benchmark to arith_test.go?
>> Ideally, the code is faster for all number ranges affected. If you
>> write the benchmark results into files, you can use
>> $GOROOT/misc/benchcmp for a nice side-by-side comparison.
>>
>> Thanks.
>
> http://codereview.appspot.com/5570044/
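A benchmark of the shape gri describes in this message (one number per bit length, all exercised in a single loop) might look roughly like the sketch below. The bitLen body is a stand-in, and the names testWord, sink and BenchmarkBitLenAll are hypothetical; the benchmark that was actually added to arith_test.go is not shown in this thread.

```go
package main

import "testing"

// bitLen is a placeholder for the function under test; the real one is
// internal to math/big.
func bitLen(x uint64) (n int) {
	for ; x > 0; x >>= 1 {
		n++
	}
	return
}

// testWord returns the largest value that is exactly nbits bits long
// (0 for nbits == 0, all ones for nbits == 64).
func testWord(nbits uint) uint64 {
	return uint64(1)<<nbits - 1
}

// sink keeps the compiler from discarding the calls as dead code.
var sink int

// BenchmarkBitLenAll calls bitLen once for every bit length from 0 to 64
// on each benchmark iteration.
func BenchmarkBitLenAll(b *testing.B) {
	for i := 0; i < b.N; i++ {
		for nbits := uint(0); nbits <= 64; nbits++ {
			sink += bitLen(testWord(nbits))
		}
	}
}
```

Placed in a _test.go file, a function like this is picked up by the benchmark runner; from an ordinary program it can be run with testing.Benchmark(BenchmarkBitLenAll). A single combined loop gives one number to track over time, at the cost of hiding the per-length differences that the tables in this thread show.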

dga

Before I add another CL, is this roughly the benchmark you hoped for? [Results from ...

13 years, 5 months ago (2012-01-23 19:24:31 UTC) #8

gri


13 years, 5 months ago (2012-01-23 19:30:11 UTC) #9

Yes, this looks good. You probably don't need that many different
ones. Given your new code, probably one for each power of two maybe.
There's no need for a new CL, please just add it to the current CL.
- gri

On Mon, Jan 23, 2012 at 11:24 AM,  <dave.andersen@gmail.com> wrote:
> Before I add another CL, is this roughly the benchmark you hoped for?
> [Results from 64bit mbp, but I'm really asking about the scope of the
> benchmark]
> Old version:
>
> big.BenchmarkBitLen0    500000000                4.32 ns/op
> big.BenchmarkBitLen1    500000000                4.95 ns/op
> big.BenchmarkBitLen3    200000000                8.93 ns/op
> big.BenchmarkBitLen5    100000000               13.0 ns/op
> big.BenchmarkBitLen8    100000000               19.4 ns/op
> big.BenchmarkBitLen13   100000000               13.5 ns/op
> big.BenchmarkBitLen15   100000000               18.0 ns/op
> big.BenchmarkBitLen16   100000000               19.8 ns/op
> big.BenchmarkBitLen17   500000000                6.76 ns/op
> big.BenchmarkBitLen29   100000000               17.3 ns/op
> big.BenchmarkBitLen30   100000000               19.1 ns/op
> big.BenchmarkBitLen31   100000000               21.4 ns/op
>
> New version:
> big.BenchmarkBitLen0    500000000                6.46 ns/op
> big.BenchmarkBitLen1    500000000                5.84 ns/op
> big.BenchmarkBitLen3    500000000                6.24 ns/op
> big.BenchmarkBitLen5    500000000                6.12 ns/op
> big.BenchmarkBitLen8    500000000                6.78 ns/op
> big.BenchmarkBitLen13   500000000                6.44 ns/op
> big.BenchmarkBitLen15   500000000                6.77 ns/op
> big.BenchmarkBitLen16   500000000                7.00 ns/op
> big.BenchmarkBitLen17   500000000                6.19 ns/op
> big.BenchmarkBitLen29   500000000                6.91 ns/op
> big.BenchmarkBitLen30   200000000                8.01 ns/op
> big.BenchmarkBitLen31   500000000                7.10 ns/op
>
> http://codereview.appspot.com/5570044/
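The delta column in the side-by-side tables elsewhere in this thread is a plain percentage change between the old and new ns/op figures. A minimal sketch of that arithmetic is below; the function name delta is an assumption, and this is not the benchcmp script itself.

```go
package main

// delta returns the percentage change from oldNs to newNs.
// Negative values mean the new code is faster.
func delta(oldNs, newNs float64) float64 {
	return (newNs - oldNs) / oldNs * 100
}
```

Applied to the figures quoted in this message, BitLen0 (4.32 to 6.46 ns/op) comes out at about +49.5% and BitLen31 (21.4 to 7.10 ns/op) at about -66.8%. These differ slightly from the CL description because they come from a different benchmark run.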

dga

Hello golang-dev@googlegroups.com, dave@cheney.net, edsrzf@gmail.com, gri@golang.org (cc: golang-dev@googlegroups.com), Please take another look.

13 years, 5 months ago (2012-01-23 20:03:46 UTC) #10

gri

Looks good. A couple of minor suggestions. Please also update the CL description by running ...

13 years, 5 months ago (2012-01-23 20:37:03 UTC) #11

dga

On 2012/01/23 20:37:03, gri wrote: > Looks good. A couple of minor suggestions. > > ...

13 years, 5 months ago (2012-01-23 20:47:29 UTC) #12

dga

Hello golang-dev@googlegroups.com, dave@cheney.net, edsrzf@gmail.com, gri@golang.org (cc: golang-dev@googlegroups.com), Please take another look.

13 years, 5 months ago (2012-01-23 21:00:07 UTC) #13

gri


13 years, 5 months ago (2012-01-23 21:46:30 UTC) #15

*** Submitted as 408491a4bde1 ***

math/big: slight improvement to algorithm used for internal bitLen function

The bitLen function currently shifts out blocks of 8 bits at a time.
This change replaces this sorta-linear algorithm with a log(N)
one (shift out 16 bits, then 8, then 4, then 2, then 1).
I left the start of it linear at 16 bits at a time so that
the function continues to work with 32 or 64 bit values
without any funkiness.
The algorithm is similar to several of the nlz ("number of
leading zeros") algorithms from "Hacker's Delight" or the
"bit twiddling hacks" pages.

Doesn't make a big difference to the existing benchmarks, but
I'm using the code in a different context that calls bitLen
much more often, so it seemed worthwhile making the existing
codebase faster so that it's a better building block.

Microbenchmark results on a 64-bit Macbook Pro using 6g from weekly.2012-01-20:

benchmark                old ns/op    new ns/op    delta
big.BenchmarkBitLen0             4            6  +50.12%
big.BenchmarkBitLen1             4            6  +33.91%
big.BenchmarkBitLen2             6            6   +3.05%
big.BenchmarkBitLen3             7            6  -19.05%
big.BenchmarkBitLen4             9            6  -30.19%
big.BenchmarkBitLen5            11            6  -42.23%
big.BenchmarkBitLen8            16            6  -61.78%
big.BenchmarkBitLen9             5            6  +18.29%
big.BenchmarkBitLen16           18            7  -60.99%
big.BenchmarkBitLen17            7            6   -4.64%
big.BenchmarkBitLen31           19            7  -62.49%

On an ARM machine (with the previous weekly):

benchmark                old ns/op    new ns/op    delta
big.BenchmarkBitLen0            37           50  +36.56%
big.BenchmarkBitLen1            59           51  -13.69%
big.BenchmarkBitLen2            74           59  -20.40%
big.BenchmarkBitLen3            92           60  -34.89%
big.BenchmarkBitLen4           110           59  -46.09%
big.BenchmarkBitLen5           127           60  -52.68%
big.BenchmarkBitLen8           181           59  -67.24%
big.BenchmarkBitLen9            78           60  -23.05%
big.BenchmarkBitLen16          199           69  -65.13%
big.BenchmarkBitLen17           91           70  -23.17%
big.BenchmarkBitLen31          210           95  -54.43%

R=golang-dev, dave, edsrzf, gri
CC=golang-dev
http://codereview.appspot.com/5570044

Committer: Robert Griesemer <gri@golang.org>

dave_cheney.net


13 years, 5 months ago (2012-01-23 22:26:03 UTC) #16

That's a great win for arm, thanks a lot. 


brtzsnr


13 years, 5 months ago (2012-01-24 21:05:35 UTC) #17

Sorry for the late reply, but have you already looked at these algorithms?
http://graphics.stanford.edu/~seander/bithacks.html#IntegerLogLookup


gri

13 years, 5 months ago (2012-01-24 21:58:54 UTC) #18

It's not clear that a lookup table approach is much faster; also, in
the meantime we have an assembly version CL to be reviewed. At the end
of the day, measurement is the only way to find out.

Thanks for pointing this out.
- gri
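For reference, the lookup-table idea raised in this exchange can be sketched in Go as below. This is only an illustration of the general technique for 32-bit values; the names lenTable and bitLenTable are hypothetical, and nothing here was part of the CL. As gri notes, whether it beats the shift-based version has to be measured.

```go
package main

// lenTable[i] holds the bit length of the byte value i (0 for i == 0).
var lenTable [256]uint8

func init() {
	// A value i has one more bit than i/2, so the table can be
	// filled from its own earlier entries.
	for i := 1; i < 256; i++ {
		lenTable[i] = lenTable[i/2] + 1
	}
}

// bitLenTable finds the highest non-zero byte of x and looks up the
// length of that byte, adding the bits contributed by the bytes below it.
func bitLenTable(x uint32) int {
	switch {
	case x >= 1<<24:
		return 24 + int(lenTable[x>>24])
	case x >= 1<<16:
		return 16 + int(lenTable[x>>16])
	case x >= 1<<8:
		return 8 + int(lenTable[x>>8])
	}
	return int(lenTable[x])
}
```

The table trades a handful of comparisons and shifts for at most three comparisons plus one memory load, so its speed depends on whether the 256-byte table stays in cache.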
