|
|
Descriptionbig: Improved speed of big-to-string conversion for printing
Three optimizations: First, special-case power of two bases
that partion a Word(), bases 2, 4, 16, and 256. These can
be moved directly from internal Word() storage to the output
without multiprecision operations. Next, same approach for
the other power-of-two bases, 8, 32, 64, and 128. These
don't fill a Word() evenly, so special handling is needed
for those cases where input spans the high-bits of one Word
and the low bis of the next one. Finally, implement the
general case for others bases in 2 <= base <= 256 using
superbases, the largest power of base representable in a
Word(). For base ten, this is 9 digits and a superbase of
10^9 for 32-bit Words and 19 digits and 10^19 for 64-bit
compiles. This way we do just 1/9th or 1/19th of the expensive
multiprecision divisions, unpacking superdigits using fast
native machine arithmetic.
Speedups compared to the previous version range from 200x
the rate (hexadadecimal) down to 17x for decimal in 64 bit,
and faster still (relatively) in 32-bit since the multi-
precision code is not as tuned there.
Patch Set 1 #Patch Set 2 : diff -r fa2bae213af8 https://go.googlecode.com/hg/ #
Total comments: 57
MessagesTotal messages: 3
Very nice. Lots of comments, but mostly on the use of variable names. I tried to be fairly consistent in the use of variable names in this code because there are often many variables, and it's easy to get confused. The conventions I have used: m, n for the lengths of vectors len(x), len(y) i, j, k for indices d, r for digits dd, w for super-digits (what you call a group), or a word Please add a benchmark to nat_test.go so we can track the performance. For an example, look at nat_test.go:355 . Regarding the faster scanning code you sent: Let's just do this CL first, and then we'll gave another look at that code. At that point it may even make sense to move all this conversion code into a separate file (say conv.go) as it is becoming fairly substantial. Also a separate CL. Finally: Please keep two empty lines between functions for improved readability. http://codereview.appspot.com/4573041/diff/2001/src/pkg/big/nat.go File src/pkg/big/nat.go (right): http://codereview.appspot.com/4573041/diff/2001/src/pkg/big/nat.go#newcode114 src/pkg/big/nat.go:114: func (z nat) appendBits(bits uint, w Word) nat { this function doen't appear to be used http://codereview.appspot.com/4573041/diff/2001/src/pkg/big/nat.go#newcode628 src/pkg/big/nat.go:628: const MaxBase = ('z' - 'a' + 1) + 10 // = hexValue('z') + 1 If you want to add parentheses, please group as: ('z' - 'a' + 10) + 1 since that matches hexValue('z') + 1 http://codereview.appspot.com/4573041/diff/2001/src/pkg/big/nat.go#newcode643 src/pkg/big/nat.go:643: // scan returns the natural number corresponding to the longest Please leave this comment unchanged (looks like you merged in the old comment again). http://codereview.appspot.com/4573041/diff/2001/src/pkg/big/nat.go#newcode756 src/pkg/big/nat.go:756: // handle power-of-two bases where digits partition fill Word s/partition fill/completely partition/ http://codereview.appspot.com/4573041/diff/2001/src/pkg/big/nat.go#newcode758 src/pkg/big/nat.go:758: switch { change to: switch base { case 2: shift = 1 case 4: shift = 2 case 16: shift = 4 case 256: shift = 8 } http://codereview.appspot.com/4573041/diff/2001/src/pkg/big/nat.go#newcode769 src/pkg/big/nat.go:769: words := len(x) s/words/m/ The convention in this file is to use m for len(x), and n for len(y). This convention is used fairly rigorously, please use it, too. http://codereview.appspot.com/4573041/diff/2001/src/pkg/big/nat.go#newcode770 src/pkg/big/nat.go:770: digits := int(_W / shift) s/digits/ndigits/ see also comment below (line 845), for consistency http://codereview.appspot.com/4573041/diff/2001/src/pkg/big/nat.go#newcode771 src/pkg/big/nat.go:771: mask := Word((1 << shift) - 1) mask := Word(1) << shift - 1 - no need for extra parens, << is a multiplicative operator in Go - also, make sure that 1 is of type Word http://codereview.appspot.com/4573041/diff/2001/src/pkg/big/nat.go#newcode773 src/pkg/big/nat.go:773: for w := 0; w < words-1; w++ { for _, w := range x[0:m-1] { ... (with m = len(x) from above) and use w instead of group http://codereview.appspot.com/4573041/diff/2001/src/pkg/big/nat.go#newcode774 src/pkg/big/nat.go:774: group := x[w] get rid of this http://codereview.appspot.com/4573041/diff/2001/src/pkg/big/nat.go#newcode775 src/pkg/big/nat.go:775: for d := 0; d < digits; d++ { s/d/j/ (you use j for the equivalent loop in the general case below) http://codereview.appspot.com/4573041/diff/2001/src/pkg/big/nat.go#newcode781 src/pkg/big/nat.go:781: group := x[words-1] w := x[m-1] http://codereview.appspot.com/4573041/diff/2001/src/pkg/big/nat.go#newcode782 src/pkg/big/nat.go:782: for d := 0; d < digits && group != 0; d++ { // most-significant word (omit leading zeros) s/d/j/ http://codereview.appspot.com/4573041/diff/2001/src/pkg/big/nat.go#newcode803 src/pkg/big/nat.go:803: words := len(x) same here: s/words/m/ Also, can do this once in the beginning, for all code. http://codereview.appspot.com/4573041/diff/2001/src/pkg/big/nat.go#newcode804 src/pkg/big/nat.go:804: mask := Word((1 << shift) - 1) mask := Word(1) << shift - 1 http://codereview.appspot.com/4573041/diff/2001/src/pkg/big/nat.go#newcode805 src/pkg/big/nat.go:805: bits := uint(_W) // unconverted bits in group // remaining bits in word w ?? http://codereview.appspot.com/4573041/diff/2001/src/pkg/big/nat.go#newcode809 src/pkg/big/nat.go:809: for w := 0; w < words-1; { for j, w := range x[0:m-1] { ... and use j where you use w below http://codereview.appspot.com/4573041/diff/2001/src/pkg/big/nat.go#newcode842 src/pkg/big/nat.go:842: // compute maximum number of base 'base' digits which fit in a Word without filling it s/without filling it/entirely/ (it would be ok if they fill the word, except those cases are already handled above) http://codereview.appspot.com/4573041/diff/2001/src/pkg/big/nat.go#newcode845 src/pkg/big/nat.go:845: for t := Word(_M); t >= base; t /= base { This can be done w/o repeated division: ndigits := 0 for max := _M/base; groupBase <= max; groupBase *= base { ndigits++ } Also: no need for groupDigits to be of type Word; int is fine, and you can get rid of a whole bunch of conversions below. In the scan code above, I call the base "b" and the groupBase "bb". Perhaps use the same names? And can we call groupDigits "ndigits" instead? (groupDigits sounds like it is the digits of the group, but it's really the number of digits = ndigits). http://codereview.appspot.com/4573041/diff/2001/src/pkg/big/nat.go#newcode852 src/pkg/big/nat.go:852: r := Word(0) var r Word (this is the style used elsewhere where we don't want to make a point that the initial value matters) http://codereview.appspot.com/4573041/diff/2001/src/pkg/big/nat.go#newcode855 src/pkg/big/nat.go:855: if base == 10 { // hard-coding for ten here speeds this up by 1.25x s/ten/10/ to match the "base == 10" http://codereview.appspot.com/4573041/diff/2001/src/pkg/big/nat.go#newcode858 src/pkg/big/nat.go:858: q, r = q.divW(q, groupBase) // N.B. >82% of time is here. Optimize divWVW s/divWVW/divW/ http://codereview.appspot.com/4573041/diff/2001/src/pkg/big/nat.go#newcode860 src/pkg/big/nat.go:860: group := r get rid of this declaration and just use r instead of group below http://codereview.appspot.com/4573041/diff/2001/src/pkg/big/nat.go#newcode863 src/pkg/big/nat.go:863: for j := Word(0); j < groupDigits && group != 0; j++ { j := 0 http://codereview.appspot.com/4573041/diff/2001/src/pkg/big/nat.go#newcode869 src/pkg/big/nat.go:869: for j := Word(0); j < groupDigits; j++ { j := 0 http://codereview.appspot.com/4573041/diff/2001/src/pkg/big/nat.go#newcode879 src/pkg/big/nat.go:879: q, r = q.divW(q, groupBase) // N.B. >82% of time is here. Optimize divWVW s/divWVW/divW/ http://codereview.appspot.com/4573041/diff/2001/src/pkg/big/nat.go#newcode881 src/pkg/big/nat.go:881: group := r get rid of this declaration and just use r instead of group below http://codereview.appspot.com/4573041/diff/2001/src/pkg/big/nat.go#newcode884 src/pkg/big/nat.go:884: for j := Word(0); j < groupDigits && group != 0; j++ { j := 0 http://codereview.appspot.com/4573041/diff/2001/src/pkg/big/nat.go#newcode890 src/pkg/big/nat.go:890: for j := Word(0); j < groupDigits; j++ { j := 0
Sign in to reply to this message.
Wow, so wonderfully assistive! Thanks! I did all but three of the suggested changes. Details of the three are in the edits below, but summary is: 1. Loop construct in second instance is a little bit subtle 2. Comment about bb loop needs some kind of caveat 3. My general struggle with Go int size conformance. Did everything else, and did as much of these as I understood to be right and clear (little of #1, most of #2, none of #3.) Ready for more feedback. ;-) http://codereview.appspot.com/4573041/diff/2001/src/pkg/big/nat.go File src/pkg/big/nat.go (right): http://codereview.appspot.com/4573041/diff/2001/src/pkg/big/nat.go#newcode114 src/pkg/big/nat.go:114: func (z nat) appendBits(bits uint, w Word) nat { On 2011/06/03 22:56:46, gri wrote: > this function doen't appear to be used Done. http://codereview.appspot.com/4573041/diff/2001/src/pkg/big/nat.go#newcode628 src/pkg/big/nat.go:628: const MaxBase = ('z' - 'a' + 1) + 10 // = hexValue('z') + 1 On 2011/06/03 22:56:46, gri wrote: > If you want to add parentheses, please group as: > > ('z' - 'a' + 10) + 1 > > since that matches hexValue('z') + 1 Done. http://codereview.appspot.com/4573041/diff/2001/src/pkg/big/nat.go#newcode643 src/pkg/big/nat.go:643: // scan returns the natural number corresponding to the longest On 2011/06/03 22:56:46, gri wrote: > Please leave this comment unchanged (looks like you merged in the old comment > again). Done. http://codereview.appspot.com/4573041/diff/2001/src/pkg/big/nat.go#newcode756 src/pkg/big/nat.go:756: // handle power-of-two bases where digits partition fill Word On 2011/06/03 22:56:46, gri wrote: > s/partition fill/completely partition/ Done. http://codereview.appspot.com/4573041/diff/2001/src/pkg/big/nat.go#newcode758 src/pkg/big/nat.go:758: switch { On 2011/06/03 22:56:46, gri wrote: > change to: > > switch base { > case 2: > shift = 1 > case 4: > shift = 2 > case 16: > shift = 4 > case 256: > shift = 8 > } Done. http://codereview.appspot.com/4573041/diff/2001/src/pkg/big/nat.go#newcode769 src/pkg/big/nat.go:769: words := len(x) On 2011/06/03 22:56:46, gri wrote: > s/words/m/ > > The convention in this file is to use m for len(x), and n for len(y). This > convention is used fairly rigorously, please use it, too. Done. http://codereview.appspot.com/4573041/diff/2001/src/pkg/big/nat.go#newcode770 src/pkg/big/nat.go:770: digits := int(_W / shift) On 2011/06/03 22:56:46, gri wrote: > s/digits/ndigits/ > > see also comment below (line 845), for consistency Done. http://codereview.appspot.com/4573041/diff/2001/src/pkg/big/nat.go#newcode773 src/pkg/big/nat.go:773: for w := 0; w < words-1; w++ { On 2011/06/03 22:56:46, gri wrote: > for _, w := range x[0:m-1] { ... > > (with m = len(x) from above) > > and use w instead of group Done. http://codereview.appspot.com/4573041/diff/2001/src/pkg/big/nat.go#newcode775 src/pkg/big/nat.go:775: for d := 0; d < digits; d++ { On 2011/06/03 22:56:46, gri wrote: > s/d/j/ > > (you use j for the equivalent loop in the general case below) Done. http://codereview.appspot.com/4573041/diff/2001/src/pkg/big/nat.go#newcode781 src/pkg/big/nat.go:781: group := x[words-1] On 2011/06/03 22:56:46, gri wrote: > w := x[m-1] Done. http://codereview.appspot.com/4573041/diff/2001/src/pkg/big/nat.go#newcode782 src/pkg/big/nat.go:782: for d := 0; d < digits && group != 0; d++ { // most-significant word (omit leading zeros) On 2011/06/03 22:56:46, gri wrote: > s/d/j/ Done. http://codereview.appspot.com/4573041/diff/2001/src/pkg/big/nat.go#newcode803 src/pkg/big/nat.go:803: words := len(x) On 2011/06/03 22:56:46, gri wrote: > same here: s/words/m/ > > Also, can do this once in the beginning, for all code. Done. http://codereview.appspot.com/4573041/diff/2001/src/pkg/big/nat.go#newcode804 src/pkg/big/nat.go:804: mask := Word((1 << shift) - 1) On 2011/06/03 22:56:46, gri wrote: > mask := Word(1) << shift - 1 Done. http://codereview.appspot.com/4573041/diff/2001/src/pkg/big/nat.go#newcode805 src/pkg/big/nat.go:805: bits := uint(_W) // unconverted bits in group On 2011/06/03 22:56:46, gri wrote: > // remaining bits in word w > > ?? changed comment: nbits := uint(_W) // number of unprocessed bits in w http://codereview.appspot.com/4573041/diff/2001/src/pkg/big/nat.go#newcode805 src/pkg/big/nat.go:805: bits := uint(_W) // unconverted bits in group On 2011/06/03 22:56:46, gri wrote: > // remaining bits in word w > > ?? Done. http://codereview.appspot.com/4573041/diff/2001/src/pkg/big/nat.go#newcode809 src/pkg/big/nat.go:809: for w := 0; w < words-1; { On 2011/06/03 22:56:46, gri wrote: > for j, w := range x[0:m-1] { ... > > and use j where you use w below changed variable names but not loop construct. also moved the index increment up to the loop construct. This is different than the loop above. Instead of feeding words to a parser we have a parser that consumes words and changes the word it is peeking at. I think it would be strange to modify the input number in situ just so that the range-based loop could habd the same Word back to me on the next iteration...so...I want to say no to this change. No ego about it, happy to have a better arrangement, better code style, better use of Go idioms, etc. Just need to express idea that a bb-sized window is scanning across the huge array of words from right to left What is expressed is "for window starts at right, LSD side; while window has not reached the word containing the MSD; keep sliding. http://codereview.appspot.com/4573041/diff/2001/src/pkg/big/nat.go#newcode842 src/pkg/big/nat.go:842: // compute maximum number of base 'base' digits which fit in a Word without filling it On 2011/06/03 22:56:46, gri wrote: > s/without filling it/entirely/ > > (it would be ok if they fill the word, except those cases are already handled > above) true, but my code and yours both undercount by one in those specific cases. I did not want to burn in effigy later when someone figured that out. ;-) Try this... package main import ("fmt") const _M = 1<<64 - 1 func main() { for b := uint64(2); b <= 256; b++ { bb := uint64(1) ndigits := 0 for max := uint64(_M/b); bb <= max; bb *= b { ndigits++ } fmt.Printf("base %3d: %2d %20d\n", b, ndigits, bb) switch b { case 2, 4, 16, 256: fmt.Printf(" NOTE: should be %3d: %2d and bb = 2^64\n", b, ndigits+1) } } } ...I will gladly change the comment as long as it remains true. You can see if what I did works for you. http://codereview.appspot.com/4573041/diff/2001/src/pkg/big/nat.go#newcode845 src/pkg/big/nat.go:845: for t := Word(_M); t >= base; t /= base { On 2011/06/03 22:56:46, gri wrote: > This can be done w/o repeated division: > > ndigits := 0 > for max := _M/base; groupBase <= max; groupBase *= base { > ndigits++ > } > > Also: no need for groupDigits to be of type Word; int is fine, and you can get > rid of a whole bunch of conversions below. > > In the scan code above, I call the base "b" and the groupBase "bb". Perhaps use > the same names? And can we call groupDigits "ndigits" instead? (groupDigits > sounds like it is the digits of the group, but it's really the number of digits > = ndigits). am i assured that int and uintptr (aka, type Word) are of the same size in 32 bit and 64 bit modes? http://codereview.appspot.com/4573041/diff/2001/src/pkg/big/nat.go#newcode852 src/pkg/big/nat.go:852: r := Word(0) On 2011/06/03 22:56:46, gri wrote: > var r Word > > (this is the style used elsewhere where we don't want to make a point that the > initial value matters) Done. http://codereview.appspot.com/4573041/diff/2001/src/pkg/big/nat.go#newcode855 src/pkg/big/nat.go:855: if base == 10 { // hard-coding for ten here speeds this up by 1.25x On 2011/06/03 22:56:46, gri wrote: > s/ten/10/ > > to match the "base == 10" Done. http://codereview.appspot.com/4573041/diff/2001/src/pkg/big/nat.go#newcode858 src/pkg/big/nat.go:858: q, r = q.divW(q, groupBase) // N.B. >82% of time is here. Optimize divWVW On 2011/06/03 22:56:46, gri wrote: > s/divWVW/divW/ Done. http://codereview.appspot.com/4573041/diff/2001/src/pkg/big/nat.go#newcode860 src/pkg/big/nat.go:860: group := r On 2011/06/03 22:56:46, gri wrote: > get rid of this declaration and just use r instead of group below Done. http://codereview.appspot.com/4573041/diff/2001/src/pkg/big/nat.go#newcode863 src/pkg/big/nat.go:863: for j := Word(0); j < groupDigits && group != 0; j++ { On 2011/06/03 22:56:46, gri wrote: > j := 0 Done. http://codereview.appspot.com/4573041/diff/2001/src/pkg/big/nat.go#newcode869 src/pkg/big/nat.go:869: for j := Word(0); j < groupDigits; j++ { On 2011/06/03 22:56:46, gri wrote: > j := 0 Done. http://codereview.appspot.com/4573041/diff/2001/src/pkg/big/nat.go#newcode879 src/pkg/big/nat.go:879: q, r = q.divW(q, groupBase) // N.B. >82% of time is here. Optimize divWVW On 2011/06/03 22:56:46, gri wrote: > s/divWVW/divW/ Done. http://codereview.appspot.com/4573041/diff/2001/src/pkg/big/nat.go#newcode881 src/pkg/big/nat.go:881: group := r On 2011/06/03 22:56:46, gri wrote: > get rid of this declaration and just use r instead of group below Done. http://codereview.appspot.com/4573041/diff/2001/src/pkg/big/nat.go#newcode884 src/pkg/big/nat.go:884: for j := Word(0); j < groupDigits && group != 0; j++ { On 2011/06/03 22:56:46, gri wrote: > j := 0 Done. http://codereview.appspot.com/4573041/diff/2001/src/pkg/big/nat.go#newcode890 src/pkg/big/nat.go:890: for j := Word(0); j < groupDigits; j++ { On 2011/06/03 22:56:46, gri wrote: > j := 0 Done.
Sign in to reply to this message.
I also added benchmark tests for bases 2, 8, 10, and 16. Here are the results with the old code: mtj-macbookpro:big mtj$ make bench gotest -test.bench=. -test.run="Do not run tests" rm -f _test/big.a rm -f _test/big.a gopack grc _test/big.a _gotest_.6 arith_amd64.6 PASS big.BenchmarkHilbert 500 5833338 ns/op big.BenchmarkBitset 20000000 132 ns/op big.BenchmarkBitsetNeg 10000000 240 ns/op big.BenchmarkBitsetOrig 5000000 606 ns/op big.BenchmarkBitsetNegOrig 1000000 1029 ns/op big.BenchmarkMul 500 3193490 ns/op big.BenchmarkScanPi 10000 243586 ns/op big.BenchmarkString2 10 137214700 ns/op big.BenchmarkString8 50 61214940 ns/op big.BenchmarkString10 50 56151740 ns/op big.BenchmarkString16 50 47801960 ns/op ...and with the new... mtj-macbookpro:big mtj$ make bench gotest -test.bench=. -test.run="Do not run tests" rm -f _test/big.a rm -f _test/big.a gopack grc _test/big.a _gotest_.6 arith_amd64.6 PASS big.BenchmarkHilbert 500 5823440 ns/op big.BenchmarkBitset 20000000 133 ns/op big.BenchmarkBitsetNeg 10000000 238 ns/op big.BenchmarkBitsetOrig 5000000 601 ns/op big.BenchmarkBitsetNegOrig 1000000 1029 ns/op big.BenchmarkMul 500 3190486 ns/op big.BenchmarkScanPi 10000 243989 ns/op big.BenchmarkString2 10000 203773 ns/op big.BenchmarkString8 50000 72232 ns/op big.BenchmarkString10 500 3124502 ns/op big.BenchmarkString16 50000 53168 ns/op ...the speedups are notable. "%b" = 673x "%o" = 847x "%d" = 17x "%x" = 899x
Sign in to reply to this message.
|