Issue 4573041: big: Improved speed of big-to-string conversion for printing

Can't Edit
Can't Publish+Mail
Start Review

Created:
14 years, 1 month ago by mtj

Modified:
14 years, 1 month ago

Reviewers:
gri

CC:
golang-dev

Visibility:
Public.

Description

big: Improved speed of big-to-string conversion for printing Three optimizations: First, special-case power of two bases that partion a Word(), bases 2, 4, 16, and 256. These can be moved directly from internal Word() storage to the output without multiprecision operations. Next, same approach for the other power-of-two bases, 8, 32, 64, and 128. These don't fill a Word() evenly, so special handling is needed for those cases where input spans the high-bits of one Word and the low bis of the next one. Finally, implement the general case for others bases in 2 <= base <= 256 using superbases, the largest power of base representable in a Word(). For base ten, this is 9 digits and a superbase of 10^9 for 32-bit Words and 19 digits and 10^19 for 64-bit compiles. This way we do just 1/9th or 1/19th of the expensive multiprecision divisions, unpacking superdigits using fast native machine arithmetic. Speedups compared to the previous version range from 200x the rate (hexadadecimal) down to 17x for decimal in 64 bit, and faster still (relatively) in 32-bit since the multi- precision code is not as tuned there.

Patch Set 1 #

Patch Set 2 : diff -r fa2bae213af8 https://go.googlecode.com/hg/ #

Total comments: 57

Created: 14 years, 1 month ago

Download [raw] [tar.bz2]

		Unified diffs	Side-by-side diffs	Delta from patch set	Stats (+185 lines, -47 lines)			Patch
	M	src/pkg/big/nat.go	View	1	7 chunks	+185 lines, -47 lines	57 comments	Download

Messages

Total messages: 3

Expand All Messages | Collapse All Messages

gri

Very nice. Lots of comments, but mostly on the use of variable names. I tried ...

14 years, 1 month ago (2011-06-03 22:56:46 UTC) #1

mtj

Wow, so wonderfully assistive! Thanks! I did all but three of the suggested changes. Details ...

14 years, 1 month ago (2011-06-04 08:21:42 UTC) #2

Wow, so wonderfully assistive! Thanks!

I did all but three of the suggested changes. Details of the three are in the
edits below, but summary is: 

1. Loop construct in second instance is a little bit subtle
2. Comment about bb loop needs some kind of caveat
3. My general struggle with Go int size conformance. 

Did everything else, and did as much of these as I understood to be right and
clear (little of #1, most of #2, none of #3.) Ready for more feedback. ;-)

http://codereview.appspot.com/4573041/diff/2001/src/pkg/big/nat.go
File src/pkg/big/nat.go (right):

http://codereview.appspot.com/4573041/diff/2001/src/pkg/big/nat.go#newcode114
src/pkg/big/nat.go:114: func (z nat) appendBits(bits uint, w Word) nat {
On 2011/06/03 22:56:46, gri wrote:
> this function doen't appear to be used

Done.

http://codereview.appspot.com/4573041/diff/2001/src/pkg/big/nat.go#newcode628
src/pkg/big/nat.go:628: const MaxBase = ('z' - 'a' + 1) + 10 // = hexValue('z')
+ 1
On 2011/06/03 22:56:46, gri wrote:
> If you want to add parentheses, please group as:
> 
> ('z' - 'a' + 10) + 1
> 
> since that matches hexValue('z') + 1

Done.

http://codereview.appspot.com/4573041/diff/2001/src/pkg/big/nat.go#newcode643
src/pkg/big/nat.go:643: // scan returns the natural number corresponding to the
longest
On 2011/06/03 22:56:46, gri wrote:
> Please leave this comment unchanged (looks like you merged in the old comment
> again).

Done.

http://codereview.appspot.com/4573041/diff/2001/src/pkg/big/nat.go#newcode756
src/pkg/big/nat.go:756: // handle power-of-two bases where digits partition fill
Word
On 2011/06/03 22:56:46, gri wrote:
> s/partition fill/completely partition/

Done.

http://codereview.appspot.com/4573041/diff/2001/src/pkg/big/nat.go#newcode758
src/pkg/big/nat.go:758: switch {
On 2011/06/03 22:56:46, gri wrote:
> change to:
> 
> switch base {
> case 2:
>    shift = 1
> case 4:
>    shift = 2
> case 16:
>    shift = 4
> case 256:
>    shift = 8
> }

Done.

http://codereview.appspot.com/4573041/diff/2001/src/pkg/big/nat.go#newcode769
src/pkg/big/nat.go:769: words := len(x)
On 2011/06/03 22:56:46, gri wrote:
> s/words/m/
> 
> The convention in this file is to use m for len(x), and n for len(y). This
> convention is used fairly rigorously, please use it, too.

Done.

http://codereview.appspot.com/4573041/diff/2001/src/pkg/big/nat.go#newcode770
src/pkg/big/nat.go:770: digits := int(_W / shift)
On 2011/06/03 22:56:46, gri wrote:
> s/digits/ndigits/
> 
> see also comment below (line 845), for consistency

Done.

http://codereview.appspot.com/4573041/diff/2001/src/pkg/big/nat.go#newcode773
src/pkg/big/nat.go:773: for w := 0; w < words-1; w++ {
On 2011/06/03 22:56:46, gri wrote:
> for _, w := range x[0:m-1] { ...
> 
> (with m = len(x) from above)
> 
> and use w instead of group

Done.

http://codereview.appspot.com/4573041/diff/2001/src/pkg/big/nat.go#newcode775
src/pkg/big/nat.go:775: for d := 0; d < digits; d++ {
On 2011/06/03 22:56:46, gri wrote:
> s/d/j/
> 
> (you use j for the equivalent loop in the general case below)

Done.

http://codereview.appspot.com/4573041/diff/2001/src/pkg/big/nat.go#newcode781
src/pkg/big/nat.go:781: group := x[words-1]
On 2011/06/03 22:56:46, gri wrote:
> w := x[m-1]

Done.

http://codereview.appspot.com/4573041/diff/2001/src/pkg/big/nat.go#newcode782
src/pkg/big/nat.go:782: for d := 0; d < digits && group != 0; d++ { //
most-significant word (omit leading zeros)
On 2011/06/03 22:56:46, gri wrote:
> s/d/j/

Done.

http://codereview.appspot.com/4573041/diff/2001/src/pkg/big/nat.go#newcode803
src/pkg/big/nat.go:803: words := len(x)
On 2011/06/03 22:56:46, gri wrote:
> same here: s/words/m/
> 
> Also, can do this once in the beginning, for all code.

Done.

http://codereview.appspot.com/4573041/diff/2001/src/pkg/big/nat.go#newcode804
src/pkg/big/nat.go:804: mask := Word((1 << shift) - 1)
On 2011/06/03 22:56:46, gri wrote:
> mask := Word(1) << shift - 1

Done.

http://codereview.appspot.com/4573041/diff/2001/src/pkg/big/nat.go#newcode805
src/pkg/big/nat.go:805: bits := uint(_W) // unconverted bits in group
On 2011/06/03 22:56:46, gri wrote:
> // remaining bits in word w
> 
> ??

changed comment:
nbits := uint(_W) // number of unprocessed bits in w

http://codereview.appspot.com/4573041/diff/2001/src/pkg/big/nat.go#newcode805
src/pkg/big/nat.go:805: bits := uint(_W) // unconverted bits in group
On 2011/06/03 22:56:46, gri wrote:
> // remaining bits in word w
> 
> ??

Done.

http://codereview.appspot.com/4573041/diff/2001/src/pkg/big/nat.go#newcode809
src/pkg/big/nat.go:809: for w := 0; w < words-1; {
On 2011/06/03 22:56:46, gri wrote:
> for j, w := range x[0:m-1] { ...
> 
> and use j where you use w below

changed variable names but not loop construct. also moved the index increment up
to the loop construct. This is different than the loop above. Instead of feeding
words to a parser we have a parser that consumes words and changes the word it
is peeking at. I think it would be strange to modify the input number in situ
just so that the range-based loop could habd the same Word back to me on the
next iteration...so...I want to say no to this change. No ego about it, happy to
have a better arrangement, better code style, better use of Go idioms, etc. Just
need to express idea that a bb-sized window is scanning across the huge array of
words from right to left What is expressed is "for window starts at right, LSD
side; while window has not reached the word containing the MSD; keep sliding.

http://codereview.appspot.com/4573041/diff/2001/src/pkg/big/nat.go#newcode842
src/pkg/big/nat.go:842: // compute maximum number of base 'base' digits which
fit in a Word without filling it
On 2011/06/03 22:56:46, gri wrote:
> s/without filling it/entirely/
> 
> (it would be ok if they fill the word, except those cases are already handled
> above)

true, but my code and yours both undercount by one in those specific cases. I
did not want to burn in effigy later when someone figured that out. ;-)

Try this...


package main
import ("fmt")

const _M = 1<<64 - 1

func main() {
    for b := uint64(2); b <= 256; b++ {
	bb := uint64(1)
	ndigits := 0
	for max := uint64(_M/b); bb <= max; bb *= b {
	    ndigits++
	}
	fmt.Printf("base %3d: %2d %20d\n", b, ndigits, bb)
    	switch b {
	case 2, 4, 16, 256:
	    fmt.Printf("    NOTE: should be %3d: %2d and bb = 2^64\n", b, ndigits+1)
	}
    }
}

...I will gladly change the comment as long as it remains true. You can see if
what I did works for you.

http://codereview.appspot.com/4573041/diff/2001/src/pkg/big/nat.go#newcode845
src/pkg/big/nat.go:845: for t := Word(_M); t >= base; t /= base {
On 2011/06/03 22:56:46, gri wrote:
> This can be done w/o repeated division:
> 
> ndigits := 0
> for max := _M/base; groupBase <= max; groupBase *= base {
> 	ndigits++
> }
> 
> Also: no need for groupDigits to be of type Word; int is fine, and you can get
> rid of a whole bunch of conversions below.
> 
> In the scan code above, I call the base "b" and the groupBase "bb". Perhaps
use
> the same names? And can we call groupDigits "ndigits" instead?  (groupDigits
> sounds like it is the digits of the group, but it's really the number of
digits
> = ndigits).

am i assured that int and uintptr (aka, type Word) are of the same size in 32
bit and 64 bit modes?

http://codereview.appspot.com/4573041/diff/2001/src/pkg/big/nat.go#newcode852
src/pkg/big/nat.go:852: r := Word(0)
On 2011/06/03 22:56:46, gri wrote:
> var r Word
> 
> (this is the style used elsewhere where we don't want to make a point that the
> initial value matters)

Done.

http://codereview.appspot.com/4573041/diff/2001/src/pkg/big/nat.go#newcode855
src/pkg/big/nat.go:855: if base == 10 { // hard-coding for ten here speeds this
up by 1.25x
On 2011/06/03 22:56:46, gri wrote:
> s/ten/10/
> 
> to match the "base == 10"

Done.

http://codereview.appspot.com/4573041/diff/2001/src/pkg/big/nat.go#newcode858
src/pkg/big/nat.go:858: q, r = q.divW(q, groupBase) // N.B. >82% of time is
here. Optimize divWVW
On 2011/06/03 22:56:46, gri wrote:
> s/divWVW/divW/

Done.

http://codereview.appspot.com/4573041/diff/2001/src/pkg/big/nat.go#newcode860
src/pkg/big/nat.go:860: group := r
On 2011/06/03 22:56:46, gri wrote:
> get rid of this declaration and just use r instead of group below

Done.

http://codereview.appspot.com/4573041/diff/2001/src/pkg/big/nat.go#newcode863
src/pkg/big/nat.go:863: for j := Word(0); j < groupDigits && group != 0; j++ {
On 2011/06/03 22:56:46, gri wrote:
> j := 0

Done.

http://codereview.appspot.com/4573041/diff/2001/src/pkg/big/nat.go#newcode869
src/pkg/big/nat.go:869: for j := Word(0); j < groupDigits; j++ {
On 2011/06/03 22:56:46, gri wrote:
> j := 0

Done.

http://codereview.appspot.com/4573041/diff/2001/src/pkg/big/nat.go#newcode879
src/pkg/big/nat.go:879: q, r = q.divW(q, groupBase) // N.B. >82% of time is
here. Optimize divWVW
On 2011/06/03 22:56:46, gri wrote:
> s/divWVW/divW/

Done.

http://codereview.appspot.com/4573041/diff/2001/src/pkg/big/nat.go#newcode881
src/pkg/big/nat.go:881: group := r
On 2011/06/03 22:56:46, gri wrote:
> get rid of this declaration and just use r instead of group below

Done.

http://codereview.appspot.com/4573041/diff/2001/src/pkg/big/nat.go#newcode884
src/pkg/big/nat.go:884: for j := Word(0); j < groupDigits && group != 0; j++ {
On 2011/06/03 22:56:46, gri wrote:
> j := 0

Done.

http://codereview.appspot.com/4573041/diff/2001/src/pkg/big/nat.go#newcode890
src/pkg/big/nat.go:890: for j := Word(0); j < groupDigits; j++ {
On 2011/06/03 22:56:46, gri wrote:
> j := 0

Done.

mtj

14 years, 1 month ago (2011-06-04 19:19:53 UTC) #3

I also added benchmark tests for bases 2, 8, 10, and 16. Here are the results
with the old code:


mtj-macbookpro:big mtj$ make bench
gotest -test.bench=. -test.run="Do not run tests"
rm -f _test/big.a
rm -f _test/big.a
gopack grc _test/big.a _gotest_.6 arith_amd64.6 
PASS
big.BenchmarkHilbert	     500	   5833338 ns/op
big.BenchmarkBitset	20000000	       132 ns/op
big.BenchmarkBitsetNeg	10000000	       240 ns/op
big.BenchmarkBitsetOrig	 5000000	       606 ns/op
big.BenchmarkBitsetNegOrig	 1000000	      1029 ns/op
big.BenchmarkMul	     500	   3193490 ns/op
big.BenchmarkScanPi	   10000	    243586 ns/op
big.BenchmarkString2	      10	 137214700 ns/op
big.BenchmarkString8	      50	  61214940 ns/op
big.BenchmarkString10	      50	  56151740 ns/op
big.BenchmarkString16	      50	  47801960 ns/op

...and with the new...


mtj-macbookpro:big mtj$ make bench
gotest -test.bench=. -test.run="Do not run tests"
rm -f _test/big.a
rm -f _test/big.a
gopack grc _test/big.a _gotest_.6 arith_amd64.6 
PASS
big.BenchmarkHilbert	     500	   5823440 ns/op
big.BenchmarkBitset	20000000	       133 ns/op
big.BenchmarkBitsetNeg	10000000	       238 ns/op
big.BenchmarkBitsetOrig	 5000000	       601 ns/op
big.BenchmarkBitsetNegOrig	 1000000	      1029 ns/op
big.BenchmarkMul	     500	   3190486 ns/op
big.BenchmarkScanPi	   10000	    243989 ns/op
big.BenchmarkString2	   10000	    203773 ns/op
big.BenchmarkString8	   50000	     72232 ns/op
big.BenchmarkString10	     500	   3124502 ns/op
big.BenchmarkString16	   50000	     53168 ns/op

...the speedups are notable.

"%b" = 673x
"%o" = 847x
"%d" = 17x
"%x" = 899x

Expand All Messages | Collapse All Messages