Issue 1004042: code review 1004042: big: implemented Karatsuba multiplication

Issue 1004042: code review 1004042: big: implemented Karatsuba multiplication (Closed)

Can't Edit
Can't Publish+Mail
Start Review

Created:
13 years, 12 months ago by gri

Modified:
4 years, 2 months ago

Reviewers:
rsc1, adonovan

CC:
rsc, golang-dev

Visibility:
Public.

Description

big: implemented Karatsuba multiplication Plus: - basic Fact benchmark - extra multiplication tests - some unrelated cleanups This improves performance of 1e5! by almost a factor of 2 (largest multiply has > 500000bit operands) gotest -benchmarks=Fact big.BenchmarkFact 5 329760400 ns/op big.BenchmarkFact 5 651693600 ns/op (w/o Karatsuba) There's no impact on pidigits for -n=10000 or -n=20000 because the multiplies are too small. At the moment Karatsuba multiplies kick in if both operands have at least 256 words (16384bits on a 64bit machine); this can probably be improved quite a bit.

Patch Set 1 #

Patch Set 2 : code review 1004042: big: implemented Karatsuba multiplication #

Patch Set 3 : code review 1004042: big: implemented Karatsuba multiplication #

Total comments: 22

Patch Set 4 : code review 1004042: big: implemented Karatsuba multiplication #

Patch Set 5 : code review 1004042: big: implemented Karatsuba multiplication #

Total comments: 1

Patch Set 6 : code review 1004042: big: implemented Karatsuba multiplication #

Patch Set 7 : code review 1004042: big: implemented Karatsuba multiplication #

Patch Set 8 : code review 1004042: big: implemented Karatsuba multiplication #

Patch Set 9 : code review 1004042: big: implemented Karatsuba multiplication #

Patch Set 10 : code review 1004042: big: implemented Karatsuba multiplication #

Total comments: 10

Patch Set 11 : code review 1004042: big: implemented Karatsuba multiplication #

Patch Set 12 : code review 1004042: big: implemented Karatsuba multiplication #

Patch Set 13 : code review 1004042: big: implemented Karatsuba multiplication #

Patch Set 14 : code review 1004042: big: implemented Karatsuba multiplication #

Patch Set 15 : code review 1004042: big: implemented Karatsuba multiplication #

Total comments: 1

Patch Set 16 : code review 1004042: big: implemented Karatsuba multiplication #

Total comments: 2

Created: 13 years, 12 months ago

Download [raw] [tar.bz2]

	Unified diffs	Side-by-side diffs	Delta from patch set	Stats (+491 lines, -92 lines)			Patch
A	src/pkg/big/calibrate_test.go	View	13 14 15	1 chunk	+91 lines, -0 lines	0 comments	Download
M	src/pkg/big/int.go	View	1 2 3 4 5 6	3 chunks	+3 lines, -3 lines	0 comments	Download
M	src/pkg/big/int_test.go	View	1 2 3 4 5	5 chunks	+47 lines, -30 lines	0 comments	Download
M	src/pkg/big/nat.go	View	1 2 3 4 5 6 7 8 9 10 11 12 13 14 15	26 chunks	+292 lines, -59 lines	2 comments	Download
M	src/pkg/big/nat_test.go	View	1 2 3 4 5 6 7 8 9 10	1 chunk	+58 lines, -0 lines	0 comments	Download

Messages

Total messages: 14

Expand All Messages | Collapse All Messages

gri

Hello rsc (cc: golang-dev@googlegroups.com), I'd like you to review this change.

13 years, 12 months ago (2010-04-25 23:37:48 UTC) #1

rsc

Have you measured where the crossover is to sanity check the choice of 256? Does ...

13 years, 12 months ago (2010-04-25 23:44:29 UTC) #2

rsc

On Sun, Apr 25, 2010 at 16:44, Russ Cox <rsc@golang.org> wrote: > Have you measured ...

13 years, 12 months ago (2010-04-25 23:47:04 UTC) #3

rsc1

Very neat. http://codereview.appspot.com/1004042/diff/6001/7003 File src/pkg/big/nat.go (right): http://codereview.appspot.com/1004042/diff/6001/7003#newcode228 src/pkg/big/nat.go:228: const karatsubaThreshold = 245 CL desc says ...

13 years, 12 months ago (2010-04-26 04:40:07 UTC) #4

gri

On Sun, Apr 25, 2010 at 4:47 PM, Russ Cox <rsc@golang.org> wrote: > On Sun, ...

13 years, 12 months ago (2010-04-26 04:41:08 UTC) #5

gri

I also updated the CL description (removed Int.Fact). http://codereview.appspot.com/1004042/diff/6001/7001 File src/pkg/big/int.go (right): http://codereview.appspot.com/1004042/diff/6001/7001#newcode320 src/pkg/big/int.go:320: // ...

13 years, 12 months ago (2010-04-26 05:32:39 UTC) #6

I also updated the CL description (removed Int.Fact).

http://codereview.appspot.com/1004042/diff/6001/7001
File src/pkg/big/int.go (right):

http://codereview.appspot.com/1004042/diff/6001/7001#newcode320
src/pkg/big/int.go:320: // Fact sets z to n! (factorial of n).
Removed.

http://codereview.appspot.com/1004042/diff/6001/7003
File src/pkg/big/nat.go (right):

http://codereview.appspot.com/1004042/diff/6001/7003#newcode228
src/pkg/big/nat.go:228: const karatsubaThreshold = 245
On 2010/04/26 04:40:07, rsc1 wrote:
> CL desc says 256.
> 

See larger note in mul. For instance, say n = 250: this is above the threshold
and thus will trigger the karatsuba path. The largest power of 2 not greater
than 250 is k = 128. This will lead to a split of x and y into digits such that
x0, y0 have length 128. These large digits are multiplied which in turn requires
4 mul calls, all of which operating on digits smaller than 245 and so karatsuba
is not invoked. Only if n >= 256, karatsuba is invoked; i.e. for karatsuba the
effective threshold is a power of 2. Interestingly, even though karatsuba is
not invoked in this case, it nevertheless runs faster than if the the threshold
is set to 256, which is why I left it. One could split up the two numbers but
that doesn't make it easier to tune nor does it change the characteristics.

http://codereview.appspot.com/1004042/diff/6001/7003#newcode230
src/pkg/big/nat.go:230: func init() {
neat! done.

http://codereview.appspot.com/1004042/diff/6001/7003#newcode237
src/pkg/big/nat.go:237: func karatsubaAdd(z, x nat, n int) {
On 2010/04/26 04:40:07, rsc1 wrote:
> Add a comment saying what this routine does and why
> it's split out.  Is it the same as
> 
> z[0:n+n>>1].add(z[0:n], x[0:n])
> 
> ?
> 

Done.

http://codereview.appspot.com/1004042/diff/6001/7003#newcode244
src/pkg/big/nat.go:244: func karatsubaSub(z, x nat, n int) {
On 2010/04/26 04:40:07, rsc1 wrote:
> Same.
> 

Done.

http://codereview.appspot.com/1004042/diff/6001/7003#newcode252
src/pkg/big/nat.go:252: // Both x and y must have the same length and n must be
a
On 2010/04/26 04:40:07, rsc1 wrote:
> s/same length/same length n/
> 

Done.

http://codereview.appspot.com/1004042/diff/6001/7003#newcode267
src/pkg/big/nat.go:267: z[n+i] = addMulVVW(&z[i], &x[0], d, n)
On 2010/04/26 04:40:07, rsc1 wrote:
> I'm a little scared by all of the calls to things like addMulVVW here.
> There are a lot of array bounds to check.
> No need to change any code, but I wonder if there's anything
> we can do in the future to make errors here less likely.
> 

Yes; I am aware of it. I can fix some of these, perhaps in another CL.

http://codereview.appspot.com/1004042/diff/6001/7003#newcode320
src/pkg/big/nat.go:320: //            adding it to z. This should remove quite a
bit of code.
On 2010/04/26 04:40:07, rsc1 wrote:
> Seems worth doing.
> May change the threshold too.
> 

I've played with it a bit. So far no impact on the threshold; the reason why I
have not implemented it is that I have not convinced myself yet that it is
correct (even though the tests pass). Which is why there's a TODO. Prefer
to do in a 2nd CL.

http://codereview.appspot.com/1004042/diff/6001/7003#newcode395
src/pkg/big/nat.go:395: // m >= n && m > 1 && n > 1
On 2010/04/26 04:40:07, rsc1 wrote:
> // m >= n > 1
> ?
> 

Done.

http://codereview.appspot.com/1004042/diff/6001/7003#newcode444
src/pkg/big/nat.go:444: copy(z[2*k:], natZero.mul(x1, y1)) // z may not be
normalized!
good point! done. (speedup is minor, though)

On 2010/04/26 04:40:07, rsc1 wrote:
> You'd probably get a speedup from reusing the garbage here.
> 
> tmp := natZero.mul(x1, y1)
> copy(z[2*k:], tmp)
> tmp = tmp.mul(x1, y0)
> addAt(z, tmp, k)
> ...
> 
> etc.

http://codereview.appspot.com/1004042/diff/6001/7003#newcode1004
src/pkg/big/nat.go:1004: nm1 := natZero.sub(n, natOne)
On 2010/04/26 04:40:07, rsc1 wrote:
> I am not sure about all the natZero.sub stuff.
> nat(nil) may be clearer.  I keep wondering why
> zero is involved in the computation and the
> answer is that it's not.
> 

Done.

gri

Hello rsc (cc: golang-dev@googlegroups.com), I'd like you to review this change.

13 years, 12 months ago (2010-04-27 03:41:02 UTC) #7

rsc1

LGTM http://codereview.appspot.com/1004042/diff/28001/29001 File src/pkg/big/calibrate.go (right): http://codereview.appspot.com/1004042/diff/28001/29001#newcode5 src/pkg/big/calibrate.go:5: // The calibrate package computes the Karatsuba threshold. ...

13 years, 12 months ago (2010-04-27 06:15:42 UTC) #8

gri

PTAL. http://codereview.appspot.com/1004042/diff/28001/29001 File src/pkg/big/calibrate.go (right): http://codereview.appspot.com/1004042/diff/28001/29001#newcode5 src/pkg/big/calibrate.go:5: // The calibrate package computes the Karatsuba threshold. ...

13 years, 12 months ago (2010-04-28 00:22:07 UTC) #9

PTAL.

http://codereview.appspot.com/1004042/diff/28001/29001
File src/pkg/big/calibrate.go (right):

http://codereview.appspot.com/1004042/diff/28001/29001#newcode5
src/pkg/big/calibrate.go:5: // The calibrate package computes the Karatsuba
threshold.
On 2010/04/27 06:15:42, rsc1 wrote:
> ?
> this is package main
> 

Part of package big now; but excluded from build. To calibrate, simply include
in build.

http://codereview.appspot.com/1004042/diff/28001/29001#newcode50
src/pkg/big/calibrate.go:50: big.SetKaratsubaThreshold(n) // enable karatsuba
On 2010/04/27 06:15:42, rsc1 wrote:
> Because the threshold gets used both to decide
> whether to do karatsuba at all and also to decide
> when to stop doing it during the recursion, 
> doesn't setting the threshold to n for an n*n multiply
> have a different effect from, say, setting it to n/2 for
> an n*n multiply?
> 
> That is, it seems like there is a two-dimensional space
> here and the test is only exploring a single line in that
> space.

Changed work load to a set of multiplies. The threshold is now at 30.

http://codereview.appspot.com/1004042/diff/28001/29004
File src/pkg/big/nat.go (right):

http://codereview.appspot.com/1004042/diff/28001/29004#newcode234
src/pkg/big/nat.go:234: func init() {
On 2010/04/27 06:15:42, rsc1 wrote:
> Why not just set it to 52 above?
> That would avoid the need for run-time init code here.
> (=52 can be done by the linker.)
> 

Removed init() routine (make calibrate part of big package). Also removed init()
verification of the threshold - made rest of the code robust (has no impact on
performance and makes it easier to understand).

http://codereview.appspot.com/1004042/diff/28001/29004#newcode258
src/pkg/big/nat.go:258: func gradeschool(z, x, y nat) {
On 2010/04/27 06:15:42, rsc1 wrote:
> gradschoolMul?
> 

Done.

http://codereview.appspot.com/1004042/diff/28001/29004#newcode329
src/pkg/big/nat.go:329: // TODO(gri): In the following we carefully avoid
underflow
On 2010/04/27 06:15:42, rsc1 wrote:
> Delete TODO?
> 
> Last time I asked about implementing the TODO and
> you said you didn't think it was actually correct.
> I think you're right - it's not correct.
> 
> If one of the two is negative and was left 2s-complement
> you'd end up multiplying (2^n2 + xd) * yd which
> would end up being off by 2^n2 * yd in the product.
> The doubled precision of the multiply means that
> it is necessary to track the sign separately.

Done.

rsc1

http://codereview.appspot.com/1004042/diff/25004/28002 File src/pkg/big/calibrate_test.go (right): http://codereview.appspot.com/1004042/diff/25004/28002#newcode53 src/pkg/big/calibrate_test.go:53: x[i] = makeNumber(n * (i + 1)) I still ...

13 years, 12 months ago (2010-04-28 02:08:12 UTC) #10

gri

*** Submitted as http://code.google.com/p/go/source/detail?r=b75b37bee7ee *** big: implemented Karatsuba multiplication Plus: - calibration "test" - include ...

13 years, 12 months ago (2010-04-28 02:16:12 UTC) #12

adonovan

https://codereview.appspot.com/1004042/diff/23004/src/pkg/big/nat.go File src/pkg/big/nat.go (right): https://codereview.appspot.com/1004042/diff/23004/src/pkg/big/nat.go#newcode461 src/pkg/big/nat.go:461: func (z nat) mulRange(a, b uint64) nat { What's ...

4 years, 2 months ago (2020-02-12 14:46:30 UTC) #13

gri

4 years, 2 months ago (2020-02-19 00:51:25 UTC) #14

Message was sent while issue was closed.

https://codereview.appspot.com/1004042/diff/23004/src/pkg/big/nat.go
File src/pkg/big/nat.go (right):

https://codereview.appspot.com/1004042/diff/23004/src/pkg/big/nat.go#newcode461
src/pkg/big/nat.go:461: func (z nat) mulRange(a, b uint64) nat {
On 2020/02/12 14:46:30, adonovan wrote:
> What's the purpose of treating the sequence as leaves of a binary tree and
doing
> recursive pairwise additions, instead of using a simple loop? It's the same
> number of multiplications, but I would expect the loop to allocate much fewer
> temporary variables.

The recursive approach appears faster in practice - even w/o Karatsuba. I'm not
100% clear why (I found this empirically and accidentally), but I believe it it
because there is less "internal fragmentation": While the product is getting
bigger and bigger in a loop-based implementation, one factor slowly moves from a
to b and typically is "small" even for a 64bit int - i.e., in each
multiplication step only a few bits (of the 64 bits) of that factor contribute
to the multiplication in each iteration. In contrast, in a recursive approach,
all "inner" digits of a large operand are 64bits in width, thus requiring fewer
multiplications overall. One could probably measure this by counting the 64bit
multiplications required in either case.

If we have Karatsuba, the effect should become even bigger, because in each
multiplication we have two more or less equally sized factors which lend
themselves to Karatsuba multiplication (if they are big enough). In the
iterative approach, one factor is always small (at most 64bits) and thus
Karatsuba won't even kick in.

Expand All Messages | Collapse All Messages