big: implemented Karatsuba multiplication
Plus:
- basic Fact benchmark
- extra multiplication tests
- some unrelated cleanups
This improves performance of 1e5! by
almost a factor of 2 (largest multiply
has > 500000bit operands)
gotest -benchmarks=Fact
big.BenchmarkFact 5 329760400 ns/op
big.BenchmarkFact 5 651693600 ns/op (w/o Karatsuba)
There's no impact on pidigits for -n=10000 or -n=20000
because the multiplies are too small.
At the moment Karatsuba multiplies kick
in if both operands have at least 256 words (16384bits on
a 64bit machine); this can probably be improved quite a bit.

On Sun, Apr 25, 2010 at 16:44, Russ Cox <rsc@golang.org> wrote: > Have you measured ...

10 years, 2 months ago
(2010-04-25 23:47:04 UTC)
#3

On Sun, Apr 25, 2010 at 16:44, Russ Cox <rsc@golang.org> wrote:
> Have you measured where the crossover
> is to sanity check the choice of 256?
Never mind, I see that this is answered in the source code.
Still interested in this one:
> Does test/bench/pidigits.go get any faster?
Russ

On Sun, Apr 25, 2010 at 4:47 PM, Russ Cox <rsc@golang.org> wrote: > On Sun, ...

10 years, 2 months ago
(2010-04-26 04:41:08 UTC)
#5

On Sun, Apr 25, 2010 at 4:47 PM, Russ Cox <rsc@golang.org> wrote:
> On Sun, Apr 25, 2010 at 16:44, Russ Cox <rsc@golang.org> wrote:
> > Have you measured where the crossover
> > is to sanity check the choice of 256?
>
> Never mind, I see that this is answered in the source code.
> Still interested in this one:
>
> > Does test/bench/pidigits.go get any faster?
>
No, the time stays around 6s. I instrumented the code, and Karatsuba
multiplies don't kick in
for pidigits (neither for -n=10000 not -n=20000).
The current threshold is higher than I like, and the code is a bit more
restrictive than necessary,
but it's a first step. It's a strict improvement for large numbers w/o any
negatives for small ones.
>
> Russ
>
>
> --
> Subscription settings:
> http://groups.google.com/group/golang-dev/subscribe?hl=en
>

I also updated the CL description (removed Int.Fact). http://codereview.appspot.com/1004042/diff/6001/7001 File src/pkg/big/int.go (right): http://codereview.appspot.com/1004042/diff/6001/7001#newcode320 src/pkg/big/int.go:320: // ...

10 years, 2 months ago
(2010-04-26 05:32:39 UTC)
#6

I also updated the CL description (removed Int.Fact).
http://codereview.appspot.com/1004042/diff/6001/7001
File src/pkg/big/int.go (right):
http://codereview.appspot.com/1004042/diff/6001/7001#newcode320
src/pkg/big/int.go:320: // Fact sets z to n! (factorial of n).
Removed.
http://codereview.appspot.com/1004042/diff/6001/7003
File src/pkg/big/nat.go (right):
http://codereview.appspot.com/1004042/diff/6001/7003#newcode228
src/pkg/big/nat.go:228: const karatsubaThreshold = 245
On 2010/04/26 04:40:07, rsc1 wrote:
> CL desc says 256.
>
See larger note in mul. For instance, say n = 250: this is above the threshold
and thus will trigger the karatsuba path. The largest power of 2 not greater
than 250 is k = 128. This will lead to a split of x and y into digits such that
x0, y0 have length 128. These large digits are multiplied which in turn requires
4 mul calls, all of which operating on digits smaller than 245 and so karatsuba
is not invoked. Only if n >= 256, karatsuba is invoked; i.e. for karatsuba the
effective threshold is a power of 2. Interestingly, even though karatsuba is
not invoked in this case, it nevertheless runs faster than if the the threshold
is set to 256, which is why I left it. One could split up the two numbers but
that doesn't make it easier to tune nor does it change the characteristics.
http://codereview.appspot.com/1004042/diff/6001/7003#newcode230
src/pkg/big/nat.go:230: func init() {
neat! done.
http://codereview.appspot.com/1004042/diff/6001/7003#newcode237
src/pkg/big/nat.go:237: func karatsubaAdd(z, x nat, n int) {
On 2010/04/26 04:40:07, rsc1 wrote:
> Add a comment saying what this routine does and why
> it's split out. Is it the same as
>
> z[0:n+n>>1].add(z[0:n], x[0:n])
>
> ?
>
Done.
http://codereview.appspot.com/1004042/diff/6001/7003#newcode244
src/pkg/big/nat.go:244: func karatsubaSub(z, x nat, n int) {
On 2010/04/26 04:40:07, rsc1 wrote:
> Same.
>
Done.
http://codereview.appspot.com/1004042/diff/6001/7003#newcode252
src/pkg/big/nat.go:252: // Both x and y must have the same length and n must be
a
On 2010/04/26 04:40:07, rsc1 wrote:
> s/same length/same length n/
>
Done.
http://codereview.appspot.com/1004042/diff/6001/7003#newcode267
src/pkg/big/nat.go:267: z[n+i] = addMulVVW(&z[i], &x[0], d, n)
On 2010/04/26 04:40:07, rsc1 wrote:
> I'm a little scared by all of the calls to things like addMulVVW here.
> There are a lot of array bounds to check.
> No need to change any code, but I wonder if there's anything
> we can do in the future to make errors here less likely.
>
Yes; I am aware of it. I can fix some of these, perhaps in another CL.
http://codereview.appspot.com/1004042/diff/6001/7003#newcode320
src/pkg/big/nat.go:320: // adding it to z. This should remove quite a
bit of code.
On 2010/04/26 04:40:07, rsc1 wrote:
> Seems worth doing.
> May change the threshold too.
>
I've played with it a bit. So far no impact on the threshold; the reason why I
have not implemented it is that I have not convinced myself yet that it is
correct (even though the tests pass). Which is why there's a TODO. Prefer
to do in a 2nd CL.
http://codereview.appspot.com/1004042/diff/6001/7003#newcode395
src/pkg/big/nat.go:395: // m >= n && m > 1 && n > 1
On 2010/04/26 04:40:07, rsc1 wrote:
> // m >= n > 1
> ?
>
Done.
http://codereview.appspot.com/1004042/diff/6001/7003#newcode444
src/pkg/big/nat.go:444: copy(z[2*k:], natZero.mul(x1, y1)) // z may not be
normalized!
good point! done. (speedup is minor, though)
On 2010/04/26 04:40:07, rsc1 wrote:
> You'd probably get a speedup from reusing the garbage here.
>
> tmp := natZero.mul(x1, y1)
> copy(z[2*k:], tmp)
> tmp = tmp.mul(x1, y0)
> addAt(z, tmp, k)
> ...
>
> etc.
http://codereview.appspot.com/1004042/diff/6001/7003#newcode1004
src/pkg/big/nat.go:1004: nm1 := natZero.sub(n, natOne)
On 2010/04/26 04:40:07, rsc1 wrote:
> I am not sure about all the natZero.sub stuff.
> nat(nil) may be clearer. I keep wondering why
> zero is involved in the computation and the
> answer is that it's not.
>
Done.

LGTM http://codereview.appspot.com/1004042/diff/28001/29001 File src/pkg/big/calibrate.go (right): http://codereview.appspot.com/1004042/diff/28001/29001#newcode5 src/pkg/big/calibrate.go:5: // The calibrate package computes the Karatsuba threshold. ...

10 years, 2 months ago
(2010-04-27 06:15:42 UTC)
#8

LGTM
http://codereview.appspot.com/1004042/diff/28001/29001
File src/pkg/big/calibrate.go (right):
http://codereview.appspot.com/1004042/diff/28001/29001#newcode5
src/pkg/big/calibrate.go:5: // The calibrate package computes the Karatsuba
threshold.
?
this is package main
http://codereview.appspot.com/1004042/diff/28001/29001#newcode50
src/pkg/big/calibrate.go:50: big.SetKaratsubaThreshold(n) // enable karatsuba
Because the threshold gets used both to decide
whether to do karatsuba at all and also to decide
when to stop doing it during the recursion,
doesn't setting the threshold to n for an n*n multiply
have a different effect from, say, setting it to n/2 for
an n*n multiply?
That is, it seems like there is a two-dimensional space
here and the test is only exploring a single line in that
space.
http://codereview.appspot.com/1004042/diff/28001/29004
File src/pkg/big/nat.go (right):
http://codereview.appspot.com/1004042/diff/28001/29004#newcode234
src/pkg/big/nat.go:234: func init() {
Why not just set it to 52 above?
That would avoid the need for run-time init code here.
(=52 can be done by the linker.)
http://codereview.appspot.com/1004042/diff/28001/29004#newcode258
src/pkg/big/nat.go:258: func gradeschool(z, x, y nat) {
gradschoolMul?
http://codereview.appspot.com/1004042/diff/28001/29004#newcode329
src/pkg/big/nat.go:329: // TODO(gri): In the following we carefully avoid
underflow
Delete TODO?
Last time I asked about implementing the TODO and
you said you didn't think it was actually correct.
I think you're right - it's not correct.
If one of the two is negative and was left 2s-complement
you'd end up multiplying (2^n2 + xd) * yd which
would end up being off by 2^n2 * yd in the product.
The doubled precision of the multiply means that
it is necessary to track the sign separately.

PTAL. http://codereview.appspot.com/1004042/diff/28001/29001 File src/pkg/big/calibrate.go (right): http://codereview.appspot.com/1004042/diff/28001/29001#newcode5 src/pkg/big/calibrate.go:5: // The calibrate package computes the Karatsuba threshold. ...

10 years, 2 months ago
(2010-04-28 00:22:07 UTC)
#9

PTAL.
http://codereview.appspot.com/1004042/diff/28001/29001
File src/pkg/big/calibrate.go (right):
http://codereview.appspot.com/1004042/diff/28001/29001#newcode5
src/pkg/big/calibrate.go:5: // The calibrate package computes the Karatsuba
threshold.
On 2010/04/27 06:15:42, rsc1 wrote:
> ?
> this is package main
>
Part of package big now; but excluded from build. To calibrate, simply include
in build.
http://codereview.appspot.com/1004042/diff/28001/29001#newcode50
src/pkg/big/calibrate.go:50: big.SetKaratsubaThreshold(n) // enable karatsuba
On 2010/04/27 06:15:42, rsc1 wrote:
> Because the threshold gets used both to decide
> whether to do karatsuba at all and also to decide
> when to stop doing it during the recursion,
> doesn't setting the threshold to n for an n*n multiply
> have a different effect from, say, setting it to n/2 for
> an n*n multiply?
>
> That is, it seems like there is a two-dimensional space
> here and the test is only exploring a single line in that
> space.
Changed work load to a set of multiplies. The threshold is now at 30.
http://codereview.appspot.com/1004042/diff/28001/29004
File src/pkg/big/nat.go (right):
http://codereview.appspot.com/1004042/diff/28001/29004#newcode234
src/pkg/big/nat.go:234: func init() {
On 2010/04/27 06:15:42, rsc1 wrote:
> Why not just set it to 52 above?
> That would avoid the need for run-time init code here.
> (=52 can be done by the linker.)
>
Removed init() routine (make calibrate part of big package). Also removed init()
verification of the threshold - made rest of the code robust (has no impact on
performance and makes it easier to understand).
http://codereview.appspot.com/1004042/diff/28001/29004#newcode258
src/pkg/big/nat.go:258: func gradeschool(z, x, y nat) {
On 2010/04/27 06:15:42, rsc1 wrote:
> gradschoolMul?
>
Done.
http://codereview.appspot.com/1004042/diff/28001/29004#newcode329
src/pkg/big/nat.go:329: // TODO(gri): In the following we carefully avoid
underflow
On 2010/04/27 06:15:42, rsc1 wrote:
> Delete TODO?
>
> Last time I asked about implementing the TODO and
> you said you didn't think it was actually correct.
> I think you're right - it's not correct.
>
> If one of the two is negative and was left 2s-complement
> you'd end up multiplying (2^n2 + xd) * yd which
> would end up being off by 2^n2 * yd in the product.
> The doubled precision of the multiply means that
> it is necessary to track the sign separately.
Done.

http://codereview.appspot.com/1004042/diff/25004/28002 File src/pkg/big/calibrate_test.go (right): http://codereview.appspot.com/1004042/diff/25004/28002#newcode53 src/pkg/big/calibrate_test.go:53: x[i] = makeNumber(n * (i + 1)) I still ...

10 years, 2 months ago
(2010-04-28 02:08:12 UTC)
#10

http://codereview.appspot.com/1004042/diff/25004/28002
File src/pkg/big/calibrate_test.go (right):
http://codereview.appspot.com/1004042/diff/25004/28002#newcode53
src/pkg/big/calibrate_test.go:53: x[i] = makeNumber(n * (i + 1))
I still think this is not a comparable test:
the input values should not depend on the choice of n,
because as you vary n you can't tell whether it was
the threshold change that helped or the different size
of number.
Perhaps
x[i] = makeNumber(20*(i+1))
or something like that (maybe do more than 5)
would give a more accurate comparison of
different thresholds.

*** Submitted as http://code.google.com/p/go/source/detail?r=b75b37bee7ee *** big: implemented Karatsuba multiplication Plus: - calibration "test" - include ...

10 years, 2 months ago
(2010-04-28 02:16:12 UTC)
#12

*** Submitted as http://code.google.com/p/go/source/detail?r=b75b37bee7ee ***
big: implemented Karatsuba multiplication
Plus:
- calibration "test" - include in tests with gotest -calibrate
- basic Mul benchmark
- extra multiplication tests
- various cleanups
This change improves multiplication speed of numbers >= 30 words
in length (current threshold; found empirically with calibrate):
The multiplication benchmark (multiplication of a variety of long numbers)
improves by ~35%, individual multiplies can be significantly faster.
gotest -benchmarks=Mul
big.BenchmarkMul 500 6829290 ns/op (w/ Karatsuba)
big.BenchmarkMul 100 10600760 ns/op
There's no impact on pidigits for -n=10000 or -n=20000
because the operands are are too small.
R=rsc
CC=golang-dev
http://codereview.appspot.com/1004042
Committer: Robert Griesemer <gri@golang.org>

https://codereview.appspot.com/1004042/diff/23004/src/pkg/big/nat.go File src/pkg/big/nat.go (right): https://codereview.appspot.com/1004042/diff/23004/src/pkg/big/nat.go#newcode461 src/pkg/big/nat.go:461: func (z nat) mulRange(a, b uint64) nat { On ...

4 months, 2 weeks ago
(2020-02-19 00:51:25 UTC)
#14

Message was sent while issue was closed.

https://codereview.appspot.com/1004042/diff/23004/src/pkg/big/nat.go
File src/pkg/big/nat.go (right):
https://codereview.appspot.com/1004042/diff/23004/src/pkg/big/nat.go#newcode461
src/pkg/big/nat.go:461: func (z nat) mulRange(a, b uint64) nat {
On 2020/02/12 14:46:30, adonovan wrote:
> What's the purpose of treating the sequence as leaves of a binary tree and
doing
> recursive pairwise additions, instead of using a simple loop? It's the same
> number of multiplications, but I would expect the loop to allocate much fewer
> temporary variables.
The recursive approach appears faster in practice - even w/o Karatsuba. I'm not
100% clear why (I found this empirically and accidentally), but I believe it it
because there is less "internal fragmentation": While the product is getting
bigger and bigger in a loop-based implementation, one factor slowly moves from a
to b and typically is "small" even for a 64bit int - i.e., in each
multiplication step only a few bits (of the 64 bits) of that factor contribute
to the multiplication in each iteration. In contrast, in a recursive approach,
all "inner" digits of a large operand are 64bits in width, thus requiring fewer
multiplications overall. One could probably measure this by counting the 64bit
multiplications required in either case.
If we have Karatsuba, the effect should become even bigger, because in each
multiplication we have two more or less equally sized factors which lend
themselves to Karatsuba multiplication (if they are big enough). In the
iterative approach, one factor is always small (at most 64bits) and thus
Karatsuba won't even kick in.

Issue 1004042: code review 1004042: big: implemented Karatsuba multiplication
(Closed)
Created 10 years, 2 months ago by gri
Modified 4 months, 2 weeks ago
Reviewers: adonovan
Base URL:
Comments: 36