|
|
Created:
13 years, 3 months ago by bradfitz Modified:
13 years, 3 months ago Reviewers:
CC:
golang-dev, dsymonds, rsc, r, gri Visibility:
Public. |
Descriptionjson: use strconv.Append variants to avoid allocations in encoding
Before/after, best of 3:
json.BenchmarkCodeEncoder 10 183495300 ns/op 10.58 MB/s
->
json.BenchmarkCodeEncoder 10 133025100 ns/op 14.59 MB/s
But don't get too excited about this. These benchmarks, while
stable at any point of time, fluctuate wildly with any line of
code added or removed anywhere in the path due to stack splitting
issues.
It's currently much faster, though, and this is the API that
doesn't allocate so should always be faster in theory.
Patch Set 1 #Patch Set 2 : diff -r e2a939a05cd7 https://go.googlecode.com/hg/ #Patch Set 3 : diff -r e2a939a05cd7 https://go.googlecode.com/hg/ #Patch Set 4 : diff -r 4a5b249ef23d https://go.googlecode.com/hg/ #Patch Set 5 : diff -r 4a5b249ef23d https://go.googlecode.com/hg/ #
Total comments: 9
Patch Set 6 : diff -r 4a5b249ef23d https://go.googlecode.com/hg/ #
Total comments: 10
Patch Set 7 : diff -r c8f7a5a37d09 https://go.googlecode.com/hg/ #Patch Set 8 : diff -r 56842d9b23d2 https://go.googlecode.com/hg/ #
Total comments: 1
Patch Set 9 : diff -r 56842d9b23d2 https://go.googlecode.com/hg/ #MessagesTotal messages: 27
Hello golang-dev@googlegroups.com (cc: golang-dev@googlegroups.com), I'd like you to review this change to https://go.googlecode.com/hg/
Sign in to reply to this message.
Consider this a feeler CL. I didn't write docs but you can imagine what they'd look like. It seems faster: Before: json.BenchmarkCodeEncoder 10 180316800 ns/op 10.76 MB/s json.BenchmarkCodeMarshal 10 179345300 ns/op 10.82 MB/s json.BenchmarkCodeDecoder 5 407458600 ns/op 4.76 MB/s json.BenchmarkCodeUnmarshal 5 416217800 ns/op 4.66 MB/s json.BenchmarkCodeUnmarshalReuse 5 398989600 ns/op 4.86 MB/s json.BenchmarkSkipValue 50 26343100 ns/op 74.54 MB/s After: json.BenchmarkCodeEncoder 10 162934200 ns/op 11.91 MB/s json.BenchmarkCodeMarshal 10 174220200 ns/op 11.14 MB/s json.BenchmarkCodeDecoder 5 400059600 ns/op 4.85 MB/s json.BenchmarkCodeUnmarshal 5 408517800 ns/op 4.75 MB/s json.BenchmarkCodeUnmarshalReuse 5 396427400 ns/op 4.89 MB/s json.BenchmarkSkipValue 50 27184340 ns/op 72.23 MB/s On Fri, Nov 18, 2011 at 4:27 AM, <bradfitz@golang.org> wrote: > Reviewers: golang-dev_googlegroups.com, > > Message: > Hello golang-dev@googlegroups.com (cc: golang-dev@googlegroups.com), > > I'd like you to review this change to > https://go.googlecode.com/hg/ > > > Description: > strconv: add Ftoa appending variants, to minimize allocations > > Update json to use it. > > Please review this at http://codereview.appspot.com/**5411052/<http://codereview.appspot.com/5411052/> > > Affected files: > M src/pkg/encoding/json/encode.**go > M src/pkg/strconv/ftoa.go > > >
Sign in to reply to this message.
You say there are fewer allocations. Is there a benchmark to demonstrate that?
Sign in to reply to this message.
There is not. But I just added some printlns to BenchmarkEncoder before & after the loop, and it looks like 15% fewer allocations now. On Fri, Nov 18, 2011 at 4:51 AM, David Symonds <dsymonds@golang.org> wrote: > You say there are fewer allocations. Is there a benchmark to demonstrate > that? >
Sign in to reply to this message.
Thanks for trying this. It is not as much faster as I had expected. Can you try running gotest -bench=CodeEncoder -cpuprofile x.prof gopprof 6.out x.prof top10 web whatever and see where the time is going? This will only work on Linux, not on a Mac. Thanks. Russ
Sign in to reply to this message.
On Fri, Nov 18, 2011 at 5:36 AM, Russ Cox <rsc@golang.org> wrote: > Thanks for trying this. It is not as much faster > as I had expected. Can you try running > > gotest -bench=CodeEncoder -cpuprofile x.prof > gopprof 6.out x.prof > $ gotest -benchtime=5 -test.run=XXX -test.bench=CodeEncoder -cpuprofile before.prof rm -f _test/encoding/json.a 6g -p encoding/json -o _gotest_.6 decode.go encode.go indent.go scanner.go stream.go tags.go bench_test.go decode_test.go encode_test.go scanner_test.go stream_test.go tagkey_test.go tags_test.go rm -f _test/encoding/json.a gopack grc _test/encoding/json.a _gotest_.6 PASS json.BenchmarkCodeEncoder mallocs: 241281 mallocs: 12062825 50 183478840 ns/op 10.58 MB/s (pprof) top20 Total: 1000 samples 128 12.8% 12.8% 129 12.9% strconv.rightShift 59 5.9% 18.7% 97 9.7% strconv.Uitob64 57 5.7% 24.4% 57 5.7% runtime.memmove 52 5.2% 29.6% 860 86.0% encoding/json.(*encodeState).reflectValueQuoted 45 4.5% 34.1% 106 10.6% reflect.StructTag.Get 37 3.7% 37.8% 37 3.7% strings.Index 29 2.9% 40.7% 37 3.7% itab 29 2.9% 43.6% 32 3.2% strconv.(*decimal).Assign 26 2.6% 46.2% 141 14.1% strconv.roundShortest 20 2.0% 48.2% 20 2.0% bytes.(*Buffer).Len 20 2.0% 50.2% 118 11.8% runtime.mallocgc 19 1.9% 52.1% 40 4.0% bytes.(*Buffer).WriteByte 18 1.8% 53.9% 35 3.5% bytes.(*Buffer).grow 18 1.8% 55.7% 72 7.2% encoding/json.(*encodeState).string 16 1.6% 57.3% 18 1.8% MCentral_Alloc 15 1.5% 58.8% 40 4.0% runtime.MCache_Alloc 14 1.4% 60.2% 51 5.1% bytes.(*Buffer).WriteString 14 1.4% 61.6% 72 7.2% strconv.fmtF 13 1.3% 62.9% 73 7.3% reflect.(*structType).Field 13 1.3% 64.2% 58 5.8% runtime.growslice $ gotest -benchtime=5 -test.run=XXX -test.bench=CodeEncoder -cpuprofile after.prof rm -f _test/encoding/json.a rm -f _test/encoding/json.a gopack grc _test/encoding/json.a _gotest_.6 PASS json.BenchmarkCodeEncoder mallocs: 153679 mallocs: 7682969 50 173888940 ns/op 11.16 MB/s (pprof) top20 Total: 950 samples 156 16.4% 16.4% 159 16.7% strconv.rightShift 47 4.9% 21.4% 48 5.1% strconv.(*decimal).Assign 42 4.4% 25.8% 808 85.1% encoding/json.(*encodeState).reflectValueQuoted 41 4.3% 30.1% 92 9.7% strconv.Uitob64 39 4.1% 34.2% 39 4.1% runtime.memmove 38 4.0% 38.2% 108 11.4% reflect.StructTag.Get 37 3.9% 42.1% 37 3.9% strings.Index 30 3.2% 45.3% 37 3.9% itab 28 2.9% 48.2% 168 17.7% strconv.roundShortest 27 2.8% 51.1% 77 8.1% encoding/json.(*encodeState).string 21 2.2% 53.3% 78 8.2% runtime.mallocgc 17 1.8% 55.1% 29 3.1% bytes.(*Buffer).grow 17 1.8% 56.8% 37 3.9% runtime.newstack 15 1.6% 58.4% 28 2.9% reflect.valueInterface 13 1.4% 59.8% 36 3.8% bytes.(*Buffer).WriteByte 13 1.4% 61.2% 13 1.4% runtime.stringiter2 13 1.4% 62.5% 48 5.1% strconv.Unquote 12 1.3% 63.8% 30 3.2% runtime.oldstack 12 1.3% 65.1% 12 1.3% runtime.slicestring 12 1.3% 66.3% 12 1.3% unicode.IsLetter just caching the StructTag value in the encoder state,without ftoa, 50 173594020 ns/op 11.18 MB/s Total: 950 samples 162 17.1% 17.1% 167 17.6% strconv.rightShift 52 5.5% 22.5% 52 5.5% runtime.memmove 49 5.2% 27.7% 819 86.2% encoding/json.(*encodeState).reflectValueQuoted 34 3.6% 31.3% 61 6.4% strconv.Uitob64 32 3.4% 34.6% 33 3.5% strconv.(*decimal).Assign 29 3.1% 37.7% 76 8.0% encoding/json.(*encodeState).string 29 3.1% 40.7% 137 14.4% runtime.mallocgc 26 2.7% 43.5% 52 5.5% runtime.MCache_Alloc 25 2.6% 46.1% 158 16.6% strconv.roundShortest 22 2.3% 48.4% 24 2.5% itab 21 2.2% 50.6% 34 3.6% bytes.(*Buffer).WriteByte 21 2.2% 52.8% 21 2.2% memhash 17 1.8% 54.6% 26 2.7% runtime.newstack 16 1.7% 56.3% 75 7.9% strconv.fmtF 15 1.6% 57.9% 319 33.6% strconv.genericFtoa 14 1.5% 59.4% 38 4.0% hash_lookup 14 1.5% 60.8% 14 1.5% runtime.stringiter2 13 1.4% 62.2% 16 1.7% MCentral_Alloc 13 1.4% 63.6% 30 3.2% encoding/json.isValidTag 12 1.3% 64.8% 12 1.3% strings.Index I could do Itoa / Uitob64 also, and fix up a few other thing in the encoder. The web svg mode is very helpful. Unfortunately I kept them all up in my browser but gopprof deleted them from /tmp and Chrome won't let me save them back to disk, so I can't attach.
Sign in to reply to this message.
Hello golang-dev@googlegroups.com, dsymonds@golang.org, rsc@golang.org (cc: golang-dev@googlegroups.com), Please take another look.
Sign in to reply to this message.
Russ, Thoughts on this one? I'm not sure how big of a speed-up you were hoping for, but: Before/after: json.BenchmarkCodeEncoder 100 127088880 ns/op 15.27 MB/s json.BenchmarkCodeEncoder 100 114246610 ns/op 16.98 MB/s Not bad. before/after svg attached, with 5 second benchtime. I have the before & after 6.out & .prof files too, if you want them. On Mon, Nov 21, 2011 at 11:10 AM, <bradfitz@golang.org> wrote: > Hello golang-dev@googlegroups.com, dsymonds@golang.org, rsc@golang.org > (cc: golang-dev@googlegroups.com), > > Please take another look. > > > http://codereview.appspot.com/**5411052/<http://codereview.appspot.com/5411052/> >
Sign in to reply to this message.
Try with -run=xxx to avoid running the tests? The profile says that reflect.Value.MapKeys is getting called, but I don't see why it would be, except if the tests were taking a significant part of the runtime. Russ
Sign in to reply to this message.
I thought I already ran with -run=xxx (I normally do), but I guess not. Attached. On Mon, Nov 21, 2011 at 11:48 AM, Russ Cox <rsc@golang.org> wrote: > Try with -run=xxx to avoid running the tests? > The profile says that reflect.Value.MapKeys is getting > called, but I don't see why it would be, except > if the tests were taking a significant part of the > runtime. > > Russ >
Sign in to reply to this message.
Faster with special-casing division by 10: before: json.BenchmarkCodeEncoder 100 127088880 ns/op 15.27 MB/s then: json.BenchmarkCodeEncoder 100 114246610 ns/op 16.98 MB/s divide by 10 case: json.BenchmarkCodeEncoder 100 108893550 ns/op 17.82 MB/s On Mon, Nov 21, 2011 at 1:04 PM, Brad Fitzpatrick <bradfitz@golang.org>wrote: > I thought I already ran with -run=xxx (I normally do), but I guess not. > > Attached. > > > On Mon, Nov 21, 2011 at 11:48 AM, Russ Cox <rsc@golang.org> wrote: > >> Try with -run=xxx to avoid running the tests? >> The profile says that reflect.Value.MapKeys is getting >> called, but I don't see why it would be, except >> if the tests were taking a significant part of the >> runtime. >> >> Russ >> > >
Sign in to reply to this message.
Hello golang-dev@googlegroups.com, dsymonds@golang.org, rsc@golang.org (cc: golang-dev@googlegroups.com), Please take another look.
Sign in to reply to this message.
http://codereview.appspot.com/5411052/diff/1002/src/pkg/strconv/itoa.go File src/pkg/strconv/itoa.go (right): http://codereview.appspot.com/5411052/diff/1002/src/pkg/strconv/itoa.go#newcode9 src/pkg/strconv/itoa.go:9: if base < 2 || 36 < base { this check is everywhere. we can refactor http://codereview.appspot.com/5411052/diff/1002/src/pkg/strconv/itoa.go#newco... src/pkg/strconv/itoa.go:54: func appendUitob10(dst []byte, u uint64) []byte { why this special case? for constant division? http://codereview.appspot.com/5411052/diff/1002/src/pkg/strconv/itoa.go#newco... src/pkg/strconv/itoa.go:67: buf[j] = "0123456789abcdefghijklmnopqrstuvwxyz"[mod] i thought it was base 10, but in any case why not make it a shared constant? http://codereview.appspot.com/5411052/diff/1002/src/pkg/strconv/itoa.go#newco... src/pkg/strconv/itoa.go:84: // AppendUitob64 works like Itob64 but appends onto dst instead of AppendItob64
Sign in to reply to this message.
http://codereview.appspot.com/5411052/diff/1002/src/pkg/strconv/itoa.go File src/pkg/strconv/itoa.go (right): http://codereview.appspot.com/5411052/diff/1002/src/pkg/strconv/itoa.go#newcode9 src/pkg/strconv/itoa.go:9: if base < 2 || 36 < base { On 2011/11/22 22:48:14, r wrote: > this check is everywhere. we can refactor I could make a checkbase function. Inlining will remove any overhead soon enough, I guess. Or were you thinking of something else? http://codereview.appspot.com/5411052/diff/1002/src/pkg/strconv/itoa.go#newco... src/pkg/strconv/itoa.go:54: func appendUitob10(dst []byte, u uint64) []byte { On 2011/11/22 22:48:14, r wrote: > why this special case? for constant division? yeah. this bit helped quite a bit: from/to: json.BenchmarkCodeEncoder 100 114246610 ns/op 16.98 MB/s json.BenchmarkCodeEncoder 100 108893550 ns/op 17.82 MB/s http://codereview.appspot.com/5411052/diff/1002/src/pkg/strconv/itoa.go#newco... src/pkg/strconv/itoa.go:67: buf[j] = "0123456789abcdefghijklmnopqrstuvwxyz"[mod] On 2011/11/22 22:48:14, r wrote: > i thought it was base 10, but in any case why not make it a shared constant? will do. http://codereview.appspot.com/5411052/diff/1002/src/pkg/strconv/itoa.go#newco... src/pkg/strconv/itoa.go:84: // AppendUitob64 works like Itob64 but appends onto dst instead of On 2011/11/22 22:48:14, r wrote: > AppendItob64 whoops.
Sign in to reply to this message.
http://codereview.appspot.com/5411052/diff/1002/src/pkg/strconv/itoa.go File src/pkg/strconv/itoa.go (right): http://codereview.appspot.com/5411052/diff/1002/src/pkg/strconv/itoa.go#newcode9 src/pkg/strconv/itoa.go:9: if base < 2 || 36 < base { On 2011/11/22 22:48:14, r wrote: > this check is everywhere. wait, it occurs twice.
Sign in to reply to this message.
Hello golang-dev@googlegroups.com, dsymonds@golang.org, rsc@golang.org, r@golang.org (cc: golang-dev@googlegroups.com), Please take another look.
Sign in to reply to this message.
LGTM but wait for rsc http://codereview.appspot.com/5411052/diff/13002/src/pkg/strconv/itoa.go File src/pkg/strconv/itoa.go (right): http://codereview.appspot.com/5411052/diff/13002/src/pkg/strconv/itoa.go#newc... src/pkg/strconv/itoa.go:32: // returning a string. and returns the resulting slice.
Sign in to reply to this message.
http://codereview.appspot.com/5411052/diff/13002/src/pkg/strconv/itoa.go File src/pkg/strconv/itoa.go (right): http://codereview.appspot.com/5411052/diff/13002/src/pkg/strconv/itoa.go#newc... src/pkg/strconv/itoa.go:87: // returning a string. and returns the resulting slice.
Sign in to reply to this message.
I think this can be streamlined quite a bit. Come to think of it, I would probably write one helper function func utoa(a [64]byte, x uint64, base int) int { ... } that converts the x into a to base 10 and returns the length. This one bottle-neck function can be optimized like hell (and maybe even written in assembly if necessary). Then I would just call it from everywhere. The append to byte and the convert to string functions all can use this same bottleneck function. http://codereview.appspot.com/5411052/diff/13002/src/pkg/strconv/itoa.go File src/pkg/strconv/itoa.go (right): http://codereview.appspot.com/5411052/diff/13002/src/pkg/strconv/itoa.go#newc... src/pkg/strconv/itoa.go:35: return appendUitob10(dst, u) This is likely the common case. I would not add an extra function call. http://codereview.appspot.com/5411052/diff/13002/src/pkg/strconv/itoa.go#newc... src/pkg/strconv/itoa.go:40: if u == 0 { I would put this test first (and not care that base is not checked - that's what the existing code (line 89) does, too. Also, then you don't need to check in appendUitob10. http://codereview.appspot.com/5411052/diff/13002/src/pkg/strconv/itoa.go#newc... src/pkg/strconv/itoa.go:42: } I would write the start of this function as: if u == 0 { return ... } if base == 10 { // optimization for common base 10 (same as general case but with constant base) ... } if base < 2 || .. { panic ... } // general case and get rid of appendUitob10 http://codereview.appspot.com/5411052/diff/13002/src/pkg/strconv/itoa.go#newc... src/pkg/strconv/itoa.go:65: u1 := u / 10 s/u1/q/ s/mod/r/ for quotient and remainder (easier mnemonics)? http://codereview.appspot.com/5411052/diff/13002/src/pkg/strconv/itoa.go#newc... src/pkg/strconv/itoa.go:66: mod := u - u1*10 Should have a comment why you do this instead of u%10 (presumably because it's faster - have you measured?). Also, note that the CPU provides both x/10 and x%10 with one instruction and thus gccgo may be able to produce better code if it recognizes the / and % pattern. http://codereview.appspot.com/5411052/diff/13002/src/pkg/strconv/itoa.go#newc... src/pkg/strconv/itoa.go:76: if i == 0 { Why is this test needed? Is 0 an overwhelmingly common case? Doesn't the rest of the code take care of it? http://codereview.appspot.com/5411052/diff/13002/src/pkg/strconv/itoa.go#newc... src/pkg/strconv/itoa.go:89: if i == 0 { Why is this test needed? Is 0 an overwhelmingly common case? Doesn't the rest of the code take care of it? http://codereview.appspot.com/5411052/diff/13002/src/pkg/strconv/itoa.go#newc... src/pkg/strconv/itoa.go:94: return AppendUitob64(dst, -uint64(i), base) just i = -i and get rid of the call and return (fall through)
Sign in to reply to this message.
Hello golang-dev@googlegroups.com, dsymonds@golang.org, rsc@golang.org, r@golang.org, gri@golang.org (cc: golang-dev@googlegroups.com), Please take another look.
Sign in to reply to this message.
I've now dropped all my AppendFloat, etc changes from this CL. I won't submit until this is faster. I'll wait for gri to do strconv.AppendFloat. (feel free to steal the implementations from this CL's earlier versions... :)) On Tue, Dec 6, 2011 at 1:56 PM, <bradfitz@golang.org> wrote: > Hello golang-dev@googlegroups.com, dsymonds@golang.org, rsc@golang.org, > r@golang.org, gri@golang.org (cc: golang-dev@googlegroups.com), > > Please take another look. > > > http://codereview.appspot.com/**5411052/<http://codereview.appspot.com/5411052/> >
Sign in to reply to this message.
Thanks. My code changes are very similar, but with some additional simplifications thrown in. - gri On Tue, Dec 6, 2011 at 1:57 PM, Brad Fitzpatrick <bradfitz@golang.org> wrote: > I've now dropped all my AppendFloat, etc changes from this CL. > > I won't submit until this is faster. I'll wait for gri to do > strconv.AppendFloat. (feel free to steal the implementations from this CL's > earlier versions... :)) > > > On Tue, Dec 6, 2011 at 1:56 PM, <bradfitz@golang.org> wrote: >> >> Hello golang-dev@googlegroups.com, dsymonds@golang.org, rsc@golang.org, >> r@golang.org, gri@golang.org (cc: golang-dev@googlegroups.com), >> >> Please take another look. >> >> >> http://codereview.appspot.com/5411052/ > >
Sign in to reply to this message.
Hello golang-dev@googlegroups.com, dsymonds@golang.org, rsc@golang.org, r@golang.org, gri@golang.org (cc: golang-dev@googlegroups.com), Please take another look.
Sign in to reply to this message.
LGTM
Sign in to reply to this message.
http://codereview.appspot.com/5411052/diff/21001/src/pkg/encoding/json/encode.go File src/pkg/encoding/json/encode.go (right): http://codereview.appspot.com/5411052/diff/21001/src/pkg/encoding/json/encode... src/pkg/encoding/json/encode.go:281: writeString(e, string(b)) I think it should be faster to use writeString(w, strconv.FormatInt(...)) in this case
Sign in to reply to this message.
*** Submitted as http://code.google.com/p/go/source/detail?r=0e5409a34221 *** json: use strconv.Append variants to avoid allocations in encoding Before/after, best of 3: json.BenchmarkCodeEncoder 10 183495300 ns/op 10.58 MB/s -> json.BenchmarkCodeEncoder 10 133025100 ns/op 14.59 MB/s But don't get too excited about this. These benchmarks, while stable at any point of time, fluctuate wildly with any line of code added or removed anywhere in the path due to stack splitting issues. It's currently much faster, though, and this is the API that doesn't allocate so should always be faster in theory. R=golang-dev, dsymonds, rsc, r, gri CC=golang-dev http://codereview.appspot.com/5411052
Sign in to reply to this message.
On Thu, Dec 15, 2011 at 11:20 AM, <gri@golang.org> wrote: > > http://codereview.appspot.com/**5411052/diff/21001/src/pkg/** > encoding/json/encode.go<http://codereview.appspot.com/5411052/diff/21001/src/pkg/encoding/json/encode.go> > File src/pkg/encoding/json/encode.**go (right): > > http://codereview.appspot.com/**5411052/diff/21001/src/pkg/** > encoding/json/encode.go#**newcode281<http://codereview.appspot.com/5411052/diff/21001/src/pkg/encoding/json/encode.go#newcode281> > src/pkg/encoding/json/encode.**go:281: writeString(e, string(b)) > I think it should be faster to use > > writeString(w, strconv.FormatInt(...)) > Whoops, already submitted. In any case, that quoted path is almost never hit. Only when sending int64s to JSON APIs like Google's, where int64s aren't safe by themselves. I'd prefer less code to more performance in a fringe case.
Sign in to reply to this message.
|