|
|
Created:
11 years, 10 months ago by jeff.allen Modified:
11 years, 10 months ago Reviewers:
CC:
bradfitz, golang-dev, dave_cheney.net, rsc, jra Visibility:
Public. |
Descriptionnet/textproto: faster header canonicalization with fewer allocations
By keeping a single copy of the strings that commonly show up
in headers, we can avoid one string allocation per header.
benchmark old ns/op new ns/op delta
BenchmarkReadMIMEHeader 19590 10824 -44.75%
BenchmarkUncommon 3168 1861 -41.26%
benchmark old allocs new allocs delta
BenchmarkReadMIMEHeader 32 25 -21.88%
BenchmarkUncommon 5 5 0.00%
Patch Set 1 #Patch Set 2 : diff -r 3d637cc9dff0 https://code.google.com/p/go #Patch Set 3 : diff -r 3d637cc9dff0 https://code.google.com/p/go #
Total comments: 5
Patch Set 4 : diff -r 8d919bfe75d3 https://code.google.com/p/go #Patch Set 5 : diff -r 3312fddeb739 https://code.google.com/p/go #
Total comments: 8
Patch Set 6 : diff -r 93dc7f0e302b https://code.google.com/p/go #Patch Set 7 : diff -r 25dcee3d220c https://code.google.com/p/go #Patch Set 8 : diff -r 25dcee3d220c https://code.google.com/p/go #Patch Set 9 : diff -r 25dcee3d220c https://code.google.com/p/go #
Total comments: 1
Patch Set 10 : diff -r eb7dceb84a12 https://code.google.com/p/go #Patch Set 11 : diff -r eb7dceb84a12 https://code.google.com/p/go #
Total comments: 1
Patch Set 12 : diff -r 42c8d3aadc40 https://code.google.com/p/go #Patch Set 13 : diff -r 42c8d3aadc40 https://code.google.com/p/go #
Total comments: 3
MessagesTotal messages: 25
Hello bradfitz@golang.org, golang-dev@googlegroups.com (cc: golang-dev@googlegroups.com), I'd like you to review this change to https://code.google.com/p/go
Sign in to reply to this message.
Thanks for this proposal. I think with a bit of work we could recover that 3% elsewhere. https://codereview.appspot.com/6721055/diff/5001/src/pkg/net/textproto/reader.go File src/pkg/net/textproto/reader.go (right): https://codereview.appspot.com/6721055/diff/5001/src/pkg/net/textproto/reader... src/pkg/net/textproto/reader.go:512: // So try this first, before doing the proper canonicalization. Go style tends towards the terse, possibly drop the last line of this comment. https://codereview.appspot.com/6721055/diff/5001/src/pkg/net/textproto/reader... File src/pkg/net/textproto/reader_test.go (right): https://codereview.appspot.com/6721055/diff/5001/src/pkg/net/textproto/reader... src/pkg/net/textproto/reader_test.go:255: `, "\n", "\r\n", -1) nice trick.
Sign in to reply to this message.
https://codereview.appspot.com/6721055/diff/5001/src/pkg/net/textproto/reader.go File src/pkg/net/textproto/reader.go (right): https://codereview.appspot.com/6721055/diff/5001/src/pkg/net/textproto/reader... src/pkg/net/textproto/reader.go:567: "Content-MD5", really? common? this is some S3 thing I thought. https://codereview.appspot.com/6721055/diff/5001/src/pkg/net/textproto/reader... src/pkg/net/textproto/reader.go:626: func init() { Per http://code.google.com/p/go/issues/detail?id=2297 , there's a general movement to reduce the amount of work done in init. This could be done on first use with a sync.Once inside parsing. https://codereview.appspot.com/6721055/diff/5001/src/pkg/net/textproto/reader... src/pkg/net/textproto/reader.go:632: h = CanonicalMIMEHeaderKey(h) rather than keying from the canonical key, which won't trigger on "ETag", "WWW-Authenticate", "MIME-Version", etc, I would instead use a 27-alphabet trie ('a' to 'z' and '-') and map to the canonical string. Then we catch all case forms too. I surprised you're losing 3%. What's profiling say?
Sign in to reply to this message.
I think the lost 3% simply comes from the hash lookup, which is one more vote for your trie proposal. I don't know when I'm going to have time to learn how to do it, do it, and then re-measure it. For the list of common headers, it would be helpful to find a study of the top-N headers seen in real life on the Internet. I'm wondering if you could find such a thing inside of Google, since you have some webservers there. :)
Sign in to reply to this message.
I got the trie working. With the giant list of headers from my first patch, the trie takes up 41000 bytes (yuck). On the other hand, the time for BenchmarkReadMIMEHeader goes down by 8% due to using the trie, which is really nice. I think the key to getting the right tradeoff in space and time here is to have a small list of headers built-in to textproto, and to decide on what those should be, I need to find a good publicly available HTTP trace and analyze it.
Sign in to reply to this message.
What if you use binary search instead of a trie?
Sign in to reply to this message.
Hello bradfitz@golang.org, golang-dev@googlegroups.com, dave@cheney.net, rsc@golang.org (cc: golang-dev@googlegroups.com), Please take another look.
Sign in to reply to this message.
https://codereview.appspot.com/6721055/diff/18001/src/pkg/net/textproto/reade... File src/pkg/net/textproto/reader.go (right): https://codereview.appspot.com/6721055/diff/18001/src/pkg/net/textproto/reade... src/pkg/net/textproto/reader.go:512: if h, ok := searchHeaderTrie(a); ok { this could be used for common header values too ("close", "keep-alive", etc), but might not be a win. https://codereview.appspot.com/6721055/diff/18001/src/pkg/net/textproto/reade... src/pkg/net/textproto/reader.go:541: func byteToKey(b byte) (int, bool) { does this get inlined? https://codereview.appspot.com/6721055/diff/18001/src/pkg/net/textproto/reade... src/pkg/net/textproto/reader.go:558: if !ok || headerTrie[cur].keys[k] == 0 { eliminate the duplicate expression two lines below? might be faster: if !ok { return "", false } cur = headerTrie[cur].keys[k] if cur == 0 { return "", false } it's at least less code.
Sign in to reply to this message.
The latest numbers in the CL description are accurate? On Tue, Oct 30, 2012 at 5:02 PM, <jeff.allen@gmail.com> wrote: > Hello bradfitz@golang.org, golang-dev@googlegroups.com, dave@cheney.net, > rsc@golang.org (cc: golang-dev@googlegroups.com), > > Please take another look. > > > http://codereview.appspot.com/**6721055/<http://codereview.appspot.com/6721055/> >
Sign in to reply to this message.
http://codereview.appspot.com/6721055/diff/18001/src/pkg/net/textproto/reader.go File src/pkg/net/textproto/reader.go (right): http://codereview.appspot.com/6721055/diff/18001/src/pkg/net/textproto/reader... src/pkg/net/textproto/reader.go:511: if false { I think this is an accident. Was it part of your benchmarking? also: how did this pass gofmt ?
Sign in to reply to this message.
I realized I could reduce the headerTrie memory consumption from 20k down to 8k. Yay! The performance measurement in the CL message is up to date. https://codereview.appspot.com/6721055/diff/18001/src/pkg/net/textproto/reade... File src/pkg/net/textproto/reader.go (right): https://codereview.appspot.com/6721055/diff/18001/src/pkg/net/textproto/reade... src/pkg/net/textproto/reader.go:511: if false { Fixed. As for gofmt, no idea. hg gofmt said it processed that file... mysterious. https://codereview.appspot.com/6721055/diff/18001/src/pkg/net/textproto/reade... src/pkg/net/textproto/reader.go:512: if h, ok := searchHeaderTrie(a); ok { Let's benchmark that separately once this is checked in. Also, the headerTrie's leaves MUST be canonical, so a search for "keep-alive" would return "Keep-Alive", which might not be what we really want. Though it would let us drop the strings.ToLower from a line like "if !strings.Contains(strings.ToLower(header.get("Connection")), "keep-alive")". https://codereview.appspot.com/6721055/diff/18001/src/pkg/net/textproto/reade... src/pkg/net/textproto/reader.go:541: func byteToKey(b byte) (int, bool) { -m says: ./reader.go:555: inlining call to byteToKey Whoo hoo! https://codereview.appspot.com/6721055/diff/18001/src/pkg/net/textproto/reade... src/pkg/net/textproto/reader.go:558: if !ok || headerTrie[cur].keys[k] == 0 { On 2012/10/30 16:09:27, bradfitz wrote: > eliminate the duplicate expression two lines below? might be faster: > > if !ok { > return "", false > } > cur = headerTrie[cur].keys[k] > if cur == 0 { > return "", false > } > > it's at least less code. Done.
Sign in to reply to this message.
I am not convinced that a trie is necessary. I understand that it is enjoyable to write programs to write programs, and to work on getting complex data structures like this right. But if the goal is just to reduce garbage, it seems like a much simpler scan would reap the same results, without requiring an 8 kB table, and without needing to maintain another mechanically generated source file. Please try adapting http://play.golang.org/p/UxqhCGgy2c and see if you get similar performance improvements. If so, let's do that instead of a trie. Thanks. Russ
Sign in to reply to this message.
Got it, thanks for the idea. I experimented with something like this, but I was having trouble seeing the nice way to include the canonicalization and the search together.
Sign in to reply to this message.
Please take another look. This now feels like really the right way to go; it is faster, uses a minimal amount of extra space in RAM and delivers the original goal of avoiding allocations on common headers.
Sign in to reply to this message.
On 2012/11/07 16:08:53, jeff.allen wrote: > Please take another look. This now feels like really the right way to go; it is > faster, uses a minimal amount of extra space in RAM and delivers the original > goal of avoiding allocations on common headers. I think you can squeeze a little more out benchmark old ns/op new ns/op delta BenchmarkReadMIMEHeader 8395 8335 -0.71% BenchmarkUncommon 1408 1394 -0.99% for i := 0; i < len(a); i++ { // Canonicalize: first letter upper case // and upper case after each dash. // (Host, User-Agent, If-Modified-Since). // MIME headers are ASCII only, so no Unicode issues. if a[i] == ' ' { a[i] = '-' upper = true continue } c := a[i] if upper && 'a' <= c && c <= 'z' { c -= toLower } else if !upper && 'A' <= c && c <= 'Z' { c += toLower } a[i] = c upper = c == '-' // for next time if lo < hi { for lo < hi && (len(commonHeaders[lo]) <= i || commonHeaders[lo][i] < c) { lo++ } for hi > lo && commonHeaders[hi-1][i] > c { hi-- } } }
Sign in to reply to this message.
Hello bradfitz@golang.org, golang-dev@googlegroups.com, dave@cheney.net, rsc@golang.org (cc: golang-dev@googlegroups.com), Please take another look.
Sign in to reply to this message.
Please add a test that loops over commonHeaders making sure that each entry is its own CanonicalHeaderKey. https://codereview.appspot.com/6721055/diff/28001/src/pkg/net/textproto/reade... File src/pkg/net/textproto/reader_test.go (right): https://codereview.appspot.com/6721055/diff/28001/src/pkg/net/textproto/reade... src/pkg/net/textproto/reader_test.go:276: if (i & 1) == 1 { Drop ()
Sign in to reply to this message.
Hello bradfitz@golang.org, golang-dev@googlegroups.com, dave@cheney.net, rsc@golang.org (cc: golang-dev@googlegroups.com), Please take another look.
Sign in to reply to this message.
Looking nice. https://codereview.appspot.com/6721055/diff/28003/src/pkg/net/textproto/reade... File src/pkg/net/textproto/reader.go (right): https://codereview.appspot.com/6721055/diff/28003/src/pkg/net/textproto/reade... src/pkg/net/textproto/reader.go:552: var commonHeaders = []string{ add: "If-Modified-Since" "If-None-Match"
Sign in to reply to this message.
Hello bradfitz@golang.org, golang-dev@googlegroups.com, dave@cheney.net, rsc@golang.org (cc: golang-dev@googlegroups.com), Please take another look.
Sign in to reply to this message.
LGTM after the docs are simplified, unless that was requested. Russ, any comments? https://codereview.appspot.com/6721055/diff/19009/src/pkg/net/textproto/reade... File src/pkg/net/textproto/reader.go (right): https://codereview.appspot.com/6721055/diff/19009/src/pkg/net/textproto/reade... src/pkg/net/textproto/reader.go:489: // MIME headers should be ASCII only; Unicode runes are passed through were these docs requested earlier? If not, I'd just remove them and let this be an optimization-only CL. https://codereview.appspot.com/6721055/diff/19009/src/pkg/net/textproto/reade... File src/pkg/net/textproto/reader_test.go (right): https://codereview.appspot.com/6721055/diff/19009/src/pkg/net/textproto/reade... src/pkg/net/textproto/reader_test.go:27: // what if some UTF-8 gets in by mistake? It is passed unchanged, even little verbose. could also just say, one one line: {"üser-agenT", "üser-Agent"}, // non-ASCII unchanged
Sign in to reply to this message.
I added the sentence because the behavior was a surprise to me. Le 9 nov. 2012 10:15, <bradfitz@golang.org> a écrit : > LGTM after the docs are simplified, unless that was requested. > > Russ, any comments? > > > > https://codereview.appspot.**com/6721055/diff/19009/src/** > pkg/net/textproto/reader.go<https://codereview.appspot.com/6721055/diff/19009/src/pkg/net/textproto/reader.go> > File src/pkg/net/textproto/reader.**go (right): > > https://codereview.appspot.**com/6721055/diff/19009/src/** > pkg/net/textproto/reader.go#**newcode489<https://codereview.appspot.com/6721055/diff/19009/src/pkg/net/textproto/reader.go#newcode489> > src/pkg/net/textproto/reader.**go:489: // MIME headers should be ASCII > only; Unicode runes are passed through > were these docs requested earlier? If not, I'd just remove them and let > this be an optimization-only CL. > > https://codereview.appspot.**com/6721055/diff/19009/src/** > pkg/net/textproto/reader_test.**go<https://codereview.appspot.com/6721055/diff/19009/src/pkg/net/textproto/reader_test.go> > File src/pkg/net/textproto/reader_**test.go (right): > > https://codereview.appspot.**com/6721055/diff/19009/src/** > pkg/net/textproto/reader_test.**go#newcode27<https://codereview.appspot.com/6721055/diff/19009/src/pkg/net/textproto/reader_test.go#newcode27> > src/pkg/net/textproto/reader_**test.go:27: // what if some UTF-8 gets in > by mistake? It is passed unchanged, even > little verbose. could also just say, one one line: > > {"üser-agenT", "üser-Agent"}, // non-ASCII unchanged > > https://codereview.appspot.**com/6721055/<https://codereview.appspot.com/6721... >
Sign in to reply to this message.
looks good; leaving for brad https://codereview.appspot.com/6721055/diff/19009/src/pkg/net/textproto/reade... File src/pkg/net/textproto/reader.go (right): https://codereview.appspot.com/6721055/diff/19009/src/pkg/net/textproto/reade... src/pkg/net/textproto/reader.go:489: // MIME headers should be ASCII only; Unicode runes are passed through On 2012/11/09 09:15:54, bradfitz wrote: > were these docs requested earlier? If not, I'd just remove them and let this be > an optimization-only CL. // MIME header keys are assumed to be ASCII only.
Sign in to reply to this message.
Per email from Jeff, submitting with the doc change. On Mon, Nov 12, 2012 at 12:40 PM, <rsc@golang.org> wrote: > looks good; leaving for brad > > > > > https://codereview.appspot.**com/6721055/diff/19009/src/** > pkg/net/textproto/reader.go<https://codereview.appspot.com/6721055/diff/19009/src/pkg/net/textproto/reader.go> > File src/pkg/net/textproto/reader.**go (right): > > https://codereview.appspot.**com/6721055/diff/19009/src/** > pkg/net/textproto/reader.go#**newcode489<https://codereview.appspot.com/6721055/diff/19009/src/pkg/net/textproto/reader.go#newcode489> > src/pkg/net/textproto/reader.**go:489: // MIME headers should be ASCII > only; Unicode runes are passed through > On 2012/11/09 09:15:54, bradfitz wrote: > >> were these docs requested earlier? If not, I'd just remove them and >> > let this be > >> an optimization-only CL. >> > > // MIME header keys are assumed to be ASCII only. > > https://codereview.appspot.**com/6721055/<https://codereview.appspot.com/6721... >
Sign in to reply to this message.
*** Submitted as http://code.google.com/p/go/source/detail?r=d465db6441bc *** net/textproto: faster header canonicalization with fewer allocations By keeping a single copy of the strings that commonly show up in headers, we can avoid one string allocation per header. benchmark old ns/op new ns/op delta BenchmarkReadMIMEHeader 19590 10824 -44.75% BenchmarkUncommon 3168 1861 -41.26% benchmark old allocs new allocs delta BenchmarkReadMIMEHeader 32 25 -21.88% BenchmarkUncommon 5 5 0.00% R=bradfitz, golang-dev, dave, rsc, jra CC=golang-dev http://codereview.appspot.com/6721055 Committer: Brad Fitzpatrick <bradfitz@golang.org>
Sign in to reply to this message.
|