|
|
Created:
11 years, 6 months ago by Eric Roshan Eisner Modified:
11 years, 6 months ago Reviewers:
CC:
nigeltao, rsc, golang-dev Visibility:
Public. |
Descriptionstrings: implement a faster generic Replacer
This also fixes the semantics of some corner cases with the empty
match. TODOs for genericReplacer in the tests are fixed.
benchmark old ns/op new ns/op delta
BenchmarkGenericNoMatch 71395 3132 -95.61%
BenchmarkGenericMatch1 75610 20280 -73.18%
BenchmarkGenericMatch2 837995 86725 -89.65%
Patch Set 1 #Patch Set 2 : diff -r 5c4859bc123f https://go.googlecode.com/hg #Patch Set 3 : diff -r 5c4859bc123f https://go.googlecode.com/hg #
Total comments: 12
Patch Set 4 : diff -r 5c4859bc123f https://go.googlecode.com/hg #
Total comments: 2
Patch Set 5 : diff -r 5c4859bc123f https://go.googlecode.com/hg #
Total comments: 1
Patch Set 6 : diff -r df382f6986cf https://code.google.com/p/go #Patch Set 7 : diff -r df382f6986cf https://code.google.com/p/go #Patch Set 8 : diff -r df382f6986cf https://code.google.com/p/go #Patch Set 9 : diff -r df382f6986cf https://code.google.com/p/go #
Total comments: 2
Patch Set 10 : diff -r df382f6986cf https://code.google.com/p/go #
Total comments: 1
Patch Set 11 : diff -r df382f6986cf https://code.google.com/p/go #
Total comments: 4
Patch Set 12 : diff -r c99aad44d76c https://code.google.com/p/go #
Total comments: 1
Patch Set 13 : diff -r 46a4f787e8b7 https://code.google.com/p/go #
Total comments: 2
Patch Set 14 : diff -r 46a4f787e8b7 https://code.google.com/p/go #Patch Set 15 : diff -r cdee8bf43694 https://code.google.com/p/go #
Total comments: 5
Patch Set 16 : diff -r cdee8bf43694 https://code.google.com/p/go #
Total comments: 10
Patch Set 17 : diff -r cdee8bf43694 https://code.google.com/p/go #Patch Set 18 : diff -r cdee8bf43694 https://code.google.com/p/go #Patch Set 19 : diff -r cdee8bf43694 https://code.google.com/p/go #
Total comments: 12
Patch Set 20 : diff -r cdee8bf43694 https://code.google.com/p/go #
Total comments: 3
MessagesTotal messages: 46
Hello golang-dev@googlegroups.com, I'd like you to review this change to https://go.googlecode.com/hg
Sign in to reply to this message.
https://codereview.appspot.com/6492076/diff/5001/src/pkg/strings/replace.go File src/pkg/strings/replace.go (right): https://codereview.appspot.com/6492076/diff/5001/src/pkg/strings/replace.go#n... src/pkg/strings/replace.go:143: if len(key) == 0 { if key == "" { is more idiomatic, and should be just as efficient. Similarly below for len(r.emptyMatch). https://codereview.appspot.com/6492076/diff/5001/src/pkg/strings/replace.go#n... src/pkg/strings/replace.go:144: r.emptyMatch = val A TestReplacer test case with more than one empty old string would be nice: NewReplacer("", "foo", "", "bar"). https://codereview.appspot.com/6492076/diff/5001/src/pkg/strings/replace.go#n... src/pkg/strings/replace.go:166: i += keylen - 1 // looping adds another 1 The artificial -1 looks awkward. It might read better as for i := 0; i < len(s); { val, etc if match { etc i += keylen last = i continue } if r.emptyMatch != "" { buf = append(buf, s[i]) } i++ } https://codereview.appspot.com/6492076/diff/5001/src/pkg/strings/replace.go#n... src/pkg/strings/replace.go:168: buf = append(buf, s[i]) Should the buf = append(buf, r.emptyMatch...) a few lines above be done here instead? Can you add a TestReplacer test case that is: {blankToXReplacer, "op", "XOXpX"}, I suspect that you erroneously get "XOpX". https://codereview.appspot.com/6492076/diff/5001/src/pkg/strings/replace.go#n... src/pkg/strings/replace.go:183: // Same logic as Replace, so if you want to avoid potentially a lot of small Go doc comments should be complete sentences (i.e. end with a full stop), and start with the name of the thing being declared. http://golang.org/doc/effective_go.html#commentary https://codereview.appspot.com/6492076/diff/5001/src/pkg/strings/replace_test.go File src/pkg/strings/replace_test.go (right): https://codereview.appspot.com/6492076/diff/5001/src/pkg/strings/replace_test... src/pkg/strings/replace_test.go:117: var html = Repeat("It's <b>HTML</b>!", 10) Make this a local variable in BenchmarkGenericMatch, like line 110.
Sign in to reply to this message.
Hello nigeltao@golang.org (cc: golang-dev@googlegroups.com), Please take another look.
Sign in to reply to this message.
https://codereview.appspot.com/6492076/diff/5001/src/pkg/strings/replace.go File src/pkg/strings/replace.go (right): https://codereview.appspot.com/6492076/diff/5001/src/pkg/strings/replace.go#n... src/pkg/strings/replace.go:143: if len(key) == 0 { On 2012/09/04 04:29:59, nigeltao wrote: > if key == "" { > is more idiomatic, and should be just as efficient. Similarly below for > len(r.emptyMatch). Done. https://codereview.appspot.com/6492076/diff/5001/src/pkg/strings/replace.go#n... src/pkg/strings/replace.go:144: r.emptyMatch = val On 2012/09/04 04:29:59, nigeltao wrote: > A TestReplacer test case with more than one empty old string would be nice: > NewReplacer("", "foo", "", "bar"). Some of the other implementations currently overwrite initial keys when they encounter duplicates. The doc does say that earlier ones should have precedence though. Fixed here. https://codereview.appspot.com/6492076/diff/5001/src/pkg/strings/replace.go#n... src/pkg/strings/replace.go:166: i += keylen - 1 // looping adds another 1 On 2012/09/04 04:29:59, nigeltao wrote: > The artificial -1 looks awkward. It might read better as > for i := 0; i < len(s); { > val, etc > if match { > etc > i += keylen > last = i > continue > } > if r.emptyMatch != "" { > buf = append(buf, s[i]) > } > i++ > } Done. https://codereview.appspot.com/6492076/diff/5001/src/pkg/strings/replace.go#n... src/pkg/strings/replace.go:168: buf = append(buf, s[i]) On 2012/09/04 04:29:59, nigeltao wrote: > Should the > buf = append(buf, r.emptyMatch...) > a few lines above be done here instead? Can you add a TestReplacer test case > that is: > {blankToXReplacer, "op", "XOXpX"}, > I suspect that you erroneously get "XOpX". Done. https://codereview.appspot.com/6492076/diff/5001/src/pkg/strings/replace.go#n... src/pkg/strings/replace.go:183: // Same logic as Replace, so if you want to avoid potentially a lot of small On 2012/09/04 04:29:59, nigeltao wrote: > Go doc comments should be complete sentences (i.e. end with a full stop), and > start with the name of the thing being declared. > > http://golang.org/doc/effective_go.html#commentary Done. https://codereview.appspot.com/6492076/diff/5001/src/pkg/strings/replace_test.go File src/pkg/strings/replace_test.go (right): https://codereview.appspot.com/6492076/diff/5001/src/pkg/strings/replace_test... src/pkg/strings/replace_test.go:117: var html = Repeat("It's <b>HTML</b>!", 10) On 2012/09/04 04:29:59, nigeltao wrote: > Make this a local variable in BenchmarkGenericMatch, like line 110. Done.
Sign in to reply to this message.
LGTM A couple of trivial comments below. http://codereview.appspot.com/6492076/diff/8001/src/pkg/strings/replace.go File src/pkg/strings/replace.go (right): http://codereview.appspot.com/6492076/diff/8001/src/pkg/strings/replace.go#ne... src/pkg/strings/replace.go:172: } else if r.emptyMatch != "" { this else is unnecessary. it's nice to make the sequential logic obvious, i think. http://codereview.appspot.com/6492076/diff/8001/src/pkg/strings/replace.go#ne... src/pkg/strings/replace.go:221: } else if r.emptyMatch != "" { this else is unnecessary.
Sign in to reply to this message.
Can you compare the allocation pattern before and after? (allocation count + bytes per iteration)
Sign in to reply to this message.
On 2012/09/04 09:29:16, remyoudompheng wrote: > Can you compare the allocation pattern before and after? (allocation count + > bytes per iteration) BenchmarkGenericNoMatch: initial call to NewReplacer: before 4912B 9 allocs after 16224B 11 allocs Replace per iteration: before 3632B 403 allocs after 0B 0 allocs BenchmarkGenericMatch: initial call to NewReplacer: before 5152B 8 allocs after 53072B 25 allocs Replace per iteration: before 3088B 343 allcs after 696B 9 allocs
Sign in to reply to this message.
I am concerned about the amount of memory this will consume. [256]*trie is 2 kB per node on a typical 64-bit machine. Is there something more compact we could do, like something based on a simple sorted slice of "before" strings? Russ
Sign in to reply to this message.
On 2012/09/04 20:59:23, rsc wrote: > I am concerned about the amount of memory this will consume. > [256]*trie is 2 kB per node on a typical 64-bit machine. Is there > something more compact we could do, like something based on a simple > sorted slice of "before" strings? > > Russ I agree that the current implementation's use of memory is unnecessarily large. A different strategy to reduce unnecessary memory usage while still keeping the performance benefits of a table would be to compress chains of nodes with one entry into a string. I'll tinker around with this option and see how the memory footprint looks. A sorted slice of prefixes in the trie would use even less memory, but may be slower when there are a lot of keys to match.
Sign in to reply to this message.
Hello nigeltao@golang.org, rogpeppe@gmail.com, remyoudompheng@gmail.com, rsc@golang.org (cc: golang-dev@googlegroups.com), Please take another look.
Sign in to reply to this message.
https://codereview.appspot.com/6492076/diff/11001/src/pkg/strings/replace.go File src/pkg/strings/replace.go (right): https://codereview.appspot.com/6492076/diff/11001/src/pkg/strings/replace.go#... src/pkg/strings/replace.go:178: } else if node.prefix != "" && node.prefix == s[:len(node.prefix)] { Won't this panic if node.prefix is longer than s? A larger point is that I'd want a lot more test cases to cover all the different paths in this new code.
Sign in to reply to this message.
Hello nigeltao@golang.org, rogpeppe@gmail.com, remyoudompheng@gmail.com, rsc@golang.org (cc: golang-dev@googlegroups.com), Please take another look.
Sign in to reply to this message.
On 2012/09/05 07:27:14, nigeltao wrote: > https://codereview.appspot.com/6492076/diff/11001/src/pkg/strings/replace.go > File src/pkg/strings/replace.go (right): > > https://codereview.appspot.com/6492076/diff/11001/src/pkg/strings/replace.go#... > src/pkg/strings/replace.go:178: } else if node.prefix != "" && node.prefix == > s[:len(node.prefix)] { > Won't this panic if node.prefix is longer than s? > > A larger point is that I'd want a lot more test cases to cover all the different > paths in this new code. I added another test replacer that covers all of the cases in the new builder. Fixed the panic.
Sign in to reply to this message.
Hello nigeltao@golang.org, rogpeppe@gmail.com, remyoudompheng@gmail.com, rsc@golang.org (cc: golang-dev@googlegroups.com), Please take another look.
Sign in to reply to this message.
Hello nigeltao@golang.org, rogpeppe@gmail.com, remyoudompheng@gmail.com, rsc@golang.org (cc: golang-dev@googlegroups.com), Please take another look.
Sign in to reply to this message.
I'm sorry, but while I am still worried about the memory usage, To get the memory usage down, one trick that has worked well for me in the past is to group the input bytes into equivalence classes. Here it is probably enough to keep a list of all the bytes that occur in the match strings. You can use that list to map input bytes into a dense encoding. The n bytes that do occur map to a dense encoding 0 .. n-1 and all the other bytes, if there are any, map to n. You can keep the mapping in a [256]byte stored once in the replacer, not per trie node, and then each node has a []*trie instead of a [256]*trie. The line node = node.table[s[0]] becomes node = node.table[mapping[s[0]]]. This is in some sense a generalization of your most recent change. I suspect that you'll find it reduces the memory usage considerably. Russ
Sign in to reply to this message.
Hello nigeltao@golang.org, rogpeppe@gmail.com, remyoudompheng@gmail.com, rsc@golang.org (cc: golang-dev@googlegroups.com), Please take another look.
Sign in to reply to this message.
On 2012/09/05 22:11:43, rsc wrote: > I'm sorry, but while I am still worried about the memory usage, > > To get the memory usage down, one trick that has worked well for me in > the past is to group the input bytes into equivalence classes. Here it > is probably enough to keep a list of all the bytes that occur in the > match strings. You can use that list to map input bytes into a dense > encoding. The n bytes that do occur map to a dense encoding 0 .. n-1 > and all the other bytes, if there are any, map to n. You can keep the > mapping in a [256]byte stored once in the replacer, not per trie node, > and then each node has a []*trie instead of a [256]*trie. The line > node = node.table[s[0]] becomes node = node.table[mapping[s[0]]]. This > is in some sense a generalization of your most recent change. > > I suspect that you'll find it reduces the memory usage considerably. > > Russ This is a clever trick. I added this mapping using a [256]int (otherwise a malicious set of keys that uses every byte would fail), and it only slightly decreased runtime performance, but dropped the memory by another factor of 2-3.
Sign in to reply to this message.
> This is a clever trick. I added this mapping using a [256]int (otherwise > a malicious set of keys that uses every byte would fail) You can make [256]byte work. If all the bytes are used then there is no need for a value for "other". Russ
Sign in to reply to this message.
The original post says > BenchmarkGenericNoMatch 2482 ns/op -96.36% > BenchmarkGenericMatch 6995 ns/op -90.42% but that is against a baseline that is a poor implementation of a simple algorithm. For comparison, https://codereview.appspot.com/6495094 is a better implementation of that same algorithm, unlike this CL which uses a different algorithm, with different memory requirements. Compared to tip, that CL gets: benchmark old ns/op new ns/op delta BenchmarkGenericNoMatch 76550 18709 -75.56% BenchmarkGenericMatch 88093 29899 -66.06% and the number of mallocs per iteration drop from 403 and 343 before to 3 and 3 after. I'm not saying that this CL should be abandoned. I'm just providing another reference point on how much this CL improves performance. Eric, is it possible to get a fresh set of numbers for this CL? https://codereview.appspot.com/6492076/diff/6/src/pkg/strings/replace.go File src/pkg/strings/replace.go (right): https://codereview.appspot.com/6492076/diff/6/src/pkg/strings/replace.go#newc... src/pkg/strings/replace.go:232: if r.emptyMatch == "" { Does this work if I say NewReplacer("", "", "", "foo")? https://codereview.appspot.com/6492076/diff/6/src/pkg/strings/replace.go#newc... src/pkg/strings/replace.go:283: var last, wn int This function body is a copy/paste of genericReplacer.Replace. Is it possible to look for a WriteString method, a la https://codereview.appspot.com/6495094, and avoid the duplicated code?
Sign in to reply to this message.
On 2012/09/06 03:23:58, nigeltao wrote: > The original post says > > > BenchmarkGenericNoMatch 2482 ns/op -96.36% > > BenchmarkGenericMatch 6995 ns/op -90.42% > > but that is against a baseline that is a poor implementation of a simple > algorithm. > > For comparison, https://codereview.appspot.com/6495094 is a better > implementation of that same algorithm, unlike this CL which uses a different > algorithm, with different memory requirements. Compared to tip, that CL gets: > > benchmark old ns/op new ns/op delta > BenchmarkGenericNoMatch 76550 18709 -75.56% > BenchmarkGenericMatch 88093 29899 -66.06% Quick question: what tool/option generates these benchmark deltas? > > and the number of mallocs per iteration drop from 403 and 343 before to 3 and 3 > after. > > I'm not saying that this CL should be abandoned. I'm just providing another > reference point on how much this CL improves performance. > > Eric, is it possible to get a fresh set of numbers for this CL? I had updated the numbers in the description (and dropped the part that said it was memory inefficient) in the last update or so. They are still multiple times faster than your improved simple implementation (but now more complex with a few memory workarounds). > > https://codereview.appspot.com/6492076/diff/6/src/pkg/strings/replace.go > File src/pkg/strings/replace.go (right): > > https://codereview.appspot.com/6492076/diff/6/src/pkg/strings/replace.go#newc... > src/pkg/strings/replace.go:232: if r.emptyMatch == "" { > Does this work if I say NewReplacer("", "", "", "foo")? > It doesn't, but I am a bit confused as to the semantics of the empty match. After thinking more about precedence, I think the following should yeild "AXbXbAX" (maybe without the trailing "X") NewReplacer("a", "A", "", "X").Replace("abba") The original implementation returns "AXbbAX", mine is very confused and returns "XAXbXbXAX" > https://codereview.appspot.com/6492076/diff/6/src/pkg/strings/replace.go#newc... > src/pkg/strings/replace.go:283: var last, wn int > This function body is a copy/paste of genericReplacer.Replace. Is it possible to > look for a WriteString method, a la https://codereview.appspot.com/6495094, and > avoid the duplicated code? I tried swapping out the copy-pasted Replace code with a call to WriteString on a simple buffer, and it added a 35% runtime overhead to the two benchmarks. The NoMatch case also now allocates buffer memory where it didn't used to allocate any. I included this version for reference as *genericReplacer.BufferedReplace
Sign in to reply to this message.
On Thu, Sep 6, 2012 at 5:53 PM, <eric.d.eisner@gmail.com> wrote: > Quick question: what tool/option generates these benchmark deltas? $GOROOT/misc/benchcmp
Sign in to reply to this message.
On 6 September 2012 17:53, <eric.d.eisner@gmail.com> wrote: > I had updated the numbers in the description (and dropped the part that > said it was memory inefficient) in the last update or so. They are still > multiple times faster than your improved simple implementation (but now > more complex with a few memory workarounds). Sorry for being dumb, but I can't find the new numbers. Can you e-mail them to this thread? > I tried swapping out the copy-pasted Replace code with a call to > WriteString on a simple buffer, and it added a 35% runtime overhead to > the two benchmarks. The NoMatch case also now allocates buffer memory > where it didn't used to allocate any. I included this version for > reference as *genericReplacer.BufferedReplace Your version calls io.WriteString, which has to sniff for the interface each time. My version does the interface check only once, at the top of genericReplacer.WriteString.
Sign in to reply to this message.
https://codereview.appspot.com/6492076/diff/13004/src/pkg/strings/replace.go File src/pkg/strings/replace.go (right): https://codereview.appspot.com/6492076/diff/13004/src/pkg/strings/replace.go#... src/pkg/strings/replace.go:244: // Write wrights to the buffer to satisfy io.Writer. s/wrights/writes/
Sign in to reply to this message.
On 2012/09/07 00:03:17, nigeltao wrote: > On 6 September 2012 17:53, <mailto:eric.d.eisner@gmail.com> wrote: > > I had updated the numbers in the description (and dropped the part that > > said it was memory inefficient) in the last update or so. They are still > > multiple times faster than your improved simple implementation (but now > > more complex with a few memory workarounds). > > Sorry for being dumb, but I can't find the new numbers. Can you e-mail > them to this thread? > > > > I tried swapping out the copy-pasted Replace code with a call to > > WriteString on a simple buffer, and it added a 35% runtime overhead to > > the two benchmarks. The NoMatch case also now allocates buffer memory > > where it didn't used to allocate any. I included this version for > > reference as *genericReplacer.BufferedReplace > > Your version calls io.WriteString, which has to sniff for the > interface each time. My version does the interface check only once, at > the top of genericReplacer.WriteString. I used your WriteString interface sniffing, and here are the updated numbers: Copy-paste Replace logic: benchmark old ns/op new ns/op delta BenchmarkGenericNoMatch 72882 2461 -96.62% BenchmarkGenericMatch 90858 9313 -89.75% benchmark old MB/s new MB/s speedup BenchmarkGenericNoMatch 2.74 81.26 29.66x BenchmarkGenericMatch 3.74 36.50 9.76x Using the buffer and WriteString: benchmark old ns/op new ns/op delta BenchmarkGenericNoMatch 72882 3058 -95.80% BenchmarkGenericMatch 90858 10361 -88.60% benchmark old MB/s new MB/s speedup BenchmarkGenericNoMatch 2.74 65.39 23.86x BenchmarkGenericMatch 3.74 32.81 8.77x
Sign in to reply to this message.
Hello nigeltao@golang.org, rogpeppe@gmail.com, remyoudompheng@gmail.com, rsc@golang.org, dsymonds@golang.org (cc: golang-dev@googlegroups.com), Please take another look.
Sign in to reply to this message.
https://codereview.appspot.com/6492076/diff/8005/src/pkg/strings/replace.go File src/pkg/strings/replace.go (right): https://codereview.appspot.com/6492076/diff/8005/src/pkg/strings/replace.go#n... src/pkg/strings/replace.go:93: // If this node has no children, the previous 3 fields will be nil. s/nil/zero/ https://codereview.appspot.com/6492076/diff/8005/src/pkg/strings/replace.go#n... src/pkg/strings/replace.go:106: for n = 0; n < len(t.prefix) && n < len(key); n++ { You can drop the "n=0". https://codereview.appspot.com/6492076/diff/8005/src/pkg/strings/replace.go#n... src/pkg/strings/replace.go:177: break Is this correct if every byte in the range [0, 255] is actually part of some old string? Again, I would like to see many more tests before I'm confident that this is all correct. https://codereview.appspot.com/6492076/diff/8005/src/pkg/strings/replace.go#n... src/pkg/strings/replace.go:278: var buf []byte Even if it's slightly slower, I would still prefer Replace to just call WriteString, for now. If it really matters, we can copy/paste the code later, but for now I would prefer to avoid the duplication.
Sign in to reply to this message.
On 2012/09/07 08:17:35, nigeltao wrote: > https://codereview.appspot.com/6492076/diff/8005/src/pkg/strings/replace.go > File src/pkg/strings/replace.go (right): > > https://codereview.appspot.com/6492076/diff/8005/src/pkg/strings/replace.go#n... > src/pkg/strings/replace.go:93: // If this node has no children, the previous 3 > fields will be nil. > s/nil/zero/ Done. > > https://codereview.appspot.com/6492076/diff/8005/src/pkg/strings/replace.go#n... > src/pkg/strings/replace.go:106: for n = 0; n < len(t.prefix) && n < len(key); > n++ { > You can drop the "n=0". Done. > > https://codereview.appspot.com/6492076/diff/8005/src/pkg/strings/replace.go#n... > src/pkg/strings/replace.go:177: break > Is this correct if every byte in the range [0, 255] is actually part of some old > string? I cleaned up this logic, so it should be more clear. I also added a test that hits this case. > > Again, I would like to see many more tests before I'm confident that this is all > correct. I added a few more tests, and changed the handling of the empty match (which magically removed a lot of code). The last test in that suite {NewReplacer("a", "A", "", "X"), "abba", "AXbXbAX"}, follows the spec to my understanding, but it is a different behavior than the original implementation. > > https://codereview.appspot.com/6492076/diff/8005/src/pkg/strings/replace.go#n... > src/pkg/strings/replace.go:278: var buf []byte > Even if it's slightly slower, I would still prefer Replace to just call > WriteString, for now. If it really matters, we can copy/paste the code later, > but for now I would prefer to avoid the duplication. Sounds good.
Sign in to reply to this message.
Hello nigeltao@golang.org, rogpeppe@gmail.com, remyoudompheng@gmail.com, rsc@golang.org, dsymonds@golang.org (cc: golang-dev@googlegroups.com), Please take another look.
Sign in to reply to this message.
https://codereview.appspot.com/6492076/diff/7005/src/pkg/strings/replace_test.go File src/pkg/strings/replace_test.go (right): https://codereview.appspot.com/6492076/diff/7005/src/pkg/strings/replace_test... src/pkg/strings/replace_test.go:83: // Added here for initialization. You might as well move all the ReplacerTests into here. This is done in http://codereview.appspot.com/6488110.
Sign in to reply to this message.
Hello nigeltao@golang.org, rogpeppe@gmail.com, remyoudompheng@gmail.com, rsc@golang.org, dsymonds@golang.org (cc: golang-dev@googlegroups.com), Please take another look.
Sign in to reply to this message.
I think that the data structure is generally OK, but the code could be re-written to be more obviously correct. I'll take a closer look tomorrow. Also, it might be nice to have some tests that adds things to a trie and then dumps it out as ASCII art. I'm thinking of something like testCases := []struct{ in, out }string{ { "abc;abdef;abdefgh;xx;xy;z", ` -ab +.c +.def +..gh -x +.x +.y +z`, }, // etc } The +/- prefix means has/has-not a value. https://codereview.appspot.com/6492076/diff/4014/src/pkg/strings/replace.go File src/pkg/strings/replace.go (right): https://codereview.appspot.com/6492076/diff/4014/src/pkg/strings/replace.go#n... src/pkg/strings/replace.go:217: for i := range r.mapping { for _, x := range mapping { r.tableSize += int(x) } https://codereview.appspot.com/6492076/diff/4014/src/pkg/strings/replace.go#n... src/pkg/strings/replace.go:222: for i := range r.mapping { for i, x := range r.mapping { if x == 0 { r.mapping[i] = etc } else { // etc } }
Sign in to reply to this message.
> I think that the data structure is generally OK, but the code could be > re-written to be more obviously correct. I'll take a closer look tomorrow. I assume the code you're referring to is the add method, which I agree is large and complex. I've made it as simple as I can, given that it's mangling and unmangling a somewhat convoluted data structure. I'll gladly accept any ideas to restructure it to make it clearer. > > Also, it might be nice to have some tests that adds things to a trie and then > dumps it out as ASCII art. I'm thinking of something like > > testCases := []struct{ in, out }string{ > { > "abc;abdef;abdefgh;xx;xy;z", ` > -ab > +.c > +.def > +..gh > -x > +.x > +.y > +z`, > }, > // etc > } > > The +/- prefix means has/has-not a value. The ASCII dumping would need to be shoved into the main strings package (and exported), as the tests are in an external package. > > https://codereview.appspot.com/6492076/diff/4014/src/pkg/strings/replace.go > File src/pkg/strings/replace.go (right): > > https://codereview.appspot.com/6492076/diff/4014/src/pkg/strings/replace.go#n... > src/pkg/strings/replace.go:217: for i := range r.mapping { > for _, x := range mapping { > r.tableSize += int(x) > } > > https://codereview.appspot.com/6492076/diff/4014/src/pkg/strings/replace.go#n... > src/pkg/strings/replace.go:222: for i := range r.mapping { > for i, x := range r.mapping { > if x == 0 { > r.mapping[i] = etc > } else { > // etc > } > }
Sign in to reply to this message.
You can add exported helpers to export_test.go and they will be available during testing. Russ
Sign in to reply to this message.
Hello nigeltao@golang.org, rsc@golang.org (cc: golang-dev@googlegroups.com), Please take another look.
Sign in to reply to this message.
Hello nigeltao@golang.org, rsc@golang.org (cc: golang-dev@googlegroups.com), Please take another look.
Sign in to reply to this message.
http://codereview.appspot.com/6492076/diff/2008/src/pkg/strings/export_test.go File src/pkg/strings/export_test.go (right): http://codereview.appspot.com/6492076/diff/2008/src/pkg/strings/export_test.g... src/pkg/strings/export_test.go:29: index := int(m) You don't really need this variable. The next line could be if int(m) != r.tableSize && t.table[m] != nil { http://codereview.appspot.com/6492076/diff/2008/src/pkg/strings/replace_test.go File src/pkg/strings/replace_test.go (right): http://codereview.appspot.com/6492076/diff/2008/src/pkg/strings/replace_test.... src/pkg/strings/replace_test.go:317: .bra- I'm surprised that "abra" is broken over two tries. Similarly with "c/adabra", "h/am" and "s/ion". Do you know if the opening "a" in "abra" is a prefix or a table entry? Either way, it seems sub-optimal. http://codereview.appspot.com/6492076/diff/2008/src/pkg/strings/replace_test.... src/pkg/strings/replace_test.go:342: spaceRemover := NewReplacer(" ", "", "\t", "") Are there any spaces? Isn't the leading white space all tabs? Also, it's a little worrying that this test uses a NewReplacer that is a generic replacer that relies on the trie that this test is testing. Instead, I'd write the replacer by hand: stripTabs := func(s string) string { b := make([]byte, 0, len(s)) for i := 0; i < len(s); i++ { if s[i] != '\t' { b = append(b, s[i]) } } return string(b) } http://codereview.appspot.com/6492076/diff/2008/src/pkg/strings/replace_test.... src/pkg/strings/replace_test.go:350: gen := NewReplacer(args...) You could roll this into the next line: in := NewReplacer(args...).PrintTrie() More idiomatic would be to rename in and out to got and want: got := NewReplacer(args...).PrintTrie() want := stripTabs(tc.out) if got != want { etc } http://codereview.appspot.com/6492076/diff/2008/src/pkg/strings/replace_test.... src/pkg/strings/replace_test.go:356: t.Errorf("PrintTrie(%q) = %s, want %s", tc.in, in, out) Given that in and out (or got and want) have new-lines in them, it might be better to format it as t.Errorf("input=%q\ngot %s\nwant %s\n", tc.in, got, want)
Sign in to reply to this message.
On 2012/09/12 12:23:57, nigeltao wrote: > http://codereview.appspot.com/6492076/diff/2008/src/pkg/strings/export_test.go > File src/pkg/strings/export_test.go (right): > > http://codereview.appspot.com/6492076/diff/2008/src/pkg/strings/export_test.g... > src/pkg/strings/export_test.go:29: index := int(m) > You don't really need this variable. The next line could be > if int(m) != r.tableSize && t.table[m] != nil { Done. > > http://codereview.appspot.com/6492076/diff/2008/src/pkg/strings/replace_test.go > File src/pkg/strings/replace_test.go (right): > > http://codereview.appspot.com/6492076/diff/2008/src/pkg/strings/replace_test.... > src/pkg/strings/replace_test.go:317: .bra- > I'm surprised that "abra" is broken over two tries. Similarly with "c/adabra", > "h/am" and "s/ion". Do you know if the opening "a" in "abra" is a prefix or a > table entry? Either way, it seems sub-optimal. I added an explicit table marker in PrintTrie to make it more clear where the lookup tables are. Note there is always one node for each line. For the a/bra split, there's a line in makeGenericReplacer that says "Ensure root node uses a lookup table." This is purely for performance reasons, and it gives BenchmarkGenericMatch2 a 15% speedup (probably from being able to skip over unmatched regions faster). For the h/am and s/ion splits the lookup table after "abra" can only cover one byte of the key before pointing to the next node. In this data structure, each node can have multiple children or have multiple byte prefixes, but not both. > > http://codereview.appspot.com/6492076/diff/2008/src/pkg/strings/replace_test.... > src/pkg/strings/replace_test.go:342: spaceRemover := NewReplacer(" ", "", "\t", > "") > Are there any spaces? Isn't the leading white space all tabs? > > Also, it's a little worrying that this test uses a NewReplacer that is a generic > replacer that relies on the trie that this test is testing. Instead, I'd write > the replacer by hand: It was using a byteStringReplacer, but I swapped it out for a call to strings.Replace. > > stripTabs := func(s string) string { > b := make([]byte, 0, len(s)) > for i := 0; i < len(s); i++ { > if s[i] != '\t' { > b = append(b, s[i]) > } > } > return string(b) > } > > http://codereview.appspot.com/6492076/diff/2008/src/pkg/strings/replace_test.... > src/pkg/strings/replace_test.go:350: gen := NewReplacer(args...) > You could roll this into the next line: > in := NewReplacer(args...).PrintTrie() > > More idiomatic would be to rename in and out to got and want: > > got := NewReplacer(args...).PrintTrie() > want := stripTabs(tc.out) > if got != want { > etc > } Done. > > http://codereview.appspot.com/6492076/diff/2008/src/pkg/strings/replace_test.... > src/pkg/strings/replace_test.go:356: t.Errorf("PrintTrie(%q) = %s, want %s", > tc.in, in, out) > Given that in and out (or got and want) have new-lines in them, it might be > better to format it as > > t.Errorf("input=%q\ngot %s\nwant %s\n", tc.in, got, want) Done.
Sign in to reply to this message.
https://codereview.appspot.com/6492076/diff/12011/src/pkg/strings/replace.go File src/pkg/strings/replace.go (right): https://codereview.appspot.com/6492076/diff/12011/src/pkg/strings/replace.go#... src/pkg/strings/replace.go:83: type trie struct { s/trie/trieNode/ might be a better name. Also, I think that future maintainers (including you or me six months from now) will be helped by a worked example. It took me a while to figure out how it all works. In my example below, it wasn't obvious to me that the "cbc" prefix was part of node n4, not n5. I've pasted a proposal below. WDYT? // trieNode is a node in a lookup trie for prioritized key/value pairs. Keys // and values may be empty. For example, the trie containing keys "ax", "ay", // "bcbc", "x" and "xy" could have eight nodes: // // n0 - // n1 a- // n2 .x+ // n3 .y+ // n4 b- // n5 .cbc+ // n6 x+ // n7 .y+ // // n0 is the root node, and its children are n1, n4 and n6; n1's children are // n2 and n3; n4's child is n5; n6's child is n7. Nodes n0, n1 and n4 (marked // with a trailing "-") are partial keys, and nodes n2, n3, n5, n6 and n7 // (marked with a trailing "+") are complete keys. type trieNode struct { // value is the value of the trie node's key/value pair. It is empty if // this node is not a complete key. value string // priority is the priority (higher is more important) of the trie node's // key/value pair; keys are not necessarily matched shortest- or longest- // first. Priority is positive if this node is a complete key, and zero // otherwise. In the example above, positive/zero priorities are marked // with a trailing "+" or "-". priority int // A trie node may have zero, one or more child nodes: // * if zero, the remaining fields are zero. // * if one, prefix and next are non-zero. // * if more, table is non-zero. // // For lookup efficiency, the root node is an exception, and will never // use its prefix or next fields. It will always use the table field, // unless the only key is empty. // prefix is the difference in keys between this trie node and the next. // In the example above, node n4 has prefix "cbc" and n4's next node is n5. // Node n5 has no children and so has zero prefix, next and table fields. prefix string next *trieNode // table is a lookup table indexed by the next byte in the key, after // remapping that byte through genericReplacer.mapping to create a dense // index. In the example above, the keys only use 'a', 'b', 'c', 'x' and // 'y', which remap to 0, 1, 2, 3 and 4. All other bytes map to 5, so that // genericReplacer.tableSize will be 6. Node n0's table will be // []*trieNode{ 0:n1, 1:n4, 3:n6 }, where the 0, 1 and 3 are the remapped // 'a', 'b' and 'x'. table []*trieNode } https://codereview.appspot.com/6492076/diff/12011/src/pkg/strings/replace.go#... src/pkg/strings/replace.go:86: // Priority of this match vs superstrings. Higher is better, and 0 means What is a superstring? https://codereview.appspot.com/6492076/diff/12011/src/pkg/strings/replace.go#... src/pkg/strings/replace.go:89: // If this node has multiple children (and thus with multiple different http://golang.org/doc/effective_go.html#commentary says that "The first sentence [of a documentation comment] should be a one-sentence summary that starts with the name being declared." Thus, we usually say something like "table is ..." instead of "If this node ...". https://codereview.appspot.com/6492076/diff/12011/src/pkg/strings/replace.go#... src/pkg/strings/replace.go:93: // If this node has exactly one child, this will be nonempty. Ah, "a/bra" means that this comment isn't true for the root node, right? https://codereview.appspot.com/6492076/diff/12011/src/pkg/strings/replace.go#... src/pkg/strings/replace.go:151: b := key[0] The next few lines simplify if this is m := r.mapping[key[0]] https://codereview.appspot.com/6492076/diff/12011/src/pkg/strings/replace_tes... File src/pkg/strings/replace_test.go (right): https://codereview.appspot.com/6492076/diff/12011/src/pkg/strings/replace_tes... src/pkg/strings/replace_test.go:305: testCases := []struct{ in, out string }{ It's worth testing "foo;foo" and "foo;foo;foo1" as inputs. It's probably also worth testing those in TestReplacer. https://codereview.appspot.com/6492076/diff/12011/src/pkg/strings/replace_tes... src/pkg/strings/replace_test.go:306: {"abc;abdef;abdefgh;xx;xy;z", `- table: Now that I understand how it works, the "table:" strings are kind of ugly, and I'd rather leave them out. https://codereview.appspot.com/6492076/diff/12011/src/pkg/strings/replace_tes... src/pkg/strings/replace_test.go:353: want := Replace(tc.out, "\t", "", -1) I would still do this by hand. A future refactoring could implement strings.Replace using strings.NewReplacer, and then this test might give false positives.
Sign in to reply to this message.
https://codereview.appspot.com/6492076/diff/12011/src/pkg/strings/replace_tes... File src/pkg/strings/replace_test.go (right): https://codereview.appspot.com/6492076/diff/12011/src/pkg/strings/replace_tes... src/pkg/strings/replace_test.go:305: testCases := []struct{ in, out string }{ On 2012/09/13 08:05:58, nigeltao wrote: > It's worth testing "foo;foo" and "foo;foo;foo1" as inputs. It's probably also > worth testing those in TestReplacer. Oh, I'd also like to see a "foo;;bar" TestGenericTrieBuilding test. Note the double-semi-colon, meaning an empty key.
Sign in to reply to this message.
http://codereview.appspot.com/6492076/diff/12011/src/pkg/strings/replace.go File src/pkg/strings/replace.go (right): http://codereview.appspot.com/6492076/diff/12011/src/pkg/strings/replace.go#n... src/pkg/strings/replace.go:204: root *trie Dropping the * would also save you one level of indirection.
Sign in to reply to this message.
On 2012/09/13 08:05:58, nigeltao wrote: > https://codereview.appspot.com/6492076/diff/12011/src/pkg/strings/replace.go > File src/pkg/strings/replace.go (right): > > https://codereview.appspot.com/6492076/diff/12011/src/pkg/strings/replace.go#... > src/pkg/strings/replace.go:83: type trie struct { > s/trie/trieNode/ might be a better name. Done. > > Also, I think that future maintainers (including you or me six months from now) > will be helped by a worked example. It took me a while to figure out how it all > works. In my example below, it wasn't obvious to me that the "cbc" prefix was > part of node n4, not n5. > > I've pasted a proposal below. WDYT? Looks good and thorough. I tweaked it a little. > > > // trieNode is a node in a lookup trie for prioritized key/value pairs. Keys > // and values may be empty. For example, the trie containing keys "ax", "ay", > // "bcbc", "x" and "xy" could have eight nodes: > // > // n0 - > // n1 a- > // n2 .x+ > // n3 .y+ > // n4 b- > // n5 .cbc+ > // n6 x+ > // n7 .y+ > // > // n0 is the root node, and its children are n1, n4 and n6; n1's children are > // n2 and n3; n4's child is n5; n6's child is n7. Nodes n0, n1 and n4 (marked > // with a trailing "-") are partial keys, and nodes n2, n3, n5, n6 and n7 > // (marked with a trailing "+") are complete keys. > type trieNode struct { > // value is the value of the trie node's key/value pair. It is empty if > // this node is not a complete key. > value string > // priority is the priority (higher is more important) of the trie node's > // key/value pair; keys are not necessarily matched shortest- or longest- > // first. Priority is positive if this node is a complete key, and zero > // otherwise. In the example above, positive/zero priorities are marked > // with a trailing "+" or "-". > priority int > > // A trie node may have zero, one or more child nodes: > // * if zero, the remaining fields are zero. > // * if one, prefix and next are non-zero. > // * if more, table is non-zero. > // > // For lookup efficiency, the root node is an exception, and will never > // use its prefix or next fields. It will always use the table field, > // unless the only key is empty. > > // prefix is the difference in keys between this trie node and the next. > // In the example above, node n4 has prefix "cbc" and n4's next node is n5. > // Node n5 has no children and so has zero prefix, next and table fields. > prefix string > next *trieNode > > // table is a lookup table indexed by the next byte in the key, after > // remapping that byte through genericReplacer.mapping to create a dense > // index. In the example above, the keys only use 'a', 'b', 'c', 'x' and > // 'y', which remap to 0, 1, 2, 3 and 4. All other bytes map to 5, so that > // genericReplacer.tableSize will be 6. Node n0's table will be tablesize will be 5 here, and lookup explicitly checks for the mapping yielding tablesize, and aborts if so. I did it this way instead of having a space at the end of the table for "no match", because it makes the performance 5-15% faster. > // []*trieNode{ 0:n1, 1:n4, 3:n6 }, where the 0, 1 and 3 are the remapped > // 'a', 'b' and 'x'. > table []*trieNode > } > > https://codereview.appspot.com/6492076/diff/12011/src/pkg/strings/replace.go#... > src/pkg/strings/replace.go:86: // Priority of this match vs superstrings. Higher > is better, and 0 means > What is a superstring? I meant priority vs all potential children, to which this node is a prefix. Your version explains it better though. > > https://codereview.appspot.com/6492076/diff/12011/src/pkg/strings/replace.go#... > src/pkg/strings/replace.go:89: // If this node has multiple children (and thus > with multiple different > http://golang.org/doc/effective_go.html#commentary says that "The first sentence > [of a documentation comment] should be a one-sentence summary that starts with > the name being declared." Thus, we usually say something like "table is ..." > instead of "If this node ...". > > https://codereview.appspot.com/6492076/diff/12011/src/pkg/strings/replace.go#... > src/pkg/strings/replace.go:93: // If this node has exactly one child, this will > be nonempty. > Ah, "a/bra" means that this comment isn't true for the root node, right? Yes, the doc might more accurately explain it inversely: * if the remaining fields are zero, there are no children * if prefix and next are non-zero, there is one child in next * if table is non-zero, it defines all the children, 0-256 of them add just happens to implement it to favor using prefixes over tables to save memory. > > https://codereview.appspot.com/6492076/diff/12011/src/pkg/strings/replace.go#... > src/pkg/strings/replace.go:151: b := key[0] > The next few lines simplify if this is > m := r.mapping[key[0]] Done. > > https://codereview.appspot.com/6492076/diff/12011/src/pkg/strings/replace_tes... > File src/pkg/strings/replace_test.go (right): > > https://codereview.appspot.com/6492076/diff/12011/src/pkg/strings/replace_tes... > src/pkg/strings/replace_test.go:305: testCases := []struct{ in, out string }{ > It's worth testing "foo;foo" and "foo;foo;foo1" as inputs. It's probably also > worth testing those in TestReplacer. Done. The abra* test covers using duplicate keys. > > https://codereview.appspot.com/6492076/diff/12011/src/pkg/strings/replace_tes... > src/pkg/strings/replace_test.go:306: {"abc;abdef;abdefgh;xx;xy;z", `- table: > Now that I understand how it works, the "table:" strings are kind of ugly, and > I'd rather leave them out. Done. > > https://codereview.appspot.com/6492076/diff/12011/src/pkg/strings/replace_tes... > src/pkg/strings/replace_test.go:353: want := Replace(tc.out, "\t", "", -1) > I would still do this by hand. A future refactoring could implement > strings.Replace using strings.NewReplacer, and then this test might give false > positives. Done.
Sign in to reply to this message.
This is getting pretty close. https://codereview.appspot.com/6492076/diff/13010/src/pkg/strings/replace.go File src/pkg/strings/replace.go (right): https://codereview.appspot.com/6492076/diff/13010/src/pkg/strings/replace.go#... src/pkg/strings/replace.go:128: // 'y', which remap to 0, 1, 2, 3 and 4. All other bytes map to 5, and My fault, but changing the second "map" to "remap" would be more consistent. https://codereview.appspot.com/6492076/diff/13010/src/pkg/strings/replace.go#... src/pkg/strings/replace.go:156: // The current prefixed node ends up in prefixnode, Maybe // Looking up what is currently t.prefix[0] will lead to prefixnode, and // looking up key[0] will lead to keynode. Also, s/prefixnode/prefixNode/ and s/keynode/keyNode/. https://codereview.appspot.com/6492076/diff/13010/src/pkg/strings/replace.go#... src/pkg/strings/replace.go:190: next := t.table[m] You could just write t.table[m].add(etc) on the next line. https://codereview.appspot.com/6492076/diff/13010/src/pkg/strings/replace.go#... src/pkg/strings/replace.go:240: // The size of a trie node's lookup table: the number of unique key bytes. // tableSize is the size of a trie node's lookup table. It is the number of // unique key bytes. https://codereview.appspot.com/6492076/diff/13010/src/pkg/strings/replace.go#... src/pkg/strings/replace.go:242: // At the index of each byte found anywhere in any key, this will contain // mapping maps from key bytes to a dense index for trieNode.table. https://codereview.appspot.com/6492076/diff/13010/src/pkg/strings/replace.go#... src/pkg/strings/replace.go:249: t := &r.root Just use r.root below: r.root.table = make(etc) r.root.add(etc) https://codereview.appspot.com/6492076/diff/13010/src/pkg/strings/replace.go#... src/pkg/strings/replace.go:275: key, val := oldnew[i], oldnew[i+1] I'm not sure if the key and val variables add much. I'd make the next line r.root.add(oldnew[i], oldnew[i+1], len(oldnew)-i, r) https://codereview.appspot.com/6492076/diff/13010/src/pkg/strings/replace.go#... src/pkg/strings/replace.go:309: // WriteString writes the replaced version of s to w, using the same logic as I'd just use the same comment as the one on Replacer.WriteString // WriteString writes s to w with all replacements performed. and drop the parts about (1) using the same logic, and (2) avoiding small writes. Or just drop the doc comment entirely. It's not an exported type, and I don't think it adds much. Sorry for the busywork if I've asked you to add these comments before. https://codereview.appspot.com/6492076/diff/13010/src/pkg/strings/replace.go#... src/pkg/strings/replace.go:343: wn, err = sw.WriteString(s[last:]) Wrapping this line and the next with a if last != len(s) might save a few unnecessary writes. https://codereview.appspot.com/6492076/diff/13010/src/pkg/strings/replace.go#... src/pkg/strings/replace.go:490: var discard io.Writer = devNull(0) Delete discard and devNull (lines 488-496). https://codereview.appspot.com/6492076/diff/13010/src/pkg/strings/replace_tes... File src/pkg/strings/replace_test.go (right): https://codereview.appspot.com/6492076/diff/13010/src/pkg/strings/replace_tes... src/pkg/strings/replace_test.go:226: blankLowPriority := NewReplacer("a", "A", "", "X") Rename blankToXOToO as blankHighPriority? Use the same inputs for both: "oo", "ii", "iooi", "oiio" and ""? https://codereview.appspot.com/6492076/diff/13010/src/pkg/strings/replace_tes... src/pkg/strings/replace_test.go:363: var wantbuf []byte wantbuf := make([]byte, 0, len(tc.out)) might save a few re-allocations.
Sign in to reply to this message.
On 2012/09/14 04:10:28, nigeltao wrote: > This is getting pretty close. > > https://codereview.appspot.com/6492076/diff/13010/src/pkg/strings/replace.go > File src/pkg/strings/replace.go (right): > > https://codereview.appspot.com/6492076/diff/13010/src/pkg/strings/replace.go#... > src/pkg/strings/replace.go:128: // 'y', which remap to 0, 1, 2, 3 and 4. All > other bytes map to 5, and > My fault, but changing the second "map" to "remap" would be more consistent. > > https://codereview.appspot.com/6492076/diff/13010/src/pkg/strings/replace.go#... > src/pkg/strings/replace.go:156: // The current prefixed node ends up in > prefixnode, > Maybe > // Looking up what is currently t.prefix[0] will lead to prefixnode, and > // looking up key[0] will lead to keynode. > > Also, s/prefixnode/prefixNode/ and s/keynode/keyNode/. > > https://codereview.appspot.com/6492076/diff/13010/src/pkg/strings/replace.go#... > src/pkg/strings/replace.go:190: next := t.table[m] > You could just write > t.table[m].add(etc) > on the next line. > > https://codereview.appspot.com/6492076/diff/13010/src/pkg/strings/replace.go#... > src/pkg/strings/replace.go:240: // The size of a trie node's lookup table: the > number of unique key bytes. > // tableSize is the size of a trie node's lookup table. It is the number of > // unique key bytes. > > https://codereview.appspot.com/6492076/diff/13010/src/pkg/strings/replace.go#... > src/pkg/strings/replace.go:242: // At the index of each byte found anywhere in > any key, this will contain > // mapping maps from key bytes to a dense index for trieNode.table. > > https://codereview.appspot.com/6492076/diff/13010/src/pkg/strings/replace.go#... > src/pkg/strings/replace.go:249: t := &r.root > Just use r.root below: > r.root.table = make(etc) > r.root.add(etc) > > https://codereview.appspot.com/6492076/diff/13010/src/pkg/strings/replace.go#... > src/pkg/strings/replace.go:275: key, val := oldnew[i], oldnew[i+1] > I'm not sure if the key and val variables add much. I'd make the next line > r.root.add(oldnew[i], oldnew[i+1], len(oldnew)-i, r) > > https://codereview.appspot.com/6492076/diff/13010/src/pkg/strings/replace.go#... > src/pkg/strings/replace.go:309: // WriteString writes the replaced version of s > to w, using the same logic as > I'd just use the same comment as the one on Replacer.WriteString > // WriteString writes s to w with all replacements performed. > and drop the parts about (1) using the same logic, and (2) avoiding small > writes. > > Or just drop the doc comment entirely. It's not an exported type, and I don't > think it adds much. > > Sorry for the busywork if I've asked you to add these comments before. > > https://codereview.appspot.com/6492076/diff/13010/src/pkg/strings/replace.go#... > src/pkg/strings/replace.go:343: wn, err = sw.WriteString(s[last:]) > Wrapping this line and the next with a > if last != len(s) > might save a few unnecessary writes. > > https://codereview.appspot.com/6492076/diff/13010/src/pkg/strings/replace.go#... > src/pkg/strings/replace.go:490: var discard io.Writer = devNull(0) > Delete discard and devNull (lines 488-496). > > https://codereview.appspot.com/6492076/diff/13010/src/pkg/strings/replace_tes... > File src/pkg/strings/replace_test.go (right): > > https://codereview.appspot.com/6492076/diff/13010/src/pkg/strings/replace_tes... > src/pkg/strings/replace_test.go:226: blankLowPriority := NewReplacer("a", "A", > "", "X") > Rename blankToXOToO as blankHighPriority? Use the same inputs for both: "oo", > "ii", "iooi", "oiio" and ""? > > https://codereview.appspot.com/6492076/diff/13010/src/pkg/strings/replace_tes... > src/pkg/strings/replace_test.go:363: var wantbuf []byte > wantbuf := make([]byte, 0, len(tc.out)) > might save a few re-allocations. Done all.
Sign in to reply to this message.
LGTM, submitting... Thanks for your patience in this code review. https://codereview.appspot.com/6492076/diff/6006/src/pkg/strings/replace.go File src/pkg/strings/replace.go (right): https://codereview.appspot.com/6492076/diff/6006/src/pkg/strings/replace.go#n... src/pkg/strings/replace.go:113: // * if prefix and next are non-zero, there is one child in next Trailing full stop, and ditto on the next line. https://codereview.appspot.com/6492076/diff/6006/src/pkg/strings/replace.go#n... src/pkg/strings/replace.go:156: // what is crrently t.prefix[0] will lead to prefixNode, and Typo in currently. https://codereview.appspot.com/6492076/diff/6006/src/pkg/strings/replace_test.go File src/pkg/strings/replace_test.go (right): https://codereview.appspot.com/6492076/diff/6006/src/pkg/strings/replace_test... src/pkg/strings/replace_test.go:236: testCase{blankHighPriority, "iooi", "XiXOXOXiX"}, I'd also add an "oiio" test.
Sign in to reply to this message.
*** Submitted as http://code.google.com/p/go/source/detail?r=23b89fb47d66 *** strings: implement a faster generic Replacer This also fixes the semantics of some corner cases with the empty match. TODOs for genericReplacer in the tests are fixed. benchmark old ns/op new ns/op delta BenchmarkGenericNoMatch 71395 3132 -95.61% BenchmarkGenericMatch1 75610 20280 -73.18% BenchmarkGenericMatch2 837995 86725 -89.65% R=nigeltao, rsc CC=golang-dev http://codereview.appspot.com/6492076 Committer: Nigel Tao <nigeltao@golang.org>
Sign in to reply to this message.
|