|
|
Descriptionstrings: add Replacer, NewReplacer
This is just a new API to do many replacements at once.
While the point of this API is to be faster than doing replacements one
at a time, the implementation in this CL has the optimizations removed
and may actually be slower.
Future CLs will bring back & add optimizations.
Patch Set 1 #Patch Set 2 : diff -r 4d792b5bea35 https://go.googlecode.com/hg #Patch Set 3 : diff -r 4d792b5bea35 https://go.googlecode.com/hg #
Total comments: 4
Patch Set 4 : diff -r 9bb6275dfc1f https://go.googlecode.com/hg #
Total comments: 10
Patch Set 5 : diff -r b1e0821538e1 https://go.googlecode.com/hg #
MessagesTotal messages: 24
Hello golang-dev@googlegroups.com (cc: golang-dev@googlegroups.com), I'd like you to review this change to https://go.googlecode.com/hg
Sign in to reply to this message.
there's a lot of stuff here, and it needs to be done for bytes as well. i wonder if it would be better to do it there, and have string wrap it. http://codereview.appspot.com/5081042/diff/5001/src/pkg/strings/replace.go File src/pkg/strings/replace.go (right): http://codereview.appspot.com/5081042/diff/5001/src/pkg/strings/replace.go#ne... src/pkg/strings/replace.go:25: anyMultiByte := false you don't need the flag. just return inside the loop and outside http://codereview.appspot.com/5081042/diff/5001/src/pkg/strings/replace.go#ne... src/pkg/strings/replace.go:42: // d http://codereview.appspot.com/5081042/diff/5001/src/pkg/strings/replace.go#ne... src/pkg/strings/replace.go:86: // TODO(bradfitz): this isn't optimized like byteKeyReplace is s/like/as/ http://codereview.appspot.com/5081042/diff/5001/src/pkg/strings/replace.go#ne... src/pkg/strings/replace.go:122: // byteKeyReplace implements Replace for ReplacementMaps which have s/which/that/
Sign in to reply to this message.
On Mon, Sep 19, 2011 at 4:49 PM, <r@golang.org> wrote: > there's a lot of stuff here, and it needs to be done for bytes as well. > i wonder if it would be better to do it there, and have string wrap it. > Maybe, but I'm skeptical. That negates the point of it being for speed, having to do needless copies. (looks at calendar to decide whether it's been too soon since I last grumbled about string vs []byte....)
Sign in to reply to this message.
the map only needs to exist in one place. strings can have a map that just embeds the bytes one. although that dependency arc does not exist today, i think it's ok. all the code can be in bytes (so they're methods on ReplacementMap); the strings package can wrap them trivially for clean access. there's no extra copying called for. (also i need to warm to the idea in general;this is a lot of mechanism for something very simple in principle.) -rob
Sign in to reply to this message.
On Mon, Sep 19, 2011 at 5:01 PM, Rob 'Commander' Pike <r@golang.org> wrote: > the map only needs to exist in one place. strings can have a map that > just embeds the bytes one. although that dependency arc does not exist > today, i think it's ok. > > all the code can be in bytes (so they're methods on ReplacementMap); > the strings package can wrap them trivially for clean access. there's > no extra copying called for. > Okay. Considering the pushback I got on http://codereview.appspot.com/4969049/ (bytes.EqualsString), I would've thought bytes/strings mixing was verboten. > (also i need to warm to the idea in general;this is a lot of mechanism > for something very simple in principle.) > FWIW, I also tried with both of exp/regexp's Replace*Func but it was very much slower. Something like 15000 ns.
Sign in to reply to this message.
This is more complex than using strings.Replace and yet still not the fastest possible way to write the code (which would be a loop, and preferably something that didn't return a string, like template.HTMLEscape). That is, it's not simplest and not fastest. That makes me wonder where it would make sense to use. In http, sanitizeName, sanitizeValue, and WriteSubset don't need this: they should be using strings.Map instead of multiple Replace calls. I would be interested to see a comparison of the speed of strings.Map for those cases vs this code. That leaves http.htmlEscape and multipart.escapeQuotes. It's hard to believe that either really needs the speed here, so I think the replacement has to compete on simplicity. The current escapeQuotes is (or should be): func escapeQuotes(s string) string { s = strings.Replace(s, `\`, `\\`, -1) s = strings.Replace(s, `"`, `\"`, -1) return s } The replacement is: var quoteEscapeMap = strings.NewReplacementMap(map[string]string{ `\`: `\\`, `"`: `\"`, }) func escapeQuotes(s string) string { return quoteEscapeMap.Replace(s, -1) } So far, it's not much simpler. If we could cut it to: var quoteEscaper = strings.NewReplacer(`\`, `\\`, `"`, `\"`) func escapeQuotes(s string) string { return quoteEscaper.Replace(s) } then I think it's a more compelling story. Even more compelling would be if the API were: func NewReplacer(args ...string) *Replacer func Replace(s string) string func ReplaceBytes(b []byte) []byte func RewriteBytes(b []byte) func WriteString(w io.Writer, s string) (n int, err os.Error) func WriteBytes(w io.Writer, b []byte) (n int, err os.Error) because then you could rewrite template's: var ( htmlQuot = []byte(""") // shorter than """ htmlApos = []byte("'") // shorter than "'" htmlAmp = []byte("&") htmlLt = []byte("<") htmlGt = []byte(">") ) func HTMLEscape(w io.Writer, b []byte) { last := 0 for i, c := range b { var html []byte switch c { case '"': html = htmlQuot case '\'': html = htmlApos case '&': html = htmlAmp case '<': html = htmlLt case '>': html = htmlGt default: continue } w.Write(b[last:i]) w.Write(html) last = i + 1 } w.Write(b[last:]) } into var htmlEscaper = strings.Replacer(`"`, `"`, `'`, `'`, `&`, `&`, `<`, `<`, `>`, `>`) func HTMLEscape(w io.Writer, b []byte) { htmlEscaper.WriteBytes(w, b) } without loss of speed. If we can get there, then that's interesting. Can we get there? Russ
Sign in to reply to this message.
I like that API proposal. I should've flagged my CL as "for discussion". In any case, your proposal is consistent with where I wanted to get to: call sites that passed their whole intent down to the strings package where the execution strategy could be optimized better (even if not necessarily today), rather than callers doing stuff in multiple steps where there's no hope of an ideal plan. How should RewriteBytes works if it doesn't return []byte? What if it needs to grow? On Tue, Sep 20, 2011 at 7:27 AM, Russ Cox <rsc@golang.org> wrote: > This is more complex than using strings.Replace > and yet still not the fastest possible way to write > the code (which would be a loop, and preferably something > that didn't return a string, like template.HTMLEscape). > That is, it's not simplest and not fastest. That makes > me wonder where it would make sense to use. > > In http, sanitizeName, sanitizeValue, and WriteSubset > don't need this: they should be using strings.Map > instead of multiple Replace calls. I would be interested > to see a comparison of the speed of strings.Map for > those cases vs this code. > > That leaves http.htmlEscape and multipart.escapeQuotes. > It's hard to believe that either really needs the speed here, > so I think the replacement has to compete on simplicity. > The current escapeQuotes is (or should be): > > func escapeQuotes(s string) string { > s = strings.Replace(s, `\`, `\\`, -1) > s = strings.Replace(s, `"`, `\"`, -1) > return s > } > > The replacement is: > > var quoteEscapeMap = strings.NewReplacementMap(map[string]string{ > `\`: `\\`, > `"`: `\"`, > }) > func escapeQuotes(s string) string { > return quoteEscapeMap.Replace(s, -1) > } > > So far, it's not much simpler. If we could cut it to: > > var quoteEscaper = strings.NewReplacer(`\`, `\\`, `"`, `\"`) > > func escapeQuotes(s string) string { > return quoteEscaper.Replace(s) > } > > then I think it's a more compelling story. Even more > compelling would be if the API were: > > func NewReplacer(args ...string) *Replacer > func Replace(s string) string > func ReplaceBytes(b []byte) []byte > func RewriteBytes(b []byte) > func WriteString(w io.Writer, s string) (n int, err os.Error) > func WriteBytes(w io.Writer, b []byte) (n int, err os.Error) > > because then you could rewrite template's: > > > var ( > htmlQuot = []byte(""") // shorter than """ > htmlApos = []byte("'") // shorter than "'" > htmlAmp = []byte("&") > htmlLt = []byte("<") > htmlGt = []byte(">") > ) > > func HTMLEscape(w io.Writer, b []byte) { > last := 0 > for i, c := range b { > var html []byte > switch c { > case '"': > html = htmlQuot > case '\'': > html = htmlApos > case '&': > html = htmlAmp > case '<': > html = htmlLt > case '>': > html = htmlGt > default: > continue > } > w.Write(b[last:i]) > w.Write(html) > last = i + 1 > } > w.Write(b[last:]) > } > > into > > var htmlEscaper = strings.Replacer(`"`, `"`, `'`, `'`, `&`, > `&`, `<`, `<`, `>`, `>`) > > func HTMLEscape(w io.Writer, b []byte) { > htmlEscaper.WriteBytes(w, b) > } > > without loss of speed. If we can get there, then that's interesting. > Can we get there? > > Russ >
Sign in to reply to this message.
interesting. russ came up with the same API i was about to suggest. it loses the ability to store state under a set of substitutions, but at least they can be grouped into a slice and passed with ... -rob
Sign in to reply to this message.
On Tue, Sep 20, 2011 at 10:50, Brad Fitzpatrick <bradfitz@golang.org> wrote: > How should RewriteBytes works if it doesn't return []byte? What if it needs > to grow? Sorry, I realized after sending that it needed to return []byte and also that in some cases it would still need to reallocate. It's not clear how worthwhile it is. I'd drop it from this round. In fact I'd drop all the cleverness from this round and just do the simple loop over a list of pairs. It would be a lot easier to do algorithms in a separate CL. Russ
Sign in to reply to this message.
On Tue, Sep 20, 2011 at 7:52 AM, Rob 'Commander' Pike <r@golang.org> wrote: > interesting. russ came up with the same API i was about to suggest. it > loses the ability to store state under a set of substitutions, but at > least they can be grouped into a slice and passed with ... Why can't I store state? From Russ' proposal: func NewReplacer(args ...string) *Replacer I could do: type Replacer struct { ...state... }
Sign in to reply to this message.
On 20 September 2011 15:27, Russ Cox <rsc@golang.org> wrote: > func NewReplacer(args ...string) *Replacer > func Replace(s string) string > func ReplaceBytes(b []byte) []byte > func RewriteBytes(b []byte) > func WriteString(w io.Writer, s string) (n int, err os.Error) > func WriteBytes(w io.Writer, b []byte) (n int, err os.Error) would this be part of the strings package, the bytes package, split into each, or in its own package?
Sign in to reply to this message.
On Tue, Sep 20, 2011 at 7:53 AM, Russ Cox <rsc@golang.org> wrote: > On Tue, Sep 20, 2011 at 10:50, Brad Fitzpatrick <bradfitz@golang.org> > wrote: > > How should RewriteBytes works if it doesn't return []byte? What if it > needs > > to grow? > > Sorry, I realized after sending that it needed to return []byte > and also that in some cases it would still need to reallocate. > It's not clear how worthwhile it is. I'd drop it from this round. > > In fact I'd drop all the cleverness from this round and just > do the simple loop over a list of pairs. It would be a lot > easier to do algorithms in a separate CL. > Works for me. Should NewReplacer return an os.Error() then for unsupported replacement pairs? Or panic? If the latter, should it be named "Must" somehow?
Sign in to reply to this message.
you're right. i'm lacking my first coffee. i was thinking of just making Replace be a variadic function. even simpler. -rob
Sign in to reply to this message.
On Tue, Sep 20, 2011 at 10:56, roger peppe <rogpeppe@gmail.com> wrote: > would this be part of the strings package, the bytes package, > split into each, or in its own package? strings only. it is a string replacer.
Sign in to reply to this message.
i was looking at ReplaceBytes, WriteBytes, etc which are more byte oriented. the WriteBytes and WriteString methods make me feel slightly uncomfortable, although i can't think of another way of nicely optimising the no-replace case, which is presumably why they're there. On 20 September 2011 16:18, Russ Cox <rsc@golang.org> wrote: > On Tue, Sep 20, 2011 at 10:56, roger peppe <rogpeppe@gmail.com> wrote: >> would this be part of the strings package, the bytes package, >> split into each, or in its own package? > > strings only. > it is a string replacer. >
Sign in to reply to this message.
On Tue, Sep 20, 2011 at 11:57, roger peppe <rogpeppe@gmail.com> wrote: > the WriteBytes and WriteString methods make me feel > slightly uncomfortable, although i can't think of another > way of nicely optimising the no-replace case, which is > presumably why they're there. they are there so that they can use multiple Write calls and zero allocations. russ
Sign in to reply to this message.
On 20 September 2011 17:24, Russ Cox <rsc@golang.org> wrote: > On Tue, Sep 20, 2011 at 11:57, roger peppe <rogpeppe@gmail.com> wrote: >> the WriteBytes and WriteString methods make me feel >> slightly uncomfortable, although i can't think of another >> way of nicely optimising the no-replace case, which is >> presumably why they're there. > > they are there so that they can use multiple Write calls > and zero allocations. ah, interesting. i'd assumed that they'd each do a single Write.
Sign in to reply to this message.
Hello golang-dev@googlegroups.com, r@golang.org, rsc@golang.org, rogpeppe@gmail.com (cc: golang-dev@googlegroups.com), Please take another look.
Sign in to reply to this message.
http://codereview.appspot.com/5081042/diff/5003/src/pkg/http/cookie.go File src/pkg/http/cookie.go (right): http://codereview.appspot.com/5081042/diff/5003/src/pkg/http/cookie.go#newcod... src/pkg/http/cookie.go:210: var nameReplacer = strings.NewReplacer("\n", "-", "\r", "-") I still think these should be using strings.Map. http://codereview.appspot.com/5081042/diff/5003/src/pkg/http/header.go File src/pkg/http/header.go (right): http://codereview.appspot.com/5081042/diff/5003/src/pkg/http/header.go#newcode50 src/pkg/http/header.go:50: var newlinesToSpaces = strings.NewReplacer("\n", " ", "\r", " ") strings.Map.
Sign in to reply to this message.
http://codereview.appspot.com/5081042/diff/5003/src/pkg/http/cookie.go File src/pkg/http/cookie.go (right): http://codereview.appspot.com/5081042/diff/5003/src/pkg/http/cookie.go#newcod... src/pkg/http/cookie.go:210: var nameReplacer = strings.NewReplacer("\n", "-", "\r", "-") On 2011/09/20 21:30:50, rsc wrote: > I still think these should be using strings.Map. I like Replacer here (and below). strings.Map is an implementation. strings.NewReplacer specifies a goal. The returned Replacer can be implemented in terms of Map if that's the best implementation. In this case, Replacer can notice that that all the inputs (and outputs!) are single ASCII bytes and avoid anything having to do with runes or even being UTF-8 aware. Using strings.Map, you'd have to do at least two passes over the input string to know the final size and fill it, or allocate some slack / use append. With Replacer, this could be one pass and 1 exactly-sized allocation.
Sign in to reply to this message.
ping On Tue, Sep 20, 2011 at 2:49 PM, <bradfitz@golang.org> wrote: > > http://codereview.appspot.com/**5081042/diff/5003/src/pkg/**http/cookie.go<ht... > File src/pkg/http/cookie.go (right): > > http://codereview.appspot.com/**5081042/diff/5003/src/pkg/** > http/cookie.go#newcode210<http://codereview.appspot.com/5081042/diff/5003/src/pkg/http/cookie.go#newcode210> > src/pkg/http/cookie.go:210: var nameReplacer = strings.NewReplacer("\n", > "-", "\r", "-") > On 2011/09/20 21:30:50, rsc wrote: > >> I still think these should be using strings.Map. >> > > I like Replacer here (and below). > > strings.Map is an implementation. > > strings.NewReplacer specifies a goal. > > The returned Replacer can be implemented in terms of Map if that's the > best implementation. > > In this case, Replacer can notice that that all the inputs (and > outputs!) are single ASCII bytes and avoid anything having to do with > runes or even being UTF-8 aware. > > Using strings.Map, you'd have to do at least two passes over the input > string to know the final size and fill it, or allocate some slack / use > append. With Replacer, this could be one pass and 1 exactly-sized > allocation. > > http://codereview.appspot.com/**5081042/<http://codereview.appspot.com/5081042/> >
Sign in to reply to this message.
LGTM Please revert the uses for now, until it's fast. http://codereview.appspot.com/5081042/diff/5003/src/pkg/strings/replace.go File src/pkg/strings/replace.go (right): http://codereview.appspot.com/5081042/diff/5003/src/pkg/strings/replace.go#ne... src/pkg/strings/replace.go:28: // NewReplacer returns a new Replacer from a list of old and new strings. s/old and new strings/old, new string pairs. http://codereview.appspot.com/5081042/diff/5003/src/pkg/strings/replace.go#ne... src/pkg/strings/replace.go:29: // Replacements are performed in order, without overlapping matches. we'll need to be more specific about the semantics but it can wait. i don't have any good sentences now. http://codereview.appspot.com/5081042/diff/5003/src/pkg/strings/replace.go#ne... src/pkg/strings/replace.go:32: panic("odd number of elements") panic("strings.NewReplacer: odd argument count") http://codereview.appspot.com/5081042/diff/5003/src/pkg/strings/replace.go#ne... src/pkg/strings/replace.go:60: // WriteString writes s to w with all replacements performed. two spaces before w (should be one)
Sign in to reply to this message.
*** Submitted as http://code.google.com/p/go/source/detail?r=9a44dedca5dd *** strings: add Replacer, NewReplacer This is just a new API to do many replacements at once. While the point of this API is to be faster than doing replacements one at a time, the implementation in this CL has the optimizations removed and may actually be slower. Future CLs will bring back & add optimizations. R=r, rsc, rogpeppe CC=golang-dev http://codereview.appspot.com/5081042 http://codereview.appspot.com/5081042/diff/5003/src/pkg/strings/replace.go File src/pkg/strings/replace.go (right): http://codereview.appspot.com/5081042/diff/5003/src/pkg/strings/replace.go#ne... src/pkg/strings/replace.go:28: // NewReplacer returns a new Replacer from a list of old and new strings. On 2011/09/28 16:17:59, rsc wrote: > s/old and new strings/old, new string pairs. Done. http://codereview.appspot.com/5081042/diff/5003/src/pkg/strings/replace.go#ne... src/pkg/strings/replace.go:32: panic("odd number of elements") On 2011/09/28 16:17:59, rsc wrote: > panic("strings.NewReplacer: odd argument count") Done. http://codereview.appspot.com/5081042/diff/5003/src/pkg/strings/replace.go#ne... src/pkg/strings/replace.go:60: // WriteString writes s to w with all replacements performed. On 2011/09/28 16:17:59, rsc wrote: > two spaces before w (should be one) Done.
Sign in to reply to this message.
This is old - but the most relevant thread I could find: Any plans for a bytes.Replacer? (the strings.Replacer is screaming fast, and it would be really nice to have that speed in the bytes department too). Bjørn Erik Pedersen tirsdag 20. september 2011 01.38.50 UTC+2 skrev Brad Fitzpatrick følgende: > > Reviewers: golang-dev_googlegroups.com, > > Message: > Hello golan...@googlegroups.com <javascript:> (cc: > golan...@googlegroups.com <javascript:>), > > I'd like you to review this change to > https://go.googlecode.com/hg > > > Description: > strings: add ReplacementMap > > Faster way to apply multiple replacements against a string at > once, minimizing copies. With an optimized case for when all > the replacement source strings are single bytes. > > strings_test.BenchmarkOldHTTPHTMLReplace 1000000 2918 ns/op > strings_test.BenchmarkReplaceMapSingleByte 1000000 1330 ns/op > strings_test.BenchmarkReplaceMap 500000 5240 ns/op > > Please review this at http://codereview.appspot.com/5081042/ > > Affected files: > M src/pkg/http/cookie.go > M src/pkg/http/header.go > M src/pkg/http/server.go > M src/pkg/mime/multipart/writer.go > M src/pkg/strings/Makefile > A src/pkg/strings/replace.go > A src/pkg/strings/replace_test.go > > >
Sign in to reply to this message.
|