|
|
Created:
14 years, 1 month ago by bradfitz Modified:
14 years, 1 month ago Reviewers:
CC:
r, rog, eh, rsc, golang-dev Visibility:
Public. |
Descriptionstrings: Map: avoid allocation when string is unchanged
This speeds up strings.ToLower, etc.
before/after:
strings_test.BenchmarkMapNoChanges 1000000 1013 ns/op
strings_test.BenchmarkMapNoChanges 5000000 442 ns/op
Patch Set 1 #Patch Set 2 : diff -r 11c0906fdf7d https://go.googlecode.com/hg/ #Patch Set 3 : diff -r 11c0906fdf7d https://go.googlecode.com/hg/ #
Total comments: 4
Patch Set 4 : diff -r a2cc9baea5d3 https://go.googlecode.com/hg/ #
MessagesTotal messages: 24
Hello golang-dev@googlegroups.com (cc: golang-dev@googlegroups.com), I'd like you to review this change to https://go.googlecode.com/hg/
Sign in to reply to this message.
http://codereview.appspot.com/4306056/diff/4001/src/pkg/strings/strings.go File src/pkg/strings/strings.go (right): http://codereview.appspot.com/4306056/diff/4001/src/pkg/strings/strings.go#ne... src/pkg/strings/strings.go:331: b = []byte(s) if mapping() throws away almost all characters and s is big, this does too much allocation. we at least garbage collect b (return a copy of it) but i'm a little unhappy. i feel there's a refinement to the old method that can avoid this problem yes, the old code has this problem too, so if you can't think of anything it's still an improvement. maybe i'm overthinking it. pushbacks accepted. http://codereview.appspot.com/4306056/diff/4001/src/pkg/strings/strings_test.go File src/pkg/strings/strings_test.go (right): http://codereview.appspot.com/4306056/diff/4001/src/pkg/strings/strings_test.... src/pkg/strings/strings_test.go:444: } you don't need reflect and unsafe for this. use fmt.Sprintf("%p", orig) == fmt.Sprintf("%p", m)
Sign in to reply to this message.
http://codereview.appspot.com/4306056/diff/4001/src/pkg/strings/strings_test.go File src/pkg/strings/strings_test.go (right): http://codereview.appspot.com/4306056/diff/4001/src/pkg/strings/strings_test.... src/pkg/strings/strings_test.go:444: } On 2011/03/28 01:58:38, r wrote: > you don't need reflect and unsafe for this. use fmt.Sprintf("%p", orig) == > fmt.Sprintf("%p", m) %p is the pointer to the string header, not the contents, right? package main import ( "fmt" "reflect" "unsafe" ) func main() { a := "foo" b := a c := string([]byte("foo")) fmt.Printf("%p\n", a) fmt.Printf("%p\n", b) fmt.Printf("%p\n", c) fmt.Printf("%p\n", &a) fmt.Printf("%p\n", &b) fmt.Printf("%p\n", &c) fmt.Printf("%#v\n", (*reflect.StringHeader)(unsafe.Pointer(&a))) fmt.Printf("%#v\n", (*reflect.StringHeader)(unsafe.Pointer(&b))) fmt.Printf("%#v\n", (*reflect.StringHeader)(unsafe.Pointer(&c))) } Outputs: %!p(string=foo) %!p(string=foo) %!p(string=foo) 0x978e21b8 0x978e21f8 # NOTE: different from previous line 0x978e21f0 &reflect.StringHeader{Data:0x809d7fc, Len:3} &reflect.StringHeader{Data:0x809d7fc, Len:3} &reflect.StringHeader{Data:0x978e21e0, Len:3}
Sign in to reply to this message.
maybe. i'm surprised, though. it's not much use to print the address of the header. i'll look in this tomorrow; maybe i've forgotten why it does this. in any case, &str[0] will work. -rob
Sign in to reply to this message.
On Sun, Mar 27, 2011 at 8:20 PM, Rob 'Commander' Pike <r@golang.org> wrote: > maybe. i'm surprised, though. it's not much use to print the address > of the header. i'll look in this tomorrow; maybe i've forgotten why > it does this. > > in any case, &str[0] will work. > But string contents aren't addressable, I thought? package main import "fmt" func main() { a := "abc" fmt.Printf("%p", &a[0]) } test.go:5: cannot take the address of a[0]
Sign in to reply to this message.
I've paged in what I was after. If you compute the address of the cap'th element of a slice, you get the same answer if they hold the same underlying array. Strings don't have caps. So I'm disappointed to agree that you need unsafe to test this. It feels wrong, though, to depend on the layout of a string header in order to achieve this. Is it worth testing? Seems fragile. I'm probably overthinking again. -rob
Sign in to reply to this message.
looks good but you should do the same transformation to bytes.Map, and in that test you can use the address-of-cap trick or just use SliceValue.Get to do it for you. -rob
Sign in to reply to this message.
On Sun, Mar 27, 2011 at 8:55 PM, Rob 'Commander' Pike <r@golang.org> wrote: ... > So I'm disappointed to agree that you need unsafe to test this. It > feels wrong, though, to depend on the layout of a string header in > order to achieve this. Is it worth testing? Seems fragile. > How's it fragile? The language spec doesn't say that that: func foo(a string) string { return a; } ...isn't _allowed_ to copy. But the spec does say that strings are immutable so it seems that any sane implementation would avoid an unnecessary copy.
Sign in to reply to this message.
Oh, wow, there's a lot of stuff in package bytes that I've never used. On Sun, Mar 27, 2011 at 8:59 PM, <r@golang.org> wrote: > looks good but you should do the same transformation to bytes.Map, and > in that test you can use the address-of-cap trick or just use > SliceValue.Get to do it for you. > > -rob > > > > http://codereview.appspot.com/4306056/ >
Sign in to reply to this message.
The test seems ungainly is what I meant. I wasn't impugning the Map function. -rob
Sign in to reply to this message.
http://codereview.appspot.com/4306056/diff/4001/src/pkg/strings/strings.go File src/pkg/strings/strings.go (right): http://codereview.appspot.com/4306056/diff/4001/src/pkg/strings/strings.go#ne... src/pkg/strings/strings.go:331: b = []byte(s) it would be easier to be more efficient if copy worked from string to []byte.
Sign in to reply to this message.
On 28 March 2011 09:07, <rogpeppe@gmail.com> wrote: > > http://codereview.appspot.com/4306056/diff/4001/src/pkg/strings/strings.go > File src/pkg/strings/strings.go (right): > > http://codereview.appspot.com/4306056/diff/4001/src/pkg/strings/strings.go#ne... > src/pkg/strings/strings.go:331: b = []byte(s) > it would be easier to be more efficient if copy worked from string to > []byte. Spec says: As a special case, copy also accepts a destination argument assignable to type []byte with a source argument of a string type. This form copies the bytes from the string into the byte slice. Is that what you wanted? Chris -- Chris "allusive" Dollin
Sign in to reply to this message.
On 28 March 2011 09:15, chris dollin <ehog.hedge@googlemail.com> wrote: > On 28 March 2011 09:07, <rogpeppe@gmail.com> wrote: >> >> http://codereview.appspot.com/4306056/diff/4001/src/pkg/strings/strings.go >> File src/pkg/strings/strings.go (right): >> >> http://codereview.appspot.com/4306056/diff/4001/src/pkg/strings/strings.go#ne... >> src/pkg/strings/strings.go:331: b = []byte(s) >> it would be easier to be more efficient if copy worked from string to >> []byte. > > Spec says: > > As a special case, copy also accepts a destination argument assignable > to type []byte with a source argument of a string type. This form copies > the bytes from the string into the byte slice. > > Is that what you wanted? indeed it is. that change must have sneaked in without me noticing...
Sign in to reply to this message.
On 28 March 2011 04:20, Rob 'Commander' Pike <r@golang.org> wrote: > maybe. i'm surprised, though. it's not much use to print the address > of the header. i'll look in this tomorrow; maybe i've forgotten why > it does this. it would be useful to be able to print a string's address with %p. if a different method name was used in reflect for getting pointer values, e.g. GetPtr() uintptr, then StringValue could have both Get() string and GetPtr() uintptr, and it would be easy to implement %p for strings in fmt.
Sign in to reply to this message.
On 28 March 2011 02:58, <r@golang.org> wrote: > > http://codereview.appspot.com/4306056/diff/4001/src/pkg/strings/strings.go > File src/pkg/strings/strings.go (right): > > http://codereview.appspot.com/4306056/diff/4001/src/pkg/strings/strings.go#ne... > src/pkg/strings/strings.go:331: b = []byte(s) > if mapping() throws away almost all characters and s is big, this does > too much allocation. we at least garbage collect b (return a copy of it) > but i'm a little unhappy. i feel there's a refinement to the old method > that can avoid this problem here's an alternative version of the code that never allocates more than the original version. // Map returns a copy of the string s with all its characters modified // according to the mapping function. If mapping returns a negative value, the character is // dropped from the string with no replacement. func Map(mapping func(rune int) int, s string) string { // Scan the string to find the first character that changes. // If no characters change, there is no need to allocate a // new string. var i, c, rune int for i, c = range s { if rune = mapping(c); rune != c { break } } if rune == c { return s } b := make([]byte, len(s)+utf8.UTFMax) nbytes := copy(b, s[0:i]) if rune >= 0 { nbytes += utf8.EncodeRune(b[nbytes:], rune) } for _, c := range s[i+utf8.RuneLen(c):] { rune := mapping(c) if rune >= 0 { wid := 1 if rune >= utf8.RuneSelf { wid = utf8.RuneLen(rune) } if nbytes+wid > len(b) { // Grow the buffer. nb := make([]byte, len(b)*2+utf8.UTFMax) copy(nb, b[0:nbytes]) b = nb } nbytes += utf8.EncodeRune(b[nbytes:], rune) } } return string(b[0:nbytes]) }
Sign in to reply to this message.
But you call the mapping function more often. Have you benchmarked a wide variety of inputs to see where the trade-off sweet spot is? I haven't. I was lazy so I made the minimal possible change that made things better but didn't fundamentally change how it worked. On Mon, Mar 28, 2011 at 3:28 AM, roger peppe <rogpeppe@gmail.com> wrote: > On 28 March 2011 02:58, <r@golang.org> wrote: > > > > > http://codereview.appspot.com/4306056/diff/4001/src/pkg/strings/strings.go > > File src/pkg/strings/strings.go (right): > > > > > http://codereview.appspot.com/4306056/diff/4001/src/pkg/strings/strings.go#ne... > > src/pkg/strings/strings.go:331: b = []byte(s) > > if mapping() throws away almost all characters and s is big, this does > > too much allocation. we at least garbage collect b (return a copy of it) > > but i'm a little unhappy. i feel there's a refinement to the old method > > that can avoid this problem > > here's an alternative version of the code that never allocates more than > the original version. > > // Map returns a copy of the string s with all its characters modified > // according to the mapping function. If mapping returns a negative > value, the character is > // dropped from the string with no replacement. > func Map(mapping func(rune int) int, s string) string { > // Scan the string to find the first character that changes. > // If no characters change, there is no need to allocate a > // new string. > var i, c, rune int > for i, c = range s { > if rune = mapping(c); rune != c { > break > } > } > if rune == c { > return s > } > > b := make([]byte, len(s)+utf8.UTFMax) > nbytes := copy(b, s[0:i]) > if rune >= 0 { > nbytes += utf8.EncodeRune(b[nbytes:], rune) > } > > for _, c := range s[i+utf8.RuneLen(c):] { > rune := mapping(c) > if rune >= 0 { > wid := 1 > if rune >= utf8.RuneSelf { > wid = utf8.RuneLen(rune) > } > if nbytes+wid > len(b) { > // Grow the buffer. > nb := make([]byte, len(b)*2+utf8.UTFMax) > copy(nb, b[0:nbytes]) > b = nb > } > nbytes += utf8.EncodeRune(b[nbytes:], rune) > } > } > return string(b[0:nbytes]) > } >
Sign in to reply to this message.
On Sun, Mar 27, 2011 at 8:59 PM, <r@golang.org> wrote: > looks good but you should do the same transformation to bytes.Map, and > in that test you can use the address-of-cap trick or just use > SliceValue.Get to do it for you. > Turns out I shouldn't make the change to bytes. The documentation of bytes.Map says that it returns a copy. Returning the same mutable slice could break existing programs. I'll submit the strings CL as-is and we can continue to iterate (on strings and/or bytes) as desired.
Sign in to reply to this message.
At the risk of bikeshedding this further, I think the CL is too complex for what it does. It should suffice to make the beginning of the loop say: rune := mapping(c) if b == nil { if rune == c { continue } b = make([]byte, maxbytes) nbytes = copy(b, s[:i]) } and leave the rest of the code (except the return s check) unchanged. Russ
Sign in to reply to this message.
On 28 March 2011 17:15, Brad Fitzpatrick <bradfitz@golang.org> wrote: > But you call the mapping function more often. i think i call the mapping function exactly the same number of times. (once per character) the main difference is that your change would always do two allocations when the string changes (one to convert from string to []byte, the other to append to that) > Have you benchmarked a wide variety of inputs to see where the trade-off > sweet spot is? I haven't. I was lazy so I made the minimal possible change > that made things better but didn't fundamentally change how it worked. no, i haven't benchmarked it because this change is mainly about allocations, but i left the original loop fairly untouched, just adding an initial loop to find the first changed character (apart from removing the unnecessary maxbytes) > On Mon, Mar 28, 2011 at 3:28 AM, roger peppe <rogpeppe@gmail.com> wrote: >> >> On 28 March 2011 02:58, <r@golang.org> wrote: >> > >> > >> > http://codereview.appspot.com/4306056/diff/4001/src/pkg/strings/strings.go >> > File src/pkg/strings/strings.go (right): >> > >> > >> > http://codereview.appspot.com/4306056/diff/4001/src/pkg/strings/strings.go#ne... >> > src/pkg/strings/strings.go:331: b = []byte(s) >> > if mapping() throws away almost all characters and s is big, this does >> > too much allocation. we at least garbage collect b (return a copy of it) >> > but i'm a little unhappy. i feel there's a refinement to the old method >> > that can avoid this problem >> >> here's an alternative version of the code that never allocates more than >> the original version. >> >> // Map returns a copy of the string s with all its characters modified >> // according to the mapping function. If mapping returns a negative >> value, the character is >> // dropped from the string with no replacement. >> func Map(mapping func(rune int) int, s string) string { >> // Scan the string to find the first character that changes. >> // If no characters change, there is no need to allocate a >> // new string. >> var i, c, rune int >> for i, c = range s { >> if rune = mapping(c); rune != c { >> break >> } >> } >> if rune == c { >> return s >> } >> >> b := make([]byte, len(s)+utf8.UTFMax) >> nbytes := copy(b, s[0:i]) >> if rune >= 0 { >> nbytes += utf8.EncodeRune(b[nbytes:], rune) >> } >> >> for _, c := range s[i+utf8.RuneLen(c):] { >> rune := mapping(c) >> if rune >= 0 { >> wid := 1 >> if rune >= utf8.RuneSelf { >> wid = utf8.RuneLen(rune) >> } >> if nbytes+wid > len(b) { >> // Grow the buffer. >> nb := make([]byte, len(b)*2+utf8.UTFMax) >> copy(nb, b[0:nbytes]) >> b = nb >> } >> nbytes += utf8.EncodeRune(b[nbytes:], rune) >> } >> } >> return string(b[0:nbytes]) >> } > >
Sign in to reply to this message.
Hello golang-dev@googlegroups.com, r, bradfitzwork, rog, eh, rsc1 (cc: golang-dev@googlegroups.com), I'd like you to review this change to https://go.googlecode.com/hg/
Sign in to reply to this message.
LGTM although is it really worth having maxbytes as a separate variable when we could just use len(b) ? On 28 March 2011 17:35, <bradfitz@golang.org> wrote: > Hello golang-dev@googlegroups.com, r, bradfitzwork, rog, eh, rsc1 (cc: > golang-dev@googlegroups.com), > > I'd like you to review this change to > https://go.googlecode.com/hg/ > > > http://codereview.appspot.com/4306056/ >
Sign in to reply to this message.
*** Submitted as http://code.google.com/p/go/source/detail?r=65815920b95a *** strings: Map: avoid allocation when string is unchanged This speeds up strings.ToLower, etc. before/after: strings_test.BenchmarkMapNoChanges 1000000 1013 ns/op strings_test.BenchmarkMapNoChanges 5000000 442 ns/op R=r, rog, eh, rsc CC=golang-dev http://codereview.appspot.com/4306056 Committer: Brad Fitzpatrick <bradfitz@golang.org>
Sign in to reply to this message.
Roger, Feel free to continue to work on it. I just wandered into that code on accident and need to pop back out to other stuff. - Brad On Mon, Mar 28, 2011 at 9:40 AM, roger peppe <rogpeppe@gmail.com> wrote: > LGTM > > although is it really worth having maxbytes as a separate > variable when we could just use len(b) ? > > On 28 March 2011 17:35, <bradfitz@golang.org> wrote: > > Hello golang-dev@googlegroups.com, r, bradfitzwork, rog, eh, rsc1 (cc: > > golang-dev@googlegroups.com), > > > > I'd like you to review this change to > > https://go.googlecode.com/hg/ > > > > > > http://codereview.appspot.com/4306056/ > > >
Sign in to reply to this message.
|