|
|
Created:
12 years, 6 months ago by rick Modified:
12 years, 5 months ago Reviewers:
CC:
remyoudompheng, r, rsc, golang-dev Visibility:
Public. |
Descriptionregexp: add Split() method
As discussed in issue 2672 and on golang-nuts, this CL adds a Split() method
to regexp. It is based on returning the "opposite" of FindAllString() so that
the returned substrings are everything not matched by the expression.
See: https://groups.google.com/forum/?fromgroups=#!topic/golang-nuts/xodBZh9Lh2E
Fixes issue 2762.
Patch Set 1 #Patch Set 2 : diff -r d465db6441bc https://code.google.com/p/go #Patch Set 3 : diff -r d465db6441bc https://code.google.com/p/go #Patch Set 4 : diff -r d465db6441bc https://code.google.com/p/go #
Total comments: 5
Patch Set 5 : diff -r d465db6441bc https://code.google.com/p/go #
Total comments: 5
Patch Set 6 : diff -r d465db6441bc https://code.google.com/p/go #
Total comments: 2
Patch Set 7 : diff -r d465db6441bc https://code.google.com/p/go #
Total comments: 4
Patch Set 8 : diff -r d465db6441bc https://code.google.com/p/go #Patch Set 9 : diff -r d465db6441bc https://code.google.com/p/go #MessagesTotal messages: 23
Hello golang-dev@googlegroups.com, I'd like you to review this change to https://code.google.com/p/go
Sign in to reply to this message.
Hello golang-dev@googlegroups.com (cc: golang-dev@googlegroups.com), Please take another look.
Sign in to reply to this message.
http://codereview.appspot.com/6846048/diff/7001/src/pkg/regexp/regexp.go File src/pkg/regexp/regexp.go (right): http://codereview.appspot.com/6846048/diff/7001/src/pkg/regexp/regexp.go#newc... src/pkg/regexp/regexp.go:1054: // indicates the maximum number of splits to perform when n >= 0. this looks inconsistent with strings.SplitN http://codereview.appspot.com/6846048/diff/7001/src/pkg/regexp/regexp.go#newc... src/pkg/regexp/regexp.go:1059: // The length of the returned slice is equal to the number of matches found + 1. this doesn't look correct, it seems it is rather #matches+1
Sign in to reply to this message.
http://codereview.appspot.com/6846048/diff/7001/src/pkg/regexp/all_test.go File src/pkg/regexp/all_test.go (right): http://codereview.appspot.com/6846048/diff/7001/src/pkg/regexp/all_test.go#ne... src/pkg/regexp/all_test.go:426: {"foo:and:bar", ":", -1, []string{"foo", "and", "bar"}}, always test empty strings! here we need to test empty input with matching and non-matching patterns, empty patterns with empt and non-empty input, etc. http://codereview.appspot.com/6846048/diff/7001/src/pkg/regexp/all_test.go#ne... src/pkg/regexp/all_test.go:444: t.Errorf("Split: test %d: expression doesn't compile: %s", i, test.r) include the error http://codereview.appspot.com/6846048/diff/7001/src/pkg/regexp/all_test.go#ne... src/pkg/regexp/all_test.go:448: split := re.Split(test.s, test.n) reflect.DeepEqual is easier than all this.
Sign in to reply to this message.
Hello golang-dev@googlegroups.com, remyoudompheng@gmail.com, r@golang.org (cc: golang-dev@googlegroups.com), Please take another look.
Sign in to reply to this message.
On 2012/11/13 23:52:36, r wrote: > http://codereview.appspot.com/6846048/diff/7001/src/pkg/regexp/all_test.go > File src/pkg/regexp/all_test.go (right): > > http://codereview.appspot.com/6846048/diff/7001/src/pkg/regexp/all_test.go#ne... > src/pkg/regexp/all_test.go:426: {"foo:and:bar", ":", -1, []string{"foo", "and", > "bar"}}, > always test empty strings! > here we need to test empty input with matching and non-matching patterns, > empty patterns with empt and non-empty input, etc. Done. Added 5 different empty string variations. > http://codereview.appspot.com/6846048/diff/7001/src/pkg/regexp/all_test.go#ne... > src/pkg/regexp/all_test.go:444: t.Errorf("Split: test %d: expression doesn't > compile: %s", i, test.r) > include the error Done. > http://codereview.appspot.com/6846048/diff/7001/src/pkg/regexp/all_test.go#ne... > src/pkg/regexp/all_test.go:448: split := re.Split(test.s, test.n) > reflect.DeepEqual is easier than all this. Done.
Sign in to reply to this message.
On 2012/11/13 23:47:08, remyoudompheng wrote: > http://codereview.appspot.com/6846048/diff/7001/src/pkg/regexp/regexp.go > File src/pkg/regexp/regexp.go (right): > > http://codereview.appspot.com/6846048/diff/7001/src/pkg/regexp/regexp.go#newc... > src/pkg/regexp/regexp.go:1054: // indicates the maximum number of splits to > perform when n >= 0. > this looks inconsistent with strings.SplitN Changed the comment to indicate it is consistent with regexp 'All' methods instead of strings.SplitN(). > http://codereview.appspot.com/6846048/diff/7001/src/pkg/regexp/regexp.go#newc... > src/pkg/regexp/regexp.go:1059: // The length of the returned slice is equal to > the number of matches found + 1. > this doesn't look correct, it seems it is rather #matches+1 Updated the comment to indicate it is the number of matches actually applied + 1. Does that look better?
Sign in to reply to this message.
PTAL
Sign in to reply to this message.
https://codereview.appspot.com/6846048/diff/8003/src/pkg/regexp/all_test.go File src/pkg/regexp/all_test.go (right): https://codereview.appspot.com/6846048/diff/8003/src/pkg/regexp/all_test.go#n... src/pkg/regexp/all_test.go:450: t.Errorf("Split: test %d: expression doesn't compile: %s; error: %s", i, test.r, err.Error()) needlessly verbose. #%d: %q: compile error: %s https://codereview.appspot.com/6846048/diff/8003/src/pkg/regexp/all_test.go#n... src/pkg/regexp/all_test.go:456: t.Errorf("Split: test %d: split = %#v; want = %#v", i, split, test.out) here and above s/Split: test %d/#%d: %q:/ the diagnostics from the testing package include the name of the test. https://codereview.appspot.com/6846048/diff/8003/src/pkg/regexp/regexp.go File src/pkg/regexp/regexp.go (right): https://codereview.appspot.com/6846048/diff/8003/src/pkg/regexp/regexp.go#new... src/pkg/regexp/regexp.go:1054: // the integer argument n indicates the maximum number of matches to split against "against"? does this mean n refers to the matches not the return value? it should be the return value, i believe. https://codereview.appspot.com/6846048/diff/8003/src/pkg/regexp/regexp.go#new... src/pkg/regexp/regexp.go:1061: // split against + 1. aha. no, this feels wrong to me. maybe others disagree. the return value should be a local decision, not one affected by the signature of a separate function.
Sign in to reply to this message.
https://codereview.appspot.com/6846048/diff/8003/src/pkg/regexp/regexp.go File src/pkg/regexp/regexp.go (right): https://codereview.appspot.com/6846048/diff/8003/src/pkg/regexp/regexp.go#new... src/pkg/regexp/regexp.go:1061: // split against + 1. On 2012/11/19 20:39:47, r wrote: > aha. no, this feels wrong to me. maybe others disagree. the return value should > be a local decision, not one affected by the signature of a separate function. I agree. The N here should be the same interpretation as in strings.SplitN.
Sign in to reply to this message.
Hello remyoudompheng@gmail.com, r@golang.org, rsc@golang.org (cc: golang-dev@googlegroups.com), Please take another look.
Sign in to reply to this message.
PTAL Thanks for being patient with me on this CL. I am new to this and learning a lot about submitting a good patch.
Sign in to reply to this message.
it's nearly there https://codereview.appspot.com/6846048/diff/2002/src/pkg/regexp/all_test.go File src/pkg/regexp/all_test.go (right): https://codereview.appspot.com/6846048/diff/2002/src/pkg/regexp/all_test.go#n... src/pkg/regexp/all_test.go:459: t.Errorf("#%d: %q: split = %#v; want = %#v", i, test.r, split, test.out) convention: s/split =/got/ s/want =/want/ https://codereview.appspot.com/6846048/diff/2002/src/pkg/regexp/regexp.go File src/pkg/regexp/regexp.go (right): https://codereview.appspot.com/6846048/diff/2002/src/pkg/regexp/regexp.go#new... src/pkg/regexp/regexp.go:1056: // not contained in the slice returned by FindAllString(). this is confusing enough i think a simple one-line example here would help.
Sign in to reply to this message.
Hello remyoudompheng@gmail.com, r@golang.org, rsc@golang.org (cc: golang-dev@googlegroups.com), Please take another look.
Sign in to reply to this message.
PTAL
Sign in to reply to this message.
I think if you make these test changes you will find that the matching loop needs a little bit more work. I would suggest looking at the implementation of strings.SplitN for inspiration. Also, if you pass n-1 to FindAllIndex you should be able to simplify the loop a little, because you won't have to worry about using too many matches. https://codereview.appspot.com/6846048/diff/13001/src/pkg/regexp/all_test.go File src/pkg/regexp/all_test.go (right): https://codereview.appspot.com/6846048/diff/13001/src/pkg/regexp/all_test.go#... src/pkg/regexp/all_test.go:421: var tests = []struct { Please move this up out of the function. var splitTests = []... func TestSplit(t *testing.T) { ... https://codereview.appspot.com/6846048/diff/13001/src/pkg/regexp/all_test.go#... src/pkg/regexp/all_test.go:448: {"abaabaccadaaae", "a*", 5, []string{"b", "b", "c", "c", "daaae"}}, Please add {":x:y:z:", ":", -1, []string{"", "x", "y", "z", ""}} https://codereview.appspot.com/6846048/diff/13001/src/pkg/regexp/all_test.go#... src/pkg/regexp/all_test.go:460: t.Errorf("#%d: %q: got %#v; want %#v", i, test.r, split, test.out) %#v is fine here but you might find %q a little more readable https://codereview.appspot.com/6846048/diff/13001/src/pkg/regexp/all_test.go#... src/pkg/regexp/all_test.go:461: } add if QuoteMeta(test.r) == test.r { strsplit := strings.SplitN(test.s, test.r, test.n) if !reflect.DeepEqual(split, strsplit) { t.Errorf("#%d: Split(%q, %q, %d): regexp vs strings mismatch\nregexp=%q\nstrings=%q", i, test.s, test.r, test.n, split, strsplit) } }
Sign in to reply to this message.
On Monday, November 26, 2012 10:16:53 AM UTC-5, rsc wrote: > > I think if you make these test changes you will find that the matching > loop needs a little bit more work. I would suggest looking at the > implementation of strings.SplitN for inspiration. > Thanks, I will check these out. So you think the results should be the same as strings.SplitN where possible? What we discussed on golang-nuts (and the function documentation) says it will return the opposite of FindAllIndex(), which doesn't necessarily match unless I'm understanding it differently. For instance: strings.SplitN("foo:and:bar:foo", "foo", -1) --> ["", ":and:bar:", ""] regexp.MustCompile("foo").FindAllIndex([]byte("foo:and:bar:foo"), -1) --> [0, 3], [12, 15] (so the result string would be everything not contained in [0, 3] and [12, 15] or ":and:bar:") If so, do you have a better way of describing how the function works for the documentation? It sounds like you mostly want the behavior of patch set 5? Thanks, Rick
Sign in to reply to this message.
I think if you have a literal string s and you use strings.SplitN, and then you convert to use an equivalent regexp with Split, the behavior needs to not change at all, not even in corner cases. Russ
Sign in to reply to this message.
Hello remyoudompheng@gmail.com, r@golang.org, rsc@golang.org (cc: golang-dev@googlegroups.com), Please take another look.
Sign in to reply to this message.
Hello remyoudompheng@gmail.com, r@golang.org, rsc@golang.org (cc: golang-dev@googlegroups.com), Please take another look.
Sign in to reply to this message.
PTAL. I used FindAllStringIndex(s, n) instead of FindAllStringIndex(s, n-1) since sometimes a regular expression will match before the start of the string and I need to skip that match. For example: `a*`.Split("baabaab", 3) would result in: ["", "b", "baab"] instead of ["b", "b", "b"] without the extra match logic.
Sign in to reply to this message.
*** Submitted as http://code.google.com/p/go/source/detail?r=efc46b2230aa *** regexp: add Split As discussed in issue 2672 and on golang-nuts, this CL adds a Split() method to regexp. It is based on returning the "opposite" of FindAllString() so that the returned substrings are everything not matched by the expression. See: https://groups.google.com/forum/?fromgroups=#!topic/golang-nuts/xodBZh9Lh2E Fixes issue 2762. R=remyoudompheng, r, rsc CC=golang-dev http://codereview.appspot.com/6846048 Committer: Russ Cox <rsc@golang.org>
Sign in to reply to this message.
|