|
|
Created:
14 years, 2 months ago by rsc Modified:
14 years, 2 months ago Reviewers:
CC:
eridius, r, rog, golang-dev Visibility:
Public. |
Descriptionpath: add Match
Patch Set 1 #Patch Set 2 : code review 217088: path: add Match #
Total comments: 15
Patch Set 3 : code review 217088: path: add Match #
Total comments: 11
Patch Set 4 : code review 217088: path: add Match #Patch Set 5 : code review 217088: path: add Match #
MessagesTotal messages: 31
Hello eridius, r (cc: golang-dev@googlegroups.com), I'd like you to review this change.
Sign in to reply to this message.
http://codereview.appspot.com/217088/diff/1001/1003 File src/pkg/path/match.go (right): http://codereview.appspot.com/217088/diff/1001/1003#newcode17 src/pkg/path/match.go:17: // '[' [ '^' ] character-ranges ']' character class {,} too please for character class, ] can be first and means literal ] - can be first or last and means literal - http://codereview.appspot.com/217088/diff/1001/1003#newcode19 src/pkg/path/match.go:19: // Match requires pattern to match all of name, not just a substring. usually we call this an anchored search, but maybe jargon isn't helpful here. http://codereview.appspot.com/217088/diff/1001/1003#newcode118 src/pkg/path/match.go:118: if chunk[0] == ']' { doesn't get the corner cases right with first char ] or - or last char - http://codereview.appspot.com/217088/diff/1001/1003#newcode127 src/pkg/path/match.go:127: return // syntax error not an error if chunk[1]==']' http://codereview.appspot.com/217088/diff/1001/1004 File src/pkg/path/match_test.go (right): http://codereview.appspot.com/217088/diff/1001/1004#newcode15 src/pkg/path/match_test.go:15: MatchTest{"abc", "abc", true}, annoying but you should implement {} also write some non-ascii tests.
Sign in to reply to this message.
> {,} too please even though it looks like one, {} is not a pattern. it gets expanded before glob search. compare echo xxx[0123] echo xxx{0,1,2,3} my algorithm also only works because there is no {,}. russ
Sign in to reply to this message.
will address the rest of the comments later. i admit to dodging the corner cases in character classes. what's the right specification? russ
Sign in to reply to this message.
On Feb 23, 2010, at 11:19 AM, Russ Cox wrote: >> {,} too please > > even though it looks like one, {} is not a pattern. > it gets expanded before glob search. ah yes. > compare > > echo xxx[0123] > echo xxx{0,1,2,3} > > my algorithm also only works because there is no {,}. yes, i thought of that as i was typing. > > russ
Sign in to reply to this message.
On Feb 23, 2010, at 11:20 AM, Russ Cox wrote: > will address the rest of the comments later. > i admit to dodging the corner cases in character > classes. what's the right specification? i think i wrote them in the comments. ] must be first - must be first or last
Sign in to reply to this message.
http://codereview.appspot.com/217088/diff/1001/1004 File src/pkg/path/match_test.go (right): http://codereview.appspot.com/217088/diff/1001/1004#newcode15 src/pkg/path/match_test.go:15: MatchTest{"abc", "abc", true}, On 2010/02/23 00:14:03, r wrote: > annoying but you should implement {} > also write some non-ascii tests. I have a full suite of tests in my port of fnmatch (186111). That could probably be used here.
Sign in to reply to this message.
http://codereview.appspot.com/217088/diff/1001/1003 File src/pkg/path/match.go (right): http://codereview.appspot.com/217088/diff/1001/1003#newcode15 src/pkg/path/match.go:15: // '?' matches any single character This should also only match non-/ characters. http://codereview.appspot.com/217088/diff/1001/1003#newcode17 src/pkg/path/match.go:17: // '[' [ '^' ] character-ranges ']' character class How is - defined if it appears in something like [a-b-c]? The first a-b is clearly a range, but after that? AFAIK it's undefined by the shell globbing syntax. http://codereview.appspot.com/217088/diff/1001/1003#newcode17 src/pkg/path/match.go:17: // '[' [ '^' ] character-ranges ']' character class {,} isn't part of shell pattern matching. It's part of brace expansion. Also, while fnmatch() doesn't support it, shell pattern matching is supposed to include character classes, like [[:alpha:]] http://codereview.appspot.com/217088/diff/1001/1003#newcode139 src/pkg/path/match.go:139: case '?': This should not match / http://codereview.appspot.com/217088/diff/1001/1003#newcode146 src/pkg/path/match.go:146: chunk = chunk[1:] If this is the last character in the pattern, how do we want to behave?
Sign in to reply to this message.
> http://codereview.appspot.com/217088/diff/1001/1003#newcode15 > src/pkg/path/match.go:15: // '?' matches any single > character > This should also only match non-/ characters. nice catch > > http://codereview.appspot.com/217088/diff/1001/1003#newcode17 > src/pkg/path/match.go:17: // '[' [ '^' ] character-ranges ']' > character > class > How is - defined if it appears in something like [a-b-c]? The first a-b > is clearly a range, but after that? AFAIK it's undefined by the shell > globbing syntax. there are many definitions. we just have to say what ours is. > http://codereview.appspot.com/217088/diff/1001/1003#newcode17 > src/pkg/path/match.go:17: // '[' [ '^' ] character-ranges ']' > character > class > {,} isn't part of shell pattern matching. It's part of brace expansion. > > Also, while fnmatch() doesn't support it, shell pattern matching is > supposed to include character classes, like [[:alpha:]] i'd rather avoid posix pollution such as this. > http://codereview.appspot.com/217088/diff/1001/1003#newcode146 > src/pkg/path/match.go:146: chunk = chunk[1:] > If this is the last character in the pattern, how do we want to behave? it will return false, which is fine. russ
Sign in to reply to this message.
On Feb 22, 2010, at 5:09 PM, Russ Cox wrote: >> http://codereview.appspot.com/217088/diff/1001/1003#newcode17 >> src/pkg/path/match.go:17: // '[' [ '^' ] character-ranges ']' >> character >> class >> {,} isn't part of shell pattern matching. It's part of brace expansion. >> >> Also, while fnmatch() doesn't support it, shell pattern matching is >> supposed to include character classes, like [[:alpha:]] > > i'd rather avoid posix pollution such as this. There are named unicode character ranges, we could support those. But then again, that's fairly nonstandard, so we should probably just ignore it like fnmatch() does. -Kevin Ballard -- Kevin Ballard http://kevin.sb.org kevin@sb.org http://www.tildesoft.com
Sign in to reply to this message.
On 23 February 2010 01:09, Russ Cox <rsc@golang.org> wrote: >> http://codereview.appspot.com/217088/diff/1001/1003#newcode15 >> src/pkg/path/match.go:15: // '?' matches any single >> character >> This should also only match non-/ characters. > > nice catch it's probably too late, but can i put in a bid to have Match not treat / characters as anything special? to do filename globbing properly, the pattern must be split on / anyway, and the text (a filename) is guaranteed not to contain a /. so if match is used to implement globbing, this restriction won't make any difference. and Match is potentially useful for things other than file names too.
Sign in to reply to this message.
http://codereview.appspot.com/217088/diff/1001/1003 File src/pkg/path/match.go (right): http://codereview.appspot.com/217088/diff/1001/1003#newcode64 src/pkg/path/match.go:64: i++ this will cause a crash if the \ is the last character in the string.
Sign in to reply to this message.
On Feb 23, 2010, at 3:13 AM, roger peppe wrote: > it's probably too late, but can i put in a bid to have Match > not treat / characters as anything special? > > to do filename globbing properly, the pattern must be > split on / anyway, and the text (a filename) is guaranteed not to contain a /. > so if match is used to implement globbing, this restriction won't > make any difference. > > and Match is potentially useful for things other than file names too. Sure, if you use this to implement glob, then the / restriction doesn't do anything, but this is primarily meant to match full paths, not match single filenames. Think fnmatch() with the FNM_PATHNAME flag. More specifically, this is an implementation of the /bin/sh Pattern Matching syntax, and in that context, / is a special character that must be matched exactly. I am fully in favor of a separate function that does the same algorithm, but without treating / specially. That is why my original proposal was a port of fnmatch(), preserving the flags. However, that was (rightly) deemed as not appropriate for Go. The only real alternative that made sense was a suggestion by rsc for a glob package that had MatchPath() and MatchString() as functions, and it's not too late to do that, but nobody really seemed to think that was the right way to go. My feelings on a potential glob package are that if we want to have glob.MatchPath() and glob.MatchString() somebody would have to implement glob.Glob() first in order to justify the package's name, and I don't have the time to do that. I suppose another alternative would be to have path.MatchString(), but that just feels like it's in the wrong place, and the only justification for a MatchString() in the path package is because it's using the shell Pattern Matching syntax. -Kevin Ballard -- Kevin Ballard http://kevin.sb.org kevin@sb.org http://www.tildesoft.com
Sign in to reply to this message.
http://codereview.appspot.com/217088/diff/1001/1003 File src/pkg/path/match.go (right): http://codereview.appspot.com/217088/diff/1001/1003#newcode64 src/pkg/path/match.go:64: i++ On 2010/02/23 11:45:30, rog wrote: > this will cause a crash if the \ is the last character in the string. No it won't. It'll start the loop again, determine that the `i < len(pattern)` condition no longer holds, and exit the loop. At this point i will be equal to len(pattern), so the return values will be (star, pattern, "").
Sign in to reply to this message.
http://codereview.appspot.com/217088/diff/1001/1003 File src/pkg/path/match.go (right): http://codereview.appspot.com/217088/diff/1001/1003#newcode64 src/pkg/path/match.go:64: i++ On 2010/02/23 20:56:34, eridius wrote: > On 2010/02/23 11:45:30, rog wrote: > > this will cause a crash if the \ is the last character in the string. > > No it won't. It'll start the loop again, determine that the `i < len(pattern)` > condition no longer holds, and exit the loop. At this point i will be equal to > len(pattern), so the return values will be (star, pattern, ""). erm, i still think it will crash. i will be incremented twice, once in the '\\' case, and once by the loop continuation. upon which i is len(pattern)+1 and the pattern[i:] slice will fail.
Sign in to reply to this message.
On 24 February 2010 11:04, <rogpeppe@gmail.com> wrote: > erm, i still think it will crash. i will be incremented twice, once in > the '\\' case, and once by the loop continuation. upon which i is > len(pattern)+1 and the pattern[i:] slice will fail. confirmed by experiment. BTW, what *should* be the results of Match("a\\", "a\\") and Match("a\\", "a") ? i'd say false in both cases, but opinions might differ one other thing: i think i'd prefer the arguments to Match to be the other way round, read as "match path against pattern", similar to the functions in the strings package (e.g. LastIndex(s, sep), where sep is a search pattern) i'm aware that regexp.MatchString doesn't work that way, but i think the difference is that the regexp package suggests that the regexp is the subject, whereas with strings and path, the pattern is the object.
Sign in to reply to this message.
http://codereview.appspot.com/217088/diff/1001/1003 File src/pkg/path/match.go (right): http://codereview.appspot.com/217088/diff/1001/1003#newcode64 src/pkg/path/match.go:64: i++ On 2010/02/24 11:04:12, rog wrote: > On 2010/02/23 20:56:34, eridius wrote: > > On 2010/02/23 11:45:30, rog wrote: > > > this will cause a crash if the \ is the last character in the string. > > > > No it won't. It'll start the loop again, determine that the `i < len(pattern)` > > condition no longer holds, and exit the loop. At this point i will be equal to > > len(pattern), so the return values will be (star, pattern, ""). > > erm, i still think it will crash. i will be incremented twice, once in the '\\' > case, and once by the loop continuation. upon which i is len(pattern)+1 and the > pattern[i:] slice will fail. Ah yes, you are right. I clearly need to drink more coffee.
Sign in to reply to this message.
Please take another look. I have addressed all comments, expanded the test suite, and documented the character class syntax more precisely. @rog: I agree that I'd rather have the arguments in the opposite order, but regexp.Match has set a precedent, and I think it would be even more confusing path.Match didn't follow it. I added an err os.Error to the return, so now they match exactly. @r: I started to implement the special cases for - and ] but backed off. They're so subtle and tricky: can we break with the past, clean this up, and require them to be escaped? That's what I've done, and documented, for now. Russ
Sign in to reply to this message.
http://codereview.appspot.com/217088/diff/4001/3003 File src/pkg/path/match.go (right): http://codereview.appspot.com/217088/diff/4001/3003#newcode26 src/pkg/path/match.go:26: // '\\' c matches character c i think this makes sense and i broke with tradition in going this way in my own regexp implementations. it's probably ok at least for now. we break compatibility in other ways too - e.g. regexps. in other words, ok but not great. http://codereview.appspot.com/217088/diff/4001/3003#newcode83 src/pkg/path/match.go:83: if i+1 < len(pattern) { error case handled in matchChunk: bad pattern http://codereview.appspot.com/217088/diff/4001/3003#newcode104 src/pkg/path/match.go:104: esc := false esc is the name of an ascii character, which makes this confusing. call it backslash. however, it might be simpler just to do the \ processing in case '\\. http://codereview.appspot.com/217088/diff/4001/3003#newcode125 src/pkg/path/match.go:125: want := true want is another weak name. matchPositive? it's a double negative but notNegated works well. best might be negated and flip the sense. http://codereview.appspot.com/217088/diff/4001/3003#newcode141 src/pkg/path/match.go:141: } lo, chunk, err := ... if err != nil {} (shorter, no need for 'var')
Sign in to reply to this message.
http://codereview.appspot.com/217088/diff/4001/3003 File src/pkg/path/match.go (right): http://codereview.appspot.com/217088/diff/4001/3003#newcode26 src/pkg/path/match.go:26: // '\\' c matches character c On 2010/02/24 23:49:54, r wrote: > i think this makes sense and i broke with tradition in going this way in my own > regexp implementations. it's probably ok at least for now. we break > compatibility in other ways too - e.g. regexps. > > in other words, ok but not great. fwiw ; ls \foo ; ls [\\]* \foo ; ls [\]* ls: cannot access []*: No such file or directory ; http://codereview.appspot.com/217088/diff/4001/3003#newcode83 src/pkg/path/match.go:83: if i+1 < len(pattern) { On 2010/02/24 23:49:54, r wrote: > error case handled in matchChunk: bad pattern added this as a comment. (wasn't sure what you meant.) the goal of this test is just not to crash. matchChunk does the real error checking. http://codereview.appspot.com/217088/diff/4001/3003#newcode104 src/pkg/path/match.go:104: esc := false esc is gone. fallthrough works nicely. http://codereview.appspot.com/217088/diff/4001/3003#newcode125 src/pkg/path/match.go:125: want := true On 2010/02/24 23:49:54, r wrote: > want is another weak name. matchPositive? it's a double negative but > notNegated works well. best might be negated and flip the sense. Done. http://codereview.appspot.com/217088/diff/4001/3003#newcode141 src/pkg/path/match.go:141: } On 2010/02/24 23:49:54, r wrote: > lo, chunk, err := ... > if err != nil {} > (shorter, no need for 'var') there's no := here to avoid declaring an inner copy of chunk.
Sign in to reply to this message.
Hello eridius, r, rog (cc: golang-dev@googlegroups.com), Please take another look.
Sign in to reply to this message.
LGTM http://codereview.appspot.com/217088/diff/4001/3003 File src/pkg/path/match.go (right): http://codereview.appspot.com/217088/diff/4001/3003#newcode83 src/pkg/path/match.go:83: if i+1 < len(pattern) { thanks, a comment was what i was asking for
Sign in to reply to this message.
*** Submitted as http://code.google.com/p/go/source/detail?r=a91149da5703 *** path: add Match R=eridius, r, rog CC=golang-dev http://codereview.appspot.com/217088
Sign in to reply to this message.
Oh crud, I was still reviewing the previous patch. Is it worth commenting on the latest patchset now, or is that "done"? -Kevin Ballard On Feb 24, 2010, at 4:11 PM, rsc@golang.org wrote: > *** Submitted as > http://code.google.com/p/go/source/detail?r=a91149da5703 *** > > path: add Match > > R=eridius, r, rog > CC=golang-dev > http://codereview.appspot.com/217088 > > > http://codereview.appspot.com/217088/show > -- Kevin Ballard http://kevin.sb.org kevin@sb.org http://www.tildesoft.com
Sign in to reply to this message.
On Feb 25, 2010, at 11:14 AM, Kevin Ballard wrote: > Oh crud, I was still reviewing the previous patch. Is it worth > commenting on the latest patchset now, or is that "done"? rsc jumped the gun. if you have comments, send them in and they can be addressed in another round. -rob
Sign in to reply to this message.
Ok, I'll go move my draft comments to patchset 5, figure out which ones are still applicable, then send them in. -Kevin On Feb 24, 2010, at 4:15 PM, Rob 'Commander' Pike wrote: > > On Feb 25, 2010, at 11:14 AM, Kevin Ballard wrote: > >> Oh crud, I was still reviewing the previous patch. Is it worth commenting on the latest patchset now, or is that "done"? > > rsc jumped the gun. if you have comments, send them in and they can be addressed in another round. > > -rob > -- Kevin Ballard http://kevin.sb.org kevin@sb.org http://www.tildesoft.com
Sign in to reply to this message.
On Feb 25, 2010, at 11:18 AM, Kevin Ballard wrote: > Ok, I'll go move my draft comments to patchset 5, figure out which > ones are still applicable, then send them in. you shouldn't need to move them. send them based on the previous patch -rob
Sign in to reply to this message.
Some of them are no longer applicable, and I was putting some actual code in the comments (in an attempt to address the edge cases of character ranges) and I want to make sure the code is appropriate for the latest patchset. Besides, I wasn't done writing comments, so it's no problem to move them to the right patchset and continue. -Kevin On Feb 24, 2010, at 4:19 PM, Rob 'Commander' Pike wrote: > > On Feb 25, 2010, at 11:18 AM, Kevin Ballard wrote: > >> Ok, I'll go move my draft comments to patchset 5, figure out which ones are still applicable, then send them in. > > you shouldn't need to move them. send them based on the previous patch > > -rob > -- Kevin Ballard http://kevin.sb.org kevin@sb.org http://www.tildesoft.com
Sign in to reply to this message.
Acutally, since all of my comments that aren't about the - and ] edge cases are already obsolete, I'll just submit a new patch to handle those edge cases rather than publishing comments. -Kevin Ballard On Feb 24, 2010, at 4:19 PM, Rob 'Commander' Pike wrote: > > On Feb 25, 2010, at 11:18 AM, Kevin Ballard wrote: > >> Ok, I'll go move my draft comments to patchset 5, figure out which ones are still applicable, then send them in. > > you shouldn't need to move them. send them based on the previous patch > > -rob > -- Kevin Ballard http://kevin.sb.org kevin@sb.org http://www.tildesoft.com
Sign in to reply to this message.
Uh oh, turns out there's a fairly significant bug, illustrated by the following case path.Match("*x", "xxx") This returns false. The problem is * matches non-greedily, but if the name isn't exhausted, the last segment is not re-tried at the next position. I'll look into fixing it, but I thought I should make you aware of the issue first. -Kevin Ballard On Feb 24, 2010, at 4:11 PM, rsc@golang.org wrote: > *** Submitted as > http://code.google.com/p/go/source/detail?r=a91149da5703 *** > > path: add Match > > R=eridius, r, rog > CC=golang-dev > http://codereview.appspot.com/217088 > > > http://codereview.appspot.com/217088/show > -- Kevin Ballard http://kevin.sb.org kevin@sb.org http://www.tildesoft.com
Sign in to reply to this message.
Ok, I've submitted a new change 223050 that fixes this issue. -Kevin Ballard On Feb 25, 2010, at 1:51 AM, Kevin Ballard wrote: > Uh oh, turns out there's a fairly significant bug, illustrated by the following case > > path.Match("*x", "xxx") > > This returns false. The problem is * matches non-greedily, but if the name isn't exhausted, the last segment is not re-tried at the next position. I'll look into fixing it, but I thought I should make you aware of the issue first. > > -Kevin Ballard > > On Feb 24, 2010, at 4:11 PM, rsc@golang.org wrote: > >> *** Submitted as >> http://code.google.com/p/go/source/detail?r=a91149da5703 *** >> >> path: add Match >> >> R=eridius, r, rog >> CC=golang-dev >> http://codereview.appspot.com/217088 >> >> >> http://codereview.appspot.com/217088/show >> > > -- > Kevin Ballard > http://kevin.sb.org > kevin@sb.org > http://www.tildesoft.com > > > -- Kevin Ballard http://kevin.sb.org kevin@sb.org http://www.tildesoft.com
Sign in to reply to this message.
|