|
|
Created:
13 years, 3 months ago by rsc Modified:
13 years, 3 months ago Reviewers:
CC:
r, bradfitz, r2, rog, niemeyer, golang-dev Visibility:
Public. |
Descriptionregexp: allow substitutions in Replace, ReplaceString
Add Expand, ExpandString for access to the substitution functionality.
Fixes issue 2736.
Patch Set 1 #Patch Set 2 : diff -r 82bac8cdab6b https://go.googlecode.com/hg/ #Patch Set 3 : diff -r 82bac8cdab6b https://go.googlecode.com/hg/ #Patch Set 4 : diff -r 82bac8cdab6b https://go.googlecode.com/hg/ #Patch Set 5 : diff -r 82bac8cdab6b https://go.googlecode.com/hg/ #
Total comments: 16
Patch Set 6 : diff -r 77df677f85a4 https://go.googlecode.com/hg/ #Patch Set 7 : diff -r 77df677f85a4 https://go.googlecode.com/hg/ #
Total comments: 1
Patch Set 8 : diff -r a4dc8322859a https://code.google.com/p/go/ #MessagesTotal messages: 20
Hello golang-dev@googlegroups.com, I'd like you to review this change to https://go.googlecode.com/hg/
Sign in to reply to this message.
I'm not sure about this. First, I can't protect a $, which means that it's hard to get $1 into the output as a literal and that the Replace functions are subject to failure if the replacement text is programmatic rather than literal. Second, this is an API change. It may be one people expect, but it's a change nonetheless. How can I get the old behavior? In short, intricate as this interface is, it might still be too simple.
Sign in to reply to this message.
I was also just wondering about a literal $. What do other languages do about that? I also think that this CL could use some Examples. But I think our docs as a whole could use a ton, so not needed in this one necessarily. On Mon, Feb 6, 2012 at 12:53 PM, <r@golang.org> wrote: > I'm not sure about this. > > First, I can't protect a $, which means that it's hard to get $1 into > the output as a literal and that the Replace functions are subject to > failure if the replacement text is programmatic rather than literal. > > Second, this is an API change. It may be one people expect, but it's a > change nonetheless. How can I get the old behavior? > > In short, intricate as this interface is, it might still be too simple. > > > http://codereview.appspot.com/**5638046/<http://codereview.appspot.com/5638046/> >
Sign in to reply to this message.
On Mon, Feb 6, 2012 at 15:53, <r@golang.org> wrote: > First, I can't protect a $, which means that it's hard to get $1 into > the output as a literal and that the Replace functions are subject to > failure if the replacement text is programmatic rather than literal. Fair enough. I added $$ meaning $. > Second, this is an API change. It may be one people expect, but it's a > change nonetheless. How can I get the old behavior? Yes; it was on the Go 1 list so that we can make the change without breaking programs later. If you want a literal string, you can either s/$/$$/ beforehand or you can use ReplaceAllFunc(src, func([]byte)[]byte{ return repl }). Russ
Sign in to reply to this message.
Hello golang-dev@googlegroups.com, r@golang.org, bradfitz@golang.org (cc: golang-dev@googlegroups.com), Please take another look.
Sign in to reply to this message.
Can I have that one-liner as a predefined helper please? I'll use a security argument: It needs to be easy to be safe. Also maybe a quoter to turn $ into $$, but then I ask, why does quoting replacement text have a different convention than quoting pattern text? -rob
Sign in to reply to this message.
On Mon, Feb 6, 2012 at 16:14, Rob 'Commander' Pike <r@google.com> wrote: > Can I have that one-liner as a predefined helper please? I'll use a security argument: It needs to be easy to be safe. Sure. What API did you have in mind? Russ
Sign in to reply to this message.
On 07/02/2012, at 8:51 AM, Russ Cox wrote: > On Mon, Feb 6, 2012 at 16:14, Rob 'Commander' Pike <r@google.com> wrote: >> Can I have that one-liner as a predefined helper please? I'll use a security argument: It needs to be easy to be safe. > > Sure. What API did you have in mind? It doesn't need to be elegant, just simple. I was thinking of something prolix like: func (re *Regexp) ReplaceAllLiteral(src, repl []byte) []byte -rob
Sign in to reply to this message.
http://codereview.appspot.com/5638046/diff/4004/src/pkg/regexp/all_test.go File src/pkg/regexp/all_test.go (right): http://codereview.appspot.com/5638046/diff/4004/src/pkg/regexp/all_test.go#ne... src/pkg/regexp/all_test.go:192: {"(?P<x>hi)|(?P<x>bye)", "hello $$x", "hi", "hello $x"}, what happens with replacement text "$"? "$x$"? what if i say (?P<$>hi), stupid though that is? what about $3 where there is only 1 submatch? is $1x $1 followed by x or ${1x}? http://codereview.appspot.com/5638046/diff/4004/src/pkg/regexp/regexp.go File src/pkg/regexp/regexp.go (right): http://codereview.appspot.com/5638046/diff/4004/src/pkg/regexp/regexp.go#newc... src/pkg/regexp/regexp.go:434: // Regexp have been replaced by the return value of of function repl (whose s/of of/of/ but it's a complex construction. rewrite it to say what the function does instead of how it does it. ditto for the previous function. how about this approach: ReplaceAllString returns a copy of src, replacing matches of the Regexp with the replacement string repl. Inside repl, $ signs are interpreted as in Expand, so for instance $1 represents the text of the first submatch. http://codereview.appspot.com/5638046/diff/4004/src/pkg/regexp/regexp.go#newc... src/pkg/regexp/regexp.go:524: // is not expanded further (such as by using Expand). rewrite for simplicity. http://codereview.appspot.com/5638046/diff/4004/src/pkg/regexp/regexp.go#newc... src/pkg/regexp/regexp.go:537: // QuoteMeta returns a string that quotes all regular expression metacharacters this function and its support elements are out of place in the file now. http://codereview.appspot.com/5638046/diff/4004/src/pkg/regexp/regexp.go#newc... src/pkg/regexp/regexp.go:702: // returned by FindSubmatchIndex. In the template, $$ is replaced with a single $. what if it's not? http://codereview.appspot.com/5638046/diff/4004/src/pkg/regexp/regexp.go#newc... src/pkg/regexp/regexp.go:703: // A purely numeric name, like $1, refers to the submatch with s/,// http://codereview.appspot.com/5638046/diff/4004/src/pkg/regexp/regexp.go#newc... src/pkg/regexp/regexp.go:710: // ExpandString is like Expand but takes a string template and string matches. hard to read ExpandString is like Expand but the template and source are strings. http://codereview.appspot.com/5638046/diff/4004/src/pkg/regexp/regexp.go#newc... src/pkg/regexp/regexp.go:780: if !unicode.IsLetter(rune) && !unicode.IsDigit(rune) && rune != '_' { see tests. what is the right definition for a name? should 1x be a valid name?
Sign in to reply to this message.
i vote for the use of \ as a quoting character for Expand. then QuoteMeta could be used to quote both patterns and replacement text. i've no idea what the precedents are though. On 7 February 2012 00:37, <r@golang.org> wrote: > > http://codereview.appspot.com/5638046/diff/4004/src/pkg/regexp/all_test.go > File src/pkg/regexp/all_test.go (right): > > http://codereview.appspot.com/5638046/diff/4004/src/pkg/regexp/all_test.go#ne... > src/pkg/regexp/all_test.go:192: {"(?P<x>hi)|(?P<x>bye)", "hello $$x", > "hi", "hello $x"}, > what happens with replacement text "$"? "$x$"? what if i say (?P<$>hi), > stupid though that is? > > what about $3 where there is only 1 submatch? > is $1x $1 followed by x or ${1x}? > > http://codereview.appspot.com/5638046/diff/4004/src/pkg/regexp/regexp.go > File src/pkg/regexp/regexp.go (right): > > http://codereview.appspot.com/5638046/diff/4004/src/pkg/regexp/regexp.go#newc... > src/pkg/regexp/regexp.go:434: // Regexp have been replaced by the return > value of of function repl (whose > s/of of/of/ > but it's a complex construction. rewrite it to say what the function > does instead of how it does it. ditto for the previous function. how > about this approach: > > ReplaceAllString returns a copy of src, replacing matches of the Regexp > with the replacement string repl. Inside repl, $ signs are interpreted > as in Expand, so for instance $1 represents the text of the first > submatch. > > http://codereview.appspot.com/5638046/diff/4004/src/pkg/regexp/regexp.go#newc... > src/pkg/regexp/regexp.go:524: // is not expanded further (such as by > using Expand). > rewrite for simplicity. > > http://codereview.appspot.com/5638046/diff/4004/src/pkg/regexp/regexp.go#newc... > src/pkg/regexp/regexp.go:537: // QuoteMeta returns a string that quotes > all regular expression metacharacters > this function and its support elements are out of place in the file now. > > http://codereview.appspot.com/5638046/diff/4004/src/pkg/regexp/regexp.go#newc... > src/pkg/regexp/regexp.go:702: // returned by FindSubmatchIndex. In the > template, $$ is replaced with a single $. > what if it's not? > > http://codereview.appspot.com/5638046/diff/4004/src/pkg/regexp/regexp.go#newc... > src/pkg/regexp/regexp.go:703: // A purely numeric name, like $1, refers > to the submatch with > s/,// > > http://codereview.appspot.com/5638046/diff/4004/src/pkg/regexp/regexp.go#newc... > src/pkg/regexp/regexp.go:710: // ExpandString is like Expand but takes a > string template and string matches. > hard to read > ExpandString is like Expand but the template and source are strings. > > http://codereview.appspot.com/5638046/diff/4004/src/pkg/regexp/regexp.go#newc... > src/pkg/regexp/regexp.go:780: if !unicode.IsLetter(rune) && > !unicode.IsDigit(rune) && rune != '_' { > see tests. what is the right definition for a name? should 1x be a valid > name? > > http://codereview.appspot.com/5638046/
Sign in to reply to this message.
On 07/02/2012, at 11:06 PM, roger peppe wrote: > i vote for the use of \ as a quoting character > for Expand. then QuoteMeta could be used to > quote both patterns and replacement text. > > i've no idea what the precedents are though. we talked about this today and decided to stick with $, with $$ to quote it. it's the real precedent. even with \, QuoteMeta would be wrong for this since it will quote other things that are irrelevant to replacement processing. also, in effect, it makes \( mean (, for instance, and that's not what you want \( to be in a replacement string. we also considered \ as the substitution character (\1), since it's familiar from the ed(1) line, but that has the disadvantage that in patterns \ quotes while in replacements its sometimes quotes and sometimes magicizes. it's not a completely one-sided argument but $ and $$ win. as usual in modern computing, the true unix way loses. -rob
Sign in to reply to this message.
On 7 February 2012 12:25, Rob 'Commander' Pike <r@google.com> wrote: > we talked about this today and decided to stick with $, with $$ to quote it. it's the real precedent. i hadn't encountered the $ notation before, but it looks like that's what's used by perl, so i guess that's the precedent. a little experimentation shows that perl uses \$ not $$. > in effect, it makes \( mean (, for instance, and that's not what you want \( > to be in a replacement string i'm quite happy for \x to mean x for any character x. i don't generally mind if more characters get quoted than are necessary, but i wouldn't mind having two Quote functions either.
Sign in to reply to this message.
The precedents are ultimately not that important, but if you want one, Python's regexp replace uses $$ to escape $. If you use \ to escape the $s, then you have to explain why there are two different kinds of escape characters. If you use \ instead of $ entirely, then \{name} has a special meaning, so QuoteMeta fails, because it violates the widespread convention that \c for a punctuation character c always means the literal c. These expansion templates are at least self-contained: $ is the only special character, creating an escaped version is trivial (s/$/$$/g), and there will be literal (no expansion) methods too. Russ
Sign in to reply to this message.
On 7 February 2012 14:43, Russ Cox <rsc@golang.org> wrote: > The precedents are ultimately not that important, but if you > want one, Python's regexp replace uses $$ to escape $. > > If you use \ to escape the $s, then you have to explain > why there are two different kinds of escape characters. i'd class $ as a metacharacter rather than an escape character, so i think it's odd to have two kinds of escape character in the current proposal ($ for replacement and \ for matching) that said, i appreciate the python precedent too, and i don't mind $$ much. > If you use \ instead of $ entirely, then \{name} has a > special meaning, so QuoteMeta fails, because it violates > the widespread convention that \c for a punctuation > character c always means the literal c. agreed, though i wasn't suggesting that. > These expansion templates are at least self-contained: > $ is the only special character, creating an escaped > version is trivial (s/$/$$/g), and there will be literal > (no expansion) methods too. i'm ok with it.
Sign in to reply to this message.
That's quite useful, thanks. Make is also a precedent for the $$ as escaping. Just a trivial: http://codereview.appspot.com/5638046/diff/4004/src/pkg/regexp/regexp.go File src/pkg/regexp/regexp.go (right): http://codereview.appspot.com/5638046/diff/4004/src/pkg/regexp/regexp.go#newc... src/pkg/regexp/regexp.go:791: if i >= len(str) && str[i] != '}' { s/&&/||/
Sign in to reply to this message.
Hello golang-dev@googlegroups.com, r@golang.org, bradfitz@golang.org, r@google.com, rogpeppe@gmail.com, n13m3y3r@gmail.com (cc: golang-dev@googlegroups.com), Please take another look.
Sign in to reply to this message.
http://codereview.appspot.com/5638046/diff/4004/src/pkg/regexp/all_test.go File src/pkg/regexp/all_test.go (right): http://codereview.appspot.com/5638046/diff/4004/src/pkg/regexp/all_test.go#ne... src/pkg/regexp/all_test.go:192: {"(?P<x>hi)|(?P<x>bye)", "hello $$x", "hi", "hello $x"}, On 2012/02/07 00:37:51, r wrote: > what happens with replacement text "$"? "$x$"? what if i say (?P<$>hi), stupid > though that is? > > what about $3 where there is only 1 submatch? > is $1x $1 followed by x or ${1x}? Added tests and rewrote the comment: // Expand appends template to dst and returns the result; during the // append, Expand replaces variables in the template with corresponding // matches drawn from src. The match slice should have been returned by // FindSubmatchIndex. // // In the template, a variable is denoted by a substring of the form // $name or ${name}, where name is a non-empty sequence of letters, // digits, and underscores. A purely numeric name, like $1, refers to // the submatch with the corresponding index; other names refer to // capturing parentheses named with the (?P<name>...) syntax. A // reference to an out of range or unmatched index or a name that is not // present in the regular expression is replaced with an empty string. // // In the $name form, name is taken to be as long as possible: $1x is // equivalent to ${1x}, not ${1}x, and, $10 is equivalent to ${10}, not ${1}0. // // To insert a literal $ in the output, use $$ in the template. Therefore: "$" does not contain a substring of that form, so it passes through. "$x$" contains only one substring of that form: "$x". The final "$" passes through. "$1x" is "${1x}". Believe it or not, (?P<$>hi) is not a valid regular expression: the name in (?P<name> is required by regexp/parse to be identifier-like (alphanumeric + underscore). In fact, the name is also required to be ASCII, but I imagine that might be relaxed some day, and I've defined $name here to be compatible with such a future change. http://codereview.appspot.com/5638046/diff/4004/src/pkg/regexp/regexp.go File src/pkg/regexp/regexp.go (right): http://codereview.appspot.com/5638046/diff/4004/src/pkg/regexp/regexp.go#newc... src/pkg/regexp/regexp.go:537: // QuoteMeta returns a string that quotes all regular expression metacharacters On 2012/02/07 00:37:51, r wrote: > this function and its support elements are out of place in the file now. Are you asking me to move QuoteMeta somewhere else in the file? http://codereview.appspot.com/5638046/diff/4004/src/pkg/regexp/regexp.go#newc... src/pkg/regexp/regexp.go:702: // returned by FindSubmatchIndex. In the template, $$ is replaced with a single $. On 2012/02/07 00:37:51, r wrote: > what if it's not? Rewrote. http://codereview.appspot.com/5638046/diff/4004/src/pkg/regexp/regexp.go#newc... src/pkg/regexp/regexp.go:703: // A purely numeric name, like $1, refers to the submatch with On 2012/02/07 00:37:51, r wrote: > s/,// Done. http://codereview.appspot.com/5638046/diff/4004/src/pkg/regexp/regexp.go#newc... src/pkg/regexp/regexp.go:710: // ExpandString is like Expand but takes a string template and string matches. On 2012/02/07 00:37:51, r wrote: > hard to read > ExpandString is like Expand but the template and source are strings. Done. http://codereview.appspot.com/5638046/diff/4004/src/pkg/regexp/regexp.go#newc... src/pkg/regexp/regexp.go:780: if !unicode.IsLetter(rune) && !unicode.IsDigit(rune) && rune != '_' { On 2012/02/07 00:37:51, r wrote: > see tests. what is the right definition for a name? should 1x be a valid name? It is accepted by regexp/syntax. http://codereview.appspot.com/5638046/diff/4004/src/pkg/regexp/regexp.go#newc... src/pkg/regexp/regexp.go:791: if i >= len(str) && str[i] != '}' { On 2012/02/07 16:22:36, niemeyer wrote: > s/&&/||/ Ouch, thanks.
Sign in to reply to this message.
PTAL
Sign in to reply to this message.
LGTM http://codereview.appspot.com/5638046/diff/14001/src/pkg/regexp/all_test.go File src/pkg/regexp/all_test.go (right): http://codereview.appspot.com/5638046/diff/14001/src/pkg/regexp/all_test.go#n... src/pkg/regexp/all_test.go:196: {"a+", "${oops", "aaa", "${oops"}, {"a+", "$", "aaa", "$"},
Sign in to reply to this message.
*** Submitted as e3cab3dc2d31 *** regexp: allow substitutions in Replace, ReplaceString Add Expand, ExpandString for access to the substitution functionality. Fixes issue 2736. R=r, bradfitz, r, rogpeppe, n13m3y3r CC=golang-dev http://codereview.appspot.com/5638046
Sign in to reply to this message.
|