regexp: change Expr() to String(); add HasOperator method to Regexp.
It reports whether a regular expression has operators
as opposed to matching literal text.
(2010-12-17 16:23:04 UTC)
#9
On Fri, Dec 17, 2010 at 5:21 AM, <rsc@google.com> wrote:
>
> http://codereview.appspot.com/3731041/diff/16001/src/pkg/regexp/regexp.go
> File src/pkg/regexp/regexp.go (right):
>
>
> http://codereview.appspot.com/3731041/diff/16001/src/pkg/regexp/regexp.go#new...
> src/pkg/regexp/regexp.go:1026: // HasMeta returns a boolean indicating
> whether the string contains
> can we drop HasMeta now?
> Literal is much more general
HasMeta asks about a string that has yet to be compiled and the answer
may affect how you proceed. Literal asks about an already-compiled
regexp. the two really do different things.
however, i think i agree. HasMeta and QuoteMeta are really the
overlap, and HasMeta can go. i'll send an update in a bit.
-rob
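Rob's remark that HasMeta and QuoteMeta overlap can be made concrete: a string contains a regexp metacharacter exactly when QuoteMeta has something to escape. A minimal sketch against today's regexp.QuoteMeta, where `hasMeta` is a hypothetical stand-in for the HasMeta function under discussion:

```go
package main

import (
	"fmt"
	"regexp"
)

// hasMeta is a hypothetical stand-in for the HasMeta function under
// discussion. A string contains a regexp metacharacter exactly when
// regexp.QuoteMeta has something to escape, i.e. when quoting changes it.
func hasMeta(s string) bool {
	return regexp.QuoteMeta(s) != s
}

func main() {
	for _, s := range []string{"hello", "h.llo", `h\.llo`} {
		fmt.Printf("%-9q hasMeta=%v\n", s, hasMeta(s))
	}
}
```

The last case reports true even though, compiled, the pattern matches only the literal text h.llo; that is the weakness of HasMeta that Russ raises later in the thread.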
(2010-12-17 17:35:49 UTC)
#10
Right now I do the following to find the index of the first metacharacter in
the regexp, so I can split the regexp into a metacharacter-free prefix and
the rest:
	meta := sort.Search(len(expr), func(i int) bool {
		return regexp.HasMeta(expr[0 : i+1])
	})
With your proposed change I will have to do the following:
	meta := sort.Search(len(expr), func(i int) bool {
		if r, err := regexp.Compile(expr[0 : i+1]); err == nil {
			_, ok := r.Literal()
			return !ok
		}
		return false // or should it be true? not clear to me
	})
Is my understanding correct? What I really want is the length of the
metacharacter-free prefix of the regular expression, and that prefix as a
literal. Given a regular expression of the form "x...xxy...y" where the
"x...xx" _literal_ is metacharacter-free, I need to split it into "x...x" and
"xy..." such that literal("x...x") is metacharacter-free, and "xy..." is a
legal regular expression starting with a non-metacharacter, so that I can use
"^" + "xy..." for an anchored search.
- gri
On Fri, Dec 17, 2010 at 8:23 AM, Rob 'Commander' Pike <r@golang.org> wrote:
> On Fri, Dec 17, 2010 at 5:21 AM, <rsc@google.com> wrote:
> >
> >
> http://codereview.appspot.com/3731041/diff/16001/src/pkg/regexp/regexp.go
> > File src/pkg/regexp/regexp.go (right):
> >
> >
>
> > http://codereview.appspot.com/3731041/diff/16001/src/pkg/regexp/regexp.go#new...
> > src/pkg/regexp/regexp.go:1026: // HasMeta returns a boolean indicating
> > whether the string contains
> > can we drop HasMeta now?
> > Literal is much more general
>
> HasMeta asks about a string that has yet to be compiled and the answer
> may affect how you proceed. Literal asks about an already-compiled
> regexp. the two really do different things.
>
> however, i think i agree. HasMeta and QuoteMeta are really the
> overlap, and HasMeta can go. i'll send an update in a bit.
>
> -rob
>
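gri's first snippet relies on sort.Search, which is only correct if the predicate is monotone. "This prefix contains a metacharacter" is monotone, since every longer prefix contains the same metacharacter. A runnable sketch, where `hasMeta` is a hypothetical stand-in for regexp.HasMeta (built on today's regexp.QuoteMeta) and `metaIndex` is a hypothetical name:

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
)

// hasMeta is a hypothetical stand-in for regexp.HasMeta as used in gri's
// first snippet: it reports whether s contains a regexp metacharacter.
func hasMeta(s string) bool {
	return regexp.QuoteMeta(s) != s
}

// metaIndex returns the index of the first metacharacter in expr, or
// len(expr) if there is none. sort.Search needs a monotone predicate, and
// "expr[:i+1] contains a metacharacter" is monotone: once a prefix contains
// one, every longer prefix does too.
func metaIndex(expr string) int {
	return sort.Search(len(expr), func(i int) bool {
		return hasMeta(expr[:i+1])
	})
}

func main() {
	for _, expr := range []string{"hello", "abc.*def", `abc\.def`} {
		i := metaIndex(expr)
		fmt.Printf("%-10q prefix=%q rest=%q\n", expr, expr[:i], expr[i:])
	}
}
```

For `abc\.def` this stops at the backslash (index 3) even though the whole pattern matches only the literal text abc.def. A single left-to-right scan with strings.IndexAny over the same metacharacter set would give the same index without the repeated binary-search probes.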
(2010-12-17 17:47:33 UTC)
#12
It sounds like gri needs something like
LiteralPrefix() (prefix string, complete bool)
that returns the literal prefix of the regexp
and complete==true if that's the entire regexp.
HasMeta is not general enough. It handles some
easy cases but not equivalent other ones.
For example hello vs h[e]llo vs h\.llo, all of
which are 5-byte literal strings and would
(should) return complete==true in LiteralPrefix.
Russ
(2010-12-17 17:49:07 UTC)
#13
what you're doing is very special and odd. you're right that Literal
isn't the easiest thing you might do, but it does give you the right
answer, which HasMeta does not.
it's possible what you need may be too special-purpose to add to the
public regexp package. perhaps all you need is a string constant of
metacharacters that you can use strings.IndexAny to scan. but again,
that won't give you the best answer.
what i was trying to get at with adding Literal is that the regexp
abc\.def
is a literal match, but your code as written cannot exploit the suffix
array code to its full potential in its search. your loop as written
above can get all the power back, but of course it does a lot of
compilation.
the issue is that you don't want the metacharacter-free prefix, you
want the *operator*-free prefix after backslash processing, and i
don't see a clean way to give that to you.
another approach is for you to do your own backslash processing and
understand the operator set in your code.
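Rob's last suggestion, doing the backslash processing yourself, might look like the sketch below. `literalPrefix`, `operators` and `isEscapable` are hypothetical names; the operator set is assumed to be the one regexp.QuoteMeta escapes, and the rules for handing a literal back to the rest are a conservative guess in the spirit of gri's "x...x" / "xy..." split, not behavior of the regexp package.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// operators lists the bytes treated as operators when unescaped. This is an
// assumption based on the set regexp.QuoteMeta escapes, not an API.
const operators = `\.+*?()|[]{}^$`

// isEscapable reports whether `\c` is a literal c, which (as far as I know)
// is how Go's regexp syntax treats escaped ASCII non-alphanumerics.
func isEscapable(c byte) bool {
	isAlnum := '0' <= c && c <= '9' || 'a' <= c && c <= 'z' || 'A' <= c && c <= 'Z'
	return c < utf8.RuneSelf && !isAlnum
}

// literalPrefix (hypothetical) returns the operator-free prefix of expr
// after backslash processing, plus n, the number of bytes of expr it
// occupies, so expr[n:] is the rest. Other escapes (\d, \pL, ...) end the
// prefix. It is deliberately conservative:
//   - if the scan stops at a repetition operator, the last literal is
//     handed back, since the operator applies to it ("abc*" -> "ab", "c*");
//   - if the rest contains '|' anywhere, it gives up, since a top-level
//     alternation would make the prefix optional ("ab|cd").
func literalPrefix(expr string) (prefix string, n int) {
	var buf []byte
	lastIn, lastOut := -1, 0 // start of the last literal in expr and in buf
	i := 0
scan:
	for i < len(expr) {
		c := expr[i]
		switch {
		case c == '\\' && i+1 < len(expr) && isEscapable(expr[i+1]):
			lastIn, lastOut = i, len(buf)
			buf = append(buf, expr[i+1])
			i += 2
		case strings.IndexByte(operators, c) >= 0:
			break scan // unescaped operator, or an escape we do not decode
		default:
			_, size := utf8.DecodeRuneInString(expr[i:])
			lastIn, lastOut = i, len(buf)
			buf = append(buf, expr[i:i+size]...)
			i += size
		}
	}
	if strings.Contains(expr[i:], "|") {
		return "", 0
	}
	if i < len(expr) && strings.IndexByte("*+?{", expr[i]) >= 0 && lastIn >= 0 {
		return string(buf[:lastOut]), lastIn
	}
	return string(buf), i
}

func main() {
	for _, expr := range []string{`abc\.def`, "abc*", "abc.*", "h[e]llo", "ab|cd"} {
		prefix, n := literalPrefix(expr)
		fmt.Printf("%-10q prefix=%q rest=%q\n", expr, prefix, expr[n:])
	}
}
```

This recovers abc\.def as the literal "abc.def", but, as Russ notes, it cannot see that h[e]llo is also literal; only the parsed regexp knows that.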
(2010-12-17 17:50:57 UTC)
#14
On Fri, Dec 17, 2010 at 9:47 AM, Russ Cox <rsc@golang.org> wrote:
> It sounds like gri needs something like
>
> LiteralPrefix() (prefix string, complete bool)
>
> that returns the literal prefix of the regexp
> and complete==true if that's the entire regexp.
>
> HasMeta is not general enough. It handles some
> easy cases but not equivalent other ones.
> For example hello vs h[e]llo vs h\.llo, all of
> which are 5-byte literal strings and would
> (should) return complete==true in LiteralPrefix.
that could work. its specialness is peculiar, but maybe it's the
answer. i'll try it.
-rob
(2010-12-17 17:56:16 UTC)
#16
LiteralPrefix might work. But instead of complete, I'd rather have the
length of the literal prefix in the incoming regexp string. Otherwise I
cannot split that string. So given a *regexp.Regexp r:
	expr := r.String()             // gives me the original string
	prefix, n := r.LiteralPrefix() // gives me the literal prefix (after \ processing) and its length in expr (n may be != len(prefix))
	suffix := expr[n:]             // gives me the rest
- Robert
On Fri, Dec 17, 2010 at 9:52 AM, Russ Cox <rsc@golang.org> wrote:
> it's not that special. re2/regexp.h has an equivalent function.
>
(2010-12-17 18:01:06 UTC)
#18
On Fri, Dec 17, 2010 at 09:56, Robert Griesemer <gri@golang.org> wrote:
> LiteralPrefix might work. But instead of complete, I'd rather have the
> length of the literal prefix in the incoming regexp string. Otherwise I
> cannot split that string. So given an *regexp.Regexp r:
> expr := r.String() // gives me the original string
> prefix, n := r.LiteralPrefix() // gives me the literal prefix (after \ processing) and its length in expr (n may be != len(prefix))
> suffix := expr[n:] // gives me the rest
In a really aggressive library that's not well defined.
Once you find the prefix I would say just pass the
whole thing to the regexp library (not just the suffix)
and let it process the prefix a second time.
Russ
(2010-12-17 18:01:28 UTC)
#19
that's a lot more work for me; regexp doesn't have the information it
needs. you can do that yourself with a very simple loop comparing the
original text with the prefix value.
-rob
(2010-12-17 18:02:46 UTC)
#20
> In a really aggressive library that's not well defined.
> Once you find the prefix I would say just pass the
> whole thing to the regexp library (not just the suffix)
> and let it process the prefix a second time.
that works too. regexp is reasonably fast scanning anchored literal text.
-rob
(2010-12-17 18:03:32 UTC)
#21
> In a really aggressive library that's not well defined.
For example /(foo.*bar)+baz/ has a required "foo" prefix
but doesn't have a few leading characters that you can
strip off.
Russ
Issue 3731041: code review 3731041: regexp: change Expr() to String(); add HasOperator meth... (Closed)
Created 14 years, 6 months ago by r
Modified 14 years, 6 months ago
Comments: 5