Issue 236070044: Add initial rules_parser.

nednguyen

Sorry for long wait time. I have some concern about the overall design goal as ...

8 years, 11 months ago (2015-06-08 17:26:43 UTC) #3

zwr

https://codereview.appspot.com/236070044/diff/1/documentation/Rules.md File documentation/Rules.md (right): https://codereview.appspot.com/236070044/diff/1/documentation/Rules.md#newcode56 documentation/Rules.md:56: import rules Well, rules should be in a module, ...

8 years, 11 months ago (2015-06-09 15:27:42 UTC) #4

nednguyen

https://codereview.appspot.com/236070044/diff/1/documentation/Rules.md File documentation/Rules.md (right): https://codereview.appspot.com/236070044/diff/1/documentation/Rules.md#newcode56 documentation/Rules.md:56: import rules On 2015/06/09 15:27:42, zwr wrote: > Well, ...

8 years, 11 months ago (2015-06-09 15:57:10 UTC) #5

https://codereview.appspot.com/236070044/diff/1/documentation/Rules.md
File documentation/Rules.md (right):

https://codereview.appspot.com/236070044/diff/1/documentation/Rules.md#newcode56
documentation/Rules.md:56: import rules
On 2015/06/09 15:27:42, zwr wrote:
> Well, rules should be in a module, so we need some kind of
>   import MODULE_NAME
> so it can later do
>   add_rule(MODULE_NAME....)
> 
> The rules_parser.py currently limits the allowed_imports to {'rules'}, b/c I
> think it's unsafe to import arbitrary modules.  I'll modify the constructor to
> accept allowed_imports as an arg, where {'*'} means "allow all".

I don't understand this safety model. If what you want is to prevent the
malformed rule crashes wpr at all cost, then I don't think it's possible. For
example, what if people just run wpr with --rules_path <my_rules> whereas
my_rules refer to the bogus value?

I think the main direction we should take here should be leaning toward
flexibility for users and reducing load on wpr's codebase, i.e: if you come up
with a crazy new rule, wpr should enable you to config that rule without
touching github.com/chromium/web-page-replay .
On the safety matter, the best thing we can do is make it easy for debugging
when things go wrong via clear stack trace + debugging messages.

https://codereview.appspot.com/236070044/diff/1/rules_parser.py
File rules_parser.py (right):

https://codereview.appspot.com/236070044/diff/1/rules_parser.py#newcode26
rules_parser.py:26: By design, only the above 'rules' import and primitive args
are supported.
On 2015/06/09 15:27:42, zwr wrote:
> Two reasons:
> 
> 1)
> I want to limit the API to a clean subset, similar to BUILD files.  All custom
> code belongs in rule classes, *not* the config file.
> 
> 2)
> I want to restrict the API to a safe subset.  In latency_lab we let users
submit
> their rules as part of their test via a "--rules_path" flag:
>   http://cs/file:wpr_controller.py%20rules_path
> and we don't allow tests to run arbitrary Python code on our servers.
> 
> If you insist, I can add a flag to disable the _ValidateModule call, but I
think
> that's a bad idea.

I agree that it's worrisome if any of your users can submit arbitrary python
code that crash your system.

However, that's a design constraint of latency_lab, not web-page-replay. One
thing you can do to make your system safer is by defining s.t like
latency_lab_rules that know how to read json files at some location and
translate that to your custom rules. Your end users will be sending json to your
service instead of python.

Sign in to reply to this message.

zwr

https://codereview.appspot.com/236070044/diff/1/documentation/Rules.md File documentation/Rules.md (right): https://codereview.appspot.com/236070044/diff/1/documentation/Rules.md#newcode56 documentation/Rules.md:56: import rules I want an invalid rules file to ...

8 years, 11 months ago (2015-06-09 18:22:35 UTC) #6

nednguyen

https://codereview.appspot.com/236070044/diff/1/rules_parser.py File rules_parser.py (right): https://codereview.appspot.com/236070044/diff/1/rules_parser.py#newcode26 rules_parser.py:26: By design, only the above 'rules' import and primitive ...

8 years, 11 months ago (2015-06-09 20:22:09 UTC) #7

https://codereview.appspot.com/236070044/diff/1/rules_parser.py
File rules_parser.py (right):

https://codereview.appspot.com/236070044/diff/1/rules_parser.py#newcode26
rules_parser.py:26: By design, only the above 'rules' import and primitive args
are supported.
On 2015/06/09 18:22:35, zwr wrote:
> > read json files at some location and
> > translate that to your custom rules. Your end users will be sending json to
> your
> > service instead of python.
> 
> Ugh, I'd rather not pre-validate the input on the latency_lab side.  I'd
prefer
> to keep this self-contained, in case the rule syntax evolves.  I don't want a
> second syntax.

If the rule syntax evolves, I suppose it will be handle by latency_lab's setup
and not by adding more custom logic to web-page-replay.

There may be way on your side to avoid maintaining two set of rules in parallel.

Anyway, now I see that you allow your customers to submit the config file, I can
clearly see why you want that config to json in the first place.

To make this productive for collaboration, can you create a doc that include the
bigger picture of web-page-replay features + constraints from latency_lab server
side? 

The design constraints we are seeing are:
1) I don't want the rules in web-page-replay to keep growing as more and more
customers bring up more use cases.
2) LatencyLab doesn't want customers to send unsafe configs that can mess up the
server.

I believe there are primitives we can build in web-page-replay that allows
LatencyLab to be both flexible in adding more routing rule without 1) sking for
web-page-replay's owners approval 2)compromising the safety. Having a  design
doc so we can work on this together is the first good step.

> 
> Yes, this is a design constraint.  I think it's a good one that offers several
> benefits and few limitations, and even then could easily be turned off via a
> flag.

Sign in to reply to this message.

zwr

https://codereview.appspot.com/236070044/diff/1/rules_parser.py File rules_parser.py (right): https://codereview.appspot.com/236070044/diff/1/rules_parser.py#newcode26 rules_parser.py:26: By design, only the above 'rules' import and primitive ...

8 years, 11 months ago (2015-06-09 20:26:44 UTC) #8

https://codereview.appspot.com/236070044/diff/1/rules_parser.py
File rules_parser.py (right):

https://codereview.appspot.com/236070044/diff/1/rules_parser.py#newcode26
rules_parser.py:26: By design, only the above 'rules' import and primitive args
are supported.
On 2015/06/09 20:22:09, nednguyen wrote:
> On 2015/06/09 18:22:35, zwr wrote:
> > > read json files at some location and
> > > translate that to your custom rules. Your end users will be sending json
to
> > your
> > > service instead of python.
> > 
> > Ugh, I'd rather not pre-validate the input on the latency_lab side.  I'd
> prefer
> > to keep this self-contained, in case the rule syntax evolves.  I don't want
a
> > second syntax.
> 
> If the rule syntax evolves, I suppose it will be handle by latency_lab's setup
> and not by adding more custom logic to web-page-replay.
> 
> There may be way on your side to avoid maintaining two set of rules in
parallel.
> 
> Anyway, now I see that you allow your customers to submit the config file, I
can
> clearly see why you want that config to json in the first place.
> 
> To make this productive for collaboration, can you create a doc that include
the
> bigger picture of web-page-replay features + constraints from latency_lab
server
> side? 
> 
> The design constraints we are seeing are:
> 1) I don't want the rules in web-page-replay to keep growing as more and more
> customers bring up more use cases.
> 2) LatencyLab doesn't want customers to send unsafe configs that can mess up
the
> server.
> 
> I believe there are primitives we can build in web-page-replay that allows
> LatencyLab to be both flexible in adding more routing rule without 1) sking
for
> web-page-replay's owners approval 2)compromising the safety. Having a  design
> doc so we can work on this together is the first good step.
> 
> > 
> > Yes, this is a design constraint.  I think it's a good one that offers
several
> > benefits and few limitations, and even then could easily be turned off via a
> > flag.

SGTM, I'll (a) create an offline doc to discuss and (b) add a "Design" section
to documentation/Rules.md

Sign in to reply to this message.

zwr

On 2015/06/09 20:26:44, zwr wrote: > https://codereview.appspot.com/236070044/diff/1/rules_parser.py > File rules_parser.py (right): > > https://codereview.appspot.com/236070044/diff/1/rules_parser.py#newcode26 > ...

8 years, 11 months ago (2015-06-09 22:03:40 UTC) #9

On 2015/06/09 20:26:44, zwr wrote:
> https://codereview.appspot.com/236070044/diff/1/rules_parser.py
> File rules_parser.py (right):
> 
> https://codereview.appspot.com/236070044/diff/1/rules_parser.py#newcode26
> rules_parser.py:26: By design, only the above 'rules' import and primitive
args
> are supported.
> On 2015/06/09 20:22:09, nednguyen wrote:
> > On 2015/06/09 18:22:35, zwr wrote:
> > > > read json files at some location and
> > > > translate that to your custom rules. Your end users will be sending json
> to
> > > your
> > > > service instead of python.
> > > 
> > > Ugh, I'd rather not pre-validate the input on the latency_lab side.  I'd
> > prefer
> > > to keep this self-contained, in case the rule syntax evolves.  I don't
want
> a
> > > second syntax.
> > 
> > If the rule syntax evolves, I suppose it will be handle by latency_lab's
setup
> > and not by adding more custom logic to web-page-replay.
> > 
> > There may be way on your side to avoid maintaining two set of rules in
> parallel.
> > 
> > Anyway, now I see that you allow your customers to submit the config file, I
> can
> > clearly see why you want that config to json in the first place.
> > 
> > To make this productive for collaboration, can you create a doc that include
> the
> > bigger picture of web-page-replay features + constraints from latency_lab
> server
> > side? 
> > 
> > The design constraints we are seeing are:
> > 1) I don't want the rules in web-page-replay to keep growing as more and
more
> > customers bring up more use cases.
> > 2) LatencyLab doesn't want customers to send unsafe configs that can mess up
> the
> > server.
> > 
> > I believe there are primitives we can build in web-page-replay that allows
> > LatencyLab to be both flexible in adding more routing rule without 1) sking
> for
> > web-page-replay's owners approval 2)compromising the safety. Having a 
design
> > doc so we can work on this together is the first good step.
> > 
> > > 
> > > Yes, this is a design constraint.  I think it's a good one that offers
> several
> > > benefits and few limitations, and even then could easily be turned off via
a
> > > flag.
> 
> SGTM, I'll (a) create an offline doc to discuss and (b) add a "Design" section
> to documentation/Rules.md

Sign in to reply to this message.

zwr

Offline design doc: https://docs.google.com/document/d/1DKRenYTzFIUJ5T5Tg8vYpnliy4omAUkBRMQaNt8qIgE I'll update documentation/Rules.md once we've settled on a design.

8 years, 11 months ago (2015-06-09 22:04:29 UTC) #10

nednguyen

Thanks for putting up the doc! I will review this as soon as possible. On ...

8 years, 11 months ago (2015-06-09 22:05:19 UTC) #11

klm

Hi Ned, Zoe is going to be out for a month starting next week. Is ...

8 years, 11 months ago (2015-06-23 13:39:54 UTC) #12

nednguyen

Yes, I will keep this in mind. Feel free to ping me through chat if ...

8 years, 11 months ago (2015-06-23 21:46:57 UTC) #13

nednguyen

https://codereview.appspot.com/236070044/diff/1/rules_parser.py File rules_parser.py (right): https://codereview.appspot.com/236070044/diff/1/rules_parser.py#newcode26 rules_parser.py:26: By design, only the above 'rules' import and primitive ...

8 years, 11 months ago (2015-06-23 22:10:45 UTC) #14

https://codereview.appspot.com/236070044/diff/1/rules_parser.py
File rules_parser.py (right):

https://codereview.appspot.com/236070044/diff/1/rules_parser.py#newcode26
rules_parser.py:26: By design, only the above 'rules' import and primitive args
are supported.
On 2015/06/09 20:26:44, zwr wrote:
> On 2015/06/09 20:22:09, nednguyen wrote:
> > On 2015/06/09 18:22:35, zwr wrote:
> > > > read json files at some location and
> > > > translate that to your custom rules. Your end users will be sending json
> to
> > > your
> > > > service instead of python.
> > > 
> > > Ugh, I'd rather not pre-validate the input on the latency_lab side.  I'd
> > prefer
> > > to keep this self-contained, in case the rule syntax evolves.  I don't
want
> a
> > > second syntax.
> > 
> > If the rule syntax evolves, I suppose it will be handle by latency_lab's
setup
> > and not by adding more custom logic to web-page-replay.
> > 
> > There may be way on your side to avoid maintaining two set of rules in
> parallel.
> > 
> > Anyway, now I see that you allow your customers to submit the config file, I
> can
> > clearly see why you want that config to json in the first place.
> > 
> > To make this productive for collaboration, can you create a doc that include
> the
> > bigger picture of web-page-replay features + constraints from latency_lab
> server
> > side? 
> > 
> > The design constraints we are seeing are:
> > 1) I don't want the rules in web-page-replay to keep growing as more and
more
> > customers bring up more use cases.
> > 2) LatencyLab doesn't want customers to send unsafe configs that can mess up
> the
> > server.
> > 
> > I believe there are primitives we can build in web-page-replay that allows
> > LatencyLab to be both flexible in adding more routing rule without 1) sking
> for
> > web-page-replay's owners approval 2)compromising the safety. Having a 
design
> > doc so we can work on this together is the first good step.
> > 
> > > 
> > > Yes, this is a design constraint.  I think it's a good one that offers
> several
> > > benefits and few limitations, and even then could easily be turned off via
a
> > > flag.
> 
> SGTM, I'll (a) create an offline doc to discuss and (b) add a "Design" section
> to documentation/Rules.md

My concern with this is not many people are familiar with python's ast library,
hence increase the cost of maintaining this custom parser.

Should we switch to a json file for the input file and keep the rest the same?

[ 
   {"rule_class_name1": 
     {  "arg1": value,
        "arg2": value,
        ....
      }
   },
   {"rule_class_name2": 
     {  "arg1": value,
        "arg2": value,
        ....
      }
   }...
]
e.g: the input file will be s.t like:

[
    {
        "SendStatus": {
            "url": "example\\.com/ss.*",
            "status": 204
        }
    },
    {
        "ModifyUrl": {
            "url": "(example\\.com)(/.*)",
            "new_url": "{1}"
        }
    }
]

The parser then will just read the json file and find the right classes
(rules.SendStatus & rules.ModifyUrl in this case) in rules/ and pass in the
params with their corresponding values.

This makes the processing of consuming input from end users safe, without
compromising the flexibility of adding a new rule.

Sign in to reply to this message.

zwr

Hi, I'm back from my 4-week leave. I've changed the rules file from Python to ...

8 years, 9 months ago (2015-07-28 19:53:01 UTC) #15

nednguyen

https://codereview.appspot.com/236070044/diff/20001/rules/log_url.py File rules/log_url.py (right): https://codereview.appspot.com/236070044/diff/20001/rules/log_url.py#newcode39 rules/log_url.py:39: def ApplyRule(self, return_value, request, _): Instead of _, you ...

8 years, 9 months ago (2015-07-28 20:23:53 UTC) #16

zwr

Hi, I've added the minimal replay.py and httpproxy.py integration to this review, to better point ...

8 years, 9 months ago (2015-07-29 20:28:26 UTC) #17

nednguyen

On 2015/07/29 20:28:26, zwr wrote: > Hi, I've added the minimal replay.py and httpproxy.py integration ...

8 years, 9 months ago (2015-07-29 21:37:21 UTC) #18

On 2015/07/29 20:28:26, zwr wrote:
> Hi, I've added the minimal replay.py and httpproxy.py integration to this
> review, to better point out exactly where the rules are applied.
> 
> On 2015/07/28 20:23:53, nednguyen wrote:
> > https://codereview.appspot.com/236070044/diff/20001/rules/log_url.py
> > File rules/log_url.py (right):
> > 
> >
https://codereview.appspot.com/236070044/diff/20001/rules/log_url.py#newcode39
> > rules/log_url.py:39: def ApplyRule(self, return_value, request, _):
> > Instead of _, you can just do response & do "del response  # unused"
> 
> Changed to use "# pylint: disable=unused-argument".
> 
> > 
> > https://codereview.appspot.com/236070044/diff/20001/rules/rule.py
> > File rules/rule.py (right):
> > 
> > https://codereview.appspot.com/236070044/diff/20001/rules/rule.py#newcode18
> > rules/rule.py:18: """An optional base class for rule implementations.
> > Why optional? I usually choose to be as conservative about what APIs we
> support
> > as possible.
> 
> Acknowledged but see below.
> 
> > 
> > https://codereview.appspot.com/236070044/diff/20001/rules/rule.py#newcode21
> > rules/rule.py:21: rules are not strictly required to extend this class.
> > Ummm, let's just enforce that all rules must implement this interface. We
> > usually try to avoid duck typing.
> 
> Acknowledged but I prefer my design, which avoids the "from rules import
rule".

> 
> My design encapsulates the entire "rules/" package, which can (and SHOULD) be
> ignored by the top-level classes (replay.py, httpproxy.py, ...).

> 
> > 
> > https://codereview.appspot.com/236070044/diff/20001/rules_parser.py
> > File rules_parser.py (right):
> > 
> >
https://codereview.appspot.com/236070044/diff/20001/rules_parser.py#newcode18
> > rules_parser.py:18: [{"comment": ignored_value},
> > I like this
> > 
> >
https://codereview.appspot.com/236070044/diff/20001/rules_parser.py#newcode20
> > rules_parser.py:20: {"rule_class_name2": {"arg1": value, "arg2": value,
...}},
> > You may want add documentation for the ordering of applying these rules.
> 
> No, rules aren't (exactly) applied in order.  All rules of a given type (e.g.,
> "log_url")
> ARE applied in order, but different rules are applied independently.
> 
> E.g.:
>    [{"RuleA": ...}, {"RuleB": ...}]
> doesn't imply that RuleA runs before RuleB, because (e.g.) the implementation
in
> httpproxy is:
>       apply all RuleB's
>       ... do stuff ...
>       apply all RuleA's
> 
> See my next comment for more info.
> 
> > 
> >
https://codereview.appspot.com/236070044/diff/20001/rules_parser.py#newcode63
> > rules_parser.py:63: def Find(self, rule_type_name):
> > Why do we need this API? I thought all we need is an Apply(requrest,
response)
> > method that will chain all the rules contain in this together?
> 
> No, rules are applied at different points in the code.  You can't simply apply
> them all
> in one spot. My original pull request (May 5th) shows the exact spots in both
> httpclient.py and httpproxy.py:
>    https://github.com/chromium/web-page-replay/pull/49/files#diff-3
>    https://github.com/chromium/web-page-replay/pull/49/files#diff-2
> 
> > 
> >
https://codereview.appspot.com/236070044/diff/20001/rules_parser.py#newcode86
> > rules_parser.py:86: class _Rule(object):
> > Not sure why we need this? Can the callsite just do:
> > 
> > Rules(...).Find('log_url').ApplyRule(...)?
> 
> No, you can have more than one 'log_url' rule, so the _Rule iterates over the
> list and
> calls ApplyRule on each one.
> 
> Also, the Rule API is:
>     ApplyRule(return_value, request, response) --> (should_stop, return_value)
> not:
>     ApplyRule(request, response) --> return_value
> The _Rule handles the "should_stop" handling.  The separate return_value
allows
> chained rules to incrementally build the return_value, e.g.:
>     https://github.com/chromium/web-page-replay/pull/49/files#diff-13
> where each RemoveHeader in the chain can modify the pending return_value.
> 
> > 
> >
https://codereview.appspot.com/236070044/diff/20001/rules_parser.py#newcode158
> > rules_parser.py:158: clazz = getattr(module, classname)
> > assert clazz issubclass of rule.Rule
> 
> No, I prefer the duck typing, as noted above.


Can you reply to my comments in the original places so it's easier to follow the
discussion?

Sign in to reply to this message.

zwr

Replied in original places https://codereview.appspot.com/236070044/diff/20001/rules/log_url.py File rules/log_url.py (right): https://codereview.appspot.com/236070044/diff/20001/rules/log_url.py#newcode39 rules/log_url.py:39: def ApplyRule(self, return_value, request, _): ...

8 years, 9 months ago (2015-07-29 21:47:01 UTC) #19

nednguyen

I don't understand the part about why you want to avoid "from rules import rule" ...

8 years, 9 months ago (2015-07-29 21:51:21 UTC) #20

zwr

On 2015/07/29 21:51:21, nednguyen wrote: > I don't understand the part about why you want ...

8 years, 9 months ago (2015-07-30 22:13:34 UTC) #21

zwr

https://codereview.appspot.com/236070044/diff/80001/documentation/Rules.md File documentation/Rules.md (right): https://codereview.appspot.com/236070044/diff/80001/documentation/Rules.md#newcode85 documentation/Rules.md:85: On 2015/07/29 21:51:21, nednguyen wrote: > Add comments about ...

8 years, 9 months ago (2015-07-30 22:14:16 UTC) #22

https://codereview.appspot.com/236070044/diff/80001/documentation/Rules.md
File documentation/Rules.md (right):

https://codereview.appspot.com/236070044/diff/80001/documentation/Rules.md#ne...
documentation/Rules.md:85: 
On 2015/07/29 21:51:21, nednguyen wrote:
> Add comments about how all the rules of the same type are chained together and
> applied in the same order as they appear in the my_rules.json

Done.

https://codereview.appspot.com/236070044/diff/80001/rules/log_url.py
File rules/log_url.py (right):

https://codereview.appspot.com/236070044/diff/80001/rules/log_url.py#newcode39
rules/log_url.py:39: def ApplyRule(self, return_value, request,
unused_response):
On 2015/07/29 21:51:21, nednguyen wrote:
> keep the name "repsonse" and add "del response" below. We deliberately use
this
> style to avoid the situation where callsite does "ApplyRule(...,
> response=Foo())" which will explode if you change the param name.

Done, very cool.  I did a quick code search for  ^\s+del\s\w+$  and didn't find
many uses, but I can't think of any downsides and there are several upsides
(e.g., detects name changes).

https://codereview.appspot.com/236070044/diff/80001/rules_parser.py
File rules_parser.py (right):

https://codereview.appspot.com/236070044/diff/80001/rules_parser.py#newcode89
rules_parser.py:89: def __init__(self, *rules):
On 2015/07/29 21:51:21, nednguyen wrote:
> nit: just use (self, rules) is good enough. 

Done, though I also had to change the above caller from _Rule(*matches) to
_Rule(matches).

https://codereview.appspot.com/236070044/diff/80001/rules_parser.py#newcode158
rules_parser.py:158: clazz = getattr(module, classname)
On 2015/07/29 21:51:21, nednguyen wrote:
> If you don't ignore the case clazz not subclass of rule.Rule, what if someone
> specifies the "rule" as the modulename?

E.g.,
      [{'rules.log_url': {}}]
?

This'll raise an error:
    1: rules.log_url lacks IsType and ApplyRule
If I hack rules/log_url.py to add:
    def IsType(): pass
    def ApplyRule(): pass
    def __call__(): pass
then I'll still get an error:
      ....
        rule = clazz(**args)
    TypeError: 'module' object is not callable
So I don't think this is a problem.

Someone could modify rules/log_url.py from:
    class LogUrl(rule.Rule): ...
to:
    class LogUrl(object): ...
but I'm explicitly allowing this, as previously noted.

Sign in to reply to this message.

nednguyen

On 2015/07/30 22:13:34, zwr wrote: > On 2015/07/29 21:51:21, nednguyen wrote: > > I don't ...

8 years, 9 months ago (2015-07-30 22:21:19 UTC) #23

zwr

8 years, 9 months ago (2015-07-31 14:42:32 UTC) #25

Thanks, I've created a pull request: 
https://github.com/chromium/web-page-replay/pull/52

Sign in to reply to this message.

Issue 236070044: Add initial rules_parser. (Closed)

Description

Patch Set 1 #

Patch Set 2 : changed format from ast to json #

Patch Set 3 : Fixed for reply 16 #

Patch Set 4 : Undo the "rules applied in order" doc #

Patch Set 5 : Added replay calls to load rules #

Patch Set 6 : Fixed per reply20 #

Patch Set 7 : Minor fix for reply20 #

Messages