|
|
Created:
9 years ago by zwr Modified:
8 years, 9 months ago Base URL:
https://github.com/chromium/web-page-replay.git@master Visibility:
Public. |
DescriptionAdd initial rules_parser.
BUG=
Patch Set 1 #
Total comments: 11
Patch Set 2 : changed format from ast to json #
Total comments: 16
Patch Set 3 : Fixed for reply 16 #Patch Set 4 : Undo the "rules applied in order" doc #Patch Set 5 : Added replay calls to load rules #
Total comments: 8
Patch Set 6 : Fixed per reply20 #Patch Set 7 : Minor fix for reply20 #
MessagesTotal messages: 25
Sorry for long wait time. I have some concern about the overall design goal as shown in the comments. https://codereview.appspot.com/236070044/diff/1/documentation/Rules.md File documentation/Rules.md (right): https://codereview.appspot.com/236070044/diff/1/documentation/Rules.md#newcode56 documentation/Rules.md:56: import rules Why is this 'import rules'? needed? I want to clarify my idea is that anyone can come in and add their custom rules without requiring to change web-page-replay's code base. i.e: if I fetch web-page-replay repo on my local environment, I should be able to define custom_rule.py somewhere and import it in my_rules.py. https://codereview.appspot.com/236070044/diff/1/rules_parser.py File rules_parser.py (right): https://codereview.appspot.com/236070044/diff/1/rules_parser.py#newcode26 rules_parser.py:26: By design, only the above 'rules' import and primitive args are supported. Why? Web-page-replay customers should be able to define any rule they want and take responsibility for making sure that the rules good.
Sign in to reply to this message.
https://codereview.appspot.com/236070044/diff/1/documentation/Rules.md File documentation/Rules.md (right): https://codereview.appspot.com/236070044/diff/1/documentation/Rules.md#newcode56 documentation/Rules.md:56: import rules Well, rules should be in a module, so we need some kind of import MODULE_NAME so it can later do add_rule(MODULE_NAME....) The rules_parser.py currently limits the allowed_imports to {'rules'}, b/c I think it's unsafe to import arbitrary modules. I'll modify the constructor to accept allowed_imports as an arg, where {'*'} means "allow all". https://codereview.appspot.com/236070044/diff/1/rules_parser.py File rules_parser.py (right): https://codereview.appspot.com/236070044/diff/1/rules_parser.py#newcode26 rules_parser.py:26: By design, only the above 'rules' import and primitive args are supported. Two reasons: 1) I want to limit the API to a clean subset, similar to BUILD files. All custom code belongs in rule classes, *not* the config file. 2) I want to restrict the API to a safe subset. In latency_lab we let users submit their rules as part of their test via a "--rules_path" flag: http://cs/file:wpr_controller.py%20rules_path and we don't allow tests to run arbitrary Python code on our servers. If you insist, I can add a flag to disable the _ValidateModule call, but I think that's a bad idea.
Sign in to reply to this message.
https://codereview.appspot.com/236070044/diff/1/documentation/Rules.md File documentation/Rules.md (right): https://codereview.appspot.com/236070044/diff/1/documentation/Rules.md#newcode56 documentation/Rules.md:56: import rules On 2015/06/09 15:27:42, zwr wrote: > Well, rules should be in a module, so we need some kind of > import MODULE_NAME > so it can later do > add_rule(MODULE_NAME....) > > The rules_parser.py currently limits the allowed_imports to {'rules'}, b/c I > think it's unsafe to import arbitrary modules. I'll modify the constructor to > accept allowed_imports as an arg, where {'*'} means "allow all". I don't understand this safety model. If what you want is to prevent the malformed rule crashes wpr at all cost, then I don't think it's possible. For example, what if people just run wpr with --rules_path <my_rules> whereas my_rules refer to the bogus value? I think the main direction we should take here should be leaning toward flexibility for users and reducing load on wpr's codebase, i.e: if you come up with a crazy new rule, wpr should enable you to config that rule without touching github.com/chromium/web-page-replay . On the safety matter, the best thing we can do is make it easy for debugging when things go wrong via clear stack trace + debugging messages. https://codereview.appspot.com/236070044/diff/1/rules_parser.py File rules_parser.py (right): https://codereview.appspot.com/236070044/diff/1/rules_parser.py#newcode26 rules_parser.py:26: By design, only the above 'rules' import and primitive args are supported. On 2015/06/09 15:27:42, zwr wrote: > Two reasons: > > 1) > I want to limit the API to a clean subset, similar to BUILD files. All custom > code belongs in rule classes, *not* the config file. > > 2) > I want to restrict the API to a safe subset. In latency_lab we let users submit > their rules as part of their test via a "--rules_path" flag: > http://cs/file:wpr_controller.py%20rules_path > and we don't allow tests to run arbitrary Python code on our servers. > > If you insist, I can add a flag to disable the _ValidateModule call, but I think > that's a bad idea. I agree that it's worrisome if any of your users can submit arbitrary python code that crash your system. However, that's a design constraint of latency_lab, not web-page-replay. One thing you can do to make your system safer is by defining s.t like latency_lab_rules that know how to read json files at some location and translate that to your custom rules. Your end users will be sending json to your service instead of python.
Sign in to reply to this message.
https://codereview.appspot.com/236070044/diff/1/documentation/Rules.md File documentation/Rules.md (right): https://codereview.appspot.com/236070044/diff/1/documentation/Rules.md#newcode56 documentation/Rules.md:56: import rules I want an invalid rules file to kill the server. That's fine. I *don't* want a rules file to corrupt the host (e.g. modify the file system, open sockets, etc), except for predefined rules in "rules/*.py" (which we've already vetted). That'd make it unsafe to run arbitrary rule_file input. As I've noted elsewhere, I see rule files as analogous to BUILD files, which use a similarly restricted subset of Python. The restriction both keeps things clean and supports the above safeties. If we someday need more flexibility, we can either add a "turn safeties off" flag or create a new package (e.g. "custom", not in allowed_imports by default) and do: import custom add_rule(custom.NewRule(src=""" class MyRule(rules.Rule): ... """)) https://codereview.appspot.com/236070044/diff/1/rules_parser.py File rules_parser.py (right): https://codereview.appspot.com/236070044/diff/1/rules_parser.py#newcode26 rules_parser.py:26: By design, only the above 'rules' import and primitive args are supported. > read json files at some location and > translate that to your custom rules. Your end users will be sending json to your > service instead of python. Ugh, I'd rather not pre-validate the input on the latency_lab side. I'd prefer to keep this self-contained, in case the rule syntax evolves. I don't want a second syntax. Yes, this is a design constraint. I think it's a good one that offers several benefits and few limitations, and even then could easily be turned off via a flag.
Sign in to reply to this message.
https://codereview.appspot.com/236070044/diff/1/rules_parser.py File rules_parser.py (right): https://codereview.appspot.com/236070044/diff/1/rules_parser.py#newcode26 rules_parser.py:26: By design, only the above 'rules' import and primitive args are supported. On 2015/06/09 18:22:35, zwr wrote: > > read json files at some location and > > translate that to your custom rules. Your end users will be sending json to > your > > service instead of python. > > Ugh, I'd rather not pre-validate the input on the latency_lab side. I'd prefer > to keep this self-contained, in case the rule syntax evolves. I don't want a > second syntax. If the rule syntax evolves, I suppose it will be handle by latency_lab's setup and not by adding more custom logic to web-page-replay. There may be way on your side to avoid maintaining two set of rules in parallel. Anyway, now I see that you allow your customers to submit the config file, I can clearly see why you want that config to json in the first place. To make this productive for collaboration, can you create a doc that include the bigger picture of web-page-replay features + constraints from latency_lab server side? The design constraints we are seeing are: 1) I don't want the rules in web-page-replay to keep growing as more and more customers bring up more use cases. 2) LatencyLab doesn't want customers to send unsafe configs that can mess up the server. I believe there are primitives we can build in web-page-replay that allows LatencyLab to be both flexible in adding more routing rule without 1) sking for web-page-replay's owners approval 2)compromising the safety. Having a design doc so we can work on this together is the first good step. > > Yes, this is a design constraint. I think it's a good one that offers several > benefits and few limitations, and even then could easily be turned off via a > flag.
Sign in to reply to this message.
https://codereview.appspot.com/236070044/diff/1/rules_parser.py File rules_parser.py (right): https://codereview.appspot.com/236070044/diff/1/rules_parser.py#newcode26 rules_parser.py:26: By design, only the above 'rules' import and primitive args are supported. On 2015/06/09 20:22:09, nednguyen wrote: > On 2015/06/09 18:22:35, zwr wrote: > > > read json files at some location and > > > translate that to your custom rules. Your end users will be sending json to > > your > > > service instead of python. > > > > Ugh, I'd rather not pre-validate the input on the latency_lab side. I'd > prefer > > to keep this self-contained, in case the rule syntax evolves. I don't want a > > second syntax. > > If the rule syntax evolves, I suppose it will be handle by latency_lab's setup > and not by adding more custom logic to web-page-replay. > > There may be way on your side to avoid maintaining two set of rules in parallel. > > Anyway, now I see that you allow your customers to submit the config file, I can > clearly see why you want that config to json in the first place. > > To make this productive for collaboration, can you create a doc that include the > bigger picture of web-page-replay features + constraints from latency_lab server > side? > > The design constraints we are seeing are: > 1) I don't want the rules in web-page-replay to keep growing as more and more > customers bring up more use cases. > 2) LatencyLab doesn't want customers to send unsafe configs that can mess up the > server. > > I believe there are primitives we can build in web-page-replay that allows > LatencyLab to be both flexible in adding more routing rule without 1) sking for > web-page-replay's owners approval 2)compromising the safety. Having a design > doc so we can work on this together is the first good step. > > > > > Yes, this is a design constraint. I think it's a good one that offers several > > benefits and few limitations, and even then could easily be turned off via a > > flag. SGTM, I'll (a) create an offline doc to discuss and (b) add a "Design" section to documentation/Rules.md
Sign in to reply to this message.
On 2015/06/09 20:26:44, zwr wrote: > https://codereview.appspot.com/236070044/diff/1/rules_parser.py > File rules_parser.py (right): > > https://codereview.appspot.com/236070044/diff/1/rules_parser.py#newcode26 > rules_parser.py:26: By design, only the above 'rules' import and primitive args > are supported. > On 2015/06/09 20:22:09, nednguyen wrote: > > On 2015/06/09 18:22:35, zwr wrote: > > > > read json files at some location and > > > > translate that to your custom rules. Your end users will be sending json > to > > > your > > > > service instead of python. > > > > > > Ugh, I'd rather not pre-validate the input on the latency_lab side. I'd > > prefer > > > to keep this self-contained, in case the rule syntax evolves. I don't want > a > > > second syntax. > > > > If the rule syntax evolves, I suppose it will be handle by latency_lab's setup > > and not by adding more custom logic to web-page-replay. > > > > There may be way on your side to avoid maintaining two set of rules in > parallel. > > > > Anyway, now I see that you allow your customers to submit the config file, I > can > > clearly see why you want that config to json in the first place. > > > > To make this productive for collaboration, can you create a doc that include > the > > bigger picture of web-page-replay features + constraints from latency_lab > server > > side? > > > > The design constraints we are seeing are: > > 1) I don't want the rules in web-page-replay to keep growing as more and more > > customers bring up more use cases. > > 2) LatencyLab doesn't want customers to send unsafe configs that can mess up > the > > server. > > > > I believe there are primitives we can build in web-page-replay that allows > > LatencyLab to be both flexible in adding more routing rule without 1) sking > for > > web-page-replay's owners approval 2)compromising the safety. Having a design > > doc so we can work on this together is the first good step. > > > > > > > > Yes, this is a design constraint. I think it's a good one that offers > several > > > benefits and few limitations, and even then could easily be turned off via a > > > flag. > > SGTM, I'll (a) create an offline doc to discuss and (b) add a "Design" section > to documentation/Rules.md
Sign in to reply to this message.
Offline design doc: https://docs.google.com/document/d/1DKRenYTzFIUJ5T5Tg8vYpnliy4omAUkBRMQaNt8qIgE I'll update documentation/Rules.md once we've settled on a design.
Sign in to reply to this message.
Thanks for putting up the doc! I will review this as soon as possible. On Tue, Jun 9, 2015 at 3:04 PM <zwr@google.com> wrote: > Offline design doc: > > https://docs.google.com/document/d/1DKRenYTzFIUJ5T5Tg8vYpnliy4omAUkBRMQaNt8qIgE > > I'll update documentation/Rules.md once we've settled on a design. > > https://codereview.appspot.com/236070044/ >
Sign in to reply to this message.
Hi Ned, Zoe is going to be out for a month starting next week. Is it possible to expedite the review? We have an intern working on WPR directly, with 2 other interns doing stuff related to WPR, so we'd like to get it in a reasonable order in the source tree. Thanks! Michael
Sign in to reply to this message.
Yes, I will keep this in mind. Feel free to ping me through chat if you find I don't answer your code review. On Tue, Jun 23, 2015 at 6:39 AM <klm@google.com> wrote: > Hi Ned, Zoe is going to be out for a month starting next week. Is it > possible to expedite the review? > We have an intern working on WPR directly, with 2 other interns doing > stuff related to WPR, so we'd like to get it in a reasonable order in > the source tree. > Thanks! > Michael > > https://codereview.appspot.com/236070044/ >
Sign in to reply to this message.
https://codereview.appspot.com/236070044/diff/1/rules_parser.py File rules_parser.py (right): https://codereview.appspot.com/236070044/diff/1/rules_parser.py#newcode26 rules_parser.py:26: By design, only the above 'rules' import and primitive args are supported. On 2015/06/09 20:26:44, zwr wrote: > On 2015/06/09 20:22:09, nednguyen wrote: > > On 2015/06/09 18:22:35, zwr wrote: > > > > read json files at some location and > > > > translate that to your custom rules. Your end users will be sending json > to > > > your > > > > service instead of python. > > > > > > Ugh, I'd rather not pre-validate the input on the latency_lab side. I'd > > prefer > > > to keep this self-contained, in case the rule syntax evolves. I don't want > a > > > second syntax. > > > > If the rule syntax evolves, I suppose it will be handle by latency_lab's setup > > and not by adding more custom logic to web-page-replay. > > > > There may be way on your side to avoid maintaining two set of rules in > parallel. > > > > Anyway, now I see that you allow your customers to submit the config file, I > can > > clearly see why you want that config to json in the first place. > > > > To make this productive for collaboration, can you create a doc that include > the > > bigger picture of web-page-replay features + constraints from latency_lab > server > > side? > > > > The design constraints we are seeing are: > > 1) I don't want the rules in web-page-replay to keep growing as more and more > > customers bring up more use cases. > > 2) LatencyLab doesn't want customers to send unsafe configs that can mess up > the > > server. > > > > I believe there are primitives we can build in web-page-replay that allows > > LatencyLab to be both flexible in adding more routing rule without 1) sking > for > > web-page-replay's owners approval 2)compromising the safety. Having a design > > doc so we can work on this together is the first good step. > > > > > > > > Yes, this is a design constraint. I think it's a good one that offers > several > > > benefits and few limitations, and even then could easily be turned off via a > > > flag. > > SGTM, I'll (a) create an offline doc to discuss and (b) add a "Design" section > to documentation/Rules.md My concern with this is not many people are familiar with python's ast library, hence increase the cost of maintaining this custom parser. Should we switch to a json file for the input file and keep the rest the same? [ {"rule_class_name1": { "arg1": value, "arg2": value, .... } }, {"rule_class_name2": { "arg1": value, "arg2": value, .... } }... ] e.g: the input file will be s.t like: [ { "SendStatus": { "url": "example\\.com/ss.*", "status": 204 } }, { "ModifyUrl": { "url": "(example\\.com)(/.*)", "new_url": "{1}" } } ] The parser then will just read the json file and find the right classes (rules.SendStatus & rules.ModifyUrl in this case) in rules/ and pass in the params with their corresponding values. This makes the processing of consuming input from end users safe, without compromising the flexibility of adding a new rule.
Sign in to reply to this message.
Hi, I'm back from my 4-week leave. I've changed the rules file from Python to JSON, as requested. PTAL.
Sign in to reply to this message.
https://codereview.appspot.com/236070044/diff/20001/rules/log_url.py File rules/log_url.py (right): https://codereview.appspot.com/236070044/diff/20001/rules/log_url.py#newcode39 rules/log_url.py:39: def ApplyRule(self, return_value, request, _): Instead of _, you can just do response & do "del response # unused" https://codereview.appspot.com/236070044/diff/20001/rules/rule.py File rules/rule.py (right): https://codereview.appspot.com/236070044/diff/20001/rules/rule.py#newcode18 rules/rule.py:18: """An optional base class for rule implementations. Why optional? I usually choose to be as conservative about what APIs we support as possible. https://codereview.appspot.com/236070044/diff/20001/rules/rule.py#newcode21 rules/rule.py:21: rules are not strictly required to extend this class. Ummm, let's just enforce that all rules must implement this interface. We usually try to avoid duck typing. https://codereview.appspot.com/236070044/diff/20001/rules_parser.py File rules_parser.py (right): https://codereview.appspot.com/236070044/diff/20001/rules_parser.py#newcode18 rules_parser.py:18: [{"comment": ignored_value}, I like this https://codereview.appspot.com/236070044/diff/20001/rules_parser.py#newcode20 rules_parser.py:20: {"rule_class_name2": {"arg1": value, "arg2": value, ...}}, You may want add documentation for the ordering of applying these rules. https://codereview.appspot.com/236070044/diff/20001/rules_parser.py#newcode63 rules_parser.py:63: def Find(self, rule_type_name): Why do we need this API? I thought all we need is an Apply(requrest, response) method that will chain all the rules contain in this together? https://codereview.appspot.com/236070044/diff/20001/rules_parser.py#newcode86 rules_parser.py:86: class _Rule(object): Not sure why we need this? Can the callsite just do: Rules(...).Find('log_url').ApplyRule(...)? https://codereview.appspot.com/236070044/diff/20001/rules_parser.py#newcode158 rules_parser.py:158: clazz = getattr(module, classname) assert clazz issubclass of rule.Rule
Sign in to reply to this message.
Hi, I've added the minimal replay.py and httpproxy.py integration to this review, to better point out exactly where the rules are applied. On 2015/07/28 20:23:53, nednguyen wrote: > https://codereview.appspot.com/236070044/diff/20001/rules/log_url.py > File rules/log_url.py (right): > > https://codereview.appspot.com/236070044/diff/20001/rules/log_url.py#newcode39 > rules/log_url.py:39: def ApplyRule(self, return_value, request, _): > Instead of _, you can just do response & do "del response # unused" Changed to use "# pylint: disable=unused-argument". > > https://codereview.appspot.com/236070044/diff/20001/rules/rule.py > File rules/rule.py (right): > > https://codereview.appspot.com/236070044/diff/20001/rules/rule.py#newcode18 > rules/rule.py:18: """An optional base class for rule implementations. > Why optional? I usually choose to be as conservative about what APIs we support > as possible. Acknowledged but see below. > > https://codereview.appspot.com/236070044/diff/20001/rules/rule.py#newcode21 > rules/rule.py:21: rules are not strictly required to extend this class. > Ummm, let's just enforce that all rules must implement this interface. We > usually try to avoid duck typing. Acknowledged but I prefer my design, which avoids the "from rules import rule". My design encapsulates the entire "rules/" package, which can (and SHOULD) be ignored by the top-level classes (replay.py, httpproxy.py, ...). > > https://codereview.appspot.com/236070044/diff/20001/rules_parser.py > File rules_parser.py (right): > > https://codereview.appspot.com/236070044/diff/20001/rules_parser.py#newcode18 > rules_parser.py:18: [{"comment": ignored_value}, > I like this > > https://codereview.appspot.com/236070044/diff/20001/rules_parser.py#newcode20 > rules_parser.py:20: {"rule_class_name2": {"arg1": value, "arg2": value, ...}}, > You may want add documentation for the ordering of applying these rules. No, rules aren't (exactly) applied in order. All rules of a given type (e.g., "log_url") ARE applied in order, but different rules are applied independently. E.g.: [{"RuleA": ...}, {"RuleB": ...}] doesn't imply that RuleA runs before RuleB, because (e.g.) the implementation in httpproxy is: apply all RuleB's ... do stuff ... apply all RuleA's See my next comment for more info. > > https://codereview.appspot.com/236070044/diff/20001/rules_parser.py#newcode63 > rules_parser.py:63: def Find(self, rule_type_name): > Why do we need this API? I thought all we need is an Apply(requrest, response) > method that will chain all the rules contain in this together? No, rules are applied at different points in the code. You can't simply apply them all in one spot. My original pull request (May 5th) shows the exact spots in both httpclient.py and httpproxy.py: https://github.com/chromium/web-page-replay/pull/49/files#diff-3 https://github.com/chromium/web-page-replay/pull/49/files#diff-2 > > https://codereview.appspot.com/236070044/diff/20001/rules_parser.py#newcode86 > rules_parser.py:86: class _Rule(object): > Not sure why we need this? Can the callsite just do: > > Rules(...).Find('log_url').ApplyRule(...)? No, you can have more than one 'log_url' rule, so the _Rule iterates over the list and calls ApplyRule on each one. Also, the Rule API is: ApplyRule(return_value, request, response) --> (should_stop, return_value) not: ApplyRule(request, response) --> return_value The _Rule handles the "should_stop" handling. The separate return_value allows chained rules to incrementally build the return_value, e.g.: https://github.com/chromium/web-page-replay/pull/49/files#diff-13 where each RemoveHeader in the chain can modify the pending return_value. > > https://codereview.appspot.com/236070044/diff/20001/rules_parser.py#newcode158 > rules_parser.py:158: clazz = getattr(module, classname) > assert clazz issubclass of rule.Rule No, I prefer the duck typing, as noted above.
Sign in to reply to this message.
On 2015/07/29 20:28:26, zwr wrote: > Hi, I've added the minimal replay.py and httpproxy.py integration to this > review, to better point out exactly where the rules are applied. > > On 2015/07/28 20:23:53, nednguyen wrote: > > https://codereview.appspot.com/236070044/diff/20001/rules/log_url.py > > File rules/log_url.py (right): > > > > https://codereview.appspot.com/236070044/diff/20001/rules/log_url.py#newcode39 > > rules/log_url.py:39: def ApplyRule(self, return_value, request, _): > > Instead of _, you can just do response & do "del response # unused" > > Changed to use "# pylint: disable=unused-argument". > > > > > https://codereview.appspot.com/236070044/diff/20001/rules/rule.py > > File rules/rule.py (right): > > > > https://codereview.appspot.com/236070044/diff/20001/rules/rule.py#newcode18 > > rules/rule.py:18: """An optional base class for rule implementations. > > Why optional? I usually choose to be as conservative about what APIs we > support > > as possible. > > Acknowledged but see below. > > > > > https://codereview.appspot.com/236070044/diff/20001/rules/rule.py#newcode21 > > rules/rule.py:21: rules are not strictly required to extend this class. > > Ummm, let's just enforce that all rules must implement this interface. We > > usually try to avoid duck typing. > > Acknowledged but I prefer my design, which avoids the "from rules import rule". > > My design encapsulates the entire "rules/" package, which can (and SHOULD) be > ignored by the top-level classes (replay.py, httpproxy.py, ...). > > > > > https://codereview.appspot.com/236070044/diff/20001/rules_parser.py > > File rules_parser.py (right): > > > > https://codereview.appspot.com/236070044/diff/20001/rules_parser.py#newcode18 > > rules_parser.py:18: [{"comment": ignored_value}, > > I like this > > > > https://codereview.appspot.com/236070044/diff/20001/rules_parser.py#newcode20 > > rules_parser.py:20: {"rule_class_name2": {"arg1": value, "arg2": value, ...}}, > > You may want add documentation for the ordering of applying these rules. > > No, rules aren't (exactly) applied in order. All rules of a given type (e.g., > "log_url") > ARE applied in order, but different rules are applied independently. > > E.g.: > [{"RuleA": ...}, {"RuleB": ...}] > doesn't imply that RuleA runs before RuleB, because (e.g.) the implementation in > httpproxy is: > apply all RuleB's > ... do stuff ... > apply all RuleA's > > See my next comment for more info. > > > > > https://codereview.appspot.com/236070044/diff/20001/rules_parser.py#newcode63 > > rules_parser.py:63: def Find(self, rule_type_name): > > Why do we need this API? I thought all we need is an Apply(requrest, response) > > method that will chain all the rules contain in this together? > > No, rules are applied at different points in the code. You can't simply apply > them all > in one spot. My original pull request (May 5th) shows the exact spots in both > httpclient.py and httpproxy.py: > https://github.com/chromium/web-page-replay/pull/49/files#diff-3 > https://github.com/chromium/web-page-replay/pull/49/files#diff-2 > > > > > https://codereview.appspot.com/236070044/diff/20001/rules_parser.py#newcode86 > > rules_parser.py:86: class _Rule(object): > > Not sure why we need this? Can the callsite just do: > > > > Rules(...).Find('log_url').ApplyRule(...)? > > No, you can have more than one 'log_url' rule, so the _Rule iterates over the > list and > calls ApplyRule on each one. > > Also, the Rule API is: > ApplyRule(return_value, request, response) --> (should_stop, return_value) > not: > ApplyRule(request, response) --> return_value > The _Rule handles the "should_stop" handling. The separate return_value allows > chained rules to incrementally build the return_value, e.g.: > https://github.com/chromium/web-page-replay/pull/49/files#diff-13 > where each RemoveHeader in the chain can modify the pending return_value. > > > > > https://codereview.appspot.com/236070044/diff/20001/rules_parser.py#newcode158 > > rules_parser.py:158: clazz = getattr(module, classname) > > assert clazz issubclass of rule.Rule > > No, I prefer the duck typing, as noted above. Can you reply to my comments in the original places so it's easier to follow the discussion?
Sign in to reply to this message.
Replied in original places https://codereview.appspot.com/236070044/diff/20001/rules/log_url.py File rules/log_url.py (right): https://codereview.appspot.com/236070044/diff/20001/rules/log_url.py#newcode39 rules/log_url.py:39: def ApplyRule(self, return_value, request, _): Changed to use "# pylint: disable=unused-argument". https://codereview.appspot.com/236070044/diff/20001/rules/rule.py File rules/rule.py (right): https://codereview.appspot.com/236070044/diff/20001/rules/rule.py#newcode18 rules/rule.py:18: """An optional base class for rule implementations. Acknowledged but see below. https://codereview.appspot.com/236070044/diff/20001/rules/rule.py#newcode21 rules/rule.py:21: rules are not strictly required to extend this class. Acknowledged but I prefer my design, which avoids the "from rules import rule". My design encapsulates the entire "rules/" package, which can (and SHOULD) be ignored by the top-level classes (replay.py, httpproxy.py, ...). https://codereview.appspot.com/236070044/diff/20001/rules_parser.py File rules_parser.py (right): https://codereview.appspot.com/236070044/diff/20001/rules_parser.py#newcode18 rules_parser.py:18: [{"comment": ignored_value}, On 2015/07/28 20:23:53, nednguyen wrote: > I like this Acknowledged. https://codereview.appspot.com/236070044/diff/20001/rules_parser.py#newcode20 rules_parser.py:20: {"rule_class_name2": {"arg1": value, "arg2": value, ...}}, No, rules aren't (exactly) applied in order. All rules of a given type (e.g., "log_url") ARE applied in order, but different rules are applied independently. E.g.: [{"RuleA": ...}, {"RuleB": ...}] doesn't imply that RuleA runs before RuleB, because (e.g.) the implementation in httpproxy is: apply all RuleB's ... do stuff ... apply all RuleA's See my next comment for more info. https://codereview.appspot.com/236070044/diff/20001/rules_parser.py#newcode63 rules_parser.py:63: def Find(self, rule_type_name): No, rules are applied at different points in the code. You can't simply apply them all in one spot. My original pull request (May 5th) shows the exact spots in both httpclient.py and httpproxy.py: https://github.com/chromium/web-page-replay/pull/49/files#diff-3 https://github.com/chromium/web-page-replay/pull/49/files#diff-2 https://codereview.appspot.com/236070044/diff/20001/rules_parser.py#newcode86 rules_parser.py:86: class _Rule(object): No, you can have more than one 'log_url' rule, so the _Rule iterates over the list and calls ApplyRule on each one. Also, the Rule API is: ApplyRule(return_value, request, response) --> (should_stop, return_value) not: ApplyRule(request, response) --> return_value The _Rule handles the "should_stop" handling. The separate return_value allows chained rules to incrementally build the return_value, e.g.: https://github.com/chromium/web-page-replay/pull/49/files#diff-13 https://codereview.appspot.com/236070044/diff/20001/rules_parser.py#newcode158 rules_parser.py:158: clazz = getattr(module, classname) No, I prefer the duck typing, as noted above.
Sign in to reply to this message.
I don't understand the part about why you want to avoid "from rules import rule" (which can be done by moving class Rule into rules/__init__.py) & how using duck typing allows to encapsulate rule classes in the rules package. https://codereview.appspot.com/236070044/diff/80001/documentation/Rules.md File documentation/Rules.md (right): https://codereview.appspot.com/236070044/diff/80001/documentation/Rules.md#ne... documentation/Rules.md:85: Add comments about how all the rules of the same type are chained together and applied in the same order as they appear in the my_rules.json https://codereview.appspot.com/236070044/diff/80001/rules/log_url.py File rules/log_url.py (right): https://codereview.appspot.com/236070044/diff/80001/rules/log_url.py#newcode39 rules/log_url.py:39: def ApplyRule(self, return_value, request, unused_response): keep the name "repsonse" and add "del response" below. We deliberately use this style to avoid the situation where callsite does "ApplyRule(..., response=Foo())" which will explode if you change the param name. https://codereview.appspot.com/236070044/diff/80001/rules_parser.py File rules_parser.py (right): https://codereview.appspot.com/236070044/diff/80001/rules_parser.py#newcode89 rules_parser.py:89: def __init__(self, *rules): nit: just use (self, rules) is good enough. https://codereview.appspot.com/236070044/diff/80001/rules_parser.py#newcode158 rules_parser.py:158: clazz = getattr(module, classname) If you don't ignore the case clazz not subclass of rule.Rule, what if someone specifies the "rule" as the modulename?
Sign in to reply to this message.
On 2015/07/29 21:51:21, nednguyen wrote: > I don't understand the part about why you want to avoid "from rules import rule" > (which can be done by moving class Rule into rules/__init__.py) & how using duck > typing allows to encapsulate rule classes in the rules package. The duck typing makes it easier to define new rules in completely separate repos. E.g., in repoX, I can create: my_rules/__init__.py # from my_rules.my_rule import MyRule my_rules/my_rule.py # class MyRule(object): def IsType(...): ...; def ApplyRule(...): ... and allow them via: python replay.py ... --allowed_rule_imports=rules,my_rules where this option defaults to "rules".
Sign in to reply to this message.
https://codereview.appspot.com/236070044/diff/80001/documentation/Rules.md File documentation/Rules.md (right): https://codereview.appspot.com/236070044/diff/80001/documentation/Rules.md#ne... documentation/Rules.md:85: On 2015/07/29 21:51:21, nednguyen wrote: > Add comments about how all the rules of the same type are chained together and > applied in the same order as they appear in the my_rules.json Done. https://codereview.appspot.com/236070044/diff/80001/rules/log_url.py File rules/log_url.py (right): https://codereview.appspot.com/236070044/diff/80001/rules/log_url.py#newcode39 rules/log_url.py:39: def ApplyRule(self, return_value, request, unused_response): On 2015/07/29 21:51:21, nednguyen wrote: > keep the name "repsonse" and add "del response" below. We deliberately use this > style to avoid the situation where callsite does "ApplyRule(..., > response=Foo())" which will explode if you change the param name. Done, very cool. I did a quick code search for ^\s+del\s\w+$ and didn't find many uses, but I can't think of any downsides and there are several upsides (e.g., detects name changes). https://codereview.appspot.com/236070044/diff/80001/rules_parser.py File rules_parser.py (right): https://codereview.appspot.com/236070044/diff/80001/rules_parser.py#newcode89 rules_parser.py:89: def __init__(self, *rules): On 2015/07/29 21:51:21, nednguyen wrote: > nit: just use (self, rules) is good enough. Done, though I also had to change the above caller from _Rule(*matches) to _Rule(matches). https://codereview.appspot.com/236070044/diff/80001/rules_parser.py#newcode158 rules_parser.py:158: clazz = getattr(module, classname) On 2015/07/29 21:51:21, nednguyen wrote: > If you don't ignore the case clazz not subclass of rule.Rule, what if someone > specifies the "rule" as the modulename? E.g., [{'rules.log_url': {}}] ? This'll raise an error: 1: rules.log_url lacks IsType and ApplyRule If I hack rules/log_url.py to add: def IsType(): pass def ApplyRule(): pass def __call__(): pass then I'll still get an error: .... rule = clazz(**args) TypeError: 'module' object is not callable So I don't think this is a problem. Someone could modify rules/log_url.py from: class LogUrl(rule.Rule): ... to: class LogUrl(object): ... but I'm explicitly allowing this, as previously noted.
Sign in to reply to this message.
On 2015/07/30 22:13:34, zwr wrote: > On 2015/07/29 21:51:21, nednguyen wrote: > > I don't understand the part about why you want to avoid "from rules import > rule" > > (which can be done by moving class Rule into rules/__init__.py) & how using > duck > > typing allows to encapsulate rule classes in the rules package. > > The duck typing makes it easier to define new rules in completely separate > repos. > > E.g., in repoX, I can create: > my_rules/__init__.py # from my_rules.my_rule import MyRule > my_rules/my_rule.py # class MyRule(object): def IsType(...): ...; def > ApplyRule(...): ... > and allow them via: > python replay.py ... --allowed_rule_imports=rules,my_rules > where this option defaults to "rules". This is cool & valid use case.
Sign in to reply to this message.
lgtm
Sign in to reply to this message.
Thanks, I've created a pull request: https://github.com/chromium/web-page-replay/pull/52
Sign in to reply to this message.
|