|
|
|
Created:
12 years, 9 months ago by ihab.awad Modified:
12 years, 8 months ago CC:
google-caja-discuss_googlegroups.com Base URL:
http://google-caja.googlecode.com/svn/trunk/ Visibility:
Public. |
DescriptionTest incoming guest source for embedded Unicode characters. If it does
have these, we force a re-rendering from a parse tree, which turns any
such characters inside string literals into escaped strings. If any such
characters remain, we go ahead and fail as before
Patch Set 1 #Patch Set 2 : Support literal Unicode in string literals in ES5 #
Total comments: 16
Patch Set 3 : Support literal Unicode in string literals in ES5 #
Total comments: 2
Patch Set 4 : Support literal Unicode in string literals in ES5 #
Total comments: 1
Patch Set 5 : Support literal Unicode in string literals in ES5 #
MessagesTotal messages: 23
Test incoming guest source for embedded Unicode characters. If it does have these, we force a re-rendering from a parse tree, which turns any such characters inside string literals into escaped strings. If any such characters remain, we go ahead and fail as before
Sign in to reply to this message.
https://codereview.appspot.com/10205043/diff/3001/src/com/google/caja/ses/sta... File src/com/google/caja/ses/startSES.js (right): https://codereview.appspot.com/10205043/diff/3001/src/com/google/caja/ses/sta... src/com/google/caja/ses/startSES.js:181: * limited character set that SES should process; otherwise, do nothing. How about making this a method hanging off of atLeastFreeVarNames, instead, since the charset is more about atLeastFreeVarNames's self-confidence than anything necessary for SES in general? https://codereview.appspot.com/10205043/diff/3001/src/com/google/caja/ses/sta... src/com/google/caja/ses/startSES.js:934: } catch (e) { Please don't use exceptions for control flow; it will be tedious in debugging. Add the information to the return value. https://codereview.appspot.com/10205043/diff/3001/tests/com/google/caja/plugi... File tests/com/google/caja/plugin/es53-test-cajajs-invocation.js (right): https://codereview.appspot.com/10205043/diff/3001/tests/com/google/caja/plugi... tests/com/google/caja/plugin/es53-test-cajajs-invocation.js:242: if (inES5Mode) Don't do this any more; use jsunitRegisterIf instead. https://codereview.appspot.com/10205043/diff/3001/tests/com/google/caja/plugi... tests/com/google/caja/plugin/es53-test-cajajs-invocation.js:243: jsunitRegister('testUnicodeInCodeFails', function testUnicodeInCodeFails() { I think this test case is more appropriately grouped in es53-test-language-guest.html. It doesn't need to be a separate load since script blocks parse and fail independently, and also since it is ES5-mode only you can just invoke eval instead (it's a reasonable assumption, I would say, that <script> evals go through the visible SES eval operations). https://codereview.appspot.com/10205043/diff/3001/tests/com/google/caja/plugi... tests/com/google/caja/plugin/es53-test-cajajs-invocation.js:246: '<script> var te\ufeffst = 1; x++; </script>' + // should not run I would put the x++ first, just on the off chance the other thing is misinterpreted as something valid but which throws at runtime. https://codereview.appspot.com/10205043/diff/3001/tests/com/google/caja/plugi... tests/com/google/caja/plugin/es53-test-cajajs-invocation.js:250: caja.load(div, xhrUriPolicy, function (frame) { no space before ( https://codereview.appspot.com/10205043/diff/3001/tests/com/google/caja/plugi... tests/com/google/caja/plugin/es53-test-cajajs-invocation.js:257: .run(function(result) { This should be a jsunitCallback. https://codereview.appspot.com/10205043/diff/3001/tests/com/google/caja/plugi... tests/com/google/caja/plugin/es53-test-cajajs-invocation.js:264: if (inES5Mode) ditto on all above comments
Sign in to reply to this message.
Before commenting on the specifics, I have a more general question. Why not do something like limitSrcCharset, encoding all unicode characters, but do it unconditionally, not as a test, but as a parserless source-to-source rewrite prior to other mitigations and calling atLeastFreeVarNames? Perhaps in the mitigateSrcGotchas wrapper function in startSES (as opposed to the lower level one in mitigateGotchas, which is not necessarily available)? If this or something like it could work it would be a lot simpler and cheaper.
Sign in to reply to this message.
Interesting. So to recap: what I want to do is encode Unicode characters *inside string literals* only. The parse tree rendering does this for me for free; I just have to force it to happen when I see Unicode, which is why I use the LIMIT_SRC regex. What would be the alternative?
Sign in to reply to this message.
I know I'm probably asking something obvious, but this whole Unicode area is something I haven't paid much attention to. What I'm asking is, what's the harm in escaping Unicode characters everywhere, whether in string literals or not? Doing so will also escape those in string literals. And this can be done without parsing, and so even when a parser is absent. On Wed, Jun 12, 2013 at 7:42 AM, <ihab.awad@gmail.com> wrote: > Interesting. So to recap: what I want to do is encode Unicode characters > *inside string literals* only. The parse tree rendering does this for me > for free; I just have to force it to happen when I see Unicode, which is > why I use the LIMIT_SRC regex. What would be the alternative? > > > https://codereview.appspot.**com/10205043/<https://codereview.appspot.com/102... > > -- > > ---You received this message because you are subscribed to the Google > Groups "Google Caja Discuss" group. > To unsubscribe from this group and stop receiving emails from it, send an > email to google-caja-discuss+**unsubscribe@googlegroups.com<google-caja-discuss%2Bunsu... > . > For more options, visit https://groups.google.com/**groups/opt_out<https://groups.google.com/groups/o... > . > > > -- Cheers, --MarkM
Sign in to reply to this message.
Ah I understand now. Right. Given input where [U] stands for an embedded Unicode character: var te[U]st = 1; var x = 'te[U]st'; The resulting, indiscriminately escaped code will look something like: var te\uFEFFst = 1; var x = 'te\uFEFFst'; We would then throw a syntax error on the first line. 1. It would be a somewhat whacky error, and hard to trace back to the original issue. 2. We would have to reproduce the Unicode escaping logic somewhere in our code, whereas the parse tree renderer already has it for us. 3. We would have to worry about whether this constitutes "rewriting" and warrants adding our boilerplate comment or not. So to clarify: This CL is really a hack to support a particular jQuery file. The file was originally written with escaped Unicode in string literals. However, it was subsequently minified, and the minifier turned these into literal UTF-8 sequences. We would like to support this file (and others like it) as-is so we can do automated testing of jQuery versions without having to change the jQuery code.... I dunno -- it seems to me that, if SES operates without a parser, it's not unreasonable to say, please give us "clean" code kthxbai! ...
Sign in to reply to this message.
On Wed, Jun 12, 2013 at 8:07 AM, <ihab.awad@gmail.com> wrote: > Ah I understand now. Right. Given input where [U] stands for an embedded > Unicode character: > > var te[U]st = 1; > var x = 'te[U]st'; > > The resulting, indiscriminately escaped code will look something like: > > var te\uFEFFst = 1; > var x = 'te\uFEFFst'; > > We would then throw a syntax error on the first line. > > 1. It would be a somewhat whacky error, and hard to trace back to the > original issue. > > 2. We would have to reproduce the Unicode escaping logic somewhere in > our code, whereas the parse tree renderer already has it for us. > What about the escaping logic that's already in limitSrcCharset ? > > 3. We would have to worry about whether this constitutes "rewriting" and > warrants adding our boilerplate comment or not. > > So to clarify: This CL is really a hack to support a particular jQuery > file. The file was originally written with escaped Unicode in string > literals. However, it was subsequently minified, and the minifier turned > these into literal UTF-8 sequences. We would like to support this file > (and others like it) as-is so we can do automated testing of jQuery > versions without having to change the jQuery code.... > > I dunno -- it seems to me that, if SES operates without a parser, it's > not unreasonable to say, please give us "clean" code kthxbai! ... > > https://codereview.appspot.**com/10205043/<https://codereview.appspot.com/102... > -- Cheers, --MarkM
Sign in to reply to this message.
On 2013/06/12 15:16:48, Mark S. Miller wrote: > What about the escaping logic that's already in limitSrcCharset ? Hm. Ok, so I guess my question is, if we can do that, why did we choose the more conservative approach (of unconditionally rejecting anything with weird whitespace, anywhere in the program) earlier? Was it simply more conservative than necessary? My inclination is to say, yes it was more conservative than necessary, but just wondering.
Sign in to reply to this message.
[+msamuel] Mike? Was this you? On Wed, Jun 12, 2013 at 9:04 AM, <ihab.awad@gmail.com> wrote: > > On 2013/06/12 15:16:48, Mark S. Miller wrote: > >> What about the escaping logic that's already in limitSrcCharset ? >> > > Hm. Ok, so I guess my question is, if we can do that, why did we choose > the more conservative approach (of unconditionally rejecting anything > with weird whitespace, anywhere in the program) earlier? Was it simply > more conservative than necessary? My inclination is to say, yes it was > more conservative than necessary, but just wondering. > > https://codereview.appspot.**com/10205043/<https://codereview.appspot.com/102... > -- Cheers, --MarkM
Sign in to reply to this message.
On 2013/06/12 15:07:54, ihab.awad wrote: > Ah I understand now. Right. Given input where [U] stands for an embedded Unicode > character: > > var te[U]st = 1; > var x = 'te[U]st'; > > The resulting, indiscriminately escaped code will look something like: > > var te\uFEFFst = 1; > var x = 'te\uFEFFst'; > > We would then throw a syntax error on the first line. er, that isn't a syntax error. it's legal and does the right thing.
Sign in to reply to this message.
On 2013/06/12 16:29:11, felix8a wrote:
> er, that isn't a syntax error. it's legal and does the right thing.
True! :) 'Cuz it's in an identifier. So that leaves more questions ...
1. We'd need to do the right thing with atLeastFreeVarNames given an identifier
that has a Unicode escape sequence. We would need to put into the scope object
of our defensive "with () { ... }", not a property name with a backslash-u, but
a property name with an actual Unicode value in it.
2. We'd need to know that a browser's interpreter, which by the comments in the
original code cannot be trusted to do the right thing with "var te[u]st", will
do the right thing with "var te\uFEFFst".
3. We'd need to figure out what happens when Unicode characters are not within
an identifier. In other words, if we get:
var test [U] = 1;
and we rewrite it to:
var test \uFEFF = 1;
then what happens? This is *not* legal and should fail. At the very least, we'd
need to know that it fails safe (in other words, in whatever way is necessary so
that atLeastFreeVarNames and the defensive "with() { ... }" is not defeated).
Sign in to reply to this message.
And actually --
On 2013/06/12 16:35:30, ihab.awad wrote:
> 1. We'd need to do the right thing with atLeastFreeVarNames given an
identifier
> that has a Unicode escape sequence. We would need to put into the scope object
> of our defensive "with () { ... }", not a property name with a backslash-u,
but
> a property name with an actual Unicode value in it.
By that logic alone, atLeastFreeVarNames as currently written would fail unsafe,
because it would create a property name with a backslash-u in it. We'd have to
rewrite atLeastFreeVarNames to actually parse and interpret the backslash-u
sequences in the candidate identifiers it finds, before setting them as
properties on the scope object of the defensive "with() { ... }".
Sign in to reply to this message.
On 2013/06/12 16:37:42, ihab.awad wrote: > By that logic alone, atLeastFreeVarNames as currently written would > fail unsafe ... Based on this realization, I am inclined to leave the basic form of this CL as-is. => erights@ ? <=
Sign in to reply to this message.
On 2013/06/12 18:42:07, ihab.awad wrote: > On 2013/06/12 16:37:42, ihab.awad wrote: > > By that logic alone, atLeastFreeVarNames as currently written would > > fail unsafe ... > > Based on this realization, I am inclined to leave the basic form of this CL > as-is. > > => erights@ ? <= I don't believe we yet have an adequate answer to my question. If we can get effect we need without parsing, that would be much better. But if we're not gonna resolve that soon, I don't wanna block on that. So, LGTM.
Sign in to reply to this message.
Test incoming guest source for embedded Unicode characters. If it does have these, we force a re-rendering from a parse tree, which turns any such characters inside string literals into escaped strings. If any such characters remain, we go ahead and fail as before
Sign in to reply to this message.
https://codereview.appspot.com/10205043/diff/3001/src/com/google/caja/ses/sta... File src/com/google/caja/ses/startSES.js (right): https://codereview.appspot.com/10205043/diff/3001/src/com/google/caja/ses/sta... src/com/google/caja/ses/startSES.js:181: * limited character set that SES should process; otherwise, do nothing. On 2013/06/11 23:48:49, kpreid_google wrote: > How about making this a method hanging off of atLeastFreeVarNames, instead, > since the charset is more about atLeastFreeVarNames's self-confidence than > anything necessary for SES in general? I'd honestly rather not mess with things that way, hanging functions off functions or making the data structure of atLeastFreeVarNames more complicated. Yes this exposes some gory details, but so be it. https://codereview.appspot.com/10205043/diff/3001/src/com/google/caja/ses/sta... src/com/google/caja/ses/startSES.js:934: } catch (e) { On 2013/06/11 23:48:49, kpreid_google wrote: > Please don't use exceptions for control flow; it will be tedious in debugging. > Add the information to the return value. Done. https://codereview.appspot.com/10205043/diff/3001/tests/com/google/caja/plugi... File tests/com/google/caja/plugin/es53-test-cajajs-invocation.js (right): https://codereview.appspot.com/10205043/diff/3001/tests/com/google/caja/plugi... tests/com/google/caja/plugin/es53-test-cajajs-invocation.js:242: if (inES5Mode) On 2013/06/11 23:48:49, kpreid_google wrote: > Don't do this any more; use jsunitRegisterIf instead. Done. https://codereview.appspot.com/10205043/diff/3001/tests/com/google/caja/plugi... tests/com/google/caja/plugin/es53-test-cajajs-invocation.js:243: jsunitRegister('testUnicodeInCodeFails', function testUnicodeInCodeFails() { Hm. Good point but this way, it's in a JS file rather than HTML, so it seems that the encoding issues might be less scary. I'm tending to leave as-is.... https://codereview.appspot.com/10205043/diff/3001/tests/com/google/caja/plugi... tests/com/google/caja/plugin/es53-test-cajajs-invocation.js:246: '<script> var te\ufeffst = 1; x++; </script>' + // should not run On 2013/06/11 23:48:49, kpreid_google wrote: > I would put the x++ first, just on the off chance the other thing is > misinterpreted as something valid but which throws at runtime. Done. https://codereview.appspot.com/10205043/diff/3001/tests/com/google/caja/plugi... tests/com/google/caja/plugin/es53-test-cajajs-invocation.js:250: caja.load(div, xhrUriPolicy, function (frame) { On 2013/06/11 23:48:49, kpreid_google wrote: > no space before ( Done. https://codereview.appspot.com/10205043/diff/3001/tests/com/google/caja/plugi... tests/com/google/caja/plugin/es53-test-cajajs-invocation.js:257: .run(function(result) { On 2013/06/11 23:48:49, kpreid_google wrote: > This should be a jsunitCallback. Done. https://codereview.appspot.com/10205043/diff/3001/tests/com/google/caja/plugi... tests/com/google/caja/plugin/es53-test-cajajs-invocation.js:264: if (inES5Mode) On 2013/06/11 23:48:49, kpreid_google wrote: > ditto on all above comments Done.
Sign in to reply to this message.
https://codereview.appspot.com/10205043/diff/20001/src/com/google/caja/ses/st... File src/com/google/caja/ses/startSES.js (right): https://codereview.appspot.com/10205043/diff/20001/src/com/google/caja/ses/st... src/com/google/caja/ses/startSES.js:934: if (limitSrcCharset(modSrc).error) { This condition feels too fragile: it will quietly fail insecure if we change the format of the return value of limitSrcCharset such that it no longer has a 'error' field and don't update this to match. For example, the alternative without changing anything else would be !('programSrc' in limitSrcCharset(modSrc)), but that doesn't look particularly nice — not sure what to do.
Sign in to reply to this message.
Test incoming guest source for embedded Unicode characters. If it does have these, we force a re-rendering from a parse tree, which turns any such characters inside string literals into escaped strings. If any such characters remain, we go ahead and fail as before
Sign in to reply to this message.
https://codereview.appspot.com/10205043/diff/20001/src/com/google/caja/ses/st... File src/com/google/caja/ses/startSES.js (right): https://codereview.appspot.com/10205043/diff/20001/src/com/google/caja/ses/st... src/com/google/caja/ses/startSES.js:934: if (limitSrcCharset(modSrc).error) { On 2013/07/01 19:14:49, kpreid_google wrote: > This condition feels too fragile: it will quietly fail insecure if we change the > format of the return value of limitSrcCharset such that it no longer has a > 'error' field and don't update this to match. > > For example, the alternative without changing anything else would be > !('programSrc' in limitSrcCharset(modSrc)), but that doesn't look particularly > nice — not sure what to do. Done.
Sign in to reply to this message.
LGTM https://codereview.appspot.com/10205043/diff/37001/src/com/google/caja/ses/st... File src/com/google/caja/ses/startSES.js (right): https://codereview.appspot.com/10205043/diff/37001/src/com/google/caja/ses/st... src/com/google/caja/ses/startSES.js:198: * feature parity with the ES5/3 runtime at the price of larger Not part of this CL but this comment caught my eye. We should no longer be thinking of ES5/3 as the source of our chosen feature set.
Sign in to reply to this message.
Test incoming guest source for embedded Unicode characters. If it does have these, we force a re-rendering from a parse tree, which turns any such characters inside string literals into escaped strings. If any such characters remain, we go ahead and fail as before
Sign in to reply to this message.
|
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
