Returns an expression that is structurally the same as e, but with
identifiers renamed using the alpha renaming.
If the input expression contains synthetic references or declarations, then
those synthetic identifiers will not be rewritten.
So after alpha renaming, matching references and declarations with
non-synthetic identifiers is trivial.
This renaming does not affect property names, so any relationship between
a property name and a global variable will be broken. For this reason,
all free variables must be specified in the input context or
freeSynthetics set.
Submitted @3729
I have not looked in detail at the rules in AlphaRenamingRewriter or the tests in ...
16 years, 6 months ago
(2009-09-10 04:11:19 UTC)
#2
I have not looked in detail at the rules in AlphaRenamingRewriter or the tests
in AlphaRenamingTest.
Will do these next. Meanwhile, here are the remarks I have so far.
http://codereview.appspot.com/110096/diff/3023/3028
File src/com/google/caja/parser/quasiliteral/AlphaRenaming.java (right):
http://codereview.appspot.com/110096/diff/3023/3028#newcode48
Line 48: * that has no masking identifiers, and that only has free variables
within an
What is a "masking identifier"?
http://codereview.appspot.com/110096/diff/3023/3028#newcode65
Line 65: * non-synthetic identifiers is trivial.
You seem to be describing some process but I don't get the flow. Perhaps a small
worked example would help? Also see my questions below about the API -- perhaps
an example here would clarify?
http://codereview.appspot.com/110096/diff/3023/3028#newcode75
Line 75: * used to generate new names for local variables.
Is this description correct? If so, I don't understand it.
http://codereview.appspot.com/110096/diff/3023/3028#newcode81
Line 81: Expression e, NameContext<String, ?> ctxt, Set<String> freeSynthetics,
First of all, I would have expected "ctxt" and "freeSynthetics" to be named,
respectively, something like "renamableEntryPoints" and
"nonRenamableEntryPoints" -- since that seems to be the distinction of interest
for the purposes of this class. If that would not be a useful nomenclature, then
I misunderstand the use case; perhaps that should be clarified in the class or
method docs.
That said, and more broadly, I don't understand the need for the "ctxt"
argument. Perhaps other code (yet to be submitted for review) clarifies this,
but it would be nice if this class could explain itself all by its lonesome. As
I see it, we have some chunk of code like the following, where the stuff in
double brackets is subject to renaming:
var a = 1, b;
b = [[ (function(theArg) { return theArg + 1; })(a); ]]
where I expect "a" and "b" are not renameable, since they are provided by the
stuff in which the [[ ... ]] expression is embedded, but "theArg" can be renamed
to something.
If so, then all we need for the API of this class here is just "freeSynthetics"
(aka "nonRenamableEntryPoints"), right? Or not? If not, why?
And finally, if the variables in the embedding ("a" and "b" in my example) are
themselves being alpha renamed, then why does this class not operate on
something bigger than just the teeny little Expression? Why not save all the
alpha renaming until you have a whole chunk (like, a Block or a Program or what
not) and run the renaming on that? This would presumably ensure that everything
got renamed in one go and all you'd need would be "freeSynthetics" (aka
"nonRenamableEntryPoints").
http://codereview.appspot.com/110096/diff/3023/3028#newcode103
Line 103: // If the input NameContext contains (foo => a, bar => b) then the
program
This creates in my mind more confusion about the input NameContext; see remarks
above.
http://codereview.appspot.com/110096/diff/3023/3028#newcode138
Line 138: if (!outers.contains(name) && isOuter(name, s)) {
Don't need "!outers.contains(name)" since "outers" is a Set.
http://codereview.appspot.com/110096/diff/3023/3028#newcode164
Line 164: final class AlphaRenamingRewriter extends Rewriter {
It would be nice if this were in a separate file....
http://codereview.appspot.com/110096/diff/3023/3027
File src/com/google/caja/parser/quasiliteral/NameContext.java (right):
http://codereview.appspot.com/110096/diff/3023/3027#newcode32
Line 32: * scoped, so we alpha-rename all variables to prevent collisions.
"... we alpha-rename all variables ..." seems overly general; do you mean only
the variables introduced by the IHTML?
http://codereview.appspot.com/110096/diff/3023/3027#newcode34
Line 34: * @param <NAME> a type that can work as a hashtable key
Also describe <BINDING> param.
http://codereview.appspot.com/110096/diff/3023/3027#newcode82
Line 82: public NameContext<NAME, BINDING> subScope() {
Rename this method to "newChildContext" or "makeChildContext"? Especially to
make it complementary to "getParentContext".
http://codereview.appspot.com/110096/diff/3023/3031
File src/com/google/caja/util/SafeIdentifierMaker.java (right):
http://codereview.appspot.com/110096/diff/3023/3031#newcode25
Line 25: * {@link SafeIdentifierMaker#isSafeIdentifier unsafe}
The method "isSafeIdentifier" and the doc comment "unsafe" have reverse Boolean
sense. :)
http://codereview.appspot.com/110096/diff/3023/3031#newcode26
Line 26: * identifiers.
One idea might be to describe here why generating a lexicographic sequence from
a custom alphabet is better than simply doing (say) "v0", "v1", "v2", ...,
"v3941", .... I assume the reason is that, with a larger alphabet (54 chars, as
opposed to 10 for decimal and 16 for hex), you can have smaller, "denser" names.
If so, a comment about this might help the reader get oriented as to the purpose
for this class.
http://codereview.appspot.com/110096/diff/3023/3031#newcode38
Line 38: this(charRanges('a', 'z', 'A', 'Z'));
Given that a larger alphabet means "denser" and shorter identifers, it behooves
one to choose the largest alphabet possible. If you make "isSafeIdentifier"
reject ones starting with a digit, you can use '0'..'9' as well, boosting your
|alphabet| from 54 to 64. Perhaps a few other ASCII characters could be
similarly pressed into service?
http://codereview.appspot.com/110096/diff/3023/3031#newcode58
Line 58: if (isSafeIdentifier(word)) { return word; }
This algorithm of "generate guff and throw away the bad eggs" is very good. :)
http://codereview.appspot.com/110096/diff/3023/3031#newcode70
Line 70: private static char[] charRanges(char... startsAndEnds) {
"assert((startsAndEnds %2) == 0)"? For documentation more than anything else....
http://codereview.appspot.com/110096/diff/3023/3024
File tests/com/google/caja/parser/quasiliteral/AlphaRenamingTest.java (right):
http://codereview.appspot.com/110096/diff/3023/3024#newcode303
Line 303: private void assertRenamed(String golden, String input, String...
globals)
Based on the rest of our tests, I'm used to the input coming first, then the
expected output. It was hard to read the tests until I reoriented myself.
snapshotted http://codereview.appspot.com/110096/diff/3023/3028 File src/com/google/caja/parser/quasiliteral/AlphaRenaming.java (right): http://codereview.appspot.com/110096/diff/3023/3028#newcode48 Line 48: * that has no masking identifiers, and ...
16 years, 6 months ago
(2009-09-10 19:14:44 UTC)
#3
snapshotted
http://codereview.appspot.com/110096/diff/3023/3028
File src/com/google/caja/parser/quasiliteral/AlphaRenaming.java (right):
http://codereview.appspot.com/110096/diff/3023/3028#newcode48
Line 48: * that has no masking identifiers, and that only has free variables
within an
On 2009/09/10 04:11:19, ihab.awad wrote:
> What is a "masking identifier"?
An identifier I' masks another identifier I if I' has the same name as I, and I'
is defined in a scope that is wholly contained by the scope in which I is
defined.
So the inner x masks the outer x.
var x = 4;
return (function (x) {
return x;
})(3);
After alpha renaming, this might become
var a = 4;
return (function (b) {
return b;
})(3);
http://codereview.appspot.com/110096/diff/3023/3028#newcode65
Line 65: * non-synthetic identifiers is trivial.
On 2009/09/10 04:11:19, ihab.awad wrote:
> You seem to be describing some process but I don't get the flow. Perhaps a
small
> worked example would help? Also see my questions below about the API --
perhaps
> an example here would clarify?
See the unittests for examples, and examples at
http://en.wikipedia.org/wiki/AlphaRenaminghttp://codereview.appspot.com/110096/diff/3023/3028#newcode75
Line 75: * used to generate new names for local variables.
On 2009/09/10 04:11:19, ihab.awad wrote:
> Is this description correct? If so, I don't understand it.
It is correct but unclear. I rewrote it.
http://codereview.appspot.com/110096/diff/3023/3028#newcode81
Line 81: Expression e, NameContext<String, ?> ctxt, Set<String> freeSynthetics,
On 2009/09/10 04:11:19, ihab.awad wrote:
> First of all, I would have expected "ctxt" and "freeSynthetics" to be named,
> respectively, something like "renamableEntryPoints" and
> "nonRenamableEntryPoints" -- since that seems to be the distinction of
interest
Those names sound fine, except that they're identifiers, not points.
> for the purposes of this class. If that would not be a useful nomenclature,
then
> I misunderstand the use case; perhaps that should be clarified in the class or
> method docs.
>
> That said, and more broadly, I don't understand the need for the "ctxt"
> argument. Perhaps other code (yet to be submitted for review) clarifies this,
I hope my rewrite of @param ctxt clarifies that.
> but it would be nice if this class could explain itself all by its lonesome.
As
> I see it, we have some chunk of code like the following, where the stuff in
> double brackets is subject to renaming:
>
> var a = 1, b;
> b = [[ (function(theArg) { return theArg + 1; })(a); ]]
>
> where I expect "a" and "b" are not renameable, since they are provided by the
> stuff in which the [[ ... ]] expression is embedded, but "theArg" can be
renamed
> to something.
So it's similar to that but a little more complex. I added an example at the
top of the file to clarify.
> If so, then all we need for the API of this class here is just
"freeSynthetics"
> (aka "nonRenamableEntryPoints"), right? Or not? If not, why?
No, because ctxt provides a mapping of old names to new, not just a set of
allowed names.
> And finally, if the variables in the embedding ("a" and "b" in my example) are
> themselves being alpha renamed, then why does this class not operate on
> something bigger than just the teeny little Expression? Why not save all the
> alpha renaming until you have a whole chunk (like, a Block or a Program or
what
> not) and run the renaming on that? This would presumably ensure that
everything
> got renamed in one go and all you'd need would be "freeSynthetics" (aka
> "nonRenamableEntryPoints").
Because when generating a program by building structure around user supplied
code, you have to be able to declare identifiers in the structure code and have
some guarantees that the user-defined snippets of code can't muck with those
identifiers you declare. If you generate all the code and then do the renaming,
it's simply too late.
http://codereview.appspot.com/110096/diff/3023/3028#newcode138
Line 138: if (!outers.contains(name) && isOuter(name, s)) {
On 2009/09/10 04:11:19, ihab.awad wrote:
> Don't need "!outers.contains(name)" since "outers" is a Set.
Done.
http://codereview.appspot.com/110096/diff/3023/3028#newcode164
Line 164: final class AlphaRenamingRewriter extends Rewriter {
On 2009/09/10 04:11:19, ihab.awad wrote:
> It would be nice if this were in a separate file....
Done.
http://codereview.appspot.com/110096/diff/3023/3027
File src/com/google/caja/parser/quasiliteral/NameContext.java (right):
http://codereview.appspot.com/110096/diff/3023/3027#newcode32
Line 32: * scoped, so we alpha-rename all variables to prevent collisions.
On 2009/09/10 04:11:19, ihab.awad wrote:
> "... we alpha-rename all variables ..." seems overly general; do you mean only
> the variables introduced by the IHTML?
Nope. We rename them all.
http://codereview.appspot.com/110096/diff/3023/3027#newcode34
Line 34: * @param <NAME> a type that can work as a hashtable key
On 2009/09/10 04:11:19, ihab.awad wrote:
> Also describe <BINDING> param.
Done.
http://codereview.appspot.com/110096/diff/3023/3027#newcode82
Line 82: public NameContext<NAME, BINDING> subScope() {
On 2009/09/10 04:11:19, ihab.awad wrote:
> Rename this method to "newChildContext" or "makeChildContext"? Especially to
> make it complementary to "getParentContext".
Done.
http://codereview.appspot.com/110096/diff/3023/3031
File src/com/google/caja/util/SafeIdentifierMaker.java (right):
http://codereview.appspot.com/110096/diff/3023/3031#newcode38
Line 38: this(charRanges('a', 'z', 'A', 'Z'));
On 2009/09/10 04:11:19, ihab.awad wrote:
> Given that a larger alphabet means "denser" and shorter identifers, it
behooves
> one to choose the largest alphabet possible. If you make "isSafeIdentifier"
> reject ones starting with a digit, you can use '0'..'9' as well, boosting your
> |alphabet| from 54 to 64. Perhaps a few other ASCII characters could be
> similarly pressed into service?
Sure. I want to make sure we don't output any identifiers that end with "__",
so maybe we should keep '_' out of the list.
The problem with skipping 0-9 for first character positions and not subsequent
is that we generate a string and reverse it so that we get a nice lexicographic
progression. That means that we don't know which letter is the last until we
generate it. There's probably an algorithm that does that but it's not a simple
variation on this one.
http://codereview.appspot.com/110096/diff/3023/3031#newcode70
Line 70: private static char[] charRanges(char... startsAndEnds) {
On 2009/09/10 04:11:19, ihab.awad wrote:
> "assert((startsAndEnds %2) == 0)"? For documentation more than anything
else....
Done and documented the method.
http://codereview.appspot.com/110096/diff/3023/3024
File tests/com/google/caja/parser/quasiliteral/AlphaRenamingTest.java (right):
http://codereview.appspot.com/110096/diff/3023/3024#newcode303
Line 303: private void assertRenamed(String golden, String input, String...
globals)
On 2009/09/10 04:11:19, ihab.awad wrote:
> Based on the rest of our tests, I'm used to the input coming first, then the
> expected output. It was hard to read the tests until I reoriented myself.
The signature for assertEquals is
void assertEquals(Object expected, Object actual)
and anybody reversing the arguments is wrong.
See
http://junit.sourceforge.net/javadoc/org/junit/Assert.html#assertEquals(java....
LGTM http://codereview.appspot.com/110096/diff/2030/3041 File src/com/google/caja/parser/quasiliteral/AlphaRenaming.java (right): http://codereview.appspot.com/110096/diff/2030/3041#newcode62 Line 62: * not to conflict with generated code ...
16 years, 6 months ago
(2009-09-14 20:13:18 UTC)
#4
http://codereview.appspot.com/110096/diff/2030/3041 File src/com/google/caja/parser/quasiliteral/AlphaRenaming.java (right): http://codereview.appspot.com/110096/diff/2030/3041#newcode62 Line 62: * not to conflict with generated code names. ...
16 years, 6 months ago
(2009-09-15 15:10:57 UTC)
#5
http://codereview.appspot.com/110096/diff/2030/3041
File src/com/google/caja/parser/quasiliteral/AlphaRenaming.java (right):
http://codereview.appspot.com/110096/diff/2030/3041#newcode62
Line 62: * not to conflict with generated code names.
On 2009/09/14 20:13:18, ihab.awad wrote:
> Perhaps explain why this class takes an Expression and not something else? Can
> just cut & paste from your review comments to me? ;)
Done.
http://codereview.appspot.com/110096/diff/2030/3041#newcode132
Line 132: Expression e, NameContext<String, ?> ctxt, Set<String> freeSynthetics,
On 2009/09/14 20:13:18, ihab.awad wrote:
> So you don't think it's a good idea to rename "ctxt" and "freeSynthetics"
> (whatever we end up calling them)?
Done.
http://codereview.appspot.com/110096/diff/2030/3040
File src/com/google/caja/parser/quasiliteral/AlphaRenamingRewriter.java (right):
http://codereview.appspot.com/110096/diff/2030/3040#newcode60
Line 60: matches="/* Reference */ @r",
On 2009/09/14 20:13:18, ihab.awad wrote:
> /* Expression */ ??
Done.
http://codereview.appspot.com/110096/diff/2030/3036
File tests/com/google/caja/parser/quasiliteral/AlphaRenamingTest.java (right):
http://codereview.appspot.com/110096/diff/2030/3036#newcode63
Line 63: public final void testPropertNames() throws Exception {
On 2009/09/14 20:13:18, ihab.awad wrote:
> testPropert*y*Names
Done.
http://codereview.appspot.com/110096/diff/2030/3036#newcode104
Line 104: );
On 2009/09/14 20:13:18, ihab.awad wrote:
> Move closing paren 1 line up?
Done.
http://codereview.appspot.com/110096/diff/2030/3036#newcode113
Line 113: + " var c = b;"
On 2009/09/14 20:13:18, ihab.awad wrote:
> Fwiw, the reason for this rewriting -- the addition of the function name as a
> "var" -- was not clear to me from reading the code. I suspect it has to do
with
> trying to regularize the underlying semantics of function name scoping in JS
...
> but how precisely? By capturing the reference to the enclosing function even
if
> that symbol is assigned to later on in the containing scope?
Exactly.
I'm trying to recognize that the name used to refer to the function inside it's
body is effectively a different declaration than the one introduced in its
parent scope.
Since they're different they have exactly the behavior you describe.
http://codereview.appspot.com/110096/diff/2030/3036#newcode178
Line 178: + " function c() {"
I think you identified a bug. The declaration
var c = b
should not be there since it clobbers the hoisted local declaration function
c().
In squarefree,
function f() { var f; return f; }
'undefined' === typeof f()
On 2009/09/14 20:13:18, ihab.awad wrote:
> What is the rationale for not coming up with *yet* another variable here, so
> that it would be --
>
> (function () {
> function b() {
> var c = b;
> function d() {
> var e = d;
> return e, a;
> }
> }
> return b;
> })
>
> ?
http://codereview.appspot.com/110096/diff/2030/3036#newcode233
Line 233: + " function c(e) { var d = c; return e * e; }"
On 2009/09/14 20:13:18, ihab.awad wrote:
> We should have a TODO for optimizing out the function name assignment "var" if
> that name is unused. It would be too messy to do now but maybe, in a future
> society, we would have a scope representation that would allow that to be done
> simply.
Done.
Addressed in CL 116114 since this was committed. http://codereview.appspot.com/110096/diff/4003/4006 File src/com/google/caja/lexer/Keyword.java (right): http://codereview.appspot.com/110096/diff/4003/4006#newcode118 Line 118: ...
16 years, 6 months ago
(2009-09-15 19:00:22 UTC)
#7
http://codereview.appspot.com/110096/diff/4003/4012 File src/com/google/caja/util/SafeIdentifierMaker.java (right): http://codereview.appspot.com/110096/diff/4003/4012#newcode64 Line 64: public boolean isSafeIdentifier(String ident) { David-Sarah responded: > ...
16 years, 6 months ago
(2009-09-16 02:47:04 UTC)
#8
http://codereview.appspot.com/110096/diff/4003/4012
File src/com/google/caja/util/SafeIdentifierMaker.java (right):
http://codereview.appspot.com/110096/diff/4003/4012#newcode64
Line 64: public boolean isSafeIdentifier(String ident) {
David-Sarah responded:
> Right, but I meant that there is a maintenance and security review
> issue with having these checks in multiple places.
Ok. So you're arguing for code consolidation. I'll poke around and see if I
can find a way to consolidate, location-wise at least, the various identifier
predicates.
Issue 110096: An Alpha Renamer that makes it safe to embed user code in generated code.
(Closed)
Created 16 years, 7 months ago by MikeSamuel
Modified 16 years, 6 months ago
Reviewers: ihab.awad, DavidSarah
Base URL: http://google-caja.googlecode.com/svn/trunk/
Comments: 65