The JS lexer elides backslash-newline sequences at an early stage,
before tokenizing. This seems to be invented: it's not in ECMAScript,
and I can't find any JS implementation that treats backslash-newline
as a continuation.
That behavior causes unexpected effects when a // comment ends with
a backslash, as in escodegen.js (described in issue 1868): the next
line is folded into the comment and silently dropped.
So this CL eliminates the weirdness.
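A quick way to confirm how a standard engine treats a // comment that ends in a backslash is to run such a program directly. This sketch builds the source as a string so the trailing backslash and the line feed are explicit; the variable names are illustrative only.

```javascript
// In the string below, "\\" is one backslash and "\n" is a real line feed,
// so the generated program is:
//   var x = 1; // trailing backslash \
//   x = 2;
//   return x;
const source = "var x = 1; // trailing backslash \\\nx = 2;\nreturn x;";

// A SingleLineComment ends at the next line terminator; a backslash inside
// the comment has no special meaning, so "x = 2;" runs as ordinary code.
const result = new Function(source)();

console.log(result); // 2
```

A lexer that spliced backslash-newline before tokenizing would fold `x = 2;` into the comment and return 1 instead.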
InputElementSplitter is where the backslash-newline elision happens.
Deleting that is straightforward.
ECMAScript does say that backslash-newline inside a string literal (a
LineContinuation) gets elided, so the rest of this CL is about supporting that.
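For reference, this is the string-literal behavior the CL needs to support, shown by evaluating literal source text in a standard engine. The backslash and the line terminator both disappear from the value, for LF as well as CR LF line endings.

```javascript
// Source text of a string literal containing a LineContinuation:
//   'ab\
//   cd'
const lfSource = "'ab\\\ncd'";
// Same literal written with a CR LF line ending.
const crlfSource = "'ab\\\r\ncd'";

// Evaluating each literal yields its string value.
const value = eval(lfSource);
const crlfValue = eval(crlfSource);

console.log(value);     // "abcd"
console.log(crlfValue); // "abcd"
```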
Our JS lexical tokens hold the original source code's char sequence,
so at the lexer level nothing needs to be done with the backslash-newline,
except for fixing up the lexer tests to match reality.
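To illustrate why the lexer can stay as it is: a string-literal scanner that keeps the raw source text only has to skip whatever follows a backslash, so an escaped newline never ends the token. The function below is a hypothetical sketch, not the project's lexer; `scanStringLiteral` and its return shape are assumptions.

```javascript
// Hypothetical sketch: scan a quoted string literal starting at `start` and
// return its raw source text, without decoding any escapes.
function scanStringLiteral(src, start) {
  const quote = src[start];
  let i = start + 1;
  while (i < src.length) {
    const ch = src[i];
    if (ch === quote) {
      return src.slice(start, i + 1); // raw text, quotes included
    }
    if (ch === "\\") {
      // Skip the escaped character. For "\" + CR + LF, skip both so the LF
      // is not mistaken for an unescaped line terminator.
      i += src.startsWith("\r\n", i + 1) ? 3 : 2;
      continue;
    }
    if (ch === "\n" || ch === "\r") {
      // An unescaped CR or LF ends the line before the closing quote.
      // (ES2019+ permits unescaped U+2028/U+2029 inside string literals.)
      throw new SyntaxError("unterminated string literal");
    }
    i++;
  }
  throw new SyntaxError("unterminated string literal");
}

// The token keeps the backslash-newline exactly as written.
const raw = scanStringLiteral("x = 'ab\\\ncd';", 4);
console.log(JSON.stringify(raw)); // "'ab\\\ncd'"
```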
At the JS parser level, StringLiteral nodes have the logic to convert
the source code text to an actual value. Everything else defers to that,
and adding handling of backslash-newline there is straightforward.
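The value conversion with line-continuation handling can be sketched as follows. This is a hypothetical `decodeStringLiteral`, not the StringLiteral node's actual code; it assumes the raw text is already well-formed and does not handle legacy octal escapes such as `\07`.

```javascript
// Hypothetical sketch: turn a string literal's raw source text into its value.
// Assumes well-formed input; legacy octal escapes (\1, \07, ...) are omitted.
const SINGLE_CHAR_ESCAPES = new Map([
  ["b", "\b"], ["f", "\f"], ["n", "\n"], ["r", "\r"], ["t", "\t"], ["v", "\v"],
]);

function decodeStringLiteral(raw) {
  const body = raw.slice(1, -1); // strip the surrounding quotes
  let out = "";
  for (let i = 0; i < body.length; i++) {
    const ch = body[i];
    if (ch !== "\\") {
      out += ch;
      continue;
    }
    const next = body[++i];
    if (next === "\r") {
      // LineContinuation: "\" + CR, or "\" + CR LF, contributes nothing.
      if (body[i + 1] === "\n") i++;
    } else if (next === "\n" || next === "\u2028" || next === "\u2029") {
      // LineContinuation: "\" + LF / LS / PS contributes nothing.
    } else if (SINGLE_CHAR_ESCAPES.has(next)) {
      out += SINGLE_CHAR_ESCAPES.get(next);
    } else if (next === "x") {
      out += String.fromCharCode(parseInt(body.slice(i + 1, i + 3), 16));
      i += 2;
    } else if (next === "u") {
      if (body[i + 1] === "{") {
        const end = body.indexOf("}", i);
        out += String.fromCodePoint(parseInt(body.slice(i + 2, end), 16));
        i = end;
      } else {
        out += String.fromCharCode(parseInt(body.slice(i + 1, i + 5), 16));
        i += 4;
      }
    } else if (next === "0") {
      out += "\0";
    } else {
      out += next; // NonEscapeCharacter: \q is "q", \' is "'"
    }
  }
  return out;
}

console.log(decodeStringLiteral("'ab\\\ncd'")); // "abcd"
```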
ECMAScript is strict about "use strict" directives. Escape sequences
and backslash-newline are not allowed in the directive. Our existing
logic handles that fine. I just added some testcases to verify that.
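The directive rule comes down to comparing the raw source text rather than the decoded value, since `'use \` + newline + `strict'` decodes to "use strict" but is not a Use Strict Directive. The helper `isUseStrictDirective` below is hypothetical; the second half checks the same cases against a real engine.

```javascript
// Hypothetical helper: decide from the raw source text of a directive's
// StringLiteral whether it is a Use Strict Directive. Comparing raw text (not
// the decoded value) is what rules out escapes and line continuations.
function isUseStrictDirective(raw) {
  return raw === "'use strict'" || raw === '"use strict"';
}

// The same rule in a real engine. In a sloppy-mode function called plainly,
// `this` is the global object; in strict mode it is undefined. Functions built
// with `new Function` are sloppy unless their own body opts in.
const plain = new Function("'use strict'; return this === undefined;");
const escaped = new Function("'use\\u0020strict'; return this === undefined;");
const continued = new Function("'use \\\nstrict'; return this === undefined;");

console.log(plain());     // true: strict
console.log(escaped());   // false: the \u0020 escape disqualifies it
console.log(continued()); // false: the line continuation disqualifies it
```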
Stray backslashes in JS will become a WORD token or part of a WORD
token, which is consistent with how we handle \u escapes. These will
be rejected at the parser level. I don't see any particular reason
to reject backslashes at the lexer level, so I left that alone.
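A parser-level check on a WORD token's raw text might look like the following. `hasOnlyUnicodeEscapes` is a hypothetical name, and the sketch only verifies that every backslash begins a `\uXXXX` or `\u{...}` escape; it does not check that the escaped code point is a legal identifier character.

```javascript
// Hypothetical parser-level check: backslashes in a WORD token are allowed
// only as the start of a \uXXXX or \u{...} escape.
const UNICODE_ESCAPE = /^\\u(?:[0-9A-Fa-f]{4}|\{[0-9A-Fa-f]+\})/;

function hasOnlyUnicodeEscapes(word) {
  for (let i = 0; i < word.length; i++) {
    if (word[i] !== "\\") continue;
    const match = UNICODE_ESCAPE.exec(word.slice(i));
    if (match === null) return false; // stray backslash
    i += match[0].length - 1; // skip the rest of the escape
  }
  return true;
}

// A real engine draws the same line.
console.log(new Function("var f\\u006fo = 1; return foo;")()); // 1
let strayRejected = false;
try {
  new Function("var a\\b = 1;");
} catch (e) {
  strayRejected = e instanceof SyntaxError;
}
console.log(strayRejected); // true
```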