Add code to the HTML parser to fix entities that are missing semicolons.
Now, <p>1 &lt 2</p> will be treated as equivalent to <p>1 &lt; 2</p>.
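Below is a minimal sketch of the fixup idea in the description. It assumes a small hard-coded list of entity names (lt, gt, amp, quot). It reuses the not-followed-by condition from the first patch set, `(?=$|[^;0-9A-Za-z])`, written here as the equivalent negative lookahead. The class `NamedEntityFixup` and its `fix` method are hypothetical and are not the code in Html5ElementStack.java.

```java
import java.util.regex.Pattern;

/**
 * Hypothetical sketch: append the missing ';' to a few named entities in
 * HTML text. The entity list is illustrative, not the parser's real table.
 */
final class NamedEntityFixup {
  // (?![;0-9A-Za-z]) is equivalent to (?=$|[^;0-9A-Za-z]): it leaves "&lt;"
  // alone and does not rewrite a prefix of a longer name such as "&ltfoo".
  private static final Pattern MISSING_SEMI =
      Pattern.compile("&(lt|gt|amp|quot)(?![;0-9A-Za-z])");

  static String fix(String text) {
    // "$0;" re-emits the whole match followed by a semicolon.
    return MISSING_SEMI.matcher(text).replaceAll("$0;");
  }

  public static void main(String[] args) {
    System.out.println(fix("<p>1 &lt 2</p>"));
    System.out.println(fix("<p>1 &lt; 2</p>"));
  }
}
```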
Submitted @4246
(2010-08-12 05:03:00 UTC)
#2
Nice change :)
http://codereview.appspot.com/1978041/diff/1/6
File src/com/google/caja/parser/html/Html5ElementStack.java (right):
http://codereview.appspot.com/1978041/diff/1/6#newcode528
src/com/google/caja/parser/html/Html5ElementStack.java:528: + ")(?=$|[^;0-9A-Za-z])"
The not-followed-by check needs to be different for &#x100 and &#100.
For &#100, the negative lookahead should be "(?![;0-9])" so that &#100a is
treated as &#100;a (consistent with what WebKit does).
For &#x100, the negative lookahead should be "(?![;0-9a-fA-F])".
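The two per-radix lookaheads can be illustrated with a small sketch. The `NumericEntityFixup` class is hypothetical. The `[xX]` prefix and the `{1,6}` hex-digit limit are assumptions (0x10FFFF needs six hex digits); the decimal `[0-9]{1,7}` follows the pattern quoted later in this review. Putting digits (or hex digits) in the lookahead also stops the regex from backtracking to a shorter digit run, so an already terminated `&#100;` is left alone.

```java
import java.util.regex.Pattern;

/**
 * Hypothetical sketch of radix-specific lookaheads for numeric character
 * references that lack a trailing ';'. Not Caja's actual implementation.
 */
final class NumericEntityFixup {
  // Decimal: a letter may follow ("&#100a" -> "&#100;a"), but ';' or a digit
  // may not, so "&#100;" cannot match via a shorter run like "&#10".
  private static final Pattern DECIMAL =
      Pattern.compile("&#[0-9]{1,7}(?![;0-9])");
  // Hex: only a non-hex character ends the reference ("&#x100g" -> "&#x100;g").
  // Assumption: [xX] prefix and at most 6 hex digits.
  private static final Pattern HEX =
      Pattern.compile("&#[xX][0-9a-fA-F]{1,6}(?![;0-9a-fA-F])");

  static String fix(String text) {
    // The two patterns never overlap: "&#[0-9]" cannot match an "&#x" reference.
    String s = DECIMAL.matcher(text).replaceAll("$0;");
    return HEX.matcher(s).replaceAll("$0;");
  }

  public static void main(String[] args) {
    for (String in : new String[] {"&#100a", "&#x100g", "&#100;a", "&#x100;"}) {
      System.out.println(in + " -> " + fix(in));
    }
  }
}
```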
http://codereview.appspot.com/1978041/diff/1/2
File tests/com/google/caja/parser/html/DomParserTest.java (right):
http://codereview.appspot.com/1978041/diff/1/2#newcode1251
tests/com/google/caja/parser/html/DomParserTest.java:1251: public final void testEntities() throws Exception {
You should also add a test for CDATA entities not getting modified.
(2010-08-12 19:02:51 UTC)
#3
http://codereview.appspot.com/1978041/diff/1/6
File src/com/google/caja/parser/html/Html5ElementStack.java (right):
http://codereview.appspot.com/1978041/diff/1/6#newcode528
src/com/google/caja/parser/html/Html5ElementStack.java:528: + ")(?=$|[^;0-9A-Za-z])"
On 2010/08/12 05:03:00, gagan.goku wrote:
> The not-followed-by check needs to be different for &#x100 and &#100.
>
> For &#100, the negative lookahead should be "(?![;0-9])" so that &#100a is
> treated as &#100;a (consistent with what WebKit does).
> For &#x100, the negative lookahead should be "(?![;0-9a-fA-F])".
Done.
http://codereview.appspot.com/1978041/diff/1/2
File tests/com/google/caja/parser/html/DomParserTest.java (right):
http://codereview.appspot.com/1978041/diff/1/2#newcode1251
tests/com/google/caja/parser/html/DomParserTest.java:1251: public final void testEntities() throws Exception {
On 2010/08/12 05:03:00, gagan.goku wrote:
> You should also add a test for CDATA entities not getting modified.
Done.
(2010-08-13 00:24:49 UTC)
#4
lgtm.
Nice way of testing that entity fixup will not happen inside a script tag
because its contents are treated as CDATA.
Is my understanding correct that CDATA is skipped because, in
Html5ElementStack.java, we do entityFixup only for tokens of type
"HtmlTokenType.TEXT"?
http://codereview.appspot.com/1978041/diff/7001/2007
File src/com/google/caja/parser/html/Html5ElementStack.java (right):
http://codereview.appspot.com/1978041/diff/7001/2007#newcode523
src/com/google/caja/parser/html/Html5ElementStack.java:523: + "[0-9]{1,7}(?=$|[^;0-9])"
Isn't "(?![;0-9])" cleaner?
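The two spellings should agree. `(?=$|[^;0-9])` requires either end of input or a following character outside `[;0-9]`, while `(?![;0-9])` only forbids a following `;` or digit, which also holds at end of input. The hypothetical harness below runs both forms over a few sample inputs and reports any difference; the class name and inputs are made up for illustration.

```java
import java.util.regex.Pattern;

/**
 * Hypothetical harness comparing the positive-lookahead spelling of the
 * decimal rule with the negative-lookahead spelling suggested in review.
 */
final class LookaheadEquivalence {
  static final Pattern POSITIVE = Pattern.compile("&#[0-9]{1,7}(?=$|[^;0-9])");
  static final Pattern NEGATIVE = Pattern.compile("&#[0-9]{1,7}(?![;0-9])");

  /** Appends ';' after every match of the given pattern. */
  static String fix(Pattern p, String text) {
    return p.matcher(text).replaceAll("$0;");
  }

  public static void main(String[] args) {
    String[] inputs = {"&#100", "&#100a", "&#100;", "&#100 x", "&#12345678"};
    for (String in : inputs) {
      String a = fix(POSITIVE, in);
      String b = fix(NEGATIVE, in);
      // Print the result, flagging any input where the two forms diverge.
      System.out.println(in + " -> " + a + (a.equals(b) ? "" : " MISMATCH " + b));
    }
  }
}
```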
(2010-08-13 01:25:54 UTC)
#5
On 2010/08/13 00:24:49, gagan.goku wrote:
> lgtm.
>
> Nice way of testing that entity fixup will not happen inside a script tag
> because its contents are treated as CDATA.
> Is my understanding correct that CDATA is skipped because, in
> Html5ElementStack.java, we do entityFixup only for tokens of type
> "HtmlTokenType.TEXT"?
Yes.
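To illustrate that answer, here is a hypothetical sketch of gating the fixup on token type. `Kind` and `process` are stand-ins invented for this example and are not Caja's `HtmlTokenType` API. Only the decimal rule is applied, to keep the sketch short.

```java
import java.util.regex.Pattern;

/**
 * Hypothetical sketch: entity fixup applied only to ordinary text tokens.
 * Kind is an illustrative stand-in for the parser's token types.
 */
final class TextOnlyFixup {
  enum Kind { TEXT, CDATA }

  private static final Pattern DECIMAL =
      Pattern.compile("&#[0-9]{1,7}(?![;0-9])");

  /** Returns the token text to emit; only TEXT tokens are rewritten. */
  static String process(Kind kind, String text) {
    if (kind != Kind.TEXT) {
      // CDATA content (e.g. the body of a <script>) is not entity-decoded,
      // so "&#100" there is literal source text and is passed through as-is.
      return text;
    }
    return DECIMAL.matcher(text).replaceAll("$0;");
  }

  public static void main(String[] args) {
    System.out.println(process(Kind.TEXT, "&#100a"));
    System.out.println(process(Kind.CDATA, "&#100a"));
  }
}
```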
> http://codereview.appspot.com/1978041/diff/7001/2007
> File src/com/google/caja/parser/html/Html5ElementStack.java (right):
>
> http://codereview.appspot.com/1978041/diff/7001/2007#newcode523
> src/com/google/caja/parser/html/Html5ElementStack.java:523: + "[0-9]{1,7}(?=$|[^;0-9])"
> Isn't "(?![;0-9])" cleaner?
Done.
Issue 1978041: Fix entity name handling in HTML. (Closed)
Created by MikeSamuel
Reviewers: Jasvir, gagan.goku
Base URL: http://google-caja.googlecode.com/svn/trunk/
Comments: 8