Rietveld Code Review Tool
Help | Bug tracker | Discussion group | Source code | Sign in
(282)

Issue 165082: Replace htmlparser 1.0.7 with htmlparser 1.2.1 (Closed)

Can't Edit
Can't Publish+Mail
Start Review
Created:
16 years, 3 months ago by MikeSamuel
Modified:
16 years, 3 months ago
Reviewers:
johnfargo, Jasvir, fargo
CC:
google-caja-discuss_googlegroups.com
Base URL:
http://google-caja.googlecode.com/svn/trunk/
Visibility:
Public.

Description

This modifies htmlparser1.2.1 to expose ElementName and AttributeName. This means that we're parsing an up-to-date version of HTML5, but the new parser delays creating text nodes, which leads to bad file positions. This should be fixable in later CLs, but is a defect of the current implementation. Submitted @3914

Patch Set 1 #

Patch Set 2 : Replace htmlparser 1.0.7 with htmlparser 1.2.1 #

Total comments: 21

Patch Set 3 : Replace htmlparser 1.0.7 with htmlparser 1.2.1 #

Unified diffs Side-by-side diffs Delta from patch set Stats (+569 lines, -291 lines) Patch
M build.xml View 1 2 8 chunks +24 lines, -12 lines 0 comments Download
M src/com/google/caja/lexer/FilePosition.java View 1 2 1 chunk +1 line, -1 line 0 comments Download
M src/com/google/caja/parser/html/CajaTreeBuilder.java View 1 2 15 chunks +193 lines, -107 lines 0 comments Download
M src/com/google/caja/parser/html/DomParser.java View 1 2 3 chunks +6 lines, -6 lines 0 comments Download
M src/com/google/caja/parser/html/Html5ElementStack.java View 1 2 9 chunks +145 lines, -62 lines 0 comments Download
M src/com/google/caja/parser/html/OpenElementStack.java View 1 2 1 chunk +1 line, -1 line 0 comments Download
M tests/com/google/caja/parser/html/DomParserTest.java View 1 2 36 chunks +113 lines, -100 lines 0 comments Download
D third_party/java/htmlparser/htmlparser.jar View Binary file 0 comments Download
M third_party/java/htmlparser/src/nu/validator/htmlparser/impl/AttributeName.java View 1 2 2 chunks +52 lines, -0 lines 0 comments Download
M third_party/java/htmlparser/src/nu/validator/htmlparser/impl/ElementName.java View 1 2 2 chunks +32 lines, -0 lines 0 comments Download
M third_party/java/htmlparser/src/nu/validator/htmlparser/impl/HtmlAttributes.java View 1 2 1 chunk +1 line, -1 line 0 comments Download
M tools/myvn View 1 chunk +1 line, -1 line 0 comments Download

Messages

Total messages: 7
MikeSamuel
16 years, 3 months ago (2009-12-07 23:47:21 UTC) #1
Jasvir
LGTM http://codereview.appspot.com/165082/diff/1014/21 File third_party/java/htmlparser/src/nu/validator/htmlparser/impl/AttributeName.java (right): http://codereview.appspot.com/165082/diff/1014/21#newcode269 third_party/java/htmlparser/src/nu/validator/htmlparser/impl/AttributeName.java:269: private static int stringToHash(String s) { Is speed ...
16 years, 3 months ago (2009-12-11 02:54:20 UTC) #2
johnfargo
Not the most thorough review, since I'm no expert on the underlying impl, but I ...
16 years, 3 months ago (2009-12-11 03:27:35 UTC) #3
MikeSamuel
http://codereview.appspot.com/165082/diff/1014/17 File src/com/google/caja/parser/html/CajaTreeBuilder.java (right): http://codereview.appspot.com/165082/diff/1014/17#newcode101 src/com/google/caja/parser/html/CajaTreeBuilder.java:101: && htmlLocalName.equals(child.getLocalName())) { On 2009/12/11 03:27:35, johnfargo wrote: > ...
16 years, 3 months ago (2009-12-11 04:35:19 UTC) #4
johnfargo
LGTM Thanks for the quick response. On Thu, Dec 10, 2009 at 8:35 PM, <mikesamuel@gmail.com> ...
16 years, 3 months ago (2009-12-11 04:51:22 UTC) #5
MikeSamuel
2009/12/10 <johnfargo@gmail.com>: > Not the most thorough review, since I'm no expert on the underlying ...
16 years, 3 months ago (2009-12-11 04:59:01 UTC) #6
johnfargo
16 years, 3 months ago (2009-12-11 05:27:57 UTC) #7
Great thx - I wasn't necessarily going one place or another with that line
of discussion, just checking for my own benefit (though tinted by the fact
that I've only really been deep into the patch-up code). The direction
you're suggesting sounds great; feel free to loop me into any discussions if
you find value in doing so.

On Thu, Dec 10, 2009 at 8:59 PM, Mike Samuel <mikesamuel@gmail.com> wrote:

> 2009/12/10  <johnfargo@gmail.com>:
> > Not the most thorough review, since I'm no expert on the underlying
> > impl, but I did have a few questions on APIs used.
> >
> > Speaking of the underlying lib, what would you say is the main value we
> > get out of it right now? Do you use the HTML5-validation pieces, or
> > mostly just the SAX-ish callback pattern constructing the DOM? Just
> > wondering what the ongoing ratio is between <lines of code patching up
> > lib oddity> / <lines of code used by lib> ;)
>
> The underlying lib gets right all the adoption agency algorithm stuff
> and the format element stack.  This is not a small portion of the
> HTML5 parsing spec.  And by using this library, we can guarantee that
> we are compatible with mozilla, which I think is moving to using this
> for HTML/SVG/MathML parsing.
>
> I'm going to start a discussion about whether we can get a treebuilder
> that publishes events like
>    nodeProduced(Node n, int nodeStart, int nodeEnd)
> so that we can get everything we want without wrapping -- once it
> supports full embedded XML, like <os:*> tags.
>
>
> > http://codereview.appspot.com/165082/diff/1014/17
> > File src/com/google/caja/parser/html/CajaTreeBuilder.java (right):
> >
> > http://codereview.appspot.com/165082/diff/1014/17#newcode101
> > src/com/google/caja/parser/html/CajaTreeBuilder.java:101: &&
> > htmlLocalName.equals(child.getLocalName())) {
> > important to test case-insensitively?
> >
> > http://codereview.appspot.com/165082/diff/1014/17#newcode254
> > src/com/google/caja/parser/html/CajaTreeBuilder.java:254: pos = null;
> > unnecessary else-block
> >
> > http://codereview.appspot.com/165082/diff/1014/17#newcode261
> > src/com/google/caja/parser/html/CajaTreeBuilder.java:261: if
> > (el.hasAttributeNS(ns, localName)) { continue; }
> > when would this happen?
> >
> > http://codereview.appspot.com/165082/diff/1014/17#newcode265
> > src/com/google/caja/parser/html/CajaTreeBuilder.java:265: if (pos !=
> > null) {
> > && needsDebugData, presumably
> >
> > http://codereview.appspot.com/165082/diff/1014/17#newcode430
> > src/com/google/caja/parser/html/CajaTreeBuilder.java:430: if
> > (startPos.source().equals(InputSource.UNKNOWN)) {
> > null check needed?
> >
> > http://codereview.appspot.com/165082/diff/1014/17#newcode475
> > src/com/google/caja/parser/html/CajaTreeBuilder.java:475: if
> > ("body".equals(unpopped.getTagName())
> > ignore case?
> >
> > http://codereview.appspot.com/165082/diff/1014/17#newcode487
> > src/com/google/caja/parser/html/CajaTreeBuilder.java:487: &&
> > HTML_NAMESPACE.equals(unpopped.getNamespaceURI())) {
> > ignore case?
> >
> > http://codereview.appspot.com/165082/diff/1014/18
> > File src/com/google/caja/parser/html/Html5ElementStack.java (right):
> >
> > http://codereview.appspot.com/165082/diff/1014/18#newcode92
> > src/com/google/caja/parser/html/Html5ElementStack.java:92:
> > builder.setFragmentContext(null);
> > FMI, what's this do?
> >
> > http://codereview.appspot.com/165082/diff/1014/18#newcode411
> > src/com/google/caja/parser/html/Html5ElementStack.java:411:
> > builder.headClosed();
> > also fmi, what do these calls do? stuff like this always makes me
> > nervous since I suspect random elements are being synthesized and
> > reordered :)
> >
> > http://codereview.appspot.com/165082/diff/1014/16
> > File src/com/google/caja/parser/html/OpenElementStack.java (right):
> >
> > http://codereview.appspot.com/165082/diff/1014/16#newcode76
> > src/com/google/caja/parser/html/OpenElementStack.java:76:
> > sorry about these...
> >
> > http://codereview.appspot.com/165082
> >
>
Sign in to reply to this message.

Powered by Google App Engine
RSS Feeds Recent Issues | This issue
This is Rietveld f62528b