Summary:
* Overhauls the GadgetHtmlParser base class and associated test cases
* Tweaks the Neko-based HTML parser implementation
* Introduces a new Caja-based HTML parser
This fairly substantial CL reworks the HTML parsing system to better represent (though not yet fully) the way HTML is handled within gadgets: as tag soup that is cleaned up after the fact, via custom rules, into a legitimate, well-formed document. It's a step toward treating concrete GadgetHtmlParser implementations purely as fragment parsers.
Change detail:
* All parsing tests are factored into base test classes, with concrete tests largely just providing a concrete parser implementation.
- Adds an HTML-equivalence method built on the (fantastic) diff_match_patch library; the comparison ignores whitespace, case, and attribute-encoding differences.
* GadgetHtmlParser now does significant cleanup of the DOM it retrieves from parseDomImpl(...), which, BTW, will soon go the way of the dodo in favor of always using parseFragmentImpl(...).
- Creates a <head> element and populates it with all <style> elements (and only those), since moving them there cannot break rendering and HTML requires <style> to be in <head>.
- Creates a <body> element as well.
- Combines multiple <head> elements together, if present.
- Prepends head with elements that occurred above a <head> element that occurred in source, if any.
- Combines multiple <body> elements together, if present.
- Moves elements that have no <head> or <body> parent into <body>: those found after the first <head> tag but before the first <body> tag are prepended, and those found after the first <body> tag are appended (that was a mouthful).
- As noted above, stuffs all <style> elements found in <body> at the end of <head>.
- If OpenSocial-type <script> elements are treated per spec (i.e., as having only text and no children), reparses that text as HTML and adds the result as children for template processing.
* Introduces CajaHtmlParser
- Still has a parseDomImpl method, mostly for short-term API compatibility with the Neko-based HtmlParser implementation. That implementation has subtle differences between parseDomImpl and parseFragmentImpl, which I want to clean up in a follow-up CL (again, obviating the need for parseDomImpl altogether).
- Delegates to the parseFragment() method of Caja's DomParser class for most parsing needs.
- Depends on: http://codereview.appspot.com/157089/show to retain comments and fully-formed documents (working with the Caja folks to get this committed).
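The head/body cleanup rules listed above for GadgetHtmlParser could be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in this CL: the class HeadBodyNormalizer and its methods are hypothetical names, it uses only the JDK's org.w3c.dom API, it looks only at direct children of the root element, and it assumes lower-case tag names when searching for <style>.

```java
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

/** Hypothetical helper illustrating the cleanup rules; not the CL's implementation. */
public class HeadBodyNormalizer {

  /** Rewrites root's children into exactly one head followed by one body. */
  public static void normalize(Element root) {
    Document doc = root.getOwnerDocument();
    Element head = doc.createElement("head");
    Element body = doc.createElement("body");

    // Copy the live NodeList first, since nodes are moved while iterating.
    List<Node> children = snapshot(root.getChildNodes());
    int firstHead = -1;
    for (int i = 0; i < children.size() && firstHead < 0; i++) {
      if (isNamed(children.get(i), "head")) firstHead = i;
    }

    // Walking in document order makes "prepend" and "append" fall out naturally.
    for (int i = 0; i < children.size(); i++) {
      Node child = children.get(i);
      root.removeChild(child);
      if (isNamed(child, "head")) {
        moveAll(child, head);        // combine multiple <head> elements
      } else if (isNamed(child, "body")) {
        moveAll(child, body);        // combine multiple <body> elements
      } else if (i < firstHead) {
        head.appendChild(child);     // occurred above a <head> in the source
      } else {
        body.appendChild(child);     // stray node with no <head>/<body> parent
      }
    }

    // All <style> elements found in <body> go to the end of <head>.
    for (Node style : snapshot(body.getElementsByTagName("style"))) {
      head.appendChild(style);
    }
    root.appendChild(head);
    root.appendChild(body);
  }

  private static boolean isNamed(Node n, String name) {
    return n.getNodeType() == Node.ELEMENT_NODE && n.getNodeName().equalsIgnoreCase(name);
  }

  private static void moveAll(Node from, Node to) {
    for (Node n : snapshot(from.getChildNodes())) to.appendChild(n);
  }

  private static List<Node> snapshot(NodeList live) {
    List<Node> out = new ArrayList<Node>();
    for (int i = 0; i < live.getLength(); i++) out.add(live.item(i));
    return out;
  }

  static String childNames(Node parent) {
    StringBuilder sb = new StringBuilder();
    for (Node n : snapshot(parent.getChildNodes())) {
      if (sb.length() > 0) sb.append(',');
      sb.append(n.getNodeName());
    }
    return sb.toString();
  }

  /** Builds a small tag-soup tree, normalizes it, and summarizes the result. */
  public static String demo() {
    try {
      Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
      Element root = doc.createElement("html");
      doc.appendChild(root);
      root.appendChild(doc.createElement("script"));
      root.appendChild(doc.createElement("head")).appendChild(doc.createElement("title"));
      root.appendChild(doc.createElement("head")).appendChild(doc.createElement("meta"));
      root.appendChild(doc.createElement("div"));
      Node body1 = root.appendChild(doc.createElement("body"));
      body1.appendChild(doc.createElement("style"));
      body1.appendChild(doc.createElement("p"));
      root.appendChild(doc.createElement("body")).appendChild(doc.createElement("span"));
      root.appendChild(doc.createElement("em"));
      normalize(root);
      return childNames(root) + " | " + childNames(root.getFirstChild())
          + " | " + childNames(root.getLastChild());
    } catch (ParserConfigurationException e) {
      throw new IllegalStateException(e);
    }
  }
}
```

For the sample tree built in demo(), the summary string is "head,body | script,title,meta,style | div,p,span,em": the leading <script> lands in <head>, the stray <div> and <em> end up at the start and end of <body>, and the <style> moves to the end of <head>.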
This CL is NOT yet complete, but I'm sending it out for a look by the community. In addition to the Caja CL dependency, the CL implicitly makes Shindig require Java 1.6 because of its use of LinkedList.pop() (from the Deque interface) and diff_match_patch, whose Maven JAR is built against 1.6. I'll fix both of these, unless a move to 1.6 is deemed reasonable by all.
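As an aside on the LinkedList.pop() dependency: one possible way to stay on Java 1.5 is to call the older LinkedList methods directly, since pop() and push() arrived with the Deque interface in 1.6 while removeFirst() and addFirst() predate it. The StackCompat class below is a hypothetical helper for illustration, not something in this CL.

```java
import java.util.LinkedList;

/** Hypothetical Java 1.5-compatible stand-ins for LinkedList.pop() and push(). */
public class StackCompat {

  /** Same behavior as pop(): removes and returns the first element. */
  public static <T> T pop(LinkedList<T> stack) {
    // Throws NoSuchElementException on an empty list, just as pop() does.
    return stack.removeFirst();
  }

  /** Same behavior as push(): inserts the element at the front of the list. */
  public static <T> void push(LinkedList<T> stack, T item) {
    stack.addFirst(item);
  }
}
```

Calling removeFirst() and addFirst() inline at each call site would work equally well; the wrapper only keeps the stack intent readable.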
Total comments: 5
File | Chunks | Stats (+1135 lines, -470 lines) | Comments
features/src/main/javascript/features/caja/feature.xml | 1 chunk | +1 line, -1 line | 0 comments
java/gadgets/pom.xml | 1 chunk | +4 lines, -0 lines | 1 comment
java/gadgets/src/main/java/org/apache/shindig/gadgets/parse/GadgetHtmlParser.java | 6 chunks | +156 lines, -48 lines | 4 comments
java/gadgets/src/main/java/org/apache/shindig/gadgets/parse/caja/CajaHtmlParser.java | 1 chunk | +136 lines, -0 lines | 0 comments
java/gadgets/src/main/java/org/apache/shindig/gadgets/parse/nekohtml/NekoSimplifiedHtmlParser.java | 1 chunk | +1 line, -2 lines | 0 comments
java/gadgets/src/main/java/org/apache/shindig/gadgets/servlet/CajaContentRewriter.java | 1 chunk | +0 lines, -1 line | 0 comments
java/gadgets/src/test/java/org/apache/shindig/gadgets/parse/AbstractParserAndSerializerTest.java | 1 chunk | +58 lines, -18 lines | 0 comments
java/gadgets/src/test/java/org/apache/shindig/gadgets/parse/AbstractParsingTestBase.java | 1 chunk | +114 lines, -0 lines | 0 comments
java/gadgets/src/test/java/org/apache/shindig/gadgets/parse/AbstractSocialMarkupHtmlParserTest.java | 1 chunk | +198 lines, -0 lines | 0 comments
java/gadgets/src/test/java/org/apache/shindig/gadgets/parse/CompactHtmlSerializerTest.java | 2 chunks | +20 lines, -12 lines | 0 comments
java/gadgets/src/test/java/org/apache/shindig/gadgets/parse/caja/CajaCompactHtmlSerializerTest.java | 1 chunk | +32 lines, -0 lines | 0 comments
java/gadgets/src/test/java/org/apache/shindig/gadgets/parse/caja/CajaParserAndSerializerTest.java | 1 chunk | +32 lines, -0 lines | 0 comments
java/gadgets/src/test/java/org/apache/shindig/gadgets/parse/caja/CajaSocialMarkupHtmlParserTest.java | 1 chunk | +32 lines, -0 lines | 0 comments
java/gadgets/src/test/java/org/apache/shindig/gadgets/parse/nekohtml/NekoCompactHtmlSerializerTest.java | 1 chunk | +36 lines, -0 lines | 0 comments
java/gadgets/src/test/java/org/apache/shindig/gadgets/parse/nekohtml/NekoParserAndSerializeTest.java | 1 chunk | +39 lines, -39 lines | 0 comments
java/gadgets/src/test/java/org/apache/shindig/gadgets/parse/nekohtml/SocialMarkupHtmlParserTest.java | 1 chunk | +5 lines, -128 lines | 0 comments
java/gadgets/src/test/resources/org/apache/shindig/gadgets/parse/nekohtml/test.html | 1 chunk | +0 lines, -26 lines | 0 comments
java/gadgets/src/test/resources/org/apache/shindig/gadgets/parse/nekohtml/test-expected.html | 1 chunk | +0 lines, -23 lines | 0 comments
java/gadgets/src/test/resources/org/apache/shindig/gadgets/parse/nekohtml/test-fragment2.html | 1 chunk | +0 lines, -2 lines | 0 comments
java/gadgets/src/test/resources/org/apache/shindig/gadgets/parse/nekohtml/test-fragment2-expected.html | 1 chunk | +0 lines, -2 lines | 0 comments
java/gadgets/src/test/resources/org/apache/shindig/gadgets/parse/nekohtml/test-fulldocnodoctype-expected.html | 1 chunk | +8 lines, -0 lines | 0 comments
java/gadgets/src/test/resources/org/apache/shindig/gadgets/parse/nekohtml/test-leadingscript-expected.html | 1 chunk | +5 lines, -1 line | 0 comments
java/gadgets/src/test/resources/org/apache/shindig/gadgets/parse/nekohtml/test-socialmarkup.html | 1 chunk | +0 lines, -19 lines | 0 comments
java/gadgets/src/test/resources/org/apache/shindig/gadgets/parse/nekohtml/test-with-ampersands.html | 1 chunk | +0 lines, -11 lines | 0 comments
java/gadgets/src/test/resources/org/apache/shindig/gadgets/parse/nekohtml/test-with-ampersands-expected.html | 1 chunk | +0 lines, -8 lines | 0 comments
java/gadgets/src/test/resources/org/apache/shindig/gadgets/parse/nekohtml/test-with-iecond-comments.html | 1 chunk | +0 lines, -30 lines | 0 comments
java/gadgets/src/test/resources/org/apache/shindig/gadgets/parse/nekohtml/test-with-iecond-comments-expected.html | 1 chunk | +0 lines, -4 lines | 0 comments
java/gadgets/src/test/resources/org/apache/shindig/gadgets/parse/nekohtml/test-with-specialtags.html | 1 chunk | +0 lines, -61 lines | 0 comments
java/gadgets/src/test/resources/org/apache/shindig/gadgets/parse/nekohtml/test-with-specialtags-expected.html | 1 chunk | +0 lines, -33 lines | 0 comments
java/gadgets/src/test/resources/org/apache/shindig/gadgets/parse/test.html | 1 chunk | +26 lines, -0 lines | 0 comments
java/gadgets/src/test/resources/org/apache/shindig/gadgets/parse/test-expected.html | 1 chunk | +23 lines, -0 lines | 0 comments
java/gadgets/src/test/resources/org/apache/shindig/gadgets/parse/test-fragment.html | 1 chunk | +2 lines, -0 lines | 0 comments
java/gadgets/src/test/resources/org/apache/shindig/gadgets/parse/test-fragment-expected.html | 1 chunk | +2 lines, -0 lines | 0 comments
java/gadgets/src/test/resources/org/apache/shindig/gadgets/parse/test-fragment2.html | 1 chunk | +2 lines, -0 lines | 0 comments
java/gadgets/src/test/resources/org/apache/shindig/gadgets/parse/test-fragment2-expected.html | 1 chunk | +2 lines, -0 lines | 0 comments
java/gadgets/src/test/resources/org/apache/shindig/gadgets/parse/test-fulldocnodoctype.html | 1 chunk | +7 lines, -0 lines | 0 comments
java/gadgets/src/test/resources/org/apache/shindig/gadgets/parse/test-fulldocnodoctype-expected.html | 1 chunk | +7 lines, -0 lines | 0 comments
java/gadgets/src/test/resources/org/apache/shindig/gadgets/parse/test-headnobody.html | 1 chunk | +5 lines, -0 lines | 0 comments
java/gadgets/src/test/resources/org/apache/shindig/gadgets/parse/test-headnobody-expected.html | 1 chunk | +5 lines, -0 lines | 0 comments
java/gadgets/src/test/resources/org/apache/shindig/gadgets/parse/test-socialmarkup.html | 1 chunk | +19 lines, -0 lines | 0 comments
java/gadgets/src/test/resources/org/apache/shindig/gadgets/parse/test-with-ampersands.html | 1 chunk | +11 lines, -0 lines | 0 comments
java/gadgets/src/test/resources/org/apache/shindig/gadgets/parse/test-with-ampersands-expected.html | 1 chunk | +8 lines, -0 lines | 0 comments
java/gadgets/src/test/resources/org/apache/shindig/gadgets/parse/test-with-iecond-comments.html | 1 chunk | +30 lines, -0 lines | 0 comments
java/gadgets/src/test/resources/org/apache/shindig/gadgets/parse/test-with-iecond-comments-expected.html | 1 chunk | +4 lines, -0 lines | 0 comments
java/gadgets/src/test/resources/org/apache/shindig/gadgets/parse/test-with-specialtags.html | 1 chunk | +61 lines, -0 lines | 0 comments
java/gadgets/src/test/resources/org/apache/shindig/gadgets/parse/test-with-specialtags-expected.html | 1 chunk | +33 lines, -0 lines | 0 comments
pom.xml | 2 chunks | +11 lines, -1 line | 0 comments
Total messages: 16