Rietveld Code Review Tool
Help | Bug tracker | Discussion group | Source code | Sign in
(2154)

Issue 2119043: Adding an html parser router

Can't Edit
Can't Publish+Mail
Start Review
Created:
13 years, 7 months ago by gagan.goku
Modified:
13 years, 6 months ago
CC:
cool-shindig-committers_googlegroups.com, anupama.dutta, satya3656, Kuntal Loya
Base URL:
http://svn.apache.org/repos/asf/shindig/trunk/
Visibility:
Public.

Description

Adding a GadgetHtmlParserRouter which can route to Caja / Neko parser based on how its is configured. This allows us to do stuff like: @Override public Document parseDomForEncodingDetection(String source) throws GadgetException { for (GadgetHtmlParser parser : ImmutableList.of(nekoParser, cajaParser, vanillaCajaHtmlParser)) { try { return parser.parseDom(source); } catch (GadgetException e) { // Ignore. } } return null; }

Patch Set 1 #

Patch Set 2 : adding parseDomForEncodingDetection #

Patch Set 3 : using caja parser for accel #

Patch Set 4 : using caja parser for accel #

Total comments: 10

Patch Set 5 : addressing anupamas comments #

Patch Set 6 : fixing small bug #

Total comments: 4

Patch Set 7 : addressing anupamas comments #

Patch Set 8 : adding vanilla caja parser #

Total comments: 2

Patch Set 9 : adding test and addressing satyas comment #

Patch Set 10 : svn up #

Patch Set 11 : little refactoring #

Total comments: 8

Patch Set 12 : addressing kuntals comments #

Patch Set 13 : svn up #

Patch Set 14 : trying out another variant #

Patch Set 15 : trying out another variant #

Patch Set 16 : reverting back to clean state #

Total comments: 2
Unified diffs Side-by-side diffs Delta from patch set Stats (+202 lines, -2 lines) Patch
M java/common/conf/shindig.properties View 1 2 3 4 5 6 7 8 9 10 11 12 1 chunk +3 lines, -0 lines 1 comment Download
M java/gadgets/src/main/java/org/apache/shindig/gadgets/parse/GadgetHtmlParser.java View 1 2 3 4 5 6 7 8 9 10 11 12 1 chunk +1 line, -1 line 0 comments Download
A java/gadgets/src/main/java/org/apache/shindig/gadgets/parse/GadgetHtmlParserRouter.java View 1 2 3 4 5 6 7 8 9 10 14 15 1 chunk +106 lines, -0 lines 1 comment Download
M java/gadgets/src/main/java/org/apache/shindig/gadgets/parse/ParseModule.java View 1 2 3 4 5 6 7 8 9 10 11 12 1 chunk +0 lines, -1 line 0 comments Download
A java/gadgets/src/test/java/org/apache/shindig/gadgets/parse/GadgetHtmlParserRouterTest.java View 9 10 11 1 chunk +92 lines, -0 lines 0 comments Download

Messages

Total messages: 22
gagan.goku
13 years, 7 months ago (2010-09-06 18:22:00 UTC) #1
gagan.goku
13 years, 7 months ago (2010-09-06 18:28:18 UTC) #2
gagan.goku
13 years, 7 months ago (2010-09-07 01:14:17 UTC) #3
gagan.goku
13 years, 7 months ago (2010-09-07 01:15:13 UTC) #4
anupama.dutta
http://codereview.appspot.com/2119043/diff/9001/java/gadgets/src/main/java/org/apache/shindig/gadgets/parse/GadgetHtmlParser.java File java/gadgets/src/main/java/org/apache/shindig/gadgets/parse/GadgetHtmlParser.java (right): http://codereview.appspot.com/2119043/diff/9001/java/gadgets/src/main/java/org/apache/shindig/gadgets/parse/GadgetHtmlParser.java#newcode79 java/gadgets/src/main/java/org/apache/shindig/gadgets/parse/GadgetHtmlParser.java:79: public Document parseDomForEncodingDetection(String source) throws GadgetException { Remove this ...
13 years, 7 months ago (2010-09-14 12:07:50 UTC) #5
gagan.goku
13 years, 7 months ago (2010-09-15 02:58:48 UTC) #6
gagan.goku
http://codereview.appspot.com/2119043/diff/9001/java/gadgets/src/main/java/org/apache/shindig/gadgets/parse/GadgetHtmlParser.java File java/gadgets/src/main/java/org/apache/shindig/gadgets/parse/GadgetHtmlParser.java (right): http://codereview.appspot.com/2119043/diff/9001/java/gadgets/src/main/java/org/apache/shindig/gadgets/parse/GadgetHtmlParser.java#newcode79 java/gadgets/src/main/java/org/apache/shindig/gadgets/parse/GadgetHtmlParser.java:79: public Document parseDomForEncodingDetection(String source) throws GadgetException { On 2010/09/14 ...
13 years, 7 months ago (2010-09-15 03:30:14 UTC) #7
gagan.goku
13 years, 7 months ago (2010-09-15 04:00:16 UTC) #8
anupama.dutta
Minor comments below. http://codereview.appspot.com/2119043/diff/18001/java/gadgets/src/main/java/org/apache/shindig/gadgets/parse/GadgetHtmlParserRouter.java File java/gadgets/src/main/java/org/apache/shindig/gadgets/parse/GadgetHtmlParserRouter.java (right): http://codereview.appspot.com/2119043/diff/18001/java/gadgets/src/main/java/org/apache/shindig/gadgets/parse/GadgetHtmlParserRouter.java#newcode30 java/gadgets/src/main/java/org/apache/shindig/gadgets/parse/GadgetHtmlParserRouter.java:30: * Router that delegates to appropriate ...
13 years, 7 months ago (2010-09-15 05:01:33 UTC) #9
gagan.goku
http://codereview.appspot.com/2119043/diff/18001/java/gadgets/src/main/java/org/apache/shindig/gadgets/parse/GadgetHtmlParserRouter.java File java/gadgets/src/main/java/org/apache/shindig/gadgets/parse/GadgetHtmlParserRouter.java (right): http://codereview.appspot.com/2119043/diff/18001/java/gadgets/src/main/java/org/apache/shindig/gadgets/parse/GadgetHtmlParserRouter.java#newcode30 java/gadgets/src/main/java/org/apache/shindig/gadgets/parse/GadgetHtmlParserRouter.java:30: * Router that delegates to appropriate html parser. On ...
13 years, 7 months ago (2010-09-16 04:16:34 UTC) #10
satya3656
http://codereview.appspot.com/2119043/diff/27001/java/gadgets/src/main/java/org/apache/shindig/gadgets/parse/GadgetHtmlParserRouter.java File java/gadgets/src/main/java/org/apache/shindig/gadgets/parse/GadgetHtmlParserRouter.java (right): http://codereview.appspot.com/2119043/diff/27001/java/gadgets/src/main/java/org/apache/shindig/gadgets/parse/GadgetHtmlParserRouter.java#newcode52 java/gadgets/src/main/java/org/apache/shindig/gadgets/parse/GadgetHtmlParserRouter.java:52: super(null); Won't this lead to NPE if the code ...
13 years, 7 months ago (2010-09-20 09:52:17 UTC) #11
gagan.goku
http://codereview.appspot.com/2119043/diff/27001/java/gadgets/src/main/java/org/apache/shindig/gadgets/parse/GadgetHtmlParserRouter.java File java/gadgets/src/main/java/org/apache/shindig/gadgets/parse/GadgetHtmlParserRouter.java (right): http://codereview.appspot.com/2119043/diff/27001/java/gadgets/src/main/java/org/apache/shindig/gadgets/parse/GadgetHtmlParserRouter.java#newcode52 java/gadgets/src/main/java/org/apache/shindig/gadgets/parse/GadgetHtmlParserRouter.java:52: super(null); On 2010/09/20 09:52:17, satya3656 wrote: > Won't this ...
13 years, 7 months ago (2010-09-28 07:05:12 UTC) #12
Kuntal Loya
http://codereview.appspot.com/2119043/diff/31002/java/gadgets/src/main/java/org/apache/shindig/gadgets/parse/GadgetHtmlParserRouter.java File java/gadgets/src/main/java/org/apache/shindig/gadgets/parse/GadgetHtmlParserRouter.java (right): http://codereview.appspot.com/2119043/diff/31002/java/gadgets/src/main/java/org/apache/shindig/gadgets/parse/GadgetHtmlParserRouter.java#newcode79 java/gadgets/src/main/java/org/apache/shindig/gadgets/parse/GadgetHtmlParserRouter.java:79: protected GadgetHtmlParser getParser() { Is getParser required for anything ...
13 years, 7 months ago (2010-10-04 10:21:03 UTC) #13
gagan.goku
http://codereview.appspot.com/2119043/diff/31002/java/gadgets/src/main/java/org/apache/shindig/gadgets/parse/GadgetHtmlParserRouter.java File java/gadgets/src/main/java/org/apache/shindig/gadgets/parse/GadgetHtmlParserRouter.java (right): http://codereview.appspot.com/2119043/diff/31002/java/gadgets/src/main/java/org/apache/shindig/gadgets/parse/GadgetHtmlParserRouter.java#newcode79 java/gadgets/src/main/java/org/apache/shindig/gadgets/parse/GadgetHtmlParserRouter.java:79: protected GadgetHtmlParser getParser() { On 2010/10/04 10:21:03, Kuntal Loya ...
13 years, 7 months ago (2010-10-04 11:58:59 UTC) #14
Kuntal Loya
On Mon, Oct 4, 2010 at 5:28 PM, <gagan.goku@gmail.com> wrote: > > > http://codereview.appspot.com/2119043/diff/31002/java/gadgets/src/main/java/org/apache/shindig/gadgets/parse/GadgetHtmlParserRouter.java > ...
13 years, 7 months ago (2010-10-04 12:08:03 UTC) #15
Kuntal Loya
On Mon, Oct 4, 2010 at 5:37 PM, Kuntal Loya <kuntal.loya@gmail.com> wrote: > > > ...
13 years, 7 months ago (2010-10-04 12:58:55 UTC) #16
chirag
http://codereview.appspot.com/2119043/diff/63001/java/common/conf/shindig.properties File java/common/conf/shindig.properties (right): http://codereview.appspot.com/2119043/diff/63001/java/common/conf/shindig.properties#newcode155 java/common/conf/shindig.properties:155: shindig.html.parser=neko Doesn't guice offer the same solution? Currently, we ...
13 years, 7 months ago (2010-10-05 01:02:01 UTC) #17
gagan.goku
On 2010/10/05 01:02:01, chirag wrote: > http://codereview.appspot.com/2119043/diff/63001/java/common/conf/shindig.properties > File java/common/conf/shindig.properties (right): > > http://codereview.appspot.com/2119043/diff/63001/java/common/conf/shindig.properties#newcode155 > ...
13 years, 7 months ago (2010-10-05 01:27:46 UTC) #18
chirag
Wouldn't Google Guice annotations[1] would solve this problem. Example Bind: bind(GadgetHtmlParser.class) .annotatedWith(Names.named("parser-accel")) .to(CajaHtmlParser.class); Inject: @Inject ...
13 years, 7 months ago (2010-10-05 01:34:31 UTC) #19
Paul Lindner
Another option is to use a MapBinder. This would allow other modules to add a ...
13 years, 6 months ago (2010-10-05 09:12:44 UTC) #20
henry.saputra
http://codereview.appspot.com/2119043/diff/63001/java/gadgets/src/main/java/org/apache/shindig/gadgets/parse/GadgetHtmlParserRouter.java File java/gadgets/src/main/java/org/apache/shindig/gadgets/parse/GadgetHtmlParserRouter.java (right): http://codereview.appspot.com/2119043/diff/63001/java/gadgets/src/main/java/org/apache/shindig/gadgets/parse/GadgetHtmlParserRouter.java#newcode65 java/gadgets/src/main/java/org/apache/shindig/gadgets/parse/GadgetHtmlParserRouter.java:65: public void setHtmlParserFromName(String htmlParserName) { My 2 cents is ...
13 years, 6 months ago (2010-10-05 18:02:14 UTC) #21
gagan.goku
13 years, 6 months ago (2010-10-08 02:53:31 UTC) #22
Hi guys

Thanks for your valuable comments. I realize that the motivation for this
change was not communicated correctly.
To give you more insight, we have overridden the GadgetHtmlParser with a
similar router which is capable of routing to caja / neko depending on the
request context. This is very helpful because it allows us to say things
like "use caja parser for accel container, neko otherwise".
Also, the logic for deciding which parser to use is consolidated into 1
place, so we dont muck up multiple places.
Also, it extends GadgetHtmlParser so that the existing shindig injections of
GadgetHtmlParser dont need to change at all.

Part of the motivation for this change was to move out the router to
shindig, and override the decision specific stuff in our installation. This
helps because other systems (like chirag's use case for caja parser) reuse
the functionality.

Also, while debugging, we found web pages which dont specify the encoding in
the response headers, but do specify it as part of the <meta
Http-Equiv="Content-Type"> tag. For accel use case, it is very important to
parse the content encoding correctly, otherwise the default encoding is
assumed, and the page looks like garbage. Sadly there is no easy of knowing
that we messed up there.

We have seen caja throw up on lots of web pages, and neko mess up the dom,
but still parse it to some extent.
Hence the motivation for
parseDomForCharsetDetection<http://codereview.appspot.com/1883041/>
change
which we would override in the router to cycle through all available
parsers, so that we get some dom which can be used to extract the page
encoding.

Given the above reasons, i urge all to give it some more thought. Meanwhile,
we'l try to implement parseDomForCharsetDetection in our installation first
and come back to this change later when we have more experiences to share.

Thanks lots for the feedback.
Gagan

On Tue, Oct 5, 2010 at 11:32 PM, <henry.saputra@gmail.com> wrote:

>
>
>
http://codereview.appspot.com/2119043/diff/63001/java/gadgets/src/main/java/o...
>
> File
>
>
java/gadgets/src/main/java/org/apache/shindig/gadgets/parse/GadgetHtmlParserRouter.java
> (right):
>
>
>
http://codereview.appspot.com/2119043/diff/63001/java/gadgets/src/main/java/o...
>
>
java/gadgets/src/main/java/org/apache/shindig/gadgets/parse/GadgetHtmlParserRouter.java:65:
> public void setHtmlParserFromName(String htmlParserName) {
> My 2 cents is I am not too sure about additional setHtmlParserFromName
> public method in this class which is not expose in the
> GadgetHTMLRenderer.
>
> If iterating through existing parsers is really desired we could
> probably make the router as factory which does not extend the
> GadgetHTMLParser instead. Because essentially thats what the router is
> doing, right?
> And we could also use the Map binding as Paul suggested in the
> implementation.
>
>
> http://codereview.appspot.com/2119043/
>
Sign in to reply to this message.

Powered by Google App Engine
RSS Feeds Recent Issues | This issue
This is Rietveld f62528b