|
|
Created:
13 years, 7 months ago by gagan.goku Modified:
13 years, 6 months ago CC:
cool-shindig-committers_googlegroups.com, anupama.dutta, satya3656, Kuntal Loya Base URL:
http://svn.apache.org/repos/asf/shindig/trunk/ Visibility:
Public. |
DescriptionAdding a GadgetHtmlParserRouter which can route to Caja / Neko parser based on how its is configured.
This allows us to do stuff like:
@Override
public Document parseDomForEncodingDetection(String source) throws GadgetException {
for (GadgetHtmlParser parser : ImmutableList.of(nekoParser, cajaParser, vanillaCajaHtmlParser)) {
try {
return parser.parseDom(source);
} catch (GadgetException e) {
// Ignore.
}
}
return null;
}
Patch Set 1 #Patch Set 2 : adding parseDomForEncodingDetection #Patch Set 3 : using caja parser for accel #Patch Set 4 : using caja parser for accel #
Total comments: 10
Patch Set 5 : addressing anupamas comments #Patch Set 6 : fixing small bug #
Total comments: 4
Patch Set 7 : addressing anupamas comments #Patch Set 8 : adding vanilla caja parser #
Total comments: 2
Patch Set 9 : adding test and addressing satyas comment #Patch Set 10 : svn up #Patch Set 11 : little refactoring #
Total comments: 8
Patch Set 12 : addressing kuntals comments #Patch Set 13 : svn up #Patch Set 14 : trying out another variant #Patch Set 15 : trying out another variant #Patch Set 16 : reverting back to clean state #
Total comments: 2
MessagesTotal messages: 22
http://codereview.appspot.com/2119043/diff/9001/java/gadgets/src/main/java/or... File java/gadgets/src/main/java/org/apache/shindig/gadgets/parse/GadgetHtmlParser.java (right): http://codereview.appspot.com/2119043/diff/9001/java/gadgets/src/main/java/or... java/gadgets/src/main/java/org/apache/shindig/gadgets/parse/GadgetHtmlParser.java:79: public Document parseDomForEncodingDetection(String source) throws GadgetException { Remove this from this CL, so that we focus only the GadgetHtmlParserRouter addition here. http://codereview.appspot.com/2119043/diff/9001/java/gadgets/src/main/java/or... File java/gadgets/src/main/java/org/apache/shindig/gadgets/parse/GadgetHtmlParserRouter.java (right): http://codereview.appspot.com/2119043/diff/9001/java/gadgets/src/main/java/or... java/gadgets/src/main/java/org/apache/shindig/gadgets/parse/GadgetHtmlParserRouter.java:42: Remove extra newline? http://codereview.appspot.com/2119043/diff/9001/java/gadgets/src/main/java/or... java/gadgets/src/main/java/org/apache/shindig/gadgets/parse/GadgetHtmlParserRouter.java:52: public void setHtmlParserFromName(@Named("gadget.html.parser") String htmlParserName) { Shouldn't this be part of the constructor, else what value will this.parser have if parseDom or parseFragment is called without calling the set method? http://codereview.appspot.com/2119043/diff/9001/java/gadgets/src/main/java/or... java/gadgets/src/main/java/org/apache/shindig/gadgets/parse/GadgetHtmlParserRouter.java:78: "HtmlParserRouter.parseDomImpl is not supported"); HtmlParserRouter -> GadgetHtmlParserRouter http://codereview.appspot.com/2119043/diff/9001/java/gadgets/src/main/java/or... java/gadgets/src/main/java/org/apache/shindig/gadgets/parse/GadgetHtmlParserRouter.java:85: "HtmlParserRouter.parseFragmentImpl is not supported"); HtmlParserRouter -> GadgetHtmlParserRouter
Sign in to reply to this message.
http://codereview.appspot.com/2119043/diff/9001/java/gadgets/src/main/java/or... File java/gadgets/src/main/java/org/apache/shindig/gadgets/parse/GadgetHtmlParser.java (right): http://codereview.appspot.com/2119043/diff/9001/java/gadgets/src/main/java/or... java/gadgets/src/main/java/org/apache/shindig/gadgets/parse/GadgetHtmlParser.java:79: public Document parseDomForEncodingDetection(String source) throws GadgetException { On 2010/09/14 12:07:52, anupama.dutta wrote: > Remove this from this CL, so that we focus only the GadgetHtmlParserRouter > addition here. Done. http://codereview.appspot.com/2119043/diff/9001/java/gadgets/src/main/java/or... File java/gadgets/src/main/java/org/apache/shindig/gadgets/parse/GadgetHtmlParserRouter.java (right): http://codereview.appspot.com/2119043/diff/9001/java/gadgets/src/main/java/or... java/gadgets/src/main/java/org/apache/shindig/gadgets/parse/GadgetHtmlParserRouter.java:42: On 2010/09/14 12:07:52, anupama.dutta wrote: > Remove extra newline? Done. http://codereview.appspot.com/2119043/diff/9001/java/gadgets/src/main/java/or... java/gadgets/src/main/java/org/apache/shindig/gadgets/parse/GadgetHtmlParserRouter.java:52: public void setHtmlParserFromName(@Named("gadget.html.parser") String htmlParserName) { On 2010/09/14 12:07:52, anupama.dutta wrote: > Shouldn't this be part of the constructor, else what value will this.parser have > if parseDom or parseFragment is called without calling the set method? currently its null, but your suggestion is better. Done. http://codereview.appspot.com/2119043/diff/9001/java/gadgets/src/main/java/or... java/gadgets/src/main/java/org/apache/shindig/gadgets/parse/GadgetHtmlParserRouter.java:78: "HtmlParserRouter.parseDomImpl is not supported"); On 2010/09/14 12:07:52, anupama.dutta wrote: > HtmlParserRouter -> GadgetHtmlParserRouter Done. http://codereview.appspot.com/2119043/diff/9001/java/gadgets/src/main/java/or... java/gadgets/src/main/java/org/apache/shindig/gadgets/parse/GadgetHtmlParserRouter.java:85: "HtmlParserRouter.parseFragmentImpl is not supported"); On 2010/09/14 12:07:52, anupama.dutta wrote: > HtmlParserRouter -> GadgetHtmlParserRouter Done.
Sign in to reply to this message.
Minor comments below. http://codereview.appspot.com/2119043/diff/18001/java/gadgets/src/main/java/o... File java/gadgets/src/main/java/org/apache/shindig/gadgets/parse/GadgetHtmlParserRouter.java (right): http://codereview.appspot.com/2119043/diff/18001/java/gadgets/src/main/java/o... java/gadgets/src/main/java/org/apache/shindig/gadgets/parse/GadgetHtmlParserRouter.java:30: * Router that delegates to appropriate html parser. Could you mention here that Neko is the default parser? http://codereview.appspot.com/2119043/diff/18001/java/gadgets/src/main/java/o... File java/gadgets/src/main/java/org/apache/shindig/gadgets/rewrite/AccelResponseRewriterRegistry.java (right): http://codereview.appspot.com/2119043/diff/18001/java/gadgets/src/main/java/o... java/gadgets/src/main/java/org/apache/shindig/gadgets/rewrite/AccelResponseRewriterRegistry.java:42: htmlParser.setHtmlParserFromName(GadgetHtmlParserRouter.Parser.CAJA.name()); I am slightly confused about dependency injection usage in general. So, maybe my query here will be silly :( Can't we use the constructor route here, or is it not possible to pass the CAJA value instead of the already injected value of @Named("gadgets.html.parser")?
Sign in to reply to this message.
http://codereview.appspot.com/2119043/diff/18001/java/gadgets/src/main/java/o... File java/gadgets/src/main/java/org/apache/shindig/gadgets/parse/GadgetHtmlParserRouter.java (right): http://codereview.appspot.com/2119043/diff/18001/java/gadgets/src/main/java/o... java/gadgets/src/main/java/org/apache/shindig/gadgets/parse/GadgetHtmlParserRouter.java:30: * Router that delegates to appropriate html parser. On 2010/09/15 05:01:33, anupama.dutta wrote: > Could you mention here that Neko is the default parser? Done. http://codereview.appspot.com/2119043/diff/18001/java/gadgets/src/main/java/o... File java/gadgets/src/main/java/org/apache/shindig/gadgets/rewrite/AccelResponseRewriterRegistry.java (right): http://codereview.appspot.com/2119043/diff/18001/java/gadgets/src/main/java/o... java/gadgets/src/main/java/org/apache/shindig/gadgets/rewrite/AccelResponseRewriterRegistry.java:42: htmlParser.setHtmlParserFromName(GadgetHtmlParserRouter.Parser.CAJA.name()); On 2010/09/15 05:01:33, anupama.dutta wrote: > I am slightly confused about dependency injection usage in general. So, maybe my > query here will be silly :( > Can't we use the constructor route here, or is it not possible to pass the CAJA > value instead of the already injected value of @Named("gadgets.html.parser")? File reverted from this change for now. Your suggestion is a good one. I could do something like: @Named("cajaParser") GadgetHtmlParser instead. and in the ParseModule, we could bind it to something like: GadgetHtmlParserRouter cajRouter; cajaRouter.setHtmlParserFromName(GadgetHtmlParserRouter.Parser.CAJA.name()); bind(GadgetHtmlParser.class) .annotatedWith(Names.named("cajaParser")) .to(cajaRouter);
Sign in to reply to this message.
http://codereview.appspot.com/2119043/diff/27001/java/gadgets/src/main/java/o... File java/gadgets/src/main/java/org/apache/shindig/gadgets/parse/GadgetHtmlParserRouter.java (right): http://codereview.appspot.com/2119043/diff/27001/java/gadgets/src/main/java/o... java/gadgets/src/main/java/org/apache/shindig/gadgets/parse/GadgetHtmlParserRouter.java:52: super(null); Won't this lead to NPE if the code hits GadgetHtmlParser.errorDom()
Sign in to reply to this message.
http://codereview.appspot.com/2119043/diff/27001/java/gadgets/src/main/java/o... File java/gadgets/src/main/java/org/apache/shindig/gadgets/parse/GadgetHtmlParserRouter.java (right): http://codereview.appspot.com/2119043/diff/27001/java/gadgets/src/main/java/o... java/gadgets/src/main/java/org/apache/shindig/gadgets/parse/GadgetHtmlParserRouter.java:52: super(null); On 2010/09/20 09:52:17, satya3656 wrote: > Won't this lead to NPE if the code hits GadgetHtmlParser.errorDom() your right. Nice catch. i had never tested this with any parser other than caja. Added a test for the same.
Sign in to reply to this message.
http://codereview.appspot.com/2119043/diff/31002/java/gadgets/src/main/java/o... File java/gadgets/src/main/java/org/apache/shindig/gadgets/parse/GadgetHtmlParserRouter.java (right): http://codereview.appspot.com/2119043/diff/31002/java/gadgets/src/main/java/o... java/gadgets/src/main/java/org/apache/shindig/gadgets/parse/GadgetHtmlParserRouter.java:79: protected GadgetHtmlParser getParser() { Is getParser required for anything else but testing? http://codereview.appspot.com/2119043/diff/31002/java/gadgets/src/test/java/o... File java/gadgets/src/test/java/org/apache/shindig/gadgets/parse/GadgetHtmlParserRouterTest.java (right): http://codereview.appspot.com/2119043/diff/31002/java/gadgets/src/test/java/o... java/gadgets/src/test/java/org/apache/shindig/gadgets/parse/GadgetHtmlParserRouterTest.java:49: String html = "<html><head>" Keep a const variable, and use the same for all tests. http://codereview.appspot.com/2119043/diff/31002/java/gadgets/src/test/java/o... java/gadgets/src/test/java/org/apache/shindig/gadgets/parse/GadgetHtmlParserRouterTest.java:54: assertTrue(router.getParser().getClass().getName().indexOf("NekoSimplifiedHtmlParser") > 0); assertEquals(NekoSimplifiedHtmlParser.class.getName();, router.getParser().getClass().getName()) Or just check if the parser is an instance of NekoSimplifiedHtmlParser. http://codereview.appspot.com/2119043/diff/31002/java/gadgets/src/test/java/o... java/gadgets/src/test/java/org/apache/shindig/gadgets/parse/GadgetHtmlParserRouterTest.java:83: router.setHtmlParserFromName(GadgetHtmlParserRouter.Parser.VANILLA_CAJA.toString()); Add a test when the name is a random foo and it defaults to neko parser.
Sign in to reply to this message.
http://codereview.appspot.com/2119043/diff/31002/java/gadgets/src/main/java/o... File java/gadgets/src/main/java/org/apache/shindig/gadgets/parse/GadgetHtmlParserRouter.java (right): http://codereview.appspot.com/2119043/diff/31002/java/gadgets/src/main/java/o... java/gadgets/src/main/java/org/apache/shindig/gadgets/parse/GadgetHtmlParserRouter.java:79: protected GadgetHtmlParser getParser() { On 2010/10/04 10:21:03, Kuntal Loya wrote: > Is getParser required for anything else but testing? This method is the one that should be called to get the parser to be used. Note that the main public functions like: parseDom() parseFragment() use the parser returned by getParser() and call the appropriate dom parsing functions. Keeping this method separate give implementations the ability to provide their own parser selection logic. http://codereview.appspot.com/2119043/diff/31002/java/gadgets/src/test/java/o... File java/gadgets/src/test/java/org/apache/shindig/gadgets/parse/GadgetHtmlParserRouterTest.java (right): http://codereview.appspot.com/2119043/diff/31002/java/gadgets/src/test/java/o... java/gadgets/src/test/java/org/apache/shindig/gadgets/parse/GadgetHtmlParserRouterTest.java:49: String html = "<html><head>" On 2010/10/04 10:21:03, Kuntal Loya wrote: > Keep a const variable, and use the same for all tests. Done. http://codereview.appspot.com/2119043/diff/31002/java/gadgets/src/test/java/o... java/gadgets/src/test/java/org/apache/shindig/gadgets/parse/GadgetHtmlParserRouterTest.java:54: assertTrue(router.getParser().getClass().getName().indexOf("NekoSimplifiedHtmlParser") > 0); On 2010/10/04 10:21:03, Kuntal Loya wrote: > assertEquals(NekoSimplifiedHtmlParser.class.getName();, > router.getParser().getClass().getName()) > Or just check if the parser is an instance of NekoSimplifiedHtmlParser. Thanks for the much simpler way of testing this :) Done. http://codereview.appspot.com/2119043/diff/31002/java/gadgets/src/test/java/o... java/gadgets/src/test/java/org/apache/shindig/gadgets/parse/GadgetHtmlParserRouterTest.java:83: router.setHtmlParserFromName(GadgetHtmlParserRouter.Parser.VANILLA_CAJA.toString()); On 2010/10/04 10:21:03, Kuntal Loya wrote: > Add a test when the name is a random foo and it defaults to neko parser. Added a test case called testDefaultParserIsNekoInCaseSpecifiedParseInvalid Done.
Sign in to reply to this message.
On Mon, Oct 4, 2010 at 5:28 PM, <gagan.goku@gmail.com> wrote: > > > http://codereview.appspot.com/2119043/diff/31002/java/gadgets/src/main/java/o... > File > > java/gadgets/src/main/java/org/apache/shindig/gadgets/parse/GadgetHtmlParserRouter.java > (right): > > > http://codereview.appspot.com/2119043/diff/31002/java/gadgets/src/main/java/o... > > java/gadgets/src/main/java/org/apache/shindig/gadgets/parse/GadgetHtmlParserRouter.java:79: > protected GadgetHtmlParser getParser() { > On 2010/10/04 10:21:03, Kuntal Loya wrote: > >> Is getParser required for anything else but testing? >> > > This method is the one that should be called to get the parser to be > used. > Note that the main public functions like: > parseDom() > parseFragment() > > use the parser returned by getParser() and call the appropriate dom > parsing functions. > > Keeping this method separate give implementations the ability to provide > their own parser selection logic. My question still stays - who is using the parser and why? If someone mentions the ParserRouter and the parserString, they should be just calling the main methods parseDom, parseFragment etc. Is there anything else they would do with the underlying parser? Moreover, parserRouter is an extension of GadgetHtmlParser which is a parser, the trick is, we are doing a routing business inside, but for the API user, it is to behave as a parser (which one depends on what he's configuring it to use whenever he wants). The main methods, parseDom, parseFragment can use a private method to get the parser, or use the this.parser variable itself. The parser selection logic flexibility should be via the name of the parser, the way it is done in setHtmlParserByName, right? > > > > http://codereview.appspot.com/2119043/diff/31002/java/gadgets/src/test/java/o... > File > > java/gadgets/src/test/java/org/apache/shindig/gadgets/parse/GadgetHtmlParserRouterTest.java > (right): > > > http://codereview.appspot.com/2119043/diff/31002/java/gadgets/src/test/java/o... > > java/gadgets/src/test/java/org/apache/shindig/gadgets/parse/GadgetHtmlParserRouterTest.java:49: > String html = "<html><head>" > On 2010/10/04 10:21:03, Kuntal Loya wrote: > >> Keep a const variable, and use the same for all tests. >> > > Done. > > > > http://codereview.appspot.com/2119043/diff/31002/java/gadgets/src/test/java/o... > > java/gadgets/src/test/java/org/apache/shindig/gadgets/parse/GadgetHtmlParserRouterTest.java:54: > > assertTrue(router.getParser().getClass().getName().indexOf("NekoSimplifiedHtmlParser") > >> 0); >> > On 2010/10/04 10:21:03, Kuntal Loya wrote: > >> assertEquals(NekoSimplifiedHtmlParser.class.getName();, >> router.getParser().getClass().getName()) >> Or just check if the parser is an instance of >> > NekoSimplifiedHtmlParser. > > Thanks for the much simpler way of testing this :) > Done. > > > > http://codereview.appspot.com/2119043/diff/31002/java/gadgets/src/test/java/o... > > java/gadgets/src/test/java/org/apache/shindig/gadgets/parse/GadgetHtmlParserRouterTest.java:83: > > router.setHtmlParserFromName(GadgetHtmlParserRouter.Parser.VANILLA_CAJA.toString()); > On 2010/10/04 10:21:03, Kuntal Loya wrote: > >> Add a test when the name is a random foo and it defaults to neko >> > parser. > > Added a test case called > testDefaultParserIsNekoInCaseSpecifiedParseInvalid > Done. > > > http://codereview.appspot.com/2119043/ >
Sign in to reply to this message.
On Mon, Oct 4, 2010 at 5:37 PM, Kuntal Loya <kuntal.loya@gmail.com> wrote: > > > On Mon, Oct 4, 2010 at 5:28 PM, <gagan.goku@gmail.com> wrote: > >> >> >> http://codereview.appspot.com/2119043/diff/31002/java/gadgets/src/main/java/o... >> File >> >> java/gadgets/src/main/java/org/apache/shindig/gadgets/parse/GadgetHtmlParserRouter.java >> (right): >> >> >> http://codereview.appspot.com/2119043/diff/31002/java/gadgets/src/main/java/o... >> >> java/gadgets/src/main/java/org/apache/shindig/gadgets/parse/GadgetHtmlParserRouter.java:79: >> protected GadgetHtmlParser getParser() { >> On 2010/10/04 10:21:03, Kuntal Loya wrote: >> >>> Is getParser required for anything else but testing? >>> >> >> This method is the one that should be called to get the parser to be >> used. >> Note that the main public functions like: >> parseDom() >> parseFragment() >> >> use the parser returned by getParser() and call the appropriate dom >> parsing functions. >> >> Keeping this method separate give implementations the ability to provide >> their own parser selection logic. > > > My question still stays - who is using the parser and why? If someone > mentions the ParserRouter and the parserString, they should be just calling > the main methods parseDom, parseFragment etc. Is there anything else they > would do with the underlying parser? Moreover, parserRouter is an extension > of GadgetHtmlParser which is a parser, the trick is, we are doing a routing > business inside, but for the API user, it is to behave as a parser (which > one depends on what he's configuring it to use whenever he wants). > > The main methods, parseDom, parseFragment can use a private method to get > the parser, or use the this.parser variable itself. > > The parser selection logic flexibility should be via the name of the > parser, the way it is done in setHtmlParserByName, right? > Had a discussion with Gagan, and looks like getParser method should stay in order to maintain the flexibility. lgtm. > > >> >> >> >> http://codereview.appspot.com/2119043/diff/31002/java/gadgets/src/test/java/o... >> File >> >> java/gadgets/src/test/java/org/apache/shindig/gadgets/parse/GadgetHtmlParserRouterTest.java >> (right): >> >> >> http://codereview.appspot.com/2119043/diff/31002/java/gadgets/src/test/java/o... >> >> java/gadgets/src/test/java/org/apache/shindig/gadgets/parse/GadgetHtmlParserRouterTest.java:49: >> String html = "<html><head>" >> On 2010/10/04 10:21:03, Kuntal Loya wrote: >> >>> Keep a const variable, and use the same for all tests. >>> >> >> Done. >> >> >> >> http://codereview.appspot.com/2119043/diff/31002/java/gadgets/src/test/java/o... >> >> java/gadgets/src/test/java/org/apache/shindig/gadgets/parse/GadgetHtmlParserRouterTest.java:54: >> >> assertTrue(router.getParser().getClass().getName().indexOf("NekoSimplifiedHtmlParser") >> >>> 0); >>> >> On 2010/10/04 10:21:03, Kuntal Loya wrote: >> >>> assertEquals(NekoSimplifiedHtmlParser.class.getName();, >>> router.getParser().getClass().getName()) >>> Or just check if the parser is an instance of >>> >> NekoSimplifiedHtmlParser. >> >> Thanks for the much simpler way of testing this :) >> Done. >> >> >> >> http://codereview.appspot.com/2119043/diff/31002/java/gadgets/src/test/java/o... >> >> java/gadgets/src/test/java/org/apache/shindig/gadgets/parse/GadgetHtmlParserRouterTest.java:83: >> >> router.setHtmlParserFromName(GadgetHtmlParserRouter.Parser.VANILLA_CAJA.toString()); >> On 2010/10/04 10:21:03, Kuntal Loya wrote: >> >>> Add a test when the name is a random foo and it defaults to neko >>> >> parser. >> >> Added a test case called >> testDefaultParserIsNekoInCaseSpecifiedParseInvalid >> Done. >> >> >> http://codereview.appspot.com/2119043/ >> > >
Sign in to reply to this message.
http://codereview.appspot.com/2119043/diff/63001/java/common/conf/shindig.pro... File java/common/conf/shindig.properties (right): http://codereview.appspot.com/2119043/diff/63001/java/common/conf/shindig.pro... java/common/conf/shindig.properties:155: shindig.html.parser=neko Doesn't guice offer the same solution? Currently, we can switch between parsers by updating ParseModule#configure with bind(GadgetHtmlParser.class).to(NekoSimplifiedHtmlParser.class);
Sign in to reply to this message.
On 2010/10/05 01:02:01, chirag wrote: > http://codereview.appspot.com/2119043/diff/63001/java/common/conf/shindig.pro... > File java/common/conf/shindig.properties (right): > > http://codereview.appspot.com/2119043/diff/63001/java/common/conf/shindig.pro... > java/common/conf/shindig.properties:155: shindig.html.parser=neko > Doesn't guice offer the same solution? > Currently, we can switch between parsers by updating ParseModule#configure with > > bind(GadgetHtmlParser.class).to(NekoSimplifiedHtmlParser.class); When you bind GadgetHtmlParser to say NekoSimplifiedHtmlParser, you lose the ability to use neko for say proxy servlet (because existing gadgets are using it) and caja for say accel servlet. Also, adding the router opens more possibilities like: 1) Parsing dom for encoding detection: You have around 1024 bytes of the initial content of a page and you want to determine the encoding. You can try to parse the html and look for <meta http-equiv="Content-Type"> To make this parsing more robust, you could cycle through all the available parsers and choose the one which does not throw an exception.
Sign in to reply to this message.
Wouldn't Google Guice annotations[1] would solve this problem. Example Bind: bind(GadgetHtmlParser.class) .annotatedWith(Names.named("parser-accel")) .to(CajaHtmlParser.class); Inject: @Inject public Accel(@Named("parser-acce") GadgetHtmlParser parser, ... http://code.google.com/p/google-guice/wiki/BindingAnnotations On Mon, Oct 4, 2010 at 6:27 PM, <gagan.goku@gmail.com> wrote: > On 2010/10/05 01:02:01, chirag wrote: > > http://codereview.appspot.com/2119043/diff/63001/java/common/conf/shindig.pro... >> >> File java/common/conf/shindig.properties (right): > > > http://codereview.appspot.com/2119043/diff/63001/java/common/conf/shindig.pro... >> >> java/common/conf/shindig.properties:155: shindig.html.parser=neko >> Doesn't guice offer the same solution? >> Currently, we can switch between parsers by updating > > ParseModule#configure with > >> bind(GadgetHtmlParser.class).to(NekoSimplifiedHtmlParser.class); > > When you bind GadgetHtmlParser to say NekoSimplifiedHtmlParser, you lose > the ability to use neko for say proxy servlet (because existing gadgets > are using it) and caja for say accel servlet. > Also, adding the router opens more possibilities like: > > 1) Parsing dom for encoding detection: You have around 1024 bytes of the > initial content of a page and you want to determine the encoding. > You can try to parse the html and look for <meta > http-equiv="Content-Type"> > To make this parsing more robust, you could cycle through all the > available parsers and choose the one which does not throw an exception. > > > http://codereview.appspot.com/2119043/ >
Sign in to reply to this message.
Another option is to use a MapBinder. This would allow other modules to add a custom parser to this independently of the router class. http://google-guice.googlecode.com/svn/trunk/javadoc/com/google/inject/multib...
Sign in to reply to this message.
http://codereview.appspot.com/2119043/diff/63001/java/gadgets/src/main/java/o... File java/gadgets/src/main/java/org/apache/shindig/gadgets/parse/GadgetHtmlParserRouter.java (right): http://codereview.appspot.com/2119043/diff/63001/java/gadgets/src/main/java/o... java/gadgets/src/main/java/org/apache/shindig/gadgets/parse/GadgetHtmlParserRouter.java:65: public void setHtmlParserFromName(String htmlParserName) { My 2 cents is I am not too sure about additional setHtmlParserFromName public method in this class which is not expose in the GadgetHTMLRenderer. If iterating through existing parsers is really desired we could probably make the router as factory which does not extend the GadgetHTMLParser instead. Because essentially thats what the router is doing, right? And we could also use the Map binding as Paul suggested in the implementation.
Sign in to reply to this message.
Hi guys Thanks for your valuable comments. I realize that the motivation for this change was not communicated correctly. To give you more insight, we have overridden the GadgetHtmlParser with a similar router which is capable of routing to caja / neko depending on the request context. This is very helpful because it allows us to say things like "use caja parser for accel container, neko otherwise". Also, the logic for deciding which parser to use is consolidated into 1 place, so we dont muck up multiple places. Also, it extends GadgetHtmlParser so that the existing shindig injections of GadgetHtmlParser dont need to change at all. Part of the motivation for this change was to move out the router to shindig, and override the decision specific stuff in our installation. This helps because other systems (like chirag's use case for caja parser) reuse the functionality. Also, while debugging, we found web pages which dont specify the encoding in the response headers, but do specify it as part of the <meta Http-Equiv="Content-Type"> tag. For accel use case, it is very important to parse the content encoding correctly, otherwise the default encoding is assumed, and the page looks like garbage. Sadly there is no easy of knowing that we messed up there. We have seen caja throw up on lots of web pages, and neko mess up the dom, but still parse it to some extent. Hence the motivation for parseDomForCharsetDetection<http://codereview.appspot.com/1883041/> change which we would override in the router to cycle through all available parsers, so that we get some dom which can be used to extract the page encoding. Given the above reasons, i urge all to give it some more thought. Meanwhile, we'l try to implement parseDomForCharsetDetection in our installation first and come back to this change later when we have more experiences to share. Thanks lots for the feedback. Gagan On Tue, Oct 5, 2010 at 11:32 PM, <henry.saputra@gmail.com> wrote: > > > http://codereview.appspot.com/2119043/diff/63001/java/gadgets/src/main/java/o... > > File > > java/gadgets/src/main/java/org/apache/shindig/gadgets/parse/GadgetHtmlParserRouter.java > (right): > > > http://codereview.appspot.com/2119043/diff/63001/java/gadgets/src/main/java/o... > > java/gadgets/src/main/java/org/apache/shindig/gadgets/parse/GadgetHtmlParserRouter.java:65: > public void setHtmlParserFromName(String htmlParserName) { > My 2 cents is I am not too sure about additional setHtmlParserFromName > public method in this class which is not expose in the > GadgetHTMLRenderer. > > If iterating through existing parsers is really desired we could > probably make the router as factory which does not extend the > GadgetHTMLParser instead. Because essentially thats what the router is > doing, right? > And we could also use the Map binding as Paul suggested in the > implementation. > > > http://codereview.appspot.com/2119043/ >
Sign in to reply to this message.
|