http://codereview.appspot.com/14041/diff/1/2 File java/gadgets/src/main/java/org/apache/shindig/gadgets/parse/caja/CajaCssSanitizer.java (right): http://codereview.appspot.com/14041/diff/1/2#newcode41 Line 41: public class CajaCssSanitizer { I'm a little concerned ...
16 years, 7 months ago
(2009-02-03 07:33:58 UTC)
#2
http://codereview.appspot.com/14041/diff/1/2
File
java/gadgets/src/main/java/org/apache/shindig/gadgets/parse/caja/CajaCssSanitizer.java
(right):
http://codereview.appspot.com/14041/diff/1/2#newcode41
Line 41: public class CajaCssSanitizer {
I'm a little concerned about tying this too closely with Caja. This effectively
forces a caja dependency even on someone who doesn't want to allow CSS. It
should be possible to extract a CssSanitizer interface from this.
http://codereview.appspot.com/14041/diff/1/2#newcode85
Line 85: + ((CssTree.Property) ancestorChain.node).getPropertyName());
INFO is probably too high a warning level here. FINE or lower seems to make more
sense. Otherwise this is just log spam for most gadgets.
http://codereview.appspot.com/14041/diff/1/10
File
java/gadgets/src/main/java/org/apache/shindig/gadgets/rewrite/CssRewriter.java
(right):
http://codereview.appspot.com/14041/diff/1/10#newcode45
Line 45: public class CssRewriter {
The relationship between CssRewriter and CSSContentRewriter isn't terribly clear
(other than CSSContentRewriter implementing the ContentRewriter interface). The
naming is also inconsistent. Now seems like a good time to clean that up.
http://codereview.appspot.com/14041/diff/1/10#newcode53
Line 53: private static final Pattern urlMatcher =
Is this necessary any longer?
Updated as per comments http://codereview.appspot.com/14041/diff/1/2 File java/gadgets/src/main/java/org/apache/shindig/gadgets/parse/caja/CajaCssSanitizer.java (right): http://codereview.appspot.com/14041/diff/1/2#newcode41 Line 41: public class CajaCssSanitizer { ...
16 years, 7 months ago
(2009-02-03 17:51:47 UTC)
#3
Updated as per comments
http://codereview.appspot.com/14041/diff/1/2
File
java/gadgets/src/main/java/org/apache/shindig/gadgets/parse/caja/CajaCssSanitizer.java
(right):
http://codereview.appspot.com/14041/diff/1/2#newcode41
Line 41: public class CajaCssSanitizer {
On 2009/02/03 07:33:58, etnu00 wrote:
> I'm a little concerned about tying this too closely with Caja. This
effectively
> forces a caja dependency even on someone who doesn't want to allow CSS. It
> should be possible to extract a CssSanitizer interface from this.
Not really without having an implementation agnostic CSS DOM interface which I
dont think makes sense at this point. Nor would it be useful without a viable
alternative CSS parser. Ill track this in JIRA
http://codereview.appspot.com/14041/diff/1/2#newcode85
Line 85: + ((CssTree.Property) ancestorChain.node).getPropertyName());
On 2009/02/03 07:33:58, etnu00 wrote:
> INFO is probably too high a warning level here. FINE or lower seems to make
more
> sense. Otherwise this is just log spam for most gadgets.
Done.
http://codereview.appspot.com/14041/diff/1/10
File
java/gadgets/src/main/java/org/apache/shindig/gadgets/rewrite/CssRewriter.java
(right):
http://codereview.appspot.com/14041/diff/1/10#newcode45
Line 45: public class CssRewriter {
On 2009/02/03 07:33:58, etnu00 wrote:
> The relationship between CssRewriter and CSSContentRewriter isn't terribly
clear
> (other than CSSContentRewriter implementing the ContentRewriter interface).
The
> naming is also inconsistent. Now seems like a good time to clean that up.
Indeed, done. Also removed all lexer based rewriter artifacts.
http://codereview.appspot.com/14041/diff/1/10#newcode53
Line 53: private static final Pattern urlMatcher =
On 2009/02/03 07:33:58, etnu00 wrote:
> Is this necessary any longer?
Nope. gone via refactor
http://codereview.appspot.com/14041/diff/1042/48 File ../trunk/java/gadgets/src/main/java/org/apache/shindig/gadgets/http/HttpRequest.java (right): http://codereview.appspot.com/14041/diff/1042/48#newcode208 Line 208: public boolean isSanitizeContent() { On 2009/02/04 18:39:18, awiner ...
16 years, 7 months ago
(2009-02-04 23:18:43 UTC)
#7
http://codereview.appspot.com/14041/diff/1042/48
File
../trunk/java/gadgets/src/main/java/org/apache/shindig/gadgets/http/HttpRequest.java
(right):
http://codereview.appspot.com/14041/diff/1042/48#newcode208
Line 208: public boolean isSanitizeContent() {
On 2009/02/04 18:39:18, awiner wrote:
> an awkward name; maybe isSanitizationNeeded()?
Agreed. Using isSanitizationRequested
http://codereview.appspot.com/14041/diff/1042/44
File
../trunk/java/gadgets/src/main/java/org/apache/shindig/gadgets/parse/GadgetHtmlParser.java
(right):
http://codereview.appspot.com/14041/diff/1042/44#newcode49
Line 49: static final Logger logger =
Logger.getLogger(GadgetHtmlParser.class.getName());
On 2009/02/04 18:39:18, awiner wrote:
> private
Done.
http://codereview.appspot.com/14041/diff/1042/44#newcode53
Line 53: public static final String CSS_DOM = "CSS-DOM";
On 2009/02/04 18:39:18, awiner wrote:
> CSS_DOM_USER_DATA? CSS_DOM_USER_DATA_KEY?
>
CSS_DOM_USER_DATA_KEY sounds right
http://codereview.appspot.com/14041/diff/1042/44#newcode57
Line 57: private CajaCssParser cajaCssParser = new CajaCssParser();
On 2009/02/04 18:39:18, awiner wrote:
> final
Done.
http://codereview.appspot.com/14041/diff/1042/44#newcode111
Line 111: parseCss(document);
On 2009/02/04 18:39:18, awiner wrote:
> I don't see why this should be harcoded in the parse; can't it be in a
rewriter
> or a rewriter of its own? Or just do this lazily for each style element.
Agreed. I will decouple this and use the same caching approach we use for HTML
documents. Laregly because proxied content will create a very large number of
duplicate CSS structures.
http://codereview.appspot.com/14041/diff/1042/44#newcode170
Line 170: * TODO: Consider sanitizing style attrs if we need ot whitelist them
On 2009/02/04 18:39:18, awiner wrote:
> ot -> to
removed
http://codereview.appspot.com/14041/diff/1042/44#newcode171
Line 171: * @param doc
On 2009/02/04 18:39:18, awiner wrote:
> empty @params should just be deleted
removed
http://codereview.appspot.com/14041/diff/1042/44#newcode177
Line 177: .getElementsByTagNameCaseInsensitive(doc, Sets.newHashSet("style"));
On 2009/02/04 18:39:18, awiner wrote:
> ImmutableSet.of("style") is more efficient
the mind is willing but the flesh is weak
http://codereview.appspot.com/14041/diff/1042/44#newcode183
Line 183: // Log Css parse exceptions but reatin the DOM
On 2009/02/04 18:39:18, awiner wrote:
> reatin -> retain
removed
http://codereview.appspot.com/14041/diff/1042/41
File
../trunk/java/gadgets/src/main/java/org/apache/shindig/gadgets/parse/caja/CajaCssParser.java
(right):
http://codereview.appspot.com/14041/diff/1042/41#newcode48
Line 48: private static final URI FAKE_SOURCE =
URI.create("http://www.example.org");
On 2009/02/04 18:39:18, awiner wrote:
> I think example.org should only be used for examples. This should be an
> intentionally false URI. Need some Javadoc here that this URI is not actually
> going to be read from.
Done
http://codereview.appspot.com/14041/diff/1042/41#newcode56
Line 56: public CssTree.StyleSheet parseDom(String content) throws
GadgetException {
On 2009/02/04 18:39:18, awiner wrote:
> perhaps name parseIntoDom(); this doesn't actually parse DOM.
using same signature as in HTML. Unchanged
http://codereview.appspot.com/14041/diff/1042/39
File
../trunk/java/gadgets/src/main/java/org/apache/shindig/gadgets/parse/caja/CajaCssSanitizer.java
(right):
http://codereview.appspot.com/14041/diff/1042/39#newcode53
Line 53: private static Set<String> ALLOWED_URI_SCHEMES =
ImmutableSet.of("http", "https");
On 2009/02/04 18:39:18, awiner wrote:
> final
Done.
http://codereview.appspot.com/14041/diff/1042/39#newcode55
Line 55: private static CajaCssParser PARSER = new CajaCssParser();
On 2009/02/04 18:39:18, awiner wrote:
> final. Any chance of using @Inject to build this (stateless) object instead
of
> ugly singletons? Also, CajaCssParser has no state, so it's cheap to create,
why
> bother with a static singleton at all?
Done. Needed because of updating caching model anyway
http://codereview.appspot.com/14041/diff/1042/39#newcode80
Line 80: styleElem.getParentNode().removeChild(styleElem);
On 2009/02/04 18:39:18, awiner wrote:
> is this case going to come up ever? Would rather have simpler code.
It does. Many style tags contain just an import, when that gets removed theyre
empty
http://codereview.appspot.com/14041/diff/1042/39#newcode93
Line 93: final CssSchema schema = CssSchema.getDefaultCss21Schema(new
SimpleMessageQueue());
On 2009/02/04 18:39:18, awiner wrote:
> is the schema threadsafe? If so, make an instance variable.
Done.
http://codereview.appspot.com/14041/diff/1042/39#newcode97
Line 97: if
(!schema.isPropertyAllowed(((CssTree.Property)ancestorChain.node).getPropertyName()))
{
On 2009/02/04 18:39:18, awiner wrote:
> space after ), here and below for consistency
Done.
http://codereview.appspot.com/14041/diff/1042/39#newcode99
Line 99: logger.log(Level.FINE, "Removing property "
On 2009/02/04 18:39:18, awiner wrote:
> don't do string manipulation in Level.FINE without wrapping in
> if (logger.isLoggable(Level.FINE)) { ... }
> ... here and below.
Done.
http://codereview.appspot.com/14041/diff/1042/39#newcode117
Line 117: } catch (Throwable t) {
On 2009/02/04 18:39:18, awiner wrote:
> never catch Throwable. RuntimeException is more like it; but even so it'd be
> better to only catch exceptions that can be explicitly thrown by Uri.parse()
Switched to RuntimeException. A certain level of paranoia is warranted here.
http://codereview.appspot.com/14041/diff/1042/39#newcode127
Line 127: CssTree.Import importDecl = (CssTree.Import) ancestorChain.node;
On 2009/02/04 18:39:18, awiner wrote:
> should we validate URIs of import declarations too? Is any of that happening
in
> the link rewriter?
Yes, the sanitizing link rewriter is quite paranoid.
http://codereview.appspot.com/14041/diff/1042/39#newcode138
Line 138: * @param chain of nodes
On 2009/02/04 18:39:18, awiner wrote:
> @param chain chain of nodes
Done.
http://codereview.appspot.com/14041/diff/1042/39#newcode144
Line 144:
((AbstractParseTreeNode)chain.getParentNode()).removeChild(chain.node);
On 2009/02/04 18:39:18, awiner wrote:
> why isn't it safe to trim at a property level?
>
A declaration can have multiple properties some of which are dependent,
generally safer to strip at this level. A declaration is not the entire rule but
just a an "x : y" pair.
http://codereview.appspot.com/14041/diff/1042/62
File
../trunk/java/gadgets/src/main/java/org/apache/shindig/gadgets/parse/caja/CajaCssUtils.java
(right):
http://codereview.appspot.com/14041/diff/1042/62#newcode39
Line 39: result.add((T) child);
On 2009/02/04 18:39:18, awiner wrote:
> use nodeType.cast(child) instead of a cast (avoids compiler warnings)
Done.
http://codereview.appspot.com/14041/diff/1042/62#newcode53
Line 53: descendants.add((T) ancestorChain.node);
On 2009/02/04 18:39:18, awiner wrote:
> ditto
Done.
http://codereview.appspot.com/14041/diff/1042/55
File
../trunk/java/gadgets/src/main/java/org/apache/shindig/gadgets/render/SanitizedRenderingContentRewriter.java
(right):
http://codereview.appspot.com/14041/diff/1042/55#newcode81
Line 81: private static Map<String, Set<String>> PROXY_IMAGE_ATTRIBUTES =
On 2009/02/04 18:39:18, awiner wrote:
> final. And could be Map<String, ImmutableSet<String>> without any harm,
saving
> the cast below
Done.
http://codereview.appspot.com/14041/diff/1042/55#newcode86
Line 86: private final CajaCssSanitizer cssSanitizier;
On 2009/02/04 18:39:18, awiner wrote:
> cssSanitizier -> cssSanitizer
pardon my french.
http://codereview.appspot.com/14041/diff/1042/55#newcode98
Line 98: cssSanitizier = new CajaCssSanitizer();
On 2009/02/04 18:39:18, awiner wrote:
> why not inject these two?
Removed need to inject parser. Sanitizer injected
http://codereview.appspot.com/14041/diff/1042/55#newcode137
Line 137: ContentRewriterFeature rewriterFeature = new
ContentRewriterFeature(null,
On 2009/02/04 18:39:18, awiner wrote:
> move to a method on ContentRewriterFeature?
Moved to ContentRewriterFeatureFactory
http://codereview.appspot.com/14041/diff/1042/55#newcode144
Line 144: * We dont actually rewrite the image we just ensure that it is in fact
a valid
On 2009/02/04 18:39:18, awiner wrote:
> some punctuation, please ;)
Done! eats(,) shoots and leaves.
http://codereview.appspot.com/14041/diff/1042/55#newcode184
Line 184: if (contentType == null ||
contentType.toLowerCase().startsWith("text/css")) {
On 2009/02/04 18:39:18, awiner wrote:
> suggest just "text/*" - as long CssParser.parseDom() definitely throws an
> exception if the content is not valid CSS. I've seen plenty of servers
> misconfigured for CSS mime types.
Agreed. Done.
http://codereview.appspot.com/14041/diff/1042/55#newcode191
Line 191: } catch (Throwable t) {
On 2009/02/04 18:39:18, awiner wrote:
> catch RuntimeException, not Throwable - except this worries the heck out of
me.
> Any bug in the CSS parser, sanitizer, serializer, etc. is simply going to be
> swallowed, making debugging a thorough joy. If we could, in any way, use a
> specific exception for CSS parsing failures, that would be safer.
>
Rewritten to a try..finally pattern that it a bit cleaner.
http://codereview.appspot.com/14041/diff/1042/55#newcode192
Line 192: logger.log(Level.INFO, "Error sanitizing CSS file " +
request.getUri(), t);
On 2009/02/04 18:39:18, awiner wrote:
> include the exception too in the log at a minimum
Using try..finally now
http://codereview.appspot.com/14041/diff/1042/55#newcode210
Line 210: private LinkRewriter cssRewriter;
On 2009/02/04 18:39:18, awiner wrote:
> all vars final
Done.
http://codereview.appspot.com/14041/diff/1042/55#newcode332
Line 332: } catch (Throwable t) {
On 2009/02/04 18:39:18, awiner wrote:
> RuntimeException
Done.
http://codereview.appspot.com/14041/diff/1042/63
File
../trunk/java/gadgets/src/main/java/org/apache/shindig/gadgets/rewrite/CSSContentRewriter.java
(right):
http://codereview.appspot.com/14041/diff/1042/63#newcode50
Line 50: public class CSSContentRewriter implements ContentRewriter {
On 2009/02/04 18:39:18, awiner wrote:
> CSSContentRewriter -> CssContentRewriter for consistency
Deferred because of hassle it will cause windows users.
http://codereview.appspot.com/14041/diff/1042/63#newcode52
Line 52: static CajaCssParser cssParser = new CajaCssParser();
On 2009/02/04 18:39:18, awiner wrote:
> just inject this.
Done.
http://codereview.appspot.com/14041/diff/1042/63#newcode83
Line 83: public static List<String> rewrite(Reader content, Uri source,
On 2009/02/04 18:39:18, awiner wrote:
> some Javadoc? Do all these functions need to be public? I think only the
> second one does
Done.
http://codereview.appspot.com/14041/diff/1042/63#newcode86
Line 86: boolean extractImports) {
On 2009/02/04 18:39:18, awiner wrote:
> weird wrapping strategy
Done.
http://codereview.appspot.com/14041/diff/1042/63#newcode105
Line 105: (CssTree.StyleSheet)styleNode.getUserData(GadgetHtmlParser.CSS_DOM);
On 2009/02/04 18:39:18, awiner wrote:
> space fater ) for consistency
Done.
http://codereview.appspot.com/14041/diff/1042/63#newcode107
Line 107: stylesheet = cssParser.parseDom(styleNode.getTextContent());
On 2009/02/04 18:39:18, awiner wrote:
> i've seen this code a couple of times: any reason not to put it on cssParser?
> cssParser.getDom(Element styleNode)
Cleaner now.
http://codereview.appspot.com/14041/diff/1042/63#newcode131
Line 131: if (extractImports && chain.node instanceof CssTree.Import) {
On 2009/02/04 18:39:18, awiner wrote:
> some inline comments would be helpful
Done.
http://codereview.appspot.com/14041/diff/1042/67
File
../trunk/java/gadgets/src/main/java/org/apache/shindig/gadgets/rewrite/ProxyingLinkRewriter.java
(right):
http://codereview.appspot.com/14041/diff/1042/67#newcode74
Line 74:
On 2009/02/04 18:39:18, awiner wrote:
> this file should be checked in separately - it doesn't depend on anything else
> in this patch
The sanitization comes after the rewriter phase, without this its would
double-rewrite so it is needed.
http://codereview.appspot.com/14041/diff/1042/58
File
../trunk/java/gadgets/src/test/java/org/apache/shindig/gadgets/parse/caja/CajaCssSanitizerTest.java
(right):
http://codereview.appspot.com/14041/diff/1042/58#newcode28
Line 28: * Sanitize CSS using Caja's property and function whitelist. On its own
it does not eliminate
On 2009/02/04 18:39:18, awiner wrote:
> Javadoc comment like this belongs on CajaCssSanitizer, not the test.
Comment has been verified to be wrong, removed.
http://codereview.appspot.com/14041/diff/1042/58#newcode64
Line 64: assertEquals(".xyz {\n}", parser.serialize(styleSheet));
On 2009/02/04 18:39:18, awiner wrote:
> ditto; parse the expected, then pretty-print, then compare
Done.
http://codereview.appspot.com/14041/diff/1042/58#newcode71
Line 71: assertEquals(".xyz {\n}", parser.serialize(styleSheet));
On 2009/02/04 18:39:18, awiner wrote:
> ditto
Done.
http://codereview.appspot.com/14041/diff/1042/58#newcode78
Line 78: assertEquals(".xyz {\n}", parser.serialize(styleSheet));
On 2009/02/04 18:39:18, awiner wrote:
> and ditto
Done.
Issue 14041: CSS DOM based rewriting and sanitization using Caja
Created 16 years, 7 months ago by louiscryan
Modified 16 years, 7 months ago
Reviewers: shindig-dev_incubator.apache.org, etnu00, awiner
Base URL: http://svn.apache.org/repos/asf/incubator/shindig/trunk/
Comments: 99