Rietveld Code Review Tool
Help | Bug tracker | Discussion group | Source code | Sign in
(28)

Issue 2181041: Let the html serializer handle escaping & in attributes correctly (Closed)

Can't Edit
Can't Publish+Mail
Start Review
Created:
13 years, 7 months ago by zhoresh
Modified:
13 years, 7 months ago
Reviewers:
gagan.goku, dev-remailer, Jasvir
Base URL:
http://svn.apache.org/repos/asf/shindig/trunk
Visibility:
Public.

Description

We found a case of a page that uses xhtml (page uses name-space) that has the concat request encoded twice: & was encoded to & Looking at the code it seems that there is a special case in the DefaultHtmlSerializer that does not escape '&' when an attribute is considered a url attribute. Ul attribute is 'src' or 'href' without name-space. As a result there is an extra encoding code in the ConcatVisitor. In our case the concat did encoding, and since the doc had name-space, the serializer did another encoding. The solution is to let the serializer always the encoding as needed and eliminate all the special code. This will also support the new CajaSerializer that do the same.

Patch Set 1 #

Total comments: 2

Patch Set 2 : Update test case and todo comment #

Patch Set 3 : Merge to tip #

Unified diffs Side-by-side diffs Delta from patch set Stats (+169 lines, -111 lines) Patch
java/gadgets/src/main/java/org/apache/shindig/gadgets/parse/DefaultHtmlSerializer.java View 1 2 4 chunks +6 lines, -8 lines 0 comments Download
java/gadgets/src/main/java/org/apache/shindig/gadgets/render/RenderingGadgetRewriter.java View 1 2 9 chunks +11 lines, -11 lines 0 comments Download
java/gadgets/src/main/java/org/apache/shindig/gadgets/rewrite/ConcatVisitor.java View 1 2 1 chunk +1 line, -1 line 0 comments Download
java/gadgets/src/test/java/org/apache/shindig/gadgets/rewrite/ConcatVisitorTest.java View 2 29 chunks +151 lines, -91 lines 0 comments Download

Messages

Total messages: 10
zhoresh
13 years, 7 months ago (2010-09-14 00:05:41 UTC) #1
gagan.goku
Thanks for taking this up Ziv. Few thoughts: Please add a small test case as ...
13 years, 7 months ago (2010-09-14 02:53:21 UTC) #2
zhoresh
Update test case and todo comment
13 years, 7 months ago (2010-09-14 16:58:29 UTC) #3
zhoresh
On Mon, Sep 13, 2010 at 7:53 PM, <gagan.goku@gmail.com> wrote: > Thanks for taking this ...
13 years, 7 months ago (2010-09-14 17:06:54 UTC) #4
Jasvir
http://codereview.appspot.com/2181041/diff/1/java/gadgets/src/main/java/org/apache/shindig/gadgets/parse/DefaultHtmlSerializer.java File java/gadgets/src/main/java/org/apache/shindig/gadgets/parse/DefaultHtmlSerializer.java (left): http://codereview.appspot.com/2181041/diff/1/java/gadgets/src/main/java/org/apache/shindig/gadgets/parse/DefaultHtmlSerializer.java#oldcode173 java/gadgets/src/main/java/org/apache/shindig/gadgets/parse/DefaultHtmlSerializer.java:173: HtmlSerialization.URL_ATTRIBUTES.contains(attrName); On 2010/09/14 02:53:21, gagan.goku wrote: > Instead of ...
13 years, 7 months ago (2010-09-14 19:15:02 UTC) #5
Jasvir
LGTM
13 years, 7 months ago (2010-09-14 22:11:15 UTC) #6
gagan.goku
On 2010/09/14 19:15:02, jasvir wrote: > http://codereview.appspot.com/2181041/diff/1/java/gadgets/src/main/java/org/apache/shindig/gadgets/parse/DefaultHtmlSerializer.java > File > java/gadgets/src/main/java/org/apache/shindig/gadgets/parse/DefaultHtmlSerializer.java > (left): > > ...
13 years, 7 months ago (2010-09-15 02:19:04 UTC) #7
gagan.goku
On 2010/09/14 17:06:54, zhoresh wrote: > On Mon, Sep 13, 2010 at 7:53 PM, <mailto:gagan.goku@gmail.com> ...
13 years, 7 months ago (2010-09-15 02:21:20 UTC) #8
zhoresh
Merge to tip
13 years, 7 months ago (2010-09-15 17:14:39 UTC) #9
zhoresh
13 years, 7 months ago (2010-09-15 18:03:29 UTC) #10
On Tue, Sep 14, 2010 at 7:19 PM, <gagan.goku@gmail.com> wrote:

> On 2010/09/14 19:15:02, jasvir wrote:
>
>
>
http://codereview.appspot.com/2181041/diff/1/java/gadgets/src/main/java/org/a...
>
>> File
>>
>
>
>
java/gadgets/src/main/java/org/apache/shindig/gadgets/parse/DefaultHtmlSerializer.java
>
>> (left):
>>
>
>
>
>
http://codereview.appspot.com/2181041/diff/1/java/gadgets/src/main/java/org/a...
>
>
>
java/gadgets/src/main/java/org/apache/shindig/gadgets/parse/DefaultHtmlSerializer.java:173:
>
>> HtmlSerialization.URL_ATTRIBUTES.contains(attrName);
>> On 2010/09/14 02:53:21, gagan.goku wrote:
>> > Instead of blindly escaping & to &amp;, shouldn't we peek into the
>>
> next 4
>
>> > characters and make sure they are not "amp;" ? Basically &blah ->
>>
> &amp;blah
>
>> > but &amp;blah -> &amp;blah
>>
>
>  Its correct to always escape.  The only reason not to might be to save
>>
> a couple
>
>> of characters but at the risk of missing an entity.  In addition to
>>
> "amp", there
>
>> are ~250 HTML4 entities and an even longer list of HTML5 entities.  In
>>
> addition,
>
>> there are numeric character entities (eg &#65;)
>>
>
>  >  If we are able to do that correctly, that might be a better and
>>
> more robust
>
>> > change.
>>
>
>  I do not understand why this would be more robust.
>>
>
> Say original url as written in html is:
> <img src="http://example.org/q=hello&amp;s=1<http://example.org/q=hello&s=1>"
> />
>
> Now when the parser parses this url, it can return a src attribute whose
> value can be either of:
> http://example.org/q=hello&amp;s=1 <http://example.org/q=hello&s=1>
> http://example.org/q=hello&s=1
>
> Note that a user could also create a DOM node and add an src attribute
> whose value contains &amp; because that is correct in html (like we
> currently do in ConcatVisitor and RenderingGadgetRewriter).
>
> If our serializer blindly escapes & to &amp; then the urls would look
> like:
> http://example.org/q=hello&amp;amp;s=1<http://example.org/q=hello&amp;s=1>
> http://example.org/q=hello&amp;s=1 <http://example.org/q=hello&s=1>
>
> The first url is double escaped and will throw a 404.
>
> Which is why im suggesting that in the serializer, instead of blindly
> escaping & to &amp; we check if the next 4 characters following & are
> "amp;" and skip if thats the case.
> This will be more robust because it can serialize a simple and an
> escaped version of the url.
>
>
That exactly the problem that this try to resolve.
Basically it the responsibility of the serializer to do the escaping, and
the user should not do escaping (unless he wants double escaping).
You might want double escaping when the value itself is another url.
This change make it very clear what is the responsibility of escaping and
make it easy to incorporate new serializers.


>
>  > Also, i think its important to think about why we escape only & and
>>
> ".
>
>> Shouldn't
>> > we also be escaping html unsafe characters like < >. Maybe we could
>>
> use a
>
>> > standard url escape class. These questions can be addressed later if
>>
> we are
>
>> not
>> > sure now, but it would be good to add a TODO for the same so we keep
>>
> track of
>
>> > it.
>> >
>> > What do you think ?
>>
>
>  These characters are escaped because they are attribute values.  The
>>
> escaping
>
>> for attribute values for XHTML is defined by the XML spec
>> (http://www.w3.org/TR/REC-xml/#AVNormalize).  For HTML5, the
>>
> characters to
>
>> escape are ampersand, apostrophes and double quotes for quoted strings
>>
>
>
> (
>
http://www.whatwg.org/specs/web-apps/current-work/multipage/tokenization.html...
> )
>
>
>  I think its a very bad idea to escape more characters.  If you escape
>>
> more, then
>
>> either the user code has to decode those characters or they'll follow
>>
> the
>
>> standard.  If they do the former, they will be vulnerable when
>>
> interacting with
>
>> services that do escape correctly and if they do the latter they won't
>>
> get the
>
>> string that was intended.
>>
>
>
>
> http://codereview.appspot.com/2181041/
>

My client didn't have the latest which cause the viewer not to show the
updated tests correctly.
I updated the patch and will commit shortly.

Thanks for the review.
Sign in to reply to this message.

Powered by Google App Engine
RSS Feeds Recent Issues | This issue
This is Rietveld f62528b