|
|
Created:
14 years, 3 months ago by zhoresh Modified:
14 years, 3 months ago Base URL:
http://svn.apache.org/repos/asf/incubator/shindig/trunk/ Visibility:
Public. |
Description
Use request encoding when converting post data.
Patch Set 1 #
Total comments: 7
Patch Set 2 : Add toUpper for encoding #Patch Set 3 : Move post data convertion to separate function so it can be overrided #MessagesTotal messages: 13
http://codereview.appspot.com/186278/diff/1/2 File java/gadgets/src/main/java/org/apache/shindig/gadgets/servlet/MakeRequestHandler.java (right): http://codereview.appspot.com/186278/diff/1/2#newcode119 java/gadgets/src/main/java/org/apache/shindig/gadgets/servlet/MakeRequestHandler.java:119: encoding = "UTF-8"; Why would we ever receive a request that was not UTF-8 encoded? http://codereview.appspot.com/186278/diff/1/2#newcode129 java/gadgets/src/main/java/org/apache/shindig/gadgets/servlet/MakeRequestHandler.java:129: req.setPostBody(getParameter(request, POST_DATA_PARAM, "").getBytes(encoding)); paul lindner ran into nasty performance problems with code like this: http://paul.vox.com/library/post/the-mysteries-of-java-character-set-performa...
Sign in to reply to this message.
http://codereview.appspot.com/186278/diff/1/2 File java/gadgets/src/main/java/org/apache/shindig/gadgets/servlet/MakeRequestHandler.java (right): http://codereview.appspot.com/186278/diff/1/2#newcode119 java/gadgets/src/main/java/org/apache/shindig/gadgets/servlet/MakeRequestHandler.java:119: encoding = "UTF-8"; On 2010/01/22 18:16:57, beaton wrote: > Why would we ever receive a request that was not UTF-8 encoded? From the HttpServletRequest javadoc: Returns the name of the character encoding used in the body of this request. This method returns null if the request does not specify a character encoding http://codereview.appspot.com/186278/diff/1/2#newcode129 java/gadgets/src/main/java/org/apache/shindig/gadgets/servlet/MakeRequestHandler.java:129: req.setPostBody(getParameter(request, POST_DATA_PARAM, "").getBytes(encoding)); On 2010/01/22 18:16:57, beaton wrote: > paul lindner ran into nasty performance problems with code like this: > > http://paul.vox.com/library/post/the-mysteries-of-java-character-set-performa... If the other character set is only used sporadically, then yes, you'll be okay. Since most sites will tend to stick with one or the other character set this should be okay. Id recommend passing the charset value through a toUpperCase to reduce variance though...
Sign in to reply to this message.
Add toUpper for encoding
Sign in to reply to this message.
On Fri, Jan 22, 2010 at 1:03 PM, <lindner@inuus.com> wrote: > > http://codereview.appspot.com/186278/diff/1/2 > File > > java/gadgets/src/main/java/org/apache/shindig/gadgets/servlet/MakeRequestHandler.java > (right): > > http://codereview.appspot.com/186278/diff/1/2#newcode119 > > java/gadgets/src/main/java/org/apache/shindig/gadgets/servlet/MakeRequestHandler.java:119: > encoding = "UTF-8"; > On 2010/01/22 18:16:57, beaton wrote: > >> Why would we ever receive a request that was not UTF-8 encoded? >> > > From the HttpServletRequest javadoc: > > Returns the name of the character encoding used in the body of this > request. This method returns null if the request does not specify a > character encoding > > > http://codereview.appspot.com/186278/diff/1/2#newcode129 > > java/gadgets/src/main/java/org/apache/shindig/gadgets/servlet/MakeRequestHandler.java:129: > req.setPostBody(getParameter(request, POST_DATA_PARAM, > "").getBytes(encoding)); > On 2010/01/22 18:16:57, beaton wrote: > >> paul lindner ran into nasty performance problems with code like this: >> > > > > http://paul.vox.com/library/post/the-mysteries-of-java-character-set-performa... > > If the other character set is only used sporadically, then yes, you'll > be okay. Since most sites will tend to stick with one or the other > character set this should be okay. Id recommend passing the charset > value through a toUpperCase to reduce variance though... Good suggestion, and "other" charsets *should *only be used sporadically. Still, they might not, which leads me to ask the inevitable: what are the hurdles in Shindig upgrading to Java 6? And I quote Mr Lindner: "If you're using Java 6 you can use pre-calculated static Charset objects which gets you more concurrency. This works for getBytes() and some others. Of course this doesn't help with external class libraries... I added this to our local copy of the postgresql jdbc driver and it helped out a bit." ...among likely other benefits. > > > http://codereview.appspot.com/186278/show >
Sign in to reply to this message.
http://codereview.appspot.com/186278/diff/1/2 File java/gadgets/src/main/java/org/apache/shindig/gadgets/servlet/MakeRequestHandler.java (right): http://codereview.appspot.com/186278/diff/1/2#newcode119 java/gadgets/src/main/java/org/apache/shindig/gadgets/servlet/MakeRequestHandler.java:119: encoding = "UTF-8"; Sure. But why would our client javascript ever send us a makeRequest POST that was not UTF-8 encoded? I can't figure out a way to make that JS send anything that isn't UTF-8.
Sign in to reply to this message.
http://codereview.appspot.com/186278/diff/1/2 File java/gadgets/src/main/java/org/apache/shindig/gadgets/servlet/MakeRequestHandler.java (right): http://codereview.appspot.com/186278/diff/1/2#newcode119 java/gadgets/src/main/java/org/apache/shindig/gadgets/servlet/MakeRequestHandler.java:119: encoding = "UTF-8"; On 2010/01/24 23:09:49, beaton wrote: > Sure. But why would our client javascript ever send us a makeRequest POST that > was not UTF-8 encoded? > > I can't figure out a way to make that JS send anything that isn't UTF-8. In practice: the encoding will match the encoding of the original HTML page. HttpServletRequest.getCharacterEncoding() is fairly useless, as browsers don't send the character encoding with the POST (at least didn't, I've not checked if some of the more modern browsers do). If Shindig always encodes its generated HTML as UTF-8, then this will always be UTF-8. I don't have the Shindig code handy at the moment, so I'm not sure if that's true.
Sign in to reply to this message.
http://codereview.appspot.com/186278/diff/1/2 File java/gadgets/src/main/java/org/apache/shindig/gadgets/servlet/MakeRequestHandler.java (right): http://codereview.appspot.com/186278/diff/1/2#newcode119 java/gadgets/src/main/java/org/apache/shindig/gadgets/servlet/MakeRequestHandler.java:119: encoding = "UTF-8"; I love browser quirks. I think that when the browsers submit an HTML form, they use the codeset of the parent page. But XMLHttpRequest is different. Safari, IE, Chrome - Send the raw bytes specified by the client. Send the raw content-type header specified by the client. No guarantee that content-type header has a charset, or that the charset matches the data. Firefox - Send the raw bytes specified by the client. Always set charset=UTF-8 in the content-type header. No guarantee that the charset matches the data. I think that's a bug in FF. I didn't do browser OS or version testing, because I am lazy. I also didn't test the impact of OS locale settings. Test cases for browser behavior are here: http://sandblower.net/xhr-iso-8859-1-script-has-charset.html http://sandblower.net/xhr-iso-8859-1.html http://sandblower.net/xhr-utf8-script-has-charset.html http://sandblower.net/xhr-utf8.html I also experimented a bit with gadgets.io.makeRequest, and I do not understand the results. As far as I can tell, javascript magically decides that the string 'test=\xe4\xeb\xef\xf6\xfc' is latin-1, and then converts it to UTF-8 before submission. The submitted data (verified with a packet capture) is postData=test%3D%C3%A4%C3%AB%C3%AF%C3%B6%C3%BC. I'm completely mystified by how that happens, but it does happen on Safari, IE, FF, and Chrome. If we can rely on that behavior, we're golden. Always assume the data is UTF-8 encoded, and we're good to go.
Sign in to reply to this message.
Thanks for all the research here, but I think we are getting a bit off the subject of what I was trying to resolve here. To clarify the problem the attached change try to fix: If the charset is not specified for the function getBytes, it uses the JVM runtime charset. In our case it uses iso8859-1 instead of the utf-8, which results in question marks for all non English characters. The change specify the charset for getBytes so it won't depend on the runtime environment. The charset as you said will probably always be utf-8. As for the capability of sending anything other then utf-8 it seems that it can not be done at this point. But I don't see the harm in the way it is handled now. Someone ask me here about sending a binary data using post. Even if the API supports it, I don't think it can be done now because we use strings internally. Probably the long term change here would be to pass the stream as is without converting to string. Thanks -Ziv On Wed, Jan 27, 2010 at 10:52 AM, <beaton@google.com> wrote: > > http://codereview.appspot.com/186278/diff/1/2 > File > > java/gadgets/src/main/java/org/apache/shindig/gadgets/servlet/MakeRequestHandler.java > (right): > > http://codereview.appspot.com/186278/diff/1/2#newcode119 > > java/gadgets/src/main/java/org/apache/shindig/gadgets/servlet/MakeRequestHandler.java:119: > encoding = "UTF-8"; > I love browser quirks. I think that when the browsers submit an HTML > form, they use the codeset of the parent page. But XMLHttpRequest is > different. > > Safari, IE, Chrome - > Send the raw bytes specified by the client. > Send the raw content-type header specified by the client. > No guarantee that content-type header has a charset, or that the > charset matches the data. > > Firefox - > Send the raw bytes specified by the client. > Always set charset=UTF-8 in the content-type header. > No guarantee that the charset matches the data. > > I think that's a bug in FF. I didn't do browser OS or version testing, > because I am lazy. I also didn't test the impact of OS locale settings. > > Test cases for browser behavior are here: > > http://sandblower.net/xhr-iso-8859-1-script-has-charset.html > http://sandblower.net/xhr-iso-8859-1.html > http://sandblower.net/xhr-utf8-script-has-charset.html > http://sandblower.net/xhr-utf8.html > > I also experimented a bit with gadgets.io.makeRequest, and I do not > understand the results. As far as I can tell, javascript magically > decides that the string 'test=\xe4\xeb\xef\xf6\xfc' is latin-1, and then > converts it to UTF-8 before submission. The submitted data (verified > with a packet capture) is > postData=test%3D%C3%A4%C3%AB%C3%AF%C3%B6%C3%BC. > > I'm completely mystified by how that happens, but it does happen on > Safari, IE, FF, and Chrome. > > If we can rely on that behavior, we're golden. Always assume the data > is UTF-8 encoded, and we're good to go. > > > http://codereview.appspot.com/186278/show >
Sign in to reply to this message.
Move post data convertion to separate function so it can be overrided
Sign in to reply to this message.
point taken... I've been running my JVMs with UTF-8 charset forever, and didn't consider this. I'll apply this. Thanks
Sign in to reply to this message.
Thanks Paul, I was just about to get to it myself. On Thu, Jan 28, 2010 at 5:55 PM, <lindner@inuus.com> wrote: > point taken... I've been running my JVMs with UTF-8 charset forever, and > didn't consider this. I'll apply this. > > Thanks > > > > http://codereview.appspot.com/186278/show >
Sign in to reply to this message.
I committed this, however on 2nd thought I think we need to insure that patches get routed through Jira so that the 'grant license to apache' pieces work. zhoresh can you please file a jira and attach the latest version of the patch from codereview? Thanks! On Thu, Jan 28, 2010 at 6:10 PM, John Hjelmstad <fargo@google.com> wrote: > Thanks Paul, I was just about to get to it myself. > > > On Thu, Jan 28, 2010 at 5:55 PM, <lindner@inuus.com> wrote: > >> point taken... I've been running my JVMs with UTF-8 charset forever, and >> didn't consider this. I'll apply this. >> >> Thanks >> >> >> >> http://codereview.appspot.com/186278/show >> > >
Sign in to reply to this message.
|