Issue 2827: [issue3300] urllib.quote and unquote - Unicode issues

GvR

Hi Matt, Here's a code review of your patch. I'm leaning more and more towards ...

16 years, 10 months ago (2008-08-06 21:39:34 UTC) #1

Hi Matt,

Here's a code review of your patch.

I'm leaning more and more towards wanting this for 3.0, but I have some API
design issues and also some style nits.

I'm cross-linking this with the Python tracker issue, through the subject.

http://codereview.appspot.com/2827/diff/1/2
File Doc/library/urllib.parse.rst (right):

http://codereview.appspot.com/2827/diff/1/2#newcode198
Line 198: replaced by a placeholder character.
I don't think that's a good default.  I'd rather see it default to strict --
that's what encoding translates to everywhere else.  I believe that lenient
error handling by default can cause subtle security problems too, by hiding
problem characters from validation code.

http://codereview.appspot.com/2827/diff/1/2#newcode215
Line 215: An alias for :func:`quote`, intended for use with a :class:`bytes`
object
I'd rather see this as a wrapper that raises TypeError if the argument isn't a
bytes or bytearray instance.  Otherwise it's needless redundancy.

http://codereview.appspot.com/2827/diff/1/2#newcode223
Line 223: Replace ``%xx`` escapes by their single-character equivalent.
Should add what the argument type is -- I vote for str or bytes/bytearray.

http://codereview.appspot.com/2827/diff/1/2#newcode242
Line 242: .. function:: unquote_to_bytes(string)
Again, add what the argument type is.

http://codereview.appspot.com/2827/diff/1/4
File Lib/email/utils.py (right):

http://codereview.appspot.com/2827/diff/1/4#newcode224
Line 224: except:
An unqualified except clause is unacceptable here.  Why do you need this anyway?

http://codereview.appspot.com/2827/diff/1/5
File Lib/test/test_http_cookiejar.py (right):

http://codereview.appspot.com/2827/diff/1/5#newcode1450
Line 1450: "%3c%3c%0Anew%C3%A5/%C3%A5",
I'm guessing this test broke otherwise?  Given that this references an RFC, is
it correct to just fix it this way?

http://codereview.appspot.com/2827/diff/1/3
File Lib/urllib/parse.py (right):

http://codereview.appspot.com/2827/diff/1/3#newcode10
Line 10: "urlsplit", "urlunsplit"]
Please add all the quote/unquote versions here too.
(They were there in 2.5, but somehow got dropped from 3.0.

http://codereview.appspot.com/2827/diff/1/3#newcode265
Line 265: # Maps lowercase and uppercase variants (but not mixed case).
That sounds like a disaster.  Why would %aa and %AA be correct but not %aA and
%Aa?  (Even though the old code had the same problem.)

http://codereview.appspot.com/2827/diff/1/3#newcode283
Line 283: def unquote(s, encoding = "utf-8", errors = "replace"):
Please no spaces around the '=' when used for an argument default (or for a
keyword arg).

Also see my comment about defaulting to 'replace' in the doc file.

Finally -- let's be consistent about quotes.  It seems most of this file uses
single quotes, so lets stick to that (except docstrings always use double
quotes).

And more: what should a None value for encoding or errors mean?  IMO it should
mean "use the default".

http://codereview.appspot.com/2827/diff/1/3#newcode382
Line 382: safe = safe.encode('ascii', 'ignore')
Using errors='ignore' seems like a mistake -- it will hide errors.

I also wonder why safe should be limited to ASCII though.

http://codereview.appspot.com/2827/diff/1/3#newcode399
Line 399: if ' ' in s:
This test means that it won't work if the input is bytes.  E.g.

urllib.parse.quote_plus(b"abc def")

raises a TypeError.

Sign in to reply to this message.

mgiuca

Hi Guido, Thanks very much for this very detailed review. I've replied to the comments. ...

16 years, 10 months ago (2008-08-07 12:01:31 UTC) #2

Hi Guido,

Thanks very much for this very detailed review. I've replied to the comments. I
will make the changes as described below and send a new patch to the tracker.

http://codereview.appspot.com/2827/diff/1/2
File Doc/library/urllib.parse.rst (right):

http://codereview.appspot.com/2827/diff/1/2#newcode198
Line 198: replaced by a placeholder character.
Okay. I did errors='replace' because URLs often come from users (eg. typed into
a web browser) so it's most likely we're just going to want to ignore errors.

I have no strong opinion on this though.

http://codereview.appspot.com/2827/diff/1/2#newcode215
Line 215: An alias for :func:`quote`, intended for use with a :class:`bytes`
object
Yup, OK. I posted exactly that to the mailing list awhile back. I'll put that in
the next patch.

http://codereview.appspot.com/2827/diff/1/2#newcode223
Line 223: Replace ``%xx`` escapes by their single-character equivalent.
Currently I think unquote only accepts a str. Should it also accept a
bytes/bytearray?

http://codereview.appspot.com/2827/diff/1/4
File Lib/email/utils.py (right):

http://codereview.appspot.com/2827/diff/1/4#newcode224
Line 224: except:
I *think* it's to catch the case where charset=None (the default). It's an awful
way to do it, I agree.

There are no test cases for this function, so it's hard to tell either way.

It's difficult to tell how to handle this. I think the string should be encoded
using charset (this would be true with or without this patch - as evidenced by
line 301 which decodes with charset). But I'm not sure what to do with
charset=None. Currently it defaults to UTF-8.

For the next patch, I will just change it to use an if statement rather than a
try/except, but it will have this same functionality.

http://codereview.appspot.com/2827/diff/1/5
File Lib/test/test_http_cookiejar.py (right):

http://codereview.appspot.com/2827/diff/1/5#newcode1450
Line 1450: "%3c%3c%0Anew%C3%A5/%C3%A5",
As far as I can tell, it's any arbitrary URL allowed here. The RFC says nothing
about encoding. Therefore, I assume an implementation can decide to encode a
string one way or another, as long as it results in a valid URL which is used as
the part of the cookie identifier.

http://codereview.appspot.com/2827/diff/1/3
File Lib/urllib/parse.py (right):

http://codereview.appspot.com/2827/diff/1/3#newcode10
Line 10: "urlsplit", "urlunsplit"]
OK.

http://codereview.appspot.com/2827/diff/1/3#newcode265
Line 265: # Maps lowercase and uppercase variants (but not mixed case).
Yes, I know!! I was going to suggest this as a separate fix, since it's a
separate bug. But since you bring it up, I'll correct it in this patch.

http://codereview.appspot.com/2827/diff/1/3#newcode283
Line 283: def unquote(s, encoding = "utf-8", errors = "replace"):
All noted.

http://codereview.appspot.com/2827/diff/1/3#newcode382
Line 382: safe = safe.encode('ascii', 'ignore')
> Using errors='ignore' seems like a mistake -- it will hide errors.

All this means is that non-ASCII characters get dropped from the "safe" set. I
can see the issue with this though - however the change below means this will
probably go away.

> I also wonder why safe should be limited to ASCII though.

This was a bit of a problem. I thought it was a conceptual/fundamendal issue,
but now that I think about it more, it's just a technical problem.

The first thing quote does is call s.encode(). Therefore, from that point on, we
can only deal in bytes, and can't distinguish one character from another. I
think this needs to change; that will mean rather than encoding right away,
encoding is done in Quoter.__call__ on a per-character basis. This could mess
with the ability to accept either bytes or str. I'll give it a go.

http://codereview.appspot.com/2827/diff/1/3#newcode399
Line 399: if ' ' in s:
Good point.

Sign in to reply to this message.

Antoine Pitrou

http://codereview.appspot.com/2827/diff/1/4 File Lib/email/utils.py (right): http://codereview.appspot.com/2827/diff/1/4#newcode224 Line 224: except: On 2008/08/07 12:01:31, mgiuca wrote: > But ...

16 years, 10 months ago (2008-08-07 12:49:11 UTC) #3

Antoine Pitrou

http://codereview.appspot.com/2827/diff/17/28 File Lib/urllib/parse.py (right): http://codereview.appspot.com/2827/diff/17/28#newcode275 Line 275: except KeyError: If this is supposed to catch ...

16 years, 10 months ago (2008-08-07 17:29:40 UTC) #5

GvR

New code review. Also please address Antoine's comments. http://codereview.appspot.com/2827/diff/1/2 File Doc/library/urllib.parse.rst (right): http://codereview.appspot.com/2827/diff/1/2#newcode198 Line 198: ...

16 years, 10 months ago (2008-08-07 17:38:01 UTC) #6

New code review. Also please address Antoine's comments.

http://codereview.appspot.com/2827/diff/1/2
File Doc/library/urllib.parse.rst (right):

http://codereview.appspot.com/2827/diff/1/2#newcode198
Line 198: replaced by a placeholder character.
On 2008/08/07 12:01:31, mgiuca wrote:
> Okay. I did errors='replace' because URLs often come from users (eg. typed
into
> a web browser) so it's most likely we're just going to want to ignore errors.
> 
> I have no strong opinion on this though.

I do. :-)  Like everywhere else, we should default to strict, and let the
application change the setting if they have a use for a more lenient setting. 
Fail-fast is Pythonic behavior.

(It's bad enough that unrecognized % escapes don't raise an exception, but
that's always been like that.  I doubt if it's useful though.)

http://codereview.appspot.com/2827/diff/1/2#newcode223
Line 223: Replace ``%xx`` escapes by their single-character equivalent.
On 2008/08/07 12:01:31, mgiuca wrote:
> Currently I think unquote only accepts a str. Should it also accept a
> bytes/bytearray?

Since the encoded form is ASCII, and there is uniform confusion about the input
and output types possible anyway, I think so.

http://codereview.appspot.com/2827/diff/1/8
File Lib/test/test_urllib.py (right):

http://codereview.appspot.com/2827/diff/1/8#newcode435
Line 435: result = urllib.parse.quote(given, encoding="latin-1")
On 2008/08/07 12:49:11, Antoine Pitrou wrote:
> Wouldn't it be better to throw a TypeError when an encoding is given for a
bytes
> string?
> 

(Please reply.)

http://codereview.appspot.com/2827/diff/17/29
File Lib/email/utils.py (right):

http://codereview.appspot.com/2827/diff/17/29#newcode222
Line 222: s = urllib.parse.quote(s, safe='', encoding=charset)
Antoine's suggestion was to use encoding=(charset or 'ascii')

http://codereview.appspot.com/2827/diff/17/28
File Lib/urllib/parse.py (right):

http://codereview.appspot.com/2827/diff/17/28#newcode275
Line 275: except KeyError:
On 2008/08/07 17:29:40, Antoine Pitrou wrote:
> If this is supposed to catch non-hex characters, it should be ValueError
because
> it's what bytes.fromhex() raises, not KeyError.
> 

Right, the KeyError was from the previous incarnation which used a dict.  Please
fix.

Sign in to reply to this message.

mgiuca

Oh dear, it seems I never published my comments. I made all of these before ...

16 years, 10 months ago (2008-08-10 05:39:34 UTC) #7

Oh dear, it seems I never published my comments. I made all of these before I
submitted patch 8. Very sorry!

http://codereview.appspot.com/2827/diff/1/2
File Doc/library/urllib.parse.rst (right):

http://codereview.appspot.com/2827/diff/1/2#newcode198
Line 198: replaced by a placeholder character.
On 2008/08/07 12:01:31, mgiuca wrote:
> Okay. I did errors='replace' because URLs often come from users (eg. typed
into
> a web browser) so it's most likely we're just going to want to ignore errors.
> 
> I have no strong opinion on this though.

Done.

http://codereview.appspot.com/2827/diff/1/2#newcode215
Line 215: An alias for :func:`quote`, intended for use with a :class:`bytes`
object
On 2008/08/07 12:01:31, mgiuca wrote:
> Yup, OK. I posted exactly that to the mailing list awhile back. I'll put that
in
> the next patch.

Done.

http://codereview.appspot.com/2827/diff/1/2#newcode242
Line 242: .. function:: unquote_to_bytes(string)
On 2008/08/06 21:39:34, GvR wrote:
> Again, add what the argument type is.

Done.

http://codereview.appspot.com/2827/diff/1/4
File Lib/email/utils.py (right):

http://codereview.appspot.com/2827/diff/1/4#newcode224
Line 224: except:
If it's going to be ASCII, might as well be UTF-8, I think. In that case, I've
removed the try/except block and just pass charset (quote can now handle
charset=None).

If you really think it should be ASCII-only, then your code-wise suggestion is
good.

http://codereview.appspot.com/2827/diff/1/8
File Lib/test/test_urllib.py (right):

http://codereview.appspot.com/2827/diff/1/8#newcode435
Line 435: result = urllib.parse.quote(given, encoding="latin-1")
I think you're right, it probably should throw an error. Not fixed in upcoming
patch (8), since I'm expecting changes to quote anyway.

http://codereview.appspot.com/2827/diff/1/3
File Lib/urllib/parse.py (right):

http://codereview.appspot.com/2827/diff/1/3#newcode10
Line 10: "urlsplit", "urlunsplit"]
On 2008/08/07 12:01:31, mgiuca wrote:
> OK.

Done.

http://codereview.appspot.com/2827/diff/1/3#newcode265
Line 265: # Maps lowercase and uppercase variants (but not mixed case).
On 2008/08/07 12:01:31, mgiuca wrote:
> Yes, I know!! I was going to suggest this as a separate fix, since it's a
> separate bug. But since you bring it up, I'll correct it in this patch.

Done.

http://codereview.appspot.com/2827/diff/1/3#newcode283
Line 283: def unquote(s, encoding = "utf-8", errors = "replace"):
On 2008/08/07 12:01:31, mgiuca wrote:
> All noted.

Done.

http://codereview.appspot.com/2827/diff/1/3#newcode382
Line 382: safe = safe.encode('ascii', 'ignore')
Oh, I remembered the reasoning behind it (as explained in test_urllib.py) - if
we allow non-ASCII characters to be escaped, then we allow quote to generate
invalid URIs (URIs are only allowed to have ASCII characters). It's one thing
for unquote to accept such URIs, but I think we shouldn't be producing them.

Albeit, it only produces an invalid URI if you explicitly request it. So I'm
happy to make the change to allow any character to be safe, but I'll let it go
to discussion first.

http://codereview.appspot.com/2827/diff/1/3#newcode399
Line 399: if ' ' in s:
On 2008/08/07 12:01:31, mgiuca wrote:
> Good point.

Done.

http://codereview.appspot.com/2827/diff/17/29
File Lib/email/utils.py (right):

http://codereview.appspot.com/2827/diff/17/29#newcode222
Line 222: s = urllib.parse.quote(s, safe='', encoding=charset)
encoding=charset falls back to UTF-8, which covers ASCII. I'm happy to
explicitly fall back to ASCII though.

Sign in to reply to this message.

mgiuca

http://codereview.appspot.com/2827/diff/17/28 File Lib/urllib/parse.py (right): http://codereview.appspot.com/2827/diff/17/28#newcode275 Line 275: except KeyError: Good catch. No test case caught ...

16 years, 10 months ago (2008-08-10 05:53:52 UTC) #8

mgiuca

http://codereview.appspot.com/2827/diff/1/2 File Doc/library/urllib.parse.rst (right): http://codereview.appspot.com/2827/diff/1/2#newcode223 Line 223: Replace ``%xx`` escapes by their single-character equivalent. I ...

16 years, 10 months ago (2008-08-10 07:06:56 UTC) #9

mgiuca

http://codereview.appspot.com/2827/diff/17/28 File Lib/urllib/parse.py (right): http://codereview.appspot.com/2827/diff/17/28#newcode275 Line 275: except KeyError: On 2008/08/10 05:53:52, mgiuca wrote: > ...

16 years, 10 months ago (2008-08-10 07:45:01 UTC) #10

GvR

Here's finally my code review. Most stuff was already discussed in the tracker however. http://codereview.appspot.com/2827/diff/82/83 ...

16 years, 10 months ago (2008-08-13 17:50:13 UTC) #12

GvR

BTW, I also had a look at Bill's patch. The one good idea I hope ...

16 years, 10 months ago (2008-08-13 18:48:33 UTC) #13

mgiuca

Hi guys, thanks for these detailed reviews. Here are my comments to accompany patch 10. ...

16 years, 10 months ago (2008-08-14 15:28:59 UTC) #14

Hi guys, thanks for these detailed reviews.

Here are my comments to accompany patch 10.

(Sorry Guido, I just submitted patch 10 to the tracker again; I think me
creating a new Rietveld issue at this stage would just make things more
confusing).

Matt

http://codereview.appspot.com/2827/diff/82/83
File Doc/library/urllib.parse.rst (right):

http://codereview.appspot.com/2827/diff/82/83#newcode185
Line 185: .. function:: quote(string[, safe[, encoding[, errors]]])
OK I will change the argument to "string" (in the code).

We could force them to be kwargs, but note that the string.encode method forces
them as positional args (you can't supply keywords), so this would be directly
the opposite of str.encode (and I think you are wanting them to be the same
interface).

http://codereview.appspot.com/2827/diff/82/83#newcode214
Line 214: .. function:: quote_from_bytes(bytes[, safe])
Note that here (in the code), I didn't change the arg name to "bytes" - just "b"
(to avoid conflict with global bytes). Could hack around it, but it seemed more
trouble than it was worth.

http://codereview.appspot.com/2827/diff/82/83#newcode219
Line 219: Example: ``quote_from_bytes(b'/El Ni\xc3\xb1o/')`` yields
Do you mean just for the quote_from_bytes example? (Leaving the quote example as
utf-8 to show the default in action)? Perhaps this example should be some random
byte string instead, to show that it isn't in fact characters at all.

(Done).

http://codereview.appspot.com/2827/diff/82/83#newcode220
Line 220: ``'/El%20Ni%C3%B1o/'``.
Yes. (Done).

http://codereview.appspot.com/2827/diff/82/85
File Lib/email/utils.py (right):

http://codereview.appspot.com/2827/diff/82/85#newcode277
Line 277: s = urllib.parse.unquote(s, encoding="latin-1")
Well I wasn't *entirely* sure what was happening here, but I think this string
eventually gets used by collapse_rfc2231_value as raw bytes.

By adding the encoding="latin-1", this should behave *precisely* as it did
before the 3300 patch. So I shouldn't be breaking anything.

I sent Barry an email asking for clarification.

http://codereview.appspot.com/2827/diff/82/84
File Lib/urllib/parse.py (right):

http://codereview.appspot.com/2827/diff/82/84#newcode265
Line 265: def unquote_to_bytes(s):
Hm. OK we can support this by simply allowing unquote_to_bytes to accept a bytes
OR a str, as we were originally talking about doing for unquote.

I still don't think it makes sense for unquote to accept a bytes (since you then
need to DEcode those bytes somehow, since it's bytes->str), but it does make
more sense for unquote_to_bytes to be bytes->bytes, and it's a very easy change.

Note that this is a compelling reason to use unquote, since it preserves
unquoted Unicode characters without this issue ever coming up.

(Done).

http://codereview.appspot.com/2827/diff/82/84#newcode269
Line 269: res = s.split('%')
On 2008/08/13 17:50:14, GvR wrote:
> If you make this
> 
> res = s.encode('utf-8').split()
> 
> then you won't have to encode each piece separately.  It also makes is easier
to
> move the encoding into a separate function.

Done.

http://codereview.appspot.com/2827/diff/82/84#newcode275
Line 275: res[i] = bytes.fromhex(item[:2]) + item[2:].encode('utf-8')
Has to be:
bytes([int(item[:2], 16)])

Done.

http://codereview.appspot.com/2827/diff/82/84#newcode280
Line 280: def unquote(s, encoding='utf-8', errors='strict'):
errors='replace' now, on both unquote and unquote_plus, as discussed on tracker.
("bogus data is better than an exception")

http://codereview.appspot.com/2827/diff/82/84#newcode292
Line 292: # pct_sequence: contiguous sequence of percent-encoded bytes, decoded
Ah yes, I originally had something like that, but this makes it impossible to
get BOTH a) allow arbitrary unescaped non-ASCII characters to be unchanged, AND
b) allow non-UTF-8 encodings.

(The reason being that if you unquote_to_bytes, it encodes the unescaped
characters in UTF-8, and un-percent-encodes the percent-encoded characters in
"encoding" - so there is a mismatch unless encoding="utf-8". If you encode
unescaped characters with "encoding" instead, you can't represent characters
outside of encoding's repertoire).

I originally had what you suggested, but realised this and refactored to what is
admittedly more complicated code. Is there a simpler way to achieve both the
above goals?

(This implementation is very careful to never encode the unescaped characters,
so there's no chance of them getting mangled).

http://codereview.appspot.com/2827/diff/82/84#newcode335
Line 335: class Quoter:
On 2008/08/13 17:50:14, GvR wrote:
> As discussed in the tracker, please replace this with something based on
> defaultdict.

Done.
(See tracker for speed tests).

http://codereview.appspot.com/2827/diff/82/84#newcode381
Line 381: errors='strict' (unsupported characters raise a UnicodeEncodeError).
> strict -> replace.

You mean to put this on unquote, right? Not quote. (As discussed on the
tracker). That's done. Leaving this as strict.

http://codereview.appspot.com/2827/diff/82/84#newcode388
Line 388: s = s.encode(encoding or 'utf-8', errors or 'strict')
On 2008/08/13 17:50:14, GvR wrote:
> strict -> replace.

Leaving strict.

> Also, I'm not super comfortable with interpreting '' as the default. IMO these
> should just error out, like they do in other places, as the invalid encoding /
> errors value they are.

Done (only None causes default behaviour).

http://codereview.appspot.com/2827/diff/82/84#newcode390
Line 390: if encoding:
On 2008/08/13 17:50:14, GvR wrote:
> if encoding is not None:

Done.

http://codereview.appspot.com/2827/diff/82/84#newcode392
Line 392: if errors:
On 2008/08/13 17:50:14, GvR wrote:
> if errors is not None:

Done.

http://codereview.appspot.com/2827/diff/82/84#newcode394
Line 394: try:
On 2008/08/13 17:50:14, GvR wrote:
> I'd replace the rest with a call to quote_from_bytes().

Done.

http://codereview.appspot.com/2827/diff/82/84#newcode413
Line 413: def quote_from_bytes(s, safe='/'):
On 2008/08/13 17:50:14, GvR wrote:
> Needs a docstring.

Done.

http://codereview.appspot.com/2827/diff/82/84#newcode426
Line 426: return ''.join(res)
On 2008/08/13 17:50:14, GvR wrote:
> I'd collapse the latter two lines into
> 
> return ''.join(map(quoter, s))
> 
> especially since map() returns an iterator, not a list!

Done.

Sign in to reply to this message.

GvR

Hey Matt, I have a bunch of tiny nits, but there's no need for you ...

16 years, 10 months ago (2008-08-18 20:20:32 UTC) #16

GvR

16 years, 10 months ago (2008-08-18 21:45:42 UTC) #17

Submitted as r65838, with these changes.

http://codereview.appspot.com/2827/diff/79/103
File Lib/urllib/parse.py (right):

http://codereview.appspot.com/2827/diff/79/103#newcode277
Line 277: if item == b'': raise ValueError
On 2008/08/18 20:20:32, GvR wrote:
> You don't need this line.  int(b'', 16) will raise ValueError for you, at no
> extra cost. ;-)

Done.

http://codereview.appspot.com/2827/diff/79/103#newcode302
Line 302: if item == '': raise ValueError
On 2008/08/18 20:20:32, GvR wrote:
> if not item: ...

Done.

http://codereview.appspot.com/2827/diff/79/103#newcode307
Line 307: if len(rest) == 0:
On 2008/08/18 20:20:32, GvR wrote:
> if not rest:

Done.

http://codereview.appspot.com/2827/diff/79/103#newcode317
Line 317: if len(pct_sequence) > 0:
On 2008/08/18 20:20:32, GvR wrote:
> if pct_sequence:

Done.

http://codereview.appspot.com/2827/diff/79/103#newcode319
Line 319: # res[-1] will always be empty if pct_sequence != []
On 2008/08/18 20:20:32, GvR wrote:
> Express this with an assert?  E.g.
> 
>   assert not res[-1]

Done.

http://codereview.appspot.com/2827/diff/79/103#newcode332
Line 332: always_safe = frozenset(
On 2008/08/18 20:20:32, GvR wrote:
> Now that this is really a constant, why not name it with a leading underscore
> and all caps?  I.e. _ALWAYS_SAFE

Done.

http://codereview.appspot.com/2827/diff/79/103#newcode340
Line 340: A mapping from bytes (in range(0,256)) to strings.
On 2008/08/18 20:20:32, GvR wrote:
> I'd prefer to stick to the PEP-8 docstring formatting rules.

Done.

http://codereview.appspot.com/2827/diff/79/103#newcode381
Line 381: s and safe may be either str or bytes objects. encoding must not be
On 2008/08/18 20:20:32, GvR wrote:
> s -> string here and in the next line

Done.

http://codereview.appspot.com/2827/diff/79/103#newcode416
Line 416: not perform string-to-bytes encoding.
On 2008/08/18 20:20:32, GvR wrote:
> It always returns an ASCII string.

Done.

http://codereview.appspot.com/2827/diff/79/103#newcode430
Line 430: return ''.join(quoter[c] for c in bs)
On 2008/08/18 20:20:32, GvR wrote:
> return ''.join(map(quoter.__getitem__, bs))

Done.

Sign in to reply to this message.

Issue 2827: [issue3300] urllib.quote and unquote - Unicode issues (Closed)

Description

Patch Set 1 #

Patch Set 2 : parse.py.patch8 #

Patch Set 3 : patch 9 #

Patch Set 4 : patch 10 #

Messages