http://bugs.python.org/issue3187 Committed revision 66743.
Hi Victor, I've combined your patches and added some comments. http://codereview.appspot.com/3055/diff/1/4 File Lib/fnmatch.py (left): http://codereview.appspot.com/3055/diff/1/4#oldcode50 Line 50: # normcase on posix is NOP. Optimize it away from the loop. Why'd you get rid of this optimization? http://codereview.appspot.com/3055/diff/1/4 File Lib/fnmatch.py (right): http://codereview.appspot.com/3055/diff/1/4#newcode41 Line 41: def filter(names, pat): I wonder if we shouldn't also support isinstance(pat, bytes). Or perhaps key on that and insist that the type of the names match the type of pat... http://codereview.appspot.com/3055/diff/1/3 File Lib/glob.py (right): http://codereview.appspot.com/3055/diff/1/3#newcode56 Line 56: except UnicodeDecodeError: Why catch this exception? Again, ISTM that this API should be changed so that either dirname and pattern are both strings, or they are both bytes. http://codereview.appspot.com/3055/diff/1/5 File Lib/io.py (right): http://codereview.appspot.com/3055/diff/1/5#newcode183 Line 183: if not isinstance(file, (str, bytes, int)): Good catch. However I'm not sure if the file is allowed to be a bytes instance on Windows... http://codereview.appspot.com/3055/diff/1/2 File Lib/posixpath.py (right): http://codereview.appspot.com/3055/diff/1/2#newcode63 Line 63: use_bytes = any( isinstance(part, bytes) for part in (a,)+p ) Style nit: don't put spaces within the ( parentheses ). http://codereview.appspot.com/3055/diff/1/2#newcode66 Line 66: if use_bytes: Again, I'd recommend changing this so that the arguments must either be all str or all bytes.
Victor's latest patch (Sept 29 '08)
Could you add docs and unittests for all the new functionality? http://codereview.appspot.com/3055/diff/401/603 File Lib/fnmatch.py (right): http://codereview.appspot.com/3055/diff/401/603#newcode1 Line 1: """Filename matching with shell patterns. Please complete do your TODO tasks: - support non-ASCII bytes in fnmatch.filter() - fix fnmatch.fnmatchcase() http://codereview.appspot.com/3055/diff/401/603#newcode46 Line 46: if isinstance(pat, bytes): I wish we could also allow bytearray, without having to say it everywhere... http://codereview.appspot.com/3055/diff/401/603#newcode47 Line 47: pat_str = str(pat, "ASCII") I'd use 'ASCII' here and below, matching the quoting style of the rest of the file. http://codereview.appspot.com/3055/diff/401/602 File Lib/glob.py (right): http://codereview.appspot.com/3055/diff/401/602#newcode53 Line 53: dirname = dirname.encode("ASCII") I'd use 'ASCII', since single quotes are the prevailing quoting style in this file. http://codereview.appspot.com/3055/diff/401/601 File Lib/posixpath.py (right): http://codereview.appspot.com/3055/diff/401/601#newcode1 Line 1: """Common operations on Posix pathnames. Also note that os.path.dirname() and os.path.basename(), which sound like thin wrappers around os.path.split(), are somehow not implemented that way and currently fail; I think these should be fixed too (maybe by deferring to os.path.split()). http://codereview.appspot.com/3055/diff/401/605 File Modules/posixmodule.c (left): http://codereview.appspot.com/3055/diff/401/605#oldcode6814 Line 6814: {"getcwdu", posix_getcwdu, METH_NOARGS, posix_getcwdu__doc__}, The deletion of getcwdu() caused (at least) one test to fail, test_unicode_file.py. The fix is to switch the _do_directory() method in that file to getcwdb() where it was using getcwd() and getcwd() where it was using getcwdu(). http://codereview.appspot.com/3055/diff/401/605 File Modules/posixmodule.c (right): http://codereview.appspot.com/3055/diff/401/605#newcode2357 Line 2357: else { I'd like to see a comment here, e.g. /* Ignore undecodable filenames, as discussed in issue 3187. To include these, use getcwdb(). */
More complete patch from Victor
Re: docs: I meant docs in the Doc directory. It looks like I was too quick to ask for supporting bytearray -- consider that request undone. Thanks making progress! We'll get this done before the week is over. http://codereview.appspot.com/3055/diff/610/412 File Lib/fnmatch.py (right): http://codereview.appspot.com/3055/diff/610/412#newcode41 Line 41: if not pat in _cache: Oops, you can't check for <bytearray> in <dict>, since bytearray doesn't have __hash__. I'm taking back my request to support bytearray, sorry. http://codereview.appspot.com/3055/diff/610/411 File Lib/glob.py (right): http://codereview.appspot.com/3055/diff/610/411#newcode53 Line 53: dirname = dirname.encode('ASCII') It might be more readable to write dirname = bytes(os.curdir, 'ASCII') here. http://codereview.appspot.com/3055/diff/610/413 File Lib/io.py (right): http://codereview.appspot.com/3055/diff/610/413#newcode183 Line 183: if not isinstance(file, (str, bytes, bytearray, int)): No, not here -- I don't want to store a mutable object in self.name. Unless you want add a line of code that converts the bytearray to a bytes object. http://codereview.appspot.com/3055/diff/610/417 File Lib/test/test_posixpath.py (right): http://codereview.appspot.com/3055/diff/610/417#newcode46 Line 46: self.assertEqual(posixpath.join(b"/foo", b"bar", b"/bar", b"baz"), b"/bar/baz") Please keep line lengths < 80 chars.
Hi Victor, I hope you're getting these emails. I'll be checking this in, with some edits. Can you work on the doc changes?
Number 3 from Victor
Submitted as r66743. Thanks Victor!