|
|
Created:
15 years, 4 months ago by Jacob Rus Modified:
13 years, 10 months ago Base URL:
http://svn.python.org/view/*checkout*/python/trunk/ Visibility:
Public. |
DescriptionWell, no one commented on my original uploaded patch, but here's a new one:
Again, this is from:
http://bugs.python.org/issue6626
Previous patch:
http://codereview.appspot.com/104091/show
Patch Set 1 #
Total comments: 48
Downloaded from: http://bugs.python.org/file14731/mimetypes5.diff
MessagesTotal messages: 25
copy some comments over from http://codereview.appspot.com/104091/diff/1/2 http://codereview.appspot.com/107042/diff/1/2 File Lib/mimetypes.py (left): http://codereview.appspot.com/107042/diff/1/2#oldcode222 Line 222: The way all the functions below (I removed them) work is just horribly confusing. Admittedly somewhat less so than the days when they merely said, something to the effect of: def guess_type(url): init() return guess_type(url) But still pretty confusing, not to mention verbose. http://codereview.appspot.com/107042/diff/1/2 File Lib/mimetypes.py (right): http://codereview.appspot.com/107042/diff/1/2#newcode59 Line 59: ] I reordered these so that more recent Apache versions will come later, so that we don't overwrite the more recent mappings with out-of-date ones. Personally, I'd prefer to remove the dependency on Apache, and just ship a big list of types inside the standard library, either inline in this module (as a list of tuples) or as a separate file. http://codereview.appspot.com/107042/diff/1/2#newcode260 Line 260: } Previously, anyone who redefined suffix_map would permanently prevent other users from getting a 'clean' MimeTypes object, unless they called the private _default_mime_types function (as the tests did). This seemed pretty broken. Now, instead, making a new MimeTypes() object pulls from these private dicts, so that changing the public dicts only affects the module's MimeTypes singleton. http://codereview.appspot.com/107042/diff/1/2#newcode389 Line 389: ] These have been changed to a list of tuples so that declaring the same extension with two different types actually works as expected. (i.e. registers both) http://codereview.appspot.com/107042/diff/1/2#newcode401 Line 401: ] These should be removed altogether, because the only one that has any effect if a reasonably up-to-date Apache is installed on the system is the 'image/jpg' one. Frankly, I don't see the use case for having a separate 'lenient' registry, but I'm happy to discuss. http://codereview.appspot.com/107042/diff/1/2#newcode462 Line 462: strict = not ('-l' in opts or '--lenient' in opts) It's probably unnecessary, but I tried to shorten and clarify the code throughout the module, wherever I could, including in this command line use bit.
Sign in to reply to this message.
http://codereview.appspot.com/107042/diff/1/2 File Lib/mimetypes.py (right): http://codereview.appspot.com/107042/diff/1/2#newcode68 Line 68: def __init__(self, filenames=knownfiles, strict=True): Does `filenames=knownfiles` mean Apache mime-types files will be parsed by default when creating a new MimeTypes object? It is a pretty important change and I'm not sure why you're doing it. http://codereview.appspot.com/107042/diff/1/2#newcode102 Line 102: Deprecated: Call MimeTypes.register([(ext, type)]) instead. Not sure why you chose to deprecate this. http://codereview.appspot.com/107042/diff/1/2#newcode193 Line 193: warnings.warn("MimeTypes.read is deprecated.", DeprecationWarning) Again, why did you choose to deprecate it? http://codereview.appspot.com/107042/diff/1/2#newcode211 Line 211: # this function too. Sorry, I don't understand. Why doesn't parse_mimetypes() reuse this function rather than duplicating the code? Are you concerned about performance issues on such an ancillary function? http://codereview.appspot.com/107042/diff/1/3 File Lib/test/test_mimetypes.py (right): http://codereview.appspot.com/107042/diff/1/3#newcode15 Line 15: self.db = mimetypes.MimeTypes(filenames=[]) The fact that you had to change this line hints to potential gratuitous breakage.
Sign in to reply to this message.
http://codereview.appspot.com/107042/diff/1/2 File Lib/mimetypes.py (right): http://codereview.appspot.com/107042/diff/1/2#newcode68 Line 68: def __init__(self, filenames=knownfiles, strict=True): Yes, but it's not actually a change. They were parsed by default before, just the code was so bad that a cursory read-through wouldn't make that obvious. If you look closely, the previous global init() function adds knownfiles’ mappings to the global types_map (see lines ~300-304), and then that types_map gets its types registered into every new MimeTypes object. I personally don’t think we should be reading apache mimetypes files at all, unless explicitly requested by the user, and should instead just include a more complete list of types. http://codereview.appspot.com/107042/diff/1/2#newcode102 Line 102: Deprecated: Call MimeTypes.register([(ext, type)]) instead. a) It's trivial to just call the register function. b) There are very few cases where you only want to register one type, instead of registering a whole bunch. c) I don't see any reason to include extra redundant functions in the API that aren't going to be used, instead of keeping things as slim as we can. http://codereview.appspot.com/107042/diff/1/2#newcode193 Line 193: warnings.warn("MimeTypes.read is deprecated.", DeprecationWarning) I don't think that this object should have a function for registering an apache mime types file with the instance. It's trivial to use two separate operations: one to parse the file and another to add the types (in a single line of code even), and leaving them separate makes it much clearer what's going on, IMO. As far as I know no third party code actually called this function, and I don't imagine it being a common use case. Ideally, third-party code would include a list of tuples themselves, and just call register() directly. http://codereview.appspot.com/107042/diff/1/2#newcode211 Line 211: # this function too. > Why doesn't parse_mimetypes() reuse this > function rather than duplicating the code? Because this is a generator, and so if parse_mimetypes called on it, the generator would be returned inside our with statement, which would then be returned to its caller before yield was called, and the file would be closed out from under us. We'd then try to read from a closed file, and break. There may be a better way to do this, but since I'd like to see this function go away altogether in a later release, I don't think it's a problem.
Sign in to reply to this message.
http://codereview.appspot.com/107042/diff/1/3 File Lib/test/test_mimetypes.py (right): http://codereview.appspot.com/107042/diff/1/3#newcode15 Line 15: self.db = mimetypes.MimeTypes(filenames=[]) I very much doubt it. If you notice, this code currently sets knownfiles = [] at the top of the code, so I didn't actually have to make this change to keep the tests working. I meant to remove that line, but forgot. Either way, I think this is a poor way to set up the tests, because it's not actually testing the way the code operates in reality (which is to say, including the apache mime types files, which override any extension -> type associations made explicitly in the code).
Sign in to reply to this message.
http://codereview.appspot.com/107042/diff/1/3 File Lib/test/test_mimetypes.py (right): http://codereview.appspot.com/107042/diff/1/3#newcode15 Line 15: self.db = mimetypes.MimeTypes(filenames=[]) So to clarify, adding filenames=[] here is purely a change for test readability, to make it obvious what's going on, rather than an actual change to the behavior of the test.
Sign in to reply to this message.
http://codereview.appspot.com/107042/diff/1/2 File Lib/mimetypes.py (right): http://codereview.appspot.com/107042/diff/1/2#newcode102 Line 102: Deprecated: Call MimeTypes.register([(ext, type)]) instead. But there's no real reason to /remove/ it either, except that you'd like to break backwards compatibility purely for aesthetical reasons. http://codereview.appspot.com/107042/diff/1/3 File Lib/test/test_mimetypes.py (right): http://codereview.appspot.com/107042/diff/1/3#newcode15 Line 15: self.db = mimetypes.MimeTypes(filenames=[]) But what we want to test here is the default behaviour of the MimeTypes object, not the behaviour when one of its arguments gets a different value. In other words, adding an argument makes it a different test than it was.
Sign in to reply to this message.
http://codereview.appspot.com/107042/diff/1/2 File Lib/mimetypes.py (right): http://codereview.appspot.com/107042/diff/1/2#newcode102 Line 102: Deprecated: Call MimeTypes.register([(ext, type)]) instead. There is one main reason to remove it. I don't want anyone calling functions that change the module-global singleton, because that opens the door wide to hard-to-trace bugs when two different bits of code interact. Every place I've currently seen this add_type function used (in a google code search) is either internally in the prior __init__ method, where it's just an implementation detail, or on the module's singleton, where I consider the usage dangerous, and want it to stop. The aesthetic improvement is a nice additional side effect, to be sure.
Sign in to reply to this message.
http://codereview.appspot.com/107042/diff/1/2 File Lib/mimetypes.py (right): http://codereview.appspot.com/107042/diff/1/2#newcode102 Line 102: Deprecated: Call MimeTypes.register([(ext, type)]) instead. On 2009/08/15 12:27:47, Jacob Rus wrote: > I don't want anyone calling functions > that change the module-global singleton, How does it differ from the `register` method (which you didn't deprecate)? Besides, if someone has an use for registering new mime-types for use by other modules or libraries in the same process (which I would consider legitimate), why would it be a bad thing? After all, the singleton isn't really exposed, you have to fetch it from the `_db` variable; it's not like the public API encourages mutations to the private singleton. I think you should be more conservative when trying to put things away.
Sign in to reply to this message.
http://codereview.appspot.com/107042/diff/1/2 File Lib/mimetypes.py (right): http://codereview.appspot.com/107042/diff/1/2#newcode102 Line 102: Deprecated: Call MimeTypes.register([(ext, type)]) instead. No, if you notice, I don't expose the register method on the singleton, but, for backwards compatibility, do keep exposing add_type (deprecated) on it. (see lines 403-414) Essentially, I deprecated the previous ways of registering types, and left exposed one single method, which doesn't require the type registry to have knowledge of the apache mimetypes format, and allows user code to be clean and concise. 95%+ uses of this add_types function are something like: for key, value in lookup.iteritems(): mimetypes.add_type(value, key) which can now just be: mtInstance.register((v, k) for (k, v) in lookup.iteritems()) or: from mimetypes import add_type add_type("application/x-debian-package", ".ipk") add_type("application/ogg", ".ogg") add_type("audio/x-flac", ".flac") add_type("application/x-dream-package", ".dmpkg") which can now be: mtInstance.register([ ("application/x-debian-package", ".ipk"), ("application/ogg", ".ogg"), ("audio/x-flac", ".flac"), ("application/x-dream-package", ".dmpkg"), ]) ------ Currently, to mutate the singleton all you have to do is call: mimetypes.add_type(foo, bar) So yes, the public API does encourage mutations to the private singleton. The problem with being conservative is that this current code gives people a 10-year out-of-date, incomplete (even for that time) set of types, and then encourages them to stomp all over each-others custom registrations. I think that's bad.
Sign in to reply to this message.
http://codereview.appspot.com/107042/diff/1/3 File Lib/test/test_mimetypes.py (right): http://codereview.appspot.com/107042/diff/1/3#newcode15 Line 15: self.db = mimetypes.MimeTypes(filenames=[]) We aren't currently testing the default behavior, because the default behavior relies on the files in the knownfiles list being read into every instance of the MimeTypes class, unless two things have happened, (1) the knownfiles list has been explicitly set to empty, breaking any code that expects it not to be, and (2) the *private* _default_mime_types (terrible name btw) function has been called, to make sure that init() didn't already add the registrations of types declared in the known mime.types files. I think the reason you're skeptical of my changes is that you haven't fully understood the previous semantics of the code. I haven't changed them, merely made them obvious: they're suspect because they are poor semantics, but previously no one could tell because they hid behind three layers of obfuscation.
Sign in to reply to this message.
http://codereview.appspot.com/107042/diff/1/2 File Lib/mimetypes.py (right): http://codereview.appspot.com/107042/diff/1/2#newcode102 Line 102: Deprecated: Call MimeTypes.register([(ext, type)]) instead. On 2009/08/15 12:45:22, Jacob Rus wrote: > No, if you notice, I don't expose the register method on the singleton, but, for > backwards compatibility, do keep exposing add_type (deprecated) on it. Ok, then it's worse than I thought. I had overlooked the fact that `register` and `add_type` were previously exposed publicly, and that you were removing this too. > Essentially, I deprecated the previous ways of registering types, > and left exposed one single method, which doesn't require the type registry to > have knowledge of the apache mimetypes format, and allows user code to be clean > and concise. I'm sorry, but it doesn't cut it. Whether or not you think `register` makes user code "clean and concise" is not a sufficient reason to deprecate the previous way of doing so. Please don't play games with third-party user code and don't deprecate things just because you like it. By the way, there are seemingly redundant APIs in Python, even in the core builtin types (witness `str.find` vs. `str.index`, for example). > 95%+ uses of this add_types function are something like: > > for key, value in lookup.iteritems(): > mimetypes.add_type(value, key) > > which can now just be: > > mtInstance.register((v, k) for (k, v) in lookup.iteritems()) Once again this is not the same. The former registers new types for everybody to see, the latter does so for a specific MimeTypes instance. Whether you think the former is a bad thing to do doesn't mean there aren't legitimate uses for it. Most people (including third-party libraries) will use the mimetypes module using the toplevel functions, i.e. the singleton, because it's the obvious way to do so. Therefore, the obvious way to change the default mappings (those for the singleton) should also stay in effect, and not be deprecated. By the way, there are precedents for global registries. The most prominent example is the `codecs` module. > The problem with being conservative is that this current code gives people a > 10-year out-of-date, incomplete (even for that time) set of types, and then > encourages them to stomp all over each-others custom registrations. You are giving three separate reasons why you think the current code is bad. I may agree with the two first reasons without agreeing with the third one, and they all imply different fixes to the code. I'm ok with changing/deprecating behaviour if there are real, hard issues with the current code (e.g. deadlocks in multithreaded situations, etc.). I'm not ok with gratuitous breakage because you think you have a better idea of what a nice API is. All in all, many parts of your work are fine and welcome, but the deprecations IMO aren't. http://codereview.appspot.com/107042/diff/1/2#newcode404 Line 404: # remove _filenames argument when the init funciton is removed. Typo ("funciton"). http://codereview.appspot.com/107042/diff/1/2#newcode406 Line 406: global register, encodings_map, suffix_map, types_map, common_types You declare `register` global but don't access it at all. http://codereview.appspot.com/107042/diff/1/3 File Lib/test/test_mimetypes.py (right): http://codereview.appspot.com/107042/diff/1/3#newcode15 Line 15: self.db = mimetypes.MimeTypes(filenames=[]) You're missing the point of this specific comment: if you replace `MimeTypes()` with `MimeTypes(filenames=[])`, you're testing two different API calls, even if then end up doing the same thing :-) When changing a module's implementation, it is better to minimize changes to the tests themselves, so that it is clear which behaviours /haven't/ changed with the implementation change. I agree that this specific case is a detail, however.
Sign in to reply to this message.
http://codereview.appspot.com/107042/diff/1/2 File Lib/mimetypes.py (right): http://codereview.appspot.com/107042/diff/1/2#newcode102 Line 102: Deprecated: Call MimeTypes.register([(ext, type)]) instead. > Ok, then it's worse than I thought. I had overlooked the fact that `register` > and `add_type` were previously exposed publicly, and that you were removing this > too. No, register is a new method, set up to be a single way to register new types on a custom instance. add_type is *still* exposed publicly, just deprecated and discouraged. > sufficient reason to deprecate the previous way of doing so. Please don't play > games with third-party user code and don't deprecate things just because you > like it. It's not just because I like it, it's because using this API will break code if two different users both try to register their own types. > By the way, there are seemingly redundant APIs in Python, even in the core > builtin types (witness `str.find` vs. `str.index`, for example). Redundancy is fine. But, for instance, overwriting the built in string `find` method is strongly discouraged, because it breaks other users' code. > Once again this is not the same. The former registers new types for everybody to > see, the latter does so for a specific MimeTypes instance. Whether you think the > former is a bad thing to do doesn't mean there aren't legitimate uses for it. I have not seen any examples of code that actually take advantage of this 'legitimate' use. If someone can find such an example, I would be very interested to see it. Preserving bad APIs because they could be used in hypothetical situations seems like the wrong approach. > Most people (including third-party libraries) will use the mimetypes module > using the toplevel functions, i.e. the singleton, because it's the obvious way > to do so. Therefore, the obvious way to change the default mappings (those for > the singleton) should also stay in effect, and not be deprecated. The whole reason to deprecate these is to make the obvious way to customize behavior be a safe one, so that people will use it. > All in all, many parts of your work are fine and welcome, but the deprecations > IMO aren't. Is there any way to migrate people off of APIs that are likely to break, then? I thought that was the whole point of processes like deprecation. Users will then have one or two versions of Python in which to see the deprecation warnings, and make a small number of code changes. I'm happy to write docs to clearly describe the proper use of the API, and the transition should be completely painless: <5 minutes for someone to read the docs and fix their code to the new API, which will have the added benefit of being obvious and transparent.
Sign in to reply to this message.
http://codereview.appspot.com/107042/diff/1/3 File Lib/test/test_mimetypes.py (right): http://codereview.appspot.com/107042/diff/1/3#newcode15 Line 15: self.db = mimetypes.MimeTypes(filenames=[]) > When changing a module's implementation, it is better to minimize changes to the > tests themselves, so that it is clear which behaviours /haven't/ changed with > the implementation change. But you're missing my point. Okay, sure, someone looking *at this patch* will think I'm doing something I'm not. But someone *reading the test*, without reference to the patch, will be able to more clearly understand what is being tested, which is a MimeTypes instance with an explicitly empty 'filenames'. I'm happy to leave this unchanged, but I think it's more fair to anyone trying to read the tests to be as explicit as possible.
Sign in to reply to this message.
http://codereview.appspot.com/107042/diff/1/2 File Lib/mimetypes.py (right): http://codereview.appspot.com/107042/diff/1/2#newcode406 Line 406: global register, encodings_map, suffix_map, types_map, common_types Whoops, that's a typo. register is not intended to be global.
Sign in to reply to this message.
http://codereview.appspot.com/107042/diff/1/2 File Lib/mimetypes.py (right): http://codereview.appspot.com/107042/diff/1/2#newcode102 Line 102: Deprecated: Call MimeTypes.register([(ext, type)]) instead. On 2009/08/15 13:29:25, Jacob Rus wrote: > > No, register is a new method, set up to be a single way to register new types on > a custom instance. add_type is *still* exposed publicly, just deprecated and > discouraged. Ah. So you're adding a new method and deprecating the old method of doing things, even though they're almost similar. I'm sorry, I'm -1 on this. Please don't deprecate the old method. By the way, if `register` is new, please consider giving a name which is more inline with the current conventions (and more explicit too). Like `add_types` or `register_types`. > It's not just because I like it, it's because using this API will break code if > two different users both try to register their own types. Why will it? Unless they want to register different mime-types for the same extensions (which doesn't sound like a reasonable situation, since mime-type values are supposed to be standards (at least de facto if not de jure in most cases)), the end effect will simply be cumulative: both types of user A and user B will be registered. (once again, it's the same as with the `codecs` module) > Redundancy is fine. But, for instance, overwriting the built in string `find` > method is strongly discouraged, because it breaks other users' code. Nobody is talking about overwriting methods here. I was pointing out that your dislike of redundancy is not universal among the Python code base, and is not a sufficient reason to deprecate things. > I have not seen any examples of code that actually take advantage of this > 'legitimate' use. Sorry, but where have you looked exactly? Not all Python code is open source or publicly disclosed. Besides, it /is/ a legitimate use. Please answer this question: how can I affect a third-party library's recognized mime-types if the ability to use the `add_type` function is removed? > I > thought that was the whole point of processes like deprecation. Deprecation is an exceptional process and should be used with care, and scarcily. Most code should continue to work when upgrading from X.Y to X.(Y+1). While we do deprecate things from time to time, it's really something that shouldn't be done lightly. > I'm happy to > write docs to clearly describe the proper use of the API, and the transition > should be completely painless: <5 minutes Whether or not the transition is mostly "painless" is not the point. You still have to modify code when upgrading; and supporting more than one major version of Python becomes more annoying. > for someone to read the docs and fix > their code to the new API, which will have the added benefit of being obvious > and transparent. Sorry, but `register` isn't more "obvious and transparent" than `add_type`. Deprecating `add_type` is completely gratuitous and unwarranted. http://codereview.appspot.com/107042/diff/1/3 File Lib/test/test_mimetypes.py (right): http://codereview.appspot.com/107042/diff/1/3#newcode15 Line 15: self.db = mimetypes.MimeTypes(filenames=[]) On 2009/08/15 13:32:10, Jacob Rus wrote: > > But someone *reading the test*, without > reference to the patch, will be able to more clearly understand what is being > tested, which is a MimeTypes instance with an explicitly empty 'filenames'. This doesn't address my point at all. The previous code tested a MimeTypes instance with no constructor arguments. The new code tests a MimeTypes instance with an explicitly empty `filenames`, which is different. > I think it's more fair to anyone trying to > read the tests to be as explicit as possible. You're not being "more explicit", you're testing a different thing.
Sign in to reply to this message.
http://codereview.appspot.com/107042/diff/1/2 File Lib/mimetypes.py (right): http://codereview.appspot.com/107042/diff/1/2#newcode102 Line 102: Deprecated: Call MimeTypes.register([(ext, type)]) instead. > By the way, if `register` is new, please consider giving a name which is more > inline with the current conventions (and more explicit too). Like `add_types` or > `register_types`. I'm happy enough with `register_types`. > Why will it? Unless they want to register different mime-types for the same > extensions (which doesn't sound like a reasonable situation, since mime-type > values are supposed to be standards (at least de facto if not de jure in most > cases)), the end effect will simply be cumulative: both types of user A and user > B will be registered. Currently, most uses of this module call the init() method, which deletes all of those registrations. Sometimes the 'inited' flag is checked first, but just as often not. But beyond that, while these values are supposed to be standardized in theory, in practice they are not. For example, javascript has several types in use: `application/javascript`, `application/ecmascript`, `application/x-javascript`, `text/javascript`, `text/ecmascript`. Or as another example, many microsoft-related types get served as `application/octet-stream` instead of their proper types. These are legitimate uses because user agents treat the types differently. Much more commonly, out-of-date types are often added to such registrations, which means if I add a recent IANA-approved type, someone else will come along and undo my registration. > > I have not seen any examples of code that actually take advantage of this > > 'legitimate' use. > > Sorry, but where have you looked exactly? Not all Python code is open source or > publicly disclosed. This is a pretty empty argument. > Besides, it /is/ a legitimate use. Please answer this question: how can I affect > a third-party library's recognized mime-types if the ability to use the > `add_type` function is removed? The problem is this: you're inventing a contract for this code which differs from the obvious one that users expect. In order for this to be useful, both of these third-party libraries must be expecting such usage. You can't change a third party library's recognized types in the current API if they use MimeTypes instances (well, unless you e.g. add to an apache mime.types file, and this is IMO utterly backwards and brittle), and I believe many more users would make their own instances if they understood the real implications of the current semantics. Current use of this module is mostly from users copy/pasting code they've seen elsewhere. > Most code should continue to work when upgrading from X.Y to X.(Y+1). That will be the case if this patch is applied. Possibly even X.Y+2. > Whether or not the transition is mostly "painless" is not the point. You still > have to modify code when upgrading; and supporting more than one major version > of Python becomes more annoying. Okay. If `add_type` is left on instances (and deprecated as a function on the singleton), it will be possible to write code that works on at least 2.2+ (and including after the patch). I'm convinced that at least readfp and read should be deprecated, as should the init function, the inited flag, knownfiles, and the global-level mappings. Also, I believe the strict flag should be deprecated on all functions and methods, as by default it does nothing useful (nearly does nothing at all), and its use cases are better served by a more robust set of registered types (but making that change requires some discussion with other knowledgeable users, and hasn't been done in this patch).
Sign in to reply to this message.
http://codereview.appspot.com/107042/diff/1/3 File Lib/test/test_mimetypes.py (right): http://codereview.appspot.com/107042/diff/1/3#newcode15 Line 15: self.db = mimetypes.MimeTypes(filenames=[]) No, the previous code tested a MimeTypes instance which had been set to behave differently than normal with several lines of code near the beginning of the test. The new one tested a MimeTypes instance set to behave differently *in precisely the same way* with code that explicitly denoted the change from default behavior. I honestly don't care about this, and think that testing the non-default behavior is a bad idea either way, but I think it's better to have anyone reading the test code see clearly what they're getting.
Sign in to reply to this message.
On 2009/08/15 15:07:22, Jacob Rus wrote: > > Currently, most uses of this module call the init() method, which deletes all of > those registrations. I'm a bit surprised. Having used the mimetypes module myself, I never bothered to call init() The fact that users call init() may have to do with the fact that until recently, not doing so explicitly could be thread-unsafe. > The problem is this: you're inventing a contract for this code which differs > from the obvious one that users expect. In order for this to be useful, both of > these third-party libraries must be expecting such usage. Why wouldn't they, if they are calling the module functions, which are the obvious API? > I'm convinced that at least readfp and read should be deprecated, as should the > init function, the inited flag, knownfiles, and the global-level mappings. Ok for these, they really have no useful purpose. > Also, I believe the strict flag should be deprecated on all functions and > methods, as by default it does nothing useful (nearly does nothing at all), and > its use cases are better served by a more robust set of registered types (but > making that change requires some discussion with other knowledgeable users, and > hasn't been done in this patch). Perhaps, but it would be better if this were part of a separate patch, after the cleanup is done. It will make discussions much easier for people to follow.
Sign in to reply to this message.
http://codereview.appspot.com/107042/diff/1/3 File Lib/test/test_mimetypes.py (right): http://codereview.appspot.com/107042/diff/1/3#newcode15 Line 15: self.db = mimetypes.MimeTypes(filenames=[]) I agree with Antoine here - the knownfiles argument to a given MimeTypes instance and the mimetypes.knownfiles module global are not the same thing. The module _db singleton always uses the mimetypes.knownfiles attribute unless init() is called explicitly with a filename. The filenames argument to a separate MimeTypes instance only affects that instance, not the init() call to initialise the _db singleton. What would be an *excellent* idea though is to enhance the test suite to actually generate some temporary config files, then properly test the module's file handling abilities by passing in a list of those files to the MimeTypes constructor (i.e. subclass the test case and change the setup method to get the list of files to read from a class attribute). Without that I'd be very nervous about checking in significant changes to this module.
Sign in to reply to this message.
As I said in one of my comments, I appreciate the desire for a cleaner API for handling mimetypes, but this isn't the way to get it. Finding projects that have their own mimetypes implementations, asking them why they created their own rather than using the standard one, seeing what features are common to those APIs, etc, are all things that need to be done before making major changes to the standard library API. What you see as a critical bug (custom MimeTypes instances inheriting their initial settings from the mimetypes._db instance), you can bet some developers are relying on as a feature. If code is in the standard library, someone, somewhere, is relying on it working just the way it is now. Even bug fixes can sometimes break code that was designed to work around the presence of the bug. The concept of having a master copy that new instances are cloned from isn't even particularly objectionable, so long as people clearly understand that is what is going on (e.g. this happens with decimal.DefaultContext being used as the basis for new decimal.Context instances). With code this old, 'softly, softly' is the way to go, and the fewer user visible changes in semantics the better. http://codereview.appspot.com/107042/diff/1/2 File Lib/mimetypes.py (right): http://codereview.appspot.com/107042/diff/1/2#newcode68 Line 68: def __init__(self, filenames=knownfiles, strict=True): No, this is an API change. Note that the filenames argument is NOT passed to the init() call, but is only used for the final read loop when initialising the current mimetypes object. So by default, new MimeTypes objects get whatever is in knownfiles (cached in the module globals by the init() call) along with whatever is in the files explicitly named in the call to the constructor. Only the module singleton created by init() defaults to using knownfiles. http://codereview.appspot.com/107042/diff/1/2#newcode102 Line 102: Deprecated: Call MimeTypes.register([(ext, type)]) instead. A couple of points: 1. In most cases, init() is only called if _db is None (aside from in MimeType.__init__() where inited is checked instead). Since calling init() is what sets both _db and inited, the method will only be invoked once unless other code external to the module invokes it explicitly (and modulo the thread safety problems the module has since it releases the GIL without adding its own locking) 2. The module globals "encodings_map", "suffix_map", "types_map" and "common_types" are *copied* into every MimeTypes instance on creation. These globals are just references to attributes of the _db MimeTypes instance. Modify the global mappings in the mimetypes module before you import that third party module and it *will* see your changes whether it uses a custom instance or not. While I appreciate your desire for a cleaner API, this isn't the way to get there. Before we will consider radical changes to a venerable API like this one, we would want to see the proposed new API field tested extensively as a PyPI module (or via some other mechanism that allows it to see widespread use). http://codereview.appspot.com/107042/diff/1/2#newcode260 Line 260: } As noted in previous comments, this is an incompatible API change. With the age of this code, you can guarantee that someone is relying on the feature that every new MimeTypes instance basically starts as a clone of the current state of the mimetypes._db object. _db isn't a singleton by the way, since you can have as many MimeTypes instances in your program as you want. It's just an ordinary module global. http://codereview.appspot.com/107042/diff/1/2#newcode456 Line 456: sys.exit(1) Scripts that force me to scroll up past the usage message to find the error message are annoying :) http://codereview.appspot.com/107042/diff/1/2#newcode462 Line 462: strict = not ('-l' in opts or '--lenient' in opts) Not a great idea - the old code is in the standard getopt idiom, while the new one only works for argumentless options.
Sign in to reply to this message.
Really though, the bigger question is about trying to actually bring the list of types up to date (currently they're 10-15 years old, and often wrong). I wanted to spark some discussion about that on python-dev, but no one really seemed interested. Alas. Also, the strict/lenient fiasco should be solved. http://codereview.appspot.com/107042/diff/1/2 File Lib/mimetypes.py (right): http://codereview.appspot.com/107042/diff/1/2#newcode68 Line 68: def __init__(self, filenames=knownfiles, strict=True): But the old behavior is undocumented, exactly counter to expectations, requires an extremely fine reading of the code to figure out that it's happening, and for general uses, undesirable. Basically, the old behavior is a bug against the documentation/reasonable developer expectations, or at least a "missing feature". Also, anyone actually *trying* to exploit the old behavior is relying on the extremely touchy order in which the MimeTypes objects are inited, and any minor code changes they make will break, hard, in an unpredictable hard-to-debug way. It should be possible to create a "clean" MimeTypes object, but the old code makes that impossible. http://codereview.appspot.com/107042/diff/1/2#newcode102 Line 102: Deprecated: Call MimeTypes.register([(ext, type)]) instead. re: 1. I'm not sure what you're saying or what the point is. If you go read code that uses this module (for instance through google code search), you'll see that one of the most common things it does is explicitly calling the init function, because there was some point at which that was necessary or recommended, and people copy/pasted from there. This necessarily breaks other code that intentionally changed the global objects, which means that in many many cases, two libraries which both use this module cannot safely be mixed-and-matched. I think this results from a tragic mis-match between expected API based on common sense and the documentation, and actual API based on the code. 2. Yes, which means you can, without intending (because after all, this (IMO buggy) behavior is undocumented) trivially break other libraries' code that you import, and then be left scratching your head at the failure. http://codereview.appspot.com/107042/diff/1/2#newcode260 Line 260: } Okay, well when I call it a "singleton" what I mean is that the functions (basically, disguised methods) that get called on it deal with a single instance. I view this module as having 2 parts: (1) a "singleton" instance with functions that call it, and an 'init' that just leaves it alone if called multiple times, and (2) a MimeTypes object that can be created especially, but really has nothing to do with the "singleton" instance. After all, both are called on using different APIs. http://codereview.appspot.com/107042/diff/1/2#newcode456 Line 456: sys.exit(1) Oh, I think the re-ordering is a typo. :) http://codereview.appspot.com/107042/diff/1/2#newcode462 Line 462: strict = not ('-l' in opts or '--lenient' in opts) This current module doesn't have options on its arguments, and this cuts out about 10 lines of boilerplate, but whatever.
Sign in to reply to this message.
http://codereview.appspot.com/107042/diff/1/Lib/mimetypes.py File Lib/mimetypes.py (right): http://codereview.appspot.com/107042/diff/1/Lib/mimetypes.py#newcode68 Lib/mimetypes.py:68: def __init__(self, filenames=knownfiles, strict=True): As Nick said, we have to preserve compatibility, even at the cost of keeping fragile, not-top-notch code around (it pains us as much as you). If the old behavior is undocumented, you can include doc editions in your patch. http://codereview.appspot.com/107042/diff/1/Lib/mimetypes.py#newcode102 Lib/mimetypes.py:102: Deprecated: Call MimeTypes.register([(ext, type)]) instead. > > > I have not seen any examples of code that actually take advantage of this > > > 'legitimate' use. > > Sorry, but where have you looked exactly? Not all Python code is open source > > or publicly disclosed. > This is a pretty empty argument. I disagree. Antoine asked a legitimate question (for example, Google Code Search sometimes misses important matches) and raised a valid concern: A search for public code will not find matches in private code, but people writing that code are a part of our users and we cannot break documented interfaces without very good reasons. http://codereview.appspot.com/107042/diff/1/Lib/test/test_mimetypes.py File Lib/test/test_mimetypes.py (right): http://codereview.appspot.com/107042/diff/1/Lib/test/test_mimetypes.py#newcode10 Lib/test/test_mimetypes.py:10: mimetypes._init_singleton() As Antoine pointed out, there is no singleton, just a global instance.
Sign in to reply to this message.
> I disagree. Antoine asked a legitimate question (for example, Google > Code Search sometimes misses important matches) and raised a valid > concern: A search for public code will not find matches in private > code, but people writing that code are a part of our users This part may be true, but any such uses are “hacking around” the module rather than using it as intended. > and we cannot break documented interfaces without > very good reasons. We’re talking about (poorly engineered and buggy) implementation details, not documented interfaces. As I said in a comment earlier, >> But the old behavior is undocumented, exactly counter to >> expectations, requires an extremely fine reading of the code to >> figure out that it's happening, and for general uses, undesirable. >> Basically, the old behavior is a bug against the >> documentation/reasonable developer expectations, or at least a >> "missing feature". Also, anyone actually *trying* to exploit the old >> behavior is relying on the extremely touchy order in which the >> MimeTypes objects are inited, and any minor code changes they make >> will break, hard, in an unpredictable hard-to-debug way. In any case, the main things I wish we could discuss sometime were (a) which changes don’t count as “breaking things”? As one small example, can we order the reading of possible Apache locations so that the locations for newer versions aren’t clobbered by out-of-date mimetypes databases from older versions? (b) is there any reasonable way forward to make a better (i.e. not broken) API in this module, or should I just advise people never to use the mimetypes module, and put a replacement up on PyPI? (c) is there any interest in including a somewhat real-world up-to-date set of mime type definitions? The existing ones are more than a decade out of date in some cases, and in others just wrong. If so, does anyone have input on what associations would be good? At some point over a year ago I put some time into figuring out a reasonable list, but it’s been long enough that I don’t really remember by now. To be honest, I came away from the discussion a bit disheartened, because I was trying to say “look how we can improve this, bring it up to date without changing the documented API, make it flexible enough so that sizable projects don’t each have to reinvent it with their own new APIs, etc.” and the response seemed to be roughly “nah, we can’t do that, this module and all its problems are stuck for all time”.
Sign in to reply to this message.
> We’re talking about (poorly engineered and buggy) implementation > details, not documented interfaces. Ah, okay. That was not always clear in your patch and comments. > In any case, the main things I wish we could discuss sometime were > > (a) which changes don’t count as “breaking things”? Changes that don’t make the test suite fail. > As one small > example, can we order the reading of possible Apache locations so that > the locations for newer versions aren’t clobbered by out-of-date > mimetypes databases from older versions? I think that is a change good to have in a new version (for example 3.3), but it would probably be rejected in a bugfix release (like 3.1.4). > (b) is there any reasonable way forward to make a better (i.e. not > broken) API in this module [...] If you have a clone of py3k around, it may be interesting to look at the evolution of the configparser module in the last months. It has gained new features and documentation, some brittle implementation details have been made more robust, but the external, documented API has been kept compatible. So yes, it is possible to add functions and classes, deprecate existing code, revamp the implementation, but while preserving compatibility. > (c) is there any interest in including a somewhat real-world up-to-date > set of mime type definitions? Certainly. For example, an entry for SVG has been added in 3.2. > To be honest, I came away from the discussion a bit disheartened, > because I was trying to say “look how we can improve this, bring it up > to date without changing the documented API, make it flexible enough so > that sizable projects don’t each have to reinvent it with their own new > APIs, etc.” and the response seemed to be roughly “nah, we can’t do > that, this module and all its problems are stuck for all time”. I understand how you feel. The response was not “we can’t do anything”, rather “we can’t do this and that change you’re proposing, but we will accept your edits if you rewrite them to not break existing uses”. I do hope you can see how we’re bound by the compatibility promise, and that you will agree to adapt your patch.
Sign in to reply to this message.
|