Issue 107042: show python mimetypes module love, take 2

Jacob Rus

copy some comments over from http://codereview.appspot.com/104091/diff/1/2 http://codereview.appspot.com/107042/diff/1/2 File Lib/mimetypes.py (left): http://codereview.appspot.com/107042/diff/1/2#oldcode222 Line 222: The way ...

15 years, 10 months ago (2009-08-15 03:46:59 UTC) #2

copy some comments over from
http://codereview.appspot.com/104091/diff/1/2

http://codereview.appspot.com/107042/diff/1/2
File Lib/mimetypes.py (left):

http://codereview.appspot.com/107042/diff/1/2#oldcode222
Line 222: 
The way all the functions below (I removed them) work is just horribly
confusing. Admittedly somewhat less so than the days when they merely said,
something to the effect of:

def guess_type(url):
    init()
    return guess_type(url)

But still pretty confusing, not to mention verbose.

http://codereview.appspot.com/107042/diff/1/2
File Lib/mimetypes.py (right):

http://codereview.appspot.com/107042/diff/1/2#newcode59
Line 59: ]
I reordered these so that more recent Apache versions will come later, so that
we don't overwrite the more recent mappings with out-of-date ones.

Personally, I'd prefer to remove the dependency on Apache, and just ship a big
list of types inside the standard library, either inline in this module (as a
list of tuples) or as a separate file.

http://codereview.appspot.com/107042/diff/1/2#newcode260
Line 260: }
Previously, anyone who redefined suffix_map would permanently prevent other
users from getting a 'clean' MimeTypes object, unless they called the private
_default_mime_types function (as the tests did).

This seemed pretty broken.

Now, instead, making a new MimeTypes() object pulls from these private dicts, so
that changing the public dicts only affects the module's MimeTypes singleton.

http://codereview.appspot.com/107042/diff/1/2#newcode389
Line 389: ]
These have been changed to a list of tuples so that declaring the same extension
with two different types actually works as expected. (i.e. registers both)

http://codereview.appspot.com/107042/diff/1/2#newcode401
Line 401: ]
These should be removed altogether, because the only one that has any effect if
a reasonably up-to-date Apache is installed on the system is the 'image/jpg'
one. Frankly, I don't see the use case for having a separate 'lenient' registry,
but I'm happy to discuss.

http://codereview.appspot.com/107042/diff/1/2#newcode462
Line 462: strict = not ('-l' in opts or '--lenient' in opts)
It's probably unnecessary, but I tried to shorten and clarify the code
throughout the module, wherever I could, including in this command line use bit.

Sign in to reply to this message.

Antoine Pitrou

http://codereview.appspot.com/107042/diff/1/2 File Lib/mimetypes.py (right): http://codereview.appspot.com/107042/diff/1/2#newcode68 Line 68: def __init__(self, filenames=knownfiles, strict=True): Does `filenames=knownfiles` mean Apache ...

15 years, 10 months ago (2009-08-15 10:50:57 UTC) #3

Jacob Rus

http://codereview.appspot.com/107042/diff/1/2 File Lib/mimetypes.py (right): http://codereview.appspot.com/107042/diff/1/2#newcode68 Line 68: def __init__(self, filenames=knownfiles, strict=True): Yes, but it's not ...

15 years, 10 months ago (2009-08-15 12:02:36 UTC) #4

http://codereview.appspot.com/107042/diff/1/2
File Lib/mimetypes.py (right):

http://codereview.appspot.com/107042/diff/1/2#newcode68
Line 68: def __init__(self, filenames=knownfiles, strict=True):
Yes, but it's not actually a change. They were parsed by default before, just
the code was so bad that a cursory read-through wouldn't make that obvious.  If
you look closely, the previous global init() function adds knownfiles’ mappings
to the global types_map (see lines ~300-304), and then that types_map gets its
types registered into every new MimeTypes object.

I personally don’t think we should be reading apache mimetypes files at all,
unless explicitly requested by the user, and should instead just include a more
complete list of types.

http://codereview.appspot.com/107042/diff/1/2#newcode102
Line 102: Deprecated: Call MimeTypes.register([(ext, type)]) instead.
a) It's trivial to just call the register function.

b) There are very few cases where you only want to register one type, instead of
registering a whole bunch.

c) I don't see any reason to include extra redundant functions in the API that
aren't going to be used, instead of keeping things as slim as we can.

http://codereview.appspot.com/107042/diff/1/2#newcode193
Line 193: warnings.warn("MimeTypes.read is deprecated.", DeprecationWarning)
I don't think that this object should have a function for registering an apache
mime types file with the instance. It's trivial to use two separate operations:
one to parse the file and another to add the types (in a single line of code
even), and leaving them separate makes it much clearer what's going on, IMO.  As
far as I know no third party code actually called this function, and I don't
imagine it being a common use case.

Ideally, third-party code would include a list of tuples themselves, and just
call register() directly.

http://codereview.appspot.com/107042/diff/1/2#newcode211
Line 211: # this function too.
> Why doesn't parse_mimetypes() reuse this
> function rather than duplicating the code?

Because this is a generator, and so if parse_mimetypes called on it, the
generator would be returned inside our with statement, which would then be
returned to its caller  before yield was called, and the file would be closed
out from under us. We'd then try to read from a closed file, and break. There
may be a better way to do this, but since I'd like to see this function go away
altogether in a later release, I don't think it's a problem.

Sign in to reply to this message.

Jacob Rus

http://codereview.appspot.com/107042/diff/1/3 File Lib/test/test_mimetypes.py (right): http://codereview.appspot.com/107042/diff/1/3#newcode15 Line 15: self.db = mimetypes.MimeTypes(filenames=[]) I very much doubt it. ...

15 years, 10 months ago (2009-08-15 12:05:23 UTC) #5

Jacob Rus

http://codereview.appspot.com/107042/diff/1/3 File Lib/test/test_mimetypes.py (right): http://codereview.appspot.com/107042/diff/1/3#newcode15 Line 15: self.db = mimetypes.MimeTypes(filenames=[]) So to clarify, adding filenames=[] ...

15 years, 10 months ago (2009-08-15 12:07:21 UTC) #6

Antoine Pitrou

http://codereview.appspot.com/107042/diff/1/2 File Lib/mimetypes.py (right): http://codereview.appspot.com/107042/diff/1/2#newcode102 Line 102: Deprecated: Call MimeTypes.register([(ext, type)]) instead. But there's no ...

15 years, 10 months ago (2009-08-15 12:23:21 UTC) #7

Jacob Rus

http://codereview.appspot.com/107042/diff/1/2 File Lib/mimetypes.py (right): http://codereview.appspot.com/107042/diff/1/2#newcode102 Line 102: Deprecated: Call MimeTypes.register([(ext, type)]) instead. There is one ...

15 years, 10 months ago (2009-08-15 12:27:47 UTC) #8

Antoine Pitrou

http://codereview.appspot.com/107042/diff/1/2 File Lib/mimetypes.py (right): http://codereview.appspot.com/107042/diff/1/2#newcode102 Line 102: Deprecated: Call MimeTypes.register([(ext, type)]) instead. On 2009/08/15 12:27:47, ...

15 years, 10 months ago (2009-08-15 12:34:33 UTC) #9

Jacob Rus

http://codereview.appspot.com/107042/diff/1/2 File Lib/mimetypes.py (right): http://codereview.appspot.com/107042/diff/1/2#newcode102 Line 102: Deprecated: Call MimeTypes.register([(ext, type)]) instead. No, if you ...

15 years, 10 months ago (2009-08-15 12:45:22 UTC) #10

Jacob Rus

http://codereview.appspot.com/107042/diff/1/3 File Lib/test/test_mimetypes.py (right): http://codereview.appspot.com/107042/diff/1/3#newcode15 Line 15: self.db = mimetypes.MimeTypes(filenames=[]) We aren't currently testing the ...

15 years, 10 months ago (2009-08-15 12:49:08 UTC) #11

Antoine Pitrou

http://codereview.appspot.com/107042/diff/1/2 File Lib/mimetypes.py (right): http://codereview.appspot.com/107042/diff/1/2#newcode102 Line 102: Deprecated: Call MimeTypes.register([(ext, type)]) instead. On 2009/08/15 12:45:22, ...

15 years, 10 months ago (2009-08-15 13:13:44 UTC) #12

http://codereview.appspot.com/107042/diff/1/2
File Lib/mimetypes.py (right):

http://codereview.appspot.com/107042/diff/1/2#newcode102
Line 102: Deprecated: Call MimeTypes.register([(ext, type)]) instead.
On 2009/08/15 12:45:22, Jacob Rus wrote:
> No, if you notice, I don't expose the register method on the singleton, but,
for
> backwards compatibility, do keep exposing add_type (deprecated) on it.

Ok, then it's worse than I thought. I had overlooked the fact that `register`
and `add_type` were previously exposed publicly, and that you were removing this
too.

> Essentially, I deprecated the previous ways of registering types,
> and left exposed one single method, which doesn't require the type registry to
> have knowledge of the apache mimetypes format, and allows user code to be
clean
> and concise.

I'm sorry, but it doesn't cut it.
Whether or not you think `register` makes user code "clean and concise" is not a
sufficient reason to deprecate the previous way of doing so. Please don't play
games with third-party user code and don't deprecate things just because you
like it.

By the way, there are seemingly redundant APIs in Python, even in the core
builtin types (witness `str.find` vs. `str.index`, for example).

> 95%+ uses of this add_types function are something like:
> 
> for key, value in lookup.iteritems():
>     mimetypes.add_type(value, key)
> 
> which can now just be:
> 
> mtInstance.register((v, k) for (k, v) in lookup.iteritems())

Once again this is not the same. The former registers new types for everybody to
see, the latter does so for a specific MimeTypes instance. Whether you think the
former is a bad thing to do doesn't mean there aren't legitimate uses for it.

Most people (including third-party libraries) will use the mimetypes module
using the toplevel functions, i.e. the singleton, because it's the obvious way
to do so. Therefore, the obvious way to change the default mappings (those for
the singleton) should also stay in effect, and not be deprecated.

By the way, there are precedents for global registries. The most prominent
example is the `codecs` module.

> The problem with being conservative is that this current code gives people a
> 10-year out-of-date, incomplete (even for that time) set of types, and then
> encourages them to stomp all over each-others custom registrations.

You are giving three separate reasons why you think the current code is bad. I
may agree with the two first reasons without agreeing with the third one, and
they all imply different fixes to the code.

I'm ok with changing/deprecating behaviour if there are real, hard issues with
the current code (e.g. deadlocks in multithreaded situations, etc.). I'm not ok
with gratuitous breakage because you think you have a better idea of what a nice
API is.

All in all, many parts of your work are fine and welcome, but the deprecations
IMO aren't.

http://codereview.appspot.com/107042/diff/1/2#newcode404
Line 404: # remove _filenames argument when the init funciton is removed.
Typo ("funciton").

http://codereview.appspot.com/107042/diff/1/2#newcode406
Line 406: global register, encodings_map, suffix_map, types_map, common_types
You declare `register` global but don't access it at all.

http://codereview.appspot.com/107042/diff/1/3
File Lib/test/test_mimetypes.py (right):

http://codereview.appspot.com/107042/diff/1/3#newcode15
Line 15: self.db = mimetypes.MimeTypes(filenames=[])
You're missing the point of this specific comment: if you replace `MimeTypes()`
with `MimeTypes(filenames=[])`, you're testing two different API calls, even if
then end up doing the same thing :-)

When changing a module's implementation, it is better to minimize changes to the
tests themselves, so that it is clear which behaviours /haven't/ changed with
the implementation change.

I agree that this specific case is a detail, however.

Sign in to reply to this message.

Jacob Rus

http://codereview.appspot.com/107042/diff/1/2 File Lib/mimetypes.py (right): http://codereview.appspot.com/107042/diff/1/2#newcode102 Line 102: Deprecated: Call MimeTypes.register([(ext, type)]) instead. > Ok, then ...

15 years, 10 months ago (2009-08-15 13:29:25 UTC) #13

http://codereview.appspot.com/107042/diff/1/2
File Lib/mimetypes.py (right):

http://codereview.appspot.com/107042/diff/1/2#newcode102
Line 102: Deprecated: Call MimeTypes.register([(ext, type)]) instead.
> Ok, then it's worse than I thought. I had overlooked the fact that `register`
> and `add_type` were previously exposed publicly, and that you were removing
this
> too.

No, register is a new method, set up to be a single way to register new types on
a custom instance. add_type is *still* exposed publicly, just deprecated and
discouraged.

> sufficient reason to deprecate the previous way of doing so. Please don't play
> games with third-party user code and don't deprecate things just because you
> like it.

It's not just because I like it, it's because using this API will break code if
two different users both try to register their own types.

> By the way, there are seemingly redundant APIs in Python, even in the core
> builtin types (witness `str.find` vs. `str.index`, for example).

Redundancy is fine. But, for instance, overwriting the built in string `find`
method is strongly discouraged, because it breaks other users' code.

> Once again this is not the same. The former registers new types for everybody
to
> see, the latter does so for a specific MimeTypes instance. Whether you think
the
> former is a bad thing to do doesn't mean there aren't legitimate uses for it.

I have not seen any examples of code that actually take advantage of this
'legitimate' use. If someone can find such an example, I would be very
interested to see it. Preserving bad APIs because they could be used in
hypothetical situations seems like the wrong approach.

> Most people (including third-party libraries) will use the mimetypes module
> using the toplevel functions, i.e. the singleton, because it's the obvious way
> to do so. Therefore, the obvious way to change the default mappings (those for
> the singleton) should also stay in effect, and not be deprecated.

The whole reason to deprecate these is to make the obvious way to customize
behavior be a safe one, so that people will use it.

> All in all, many parts of your work are fine and welcome, but the deprecations
> IMO aren't.

Is there any way to migrate people off of APIs that are likely to break, then? I
thought that was the whole point of processes like deprecation.

Users will then have one or two versions of Python in which to see the
deprecation warnings, and make a small number of code changes. I'm happy to
write docs to clearly describe the proper use of the API, and the transition
should be completely painless: <5 minutes for someone to read the docs and fix
their code to the new API, which will have the added benefit of being obvious
and transparent.

Sign in to reply to this message.

Jacob Rus

http://codereview.appspot.com/107042/diff/1/3 File Lib/test/test_mimetypes.py (right): http://codereview.appspot.com/107042/diff/1/3#newcode15 Line 15: self.db = mimetypes.MimeTypes(filenames=[]) > When changing a module's ...

15 years, 10 months ago (2009-08-15 13:32:09 UTC) #14

Jacob Rus

http://codereview.appspot.com/107042/diff/1/2 File Lib/mimetypes.py (right): http://codereview.appspot.com/107042/diff/1/2#newcode406 Line 406: global register, encodings_map, suffix_map, types_map, common_types Whoops, that's ...

15 years, 10 months ago (2009-08-15 13:33:23 UTC) #15

Antoine Pitrou

http://codereview.appspot.com/107042/diff/1/2 File Lib/mimetypes.py (right): http://codereview.appspot.com/107042/diff/1/2#newcode102 Line 102: Deprecated: Call MimeTypes.register([(ext, type)]) instead. On 2009/08/15 13:29:25, ...

15 years, 10 months ago (2009-08-15 13:53:57 UTC) #16

http://codereview.appspot.com/107042/diff/1/2
File Lib/mimetypes.py (right):

http://codereview.appspot.com/107042/diff/1/2#newcode102
Line 102: Deprecated: Call MimeTypes.register([(ext, type)]) instead.
On 2009/08/15 13:29:25, Jacob Rus wrote:
> 
> No, register is a new method, set up to be a single way to register new types
on
> a custom instance. add_type is *still* exposed publicly, just deprecated and
> discouraged.

Ah.
So you're adding a new method and deprecating the old method of doing things,
even though they're almost similar.

I'm sorry, I'm -1 on this. Please don't deprecate the old method.

By the way, if `register` is new, please consider giving a name which is more
inline with the current conventions (and more explicit too). Like `add_types` or
`register_types`.

> It's not just because I like it, it's because using this API will break code
if
> two different users both try to register their own types.

Why will it? Unless they want to register different mime-types for the same
extensions (which doesn't sound like a reasonable situation, since mime-type
values are supposed to be standards (at least de facto if not de jure in most
cases)), the end effect will simply be cumulative: both types of user A and user
B will be registered.

(once again, it's the same as with the `codecs` module)

> Redundancy is fine. But, for instance, overwriting the built in string `find`
> method is strongly discouraged, because it breaks other users' code.

Nobody is talking about overwriting methods here.
I was pointing out that your dislike of redundancy is not universal among the
Python code base, and is not a sufficient reason to deprecate things.

> I have not seen any examples of code that actually take advantage of this
> 'legitimate' use.

Sorry, but where have you looked exactly? Not all Python code is open source or
publicly disclosed.

Besides, it /is/ a legitimate use. Please answer this question: how can I affect
a third-party library's recognized mime-types if the ability to use the
`add_type` function is removed?

> I
> thought that was the whole point of processes like deprecation.

Deprecation is an exceptional process and should be used with care, and
scarcily. Most code should continue to work when upgrading from X.Y to X.(Y+1).
While we do deprecate things from time to time, it's really something that
shouldn't be done lightly.

> I'm happy to
> write docs to clearly describe the proper use of the API, and the transition
> should be completely painless: <5 minutes

Whether or not the transition is mostly "painless" is not the point. You still
have to modify code when upgrading; and supporting more than one major version
of Python becomes more annoying.

> for someone to read the docs and fix
> their code to the new API, which will have the added benefit of being obvious
> and transparent.

Sorry, but `register` isn't more "obvious and transparent" than `add_type`.
Deprecating `add_type` is completely gratuitous and unwarranted.

http://codereview.appspot.com/107042/diff/1/3
File Lib/test/test_mimetypes.py (right):

http://codereview.appspot.com/107042/diff/1/3#newcode15
Line 15: self.db = mimetypes.MimeTypes(filenames=[])
On 2009/08/15 13:32:10, Jacob Rus wrote:
> 
> But someone *reading the test*, without
> reference to the patch, will be able to more clearly understand what is being
> tested, which is a MimeTypes instance with an explicitly empty 'filenames'.

This doesn't address my point at all.
The previous code tested a MimeTypes instance with no constructor arguments.
The new code tests a MimeTypes instance with an explicitly empty `filenames`,
which is different.

> I think it's more fair to anyone trying to
> read the tests to be as explicit as possible.

You're not being "more explicit", you're testing a different thing.

Sign in to reply to this message.

Jacob Rus

http://codereview.appspot.com/107042/diff/1/2 File Lib/mimetypes.py (right): http://codereview.appspot.com/107042/diff/1/2#newcode102 Line 102: Deprecated: Call MimeTypes.register([(ext, type)]) instead. > By the ...

15 years, 10 months ago (2009-08-15 15:07:22 UTC) #17

http://codereview.appspot.com/107042/diff/1/2
File Lib/mimetypes.py (right):

http://codereview.appspot.com/107042/diff/1/2#newcode102
Line 102: Deprecated: Call MimeTypes.register([(ext, type)]) instead.
> By the way, if `register` is new, please consider giving a name which is more
> inline with the current conventions (and more explicit too). Like `add_types`
or
> `register_types`.

I'm happy enough with `register_types`.

> Why will it? Unless they want to register different mime-types for the same
> extensions (which doesn't sound like a reasonable situation, since mime-type
> values are supposed to be standards (at least de facto if not de jure in most
> cases)), the end effect will simply be cumulative: both types of user A and
user
> B will be registered.

Currently, most uses of this module call the init() method, which deletes all of
those registrations.  Sometimes the 'inited' flag is checked first, but just as
often not.

But beyond that, while these values are supposed to be standardized in theory,
in practice they are not.  For example, javascript has several types in use:
`application/javascript`, `application/ecmascript`, `application/x-javascript`,
`text/javascript`, `text/ecmascript`.  Or as another example, many
microsoft-related types get served as `application/octet-stream` instead of
their proper types. These are legitimate uses because user agents treat the
types differently.

Much more commonly, out-of-date types are often added to such registrations,
which means if I add a recent IANA-approved type, someone else will come along
and undo my registration.

> > I have not seen any examples of code that actually take advantage of this
> > 'legitimate' use.
> 
> Sorry, but where have you looked exactly? Not all Python code is open source
or
> publicly disclosed.

This is a pretty empty argument.

> Besides, it /is/ a legitimate use. Please answer this question: how can I
affect
> a third-party library's recognized mime-types if the ability to use the
> `add_type` function is removed?

The problem is this: you're inventing a contract for this code which differs
from the obvious one that users expect. In order for this to be useful, both of
these third-party libraries must be expecting such usage.

You can't change a third party library's recognized types in the current API if
they use MimeTypes instances (well, unless you e.g. add to an apache mime.types
file, and this is IMO utterly backwards and brittle), and I believe many more
users would make their own instances if they understood the real implications of
the current semantics. Current use of this module is mostly from users
copy/pasting code they've seen elsewhere.

>  Most code should continue to work when upgrading from X.Y to X.(Y+1).

That will be the case if this patch is applied. Possibly even X.Y+2.

> Whether or not the transition is mostly "painless" is not the point. You still
> have to modify code when upgrading; and supporting more than one major version
> of Python becomes more annoying.

Okay. If `add_type` is left on instances (and deprecated as a function on the
singleton), it will be possible to write code that works on at least 2.2+ (and
including after the patch).

I'm convinced that at least readfp and read should be deprecated, as should the
init function, the inited flag, knownfiles, and the global-level mappings.

Also, I believe the strict flag should be deprecated on all functions and
methods, as by default it does nothing useful (nearly does nothing at all), and
its use cases are better served by a more robust set of registered types (but
making that change requires some discussion with other knowledgeable users, and
hasn't been done in this patch).

Sign in to reply to this message.

Jacob Rus

http://codereview.appspot.com/107042/diff/1/3 File Lib/test/test_mimetypes.py (right): http://codereview.appspot.com/107042/diff/1/3#newcode15 Line 15: self.db = mimetypes.MimeTypes(filenames=[]) No, the previous code tested ...

15 years, 10 months ago (2009-08-15 15:22:59 UTC) #18

Antoine Pitrou

On 2009/08/15 15:07:22, Jacob Rus wrote: > > Currently, most uses of this module call ...

15 years, 10 months ago (2009-08-15 16:11:32 UTC) #19

Nick Coghlan

http://codereview.appspot.com/107042/diff/1/3 File Lib/test/test_mimetypes.py (right): http://codereview.appspot.com/107042/diff/1/3#newcode15 Line 15: self.db = mimetypes.MimeTypes(filenames=[]) I agree with Antoine here ...

15 years, 8 months ago (2009-10-10 13:42:35 UTC) #20

Nick Coghlan

As I said in one of my comments, I appreciate the desire for a cleaner ...

15 years, 8 months ago (2009-10-10 14:30:33 UTC) #21

As I said in one of my comments, I appreciate the desire for a cleaner API for
handling mimetypes, but this isn't the way to get it. Finding projects that have
their own mimetypes implementations, asking them why they created their own
rather than using the standard one, seeing what features are common to those
APIs, etc, are all things that need to be done before making major changes to
the standard library API.

What you see as a critical bug (custom MimeTypes instances inheriting their
initial settings from the mimetypes._db instance), you can bet some developers
are relying on as a feature. If code is in the standard library, someone,
somewhere, is relying on it working just the way it is now. Even bug fixes can
sometimes break code that was designed to work around the presence of the bug.

The concept of having a master copy that new instances are cloned from isn't
even particularly objectionable, so long as people clearly understand that is
what is going on (e.g. this happens with decimal.DefaultContext being used as
the basis for new decimal.Context instances).

With code this old, 'softly, softly' is the way to go, and the fewer user
visible changes in semantics the better.

http://codereview.appspot.com/107042/diff/1/2
File Lib/mimetypes.py (right):

http://codereview.appspot.com/107042/diff/1/2#newcode68
Line 68: def __init__(self, filenames=knownfiles, strict=True):
No, this is an API change. Note that the filenames argument is NOT passed to the
init() call, but is only used for the final read loop when initialising the
current mimetypes object. So by default, new MimeTypes objects get whatever is
in knownfiles (cached in the module globals by the init() call) along with
whatever is in the files explicitly named in the call to the constructor.

Only the module singleton created by init() defaults to using knownfiles.

http://codereview.appspot.com/107042/diff/1/2#newcode102
Line 102: Deprecated: Call MimeTypes.register([(ext, type)]) instead.
A couple of points:
1. In most cases, init() is only called if _db is None (aside from in
MimeType.__init__() where inited is checked instead). Since calling init() is
what sets both _db and inited, the method will only be invoked once unless other
code external to the module invokes it explicitly (and modulo the thread safety
problems the module has since it releases the GIL without adding its own
locking)

2. The module globals "encodings_map", "suffix_map", "types_map" and
"common_types" are *copied* into every MimeTypes instance on creation. These
globals are just references to attributes of the _db MimeTypes instance. Modify
the global mappings in the mimetypes module before you import that third party
module and it *will* see your changes whether it uses a custom instance or not.

While I appreciate your desire for a cleaner API, this isn't the way to get
there. Before we will consider radical changes to a venerable API like this one,
we would want to see the proposed new API field tested extensively as a PyPI
module (or via some other mechanism that allows it to see widespread use).

http://codereview.appspot.com/107042/diff/1/2#newcode260
Line 260: }
As noted in previous comments, this is an incompatible API change. With the age
of this code, you can guarantee that someone is relying on the feature that
every new MimeTypes instance basically starts as a clone of the current state of
the mimetypes._db object.

_db isn't a singleton by the way, since you can have as many MimeTypes instances
in your program as you want. It's just an ordinary module global.

http://codereview.appspot.com/107042/diff/1/2#newcode456
Line 456: sys.exit(1)
Scripts that force me to scroll up past the usage message to find the error
message are annoying :)

http://codereview.appspot.com/107042/diff/1/2#newcode462
Line 462: strict = not ('-l' in opts or '--lenient' in opts)
Not a great idea - the old code is in the standard getopt idiom, while the new
one only works for argumentless options.

Sign in to reply to this message.

Jacob Rus

Really though, the bigger question is about trying to actually bring the list of types ...

15 years, 8 months ago (2009-10-11 00:27:51 UTC) #22

Really though, the bigger question is about trying to actually bring the list of
types up to date (currently they're 10-15 years old, and often wrong). I wanted
to spark some discussion about that on python-dev, but no one really seemed
interested. Alas.

Also, the strict/lenient fiasco should be solved.

http://codereview.appspot.com/107042/diff/1/2
File Lib/mimetypes.py (right):

http://codereview.appspot.com/107042/diff/1/2#newcode68
Line 68: def __init__(self, filenames=knownfiles, strict=True):
But the old behavior is undocumented, exactly counter to expectations, requires
an extremely fine reading of the code to figure out that it's happening, and for
general uses, undesirable.  Basically, the old behavior is a bug against the
documentation/reasonable developer expectations, or at least a "missing
feature". Also, anyone actually *trying* to exploit the old behavior is relying
on the extremely touchy order in which the MimeTypes objects are inited, and any
minor code changes they make will break, hard, in an unpredictable hard-to-debug
way.

It should be possible to create a "clean" MimeTypes object, but the old code
makes that impossible.

http://codereview.appspot.com/107042/diff/1/2#newcode102
Line 102: Deprecated: Call MimeTypes.register([(ext, type)]) instead.
re: 1. I'm not sure what you're saying or what the point is. If you go read code
that uses this module (for instance through google code search), you'll see that
one of the most common things it does is explicitly calling the init function,
because there was some point at which that was necessary or recommended, and
people copy/pasted from there. This necessarily breaks other code that
intentionally changed the global objects, which means that in many many cases,
two libraries which both use this module cannot safely be mixed-and-matched. I
think this results from a tragic mis-match between expected API based on common
sense and the documentation, and actual API based on the code.

2. Yes, which means you can, without intending (because after all, this (IMO
buggy) behavior is undocumented) trivially break other libraries' code that you
import, and then be left scratching your head at the failure.

http://codereview.appspot.com/107042/diff/1/2#newcode260
Line 260: }
Okay, well when I call it a "singleton" what I mean is that the functions
(basically, disguised methods) that get called on it deal with a single
instance.

I view this module as having 2 parts: (1) a "singleton" instance with functions
that call it, and an 'init' that just leaves it alone if called multiple times,
and (2) a MimeTypes object that can be created especially, but really has
nothing to do with the "singleton" instance. After all, both are called on using
different APIs.

http://codereview.appspot.com/107042/diff/1/2#newcode456
Line 456: sys.exit(1)
Oh, I think the re-ordering is a typo. :)

http://codereview.appspot.com/107042/diff/1/2#newcode462
Line 462: strict = not ('-l' in opts or '--lenient' in opts)
This current module doesn't have options on its arguments, and this cuts out
about 10 lines of boilerplate, but whatever.

Sign in to reply to this message.

merwok

http://codereview.appspot.com/107042/diff/1/Lib/mimetypes.py File Lib/mimetypes.py (right): http://codereview.appspot.com/107042/diff/1/Lib/mimetypes.py#newcode68 Lib/mimetypes.py:68: def __init__(self, filenames=knownfiles, strict=True): As Nick said, we have ...

14 years, 4 months ago (2011-02-09 23:13:53 UTC) #23

Jacob Rus

> I disagree. Antoine asked a legitimate question (for example, Google > Code Search sometimes ...

14 years, 4 months ago (2011-02-10 02:03:48 UTC) #24

> I disagree.  Antoine asked a legitimate question (for example, Google
> Code Search sometimes misses important matches) and raised a valid
> concern: A search for public code will not find matches in private
> code, but people writing that code are a part of our users

This part may be true, but any such uses are “hacking around” the module rather
than using it as intended. 

> and we cannot break documented interfaces without
> very good reasons.

We’re talking about (poorly engineered and buggy) implementation details, not
documented interfaces.

As I said in a comment earlier,

>> But the old behavior is undocumented, exactly counter to
>> expectations, requires an extremely fine reading of the code to
>> figure out that it's happening, and for general uses, undesirable.
>> Basically, the old behavior is a bug against the
>> documentation/reasonable developer expectations, or at least a
>> "missing feature". Also, anyone actually *trying* to exploit the old
>> behavior is relying on the extremely touchy order in which the
>> MimeTypes objects are inited, and any minor code changes they make
>> will break, hard, in an unpredictable hard-to-debug way.

In any case, the main things I wish we could discuss sometime were

(a) which changes don’t count as “breaking things”? As one small example, can we
order the reading of possible Apache locations so that the locations for newer
versions aren’t clobbered by out-of-date mimetypes databases from older
versions?

(b) is there any reasonable way forward to make a better (i.e. not broken) API
in this module, or should I just advise people never to use the mimetypes
module, and put a replacement up on PyPI?

(c) is there any interest in including a somewhat real-world up-to-date set of
mime type definitions? The existing ones are more than a decade out of date in
some cases, and in others just wrong. If so, does anyone have input on what
associations would be good? At some point over a year ago I put some time into
figuring out a reasonable list, but it’s been long enough that I don’t really
remember by now.

To be honest, I came away from the discussion a bit disheartened, because I was
trying to say “look how we can improve this, bring it up to date without
changing the documented API, make it flexible enough so that sizable projects
don’t each have to reinvent it with their own new APIs, etc.” and the response
seemed to be roughly “nah, we can’t do that, this module and all its problems
are stuck for all time”.

Sign in to reply to this message.

merwok

14 years, 4 months ago (2011-02-10 23:07:38 UTC) #25

> We’re talking about (poorly engineered and buggy) implementation
> details, not documented interfaces.
Ah, okay.  That was not always clear in your patch and comments.

> In any case, the main things I wish we could discuss sometime were
> 
> (a) which changes don’t count as “breaking things”?
Changes that don’t make the test suite fail.

> As one small
> example, can we order the reading of possible Apache locations so that
> the locations for newer versions aren’t clobbered by out-of-date
> mimetypes databases from older versions?
I think that is a change good to have in a new version (for example
3.3), but it would probably be rejected in a bugfix release (like 3.1.4).

> (b) is there any reasonable way forward to make a better (i.e. not
> broken) API in this module [...]
If you have a clone of py3k around, it may be interesting to look at the
evolution of the configparser module in the last months.  It has gained
new features and documentation, some brittle implementation details have
been made more robust, but the external, documented API has been kept
compatible.  So yes, it is possible to add functions and classes,
deprecate existing code, revamp the implementation, but while preserving
compatibility.

> (c) is there any interest in including a somewhat real-world up-to-date
> set of mime type definitions?
Certainly.  For example, an entry for SVG has been added in 3.2.

> To be honest, I came away from the discussion a bit disheartened,
> because I was trying to say “look how we can improve this, bring it up
> to date without changing the documented API, make it flexible enough so
> that sizable projects don’t each have to reinvent it with their own new
> APIs, etc.” and the response seemed to be roughly “nah, we can’t do
> that, this module and all its problems are stuck for all time”.
I understand how you feel.  The response was not “we can’t do anything”,
rather “we can’t do this and that change you’re proposing, but we will
accept your edits if you rewrite them to not break existing uses”.  I do
hope you can see how we’re bound by the compatibility promise, and that
you will agree to adapt your patch.

Sign in to reply to this message.

Issue 107042: show python mimetypes module love, take 2

Description

Patch Set 1 #

Messages