|
|
Created:
14 years, 8 months ago by flox Modified:
14 years, 7 months ago Base URL:
http://svn.python.org/view/*checkout*/python/trunk/ Visibility:
Public. |
DescriptionMerge upstream ElementTree/cElementTree in trunk.
* upstream: http://bitbucket.org/effbot/et-2009-provolone/
* branch to prepare 1.3 for Python:
http://bitbucket.org/flox/et-2009-provolone/
See the thread of the request: http://bugs.python.org/issue6472
The goals of this patch are:
- to fix many bugs reported in Python issue tracker
- to ensure consistency between C and Python implementations (with tests)
- to improve test coverage
Some parts are removed because they are experimental or too much specialized:
- module ElementC14N: canonical XML
- ElementTree C API
This patch fixes many issues, including:
#6472 Update ET with upstream changes (and #1143 #1777)
#1538691 Patch cET to export CurrentLineNumber
#1602189 Suggest a textlist() method for ElementTree
#3151 ET serialization bug for weird namespace urls
#3475 _elementtree.c import can fail silently
#6230 ET.Element and cET.Element have slightly different repr
#6232 Improve test coverage of ET and cET
#6265 cET & ET use different exceptions for XML Errors
#6266 cET.iterparse & ET.iterparse return differently encoded strings
#6565 improper use of __setitem__ in ET
Test coverage:
* relevant tests from upstream are ported to "test_xml_etree.py"
* same test_suite is run for both C and Python implementations
* merge tests provided with #2746, #6232 and #6233
Patch Set 1 : Patch for 2.7, with documentation and tests. #
Total comments: 24
Patch Set 2 : First iteration, and document the deprecation of XMLParser.doctype(...) #Patch Set 3 : Merge *all* tests from upstream. Fix refleaks on ParseError and E.attrib. #
Total comments: 7
Patch Set 4 : Add tests for the C API. Drop unused "element_(get|set)slice". #
Total comments: 15
Patch Set 5 : Extend test coverage with #6232. Fix regression in E.findtext... #Patch Set 6 : Split out experimental C API. #Patch Set 7 : Rebase the patch on the Mercurial repository. #
Total comments: 19
Patch Set 8 : Ready for merge in trunk? #
MessagesTotal messages: 38
Clarify the encoded bytestring. http://codereview.appspot.com/207048/diff/1005/10 File Lib/test/test_xml_etree.py (right): http://codereview.appspot.com/207048/diff/1005/10#newcode260 Lib/test/test_xml_etree.py:260: >>> e = ET.XML("<?xml version='1.0' encoding='utf-8'?><body>t\xc3\xa3g</body>") utf-8 encoded bytes. Should read "t\\xc3\\xa3g". http://codereview.appspot.com/207048/diff/1005/10#newcode263 Lib/test/test_xml_etree.py:263: >>> e = ET.XML("<?xml version='1.0' encoding='iso-8859-1'?><body>t\\xe3g</body>") latin-1 encoded bytes http://codereview.appspot.com/207048/diff/1005/11 File Lib/test/test_xml_etree_c.py (right): http://codereview.appspot.com/207048/diff/1005/11#newcode246 Lib/test/test_xml_etree_c.py:246: >>> e = ET.XML("<?xml version='1.0' encoding='utf-8'?><body>t\xc3\xa3g</body>") utf-8 encoded bytes. Should read "t\\xc3\\xa3g". http://codereview.appspot.com/207048/diff/1005/11#newcode249 Lib/test/test_xml_etree_c.py:249: >>> e = ET.XML("<?xml version='1.0' encoding='iso-8859-1'?><body>t\xe3g</body>") latin-1 encoded bytes. Should read "t\\xe3g".
Sign in to reply to this message.
Added developer foreword on all modules, to ease the review. It is very close to the upstream libraries. The only changes are for consistency between ET and cET. The Python part is simple. The merge which was less obvious is "_elementtree.c". The test_suite should guarantee that both implementations are identical (contrary to current behaviour. See issue #6472 thread for details. http://codereview.appspot.com/207048/diff/1005/10 File Lib/test/test_xml_etree.py (right): http://codereview.appspot.com/207048/diff/1005/10#newcode1 Lib/test/test_xml_etree.py:1: # xml.etree test. This file contains enough tests to make sure that Now the tests guarantee the *same* behaviour between ElementTree and cElementTree. It was not the case before. See #6472. http://codereview.appspot.com/207048/diff/1005/11 File Lib/test/test_xml_etree_c.py (right): http://codereview.appspot.com/207048/diff/1005/11#newcode1 Lib/test/test_xml_etree_c.py:1: # xml.etree test for cElementTree Now the tests guarantee the *same* behaviour between ElementTree and cElementTree. It was not the case before. See #6472. http://codereview.appspot.com/207048/diff/1005/13 File Lib/xml/etree/ElementInclude.py (right): http://codereview.appspot.com/207048/diff/1005/13#newcode3 Lib/xml/etree/ElementInclude.py:3: # $Id: ElementInclude.py 3265 2007-09-06 20:42:00Z fredrik $ Module IDENTICAL with upstream. http://codereview.appspot.com/207048/diff/1005/15 File Lib/xml/etree/ElementPath.py (right): http://codereview.appspot.com/207048/diff/1005/15#newcode3 Lib/xml/etree/ElementPath.py:3: # $Id: ElementPath.py 3276 2007-09-12 06:52:30Z fredrik $ Module IDENTICAL with upstream. http://codereview.appspot.com/207048/diff/1005/12 File Lib/xml/etree/ElementTree.py (right): http://codereview.appspot.com/207048/diff/1005/12#newcode3 Lib/xml/etree/ElementTree.py:3: # $Id: ElementTree.py 3276 2007-09-12 06:52:30Z fredrik $ Very close to the upstream module. Only additions/removal: * preserve the patch for #2746. * support extended slicing. * remove the deprecated __(get|set|del)slice__ http://codereview.appspot.com/207048/diff/1005/14 File Lib/xml/etree/__init__.py (right): http://codereview.appspot.com/207048/diff/1005/14#newcode1 Lib/xml/etree/__init__.py:1: # $Id: __init__.py 3265 2007-09-06 20:42:00Z fredrik $ Module IDENTICAL with upstream. http://codereview.appspot.com/207048/diff/1005/17 File Modules/_elementtree.c (right): http://codereview.appspot.com/207048/diff/1005/17#newcode3 Modules/_elementtree.c:3: * $Id: _elementtree.c 3469 2009-01-10 14:30:13Z fredrik $ Careful merge of upstream into trunk. New additions: * element_findall always returns an iterator (consistency with pure python implementation) * new mapping methods for extended slicing http://codereview.appspot.com/207048/diff/1005/16 File Modules/celementtree.h (right): http://codereview.appspot.com/207048/diff/1005/16#newcode1 Modules/celementtree.h:1: /* Stuff to export relevant entry points from cElementTree */ Upstream stuff. No change.
Sign in to reply to this message.
http://codereview.appspot.com/207048/diff/1005/10 File Lib/test/test_xml_etree.py (right): http://codereview.appspot.com/207048/diff/1005/10#newcode251 Lib/test/test_xml_etree.py:251: >>> ET.tostring(ET.PI('test', u'<testing&>\xe3'), 'latin1') Backported from py3k: r78126 http://codereview.appspot.com/207048/diff/1005/10#newcode256 Lib/test/test_xml_etree.py:256: def check_issue6233(): Backported from py3k: r78123 http://codereview.appspot.com/207048/diff/1005/10#newcode399 Lib/test/test_xml_etree.py:399: def bug_1534630(): copied from test_xml_etree_c.py (ensure same behavior) http://codereview.appspot.com/207048/diff/1005/11 File Lib/test/test_xml_etree_c.py (right): http://codereview.appspot.com/207048/diff/1005/11#newcode226 Lib/test/test_xml_etree_c.py:226: def processinginstruction(): copied from test_xml_etree.py (consistency) http://codereview.appspot.com/207048/diff/1005/11#newcode244 Lib/test/test_xml_etree_c.py:244: def check_issue6233(): copied from test_xml_etree.py (consistency)
Sign in to reply to this message.
http://codereview.appspot.com/207048/diff/1005/12 File Lib/xml/etree/ElementTree.py (left): http://codereview.appspot.com/207048/diff/1005/12#oldcode1237 Lib/xml/etree/ElementTree.py:1237: pass Is this method deprecated? It seems to disappear after the patch. http://codereview.appspot.com/207048/diff/1005/17 File Modules/_elementtree.c (right): http://codereview.appspot.com/207048/diff/1005/17#newcode798 Modules/_elementtree.c:798: seq = PySequence_Fast(seq_in, ""); The second argument (error message if seq_in is not a valid sequence) should be non-empty. http://codereview.appspot.com/207048/diff/1005/17#newcode892 Modules/_elementtree.c:892: See bug 6472. */ Rather than disabling the C implementation, we could simply call PyObject_GetIter() on the result. http://codereview.appspot.com/207048/diff/1005/17#newcode1498 Modules/_elementtree.c:1498: recycle = PyList_New(slicelen); This lacks an error check. http://codereview.appspot.com/207048/diff/1005/17#newcode1512 Modules/_elementtree.c:1512: Py_DECREF(seq); There's a refleak with `recycle` but we can't decref it here, since otherwise some elements will be destroyed while they are still referenced by the parent. The call to element_resize() should instead be done before `recycle` is populated (but after it is allocated).
Sign in to reply to this message.
http://codereview.appspot.com/207048/diff/1005/12 File Lib/xml/etree/ElementTree.py (left): http://codereview.appspot.com/207048/diff/1005/12#oldcode1237 Lib/xml/etree/ElementTree.py:1237: pass On 2010/02/10 13:14:37, Antoine Pitrou wrote: > Is this method deprecated? It seems to disappear after the patch. > Now it is an optional method of the builder: self.target.doctype(*args) The backward compatibility concern is if someone subclassed XMLTreeBuilder (now XMLParser) and define a custom "doctype()" method. In that case, it should be called by the "_default" handler on line 1226 above. The new way to do it is to provide a "doctype" method on the builder itself (instead of XMLParser). http://codereview.appspot.com/207048/diff/1005/12 File Lib/xml/etree/ElementTree.py (right): http://codereview.appspot.com/207048/diff/1005/12#newcode938 Lib/xml/etree/ElementTree.py:938: # dublic core obvious typo...
Sign in to reply to this message.
First iteration, and document the deprecation of XMLParser.doctype(...)
Sign in to reply to this message.
You do realize that you're merging in an experimental release, right? I'm a bit worried that the result of this effort will be plenty of incompatibilities with the upstream library (and there are also signs on bugs.python.org that some people involved don't understand the difference between specification of a portable API and artifacts of a certain implementation of the same API), but I'm travelling right now, and have no bandwidth to deal with this. Just be careful. </F> On Wed, Feb 10, 2010 at 2:14 PM, <antoine.pitrou@gmail.com> wrote: > > http://codereview.appspot.com/207048/diff/1005/12 > File Lib/xml/etree/ElementTree.py (left): > > http://codereview.appspot.com/207048/diff/1005/12#oldcode1237 > Lib/xml/etree/ElementTree.py:1237: pass > Is this method deprecated? It seems to disappear after the patch. > > http://codereview.appspot.com/207048/diff/1005/17 > File Modules/_elementtree.c (right): > > http://codereview.appspot.com/207048/diff/1005/17#newcode798 > Modules/_elementtree.c:798: seq = PySequence_Fast(seq_in, ""); > The second argument (error message if seq_in is not a valid sequence) > should be non-empty. > > http://codereview.appspot.com/207048/diff/1005/17#newcode892 > Modules/_elementtree.c:892: See bug 6472. */ > Rather than disabling the C implementation, we could simply call > PyObject_GetIter() on the result. > > http://codereview.appspot.com/207048/diff/1005/17#newcode1498 > Modules/_elementtree.c:1498: recycle = PyList_New(slicelen); > This lacks an error check. > > http://codereview.appspot.com/207048/diff/1005/17#newcode1512 > Modules/_elementtree.c:1512: Py_DECREF(seq); > There's a refleak with `recycle` but we can't decref it here, since > otherwise some elements will be destroyed while they are still > referenced by the parent. > The call to element_resize() should instead be done before `recycle` is > populated (but after it is allocated). > > http://codereview.appspot.com/207048/show >
Sign in to reply to this message.
Merge *all* tests from upstream. Fix refleaks on ParseError and E.attrib.
Sign in to reply to this message.
On 2010/02/11 18:39:08, fredrik_pythonware.com wrote: > You do realize that you're merging in an experimental release, right? > I'm a bit worried that the result of this effort will be plenty of > incompatibilities with the upstream library (and there are also signs > on http://bugs.python.org that some people involved don't understand the > difference between specification of a portable API and artifacts of a > certain implementation of the same API), but I'm travelling right now, > and have no bandwidth to deal with this. Just be careful. > > </F> > Thanks, Fredrik for your feedback. Actually, I started to fix some ET/cET bugs 3 months ago. The main issues with the current "xml.etree" package are explained on #6472. Then I found that most bugs and discrepancies were already fixed in the upstream versions. I thought that the best approach is to port the upstream version in trunk, rather than fixing bugs separately in /python/trunk/. I took the upstream bundles and I merged them carefully with the trunk 2.7 alpha over December 2009. Recently Antoine showed some interest in fixing some bugs of "xml.etree" (r78123-r78126). I suggested again to merge the upstream implementation of ET in trunk. With your comment about the *experimental* status of the package, I decided to grow the python test suite with both "selftest.py" upstream tests. I merged all tests together, and now the same test suite passes with Python and C implementations. (test_xml_etree.py: 370 lines --> 1500 lines) The tests show no regression. Additionally, some reference leakings were identified and fixed, while running the test suite. (See diff between patch set 1 and 3) The differences between upstream and the current patch (patch set 3) are limited: * Lib/xml/etree/ElementTree.py http://paste.pocoo.org/compare/177059/177058/ * Modules/_elementree.c http://paste.pocoo.org/compare/177063/177061/ * all other modules are IDENTICAL with upstream Thank you again for this software, and your comments. I hope we can merge it in trunk before the beta of 2.7. -- Florent
Sign in to reply to this message.
http://codereview.appspot.com/207048/diff/1035/50 File Modules/_elementtree.c (right): http://codereview.appspot.com/207048/diff/1035/50#newcode799 Modules/_elementtree.c:799: seq = PySequence_Fast(seq_in, ""); Don't miss this one, next time... http://codereview.appspot.com/207048/diff/1035/50#newcode1303 Modules/_elementtree.c:1303: seq = PySequence_Fast(seq_in, "can only assign an iterable"); 2nd argument is useless and can be reverted to empty string?
Sign in to reply to this message.
http://codereview.appspot.com/207048/diff/1035/50 File Modules/_elementtree.c (left): http://codereview.appspot.com/207048/diff/1035/50#oldcode1409 Modules/_elementtree.c:1409: element_setslice, If element_setslice and element_getslice aren't used anymore, they should be deleted. http://codereview.appspot.com/207048/diff/1035/50 File Modules/_elementtree.c (right): http://codereview.appspot.com/207048/diff/1035/50#newcode917 Modules/_elementtree.c:917: return NULL; This conditional is quite useless, isn't it? http://codereview.appspot.com/207048/diff/1035/50#newcode1303 Modules/_elementtree.c:1303: seq = PySequence_Fast(seq_in, "can only assign an iterable"); On 2010/02/12 14:07:23, flox wrote: > 2nd argument is useless and can be reverted to empty string? Certainly, yes.
Sign in to reply to this message.
http://codereview.appspot.com/207048/diff/1035/50 File Modules/_elementtree.c (left): http://codereview.appspot.com/207048/diff/1035/50#oldcode1409 Modules/_elementtree.c:1409: element_setslice, On 2010/02/12 15:27:46, Antoine Pitrou wrote: > If element_setslice and element_getslice aren't used anymore, they should be > deleted. * element_setslice is unused * element_getslice is used by the experimental "element_capi_snapshot" * element_get_type is unused I will try to replace "element_getslice" with "element_subscr", then remove all 3 functions. http://codereview.appspot.com/207048/diff/1035/50 File Modules/_elementtree.c (right): http://codereview.appspot.com/207048/diff/1035/50#newcode917 Modules/_elementtree.c:917: return NULL; On 2010/02/12 15:27:46, Antoine Pitrou wrote: > This conditional is quite useless, isn't it? > indeed, it is ...
Sign in to reply to this message.
Add tests for the C API. Drop unused "element_(get|set)slice".
Sign in to reply to this message.
http://codereview.appspot.com/207048/diff/67/1054 File Modules/_testcapimodule.c (right): http://codereview.appspot.com/207048/diff/67/1054#newcode1311 Modules/_testcapimodule.c:1311: childob = (*capi->getitem)(newob, 0); You should check that childob is non-NULL. http://codereview.appspot.com/207048/diff/67/1054#newcode1315 Modules/_testcapimodule.c:1315: return NULL; newob should be decref'ed here http://codereview.appspot.com/207048/diff/67/1054#newcode1320 Modules/_testcapimodule.c:1320: return NULL; newob should also be decref'ed here. An alternative is to add an "error" label which Py_XDECREF's all local objects and returns NULL. http://codereview.appspot.com/207048/diff/67/1054#newcode1325 Modules/_testcapimodule.c:1325: Py_DECREF(Py_None); Py_None obviously shouldn't be decref'ed here, since it hasn't been returned. Given the current implementation the getitem() result should be decref'ed, though. http://codereview.appspot.com/207048/diff/67/1054#newcode1331 Modules/_testcapimodule.c:1331: Py_DECREF(Py_None); This might have to be removed if the getitem() implementation is fixed to return a borrowed reference. http://codereview.appspot.com/207048/diff/67/1054#newcode1338 Modules/_testcapimodule.c:1338: strcmp(PyString_AS_STRING(snapshot.tag), "document") != 0) { If you want to make all these checks easier (and/or more complete), you could instead return a tuple of the snapshot's contents and check the values in Python. http://codereview.appspot.com/207048/diff/67/1052 File Modules/celementtree.h (right): http://codereview.appspot.com/207048/diff/67/1052#newcode51 Modules/celementtree.h:51: PyObject* type; The doc/comments should state whether this reference is owned or borrowed (i.e., whether one should Py_DECREF it when done with the capi struct). http://codereview.appspot.com/207048/diff/67/1052#newcode56 Modules/celementtree.h:56: int (*assert)(PyObject* elem); I'm not sure calling this "assert" is a good idea. If some C compiler uses a #define for the standard "assert", it can refuse to compile or compile to the wrong symbol. Why not something explicit such as "checktype"? http://codereview.appspot.com/207048/diff/67/1052#newcode72 Modules/celementtree.h:72: /* Returns a borrowed reference, or Py_None if the element does not The doc is wrong, because the implementation returns a new reference. Either the doc or the implementation should be fixed. http://codereview.appspot.com/207048/diff/67/1052#newcode74 Modules/celementtree.h:74: PyObject* (*getitem)(PyObject* elem, int index); The index should probably be a Py_ssize_t instead.
Sign in to reply to this message.
http://codereview.appspot.com/207048/diff/67/1054 File Modules/_testcapimodule.c (right): http://codereview.appspot.com/207048/diff/67/1054#newcode1338 Modules/_testcapimodule.c:1338: strcmp(PyString_AS_STRING(snapshot.tag), "document") != 0) { On 2010/02/15 15:06:56, Antoine Pitrou wrote: > If you want to make all these checks easier (and/or more complete), you could > instead return a tuple of the snapshot's contents and check the values in > Python. > Agreed. http://codereview.appspot.com/207048/diff/67/1052 File Modules/celementtree.h (right): http://codereview.appspot.com/207048/diff/67/1052#newcode51 Modules/celementtree.h:51: PyObject* type; On 2010/02/15 15:06:56, Antoine Pitrou wrote: > The doc/comments should state whether this reference is owned or borrowed (i.e., > whether one should Py_DECREF it when done with the capi struct). Added a statement "borrowed reference". http://codereview.appspot.com/207048/diff/67/1052#newcode56 Modules/celementtree.h:56: int (*assert)(PyObject* elem); On 2010/02/15 15:06:56, Antoine Pitrou wrote: > I'm not sure calling this "assert" is a good idea. If some C compiler uses a > #define for the standard "assert", it can refuse to compile or compile to the > wrong symbol. Why not something explicit such as "checktype"? > Agreed. http://codereview.appspot.com/207048/diff/67/1052#newcode72 Modules/celementtree.h:72: /* Returns a borrowed reference, or Py_None if the element does not On 2010/02/15 15:06:56, Antoine Pitrou wrote: > The doc is wrong, because the implementation returns a new reference. Either the > doc or the implementation should be fixed. Fixed the implementation. http://codereview.appspot.com/207048/diff/67/1052#newcode74 Modules/celementtree.h:74: PyObject* (*getitem)(PyObject* elem, int index); On 2010/02/15 15:06:56, Antoine Pitrou wrote: > The index should probably be a Py_ssize_t instead. > Ok.
Sign in to reply to this message.
Extend test coverage with #6232. Fix regression in E.findtext...
Sign in to reply to this message.
On 2010/02/16 22:38:37, flox wrote: > Extend test coverage with #6232. Fix regression in E.findtext... I propose to bump the versions numbers for the release. Python 2.6 versions: * ElementTree 1.2.6 * cElementTree 1.0.6 Upstream snapshots: * ElementTree 1.3a3 - 2007-09-12 * cElementTree 1.0.6a2 - 2009-01-10 * cElementTree.CAPI 1.0 Versions proposed for 2.7: * ElementTree 1.3 * cElementTree 1.0.7 * cElementTree.CAPI 1.0
Sign in to reply to this message.
On 2010/02/17 09:12:42, flox wrote: > > Versions proposed for 2.7: > * ElementTree 1.3 > * cElementTree 1.0.7 > * cElementTree.CAPI 1.0 An alternative proposal: * ElementTree 1.3.0 (instead of 1.3) => keep 3 numbers * cElementTree 1.1.0 (instead of 1.0.7) => because there's a significative effort to bring consistency with the Python implementation (+extended slicing, +leak fixes) * cElementTree.CAPI 1.1 (instead of 1.0) => because the API is not 100% compatible with the alpha-API * capi->assert becomes capi->checktype * capi->type returns a borrowed reference * capi->getelement returns a borrowed reference => But we can stick to 1.0, if we consider that the previous API was experimental, not released.
Sign in to reply to this message.
Since you've effectively hijacked the library, and have created your own fork that's not fully compatible with any formal release of the upstream library, and am not contributing any patches back to upstream, I suggest renaming it instead. </F> On Wed, Feb 17, 2010 at 10:31 AM, <florent.xicluna@gmail.com> wrote: > On 2010/02/17 09:12:42, flox wrote: > >> Versions proposed for 2.7: >> * ElementTree 1.3 >> * cElementTree 1.0.7 >> * cElementTree.CAPI 1.0 > > An alternative proposal: > > * ElementTree 1.3.0 (instead of 1.3) > => keep 3 numbers > > * cElementTree 1.1.0 (instead of 1.0.7) > => because there's a significative effort to bring consistency > with the Python implementation > (+extended slicing, +leak fixes) > > * cElementTree.CAPI 1.1 (instead of 1.0) > => because the API is not 100% compatible with the alpha-API > * capi->assert becomes capi->checktype > * capi->type returns a borrowed reference > * capi->getelement returns a borrowed reference > => But we can stick to 1.0, if we consider that the previous API > was experimental, not released. > > > http://codereview.appspot.com/207048/show >
Sign in to reply to this message.
On 2010/02/17 09:40:12, fredrik_pythonware.com wrote: > Since you've effectively hijacked the library, and have created your > own fork that's not fully compatible with any formal release of the > upstream library, and am not contributing any patches back to > upstream, I suggest renaming it instead. > > </F> > Fredrik, thank you for this comment. But I do not consider this upstream merge as a fork. It is intended to fix the plethora of bugs which were reported in the issue tracker over the past 3 years. (#1143, #1777, #1538691, #3151, #3475, #6230, #6232, #6265, #6266, #6472, #6565) Did you see the test coverage, before stating that it is an incompatible fork? The test suite includes all upstream tests + many tests contributed by Neil for ElementTree 1.2.6 (issue #6232). And I've provided the diff with upstream snapshot, to show how close it is from the upstream library (as close as "xml.etree" 2.6 with ET 1.2.6/cET 1.0.5). See diffs on this page: http://codereview.appspot.com/207048/show If the concern is about cElementTree.CAPI, we can remove it. I would prefer to remove it from the patch, if there's any risk of incompatibility with the upstream library. For the version numbers, could you provide some hints, so we do not collide with the upcoming release of the upstream libraries? Btw, why the "xml.etree.cElementTree" in 2.6 is versioned 1.0.6, but the latest upstream snapshot has version "1.0.6a2"? -- Florent Xicluna
Sign in to reply to this message.
Fredrik, > Since you've effectively hijacked the library, and have created your > own fork that's not fully compatible with any formal release of the > upstream library, and am not contributing any patches back to > upstream, I suggest renaming it instead. The point here is to fix these bugs for all Python users. If the "patches" were contributed upstream, it would be a NO-OP for Python until upstream gets ported back to Python (which hasn't seemed to happen for years, has it?). If you want to propose another process then please do so. But the process you will be proposing has to have the final outcome of fixing these bugs *in Python* as well, not only for users of your own releases (or, AFAIU, SVN repository checkouts).
Sign in to reply to this message.
The problem is that you're merging in features from a version that has never been formally released -- I would have thought labeling something as "experimental" and "work in progress" and storing it in a repository named after a cheese (!) would be enough to make it clear that the design wasn't finalized, but apparently that was a bit optimistic (guess that's the downside of experimenting in a public repository :-). I don't mind you guys pulling bug fixes into Python 3 -- that's great -- the problem is when you start pulling in features, because that means that you're basically freezing the API based on something that was never intended to be final. Since ET is a portable API with multiple implementations, I'm not sure that's optimal. But again, work & travel means that I have no time for before mid-March or so. If you want to push forward with an 1.3 release before that, there's not much I can do about it (it's open source with a permissive license, after all). Otherwise, I'd recommend sticking mostly to the 1.2 API with compatible bug fixes and other tweaks required to make it work well under 3.X (i.e. going for "1.2.8" instead of "1.3.0"). Adding things like "extend" is non-controversial, things like the namespace-aware parsers etc less so. </F> On Wed, Feb 17, 2010 at 1:06 PM, <antoine.pitrou@gmail.com> wrote: > Fredrik, > >> Since you've effectively hijacked the library, and have created your >> own fork that's not fully compatible with any formal release of the >> upstream library, and am not contributing any patches back to >> upstream, I suggest renaming it instead. > > The point here is to fix these bugs for all Python users. If the > "patches" were contributed upstream, it would be a NO-OP for Python > until upstream gets ported back to Python (which hasn't seemed to happen > for years, has it?). If you want to propose another process then please > do so. > > But the process you will be proposing has to have the final outcome of > fixing these bugs *in Python* as well, not only for users of your own > releases (or, AFAIU, SVN repository checkouts). > > > http://codereview.appspot.com/207048/show >
Sign in to reply to this message.
Split out experimental C API.
Sign in to reply to this message.
On 2010/02/17 12:43:14, fredrik_pythonware.com wrote: > The problem is that you're merging in features from a version that has > never been formally released. > (...) the problem is when you start pulling in features, > because that means that you're basically freezing the API based on > something that was never intended to be final. Since ET is a portable > API with multiple implementations, I'm not sure that's optimal. Fredrik, I understand your point about the ElementTree API. I don't want to bring an experimental ElementTree API 1.3 in the stdlib. This patch does not aim at upgrading the API before it is released. So, do you agree with the following goals for "xml.etree" in Python 2.7 and Python 3.2? - stick to ElementTree API 1.2 (until 1.3 is formally released) - fix the outstanding bugs (listed on the Rietveld page) - align the C implementation with the Python implementation - grow the test suite with upstream tests (except some 1.3-only tests) If you are supportive of such release, and there's a real opportunity for the patch to be accepted, I can rework the patch to remove some experimental 1.3 features and prepare an 1.2.8 version. -- Florent
Sign in to reply to this message.
Rebase the patch on the Mercurial repository.
Sign in to reply to this message.
Finally managed to set aside enough time to review both the mercurial fork and this patch. Some minor comments inline; the only thing I would prefer to see changed before commit is the deprecation of getiterator in 1.3 (see notes). If I've missed anything else, let's deal with that in 1.4 :) And again, thanks for doing this work, and sorry for having so little time to spend on this at this time. </F> http://codereview.appspot.com/207048/diff/8001/9006 File Lib/xml/etree/ElementTree.py (right): http://codereview.appspot.com/207048/diff/8001/9006#newcode487 Lib/xml/etree/ElementTree.py:487: return list(self.iter(tag)) Since the new spelling hasn't been available before, I'd prefer to deprecate this in 1.4 (as noted in a comment the original 1.3 code). That is, version N=document as deprecated, N+1=warn that it will go away, N+2 or higher=remove. (btw, 'getiterator' is older than the iterator concept in Python, and was only defined to return something that you could iterate over. The 1.3 code uses list(iter) for strict API compatibility with the ElementTree 1.2 implementation (being overly cautious here, perhaps)). http://codereview.appspot.com/207048/diff/8001/9006#newcode1239 Lib/xml/etree/ElementTree.py:1239: raise ValueError("unknown event %r" % event) This will yield a ValueError instead of an ImportError for c14n if ElementC14N isn't present; the former doesn't really tell the user what's missing. It's no big deal, really, just a minor regression that might be worth revisiting later on. http://codereview.appspot.com/207048/diff/8001/9006#newcode1297 Lib/xml/etree/ElementTree.py:1297: for elem in tree.getiterator(): Use iter here? http://codereview.appspot.com/207048/diff/8001/9010 File Modules/_elementtree.c (right): http://codereview.appspot.com/207048/diff/8001/9010#newcode110 Modules/_elementtree.c:110: #define Py_RETURN_NONE return Py_INCREF(Py_None), Py_None I tend prefer using feature tests instead of version tests whenever possible, but no big deal.
Sign in to reply to this message.
See my answers below. I don't get the point about ValueError versus ImportError. I will commit the changes in the Mercurial repo, then update the patch here. http://codereview.appspot.com/207048/diff/8001/9006 File Lib/xml/etree/ElementTree.py (right): http://codereview.appspot.com/207048/diff/8001/9006#newcode487 Lib/xml/etree/ElementTree.py:487: return list(self.iter(tag)) On 2010/03/09 10:38:43, effbot wrote: > Since the new spelling hasn't been available before, > I'd prefer to deprecate this in 1.4 (as noted in a comment the original 1.3 > code). That is, version N=document as deprecated, N+1=warn that it will go > away, N+2 or higher=remove. > > (btw, 'getiterator' is older than the iterator concept in Python, and was only > defined to return something that you could iterate over. The 1.3 code uses > list(iter) for strict API compatibility with the ElementTree 1.2 implementation > (being overly cautious here, perhaps)). OK, I will rollback this deprecation warning. http://codereview.appspot.com/207048/diff/8001/9006#newcode1239 Lib/xml/etree/ElementTree.py:1239: raise ValueError("unknown event %r" % event) On 2010/03/09 10:38:43, effbot wrote: > This will yield a ValueError instead of an ImportError for c14n if ElementC14N > isn't present; the former doesn't really tell the user what's missing. It's no > big deal, really, just a minor regression that might be worth revisiting later > on. > Are you sure? This fix is for "iterparse" to be consistent with the C implementation. I don't see the link with c14n. http://codereview.appspot.com/207048/diff/8001/9006#newcode1297 Lib/xml/etree/ElementTree.py:1297: for elem in tree.getiterator(): On 2010/03/09 10:38:43, effbot wrote: > Use iter here? ok. http://codereview.appspot.com/207048/diff/8001/9010 File Modules/_elementtree.c (right): http://codereview.appspot.com/207048/diff/8001/9010#newcode110 Modules/_elementtree.c:110: #define Py_RETURN_NONE return Py_INCREF(Py_None), Py_None On 2010/03/09 10:38:43, effbot wrote: > I tend prefer using feature tests instead of version tests > whenever possible, but no big deal. Ok, I can roll it back. I changed it to make it easier the day you drop the compatibility with 2.4.
Sign in to reply to this message.
http://codereview.appspot.com/207048/diff/8001/9010 File Modules/_elementtree.c (left): http://codereview.appspot.com/207048/diff/8001/9010#oldcode2699 Modules/_elementtree.c:2699: "def getiterator(node, tag=None):\n" /* helper */ Note: the C implementation of ElementTree was already returning an iterator for ElementTree.getiterator. Because of this, I would suggest to keep "ET.getiterator == ET.iter" for both Python and C implementations (Until we deprecate it). It's not optimal to keep the inconsistency between Python and C here. Could I make the "getiterator()" warning a PendingDeprecationWarning? It will give some information for the careful developer.
Sign in to reply to this message.
Some followup comments. http://codereview.appspot.com/207048/diff/8001/9006 File Lib/xml/etree/ElementTree.py (right): http://codereview.appspot.com/207048/diff/8001/9006#newcode790 Lib/xml/etree/ElementTree.py:790: raise ValueError("unknown method %r" % method) Reposting to right place: This will yield a ValueError instead of an ImportError for c14n if ElementC14N isn't present; the former doesn't really tell the user what's missing. It's no big deal, really, just a minor regression that might be worth revisiting later on. (An ImportError mentioning a module name gives a stronger hint to the user than a ValueError.) http://codereview.appspot.com/207048/diff/8001/9006#newcode1239 Lib/xml/etree/ElementTree.py:1239: raise ValueError("unknown event %r" % event) Oops. The comment was supposed to be at line 790, not here. That's what you get for copying your notes from a separate document and not paying attention (that, or Rieveld played tricks on me :). http://codereview.appspot.com/207048/diff/8001/9006#newcode1642 Lib/xml/etree/ElementTree.py:1642: from ElementC14N import _serialize_c14n Maybe the import should be done from "elementtree.ElementC14N" so you can use this even if you import xml.etree but have the stand-alone version installed? http://codereview.appspot.com/207048/diff/8001/9010 File Modules/_elementtree.c (left): http://codereview.appspot.com/207048/diff/8001/9010#oldcode2699 Modules/_elementtree.c:2699: "def getiterator(node, tag=None):\n" /* helper */ That inconsistency is old, and I was more concerned with backwards compatibility for existing code than for people who are explicitly migrating (this is the whole rationale for having to import cElementTree explicitly, instead of mapping ElementTree to it if it's there), but ok, let's make them equal and see what happens. A bit mixed about PendingDeprecationWarning; it's not a bad idea in itself, but are people using -Wall enough to motivate the extra overhead for every call in existing code? Also, it's not in 2.2 iirc, so you'll have to add extra logic for that.
Sign in to reply to this message.
Comments about E.getiterator() and support of C14N serializer. I will push some changes to Mercurial tonight. http://codereview.appspot.com/207048/diff/8001/9006 File Lib/xml/etree/ElementTree.py (right): http://codereview.appspot.com/207048/diff/8001/9006#newcode1642 Lib/xml/etree/ElementTree.py:1642: from ElementC14N import _serialize_c14n On 2010/03/10 14:37:25, effbot wrote: > Maybe the import should be done from "elementtree.ElementC14N" so you can use > this even if you import xml.etree but have the stand-alone version installed? I'm concerned about the risk of incompatiblity if "xml.etree" and "elementtree" are mixed. There will be 2 versions of ElementTree module imported: xml.etree.ElementTree and elementtree.ElementTree I preserved the code in the "xml.etree" version to lower the differences with the upstream "elementtree". But there's no plan to add ElementC14N in Python (afaiu). And I don't plan to document C14N, since it is not part of "xml.etree". We can decide to change it later, (and change ValueError --> ImportError). http://codereview.appspot.com/207048/diff/8001/9010 File Modules/_elementtree.c (left): http://codereview.appspot.com/207048/diff/8001/9010#oldcode2699 Modules/_elementtree.c:2699: "def getiterator(node, tag=None):\n" /* helper */ On 2010/03/10 14:37:25, effbot wrote: > A bit mixed about PendingDeprecationWarning; it's not a bad idea in itself, but > are people using -Wall enough to motivate the extra overhead for every call in > existing code? Also, it's not in 2.2 iirc, so you'll have to add extra logic > for that. Python 2.2 is no longer supported, is it? The PendingDeprecationWarning will help people migrate their software, when they will move to 2.7. For the inconsistency, between C getiterator and Py getiterator, now I think we can preserve (and document) the inconsistency. Since it is pending deprecation, it is not a big deal. If the user wants a predictable behavior, he should use E.iter() and list(E.iter()).
Sign in to reply to this message.
http://codereview.appspot.com/207048/diff/8001/9006 File Lib/xml/etree/ElementTree.py (right): http://codereview.appspot.com/207048/diff/8001/9006#newcode1642 Lib/xml/etree/ElementTree.py:1642: from ElementC14N import _serialize_c14n It's a bad idea in general, absolutely, but they're in distinct namespaces and the save code should be pretty well isolated (it's designed to work for mixtures of ElementTree and cElementTree, after all). But as I said in my original (misplaced) comment re C14N, we can revisit this later; we don't have to solve it for 1.3. http://codereview.appspot.com/207048/diff/8001/9010 File Modules/_elementtree.c (left): http://codereview.appspot.com/207048/diff/8001/9010#oldcode2699 Modules/_elementtree.c:2699: "def getiterator(node, tag=None):\n" /* helper */ My original 1.3 release notes said 2.2 and newer (1.2 supports 1.5.2 and newer). Old Python versions are a lot more common out there than the core developers tend to think, so I'd rather not break compatibility with a given old version for minor issues. Or have you made any other changes that break 2.2 compatibility? The cElementTree changes are all about pre-2.2 support, right?
Sign in to reply to this message.
http://codereview.appspot.com/207048/diff/8001/9010 File Modules/_elementtree.c (left): http://codereview.appspot.com/207048/diff/8001/9010#oldcode2699 Modules/_elementtree.c:2699: "def getiterator(node, tag=None):\n" /* helper */ On 2010/03/10 15:46:16, effbot wrote: > My original 1.3 release notes said 2.2 and newer (1.2 supports 1.5.2 and newer). In the header of ElementTree.py, I did not change it: # light-weight XML support for Python 2.3 and later. So I fixed ElementC14N and cElementTree to preserve 2.3 compatibility, and I installed 2.3 to test them. But I don't see any reason to preserve compatibility for 2.2 in the C implementation only. There's no generators in 2.2 (except from __future__). And there's no guarantee that other things work, because many features depend on ElementTree.py.
Sign in to reply to this message.
(this got stuck in review mode; trying again) http://codereview.appspot.com/207048/diff/8001/9010 File Modules/_elementtree.c (left): http://codereview.appspot.com/207048/diff/8001/9010#oldcode2699 Modules/_elementtree.c:2699: "def getiterator(node, tag=None):\n" /* helper */ Ok, let's go for 2.3. The README at hg.effbot.org says 2.3, and there also an entry in CHANGES about 2.3 as a baseline, so it should be ok. I'll update the effbot.org document I looked at before.
Sign in to reply to this message.
Ready for merge in trunk?
Sign in to reply to this message.
I sampled the patch deltas for the modules we've discussed lately, and assuming you haven't done any radical changes anywhere else (I sampled a few more patch deltas, and they were all empty), go ahead and check it in! (is there a way in rietveld to see all changes from one version of the patch to another on one page, without having to click on all the tiny sevens one at a time?) Cheers /F On Thu, Mar 11, 2010 at 3:16 PM, <florent.xicluna@gmail.com> wrote: > Ready for merge in trunk? > > http://codereview.appspot.com/207048/show >
Sign in to reply to this message.
About serializer encoding
Sign in to reply to this message.
Thank you for reviewing. It is merged in trunk (r78838) and 3.x (r78942). There's a different issue opened about serializer encoding: http://codereview.appspot.com/664043 http://bugs.python.org/issue8047
Sign in to reply to this message.
|