Issue 196062: C extension for SQLAlchemy

Can't Edit
Can't Publish+Mail
Start Review

Created:
16 years, 3 months ago by gdementen

Modified:
16 years, 2 months ago

Reviewers:
Antoine Pitrou

Base URL:
http://svn.sqlalchemy.org/sqlalchemy/trunk/lib/sqlalchemy

Visibility:
Public.

Patch Set 2 : v2 the patch, with all of Antoine's comments taken into account and quite a few other fixes... #

Created: 16 years, 2 months ago

Download [raw] [tar.bz2]

Unified diffs	Side-by-side diffs	Stats (+1438 lines, -407 lines)			Patch
lib/sqlalchemy/cextensions.c	View	1 chunk	+933 lines, -0 lines	0 comments	Download
lib/sqlalchemy/dialects/access/base.py	View	4 chunks	+8 lines, -37 lines	0 comments	Download
lib/sqlalchemy/dialects/informix/base.py	View	1 chunk	+1 line, -1 line	0 comments	Download
lib/sqlalchemy/dialects/maxdb/base.py	View	6 chunks	+69 lines, -53 lines	0 comments	Download
lib/sqlalchemy/dialects/mssql/base.py	View	2 chunks	+3 lines, -14 lines	0 comments	Download
lib/sqlalchemy/dialects/mysql/base.py	View	6 chunks	+10 lines, -4 lines	0 comments	Download
lib/sqlalchemy/dialects/mysql/mysqldb.py	View	2 chunks	+2 lines, -6 lines	0 comments	Download
lib/sqlalchemy/dialects/mysql/oursql.py	View	1 chunk	+2 lines, -6 lines	0 comments	Download
lib/sqlalchemy/dialects/oracle/zxjdbc.py	View	1 chunk	+3 lines, -0 lines	0 comments	Download
lib/sqlalchemy/dialects/postgresql/pg8000.py	View	2 chunks	+6 lines, -19 lines	0 comments	Download
lib/sqlalchemy/dialects/postgresql/psycopg2.py	View	3 chunks	+6 lines, -13 lines	0 comments	Download
lib/sqlalchemy/dialects/postgresql/pypostgresql.py	View	2 chunks	+2 lines, -6 lines	0 comments	Download
lib/sqlalchemy/dialects/sqlite/base.py	View	7 chunks	+22 lines, -26 lines	0 comments	Download
lib/sqlalchemy/dialects/sybase/base.py	View	1 chunk	+1 line, -18 lines	0 comments	Download
lib/sqlalchemy/engine/base.py	View	14 chunks	+195 lines, -147 lines	0 comments	Download
lib/sqlalchemy/processors.py	View	1 chunk	+105 lines, -0 lines	0 comments	Download
lib/sqlalchemy/types.py	View	7 chunks	+57 lines, -49 lines	0 comments	Download
setup.py	View	2 chunks	+7 lines, -2 lines	0 comments	Download
test/sql/test_query.py	View	1 chunk	+6 lines, -6 lines	0 comments	Download

Messages

Total messages: 6

Expand All Messages | Collapse All Messages

Antoine Pitrou

Hello, These are a bunch of comments. I hope they'll help you! Antoine. http://codereview.appspot.com/196062/diff/1/2 File ...

16 years, 3 months ago (2010-01-29 22:28:50 UTC) #2

Hello,

These are a bunch of comments. I hope they'll help you!

Antoine.

http://codereview.appspot.com/196062/diff/1/2
File cextensions.c (right):

http://codereview.appspot.com/196062/diff/1/2#newcode3
cextensions.c:3: #include "structmember.h" // required for PyMemberDef
Just a nit, but you usually shouldn't use C++ style comments. They won't be
supported by all compilers.

http://codereview.appspot.com/196062/diff/1/2#newcode7
cextensions.c:7: - __init__ VS __new__
Choose one of them :)
Note that __new__ is slightly less dangerous when allocating your own resources,
because it can be called only once. __init__ can be called repeatedly on the
same object and you have to be careful about that.
Also, when __init__ fails, the object still exists and its methods or members
can be called. Therefore you have to check that they aren't NULL in all your
methods.

http://codereview.appspot.com/196062/diff/1/2#newcode36
cextensions.c:36: l = PyInt_AsLong(arg);
Lacks a type check. Also, perhaps you should accept longs rather than simply
ints. Therefore I would use something like:

PyObject *zero = PyInt_FromLong(0);
if (zero == NULL)
    return NULL;
return PyObject_RichCompare(arg, zero, Py_NE);

(note: you can cache "zero" in a static PyObject if you want)

http://codereview.appspot.com/196062/diff/1/2#newcode39
cextensions.c:39: * the intended value or an error, but here -1 *is* an error.
This sounds pretty unreasonable. Anybody could call your method with "-1" as
argument and produce nasty effects.

http://codereview.appspot.com/196062/diff/1/2#newcode151
cextensions.c:151: PyDictObject *colfuncs;
Note: in CPython we always declare such members as "PyObject *", even if you
know what the type will be. But I suspect that it's more of a style thing.

http://codereview.appspot.com/196062/diff/1/2#newcode180
cextensions.c:180: };
You should use READONLY for all your fields, because otherwise someone can
modify them to an instance of the wrong type (e.g. "_row" to a non-tuple), and
given how your code is written it will then crash the interpreter.

http://codereview.appspot.com/196062/diff/1/2#newcode187
cextensions.c:187: printf("too few args!\n");
Or too many? :)
By the way, you must obviously set a proper error state here, rather than doing
a printf (using e.g. PyErr_SetString).

http://codereview.appspot.com/196062/diff/1/2#newcode197
cextensions.c:197: //it seems to be faster, but there might be some drawback to
this approach
No obvious drawback. PyArg_UnpackTuple is just easier to use for non-trivial
function call signatures.

http://codereview.appspot.com/196062/diff/1/2#newcode198
cextensions.c:198: if (parent) {
Why the "if"? A completely constructed tuple should never have any NULL members.

http://codereview.appspot.com/196062/diff/1/2#newcode200
cextensions.c:200: self->parent = parent;
As I explained earlier, if this is the __init__ you should do
Py_CLEAR(self->parent) before.

http://codereview.appspot.com/196062/diff/1/2#newcode205
cextensions.c:205: return -1;
Missing an exception too (for example a call to PyErr_SetString() with
PyExc_TypeError).
Ditto for the other, following args.

http://codereview.appspot.com/196062/diff/1/2#newcode248
cextensions.c:248: rowlen = Py_SIZE(values);
PyTuple_GET_SIZE() would be better, although the two are exactly equivalent
nowadays.
Oh, and looking at what the function does, you should also check that values and
processors have the same length, and otherwise raise e.g. a RuntimeError (rather
than crash the interpreter).

http://codereview.appspot.com/196062/diff/1/2#newcode250
cextensions.c:250: result = PyTuple_New(rowlen);
You must check that the return value is not NULL. Memory allocation can always
fail.
(same for the PyListObject below).

http://codereview.appspot.com/196062/diff/1/2#newcode251
cextensions.c:251: resultptr = ((PyTupleObject *)result)->ob_item;
Yikes! Use PySequence_Fast_ITEMS instead.
(same for the PyListObject below).

http://codereview.appspot.com/196062/diff/1/2#newcode258
cextensions.c:258: funcptr = processors->ob_item;
Yikes again.

http://codereview.appspot.com/196062/diff/1/2#newcode294
cextensions.c:294: result = PyObject_GetIter(values);
You should check that values isn't NULL before this.

http://codereview.appspot.com/196062/diff/1/2#newcode316
cextensions.c:316: index = PyLong_AsLong(key);
I think you want PyNumber_AsSsize_t() instead.
(and, yes, your index should be a Py_ssize_t).

http://codereview.appspot.com/196062/diff/1/2#newcode322
cextensions.c:322: 1);
You should first check that values and processors are not NULL.
Oh, and you must also Py_DECREF() them after the call to
BaseRowProxy_processvalues.

http://codereview.appspot.com/196062/diff/1/2#newcode331
cextensions.c:331: key, NULL);
You must Py_DECREF(funcname) now.

http://codereview.appspot.com/196062/diff/1/2#newcode335
cextensions.c:335: indexobject = PyTuple_GET_ITEM(record, 1);
You must check that it is a tuple first... Or you can use PySequence_GetItem()
instead if you want (but check for NULL on return).

http://codereview.appspot.com/196062/diff/1/2#newcode336
cextensions.c:336: index = PyInt_AsLong(indexobject);
PyNumber_AsSsize_t() would do the required type check (but you must check for
NULL on return).

http://codereview.appspot.com/196062/diff/1/2#newcode389
cextensions.c:389: static PyMappingMethods BaseRowProxy_as_mapping = {
I don't think you need both PySequenceMethods and PyMappingMethods. Choose
whatever suits your type.

http://codereview.appspot.com/196062/diff/1/2#newcode416
cextensions.c:416: Py_TPFLAGS_DEFAULT | Py_TPFLAGS_BASETYPE,        /* tp_flags
*/
Your object points to other, arbitrary PyObjects and as such it should be
GC-enabled. 
You should add Py_TPFLAGS_HAVE_GC here, and implement tp_clear and tp_traverse.

http://codereview.appspot.com/196062/diff/1/2#newcode458
cextensions.c:458: }
Er, this is terribly wrong. The "encoding" pointer will only be valid as long as
the original argument (the corresponding PyObject) is still alive.
You could store the PyObject instead, and use PyString_AS_STRING when necessary.
(you can use the "S" typecode in PyArg_ParseTupleAndKeywords, AFAICT)

http://codereview.appspot.com/196062/diff/1/2#newcode472
cextensions.c:472: return PyUnicode_Decode(str, strlen(str), self->encoding,
self->errors);
Never use strlen(). The string may e.g. contain "\0" characters in its middle.
Use PyString_Size or PyString_GET_SIZE instead.

http://codereview.appspot.com/196062/diff/1/2#newcode558
cextensions.c:558: }
I think this is backwards. Use PyFloat_Check or PyFloat_CheckExact, and pass the
argument as is if false.

By the way, I think I have seen MySQL return longs even for small integers. And
what would happen if your database contains an integer that is too large to be
kept as a PyIntObject? It would be kept as a PyLongObject instead.

I also think converting the float to a string is wrong. If the DBAPI doesn't
return a string in the first place, the decimal number can already have lost
some precision in the process. What does the Python version of the code do?

http://codereview.appspot.com/196062/diff/1/2#newcode588
cextensions.c:588: Py_TPFLAGS_DEFAULT | Py_TPFLAGS_BASETYPE,        /*tp_flags*/
If the "type" member can be any arbitrary object, DecimalResultProcessorType
should probably be GC-enabled. Problems don't seem likely for normal uses
though.

gdementen

On Fri, Jan 29, 2010 at 23:28, <antoine.pitrou@gmail.com> wrote: > These are a bunch of ...

16 years, 3 months ago (2010-01-30 10:14:46 UTC) #3

gdementen

http://codereview.appspot.com/196062/diff/1/2 File cextensions.c (right): http://codereview.appspot.com/196062/diff/1/2#newcode36 cextensions.c:36: l = PyInt_AsLong(arg); On 2010/01/29 22:28:50, Antoine Pitrou wrote: ...

16 years, 3 months ago (2010-01-30 15:23:27 UTC) #4

http://codereview.appspot.com/196062/diff/1/2
File cextensions.c (right):

http://codereview.appspot.com/196062/diff/1/2#newcode36
cextensions.c:36: l = PyInt_AsLong(arg);
On 2010/01/29 22:28:50, Antoine Pitrou wrote:
> Lacks a type check. Also, perhaps you should accept longs rather than simply
> ints. Therefore I would use something like:
> 
> PyObject *zero = PyInt_FromLong(0);
> if (zero == NULL)
>     return NULL;
> return PyObject_RichCompare(arg, zero, Py_NE);
> 
> (note: you can cache "zero" in a static PyObject if you want)
> 

Yes, the point is to accept both int and longs, but AFAICT, PyInt_AsLong works
for longs too (converting if necessary).

Also the contract here is clearly: None -> None, 0 -> False, 1 -> True, anything
else is undefined. Yes, it would be nicer to raise an explicit exception but it
is not too critical as long as it does not segfault or does anything as ugly. 

If we find out that some DBAPI return something else, I'd rather use a different
function than produce a generic and slower one. The RichCompare method is slower
than the simple comparison below, right?

http://codereview.appspot.com/196062/diff/1/2#newcode39
cextensions.c:39: * the intended value or an error, but here -1 *is* an error.
On 2010/01/29 22:28:50, Antoine Pitrou wrote:
> This sounds pretty unreasonable. Anybody could call your method with "-1" as
> argument and produce nasty effects.

Well, this is outside the spec of the function, so no it's not unreasonable to
fail in that case. I realize now that no exception is set in that case and that
should be fixed indeed.

http://codereview.appspot.com/196062/diff/1/2#newcode180
cextensions.c:180: };
On 2010/01/29 22:28:50, Antoine Pitrou wrote:
> You should use READONLY for all your fields, because otherwise someone can
> modify them to an instance of the wrong type (e.g. "_row" to a non-tuple), and
> given how your code is written it will then crash the interpreter.
> 
Well... _row should really support other sequences too. As for having them
readonly, this is not an option: at least "__setstate__" and __init__ of some
(python) subclasses modify those members. It would probably be safer to allow
any kind of sequence for _processors, and any abstract mapping for _colfuncs but
well, that can wait as only "dialect" code can potentially use this code so this
can be set in the spec. On the other hand, I guess I'll need to add a check for
the correct type in all methods so that it fails gracefully instead of a random
crash. sigh...

http://codereview.appspot.com/196062/diff/1/2#newcode187
cextensions.c:187: printf("too few args!\n");
On 2010/01/29 22:28:50, Antoine Pitrou wrote:
> Or too many? :)
> By the way, you must obviously set a proper error state here, rather than
doing
> a printf (using e.g. PyErr_SetString).
Of course. But this was just my debugging aid. With my latest modification to
the Python part, pickling works for single instances of RowProxy, but when I
pickle a list of instances, the init function gets called with a single
argument. Any idea why this happens?

http://codereview.appspot.com/196062/diff/1/2#newcode198
cextensions.c:198: if (parent) {
On 2010/01/29 22:28:50, Antoine Pitrou wrote:
> Why the "if"? A completely constructed tuple should never have any NULL
members.
> 

I just copied the pattern from the tutorial:
http://docs.python.org/extending/newtypes.html#adding-data-and-methods-to-the...

http://codereview.appspot.com/196062/diff/1/2#newcode251
cextensions.c:251: resultptr = ((PyTupleObject *)result)->ob_item;
On 2010/01/29 22:28:50, Antoine Pitrou wrote:
> Yikes! Use PySequence_Fast_ITEMS instead.
> (same for the PyListObject below).
> 

Well, the macro does essentially the same thing, just adding a check that is not
needed since the sequence is created on the instruction just before so we know
for sure its exact type. I used it anyway since it makes the code a bit more
compact but still not "yikes"-worth IMO ;-).

http://codereview.appspot.com/196062/diff/1/2#newcode389
cextensions.c:389: static PyMappingMethods BaseRowProxy_as_mapping = {
On 2010/01/29 22:28:50, Antoine Pitrou wrote:
> I don't think you need both PySequenceMethods and PyMappingMethods. Choose
> whatever suits your type.

Well, that's what I thought too, but then when I tested my code, the length
method was very slow until I defined it as a sequence too. Go figure...

http://codereview.appspot.com/196062/diff/1/2#newcode416
cextensions.c:416: Py_TPFLAGS_DEFAULT | Py_TPFLAGS_BASETYPE,        /* tp_flags
*/
On 2010/01/29 22:28:50, Antoine Pitrou wrote:
> Your object points to other, arbitrary PyObjects and as such it should be
> GC-enabled. 
> You should add Py_TPFLAGS_HAVE_GC here, and implement tp_clear and
tp_traverse.
> 

Argl... I thought I could avoid that as there shouldn't be any cycle in any sane
use. And I have no idea how to write those methods.

http://codereview.appspot.com/196062/diff/1/2#newcode558
cextensions.c:558: }
On 2010/01/29 22:28:50, Antoine Pitrou wrote:
> I think this is backwards. Use PyFloat_Check or PyFloat_CheckExact, and pass
the
> argument as is if false.
> 
> By the way, I think I have seen MySQL return longs even for small integers.
And
> what would happen if your database contains an integer that is too large to be
> kept as a PyIntObject? It would be kept as a PyLongObject instead.
> 
> I also think converting the float to a string is wrong. If the DBAPI doesn't
> return a string in the first place, the decimal number can already have lost
> some precision in the process. What does the Python version of the code do?
> 

Well, this is coherent with the Python version. I agree with you, but I lost the
argument at:
http://www.sqlalchemy.org/trac/ticket/1625

http://codereview.appspot.com/196062/diff/1/2#newcode588
cextensions.c:588: Py_TPFLAGS_DEFAULT | Py_TPFLAGS_BASETYPE,        /*tp_flags*/
On 2010/01/29 22:28:50, Antoine Pitrou wrote:
> If the "type" member can be any arbitrary object, DecimalResultProcessorType
> should probably be GC-enabled. Problems don't seem likely for normal uses
> though.

Well, it really should be either decimal.Decimal or fastdec.mpd, or any other
alternative decimal implementation.

Antoine Pitrou

16 years, 3 months ago (2010-01-30 16:30:30 UTC) #5

http://codereview.appspot.com/196062/diff/1/2
File cextensions.c (right):

http://codereview.appspot.com/196062/diff/1/2#newcode36
cextensions.c:36: l = PyInt_AsLong(arg);
On 2010/01/30 15:23:27, gdementen wrote:
> 
> If we find out that some DBAPI return something else, I'd rather use a
different
> function than produce a generic and slower one. The RichCompare method is
slower
> than the simple comparison below, right?

Certainly yes.

http://codereview.appspot.com/196062/diff/1/2#newcode180
cextensions.c:180: };
> On the other hand, I guess I'll need to add a check for
> the correct type in all methods so that it fails gracefully instead of a
random
> crash. sigh...

Yes, you need some type checks in your methods. Or, more appropriately, you can
use getters and setters (see tp_getset) so as to raise a TypeError as soon as
the user attempts to set a value of a wrong type.

http://codereview.appspot.com/196062/diff/1/2#newcode187
cextensions.c:187: printf("too few args!\n");
On 2010/01/30 15:23:27, gdementen wrote:
> Of course. But this was just my debugging aid. With my latest modification to
> the Python part, pickling works for single instances of RowProxy, but when I
> pickle a list of instances, the init function gets called with a single
> argument. Any idea why this happens?

No, sorry. I don't know pickle very well.
You could try to implement a pickling method, though, such as __reduce__ or
__setstate__.

http://codereview.appspot.com/196062/diff/1/2#newcode198
cextensions.c:198: if (parent) {
> I just copied the pattern from the tutorial:
>
http://docs.python.org/extending/newtypes.html#adding-data-and-methods-to-the...

Well, the tutorial calls PyArg_ParseTupleAndKeywords() and specifies that some
arguments are optional, therefore their pointers can be NULL because they
/aren't/ in the tuple :) The tuple itself can't contain any NULLs.

http://codereview.appspot.com/196062/diff/1/2#newcode251
cextensions.c:251: resultptr = ((PyTupleObject *)result)->ob_item;
> Well, the macro does essentially the same thing, just adding a check that is
not
> needed since the sequence is created on the instruction just before so we know
> for sure its exact type. I used it anyway since it makes the code a bit more
> compact but still not "yikes"-worth IMO ;-).

The reason there is a macro is that we can change the internal implementation of
lists if we want, without breaking your code :) (since we would then change the
expansion of the macro)

If you bypass the macro and access the internal attributes directly, you are
depending on an implementation detail which might change one day (although it
hasn't changed a lot).

http://codereview.appspot.com/196062/diff/1/2#newcode416
cextensions.c:416: Py_TPFLAGS_DEFAULT | Py_TPFLAGS_BASETYPE,        /* tp_flags
*/
> 
> Argl... I thought I could avoid that as there shouldn't be any cycle in any
sane
> use. And I have no idea how to write those methods.

Well you can avoid that if you are sure there won't be any cycle "in sane use".
Beware that any "insane" use might produce reference leaks though ;)

gdementen

16 years, 2 months ago (2010-02-08 20:44:10 UTC) #6

v2 the patch, with all of Antoine's comments taken into account and quite a few
other fixes...

Expand All Messages | Collapse All Messages