Rietveld Code Review Tool
Help | Bug tracker | Discussion group | Source code | Sign in
(101)

Issue 1694050: Add Unladden Swallow's optimizations to Python 3's pickle.

Can't Edit
Can't Publish+Mail
Start Review
Created:
14 years, 5 months ago by Alexandre Vassalotti
Modified:
14 years, 5 months ago
Reviewers:
Antoine Pitrou
Visibility:
Public.

Description

Report on Linux ion 2.6.32-24-generic #38-Ubuntu SMP Mon Jul 5 09:20:59 UTC 2010 x86_64 Total CPU cores: 2 pickle: Min: 2.233613 -> 0.954951: 133.90% faster Avg: 2.243726 -> 0.958405: 134.11% faster Significant (t=776.183816, a=0.95) Stddev: 0.01079 -> 0.00456: 136.54% smaller pickle_dict: Min: 0.993324 -> 0.568035: 74.87% faster Avg: 0.998166 -> 0.569487: 75.27% faster Significant (t=615.184784, a=0.95) Stddev: 0.00485 -> 0.00087: 458.50% smaller pickle_list: Min: 0.896554 -> 0.335743: 167.04% faster Avg: 0.903450 -> 0.342434: 163.83% faster Significant (t=531.142212, a=0.95) Stddev: 0.00725 -> 0.00181: 301.34% smaller unpickle: Min: 3.950370 -> 0.938152: 321.08% faster Avg: 3.965907 -> 0.947308: 318.65% faster Significant (t=1125.467516, a=0.95) Stddev: 0.01770 -> 0.00680: 160.32% smaller unpickle_list: Min: 2.920273 -> 0.529199: 451.83% faster Avg: 2.937320 -> 0.534756: 449.28% faster Significant (t=1258.585781, a=0.95) Stddev: 0.01228 -> 0.00561: 118.77% smaller

Patch Set 1 : Clean up patch for review. #

Total comments: 29

Patch Set 2 : Bug fixes and buffering improvements #

Patch Set 3 : Fix memory leak. #

Unified diffs Side-by-side diffs Delta from patch set Stats (+1813 lines, -536 lines) Patch
M Lib/pickle.py View 1 chunk +0 lines, -23 lines 0 comments Download
M Modules/_pickle.c View 1 2 132 chunks +1813 lines, -513 lines 0 comments Download

Messages

Total messages: 2
Antoine Pitrou
http://codereview.appspot.com/1694050/diff/2001/3001 File Modules/_pickle.c (right): http://codereview.appspot.com/1694050/diff/2001/3001#newcode210 Modules/_pickle.c:210: new_allocated = (allocated >> 3) + 6; Is there ...
14 years, 5 months ago (2010-07-29 10:07:44 UTC) #1
Alexandre Vassalotti
14 years, 5 months ago (2010-08-02 00:50:21 UTC) #2
Thanks Antoine for the review!

I have made many improvements to the patch, but there are still many things that
remains to be done.

http://codereview.appspot.com/1694050/diff/2001/3001
File Modules/_pickle.c (right):

http://codereview.appspot.com/1694050/diff/2001/3001#newcode210
Modules/_pickle.c:210: new_allocated = (allocated >> 3) + 6;
The goal of these changes were to make the overflow checks simpler. Anyway,
increasing allocation rate doesn't show any significant performance
improvements.

http://codereview.appspot.com/1694050/diff/2001/3001#newcode323
Modules/_pickle.c:323: char *output_buffer;
Done.

http://codereview.appspot.com/1694050/diff/2001/3001#newcode347
Modules/_pickle.c:347: PyObject **memo;
You cannot resize a list object arbitrarily after you create it. This means,
with a list, you would need to create a new list object every time you want to
resize it.

However, the way this memo is resized is prone to DoS exploit. For example,
loading this pickle b'\x80\x02]r\xff\xff\xff\x0f.' will cause a huge memory
allocation. This is ugly and unacceptable.

http://codereview.appspot.com/1694050/diff/2001/3001#newcode355
Modules/_pickle.c:355: PyObject *py_input;
On 2010/07/29 10:07:44, Antoine Pitrou wrote:
> You could acquire a Py_buffer to the object instead, this would solve the
> refcount problem as well as your XXX later on.
> 

Done.

http://codereview.appspot.com/1694050/diff/2001/3001#newcode355
Modules/_pickle.c:355: PyObject *py_input;
On 2010/07/29 10:07:44, Antoine Pitrou wrote:
> You could acquire a Py_buffer to the object instead, this would solve the
> refcount problem as well as your XXX later on.
> 

Done.

http://codereview.appspot.com/1694050/diff/2001/3001#newcode644
Modules/_pickle.c:644: of 'func'. */
I know. I am the author of these comments. :-)

http://codereview.appspot.com/1694050/diff/2001/3001#newcode646
Modules/_pickle.c:646: _Pickler_FastCall(PicklerObject *self, PyObject *func,
PyObject *arg)
It's just to make the internal API more consistent.

http://codereview.appspot.com/1694050/diff/2001/3001#newcode661
Modules/_pickle.c:661: memset(self->output_buffer, 0, sizeof(char) *
self->output_len);
On 2010/07/29 10:07:44, Antoine Pitrou wrote:
> Why is memset() needed at all?

Done.

http://codereview.appspot.com/1694050/diff/2001/3001#newcode680
Modules/_pickle.c:680: output = _Pickler_GetString(self);
I fixed this by using a bytes object as the buffer. However, I am not too happy
with the current API. I am planning to fix  the API along with the unbounded
buffer issue.

http://codereview.appspot.com/1694050/diff/2001/3001#newcode730
Modules/_pickle.c:730: return NULL;
On 2010/07/29 10:07:44, Antoine Pitrou wrote:
> If you return NULL, you have to Py_DECREF(self) first.
> 

Done.

http://codereview.appspot.com/1694050/diff/2001/3001#newcode738
Modules/_pickle.c:738: return NULL;
On 2010/07/29 10:07:44, Antoine Pitrou wrote:
> Same as above.

Done.

http://codereview.appspot.com/1694050/diff/2001/3001#newcode830
Modules/_pickle.c:830: Py_INCREF(input);
On 2010/07/29 10:07:44, Antoine Pitrou wrote:
> Usually you Py_INCREF the new object before Py_DECREFing the old one (because
of
> possible side effects, although it's rather unlikely here).
> 

Done.

http://codereview.appspot.com/1694050/diff/2001/3001#newcode869
Modules/_pickle.c:869: data = _Unpickler_FastCall(self, self->read, len);
_Unpickler_FastCall() steals the reference to len here. So, we should not
Py_DECREF(len).

Yeah, I know. It's ugly.

http://codereview.appspot.com/1694050/diff/2001/3001#newcode1048
Modules/_pickle.c:1048: return NULL;
On 2010/07/29 10:07:44, Antoine Pitrou wrote:
> Again, you must Py_DECREF(self) here.

Done.

http://codereview.appspot.com/1694050/diff/2001/3001#newcode1053
Modules/_pickle.c:1053: return NULL;
On 2010/07/29 10:07:44, Antoine Pitrou wrote:
> Here as well.
> 

Done.
Sign in to reply to this message.

Powered by Google App Engine
RSS Feeds Recent Issues | This issue
This is Rietveld f62528b