Description

This patch is intended to fix http://bugs.python.org/issue22448. Option b) was implemented.
The criteria indicating that scheduled-event filtering (to remove cancelled events) should be performed are:
1) There are at least 100 events scheduled. This prevents needless reheapifying when there are only a handful of events.
2) More than 50% of the scheduled events are cancelled. Using a percentage ensures that a list of 10,000 events isn't reheaped over some small fixed number of cancelled events. Beyond that, 50% was chosen simply because it seemed a reasonable threshold.
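The two-part condition described above can be sketched in isolation. This is a minimal sketch, not the committed code: the constant names mirror ones proposed later in this review, and the handle class in the test below is a stand-in for asyncio's TimerHandle.

```python
import heapq

# Thresholds as discussed: skip cleanup for small heaps, and only
# rebuild once cancelled entries dominate the scheduled list.
_MIN_SCHEDULED_TIMER_HANDLES = 100
_MIN_CANCELLED_TIMER_HANDLES_FRACTION = 0.5

def maybe_cleanup(scheduled, cancelled_count):
    """Rebuild the heap without cancelled handles when both thresholds hit.

    `scheduled` is a heapified list of objects with a `_cancelled`
    attribute. Returns (new_scheduled, new_cancelled_count).
    """
    sched_count = len(scheduled)
    if (sched_count > _MIN_SCHEDULED_TIMER_HANDLES and
            cancelled_count / sched_count >
                _MIN_CANCELLED_TIMER_HANDLES_FRACTION):
        # Drop every cancelled handle in one pass, then restore the
        # heap invariant on the survivors.
        scheduled = [h for h in scheduled if not h._cancelled]
        heapq.heapify(scheduled)
        cancelled_count = 0
    return scheduled, cancelled_count
```

Note the short-circuit: the size check runs first, so the division can never see a zero `sched_count` on an empty heap.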
Note that _cancel_count is not always completely accurate, nor does it need to be:
- Events that have already moved to _ready may be cancelled, and race conditions may occur if two threads cancel the same timer handle at the same time.
- Other threads may cancel events while _scheduled events are being rebuilt.
Note that I added a condition (in events.py) to prevent repeated cancellation from corrupting _cancel_count. Once I had added that, it made sense to include the rest of the method in the condition as well. This has the added benefit that the string representation of the arguments is not wiped on repeated cancellation.
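The guard described above can be sketched like this. It is a stand-alone sketch: FakeLoop and the attribute values are illustrative stand-ins, not the real asyncio classes, though `_timer_handle_cancelled` matches the callback name settled on later in this review.

```python
class TimerHandle:
    """Minimal sketch of the cancellation guard described above."""
    def __init__(self, loop):
        self._loop = loop
        self._cancelled = False
        self._args = ('some', 'args')

    def cancel(self):
        # Guard: only the first cancel() notifies the loop and clears
        # the args, so repeated calls neither skew the cancel count
        # nor wipe the repr of the arguments.
        if not self._cancelled:
            self._cancelled = True
            self._loop._timer_handle_cancelled(self)
            self._args = None

class FakeLoop:
    """Stand-in loop that just counts cancellation notifications."""
    def __init__(self):
        self._timer_cancelled_count = 0

    def _timer_handle_cancelled(self, handle):
        self._timer_cancelled_count += 1
```

Calling cancel() twice on the same handle increments the count exactly once.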
test_base_events.py and test_events.py were both updated to test these changes.
Patch Set 1
Patch Set 2: pep8 line length
Patch Set 3: Test that cancelled events at head of scheduled list are accounted for
Patch Set 4: Moved double cancellation test to debug test
Patch Set 5: Removed blank indentation
Patch Set 6: Better documentation for test case
Patch Set 7: Removed redundant leading spaces on blank lines
Total comments: 2
Patch Set 8: Demonstrates options on how to stop overcounting events + misc feedback fixes
Patch Set 9: Track scheduled handles by flag inline
Total comments: 13
Patch Set 10: Various nits fixed
Patch Set 11: Test code that ties into new constants fixed
Total comments: 5
Patch Set 12: Minor fixes based on latest feedback
Total comments: 1

Messages (24 total)
https://codereview.appspot.com/145220043/diff/110001/asyncio/base_events.py
File asyncio/base_events.py (right):

https://codereview.appspot.com/145220043/diff/110001/asyncio/base_events.py#n...
asyncio/base_events.py:148: self._cancel_count = 0
Maybe we can come up with a bit more descriptive name?

https://codereview.appspot.com/145220043/diff/110001/asyncio/base_events.py#n...
asyncio/base_events.py:995: if sched_count > 100 and self._cancel_count / sched_count > 0.5:
100 and 0.5 should be constants defined somewhere at the top of the file.

Also, while I understand what you want to do with the second check (cleanup only when 50% of tasks are cancelled), I'm not sure it's such a good idea. I would just set the threshold to a somewhat higher number (say 1000) and clean up the queue every time we hit it. After all, it takes Python about 50ms to rebuild a list of 1,000,000 items.
On 2014/09/20 22:42:44, yselivanov wrote:
> https://codereview.appspot.com/145220043/diff/110001/asyncio/base_events.py#n...
> asyncio/base_events.py:148: self._cancel_count = 0
> Maybe we can come up with a bit more descriptive name?

self._cancelled_count = 0 ? Anything much more descriptive would be a fairly long name. I haven't thought of anything better; any ideas?

> https://codereview.appspot.com/145220043/diff/110001/asyncio/base_events.py#n...
> asyncio/base_events.py:995: if sched_count > 100 and self._cancel_count /
> sched_count > 0.5:
> 100 and 0.5 should be constants defined somewhere on top of the file.

No problem doing this, but I will wait until the following is resolved before I do.

> Also, while I understand what you want to do with the second check (cleanup only
> when 50% of tasks are cancelled), I'm not sure it's such a good idea. I would
> just set the threshold to a bit higher number (say 1000) and cleanup the queue
> every time we hit it. After all, it takes python about 50ms to rebuild a list of
> 1000000 items.

I understand your reasoning, but I strongly disagree. Taking the 1,000,000 case: if it takes 50ms to rebuild the 1,000,000-item list, that list would have been built from 500,000 active events and 500,000 cancelled events.

i) The current algorithm would take 50ms once there were 500,000 uncancelled events alongside 500,000 cancelled ones.
ii) The proposed change (static threshold of 1000) would take ~ 500 * 25ms == 12.5 seconds!

Explanation: once the system reached a steady state of 500,000 uncancelled events (and if 1M == 50ms, then 0.5M == 25ms), a 25ms penalty would be paid every time 1000 events were cancelled. IMO, a core element such as an event scheduler should never be O(N) (which it still is with a static limit).
Thanks, this is a good start. Some remarks in addition to Yuri's review: - You are effectively adding a new interface to the event loop class. It is legal to implement a new event loop that derives from AbstractEventLoop but not from BaseEventLoop. So you either have to add _handle_cancelled to AbstractEventLoop (so people know that they have to implement it), or you have to check whether it exists before calling it. - The _scheduled list can only contain TimerHandle instances; I think the _handle_cancelled call should be made in that class's cancel() method, so you don't count cancellations of immediate callbacks. - I'm in agreement with you that we should avoid O(N) behavior and I like your current approach to rebuild at a fixed percentage. I'm not sure how to tune the percentage at which we decide to rebuild. Note that the heapify() call contributes way more than the list comprehension. - You are overcounting cancelled events -- calling cancel() after the callback has already run is common, e.g. sleep() does this. But I'm not sure of the best way to avoid this. Setting _cancelled when the callback is run feels wrong (it affects the repr) and Handle objects may be run multiple times (when used as I/O callbacks). Maybe a separate flag that only TimerHandle instances have. But you could still overcount, as the TimerHandle is transferred from _scheduled to _ready once it is due, but it may still be cancelled before it gets run. Perhaps none of this matters.
On 2014/09/21 02:30:59, GvR wrote: > Thanks, this is a good start. Some remarks in addition to Yuri's review: On the topic of Yuri's review: Constants were defined at the top of the file. Renaming has become less of a concern since I have pulled the scheduled logic into a separate class (details below), and all names are now relative to that class. > > - You are effectively adding a new interface to the event loop class. It is > legal to implement a new event loop that derives from AbstractEventLoop but not > from BaseEventLoop. So you either have to add _handle_cancelled to > AbstractEventLoop (so people know that they have to implement it), or you have > to check whether it exists before calling it. I have added the cancelled callback to AbstractEventLoop. > > - The _scheduled list can only contain TimerHandle instances; I think the > _handle_cancelled call should be made in that class's cancel() method, so you > don't count cancellations of immediate callbacks. Done. Additionally, I have renamed the method to _timer_handle_cancelled > > - I'm in agreement with you that we should avoid O(N) behavior and I like your > current approach to rebuild at a fixed percentage. I'm not sure how to tune the > percentage at which we decide to rebuild. Note that the heapify() call > contributes way more than the list comprehension. No ideas on how to tune the percentage. 50% was chosen simply on the heuristic "If there are more cancelled events than non-cancelled events, it's probably time to clean". > > - You are overcounting cancelled events -- calling cancel() after the callback > has already run is common, e.g. sleep() does this. But I'm not sure of the best > way to avoid this. Setting _cancelled when the callback is run feels wrong (it > affects the repr) and Handle objects may be run multiple times (when used as I/O > callbacks). Maybe a separate flag that only TimerHandle instances have. 
> But you could still overcount, as the TimerHandle is transferred from _scheduled to
> _ready once it is due, but it may still be cancelled before it gets run.
> Perhaps none of this matters.

The algorithm will still function correctly if the count is off, but it will affect performance, especially with the cancel semantics of sleep. It could be handled by tracking whether the TimerHandle is in the _scheduled list or not. If we can track that, then on cancellation we adjust the count when the handle is in _scheduled, and ignore the cancellation otherwise.

I made a fairly major change to the code to include two different ways we could track that. Additionally, since in some cases it resulted in keeping 3 variables in sync, I made separate SchedBy* classes. (This was a bit of a pain since the unit tests expect direct access to a list when testing, but I made it work at least for the purposes of demonstration. We could always pull all variables back into BaseEventLoop and discard the classes if that is preferred, or we could alter the tests and get rid of a lot of the methods in the SchedBy* classes that are marked "for testing purposes only".)

You can flip between the two types by editing lines 301 and 302 in base_events.py. SchedByFlag uses a _scheduled flag in TimerHandle that it modifies to track whether the handle is in the _scheduled list or not. SchedBySet uses a set to keep track of what exists in the _scheduled list and what does not. Note that _scheduled would be removed from TimerHandle if the set-based algorithm were chosen, as it would not be needed.

Finally, note that it is very likely these SchedBy* classes are buggy. No testing has been done yet; they exist merely to demonstrate the concept. I would probably also rename the surviving class if we decide to use a separate class and delete the other. Looking forward to feedback.
Note: Apologies for the terrible formatting of my last post, I didn't realize it automatically breaks lines.
On 2014/09/21 18:59:03, chatgris wrote:
> Note: Apologies for the terrible formatting of my last post, I didn't realize it
> automatically breaks lines.

Well, if you *do* want to break lines, break them around 72 chars, then you'll be safe. :-)

Anyway, let's look at the various proposed solutions:

(A) over-count cancellations when cancel() happens to a handle that's not scheduled, like in your original proposal
(B) track whether a handle is in the scheduled queue
(B1) using a flag in each timer handle
(B2) using a separate set of scheduled timer handles

I don't think B2 is needed -- the TimerHandle class is ours to modify, and having an extra field in every instance takes up less space than the space required for the set, *unless* the percentage of cancelled timer handles is under 25% or so. (Back-of-the-envelope calculation after observing that the memory taken by a set of N elements is around 32 bytes per element, i.e. 4 pointers -- the extra slot only costs one pointer per timer handle.) The set version is also always slower and takes more code.

So IMO the contest is between (A) and (B1). Your current version of B1 looks pretty clunky -- perhaps you can inline it? There are really only very few choke points -- timer handles are only placed in the scheduled queue by call_at() and _add_callback(), and only removed by _run_once() (in two different places).

(Plus, I think _add_callback() should never be called with a TimerHandle argument. There's a test for this, but it's an internal method and none of the call sites should ever create a TimerHandle. I think these things are all relics from earlier designs.)

I'll refrain from more in-detail review until you've got a new version up. FWIW, we've now missed the 3.4.2 release candidate and I think this is too much new stuff to sneak into a release at the last moment, so there's less time pressure.
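Inlined at the choke points listed above, the B1 bookkeeping might look roughly like this. This is a simplified sketch, not asyncio's API: call_at() takes no callback here and cancel() lives on the loop for brevity, whereas in asyncio cancellation is a method on the handle itself.

```python
import heapq

class Timer:
    """Sketch of option B1: a _scheduled flag on each timer handle."""
    def __init__(self, when):
        self._when = when
        self._cancelled = False
        self._scheduled = False   # True only while in loop._scheduled

    def __lt__(self, other):
        return self._when < other._when

class Loop:
    def __init__(self):
        self._scheduled = []
        self._timer_cancelled_count = 0

    def call_at(self, when):
        # Choke point 1: entering the scheduled heap sets the flag.
        timer = Timer(when)
        timer._scheduled = True
        heapq.heappush(self._scheduled, timer)
        return timer

    def _pop_due(self):
        # Choke point 2: leaving the heap clears the flag, so a later
        # cancel() on an already-run timer is not counted.
        timer = heapq.heappop(self._scheduled)
        timer._scheduled = False
        return timer

    def cancel(self, timer):
        if not timer._cancelled:
            timer._cancelled = True
            if timer._scheduled:
                self._timer_cancelled_count += 1
```

With this, cancelling a timer that has already been popped (the sleep() pattern) no longer inflates the count, which was the overcounting concern above.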
On 2014/09/22 18:14:30, GvR wrote: > On 2014/09/21 18:59:03, chatgris wrote: > > Note: Apologies for the terrible formatting of my last post, I didn't realize > it > > automatically breaks lines. > > Well, if you *do* want to break lines, break them around 72 chars, then you'll > be safe. :-) > > Anyway, let's look at the various proposed solutions: > > (A) over-count cancellations when cancel() happens to a handle that's not > scheduled, like in your original proposal > (B) track whether a handle is in the scheduled queue > (B1) using a flag in each timer handle > (B2) using a separate set of scheduled timer handles > > I don't think B2 is needed -- the TimerHandle class is ours to modify, and > having an extra field in every instance takes up less space than the space > required for the set, *unless* the percentage of cancelled timer handles is > under 25% or so. (Back-of-the-envelop calculation after observing that the > memory taken by a set of N elements is around 32 bytes per element, i.e. 4 > pointers -- the extra slot only costs one pointer per timer handle.) The set > version is also always slower and takes more code. > > So IMO the contest is between (A) and (B1). We could tune the % cleanup to be 25% to keep memory usage down for (B2). IMO, the contest is between (B1) and (B2), because if (B2) is being rejected due to performance, then (A) should be rejected as well (due to the increased frequency of heapq.heapify). > Your current version of B1 looks > pretty clunky -- perhaps you can inline it? There are really only very few > choke points -- timer handles are only placed in the scheduled queue by > call_at() and _add_callback(), and only removed by _run_once() (inn two > different places). (B1) has been inlined in the latest patch set. > > (Plus, I think _add_callback() should never be called with a TimerHandle > argument. There's a test for this, but it's an internal method and none of the > call sites should ever create a TimerHandle. 
> I think these things are all relics
> from earlier designs.)

Well, I'm happy to remove the "if isinstance(handle, events.TimerHandle):" block from _add_callback() if you'd like me to. Based on your reasoning, it appears that it is redundant.
The patch is getting better and better! I'm +1 for the B1 strategy you guys have decided upon. Below are some nits and suggestions.

https://codereview.appspot.com/145220043/diff/150001/asyncio/base_events.py
File asyncio/base_events.py (right):

https://codereview.appspot.com/145220043/diff/150001/asyncio/base_events.py#n...
asyncio/base_events.py:45: _MIN_SCHEDULED_HANDLES = 100
I'd prefer to see the TIMER word in these names too.

https://codereview.appspot.com/145220043/diff/150001/asyncio/base_events.py#n...
asyncio/base_events.py:155: self._cancelled_count = 0
How about _timer_cancelled_count?

https://codereview.appspot.com/145220043/diff/150001/asyncio/base_events.py#n...
asyncio/base_events.py:1002: heapq.heappop(self._scheduled)._scheduled = False
I'd split this line in two.

https://codereview.appspot.com/145220043/diff/150001/asyncio/base_events.py#n...
asyncio/base_events.py:1007: and self._cancelled_count / sched_count > _MIN_CANCELLED_HANDLES_PCT):
This indent looks shaggy...

https://codereview.appspot.com/145220043/diff/150001/asyncio/base_events.py#n...
asyncio/base_events.py:1056: handle._scheduled = False
Maybe we should add a method to TimerHandle for True/False flips? scheduled()/unscheduled()?

https://codereview.appspot.com/145220043/diff/150001/asyncio/events.py
File asyncio/events.py (right):

https://codereview.appspot.com/145220043/diff/150001/asyncio/events.py#newcod...
asyncio/events.py:249: def _timer_handle_cancelled(self, handle):
A docstring here would be nice.

https://codereview.appspot.com/145220043/diff/150001/tests/test_base_events.py
File tests/test_base_events.py (right):

https://codereview.appspot.com/145220043/diff/150001/tests/test_base_events.p...
tests/test_base_events.py:321: for x in range(50):
It would also be great if you could tie your test logic to the new constants you have defined (as somebody may go and tweak them later), not numbers.
I'm with Yuri's comments (except where I explicitly disagree :-). Here are some nits.

https://codereview.appspot.com/145220043/diff/150001/asyncio/base_events.py
File asyncio/base_events.py (right):

https://codereview.appspot.com/145220043/diff/150001/asyncio/base_events.py#n...
asyncio/base_events.py:49: _MIN_CANCELLED_HANDLES_PCT = 0.5
It is not a percentage (that would be 50), it's a fraction.

https://codereview.appspot.com/145220043/diff/150001/asyncio/base_events.py#n...
asyncio/base_events.py:976: if isinstance(handle, events.TimerHandle):
Fine to replace this with assert not isinstance(handle, events.TimerHandle). You also have to delete the one test that tests this.

https://codereview.appspot.com/145220043/diff/150001/asyncio/base_events.py#n...
asyncio/base_events.py:1004: # Remove delayed calls that were cancelled if their number is too high
I suggest moving this before the preceding while-loop. In the case where we decide to re-heapify, any effort spent on popping cancelled items from the front is wasted.

https://codereview.appspot.com/145220043/diff/150001/asyncio/base_events.py#n...
asyncio/base_events.py:1056: handle._scheduled = False
On 2014/09/23 16:56:04, yselivanov wrote:
> Maybe we should add a method to TimerHandle for True/False flips?
> scheduled()/unscheduled()?
I don't think so. The event loop already reaches into *Handle instances; the only public method is cancel(), which is for the app, not for internal use.

https://codereview.appspot.com/145220043/diff/150001/asyncio/events.py
File asyncio/events.py (right):

https://codereview.appspot.com/145220043/diff/150001/asyncio/events.py#newcod...
asyncio/events.py:188:
You don't need this blank line.

https://codereview.appspot.com/145220043/diff/150001/tests/test_events.py
File tests/test_events.py (right):

https://codereview.appspot.com/145220043/diff/150001/tests/test_events.py#new...
tests/test_events.py:1900: '<Handle cancelled noop(1, 2) at %s:%s created at %s:%s>'
This line is too long.
Added a new patch set to fix nits; I will reply to a few specific issues in two replies.

On 2014/09/23 16:56:05, yselivanov wrote:
> https://codereview.appspot.com/145220043/diff/150001/asyncio/base_events.py#n...
> asyncio/base_events.py:1007: and self._cancelled_count / sched_count >
> _MIN_CANCELLED_HANDLES_PCT):
> This indent looks shaggy...

Prefixing with the "and" was my attempt to resolve this issue, per PEP 8:

"When the conditional part of an if-statement is long enough to require that it be written across multiple lines, it's worth noting that the combination of a two character keyword (i.e. if), plus a single space, plus an opening parenthesis creates a natural 4-space indent for the subsequent lines of the multiline conditional. This can produce a visual conflict with the indented suite of code nested inside the if-statement, which would also naturally be indented to 4 spaces."

However, it has been changed as you requested.

> https://codereview.appspot.com/145220043/diff/150001/tests/test_base_events.p...
> tests/test_base_events.py:321: for x in range(50):
> It would also be great if you can tie your tests logic to the new constants you
> have defined (as somebody may go and tweak them later), not numbers.

This is done. Note that on line 303 I added:

    # This test is invalid if _MIN_SCHEDULED_TIMER_HANDLES is too low
    self.assertLessEqual(cancelled_count + not_cancelled_count,
                         base_events._MIN_SCHEDULED_TIMER_HANDLES)

Since unittest doesn't have assertInconclusive, and I wanted at least 2 of each type to ensure that looping was performed (as opposed to a single removal), I have effectively added a lower limit of 5 to base_events._MIN_SCHEDULED_TIMER_HANDLES for the purposes of the test.
> https://codereview.appspot.com/145220043/diff/150001/asyncio/base_events.py#n...
> asyncio/base_events.py:1004: # Remove delayed calls that were cancelled if their
> number is too high
> I suggest move this before the preceding while-loop. In the case where we decide
> to re-heapify, any effort spent on popping cancelled items from the front is
> wasted.

Additionally, I have moved the while-loop to the else portion of the if statement, since if the rebuild branch has already run, it should not be possible to have a cancelled event at the head of the queue.

> https://codereview.appspot.com/145220043/diff/150001/tests/test_events.py#new...
> tests/test_events.py:1900: '<Handle cancelled noop(1, 2) at %s:%s created at
> %s:%s>'
> This line is too long.

I have also fixed the previous statement I copied it from, for consistency.
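Putting that reordering together, the pruning at the head of _run_once() would then look roughly like this. This is a sketch with a simplified loop object, not the committed source; the constant names follow the ones discussed in this review.

```python
import heapq

_MIN_SCHEDULED_TIMER_HANDLES = 100
_MIN_CANCELLED_TIMER_HANDLES_FRACTION = 0.5

def prune_scheduled(loop):
    """Either rebuild the whole heap (when the thresholds trip) or pop
    cancelled handles off its head -- never both, since a freshly
    rebuilt heap cannot have a cancelled head.
    """
    sched_count = len(loop._scheduled)
    if (sched_count > _MIN_SCHEDULED_TIMER_HANDLES and
            loop._timer_cancelled_count / sched_count >
                _MIN_CANCELLED_TIMER_HANDLES_FRACTION):
        # Rebuild branch: drop all cancelled handles at once.
        for handle in loop._scheduled:
            if handle._cancelled:
                handle._scheduled = False
        loop._scheduled = [h for h in loop._scheduled
                           if not h._cancelled]
        heapq.heapify(loop._scheduled)
        loop._timer_cancelled_count = 0
    else:
        # Cheap branch: only pop cancelled handles sitting at the head.
        while loop._scheduled and loop._scheduled[0]._cancelled:
            loop._timer_cancelled_count -= 1
            handle = heapq.heappop(loop._scheduled)
            handle._scheduled = False
```

Running the threshold check first means no effort is wasted popping heads that a rebuild would have discarded anyway, which was the point of the suggestion above.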
I just realized that the tests I changed to tie into the new constants are incorrect. Please hold while I fix them.
On 2014/09/23 18:43:27, chatgris wrote: > I just realized that the tests I changed to tie into the new constants is > incorrect. Please hold while I fix them. This should now be fixed.
Looks good, aside from just a couple of really small notes. I also like how you refactored your unittest; now it's much easier to follow.

https://codereview.appspot.com/145220043/diff/160005/asyncio/base_events.py
File asyncio/base_events.py (right):

https://codereview.appspot.com/145220043/diff/160005/asyncio/base_events.py#n...
asyncio/base_events.py:44: # cancelled handles is performed
Please add '.' at the end of the sentence.

https://codereview.appspot.com/145220043/diff/160005/asyncio/events.py
File asyncio/events.py (right):

https://codereview.appspot.com/145220043/diff/160005/asyncio/events.py#newcod...
asyncio/events.py:188: super().cancel()
I'd move the super call out of the 'if' statement.

https://codereview.appspot.com/145220043/diff/160005/asyncio/events.py#newcod...
asyncio/events.py:249: raise NotImplementedError
We need a docstring here.

https://codereview.appspot.com/145220043/diff/160005/tests/test_base_events.py
File tests/test_base_events.py (right):

https://codereview.appspot.com/145220043/diff/160005/tests/test_base_events.p...
tests/test_base_events.py:310: self.loop._process_events = mock.Mock()
Please move this line to the top of the test, so it's immediately obvious how we patch self.loop here.

https://codereview.appspot.com/145220043/diff/160005/tests/test_base_events.p...
tests/test_base_events.py:323: 0 < base_events._MIN_CANCELLED_TIMER_HANDLES_FRACTION < 1.0)
This check should also be at the top of the test.
On 2014/09/23 19:41:02, yselivanov wrote: > https://codereview.appspot.com/145220043/diff/160005/asyncio/base_events.py#n... > asyncio/base_events.py:44: # cancelled handles is performed > Please add '.' at the end of the sentence. I also did this on line 48 for consistency.
Looks good now.
Yuri, if you're happy with it, I have no further comments. Can you commit this for Joshua?
I was in the middle of committing the patch when I saw one more thing I'd like us to discuss. Please see the comment below.

https://codereview.appspot.com/145220043/diff/200001/asyncio/base_events.py
File asyncio/base_events.py (right):

https://codereview.appspot.com/145220043/diff/200001/asyncio/base_events.py#n...
asyncio/base_events.py:1006: self._scheduled = [x for x in self._scheduled if not x._cancelled]
Just noticed that we have a double iteration over what might potentially be a huge list, and _run_once() is a central part of asyncio.

Let's think about combining this list comprehension and the above loop into one thing, like this:

    new_scheduled = []
    for handle in self._scheduled:
        if handle._cancelled:
            handle._scheduled = False
        else:
            new_scheduled.append(handle)
    self._scheduled = heapify(new_scheduled)

Or should we use deque for new_scheduled? Can you test if there are any performance benefits in rewriting this piece?
On 2014/09/25 02:55:48, yselivanov wrote:
> Just noticed that we have a double iteration over what might be a huge list
> potentially, and _run_once() is a central part of asyncio.
>
> Let's think about combining this list comprehension and the above loop in one
> thing, like this:
>
>     new_scheduled = []
>     for handle in self._scheduled:
>         if handle._cancelled:
>             handle._scheduled = False
>         else:
>             new_scheduled.append(handle)
>     self._scheduled = heapify(new_scheduled)
>
> Or should we use deque for new_scheduled? Can you test if there are any
> performance benefits in rewriting this piece?

Heh, I suggested this in an earlier review. I expect it will be only a modest win, since heapify() is O(N) and its constant is much larger. But yeah, good idea to try. Note that heapify() doesn't return anything -- it rearranges the argument in place. So new_scheduled must be a list.
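Combining Yuri's single pass with Guido's correction (heapq.heapify() works in place and returns None) gives a sketch like the following. The handle objects are stand-ins with just the attributes the pass touches.

```python
import heapq

def rebuild(scheduled):
    """Single pass over the old heap: clear the _scheduled flag on
    cancelled handles, keep the rest, then heapify the survivors.

    Corrected per the note above: heapq.heapify() returns None and
    rearranges its argument in place, so the new list is built first
    and heapified afterwards, never assigned from heapify's result.
    """
    new_scheduled = []
    for handle in scheduled:
        if handle._cancelled:
            handle._scheduled = False
        else:
            new_scheduled.append(handle)
    heapq.heapify(new_scheduled)   # in place; do NOT assign its result
    return new_scheduled
```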
On 2014/09/25 03:16:59, GvR wrote:
> Heh, I suggested this in an earlier review. I expect it will be only a modest
> win, since heapify() is O(N) and its constant is much larger. But yeah, good
> idea to try.

Sorry, I don't recall seeing this in an earlier review. It was something I briefly thought about, but my internal heuristic said that calling append would be slower than the list comprehension, since in my experience Python function calls have a lot of overhead.

However, in the interests of completeness, below is the program I ran to test this.

Based on that program, over the course of 5 runs:

Option A took between 12.25 and 12.4 seconds.
Option B took between 13.2 and 14.4 seconds.
Option C took around 17 seconds. (NOTE: Option C is not correct; after the first iteration it runs on a list half the size, and since it's slower I didn't go to the effort of creating an apples-to-apples comparison. But I thought I'd try an in-place method as well.)

It appears that the option using the least Python/most C code is the winner, and that is what is already in the patch.

Note that the above timing is the elapsed time the program prints out, which excludes creation of the list.

====================

import asyncio
import time

loop = asyncio.get_event_loop()

def cb(self):
    pass

for x in range(1000000):
    th = loop.call_later(3600, cb)
    if x % 2 == 1:
        th.cancel()

start = time.time()

orig_data = loop._scheduled

for x in range(100):
    loop._scheduled = orig_data

    """Option A
    for handle in loop._scheduled:
        if handle._cancelled:
            handle._scheduled = False

    loop._scheduled = [x for x in loop._scheduled if not x._cancelled]
    """

    """Option B
    new_scheduled = []
    for handle in loop._scheduled:
        if handle._cancelled:
            handle._scheduled = False
        else:
            new_scheduled.append(handle)

    loop._scheduled = new_scheduled
    """

    """Option C
    This is not correct as it is repeatedly operating on a list half the size
    offs = 0
    for i in range(len(loop._scheduled)):
        handle = loop._scheduled[i]
        if handle._cancelled:
            handle._scheduled = False
            offs += 1
        else:
            loop._scheduled[i-offs] = handle

    del loop._scheduled[len(loop._scheduled) - offs:]
    """

end = time.time()

print(end - start)
Joshua, I've just committed the patch to CPython and tulip:

- https://hg.python.org/cpython/rev/2a868c9f8f15
- https://hg.python.org/cpython/rev/a6aaacb2b807
- https://code.google.com/p/tulip/source/detail?r=df3fa0556765e51ed06551af6fc2e...

Thank you for the patch and for your patience during the code review!
Sounds good to me. It's possible that the difference between A and B is due to the lookup of the append method -- list comprehensions use append too, but they don't have to look it up each time.

--Guido van Rossum (python.org/~guido)
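Guido's point about append lookup can be observed with a small stand-alone comparison. This is a sketch, separate from the benchmark program above; absolute timings are machine-dependent, and hoisting `out.append` into a local roughly mimics what a comprehension does implicitly.

```python
import timeit

data = list(range(10_000))

def comp():
    # List comprehension: the append is implicit and not looked up
    # per item.
    return [x for x in data if x % 2]

def loop_append():
    out = []
    # Hoist the attribute lookup once, as a comprehension effectively
    # does; without this line the loop pays a lookup per appended item.
    append = out.append
    for x in data:
        if x % 2:
            append(x)
    return out

t_comp = timeit.timeit(comp, number=20)
t_loop = timeit.timeit(loop_append, number=20)
print("comprehension: %.4fs, append loop: %.4fs" % (t_comp, t_loop))
```

Both variants produce the same list; only the per-item overhead differs.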
Thanks Yuri and Joshua!! This was a very useful exercise.

--Guido van Rossum (python.org/~guido)