|
|
Descriptioncontext_id for Task
Patch Set 1 #
Total comments: 26
Patch Set 2 : 2nd attempt #
Total comments: 4
Patch Set 3 : third version #MessagesTotal messages: 16
That's the first version of the patch. No docstrings or documentation updates for now (will add them as soon as we agree on API). Please review.
Sign in to reply to this message.
I'm pretty conflicted. I see a lot of mechanism but I can't quite tell whether this will actually address the use case. Maybe one of us should try to add context_ids to a realistic example (e.g. crawl.py) to see if it actually satisfies the use case. https://codereview.appspot.com/87940044/diff/1/asyncio/base_events.py File asyncio/base_events.py (right): https://codereview.appspot.com/87940044/diff/1/asyncio/base_events.py#newcode242 asyncio/base_events.py:242: return self._context_id I think should also have a set_context_id() method. Having it set directly from outside the class feels odd. https://codereview.appspot.com/87940044/diff/1/asyncio/base_events.py#newcode316 asyncio/base_events.py:316: context_id=None) Did you mean context_id=context_id? https://codereview.appspot.com/87940044/diff/1/asyncio/events.py File asyncio/events.py (right): https://codereview.appspot.com/87940044/diff/1/asyncio/events.py#newcode33 asyncio/events.py:33: return res I wonder if the context_id should also be added (if not None). It might be handy. https://codereview.appspot.com/87940044/diff/1/asyncio/events.py#newcode40 asyncio/events.py:40: self._loop._context_id = self._context_id Should be inside the try. https://codereview.appspot.com/87940044/diff/1/asyncio/futures.py File asyncio/futures.py (right): https://codereview.appspot.com/87940044/diff/1/asyncio/futures.py#newcode154 asyncio/futures.py:154: context_id = self._loop.get_context_id() Hm. Not sure if I'm contradicting myself, but here I'd worry about the extra call overhead. https://codereview.appspot.com/87940044/diff/1/asyncio/futures.py#newcode276 asyncio/futures.py:276: self._loop.call_soon(fn, self, context_id=self._context_id) My spider-sense is tingling here. I wonder if it isn't reasonable to have different callbacks on the same Future associated with different context_ids. https://codereview.appspot.com/87940044/diff/1/asyncio/selector_events.py File asyncio/selector_events.py (right): https://codereview.appspot.com/87940044/diff/1/asyncio/selector_events.py#new... asyncio/selector_events.py:141: handle = events.Handle(callback, args, self, self._context_id) Again, my spider-sense is tingling. Couldn't a reader be associated with multiple context_ids, e.g. when using HTTP connection pooling and creating a new context_id for each successive request? https://codereview.appspot.com/87940044/diff/1/tests/test_base_events.py File tests/test_base_events.py (right): https://codereview.appspot.com/87940044/diff/1/tests/test_base_events.py#newc... tests/test_base_events.py:168: None, asyncio.Handle(cb, (), self.loop, self.loop._context_id), Add a line break after None for readability (each arg to run_in_executor on a line by itself). https://codereview.appspot.com/87940044/diff/1/tests/test_tasks.py File tests/test_tasks.py (right): https://codereview.appspot.com/87940044/diff/1/tests/test_tasks.py#newcode1456 tests/test_tasks.py:1456: self.assertEqual(task_context_id, 'spam') Does this work? I can't prove that bar() will even be called. In general I'd like to see a test for context_ids that shows their usefulness (use case), not just tests an edge case of their implementation.
Sign in to reply to this message.
I replied to almost all of the comments and deleted them. I think I need more time for myself to think about this in depth. Will post something soon.
Sign in to reply to this message.
https://codereview.appspot.com/87940044/diff/1/asyncio/base_events.py File asyncio/base_events.py (right): https://codereview.appspot.com/87940044/diff/1/asyncio/base_events.py#newcode320 asyncio/base_events.py:320: def run_in_executor(self, executor, callback, *args): No context_id for executors? context_id is somehow local to a thread? https://codereview.appspot.com/87940044/diff/1/asyncio/events.py File asyncio/events.py (right): https://codereview.appspot.com/87940044/diff/1/asyncio/events.py#newcode33 asyncio/events.py:33: return res On 2014/04/15 21:31:04, GvR wrote: > I wonder if the context_id should also be added (if not None). It might be > handy. Yes, add the context id. If you want a shorter representation, add a __str__() method. https://codereview.appspot.com/87940044/diff/1/asyncio/events.py#newcode40 asyncio/events.py:40: self._loop._context_id = self._context_id On 2014/04/15 21:31:04, GvR wrote: > Should be inside the try. I disagree. Why would it fail? https://codereview.appspot.com/87940044/diff/1/asyncio/futures.py File asyncio/futures.py (right): https://codereview.appspot.com/87940044/diff/1/asyncio/futures.py#newcode154 asyncio/futures.py:154: context_id = self._loop.get_context_id() On 2014/04/15 21:31:04, GvR wrote: > Hm. Not sure if I'm contradicting myself, but here I'd worry about the extra > call overhead. Since Future is part of asyncio, for me it's fine to access private attributes of BaseEventLoop outside BaseEventLoop definition. https://codereview.appspot.com/87940044/diff/1/asyncio/futures.py#newcode276 asyncio/futures.py:276: self._loop.call_soon(fn, self, context_id=self._context_id) On 2014/04/15 21:31:04, GvR wrote: > My spider-sense is tingling here. I wonder if it isn't reasonable to have > different callbacks on the same Future associated with different context_ids. I see no reason to restrict the context id to the future context id. You can use something like that: _NOT_SET = object() ... def add_done_callback(self, fn, *, context_id=_NOT_SET): if context_id is _NOT_SET: context_id = self._context_id ... (_NOT_SET is needed to be able to use the None value.) https://codereview.appspot.com/87940044/diff/1/asyncio/selector_events.py File asyncio/selector_events.py (right): https://codereview.appspot.com/87940044/diff/1/asyncio/selector_events.py#new... asyncio/selector_events.py:141: handle = events.Handle(callback, args, self, self._context_id) On 2014/04/15 21:31:04, GvR wrote: > Again, my spider-sense is tingling. > > Couldn't a reader be associated with multiple context_ids, e.g. when using HTTP > connection pooling and creating a new context_id for each successive request? Agreed, it's important to be allowed to specify a different context id here. https://codereview.appspot.com/87940044/diff/1/asyncio/selector_events.py#new... asyncio/selector_events.py:180: handle = events.Handle(callback, args, self, self._context_id) Ditto. https://codereview.appspot.com/87940044/diff/1/tests/test_base_events.py File tests/test_base_events.py (right): https://codereview.appspot.com/87940044/diff/1/tests/test_base_events.py#newc... tests/test_base_events.py:56: h = asyncio.Handle(lambda: False, (), self.loop, self.loop._context_id) You may just pass None for the context id, since the test doesn't check the context id and the loop context id is None. https://codereview.appspot.com/87940044/diff/1/tests/test_tasks.py File tests/test_tasks.py (right): https://codereview.appspot.com/87940044/diff/1/tests/test_tasks.py#newcode1432 tests/test_tasks.py:1432: def test_task_context_id(self): You need similar tests for all call*() functions. https://codereview.appspot.com/87940044/diff/1/tests/test_tasks.py#newcode1443 tests/test_tasks.py:1443: def foo(): Usually, I don't like calling self.assert...() in a coroutine, but here I would prefer the context id almost everywhere and specify a different context id for each callback and each task. You need probably two tests (two test functions): one test checking that it's possible to specify to specify a different context id everywhere (Future, Task, coroutine, calls, etc.), one test checking that the "current context id" is correctly passed to child tasks/calls/etc. https://codereview.appspot.com/87940044/diff/1/tests/test_tasks.py#newcode1446 tests/test_tasks.py:1446: yield from fut I expect a test here to make sure that the context id is restored after a yield-from to its previous value. https://codereview.appspot.com/87940044/diff/1/tests/test_tasks.py#newcode1451 tests/test_tasks.py:1451: task_context_id = self.loop.get_context_id() You need probably two tests (two test functions): one test checking that it's possible to specify to specify a different context id everywhere (Future, Task, coroutine, calls, etc.), one test checking that the "current context id" is correctly passed to child tasks/calls/etc.
Sign in to reply to this message.
https://codereview.appspot.com/87940044/diff/1/asyncio/events.py File asyncio/events.py (right): https://codereview.appspot.com/87940044/diff/1/asyncio/events.py#newcode40 asyncio/events.py:40: self._loop._context_id = self._context_id On 2014/04/16 04:53:27, haypo_gmail wrote: > On 2014/04/15 21:31:04, GvR wrote: > > Should be inside the try. > > I disagree. Why would it fail? That's not my point. The general pattern is <save state> try: <mutate state> finally: <restore state> It's generally important that all of <mutate state> is inside the try block, both from a correctness POV and to remind the reader what is going on. I've seen too many people put something that *can* fail just before the try because they're unclear on the pattern. https://codereview.appspot.com/87940044/diff/1/asyncio/futures.py File asyncio/futures.py (right): https://codereview.appspot.com/87940044/diff/1/asyncio/futures.py#newcode276 asyncio/futures.py:276: self._loop.call_soon(fn, self, context_id=self._context_id) FWIW, marker objects smell too. :-)
Sign in to reply to this message.
So here's the second patch. This new patch is radically different from what I proposed before (and I think that Guido had something like this in mind, when we discussed this issue). The idea is that any high-level application will, hopefully, be layered out as a combination of coroutines/tasks. Low-level callback APIs are for protocols implementation. So instead of patching the event loop, we only patch the Task class. The main motivation for me to work on this, is to allow developers to better instrument their code. And this patch makes it possible to have threadlocal-like objects for higher-level code. And that's usually the only place where you need such objects. Both for performance monitoring and making some request-related information always available. Of course, having context_id preserved for all callbacks in the loop is better, but it's a much more complex API to test & implement. So this new patch focuses on one problem: maintaining context_id across the Task objects. Speaking of realistic examples. I carefully reviewed how we analyze code performance in our current projects. Turns out, we only profile high-level operations, like how long was some database request, we don't care about individual sockets operations (for that we'd need context_id maintained for callbacks). And for that, making sure that context_id can be set per HTTP request and is properly maintained for all subsequent Tasks is fine.
Sign in to reply to this message.
I like this much better. And perhaps it can be even smaller! https://codereview.appspot.com/87940044/diff/20001/asyncio/tasks.py File asyncio/tasks.py (right): https://codereview.appspot.com/87940044/diff/20001/asyncio/tasks.py#newcode176 asyncio/tasks.py:176: def set_context_id(self, context_id): I know this was my idea, but now I'm wondering -- perhaps it would be cleaner if the context ID could only be set when creating a task? https://codereview.appspot.com/87940044/diff/20001/asyncio/tasks.py#newcode569 asyncio/tasks.py:569: def async(coro_or_future, *, loop=None, context_id=None): Again, I know this was my suggestion, but I now think I was wrong. It would be asking for trouble if calling async(X, context_id=Y) set the context ID in some cases but not in others (and the latter is the case if X is already a Task). "Solving" this by calling set_context_id() wouldn't be any better either. So maybe Task() should be the only way to set the context ID.
Sign in to reply to this message.
https://codereview.appspot.com/87940044/diff/1/asyncio/events.py File asyncio/events.py (right): https://codereview.appspot.com/87940044/diff/1/asyncio/events.py#newcode40 asyncio/events.py:40: self._loop._context_id = self._context_id Ah ok, you only want to move the line "self._loop._context_id = self._context_id" inside the try block, not the two lines. I agree that it would be better. https://codereview.appspot.com/87940044/diff/1/asyncio/futures.py File asyncio/futures.py (right): https://codereview.appspot.com/87940044/diff/1/asyncio/futures.py#newcode276 asyncio/futures.py:276: self._loop.call_soon(fn, self, context_id=self._context_id) On 2014/04/16 14:47:12, GvR wrote: > FWIW, marker objects smell too. :-) Do you suggest that None means "use the default"? I have no use case which requires to "clear" a context id. I don't know if it's useful.
Sign in to reply to this message.
(Pls ignore, Yuri has moved on. :-) https://codereview.appspot.com/87940044/diff/1/asyncio/futures.py File asyncio/futures.py (right): https://codereview.appspot.com/87940044/diff/1/asyncio/futures.py#newcode276 asyncio/futures.py:276: self._loop.call_soon(fn, self, context_id=self._context_id) On 2014/04/16 17:46:11, haypo_gmail wrote: > On 2014/04/16 14:47:12, GvR wrote: > > FWIW, marker objects smell too. :-) > > Do you suggest that None means "use the default"? Yes, that's what Yuri uses it for everywhere in the patch (and also in his new patch). > I have no use case which requires to "clear" a context id. I don't know if it's > useful. Well, it might be useful to explicitly state that a particular task should not be considered part of the current request (maybe if it's a cache operation for example). But there are other ways to do that (e.g. generating an explicit new ID, or using a predefined app-specific constant that means "do not log" or whatever.
Sign in to reply to this message.
On 2014/04/16 17:00:55, yselivanov wrote: > This new patch is radically different from what I proposed before (and I think > that Guido had something like this in mind, when we discussed this issue). Oh, wow. Now I'm lost. IMO it would help to have an example using the API to see if the proposed API fits the requirements. It looks useful to me to pass the context identifier to callbacks.
Sign in to reply to this message.
https://codereview.appspot.com/87940044/diff/20001/asyncio/tasks.py File asyncio/tasks.py (right): https://codereview.appspot.com/87940044/diff/20001/asyncio/tasks.py#newcode176 asyncio/tasks.py:176: def set_context_id(self, context_id): On 2014/04/16 17:46:07, GvR wrote: > I know this was my idea, but now I'm wondering -- perhaps it would be cleaner if > the context ID could only be set when creating a task? I think we don't need this, yes. Changing context_id of a task while it is being executed can create hard to debug bugs. https://codereview.appspot.com/87940044/diff/20001/asyncio/tasks.py#newcode569 asyncio/tasks.py:569: def async(coro_or_future, *, loop=None, context_id=None): On 2014/04/16 17:46:07, GvR wrote: > Again, I know this was my suggestion, but I now think I was wrong. It would be > asking for trouble if calling async(X, context_id=Y) set the context ID in some > cases but not in others Agree.
Sign in to reply to this message.
New patch, even shorted than before :) Victor, Guido, here's a very primitive example of how this API can be used: https://gist.github.com/1st1/10923339 -- some hypothetical web app that runs on some hypothetical web framework. Two things I tried to show: tracing performance & threadlocal-like-dict. Victor, the above example shows, that if you use coroutines & tasks, you don't really need to have context_id attached to every callback, as long as you use your context_id in a higher-level code. Like when you want to profile database queries, you can do that on the level of the database connection object, where it's perfectly fine to use coroutines. With this approach, however, you won't be able to get the current context_id in, say, socket reader/writer callbacks. But that's something that is rarely needed (or am I wrong?)
Sign in to reply to this message.
On 2014/04/16 19:31:43, yselivanov wrote: > https://gist.github.com/1st1/10923339 -- some hypothetical web app that runs on I've just fixed a couple of inconsistencies in my example.
Sign in to reply to this message.
Guys, any feedback?
Sign in to reply to this message.
On 2014/04/17 19:58:19, yselivanov wrote: > Guys, any feedback? This is a very nice small patch, but it is still a big feature, and I think we should ask for feedback on the tulip list. IIRC there was a previous proposal where Glyph objected strenuously, but I suspect the Twisted feature he dislikes is rather more complex than your nice minimalist design! The gist example is missing many details, like the implementation of the Local class -- there isn't a single line in the entire gist that actually calls get_context_id(). I think it would have to be something like this? context_id = asyncio.Task.current_task().get_context_id() I wonder if it makes sense to add a shortcut API so you can write this? context_id = asyncio.Task.current_context_id() Then again it wouldn't be as minimalist. :-) But my request for an example was to also show these idioms, to be sure that we have thought through most consequences. (I'm not asking for the example to be working code -- that would probably distract. But it should answer questions about "how do I do X".)
Sign in to reply to this message.
Sorry guys, I had some urgent family related matters to resolve. And just when I thought it's all in the past, I had a bunch of new ones. Will get back with more examples, better proposals etc in a few days.
Sign in to reply to this message.
|