Rietveld Code Review Tool
Help | Bug tracker | Discussion group | Source code | Sign in
(12722)

Issue 5847059: Rewrite the scheduler, for simplicity, and better error handling.

Can't Edit
Can't Publish+Mail
Start Review
Created:
12 years, 1 month ago by hazmat
Modified:
12 years, 1 month ago
Reviewers:
fwereade, mp+98104
Visibility:
Public.

Description

Rewrite the scheduler, for simplicity, and better error handling. During a recent code audit, a number of issues with the relation hook scheduler were discovered. Say we get notified of three membership events from the same child watch, two joins and a depart. if the first join hook errors, we need to keep both it and the subsequent membership events around. Previously on error we'd execute the next few hooks, regardless of the error, which is problematic. The error handler will stop the scheduler, but we'll still end up processing the events since the executioner doesn't check if its running till its done processing a clock. On error we need to stop processing immediately, we also need to keep all events for the around, including the failed one, to properly support retry (albeit still subject to reduction via additional events.) On a resolved --retry, we should continue the event processing stream where we left off (ie start execution against the failed hook.) The branch includes expanded unit test coverage of various failure scenarios, and concurrent activity scenarios, as well as two new implementations of the scheduler (one buried in the history which attempts to fix the previous impl). The latest schduler implementation, does away with the notion of clocks and instead uses a simple list (wrapped in a deferred queue) as the primary data structure. The additional member_versions, context_members data structures are used only at ingest/put time, the consumer relies soley on the queue. This simplifies the scheduler implementation, and avoided some issues with the previous implementation that had to bookeep across multiple data structures in the face of concurrent activity. The alias expansion of join to modified was also removed from the lifecycle and placed directly in the scheduler. This makes it more reliable in the face of an error of either join/modified hook, and subjects the additional synthetic event to reduction. https://code.launchpad.net/~hazmat/juju/scheduler-peek-list/+merge/98104 (do not edit description out of merge proposal)

Patch Set 1 #

Total comments: 14
Unified diffs Side-by-side diffs Delta from patch set Stats (+627 lines, -265 lines) Patch
M .bzrignore View 1 chunk +4 lines, -0 lines 0 comments Download
A [revision details] View 1 chunk +2 lines, -0 lines 0 comments Download
M juju/control/status.py View 1 chunk +2 lines, -1 line 0 comments Download
M juju/hooks/scheduler.py View 4 chunks +239 lines, -173 lines 8 comments Download
M juju/hooks/tests/test_scheduler.py View 24 chunks +363 lines, -77 lines 1 comment Download
M juju/unit/lifecycle.py View 2 chunks +4 lines, -10 lines 1 comment Download
M juju/unit/tests/test_lifecycle.py View 6 chunks +10 lines, -3 lines 4 comments Download
M juju/unit/tests/test_workflow.py View 2 chunks +2 lines, -1 line 0 comments Download
M juju/unit/workflow.py View 1 chunk +1 line, -0 lines 0 comments Download

Messages

Total messages: 3
hazmat
Please take a look.
12 years, 1 month ago (2012-03-18 17:17:05 UTC) #1
fwereade
LGTM, very nice; but it could use clearer docs/comments in one or two places. https://codereview.appspot.com/5847059/diff/1/juju/hooks/scheduler.py ...
12 years, 1 month ago (2012-03-22 10:13:16 UTC) #2
hazmat
12 years, 1 month ago (2012-03-22 13:45:50 UTC) #3
Thanks for the review!

https://codereview.appspot.com/5847059/diff/1/juju/hooks/scheduler.py
File juju/hooks/scheduler.py (right):

https://codereview.appspot.com/5847059/diff/1/juju/hooks/scheduler.py#newcode37
juju/hooks/scheduler.py:37: LIFO except for a None value which is FIFO
On 2012/03/22 10:13:16, fwereade wrote:
> other way round, surely?

indeed it is.

https://codereview.appspot.com/5847059/diff/1/juju/hooks/scheduler.py#newcode51
juju/hooks/scheduler.py:51: # XXX / Needed.
On 2012/03/22 10:13:16, fwereade wrote:
> I'm really not at all clear what's going on here.

change is None is the stop marker. if there's no one waiting, then the queue is
currently being processed, and the stop will be detected at next iteration. ie.
the stop value isn't needed because we're not sleeping on input.

https://codereview.appspot.com/5847059/diff/1/juju/hooks/scheduler.py#newcode315
juju/hooks/scheduler.py:315: #assert 0, "when does this happen"
On 2012/03/22 10:13:16, fwereade wrote:
> Hmm, I'm not sure; did you ever hit that assert 0 when it was active?

I did on tests that skip time (unit disconnected for a while), its definitely
valid, i'll remove the assert.

https://codereview.appspot.com/5847059/diff/1/juju/unit/tests/test_lifecycle.py
File juju/unit/tests/test_lifecycle.py (right):

https://codereview.appspot.com/5847059/diff/1/juju/unit/tests/test_lifecycle....
juju/unit/tests/test_lifecycle.py:930: self.capture_logging("charm.upgrade")
On 2012/03/22 10:13:16, fwereade wrote:
> unused?

if run in isolation without the test gets one of those no handler attached
messages, the capture is merely to avoid that.
Sign in to reply to this message.

Powered by Google App Engine
RSS Feeds Recent Issues | This issue
This is Rietveld f62528b