state: relation scope event coalescence
So, this is an interesting branch, and it surely won't land as it is now.
It *started* with lp:1192433; I think it's clear that dying units should
advertise their departure from relations before they actually, uh, depart.
The obvious answer seemed to be to allow them to write it in the relevant
scope doc, because that's watched anyway; but that imposes a bit of a
change on the watcher, because now it has to hit the DB to verify the
nature of an event.
Doing this 100 times for 100 changes in the same few ms seemed a bit crazy,
but I might have let that slide had I not started examining the behaviour of
the watcher under real usage (as opposed to how we unit test them). And, er,
it's a bit alarming.
Watchers are all unit tested like this:
* cause changes
* force sync
* read output channel
...and in that context they work perfectly: all events are coalesced nicely,
and everyone's happy. But in *real* usage the story's a bit different. Those
same three things are happening, but they're all happening concurrently and
independently. And so, in practice, anything using a watcher live is almost
continuously reading from its out chan; but the syncs happen every 5 seconds
and are followed by a fast -- but stuttering, not continuous -- stream of
events on the in chan.
The upshot is that we barely coalesce at all: 100 scope changes delivered in
response to a single sync will be received by the ultimate client as at
least 90 separate sends on a continuously-read Changes channel.
Soon, this won't matter much, because the watchers are exposed over the API
via the Next call, and we won't need to read from Changes until we get a
Next request; so the coalescence *will* be induced. But this then constrains
us to use the Next approach, and the Next approach itself will not
significantly help coalescence in situations where there's an additional
event-interpretation layer client-side (in particular uniter.filter and
the HookQueue bits *do* read as fast as possible and massage the watcher
data some more before passing it on).
So... there's some justification for fixing coalescence behaviour in general;
and doing so in this case didn't seem optional, because we weren't doing it
properly in the first place (units can be seen to enter and leave in the same
event, when such changes should be stripped entirely). And,
better yet, doing so allowed me to batch up the new db calls (FWIW, I think
LifecycleWatcher would really benefit from this too).
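The stripping rule is roughly this (a sketch with a hypothetical event type; the real event struct and merge live in state):

```go
package main

import "fmt"

// RelationScopeChange stands in for the real scope event: units that
// entered and units that left since the last delivered event.
type RelationScopeChange struct {
	Entered []string
	Left    []string
}

// coalesceScope merges two consecutive changes, cancelling out any
// unit that both enters and leaves (or leaves and re-enters) within
// the merged window, so clients never see a net no-op transition.
func coalesceScope(a, b RelationScopeChange) RelationScopeChange {
	entered := map[string]bool{}
	left := map[string]bool{}
	apply := func(ch RelationScopeChange) {
		for _, u := range ch.Entered {
			if left[u] {
				delete(left, u) // left then re-entered: net no-op
			} else {
				entered[u] = true
			}
		}
		for _, u := range ch.Left {
			if entered[u] {
				delete(entered, u) // entered then left: net no-op
			} else {
				left[u] = true
			}
		}
	}
	apply(a)
	apply(b)
	var out RelationScopeChange
	for u := range entered {
		out.Entered = append(out.Entered, u)
	}
	for u := range left {
		out.Left = append(out.Left, u)
	}
	return out
}

func main() {
	merged := coalesceScope(
		RelationScopeChange{Entered: []string{"wp/0", "wp/1"}},
		RelationScopeChange{Left: []string{"wp/1"}},
	)
	fmt.Printf("entered=%v left=%v\n", merged.Entered, merged.Left)
	// prints "entered=[wp/0] left=[]"
}
```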
But then I had to figure out how to test it. And I think I managed that, but
there are a number of problems with the test; primarily that it takes a good
20s to run in the best case, and there appears to be an occasional problem
with the scope-manipulation workers, where they just don't get run for a
while (and since each starts 200 scope-leaving goroutines sharing the same
state conn, that's maybe not so surprising). When this happens, the test
takes about 60s on my machine (but still passes, yay).
And of course we can't go around burning 20-60s on a single unit test in state;
and I don't really want to futz with the watcher timings, because I'm
trying to verify real behaviour.
How would people feel about something like a testing.SlowSuite function
that only called gocheck.Suite if some flag like -juju.testing.slow were
passed on the command line? go-bot (or perhaps the putative live-tests CI
stuff) could certainly do that, and humans wouldn't need to be bothered by
it at all if they didn't want to be. Sensible?
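For concreteness, one possible shape (a sketch only: the flag name is the one proposed above, and the register callback stands in for gocheck.Suite):

```go
package main

import (
	"fmt"
	"os"
)

// slowEnabled reports whether the proposed -juju.testing.slow flag was
// passed.  Inspecting the args directly, rather than going through the
// flag package, sidesteps an init-order wrinkle: suites are registered
// at package init time, before the test binary parses its flags.
func slowEnabled(args []string) bool {
	for _, arg := range args {
		switch arg {
		case "-juju.testing.slow", "--juju.testing.slow", "-juju.testing.slow=true":
			return true
		}
	}
	return false
}

// SlowSuite registers suite only when slow tests are enabled; in the
// real thing register would be gocheck.Suite.
func SlowSuite(register func(interface{}) interface{}, suite interface{}) interface{} {
	if !slowEnabled(os.Args[1:]) {
		return nil
	}
	return register(suite)
}

func main() {
	registered := false
	fake := func(s interface{}) interface{} { registered = true; return s }
	SlowSuite(fake, struct{}{})
	fmt.Println("slow suite registered:", registered)
}
```

Humans running plain `go test` never see the slow suites; go-bot or the live-tests CI just adds the flag.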
https://code.launchpad.net/~fwereade/juju-core/interesting-watcher-derail/+merge/171069
(do not edit description out of merge proposal)