|
|
Descriptionannotation api for gui and ls
Allow for annotating domain objects with 3rd party metadata for integration
purposes.
https://code.launchpad.net/~hazmat/juju/rapi-annotation/+merge/140358
(do not edit description out of merge proposal)
Patch Set 1 #
Total comments: 21
Patch Set 2 : annotation api for gui and ls #Patch Set 3 : annotation api for gui and ls #
MessagesTotal messages: 10
Please take a look.
Sign in to reply to this message.
Hi Kapil. This will be great to have, thank you. All my comments are questions/thoughts. Discard if they don't make sense to you. I'd like to spend more time reading annotations.py and the test_annotations.py but I ran out of time. If you haven't landed this by the time I get more free time, I'll read that some more and see if I have any more comments. Gary https://codereview.appspot.com/6936063/diff/1/juju/rapi/context.py File juju/rapi/context.py (right): https://codereview.appspot.com/6936063/diff/1/juju/rapi/context.py#newcode95 juju/rapi/context.py:95: """Annotate an entity by name. Would be nice to specify what data is (dict AIUI?) Should keys always be strings? If so, should that be documented? If not, specifying expected key types might still be nice. I'm guessing that the dict is merged with the existing data, so this is also the change annotation API. I wish that were a bit less surprising in the name. I notice that you use "set" as the verb on the AnnotationManager. I like that a lot better, FWIW (set_annotation, or, per discussion below, set_annotations). "set" conveys the semantics shown here, where "add" does not. Generally, for all of these, the fact that the API is singular confuses me. An API for add_annotation I would expect to be add_annotation(entity, name, value) or change_annotation(entity, name, map). I would call this API add_annotations(entity, map) or even change_annotations(entity, map). This "let's use plurals" comment also applies to get_annotation => get_annotations and remove_annotation => remove_annotations. https://codereview.appspot.com/6936063/diff/1/juju/rapi/context.py#newcode121 juju/rapi/context.py:121: """Retrieve annotations for an entity. by entity, I'm guessing this is name, as with add_annotation. Might as well clarify it in both doc strings (or neither, if it really should be understood?) https://codereview.appspot.com/6936063/diff/1/juju/rapi/delta.py File juju/rapi/delta.py (right): https://codereview.appspot.com/6936063/diff/1/juju/rapi/delta.py#newcode104 juju/rapi/delta.py:104: for u in status['services'][s].get('units', ()): Isn't this inner loop on units potentially problematic at large scale? I guess we can't jump to the unit somehow without iterating? I'd wonder if we could change the inner (unexposed) APIs to make this kind of iteration unnecessary. https://codereview.appspot.com/6936063/diff/1/juju/state/annotation.py File juju/state/annotation.py (right): https://codereview.appspot.com/6936063/diff/1/juju/state/annotation.py#newcode23 juju/state/annotation.py:23: Annotations are cumalative:: typo: cumulative https://codereview.appspot.com/6936063/diff/1/juju/state/annotation.py#newcode53 juju/state/annotation.py:53: Annotations are expected to be high volume of change, as such the run-on: ...volume of change. As such, the... https://codereview.appspot.com/6936063/diff/1/juju/state/annotation.py#newcode64 juju/state/annotation.py:64: gc. The gc is soley for space management. typo: solely
Sign in to reply to this message.
Hi Kapil. I got some time to review annotation.py, with a few comments, at least one of which I suspect is a bit important (the isdigit). Thanks Gary https://codereview.appspot.com/6936063/diff/1/juju/state/annotation.py File juju/state/annotation.py (right): https://codereview.appspot.com/6936063/diff/1/juju/state/annotation.py#newcod... juju/state/annotation.py:100: def _get_key(self, context, topology): Might be nice to have a docstring that clarified that context was a string or an integer. And that returning None indicates no match, maybe? https://codereview.appspot.com/6936063/diff/1/juju/state/annotation.py#newcod... juju/state/annotation.py:104: if context.isdigit() or isinstance(context, int): I suspect you want to reverse those conditions. Ints don't have an "isdigit" method. Ideally, this would suggest a test as well. https://codereview.appspot.com/6936063/diff/1/juju/state/annotation.py#newcod... juju/state/annotation.py:122: def _get_context(self, key, topology): maybe clarify in docstring that this is the inverse of _get_key? https://codereview.appspot.com/6936063/diff/1/juju/state/annotation.py#newcod... juju/state/annotation.py:140: # json with unicode keys can bypass c ext, how to delete key Is this an XXX? It sounds like "how to delete key" is an answered issue, and I don't understand what the point of the other unicode comment is, so it could either be expanded or removed. https://codereview.appspot.com/6936063/diff/1/juju/state/annotation.py#newcod... juju/state/annotation.py:146: return self._dump(store) Oh! So, this is just a big dict of all annotations, which we load and dump altogether, right? Huh. I guess this is your point in the module docstring: this is OK for our use case of annotating services, but otherwise is potentially problematic in the general case for scaling. https://codereview.appspot.com/6936063/diff/1/juju/state/annotation.py#newcod... juju/state/annotation.py:154: k = context and (yield self._verify_context(context)) So...if context is None or an empty string we just use that as the key, which will return an empty dict. I don't really see the point, but I don't see a problem either. https://codereview.appspot.com/6936063/diff/1/juju/state/annotation.py#newcod... juju/state/annotation.py:208: waste = set(used_keys) - set(valid_keys) I can see a couple of ways to make this a bit more efficient, but there's probably not much point. The kinds of things I have in mind: valid_keys could be a set to begin with. You could just define waste as set(store) - valid_keys and eliminate an intermediate data structure (used_keys). Probably not worth it. If you like it, take it, and otherwise ignore it. https://codereview.appspot.com/6936063/diff/1/juju/state/annotation.py#newcod... juju/state/annotation.py:216: return AnnotationStream(self._client, self) We are always making one...interesting. I guess you are thinking of it like __iter__. Why didn't you just call this method __iter__, in fact? https://codereview.appspot.com/6936063/diff/1/juju/state/annotation.py#newcod... juju/state/annotation.py:245: # s-a added, annotated, s-a removed, s-a added.. in the span "s-a"? service A? https://codereview.appspot.com/6936063/diff/1/juju/state/annotation.py#newcod... juju/state/annotation.py:250: # label/name. we must then delete or update delta stream to Couldn't a consumer know to delete annotations for a context if the context is removed? That doesn't seem unreasonable on the face of it, as long as it is documented.
Sign in to reply to this message.
On 2012/12/18 14:42:58, gary.poster wrote: > Hi Kapil. This will be great to have, thank you. > > All my comments are questions/thoughts. Discard if they don't make sense to > you. > > I'd like to spend more time reading annotations.py and the test_annotations.py > but I ran out of time. If you haven't landed this by the time I get more free > time, I'll read that some more and see if I have any more comments. > > Gary > > https://codereview.appspot.com/6936063/diff/1/juju/rapi/context.py > File juju/rapi/context.py (right): > > https://codereview.appspot.com/6936063/diff/1/juju/rapi/context.py#newcode95 > juju/rapi/context.py:95: """Annotate an entity by name. > Would be nice to specify what data is (dict AIUI?) > > Should keys always be strings? If so, should that be documented? If not, > specifying expected key types might still be nice. at the moments its free form, reasonably keys should always be strings. i'll add some enforcement of that (in annotation manager). we might end up restricting to known keys (ls & gui) but i prefer to key it open for exploration for now. there is enforcement that data is a dict at the annotation manager level. > > I'm guessing that the dict is merged with the existing data, so this is also the > change annotation API. I wish that were a bit less surprising in the name. I > notice that you use "set" as the verb on the AnnotationManager. I like that a > lot better, FWIW (set_annotation, or, per discussion below, set_annotations). > "set" conveys the semantics shown here, where "add" does not. > naming is hard :-) that's interesting, i changed the name at the exposed api/rapi level because i had a different i went with add because its clearly cumulative.. while set might be taken for an equality/total value. from our irc discussion, i'll go with update_annotations. its a little ambigious that you update an annotation to create it, but in practice i think it should be okay. > Generally, for all of these, the fact that the API is singular confuses me. An > API for add_annotation I would expect to be add_annotation(entity, name, value) > or change_annotation(entity, name, map). i eschewed single value forms to avoid round trips. i originally had a notion of entity, namespace, map but i realized its just adding complexity when simplicity would serve. our annotations aren't private to the creator ala zope.annotation, we have cross annotation usage so a shared space serves nicely, and makes the consumption a bit simpler as well. > I would call this API > add_annotations(entity, map) or even change_annotations(entity, map). This > "let's use plurals" comment also applies to get_annotation => get_annotations > and remove_annotation => remove_annotations. > plurals +1 > https://codereview.appspot.com/6936063/diff/1/juju/rapi/context.py#newcode121 > juju/rapi/context.py:121: """Retrieve annotations for an entity. > by entity, I'm guessing this is name, as with add_annotation. Might as well > clarify it in both doc strings (or neither, if it really should be understood?) sounds good re doc string w/ entity def. > > https://codereview.appspot.com/6936063/diff/1/juju/rapi/delta.py > File juju/rapi/delta.py (right): > > https://codereview.appspot.com/6936063/diff/1/juju/rapi/delta.py#newcode104 > juju/rapi/delta.py:104: for u in status['services'][s].get('units', ()): > Isn't this inner loop on units potentially problematic at large scale? not really, its a fairly tight comparision, simple iteration around large sets isn't problematic if the work is miniscule. >>> import time >>> def x(): ... a = {'x': 1} ... a = {'x': 1} ... b = {'z': 2} ... c = {'x': 1} ... t = time.time() ... for i in range(100000): ... d = {} ... if a == c: ... d.update(a) ... if c == b: ... d.update(b) ... print "ran in ", time.time() - t ... >>> x() ran in 0.0492899417877 for any environment this size or larger, the real time costs to deltas aren't in the delta but the status which would need to become computed incrementally/streamed or push/heartbeat based from the units. > I guess > we can't jump to the unit somehow without iterating? I'd wonder if we could > change the inner (unexposed) APIs to make this kind of iteration unnecessary. yeah.. its just not a significant issue for us imo. there are some optimizations around delta transfer to the gui that are worth exploring. upfront send of services with unit counts so we don't need units streamed to get accurate svc size counts, and also on remove service, omitting unit removal so the client can just be intelligent amount removing the remainder. > > https://codereview.appspot.com/6936063/diff/1/juju/state/annotation.py > File juju/state/annotation.py (right): > > https://codereview.appspot.com/6936063/diff/1/juju/state/annotation.py#newcode23 > juju/state/annotation.py:23: Annotations are cumalative:: > typo: cumulative > > https://codereview.appspot.com/6936063/diff/1/juju/state/annotation.py#newcode53 > juju/state/annotation.py:53: Annotations are expected to be high volume of > change, as such the > run-on: ...volume of change. As such, the... > they used to call me marathon man ;-) > https://codereview.appspot.com/6936063/diff/1/juju/state/annotation.py#newcode64 > juju/state/annotation.py:64: gc. The gc is soley for space management. > typo: solely thanks.
Sign in to reply to this message.
On 2012/12/18 16:43:15, gary.poster wrote: > Hi Kapil. I got some time to review annotation.py, with a few comments, at > least one of which I suspect is a bit important (the isdigit). > > Thanks > > Gary > > https://codereview.appspot.com/6936063/diff/1/juju/state/annotation.py > File juju/state/annotation.py (right): > > https://codereview.appspot.com/6936063/diff/1/juju/state/annotation.py#newcod... > juju/state/annotation.py:100: def _get_key(self, context, topology): > Might be nice to have a docstring that clarified that context was a string or an > integer. And that returning None indicates no match, maybe? sounds reasonable. > https://codereview.appspot.com/6936063/diff/1/juju/state/annotation.py#newcod... > juju/state/annotation.py:104: if context.isdigit() or isinstance(context, int): > I suspect you want to reverse those conditions. Ints don't have an "isdigit" > method. > > Ideally, this would suggest a test as well. > nice catch, thanks! > https://codereview.appspot.com/6936063/diff/1/juju/state/annotation.py#newcod... > juju/state/annotation.py:122: def _get_context(self, key, topology): > maybe clarify in docstring that this is the inverse of _get_key? > sounds good. > https://codereview.appspot.com/6936063/diff/1/juju/state/annotation.py#newcod... > juju/state/annotation.py:140: # json with unicode keys can bypass c ext, how to > delete key > Is this an XXX? It sounds like "how to delete key" is an answered issue, and I > don't understand what the point of the other unicode comment is, so it could > either be expanded or removed. > yeah.. some of this was written science fiction style, and filled in later, that how to delete key comment should get yanked. the unicode comment was referencing a concern i had about being able to ensure the c json encoder for encode/decode speed but looking through json/(encoder/decoder) it seems like its not relevant, i also obsolete. > https://codereview.appspot.com/6936063/diff/1/juju/state/annotation.py#newcod... > juju/state/annotation.py:146: return self._dump(store) > Oh! So, this is just a big dict of all annotations, which we load and dump > altogether, right? Huh. I guess this is your point in the module docstring: > this is OK for our use case of annotating services, but otherwise is potentially > problematic in the general case for scaling. > yeah.. the upper bound in zk is roughly 1mb for the node data, compressed that gets us to roughly 20k entities with annotations which is fine. if need be its fairly easy to use a consistent hashing function on the context/keyspace to partition annotations across multiple node... there are bigger fish to fry i think though. fwiw here's a sample script i used for a sanity check (and come up with 20k number) using random data for the annotations, but constant key set. http://paste.ubuntu.com/1448006/ > https://codereview.appspot.com/6936063/diff/1/juju/state/annotation.py#newcod... > juju/state/annotation.py:154: k = context and (yield > self._verify_context(context)) > So...if context is None or an empty string we just use that as the key, which > will return an empty dict. I don't really see the point, but I don't see a > problem either. the logic here is overly sophisticated. if context exists gets its key and use it.. if it doesn't exist, then get all annotations, and convert them back to context ids. > > https://codereview.appspot.com/6936063/diff/1/juju/state/annotation.py#newcod... > juju/state/annotation.py:208: waste = set(used_keys) - set(valid_keys) > I can see a couple of ways to make this a bit more efficient, but there's > probably not much point. The kinds of things I have in mind: valid_keys could > be a set to begin with. You could just define waste as set(store) - valid_keys > and eliminate an intermediate data structure (used_keys). Probably not worth > it. If you like it, take it, and otherwise ignore it. i tend to avoid micro optimizations till they've been profiled as a problem. early optimization and all that. we're also playing fairly fast and loose with memory here but we're governed by the zk node size atm (1mb compressed). > > https://codereview.appspot.com/6936063/diff/1/juju/state/annotation.py#newcod... > juju/state/annotation.py:216: return AnnotationStream(self._client, self) > We are always making one...interesting. I guess you are thinking of it like > __iter__. Why didn't you just call this method __iter__, in fact? > under methods don't play nice with deferreds. there's only 1 stream per api server. the api creates new annotation managers on demand for api invocation, but the stream is only used by the persistent delta stream manager which is a singleton for the process. > https://codereview.appspot.com/6936063/diff/1/juju/state/annotation.py#newcod... > juju/state/annotation.py:245: # s-a added, annotated, s-a removed, s-a added.. > in the span > "s-a"? service A? yeah.. service a shorthand. > > https://codereview.appspot.com/6936063/diff/1/juju/state/annotation.py#newcod... > juju/state/annotation.py:250: # label/name. we must then delete or update delta > stream to > Couldn't a consumer know to delete annotations for a context if the context is > removed? That doesn't seem unreasonable on the face of it, as long as it is > documented. the consumer doesn't potentially even detect the deletion/context removal. -> add service A (internal id service-01) -> annotate service A with 'foo' <- delta stream pump (A w/ foo) -> remove service A -> add service A (internal id service-02) <- delta stream pump (remove unit a/0, add unit a/1, ditto for relations.) in this case the delta stream consumer can't detect the service removal. there's some ambiguity here, but the problem is in the delta stream pump since its relying on status output, and status uses service names which are the end user labels for the service instead of the internal ids. the annotation manager does the right thing, the annotation stream has no content because the original context is dead, and a retrieval against service-a would return no annotations. There's some work to be done here on the delta stream manager to take this case into account. I looked at it but its a bigger task, which i'd rather tackle as i work towards some other optimizations around the delta stream (svcs with indexes, send only svc delete w/o explicit kill for all its units). though in part that depends on where things go. thanks again for review, much appreciated.
Sign in to reply to this message.
Quick review, this looks pretty good and like you said, given the trajectory of the code base I think this is basically enough for now. I'd like your thoughts on moving how /annotations are stored (below), and if its worth observing annotation changes apart from the normal delta, but I'm basically happy to move forward with this now. https://codereview.appspot.com/6936063/diff/1/juju/rapi/context.py File juju/rapi/context.py (right): https://codereview.appspot.com/6936063/diff/1/juju/rapi/context.py#newcode95 juju/rapi/context.py:95: """Annotate an entity by name. On 2012/12/18 14:42:58, gary.poster wrote: > Would be nice to specify what data is (dict AIUI?) > > Should keys always be strings? If so, should that be documented? If not, > specifying expected key types might still be nice. > > I'm guessing that the dict is merged with the existing data, so this is also the > change annotation API. I wish that were a bit less surprising in the name. I > notice that you use "set" as the verb on the AnnotationManager. I like that a > lot better, FWIW (set_annotation, or, per discussion below, set_annotations). > "set" conveys the semantics shown here, where "add" does not. > > Generally, for all of these, the fact that the API is singular confuses me. An > API for add_annotation I would expect to be add_annotation(entity, name, value) > or change_annotation(entity, name, map). I would call this API > add_annotations(entity, map) or even change_annotations(entity, map). This > "let's use plurals" comment also applies to get_annotation => get_annotations > and remove_annotation => remove_annotations. I agree with this comment. Context here is a key entry point into the API and should be outlined in more detail. https://codereview.appspot.com/6936063/diff/1/juju/state/annotation.py File juju/state/annotation.py (right): https://codereview.appspot.com/6936063/diff/1/juju/state/annotation.py#newcod... juju/state/annotation.py:100: def _get_key(self, context, topology): Will we ever want to annotate relations directly? Juju-Gui's positioning cases don't depend on this now, but its easy to imaging recording a path for a manually sculpted relation line. Just something to keep in mind. https://codereview.appspot.com/6936063/diff/1/juju/state/annotation.py#newcod... juju/state/annotation.py:137: assert isinstance(data, dict), "Invalid annotation" An assertion error seems a little harsh here. Client side things haven't been aggressively typechecked so its possible to pass something that json can pull out as a dict. You already have InvalidAnnotationContext, maybe this is another specific error that can be caught. https://codereview.appspot.com/6936063/diff/1/juju/state/annotation.py#newcod... juju/state/annotation.py:146: return self._dump(store) On 2012/12/18 16:43:15, gary.poster wrote: > Oh! So, this is just a big dict of all annotations, which we load and dump > altogether, right? Huh. I guess this is your point in the module docstring: > this is OK for our use case of annotating services, but otherwise is potentially > problematic in the general case for scaling. The way the .stream stuff is hooked up I'm not sure we can/should couple things like position annotations to the delta stream. The update interval is too low for a real time type of feel. We can interpolate the movement client side but I could also see this being another channel coming over the websocket. What are your thoughts on adding an /annotations child to any object with annotations, these are not directly watched but if something uses the annotation manager they can still easily attach them a proper way. That might also simplify the gc case (or make it more complex as ZK gc isn't well done in pyjuju anyway). Just food for thought. https://codereview.appspot.com/6936063/diff/1/juju/state/annotation.py#newcod... juju/state/annotation.py:220: If AnnotationManager retained references to streams it created it could mark them dirty when set/remove was used which might allow a fast path through next.
Sign in to reply to this message.
On 2012/12/19 14:44:36, benjamin.saller wrote: > Quick review, this looks pretty good and like you said, given the trajectory of > the code base I think this is basically enough for now. I'd like your thoughts > on moving how /annotations are stored (below), and if its worth observing > annotation changes apart from the normal delta, but I'm basically happy to move > forward with this now. > > https://codereview.appspot.com/6936063/diff/1/juju/rapi/context.py > File juju/rapi/context.py (right): > > https://codereview.appspot.com/6936063/diff/1/juju/rapi/context.py#newcode95 > juju/rapi/context.py:95: """Annotate an entity by name. > On 2012/12/18 14:42:58, gary.poster wrote: > > Would be nice to specify what data is (dict AIUI?) > > > > Should keys always be strings? If so, should that be documented? If not, > > specifying expected key types might still be nice. > > > > I'm guessing that the dict is merged with the existing data, so this is also > the > > change annotation API. I wish that were a bit less surprising in the name. I > > notice that you use "set" as the verb on the AnnotationManager. I like that a > > lot better, FWIW (set_annotation, or, per discussion below, set_annotations). > > "set" conveys the semantics shown here, where "add" does not. > > > > Generally, for all of these, the fact that the API is singular confuses me. > An > > API for add_annotation I would expect to be add_annotation(entity, name, > value) > > or change_annotation(entity, name, map). I would call this API > > add_annotations(entity, map) or even change_annotations(entity, map). This > > "let's use plurals" comment also applies to get_annotation => get_annotations > > and remove_annotation => remove_annotations. > > I agree with this comment. Context here is a key entry point into the API and > should be outlined in more detail. > > https://codereview.appspot.com/6936063/diff/1/juju/state/annotation.py > File juju/state/annotation.py (right): > > https://codereview.appspot.com/6936063/diff/1/juju/state/annotation.py#newcod... > juju/state/annotation.py:100: def _get_key(self, context, topology): > Will we ever want to annotate relations directly? Juju-Gui's positioning cases > don't depend on this now, but its easy to imaging recording a path for a > manually sculpted relation line. Just something to keep in mind. easy enough to add when we need it, till then yagni. > > https://codereview.appspot.com/6936063/diff/1/juju/state/annotation.py#newcod... > juju/state/annotation.py:137: assert isinstance(data, dict), "Invalid > annotation" > An assertion error seems a little harsh here. Client side things haven't been > aggressively typechecked so its possible to pass something that json can pull > out as a dict. > > You already have InvalidAnnotationContext, maybe this is another specific error > that can be caught. switched to stateerror w/ msg > > https://codereview.appspot.com/6936063/diff/1/juju/state/annotation.py#newcod... > juju/state/annotation.py:146: return self._dump(store) > On 2012/12/18 16:43:15, gary.poster wrote: > > Oh! So, this is just a big dict of all annotations, which we load and dump > > altogether, right? Huh. I guess this is your point in the module docstring: > > this is OK for our use case of annotating services, but otherwise is > potentially > > problematic in the general case for scaling. > > The way the .stream stuff is hooked up I'm not sure we can/should couple things > like position annotations to the delta stream. The update interval is too low > for a real time type of feel. We can interpolate the movement client side but I > could also see this being another channel coming over the websocket. > if it were a game where position info was critical i would care more about the latency, but position is secondary information to the content on display not primary. the notion of real-time here can be evaluated again based on feedback, but i don't think its a common enough concern at the moment to optimize/implement separately for. also for bulk mods, the buffer serves as a useful guard against jitter storms in the client. easy enough to tweak this behavior as needed, and push the annotation stream to a separate timer. > What are your thoughts on adding an /annotations child to any object with > annotations, these are not directly watched but if something uses the annotation > manager they can still easily attach them a proper way. That might also simplify > the gc case (or make it more complex as ZK gc isn't well done in pyjuju anyway). > Just food for thought. > i did originally implement with a watch, but its problematic due to the rate of churn expected between bulk operations on units/machines from landscape and rearrangement storm on services. per entity annotations would scale poorly to the target of 10k, and would involve a large amount of watches in a single client session. also zk doesn't allow with deletion of nodes with children. > https://codereview.appspot.com/6936063/diff/1/juju/state/annotation.py#newcod... > juju/state/annotation.py:220: > If AnnotationManager retained references to streams it created it could mark > them dirty when set/remove was used which might allow a fast path through next doesn't work if there is more than one server, except as a latency optimization in addition to the existing impl which is just adding complexity given previous latency comments. we might need some tracking of ops by self to preclude notices, but thats an optimization for later. for now a simple impl keeps us moving forward, we can revisit based on need.
Sign in to reply to this message.
Please take a look.
Sign in to reply to this message.
Land as is. Looks good to me. Thank you. Gary
Sign in to reply to this message.
|