|
|
Created:
7 years, 10 months ago by toshi Modified:
7 years, 9 months ago CC:
dynamorio-devs_googlegroups.com Visibility:
Public. |
DescriptionCommit log for first patchset:
---------------
i#513 drx_buf Part 1: Adds drx_buf code and tests
Constructs a convenience API for initializing and filling three
differentt kinds of buffers: a fast and a slow circular buffer, and a
"faulting" buffer which notifies the client when the buffer fills.
Currently has only been tested for x86 and x86_64. Compilation on ARM
has been verified via cross-compiler. We disable tests for drx_buf on
ARM and AARCH64.
---------------
Patch Set 1 #
Total comments: 99
Patch Set 2 : Addressed reviewer concerns #
Total comments: 11
Patch Set 3 : Addressed reviewer concerns #
Total comments: 76
Patch Set 4 : Addressed reviewer concerns #Patch Set 5 : Addressed reviewer concerns #
Total comments: 24
Patch Set 6 : Addressed reviewer concerns, adds ARM support #
Total comments: 3
Patch Set 7 : Committed #
MessagesTotal messages: 45
spelling: "differentt"
Sign in to reply to this message.
Should I fix the spelling error now or should I wait until you're done with this iteration of the review? Thanks! On Thu, Jul 7, 2016 at 9:18 PM, <bruening@google.com> wrote: > spelling: "differentt" > > https://codereview.appspot.com/297700043/ >
Sign in to reply to this message.
It can wait, just haven't gotten to the code yet. On Fri, Jul 8, 2016 at 5:16 PM, Toshi Piazza <dsotmman@gmail.com> wrote: > Should I fix the spelling error now or should I wait until you're done > with this iteration of the review? > > Thanks! > > On Thu, Jul 7, 2016 at 9:18 PM, <bruening@google.com> wrote: > >> spelling: "differentt" >> >> https://codereview.appspot.com/297700043/ >> > >
Sign in to reply to this message.
Looks pretty good, especially the thorough tests. I have a lot of minor comments mostly about docs and comments. However, there are two issues in drx.c that seem serious: the cfree, and using the count as the index. https://codereview.appspot.com/297700043/diff/1/ext/drx/drx.dox File ext/drx/drx.dox (right): https://codereview.appspot.com/297700043/diff/1/ext/drx/drx.dox#newcode74 ext/drx/drx.dox:74: \section sec_drx_buf \p DynamoRIO eXtension buffer utilities How about "Buffer Filling API" or "Trace Buffer API" instead of "Buffer Utilities". inconsistent cap: Buffer Utilities "DynamoRIO eXtension" should be implied so prob best to remove and match the other sections. https://codereview.appspot.com/297700043/diff/1/ext/drx/drx.dox#newcode86 ext/drx/drx.dox:86: a crash at the risk of some data loss. The main implementation for the I'd replace the first sentence here with one that talks about this being for a use case that only wants to remember the most recent portion of a sequence of events instead of recording an entire trace of events. (It seems weird to talk about it as preventing a crash or to think of it as just some data loss...) https://codereview.appspot.com/297700043/diff/1/ext/drx/drx.dox#newcode89 ext/drx/drx.dox:89: the fault and reset the buffer pointer. I don't think there's any need to expose the implementation in the documentation like this: we may want to change the implementation later. It may be worth mentioning, maybe up top or in a special section, that faults may be used to implement detection of full buffers, so that users are not surprised if they see extra faults in gdb or sthg. https://codereview.appspot.com/297700043/diff/1/ext/drx/drx.dox#newcode93 ext/drx/drx.dox:93: The only special case mentioned in \ref sec_drx_buf_circular is a buffer ? I don't see this being mentioned? https://codereview.appspot.com/297700043/diff/1/ext/drx/drx.dox#newcode99 ext/drx/drx.dox:99: \p bbbuf.c, where we only write \p app_pc sized values. Since the buffer s/\p bbuf.c/the sample client \p bbbuf (see \ref API_samples)/ https://codereview.appspot.com/297700043/diff/1/ext/drx/drx.dox#newcode103 ext/drx/drx.dox:103: \section sec_drx_buf_faulting Faulting Buffer Again, let's not expose the implementation: there's no need to force ourselves to always use a fault (e.g., it could be faster to explicitly check for the end of a tiny buffer that fills up so frequently the overhead of a fault is too much). Can we have a different name for this? I would suggest putting this first and calling it a Trace Buffer. This is the most common usage scenario I would think. https://codereview.appspot.com/297700043/diff/1/ext/drx/drx.dox#newcode106 ext/drx/drx.dox:106: up. However, its main use case is to allow the user to dump the buffer suggest: Make this more general than "dump": the user is notified and can write the contents to disk or to a pipe or whatnot. https://codereview.appspot.com/297700043/diff/1/ext/drx/drx.dox#newcode109 ext/drx/drx.dox:109: such as in \p bbbuf.c where we only write \p app_pc sized values. If the Ditto https://codereview.appspot.com/297700043/diff/1/ext/drx/drx.dox#newcode111 ext/drx/drx.dox:111: writing the element at the highest offset in the struct first, or if that I know what you mean, but the problem being solved may not be clear: perhaps another sentence or two elaborating on the problem talking about writing multiple fields with the buffer filling up in the middle. https://codereview.appspot.com/297700043/diff/1/ext/drx/drx.dox#newcode114 ext/drx/drx.dox:114: I think a section on how to write to the buffer, with sample code, is needed: show the sequence of API calls to make to write several data items and then update the buffer pointer, and how using offsets for subsequent fields with a single pointer update is the most efficient method. https://codereview.appspot.com/297700043/diff/1/ext/drx/drx.dox#newcode114 ext/drx/drx.dox:114: Also talk about how the user can write his/her own stores into the buffer and that the provided store routines are for convenience only -- but if the user does write them, they need to have xl8 set. https://codereview.appspot.com/297700043/diff/1/ext/drx/drx.h File ext/drx/drx.h (right): https://codereview.appspot.com/297700043/diff/1/ext/drx/drx.h#newcode262 ext/drx/drx.h:262: * DRX BUFFER LIBRARY suggest: BUFFER FILLING LIBRARY https://codereview.appspot.com/297700043/diff/1/ext/drx/drx.h#newcode267 ext/drx/drx.h:267: * called when the buffer has been filled. Prob worth explaining that the buffer data is [buf_base..buf_ceil) (i.e., making it clear both that the end is open-ended and that the size may not be the same every time (e.g., at thread exit) https://codereview.appspot.com/297700043/diff/1/ext/drx/drx.h#newcode269 ext/drx/drx.h:269: typedef void (*drx_buf_fill_cb_t)(void *drcontext, void *buf_base, void *buf_ceil); suggest: s/fill/full/ https://codereview.appspot.com/297700043/diff/1/ext/drx/drx.h#newcode269 ext/drx/drx.h:269: typedef void (*drx_buf_fill_cb_t)(void *drcontext, void *buf_base, void *buf_ceil); suggest: s/ceil/end/ (just seems more typical), or, beter, pass the size (which can handle an address at the very top of the address space, where the open-ended endpoint overflows) https://codereview.appspot.com/297700043/diff/1/ext/drx/drx.h#newcode276 ext/drx/drx.h:276: typedef void (*drx_buf_init_cb_t)(void *drcontext); Seems nice to pass in the buffer, right? https://codereview.appspot.com/297700043/diff/1/ext/drx/drx.h#newcode280 ext/drx/drx.h:280: * before the final drx_buf_fill_cb_t has been called back, if applicable. Hmm, I would expect it to be called *after* the final fill is called (assuming the final fill is called regardless of whether the buffer is actually full), because if there are other resources tied to the buffer (like a file) they cannot be freed until *after* the final fill operation. What is the reason for the order you describe? https://codereview.appspot.com/297700043/diff/1/ext/drx/drx.h#newcode319 ext/drx/drx.h:319: drx_buf_init_circular_buffer(int buffer_size, s/int/size_t/ https://codereview.appspot.com/297700043/diff/1/ext/drx/drx.h#newcode333 ext/drx/drx.h:333: drx_buf_init_faulting_buffer(int buffer_size, s/int/size_t/ https://codereview.appspot.com/297700043/diff/1/ext/drx/drx.h#newcode340 ext/drx/drx.h:340: * Cleans up the dr_buf extension. This sounds like a global cleanup: could you reword this to just talk about cleaning up the buffer and not the extension? I also wonder if a different name pair would be more appropriate: maybe "create" and "destroy" (or "free") instead of "init" and "exit"? https://codereview.appspot.com/297700043/diff/1/ext/drx/drx.h#newcode351 ext/drx/drx.h:351: * Loads the address of the TLS buffer at \p where into \p buf_ptr. Better to explicitly say "Inserts instructions to ..." https://codereview.appspot.com/297700043/diff/1/ext/drx/drx.h#newcode361 ext/drx/drx.h:361: * Updates the buffer pointer to accomodate the writes that occurred since Better to explicitly say "Inserts instructions to ..." Ditto for routines below https://codereview.appspot.com/297700043/diff/1/ext/drx/drx.h#newcode361 ext/drx/drx.h:361: * Updates the buffer pointer to accomodate the writes that occurred since spelling: accommodate https://codereview.appspot.com/297700043/diff/1/ext/drx/drx.h#newcode362 ext/drx/drx.h:362: * the last time the base pointer was loaded. This sounds like the library is tracking the number of writes -- make it clear that this simply adds \p stride, right? https://codereview.appspot.com/297700043/diff/1/ext/drx/drx.h#newcode452 ext/drx/drx.h:452: drx_buf_get_buffer_base(void *drcontext, drx_buf_t *buf); Should this also return the buffer size (or have a separate routine to get the size)? https://codereview.appspot.com/297700043/diff/1/ext/drx/drx_buf.c File ext/drx/drx_buf.c (right): https://codereview.appspot.com/297700043/diff/1/ext/drx/drx_buf.c#newcode36 ext/drx/drx_buf.c:36: #include "dr_events.h" This should not be necessary (comes from dr_api.h) https://codereview.appspot.com/297700043/diff/1/ext/drx/drx_buf.c#newcode56 ext/drx/drx_buf.c:56: #define ALIGN_FORWARD(x, alignment) \ include ext_utils.h instead https://codereview.appspot.com/297700043/diff/1/ext/drx/drx_buf.c#newcode67 ext/drx/drx_buf.c:67: } drx_buf_type; style: _t https://codereview.appspot.com/297700043/diff/1/ext/drx/drx_buf.c#newcode80 ext/drx/drx_buf.c:80: int tls_slot; /* corresponds to the index into the clients vector */ See below: I think this has to be split into two https://codereview.appspot.com/297700043/diff/1/ext/drx/drx_buf.c#newcode82 ext/drx/drx_buf.c:82: drx_buf_fill_cb_t fill; "full" seems much more descriptive to me https://codereview.appspot.com/297700043/diff/1/ext/drx/drx_buf.c#newcode91 ext/drx/drx_buf.c:91: static uint tls_offs; See below: I believe this needs to be per-buf https://codereview.appspot.com/297700043/diff/1/ext/drx/drx_buf.c#newcode178 ext/drx/drx_buf.c:178: return NULL; Hmm, it would be ok to just require drx_init and do all this there? Tradeoff I suppose. https://codereview.appspot.com/297700043/diff/1/ext/drx/drx_buf.c#newcode192 ext/drx/drx_buf.c:192: init_drmgr_exception = true; Ditto. https://codereview.appspot.com/297700043/diff/1/ext/drx/drx_buf.c#newcode198 ext/drx/drx_buf.c:198: && !dr_raw_tls_cfree(tls_offs, count)) This seems problematic: there's already emitted code in the cache that is referencing the prior slots. I think this has to keep the old ones and just ask for one more for the new one. This means the tls_offs needs to be per-buf and not global, and is decoupled from the vector index. https://codereview.appspot.com/297700043/diff/1/ext/drx/drx_buf.c#newcode217 ext/drx/drx_buf.c:217: drvector_set_entry(&clients, count, new_client); This also seems problematic: let's say I create 2 buffers, A and B, in that order. A has vector index 0, B has 1. I then free buffer A. I subsequently create a new buffer C -- but you will clobber B's vector index 1, right? https://codereview.appspot.com/297700043/diff/1/ext/drx/drx_buf.c#newcode234 ext/drx/drx_buf.c:234: * tls_slot is identical to its index in the vector. I think this will no longer hold https://codereview.appspot.com/297700043/diff/1/ext/drx/drx_buf.c#newcode366 ext/drx/drx_buf.c:366: * buf_size bytes useable before we hit the ro page. usable https://codereview.appspot.com/297700043/diff/1/ext/drx/drx_buf.c#newcode399 ext/drx/drx_buf.c:399: buf_ptr, scratch, stride); style violation: {} around multi-line body. Ditto below. https://codereview.appspot.com/297700043/diff/1/ext/drx/drx_buf.c#newcode488 ext/drx/drx_buf.c:488: MINSERT(ilist, where, instr); Hmm, I was going to ask why we needed to provide these store API routines, but I suppose the answer is to ensure they are set up properly for fault handling? https://codereview.appspot.com/297700043/diff/1/ext/drx/drx_buf.c#newcode616 ext/drx/drx_buf.c:616: dr_sigsegv_event(void *drcontext, byte *target, The name is a little confusing: suggest "process_fault" or "fault_event_helper" or sthg, since it's cross-platform and it's not the main event callback https://codereview.appspot.com/297700043/diff/1/ext/drx/drx_buf.c#newcode637 ext/drx/drx_buf.c:637: data->cli_base, buf); style: {} https://codereview.appspot.com/297700043/diff/1/suite/tests/client-interface/... File suite/tests/client-interface/drx_buf-test.c (right): https://codereview.appspot.com/297700043/diff/1/suite/tests/client-interface/... suite/tests/client-interface/drx_buf-test.c:125: /* try to cause a segfault and make sure it didn't trigger the buffer to dump */ Good to see a thorough test! https://codereview.appspot.com/297700043/diff/1/suite/tests/client-interface/... File suite/tests/client-interface/drx_buf-test.dll.c (right): https://codereview.appspot.com/297700043/diff/1/suite/tests/client-interface/... suite/tests/client-interface/drx_buf-test.dll.c:239: scratch, opnd_create_immed_int(0x4443, OPSZ_2), 2); Line too long
Sign in to reply to this message.
https://codereview.appspot.com/297700043/diff/1/ext/drx/drx.dox File ext/drx/drx.dox (right): https://codereview.appspot.com/297700043/diff/1/ext/drx/drx.dox#newcode76 ext/drx/drx.dox:76: The \p drx library also demonstrates a minimalistic buffer API. Its API is Talk about these being buffers containing traces of data gathered during instrumentation, such as memory traces, instruction traces, etc. https://codereview.appspot.com/297700043/diff/1/ext/drx/drx.dox#newcode114 ext/drx/drx.dox:114: The docs should also state that per-thread buffers are used.
Sign in to reply to this message.
Done w/ all, though generally there is one outstanding question on my part: (see my comment in drx.h) Should I perhaps remove the event_thread_{init,exit}() code seeing as I can't actually guarantee their init/exit order wrt the client and just implement drx_buf_thread_{init,exit}() methods to be called by the user as appropriate? This would kinda look like how we register a drmgr tls field up front, but explicitly set it in a thread_init event. At the very least this gives the user more flexibility in when and if the full/exit callbacks occur, and will make sure that a client doesn't release its resources before dumping the buffer for the last time. https://codereview.appspot.com/297700043/diff/1/ext/drx/drx.dox File ext/drx/drx.dox (right): https://codereview.appspot.com/297700043/diff/1/ext/drx/drx.dox#newcode74 ext/drx/drx.dox:74: \section sec_drx_buf \p DynamoRIO eXtension buffer utilities On 2016/07/10 04:16:52, bruening wrote: > How about "Buffer Filling API" or "Trace Buffer API" instead of "Buffer > Utilities". > > inconsistent cap: Buffer Utilities > > "DynamoRIO eXtension" should be implied so prob best to remove and match the > other sections. Done. https://codereview.appspot.com/297700043/diff/1/ext/drx/drx.dox#newcode76 ext/drx/drx.dox:76: The \p drx library also demonstrates a minimalistic buffer API. Its API is On 2016/07/10 04:51:54, bruening wrote: > Talk about these being buffers containing traces of data gathered during > instrumentation, such as memory traces, instruction traces, etc. Done. https://codereview.appspot.com/297700043/diff/1/ext/drx/drx.dox#newcode86 ext/drx/drx.dox:86: a crash at the risk of some data loss. The main implementation for the On 2016/07/10 04:16:52, bruening wrote: > I'd replace the first sentence here with one that talks about this being for a > use case that only wants to remember the most recent portion of a sequence of > events instead of recording an entire trace of events. (It seems weird to talk > about it as preventing a crash or to think of it as just some data loss...) Done. https://codereview.appspot.com/297700043/diff/1/ext/drx/drx.dox#newcode89 ext/drx/drx.dox:89: the fault and reset the buffer pointer. On 2016/07/10 04:16:52, bruening wrote: > I don't think there's any need to expose the implementation in the documentation > like this: we may want to change the implementation later. It may be worth > mentioning, maybe up top or in a special section, that faults may be used to > implement detection of full buffers, so that users are not surprised if they see > extra faults in gdb or sthg. Done. https://codereview.appspot.com/297700043/diff/1/ext/drx/drx.dox#newcode93 ext/drx/drx.dox:93: The only special case mentioned in \ref sec_drx_buf_circular is a buffer On 2016/07/10 04:16:52, bruening wrote: > ? I don't see this being mentioned? Right, I might've taken that mention out at some point by accident. Added the mention back in, above. https://codereview.appspot.com/297700043/diff/1/ext/drx/drx.dox#newcode99 ext/drx/drx.dox:99: \p bbbuf.c, where we only write \p app_pc sized values. Since the buffer On 2016/07/10 04:16:52, bruening wrote: > s/\p bbuf.c/the sample client \p bbbuf (see \ref API_samples)/ Done. https://codereview.appspot.com/297700043/diff/1/ext/drx/drx.dox#newcode103 ext/drx/drx.dox:103: \section sec_drx_buf_faulting Faulting Buffer On 2016/07/10 04:16:52, bruening wrote: > Again, let's not expose the implementation: there's no need to force ourselves > to always use a fault (e.g., it could be faster to explicitly check for the end > of a tiny buffer that fills up so frequently the overhead of a fault is too > much). Can we have a different name for this? I would suggest putting this > first and calling it a Trace Buffer. This is the most common usage scenario I > would think. Done. Should I change the API to reflect this (s/faulting/trace/g)? https://codereview.appspot.com/297700043/diff/1/ext/drx/drx.dox#newcode106 ext/drx/drx.dox:106: up. However, its main use case is to allow the user to dump the buffer On 2016/07/10 04:16:52, bruening wrote: > suggest: Make this more general than "dump": the user is notified and can write > the contents to disk or to a pipe or whatnot. Done. https://codereview.appspot.com/297700043/diff/1/ext/drx/drx.dox#newcode109 ext/drx/drx.dox:109: such as in \p bbbuf.c where we only write \p app_pc sized values. If the On 2016/07/10 04:16:52, bruening wrote: > Ditto Done. https://codereview.appspot.com/297700043/diff/1/ext/drx/drx.dox#newcode111 ext/drx/drx.dox:111: writing the element at the highest offset in the struct first, or if that On 2016/07/10 04:16:52, bruening wrote: > I know what you mean, but the problem being solved may not be clear: perhaps > another sentence or two elaborating on the problem talking about writing > multiple fields with the buffer filling up in the middle. Done. https://codereview.appspot.com/297700043/diff/1/ext/drx/drx.dox#newcode114 ext/drx/drx.dox:114: On 2016/07/10 04:51:54, bruening wrote: > The docs should also state that per-thread buffers are used. Done. https://codereview.appspot.com/297700043/diff/1/ext/drx/drx.dox#newcode114 ext/drx/drx.dox:114: On 2016/07/10 04:16:52, bruening wrote: > Also talk about how the user can write his/her own stores into the buffer and > that the provided store routines are for convenience only -- but if the user > does write them, they need to have xl8 set. Done. https://codereview.appspot.com/297700043/diff/1/ext/drx/drx.dox#newcode114 ext/drx/drx.dox:114: On 2016/07/10 04:16:52, bruening wrote: > I think a section on how to write to the buffer, with sample code, is needed: > show the sequence of API calls to make to write several data items and then > update the buffer pointer, and how using offsets for subsequent fields with a > single pointer update is the most efficient method. Done. https://codereview.appspot.com/297700043/diff/1/ext/drx/drx.h File ext/drx/drx.h (right): https://codereview.appspot.com/297700043/diff/1/ext/drx/drx.h#newcode262 ext/drx/drx.h:262: * DRX BUFFER LIBRARY On 2016/07/10 04:16:53, bruening wrote: > suggest: BUFFER FILLING LIBRARY Done. https://codereview.appspot.com/297700043/diff/1/ext/drx/drx.h#newcode267 ext/drx/drx.h:267: * called when the buffer has been filled. On 2016/07/10 04:16:53, bruening wrote: > Prob worth explaining that the buffer data is [buf_base..buf_ceil) (i.e., making > it clear both that the end is open-ended and that the size may not be the same > every time (e.g., at thread exit) Done, noting that I will use the size of the buffer instead as in the below comment. https://codereview.appspot.com/297700043/diff/1/ext/drx/drx.h#newcode269 ext/drx/drx.h:269: typedef void (*drx_buf_fill_cb_t)(void *drcontext, void *buf_base, void *buf_ceil); On 2016/07/10 04:16:53, bruening wrote: > suggest: s/ceil/end/ (just seems more typical), or, beter, pass the size (which > can handle an address at the very top of the address space, where the open-ended > endpoint overflows) Done. https://codereview.appspot.com/297700043/diff/1/ext/drx/drx.h#newcode269 ext/drx/drx.h:269: typedef void (*drx_buf_fill_cb_t)(void *drcontext, void *buf_base, void *buf_ceil); On 2016/07/10 04:16:53, bruening wrote: > suggest: s/fill/full/ Done. https://codereview.appspot.com/297700043/diff/1/ext/drx/drx.h#newcode276 ext/drx/drx.h:276: typedef void (*drx_buf_init_cb_t)(void *drcontext); On 2016/07/10 04:16:53, bruening wrote: > Seems nice to pass in the buffer, right? Done. https://codereview.appspot.com/297700043/diff/1/ext/drx/drx.h#newcode280 ext/drx/drx.h:280: * before the final drx_buf_fill_cb_t has been called back, if applicable. On 2016/07/10 04:16:53, bruening wrote: > Hmm, I would expect it to be called *after* the final fill is called (assuming > the final fill is called regardless of whether the buffer is actually full), > because if there are other resources tied to the buffer (like a file) they > cannot be freed until *after* the final fill operation. What is the reason for > the order you describe? Specifically, drcachesim does some post-processing on the buffer to make it known that the thread exited correctly, before calling the callback. In order to mimic this behavior this exit callback must occur before the final callback. That being said I just noticed that I don't take into account the order of the thread_exit events, so a user could clean up his/her resources first before the drx_buf thread_exit event even gets called. Should I instead make the client call drx_buf_thread_{init,free}() methods from their own thread events and drx_buf wouldn't even register for them? That way we wouldn't even need to have this callback system for init and exit events. https://codereview.appspot.com/297700043/diff/1/ext/drx/drx.h#newcode319 ext/drx/drx.h:319: drx_buf_init_circular_buffer(int buffer_size, On 2016/07/10 04:16:53, bruening wrote: > s/int/size_t/ Done. https://codereview.appspot.com/297700043/diff/1/ext/drx/drx.h#newcode333 ext/drx/drx.h:333: drx_buf_init_faulting_buffer(int buffer_size, On 2016/07/10 04:16:53, bruening wrote: > s/int/size_t/ Done. https://codereview.appspot.com/297700043/diff/1/ext/drx/drx.h#newcode340 ext/drx/drx.h:340: * Cleans up the dr_buf extension. On 2016/07/10 04:16:53, bruening wrote: > This sounds like a global cleanup: could you reword this to just talk about > cleaning up the buffer and not the extension? > > I also wonder if a different name pair would be more appropriate: maybe "create" > and "destroy" (or "free") instead of "init" and "exit"? Done. Changed to "create" and "free" since it seems more appropriate now that the extension supports multiple buffers. https://codereview.appspot.com/297700043/diff/1/ext/drx/drx.h#newcode351 ext/drx/drx.h:351: * Loads the address of the TLS buffer at \p where into \p buf_ptr. On 2016/07/10 04:16:53, bruening wrote: > Better to explicitly say "Inserts instructions to ..." Done. https://codereview.appspot.com/297700043/diff/1/ext/drx/drx.h#newcode361 ext/drx/drx.h:361: * Updates the buffer pointer to accomodate the writes that occurred since On 2016/07/10 04:16:53, bruening wrote: > spelling: accommodate Done. https://codereview.appspot.com/297700043/diff/1/ext/drx/drx.h#newcode361 ext/drx/drx.h:361: * Updates the buffer pointer to accomodate the writes that occurred since On 2016/07/10 04:16:53, bruening wrote: > Better to explicitly say "Inserts instructions to ..." > > Ditto for routines below Done. https://codereview.appspot.com/297700043/diff/1/ext/drx/drx.h#newcode362 ext/drx/drx.h:362: * the last time the base pointer was loaded. On 2016/07/10 04:16:53, bruening wrote: > This sounds like the library is tracking the number of writes -- make it clear > that this simply adds \p stride, right? Done. https://codereview.appspot.com/297700043/diff/1/ext/drx/drx.h#newcode452 ext/drx/drx.h:452: drx_buf_get_buffer_base(void *drcontext, drx_buf_t *buf); On 2016/07/10 04:16:53, bruening wrote: > Should this also return the buffer size (or have a separate routine to get the > size)? Done, added new routine. https://codereview.appspot.com/297700043/diff/1/ext/drx/drx_buf.c File ext/drx/drx_buf.c (right): https://codereview.appspot.com/297700043/diff/1/ext/drx/drx_buf.c#newcode36 ext/drx/drx_buf.c:36: #include "dr_events.h" On 2016/07/10 04:16:54, bruening wrote: > This should not be necessary (comes from dr_api.h) Done. https://codereview.appspot.com/297700043/diff/1/ext/drx/drx_buf.c#newcode56 ext/drx/drx_buf.c:56: #define ALIGN_FORWARD(x, alignment) \ On 2016/07/10 04:16:54, bruening wrote: > include ext_utils.h instead Done. https://codereview.appspot.com/297700043/diff/1/ext/drx/drx_buf.c#newcode67 ext/drx/drx_buf.c:67: } drx_buf_type; On 2016/07/10 04:16:54, bruening wrote: > style: _t Done. https://codereview.appspot.com/297700043/diff/1/ext/drx/drx_buf.c#newcode80 ext/drx/drx_buf.c:80: int tls_slot; /* corresponds to the index into the clients vector */ On 2016/07/10 04:16:54, bruening wrote: > See below: I think this has to be split into two How do you mean? From the below comments it seems more like we have to consolidate the current raw tls globals into the drx_buf_t struct. https://codereview.appspot.com/297700043/diff/1/ext/drx/drx_buf.c#newcode82 ext/drx/drx_buf.c:82: drx_buf_fill_cb_t fill; On 2016/07/10 04:16:54, bruening wrote: > "full" seems much more descriptive to me Done. https://codereview.appspot.com/297700043/diff/1/ext/drx/drx_buf.c#newcode91 ext/drx/drx_buf.c:91: static uint tls_offs; On 2016/07/10 04:16:54, bruening wrote: > See below: I believe this needs to be per-buf Done. https://codereview.appspot.com/297700043/diff/1/ext/drx/drx_buf.c#newcode178 ext/drx/drx_buf.c:178: return NULL; On 2016/07/10 04:16:54, bruening wrote: > Hmm, it would be ok to just require drx_init and do all this there? Tradeoff I > suppose. I'll hold off on this since in drx.h I commented that it might be better to just remove the thread init/exit events altogether. https://codereview.appspot.com/297700043/diff/1/ext/drx/drx_buf.c#newcode192 ext/drx/drx_buf.c:192: init_drmgr_exception = true; On 2016/07/10 04:16:54, bruening wrote: > Ditto. Acknowledged, waiting on the above response. https://codereview.appspot.com/297700043/diff/1/ext/drx/drx_buf.c#newcode198 ext/drx/drx_buf.c:198: && !dr_raw_tls_cfree(tls_offs, count)) On 2016/07/10 04:16:54, bruening wrote: > This seems problematic: there's already emitted code in the cache that is > referencing the prior slots. I think this has to keep the old ones and just ask > for one more for the new one. > > This means the tls_offs needs to be per-buf and not global, and is decoupled > from the vector index. Done. https://codereview.appspot.com/297700043/diff/1/ext/drx/drx_buf.c#newcode217 ext/drx/drx_buf.c:217: drvector_set_entry(&clients, count, new_client); On 2016/07/10 04:16:54, bruening wrote: > This also seems problematic: let's say I create 2 buffers, A and B, in that > order. A has vector index 0, B has 1. I then free buffer A. I subsequently > create a new buffer C -- but you will clobber B's vector index 1, right? Whoops! Good catch, Done. https://codereview.appspot.com/297700043/diff/1/ext/drx/drx_buf.c#newcode234 ext/drx/drx_buf.c:234: * tls_slot is identical to its index in the vector. On 2016/07/10 04:16:54, bruening wrote: > I think this will no longer hold Done. https://codereview.appspot.com/297700043/diff/1/ext/drx/drx_buf.c#newcode366 ext/drx/drx_buf.c:366: * buf_size bytes useable before we hit the ro page. On 2016/07/10 04:16:54, bruening wrote: > usable Done. https://codereview.appspot.com/297700043/diff/1/ext/drx/drx_buf.c#newcode399 ext/drx/drx_buf.c:399: buf_ptr, scratch, stride); On 2016/07/10 04:16:54, bruening wrote: > style violation: {} around multi-line body. Ditto below. Done. https://codereview.appspot.com/297700043/diff/1/ext/drx/drx_buf.c#newcode488 ext/drx/drx_buf.c:488: MINSERT(ilist, where, instr); On 2016/07/10 04:16:54, bruening wrote: > Hmm, I was going to ask why we needed to provide these store API routines, but I > suppose the answer is to ensure they are set up properly for fault handling? Right, I remember the docs desired a way to store an immediate in a cross platform way without having to resort to just always moving an immediate into a register first, even on x86. Also in the design docs we described drx_buf_insert_buf_store_ptr() or sthg which would copy a whole buffer over at once, but I elected not to implement it here since it would make more sense in a client (memval probably, when I get around to testing it with the new drreg fixes that were introduced). The biggest problem with the implementation is that for the subroutine to behave like the other drx_buf_insert_buf_store*() operations, it would have to be implemented with a single instruction (rep movb), which I don't think exists on ARM. https://codereview.appspot.com/297700043/diff/1/ext/drx/drx_buf.c#newcode616 ext/drx/drx_buf.c:616: dr_sigsegv_event(void *drcontext, byte *target, On 2016/07/10 04:16:53, bruening wrote: > The name is a little confusing: suggest "process_fault" or "fault_event_helper" > or sthg, since it's cross-platform and it's not the main event callback Done. https://codereview.appspot.com/297700043/diff/1/ext/drx/drx_buf.c#newcode637 ext/drx/drx_buf.c:637: data->cli_base, buf); On 2016/07/10 04:16:54, bruening wrote: > style: {} Done. https://codereview.appspot.com/297700043/diff/1/suite/tests/client-interface/... File suite/tests/client-interface/drx_buf-test.c (right): https://codereview.appspot.com/297700043/diff/1/suite/tests/client-interface/... suite/tests/client-interface/drx_buf-test.c:125: /* try to cause a segfault and make sure it didn't trigger the buffer to dump */ On 2016/07/10 04:16:54, bruening wrote: > Good to see a thorough test! Thanks! https://codereview.appspot.com/297700043/diff/1/suite/tests/client-interface/... File suite/tests/client-interface/drx_buf-test.dll.c (right): https://codereview.appspot.com/297700043/diff/1/suite/tests/client-interface/... suite/tests/client-interface/drx_buf-test.dll.c:239: scratch, opnd_create_immed_int(0x4443, OPSZ_2), 2); On 2016/07/10 04:16:54, bruening wrote: > Line too long Done.
Sign in to reply to this message.
Commit log for latest patchset: --------------- i#513 drx_buf Part 1: Adds drx_buf code and tests Constructs a convenience API for initializing and filling three different kinds of buffers: a fast and a slow circular buffer, and a "faulting" buffer which notifies the client when the buffer fills. Currently has only been tested for x86 and x86_64. Compilation on ARM has been verified via cross-compiler. We disable tests for drx_buf on ARM and AARCH64. Review-URL: https://codereview.appspot.com/297700043 ---------------
Sign in to reply to this message.
https://codereview.appspot.com/297700043/diff/1/ext/drx/drx.dox File ext/drx/drx.dox (right): https://codereview.appspot.com/297700043/diff/1/ext/drx/drx.dox#newcode103 ext/drx/drx.dox:103: \section sec_drx_buf_faulting Faulting Buffer On 2016/07/11 03:10:21, DSOTMman wrote: > On 2016/07/10 04:16:52, bruening wrote: > > Again, let's not expose the implementation: there's no need to force ourselves > > to always use a fault (e.g., it could be faster to explicitly check for the > end > > of a tiny buffer that fills up so frequently the overhead of a fault is too > > much). Can we have a different name for this? I would suggest putting this > > first and calling it a Trace Buffer. This is the most common usage scenario I > > would think. > > Done. Should I change the API to reflect this (s/faulting/trace/g)? Yes, that's best IMHO. https://codereview.appspot.com/297700043/diff/1/ext/drx/drx.h File ext/drx/drx.h (right): https://codereview.appspot.com/297700043/diff/1/ext/drx/drx.h#newcode280 ext/drx/drx.h:280: * before the final drx_buf_fill_cb_t has been called back, if applicable. On 2016/07/11 03:10:22, DSOTMman wrote: > On 2016/07/10 04:16:53, bruening wrote: > > Hmm, I would expect it to be called *after* the final fill is called (assuming > > the final fill is called regardless of whether the buffer is actually full), > > because if there are other resources tied to the buffer (like a file) they > > cannot be freed until *after* the final fill operation. What is the reason > for > > the order you describe? > > Specifically, drcachesim does some post-processing on the buffer to make it > known that the thread exited correctly, before calling the callback. In order to > mimic this behavior this exit callback must occur before the final callback. It looks to me like drcachesim would work fine with a full-then-exit order: its exit handler would write the thread-exit marker to the pipe directly, right? > That being said I just noticed that I don't take into account the order of the > thread_exit events, so a user could clean up his/her resources first before the > drx_buf thread_exit event even gets called. Should I instead make the client > call drx_buf_thread_{init,free}() methods from their own thread events and > drx_buf wouldn't even register for them? That way we wouldn't even need to have > this callback system for init and exit events. Hmm, so drmgr is not providing priority control for thread init/exit. We (I guess I really mean *you* :)) could add drmgr_register_thread_init_event_ex() which takes in a priority and then you could control the order. If we could control the order, we also wouldn't need the init/exit events as the client could call them from regular thread init/exit events? Though I wonder if it's nicer for the client to have explicit per-thread-buffer init/exit? Hmm -- Qin, any opinion? https://codereview.appspot.com/297700043/diff/1/ext/drx/drx_buf.c File ext/drx/drx_buf.c (right): https://codereview.appspot.com/297700043/diff/1/ext/drx/drx_buf.c#newcode80 ext/drx/drx_buf.c:80: int tls_slot; /* corresponds to the index into the clients vector */ On 2016/07/11 03:10:23, DSOTMman wrote: > On 2016/07/10 04:16:54, bruening wrote: > > See below: I think this has to be split into two > > How do you mean? From the below comments it seems more like we have to > consolidate the current raw tls globals into the drx_buf_t struct. By "split" I mean the vector index has to be a separate value from the TLS slot offset. https://codereview.appspot.com/297700043/diff/1/ext/drx/drx_buf.c#newcode488 ext/drx/drx_buf.c:488: MINSERT(ilist, where, instr); On 2016/07/11 03:10:23, DSOTMman wrote: > On 2016/07/10 04:16:54, bruening wrote: > > Hmm, I was going to ask why we needed to provide these store API routines, but > I > > suppose the answer is to ensure they are set up properly for fault handling? > > Right, I remember the docs desired a way to store an immediate in a cross > platform way without having to resort to just always moving an immediate into a > register first, even on x86. > > Also in the design docs we described drx_buf_insert_buf_store_ptr() or sthg > which would copy a whole buffer over at once, but I elected not to implement it > here since it would make more sense in a client (memval probably, when I get > around to testing it with the new drreg fixes that were introduced). The biggest > problem with the implementation is that for the subroutine to behave like the > other drx_buf_insert_buf_store*() operations, it would have to be implemented > with a single instruction (rep movb), which I don't think exists on ARM. Right. This seems lower priority and it's prob ok to not provide it in this library.
Sign in to reply to this message.
just looked at the drx.h, some question about the api might worth discussing. Basically I prefer fewer routines. Can we get rid of _fast, and _1byte, _2bytes? Any benefit to expose them instead of internal handling? https://codereview.appspot.com/297700043/diff/20001/ext/drx/drx.h File ext/drx/drx.h (right): https://codereview.appspot.com/297700043/diff/20001/ext/drx/drx.h#newcode304 ext/drx/drx.h:304: drx_buf_exit_cb_t exit); do we really need this one? it could be just an internal function called from drx_buffer_create_circular_buffer with buffer size being 65536 https://codereview.appspot.com/297700043/diff/20001/ext/drx/drx.h#newcode334 ext/drx/drx.h:334: drx_buf_create_faulting_buffer(size_t buffer_size, Isn't faulting the internal implementation details. I am not sure it should be stated in the name. We could use a different implementation without faulting. https://codereview.appspot.com/297700043/diff/20001/ext/drx/drx.h#newcode414 ext/drx/drx.h:414: * size. do we need the above two (1byte and 2bytes), can we have internal dispatch within this function instead? https://codereview.appspot.com/297700043/diff/20001/ext/drx/drx.h#newcode459 ext/drx/drx.h:459: * Retrieves a pointer to the base of the buffer. wrong comment, size?
Sign in to reply to this message.
Done w/ all of Qin's suggestions I'll wait to push the changes for Qin's review though, at least until we come to a consensus on drmgr_register_thread_init_event_ex() and whether I should put the drx_buf init stuff right into drx_init(). Having comments across two patchsets is confusing, at least to me :P https://codereview.appspot.com/297700043/diff/1/ext/drx/drx.h File ext/drx/drx.h (right): https://codereview.appspot.com/297700043/diff/1/ext/drx/drx.h#newcode280 ext/drx/drx.h:280: * before the final drx_buf_fill_cb_t has been called back, if applicable. On 2016/07/11 16:20:54, bruening wrote: > On 2016/07/11 03:10:22, DSOTMman wrote: > > On 2016/07/10 04:16:53, bruening wrote: > > > Hmm, I would expect it to be called *after* the final fill is called > (assuming > > > the final fill is called regardless of whether the buffer is actually full), > > > because if there are other resources tied to the buffer (like a file) they > > > cannot be freed until *after* the final fill operation. What is the reason > > for > > > the order you describe? > > > > Specifically, drcachesim does some post-processing on the buffer to make it > > known that the thread exited correctly, before calling the callback. In order > to > > mimic this behavior this exit callback must occur before the final callback. > > It looks to me like drcachesim would work fine with a full-then-exit order: its > exit handler would write the thread-exit marker to the pipe directly, right? > Looking at it again that's right, looks like it was less complicated than I thought it was originally. > > That being said I just noticed that I don't take into account the order of the > > thread_exit events, so a user could clean up his/her resources first before > the > > drx_buf thread_exit event even gets called. Should I instead make the client > > call drx_buf_thread_{init,free}() methods from their own thread events and > > drx_buf wouldn't even register for them? That way we wouldn't even need to > have > > this callback system for init and exit events. > > Hmm, so drmgr is not providing priority control for thread init/exit. We (I > guess I really mean *you* :)) could add drmgr_register_thread_init_event_ex() > which takes in a priority and then you could control the order. If we could > control the order, we also wouldn't need the init/exit events as the client > could call them from regular thread init/exit events? Though I wonder if it's > nicer for the client to have explicit per-thread-buffer init/exit? Hmm -- Qin, > any opinion? I don't think Qin commented explicitly on this yet. I think looking into a drmgr_register_thread_init_event_ex() makes sense, since I think it would be a little convoluted having a drx_buf_{create,free}() and an explicit drx_buf_thread_{init,free}() or something along those lines. However, I don't understand what you meant when you said that being able to control the order of the drmgr thread events would allow us to get rid of the init/exit events. Could you elaborate on this? Seeing as Qin doesn't want the API to have so many methods, and if there are no objections I'll look into implementing a drmgr_register_thread_init_event_ex(). https://codereview.appspot.com/297700043/diff/1/ext/drx/drx_buf.c File ext/drx/drx_buf.c (right): https://codereview.appspot.com/297700043/diff/1/ext/drx/drx_buf.c#newcode80 ext/drx/drx_buf.c:80: int tls_slot; /* corresponds to the index into the clients vector */ On 2016/07/11 16:20:54, bruening wrote: > On 2016/07/11 03:10:23, DSOTMman wrote: > > On 2016/07/10 04:16:54, bruening wrote: > > > See below: I think this has to be split into two > > > > How do you mean? From the below comments it seems more like we have to > > consolidate the current raw tls globals into the drx_buf_t struct. > > By "split" I mean the vector index has to be a separate value from the TLS slot > offset. Done. https://codereview.appspot.com/297700043/diff/1/ext/drx/drx_buf.c#newcode192 ext/drx/drx_buf.c:192: init_drmgr_exception = true; On 2016/07/11 03:10:23, DSOTMman wrote: > On 2016/07/10 04:16:54, bruening wrote: > > Ditto. > > Acknowledged, waiting on the above response. Also wondering if I should make this threadsafe (init_drmgr_exception). https://codereview.appspot.com/297700043/diff/20001/ext/drx/drx.h File ext/drx/drx.h (right): https://codereview.appspot.com/297700043/diff/20001/ext/drx/drx.h#newcode304 ext/drx/drx.h:304: drx_buf_exit_cb_t exit); On 2016/07/12 19:37:40, zhaoqin wrote: > do we really need this one? > it could be just an internal function called from > drx_buffer_create_circular_buffer with buffer size being 65536 Done. https://codereview.appspot.com/297700043/diff/20001/ext/drx/drx.h#newcode334 ext/drx/drx.h:334: drx_buf_create_faulting_buffer(size_t buffer_size, On 2016/07/12 19:37:40, zhaoqin wrote: > Isn't faulting the internal implementation details. > I am not sure it should be stated in the name. > > We could use a different implementation without faulting. Done. https://codereview.appspot.com/297700043/diff/20001/ext/drx/drx.h#newcode414 ext/drx/drx.h:414: * size. On 2016/07/12 19:37:40, zhaoqin wrote: > do we need the above two (1byte and 2bytes), can we have internal dispatch > within this function instead? Done. https://codereview.appspot.com/297700043/diff/20001/ext/drx/drx.h#newcode459 ext/drx/drx.h:459: * Retrieves a pointer to the base of the buffer. On 2016/07/12 19:37:40, zhaoqin wrote: > wrong comment, size? Done.
Sign in to reply to this message.
https://codereview.appspot.com/297700043/diff/20001/ext/drx/drx.h File ext/drx/drx.h (right): https://codereview.appspot.com/297700043/diff/20001/ext/drx/drx.h#newcode334 ext/drx/drx.h:334: drx_buf_create_faulting_buffer(size_t buffer_size, On 2016/07/12 19:37:40, zhaoqin wrote: > Isn't faulting the internal implementation details. > I am not sure it should be stated in the name. > > We could use a different implementation without faulting. I guess you didn't see my earlier comments: this was already discussed and a name change requested.
Sign in to reply to this message.
https://codereview.appspot.com/297700043/diff/20001/ext/drx/drx.h File ext/drx/drx.h (right): https://codereview.appspot.com/297700043/diff/20001/ext/drx/drx.h#newcode414 ext/drx/drx.h:414: * size. On 2016/07/13 03:12:01, DSOTMman wrote: > On 2016/07/12 19:37:40, zhaoqin wrote: > > do we need the above two (1byte and 2bytes), can we have internal dispatch > > within this function instead? > > Done. Just think about the ARM case, the 1 byte store may still use the full register name, so we might have to add a parameter to indicate the size. Adding a parameter would still better than 3 separate functions.
Sign in to reply to this message.
https://codereview.appspot.com/297700043/diff/1/ext/drx/drx.h File ext/drx/drx.h (right): https://codereview.appspot.com/297700043/diff/1/ext/drx/drx.h#newcode280 ext/drx/drx.h:280: * before the final drx_buf_fill_cb_t has been called back, if applicable. On 2016/07/13 03:12:01, DSOTMman wrote: > On 2016/07/11 16:20:54, bruening wrote: > > On 2016/07/11 03:10:22, DSOTMman wrote: > > > On 2016/07/10 04:16:53, bruening wrote: > > > > Hmm, I would expect it to be called *after* the final fill is called > > (assuming > > > > the final fill is called regardless of whether the buffer is actually > full), > > > > because if there are other resources tied to the buffer (like a file) they > > > > cannot be freed until *after* the final fill operation. What is the > reason > > > for > > > > the order you describe? > > > > > > Specifically, drcachesim does some post-processing on the buffer to make it > > > known that the thread exited correctly, before calling the callback. In > order > > to > > > mimic this behavior this exit callback must occur before the final callback. > > > > It looks to me like drcachesim would work fine with a full-then-exit order: > its > > exit handler would write the thread-exit marker to the pipe directly, right? > > > Looking at it again that's right, looks like it was less complicated than I > thought it was originally. > > > > That being said I just noticed that I don't take into account the order of > the > > > thread_exit events, so a user could clean up his/her resources first before > > the > > > drx_buf thread_exit event even gets called. Should I instead make the client > > > call drx_buf_thread_{init,free}() methods from their own thread events and > > > drx_buf wouldn't even register for them? That way we wouldn't even need to > > have > > > this callback system for init and exit events. > > > > Hmm, so drmgr is not providing priority control for thread init/exit. We (I > > guess I really mean *you* :)) could add drmgr_register_thread_init_event_ex() > > which takes in a priority and then you could control the order. If we could > > control the order, we also wouldn't need the init/exit events as the client > > could call them from regular thread init/exit events? Though I wonder if it's > > nicer for the client to have explicit per-thread-buffer init/exit? Hmm -- > Qin, > > any opinion? > > I don't think Qin commented explicitly on this yet. I think looking into a > drmgr_register_thread_init_event_ex() makes sense, since I think it would be a > little convoluted having a drx_buf_{create,free}() and an explicit > drx_buf_thread_{init,free}() or something along those lines. > > However, I don't understand what you meant when you said that being able to > control the order of the drmgr thread events would allow us to get rid of the > init/exit events. Could you elaborate on this? It would let us remove them if we tell the user to perform buffer cleanup in his own thread exit event, as we can guarantee that will happen before we tear down the buffer itself. To help decide, could you list what actions each current buffer user (drcachesim tracer, bbbuf, memtrace_{simple,x86}, instrace_{simple,x86}) needs to take for thread init and for thread exit? We want to see what's easiest for the user. We don't want to force the user to store some TLS data just for this, if it wouldn't need to do so if we called a buffer thread exit routine ourselves.
Sign in to reply to this message.
https://codereview.appspot.com/297700043/diff/1/ext/drx/drx.h File ext/drx/drx.h (right): https://codereview.appspot.com/297700043/diff/1/ext/drx/drx.h#newcode280 ext/drx/drx.h:280: * before the final drx_buf_fill_cb_t has been called back, if applicable. On 2016/07/13 17:57:23, bruening wrote: > On 2016/07/13 03:12:01, DSOTMman wrote: > > On 2016/07/11 16:20:54, bruening wrote: > > > On 2016/07/11 03:10:22, DSOTMman wrote: > > > > On 2016/07/10 04:16:53, bruening wrote: > > > > > Hmm, I would expect it to be called *after* the final fill is called > > > (assuming > > > > > the final fill is called regardless of whether the buffer is actually > > full), > > > > > because if there are other resources tied to the buffer (like a file) > they > > > > > cannot be freed until *after* the final fill operation. What is the > > reason > > > > for > > > > > the order you describe? > > > > > > > > Specifically, drcachesim does some post-processing on the buffer to make > it > > > > known that the thread exited correctly, before calling the callback. In > > order > > > to > > > > mimic this behavior this exit callback must occur before the final > callback. > > > > > > It looks to me like drcachesim would work fine with a full-then-exit order: > > its > > > exit handler would write the thread-exit marker to the pipe directly, right? > > > > > Looking at it again that's right, looks like it was less complicated than I > > thought it was originally. > > > > > > That being said I just noticed that I don't take into account the order of > > the > > > > thread_exit events, so a user could clean up his/her resources first > before > > > the > > > > drx_buf thread_exit event even gets called. Should I instead make the > client > > > > call drx_buf_thread_{init,free}() methods from their own thread events and > > > > drx_buf wouldn't even register for them? That way we wouldn't even need to > > > have > > > > this callback system for init and exit events. > > > > > > Hmm, so drmgr is not providing priority control for thread init/exit. We (I > > > guess I really mean *you* :)) could add > drmgr_register_thread_init_event_ex() > > > which takes in a priority and then you could control the order. If we could > > > control the order, we also wouldn't need the init/exit events as the client > > > could call them from regular thread init/exit events? Though I wonder if > it's > > > nicer for the client to have explicit per-thread-buffer init/exit? Hmm -- > > Qin, > > > any opinion? > > > > I don't think Qin commented explicitly on this yet. I think looking into a > > drmgr_register_thread_init_event_ex() makes sense, since I think it would be a > > little convoluted having a drx_buf_{create,free}() and an explicit > > drx_buf_thread_{init,free}() or something along those lines. > > > > However, I don't understand what you meant when you said that being able to > > control the order of the drmgr thread events would allow us to get rid of the > > init/exit events. Could you elaborate on this? > > It would let us remove them if we tell the user to perform buffer cleanup in his > own thread exit event, as we can guarantee that will happen before we tear down > the buffer itself. > Gotcha, I understand now. > To help decide, could you list what actions each current buffer user (drcachesim > tracer, bbbuf, memtrace_{simple,x86}, instrace_{simple,x86}) needs to take for > thread init and for thread exit? We want to see what's easiest for the user. > We don't want to force the user to store some TLS data just for this, if it > wouldn't need to do so if we called a buffer thread exit routine ourselves. For the most part the trace samples ({mem,inst}trace_{simple,x86}) currently allocate the buffer and do nothing else to process the buffer. However memtrace_x86 also does a bit of arithmetic to get the address of the end of the buffer. In the thread_exit event they all dump the buffer one last time and then finally de-allocate the buffer. bbbuf currently wants to allocate the buffer and then memset it to 0, but that's it. Its thread_exit event just de-allocates the buffer. drcachesim also actively manipulates the buffer in the event_init method, by doing a memset, setting up the redzone (which we determined wasn't quite necessary anymore), and finally by writing a thread init marker onto the buffer to write it out to a pipe. In the exit_event it writes a thread exit marker onto the buffer and writes out to the pipe one last time. So far, it seems that fewer than half these clients actively process the buffer in the init and exit events. https://codereview.appspot.com/297700043/diff/20001/ext/drx/drx.h File ext/drx/drx.h (right): https://codereview.appspot.com/297700043/diff/20001/ext/drx/drx.h#newcode414 ext/drx/drx.h:414: * size. On 2016/07/13 14:56:15, zhaoqin wrote: > On 2016/07/13 03:12:01, DSOTMman wrote: > > On 2016/07/12 19:37:40, zhaoqin wrote: > > > do we need the above two (1byte and 2bytes), can we have internal dispatch > > > within this function instead? > > > > Done. > > Just think about the ARM case, the 1 byte store may still use the full register > name, so we might have to add a parameter to indicate the size. > Adding a parameter would still better than 3 separate functions. Ah okay, I'm not too familiar with ARM (nor am I able to test on ARM currently) but I think I understand. So if I didn't pass in the size as an argument, on ARM I would always accidentally be invoking the 4 or 8 byte store? Is it not legal to call reg_resize_to_opsz() on an ARM register? Done.
Sign in to reply to this message.
> > Just think about the ARM case, the 1 byte store may still use the full > register > > name, so we might have to add a parameter to indicate the size. > > Adding a parameter would still better than 3 separate functions. > > Ah okay, I'm not too familiar with ARM (nor am I able to test on ARM currently) > but I think I understand. So if I didn't pass in the size as an argument, on ARM > I would always accidentally be invoking the 4 or 8 byte store? Is it not legal > to call reg_resize_to_opsz() on an ARM register? Done. You can call reg_resize_to_opsz on an ARM reg, which just return the same reg. https://github.com/DynamoRIO/dynamorio/blob/e74114f638b35c401b1a2b271737646f1... else if (sz == OPSZ_2) return IF_ARM_ELSE(reg, reg_32_to_16(reg)); else if (sz == OPSZ_1) return IF_ARM_ELSE(reg, reg_32_to_8(reg)); So yes, in ARM (AArch32), the reg is reg for 2 and 1 byte. Though, for AArch64, 4 byte and 8 byte regs are different.
Sign in to reply to this message.
> For the most part the trace samples ({mem,inst}trace_{simple,x86}) currently > allocate the buffer and do nothing else to process the buffer. However > memtrace_x86 also does a bit of arithmetic to get the address of the end of the > buffer. In the thread_exit event they all dump the buffer one last time and then > finally de-allocate the buffer. I think it is because memtrace_x86 has to manange the buffer itself. Once we have the buf API, it only need print the buffer on the callback. > > bbbuf currently wants to allocate the buffer and then memset it to 0, but that's > it. Its thread_exit event just de-allocates the buffer. I wrote it while imaging it is used to stored the last N basic blocks history for debugging or something. > > drcachesim also actively manipulates the buffer in the event_init method, by > doing a memset, setting up the redzone (which we determined wasn't quite > necessary anymore), and finally by writing a thread init marker onto the buffer > to write it out to a pipe. In the exit_event it writes a thread exit marker onto > the buffer and writes out to the pipe one last time. > > So far, it seems that fewer than half these clients actively process the buffer > in the init and exit events. right, so far I cannot remember any client that requires special resource management with the buffer, (though it does not mean it did not exist such requirement). Can you image some possible case that a client would need some resource, and the thread exit ordering messes things up? > > https://codereview.appspot.com/297700043/diff/20001/ext/drx/drx.h > File ext/drx/drx.h (right): > > https://codereview.appspot.com/297700043/diff/20001/ext/drx/drx.h#newcode414 > ext/drx/drx.h:414: * size. > On 2016/07/13 14:56:15, zhaoqin wrote: > > On 2016/07/13 03:12:01, DSOTMman wrote: > > > On 2016/07/12 19:37:40, zhaoqin wrote: > > > > do we need the above two (1byte and 2bytes), can we have internal dispatch > > > > within this function instead? > > > > > > Done. > > > > Just think about the ARM case, the 1 byte store may still use the full > register > > name, so we might have to add a parameter to indicate the size. > > Adding a parameter would still better than 3 separate functions. > > Ah okay, I'm not too familiar with ARM (nor am I able to test on ARM currently) > but I think I understand. So if I didn't pass in the size as an argument, on ARM > I would always accidentally be invoking the 4 or 8 byte store? Is it not legal > to call reg_resize_to_opsz() on an ARM register? Done.
Sign in to reply to this message.
On 2016/07/15 15:07:21, zhaoqin wrote: > > > > bbbuf currently wants to allocate the buffer and then memset it to 0, but > that's > > it. Its thread_exit event just de-allocates the buffer. > > I wrote it while imaging it is used to stored the last N basic blocks history > for debugging or something. So here is one case of a client needing to take an action on each thread buffer. Qin, the question is, would we make the client do this in its own normal thread_init event handler (and thus require drmgr priorities to put drx_buf first so it can allocate the buffer in the first place), or would drx_buf provide an event for thread buffer init? > > So far, it seems that fewer than half these clients actively process the > buffer > > in the init and exit events. > > right, so far I cannot remember any client that requires special resource > management with the buffer, (though it does not mean it did not exist such > requirement). The most common is going to be a per-thread file, instead of global, where the buffer is written. > Can you image some possible case that a client would need some resource, and the > thread exit ordering messes things up? Yes, the client needs the final buffer output callback to go first, then the client's closing of the file, and finally drx_buf's free of the buffer (I suppose the last two could maybe go any order in this case but it seems best to control the order in case the file close wants to query the buffer state). The same question applies: is the file close in a regular thread_exit or in a special buffer_thread_exit?
Sign in to reply to this message.
On 2016/07/15 15:19:58, bruening wrote: > On 2016/07/15 15:07:21, zhaoqin wrote: > > > > > > bbbuf currently wants to allocate the buffer and then memset it to 0, but > > that's > > > it. Its thread_exit event just de-allocates the buffer. > > > > I wrote it while imaging it is used to stored the last N basic blocks history > > for debugging or something. > > So here is one case of a client needing to take an action on each thread buffer. > Qin, the question is, would we make the client do this in its own normal > thread_init event handler (and thus require drmgr priorities to put drx_buf > first so it can allocate the buffer in the first place), or would drx_buf > provide an event for thread buffer init? > > > > So far, it seems that fewer than half these clients actively process the > > buffer > > > in the init and exit events. > > > > right, so far I cannot remember any client that requires special resource > > management with the buffer, (though it does not mean it did not exist such > > requirement). > > The most common is going to be a per-thread file, instead of global, where the > buffer is written. > > > Can you image some possible case that a client would need some resource, and > the > > thread exit ordering messes things up? > > Yes, the client needs the final buffer output callback to go first, then the > client's closing of the file, and finally drx_buf's free of the buffer (I > suppose the last two could maybe go any order in this case but it seems best to > control the order in case the file close wants to query the buffer state). The > same question applies: is the file close in a regular thread_exit or in a > special buffer_thread_exit? For some reason I can't acknowledge Qin's comment responding to my reg_resize_to_opsz() method question (I can't even see the comment in the patchset view). Acknowledged here. I also have an unrelated note. A while ago I commented on drx_buf.c:192 that I failed to make the init_drmgr_exception var threadsafe (I don't think anyone saw it yet). Should I make it threadsafe? I could trivially use dr_atomic_add32_return_sum() but I think the intent would be clearer if I used some sort of atomic CAS routine, for example: if (bt != CIRCULAR_FAST && dr_atomic_cas32(&init_drmgr_exception, false, true)) .... IIRC there is an atomic_compare_exchange() method in arch_exports.h that I could use to add an dr_atomic_cas32() method alongside dr_atomic_add32_return_sum(). If I shouldn't do this, I can just fall back to using dr_atomic_add32_return_sum() to get threadsafety here.
Sign in to reply to this message.
On Sun, Jul 17, 2016 at 12:21 PM, <DSOTMman@gmail.com> wrote: > For some reason I can't acknowledge Qin's comment responding to my > reg_resize_to_opsz() > method question (I can't even see the comment in the patchset view). > Acknowledged here. > Probably he replied to the email instead of going to the code review site to reply to the per-line comment. I also have an unrelated note. A while ago I commented on drx_buf.c:192 > that I > failed to make the init_drmgr_exception var threadsafe (I don't think > anyone saw it > yet). Should I make it threadsafe? So you do not like the idea of requiring drx_buf users to call drx_init() and setting up all global state in drx_init(), just like the drx note users need to use -- hey wait, drx_insert_counter_update() claims "May be called without calling drx_init()" and yet it uses drx note values that are not reserved unless drx_init() was called. That doesn't look right. > I could trivially use > dr_atomic_add32_return_sum() > but I think the intent would be clearer if I used some sort of atomic > CAS routine, > for example: > > if (bt != CIRCULAR_FAST && dr_atomic_cas32(&init_drmgr_exception, false, > true)) > .... > > IIRC there is an atomic_compare_exchange() method in arch_exports.h that > I could > use to add an dr_atomic_cas32() method alongside > dr_atomic_add32_return_sum(). If > I shouldn't do this, I can just fall back to using > dr_atomic_add32_return_sum() to > get threadsafety here. > If you want to add a lower-level atomic routine that seems fine, but I would spell it out rather than use "cas" (to make it as clear as possible, and clarify whether it's "swap" or "set").
Sign in to reply to this message.
On Mon, Jul 18, 2016 at 4:30 PM, Derek Bruening <bruening@google.com> wrote: > So you do not like the idea of requiring drx_buf users to call drx_init() > and setting up all global state in drx_init(), just like the drx note users > need to use Whoops, I didn't realize that putting the init stuff into drx_init() would necessarily mean that there could be no conditional, so we'd always register the restore state and fault handlers (after all, we can't know if a user is going to use a slow circular/trace buffer just from drx_init() alone). I still don't have a strong opinion for or against this, but I'm okay with putting it all into drx_init() and doing away with the conditional. By extension, that also means we don't have to worry about adding a dr_atomic_cas32() method, which is nice.
Sign in to reply to this message.
It's basically simplicity of the implementation (all init in drx_init) vs slightly simpler and more efficient to only initialize what the user uses but at the cost of more complex code. I could also see it either way, but it doesn't seem much of a burden to call drx_init and register an event all the time. On Mon, Jul 18, 2016 at 6:16 PM, Toshi Piazza <dsotmman@gmail.com> wrote: > > On Mon, Jul 18, 2016 at 4:30 PM, Derek Bruening <bruening@google.com> > wrote: > >> So you do not like the idea of requiring drx_buf users to call drx_init() >> and setting up all global state in drx_init(), just like the drx note users >> need to use > > > Whoops, I didn't realize that putting the init stuff into drx_init() would > necessarily mean that there could be no conditional, so we'd always > register the restore state and fault handlers (after all, we can't know if > a user is going to use a slow circular/trace buffer just from drx_init() > alone). I still don't have a strong opinion for or against this, but I'm > okay with putting it all into drx_init() and doing away with the > conditional. > > By extension, that also means we don't have to worry about adding a > dr_atomic_cas32() method, which is nice. >
Sign in to reply to this message.
Agreed. > On Jul 20, 2016, at 10:58 PM, Derek Bruening <bruening@google.com> wrote: > > It's basically simplicity of the implementation (all init in drx_init) vs slightly simpler and more efficient to only initialize what the user uses but at the cost of more complex code. I could also see it either way, but it doesn't seem much of a burden to call drx_init and register an event all the time. > >> On Mon, Jul 18, 2016 at 6:16 PM, Toshi Piazza <dsotmman@gmail.com> wrote: >> >>> On Mon, Jul 18, 2016 at 4:30 PM, Derek Bruening <bruening@google.com> wrote: >>> So you do not like the idea of requiring drx_buf users to call drx_init() and setting up all global state in drx_init(), just like the drx note users need to use >> >> Whoops, I didn't realize that putting the init stuff into drx_init() would necessarily mean that there could be no conditional, so we'd always register the restore state and fault handlers (after all, we can't know if a user is going to use a slow circular/trace buffer just from drx_init() alone). I still don't have a strong opinion for or against this, but I'm okay with putting it all into drx_init() and doing away with the conditional. >> >> By extension, that also means we don't have to worry about adding a dr_atomic_cas32() method, which is nice. >
Sign in to reply to this message.
It seems like Qin still hasn't responded to our questions; should I send up my patch as is and see if he gives it the OK then? This patchset would have the minor fixes Qin pointed out a while ago and also includes moving all the init stuff into drx_init and adding the drmgr_register_event_thread_{init,exit}_ex() methods. Also, I got a raspberry pi to test on and it seems my test is failing on ARM. I'll investigate and make changes as necessary. These changes for ARM will appear in a later patchset (I still haven't run the full test suite for ARM on my pi for storage reasons, but I'm looking into getting a large usb thumb drive to build the test suite on). On Fri, Jul 22, 2016 at 12:18 PM, Toshi Piazza <dsotmman@gmail.com> wrote: > Agreed. > > On Jul 20, 2016, at 10:58 PM, Derek Bruening <bruening@google.com> wrote: > > It's basically simplicity of the implementation (all init in drx_init) vs > slightly simpler and more efficient to only initialize what the user uses > but at the cost of more complex code. I could also see it either way, but > it doesn't seem much of a burden to call drx_init and register an event all > the time. > > On Mon, Jul 18, 2016 at 6:16 PM, Toshi Piazza <dsotmman@gmail.com> wrote: > >> >> On Mon, Jul 18, 2016 at 4:30 PM, Derek Bruening <bruening@google.com> >> wrote: >> >>> So you do not like the idea of requiring drx_buf users to call >>> drx_init() and setting up all global state in drx_init(), just like the drx >>> note users need to use >> >> >> Whoops, I didn't realize that putting the init stuff into drx_init() >> would necessarily mean that there could be no conditional, so we'd always >> register the restore state and fault handlers (after all, we can't know if >> a user is going to use a slow circular/trace buffer just from drx_init() >> alone). I still don't have a strong opinion for or against this, but I'm >> okay with putting it all into drx_init() and doing away with the >> conditional. >> >> By extension, that also means we don't have to worry about adding a >> dr_atomic_cas32() method, which is nice. >> > >
Sign in to reply to this message.
Sure. On Sun, Jul 24, 2016 at 3:24 PM, Toshi Piazza <dsotmman@gmail.com> wrote: > It seems like Qin still hasn't responded to our questions; should I send > up my patch as is and see if he gives it the OK then? This patchset would > have the minor fixes Qin pointed out a while ago and also includes moving > all the init stuff into drx_init and adding the > drmgr_register_event_thread_{init,exit}_ex() methods. > > Also, I got a raspberry pi to test on and it seems my test is failing on > ARM. I'll investigate and make changes as necessary. These changes for ARM > will appear in a later patchset (I still haven't run the full test suite > for ARM on my pi for storage reasons, but I'm looking into getting a large > usb thumb drive to build the test suite on). > > On Fri, Jul 22, 2016 at 12:18 PM, Toshi Piazza <dsotmman@gmail.com> wrote: > >> Agreed. >> >> On Jul 20, 2016, at 10:58 PM, Derek Bruening <bruening@google.com> wrote: >> >> It's basically simplicity of the implementation (all init in drx_init) vs >> slightly simpler and more efficient to only initialize what the user uses >> but at the cost of more complex code. I could also see it either way, but >> it doesn't seem much of a burden to call drx_init and register an event all >> the time. >> >> On Mon, Jul 18, 2016 at 6:16 PM, Toshi Piazza <dsotmman@gmail.com> wrote: >> >>> >>> On Mon, Jul 18, 2016 at 4:30 PM, Derek Bruening <bruening@google.com> >>> wrote: >>> >>>> So you do not like the idea of requiring drx_buf users to call >>>> drx_init() and setting up all global state in drx_init(), just like the drx >>>> note users need to use >>> >>> >>> Whoops, I didn't realize that putting the init stuff into drx_init() >>> would necessarily mean that there could be no conditional, so we'd always >>> register the restore state and fault handlers (after all, we can't know if >>> a user is going to use a slow circular/trace buffer just from drx_init() >>> alone). I still don't have a strong opinion for or against this, but I'm >>> okay with putting it all into drx_init() and doing away with the >>> conditional. >>> >>> By extension, that also means we don't have to worry about adding a >>> dr_atomic_cas32() method, which is nice. >>> >> >> >
Sign in to reply to this message.
Thanks, will put up my patchset tomorrow. Just want to do a quick look-over first. On Sun, Jul 24, 2016 at 5:29 PM, Derek Bruening <bruening@google.com> wrote: > Sure. > > On Sun, Jul 24, 2016 at 3:24 PM, Toshi Piazza <dsotmman@gmail.com> wrote: > >> It seems like Qin still hasn't responded to our questions; should I send >> up my patch as is and see if he gives it the OK then? This patchset would >> have the minor fixes Qin pointed out a while ago and also includes moving >> all the init stuff into drx_init and adding the >> drmgr_register_event_thread_{init,exit}_ex() methods. >> >> Also, I got a raspberry pi to test on and it seems my test is failing on >> ARM. I'll investigate and make changes as necessary. These changes for ARM >> will appear in a later patchset (I still haven't run the full test suite >> for ARM on my pi for storage reasons, but I'm looking into getting a large >> usb thumb drive to build the test suite on). >> >> On Fri, Jul 22, 2016 at 12:18 PM, Toshi Piazza <dsotmman@gmail.com> >> wrote: >> >>> Agreed. >>> >>> On Jul 20, 2016, at 10:58 PM, Derek Bruening <bruening@google.com> >>> wrote: >>> >>> It's basically simplicity of the implementation (all init in drx_init) >>> vs slightly simpler and more efficient to only initialize what the user >>> uses but at the cost of more complex code. I could also see it either way, >>> but it doesn't seem much of a burden to call drx_init and register an event >>> all the time. >>> >>> On Mon, Jul 18, 2016 at 6:16 PM, Toshi Piazza <dsotmman@gmail.com> >>> wrote: >>> >>>> >>>> On Mon, Jul 18, 2016 at 4:30 PM, Derek Bruening <bruening@google.com> >>>> wrote: >>>> >>>>> So you do not like the idea of requiring drx_buf users to call >>>>> drx_init() and setting up all global state in drx_init(), just like the drx >>>>> note users need to use >>>> >>>> >>>> Whoops, I didn't realize that putting the init stuff into drx_init() >>>> would necessarily mean that there could be no conditional, so we'd always >>>> register the restore state and fault handlers (after all, we can't know if >>>> a user is going to use a slow circular/trace buffer just from drx_init() >>>> alone). I still don't have a strong opinion for or against this, but I'm >>>> okay with putting it all into drx_init() and doing away with the >>>> conditional. >>>> >>>> By extension, that also means we don't have to worry about adding a >>>> dr_atomic_cas32() method, which is nice. >>>> >>> >>> >> >
Sign in to reply to this message.
Commit log for latest patchset: --------------- i#513 drx_buf Part 1: Adds drx_buf code and tests Constructs a convenience API for initializing and filling three different kinds of buffers: a fast and a slow circular buffer, and a "faulting" buffer which notifies the client when the buffer fills. Currently has only been tested for x86 and x86_64. Compilation on ARM has been verified via cross-compiler. We disable tests for drx_buf on ARM and AARCH64. Review-URL: https://codereview.appspot.com/297700043 ---------------
Sign in to reply to this message.
Made it to drx_buf.c, splitting the review into two b/c it is large: part 1 of 2. Commit message should probably be updated for "trace buffer". Please add this to the changelist in api/docs/release.dox. https://codereview.appspot.com/297700043/diff/40001/ext/drmgr/drmgr.h File ext/drmgr/drmgr.h (right): https://codereview.appspot.com/297700043/diff/40001/ext/drmgr/drmgr.h#newcode777 ext/drmgr/drmgr.h:777: * creates a new thread. \return whether successful. The drmgr changes should be a separate CL, with a test (even if just a sanity check call), and an addition to the changelist in api/docs/release.dox. https://codereview.appspot.com/297700043/diff/40001/ext/drx/drx.c File ext/drx/drx.c (right): https://codereview.appspot.com/297700043/diff/40001/ext/drx/drx.c#newcode138 ext/drx/drx.c:138: if (!drvector_init(&clients, 1, false/*!synch*/, NULL) || Another way to do it would be to invoke drx_buf_init(), to keep all this localized in drx_buf.c. https://codereview.appspot.com/297700043/diff/40001/ext/drx/drx.dox File ext/drx/drx.dox (right): https://codereview.appspot.com/297700043/diff/40001/ext/drx/drx.dox#newcode140 ext/drx/drx.dox:140: DR_REG_NULL, opnd_create_reg(scratch), OPSZ_4, 4); Maybe better to have scratch2 instead of writing the same value twice. https://codereview.appspot.com/297700043/diff/40001/ext/drx/drx.dox#newcode145 ext/drx/drx.dox:145: reg_tmp, sizeof(int)*2); This is a little unclear: it says "2x the size of scratch", which implies sizeof(reg_t), not sizeof(int) -- i.e., 16 bytes for x64_64. Please either change "whatever is in the scratch reg" and "size of scratch" to make it clear it's the bottom 4 bytes, or change OPSZ_4 to OPSZ_PTR and sizeof(int) to sizeof(reg_t). https://codereview.appspot.com/297700043/diff/40001/ext/drx/drx.dox#newcode152 ext/drx/drx.dox:152: only, to ensure that an xl8 is set for each instruction. If a user writes to Spell out "xl8" https://codereview.appspot.com/297700043/diff/40001/ext/drx/drx.h File ext/drx/drx.h (right): https://codereview.appspot.com/297700043/diff/40001/ext/drx/drx.h#newcode274 ext/drx/drx.h:274: /** Opaque handle which represents a buffer for use by the drx_buf framework */ nit: . https://codereview.appspot.com/297700043/diff/40001/ext/drx/drx.h#newcode291 ext/drx/drx.h:291: * 65536 bytes is optimized for better performance, and is equivalent Add a ref to DRX_BUF_FAST_CIRCULAR_BUFSZ https://codereview.appspot.com/297700043/diff/40001/ext/drx/drx.h#newcode292 ext/drx/drx.h:292: * to calling drx_buf_create_circular_buffer_fast(). Comment needs updating: _fast() doesn't exist. https://codereview.appspot.com/297700043/diff/40001/ext/drx/drx.h#newcode329 ext/drx/drx.h:329: * \note \p scratch is only necessary on ARM, in the case of the fast circular Best to explicitly say "and is referred to as a 'fast circular buffer'" up above. https://codereview.appspot.com/297700043/diff/40001/ext/drx/drx.h#newcode369 ext/drx/drx.h:369: drx_buf_set_buffer_ptr(void *drcontext, drx_buf_t *buf, app_pc new_ptr); app_pc is for an instruction: should match the get_ routine and be void*, right? https://codereview.appspot.com/297700043/diff/40001/ext/drx/drx.h#newcode377 ext/drx/drx.h:377: /** Retrieves the size of the buffer. */ Clarify whether this is the capacity or the currently-used size. https://codereview.appspot.com/297700043/diff/40001/ext/drx/drx_buf.c File ext/drx/drx_buf.c (right): https://codereview.appspot.com/297700043/diff/40001/ext/drx/drx_buf.c#newcode49 ext/drx/drx_buf.c:49: #endif This can be removed: it's in the DR API headers
Sign in to reply to this message.
Done w/ all, also changed the commit message. https://codereview.appspot.com/297700043/diff/40001/ext/drmgr/drmgr.h File ext/drmgr/drmgr.h (right): https://codereview.appspot.com/297700043/diff/40001/ext/drmgr/drmgr.h#newcode777 ext/drmgr/drmgr.h:777: * creates a new thread. \return whether successful. On 2016/07/26 03:04:36, bruening wrote: > The drmgr changes should be a separate CL, with a test (even if just a sanity > check call), and an addition to the changelist in api/docs/release.dox. Just for clarification, when you say that the changes should be in a separate CL, you mean within api/docs/release.dox, right? You don't mean that I should upload a new issue/patchset just for the drmgr stuff? Done. https://codereview.appspot.com/297700043/diff/40001/ext/drx/drx.c File ext/drx/drx.c (right): https://codereview.appspot.com/297700043/diff/40001/ext/drx/drx.c#newcode138 ext/drx/drx.c:138: if (!drvector_init(&clients, 1, false/*!synch*/, NULL) || On 2016/07/26 03:04:36, bruening wrote: > Another way to do it would be to invoke drx_buf_init(), to keep all this > localized in drx_buf.c. Just making sure, but since drx_buf_init() constructs a buffer as well, should I instead write a new method, drx_buf_only_once_init() that only sets up the state? Or should I have it so that on the first invocation of drx_buf_init() we don't construct a buffer, but on subsequent invocations we do? Done. https://codereview.appspot.com/297700043/diff/40001/ext/drx/drx.dox File ext/drx/drx.dox (right): https://codereview.appspot.com/297700043/diff/40001/ext/drx/drx.dox#newcode140 ext/drx/drx.dox:140: DR_REG_NULL, opnd_create_reg(scratch), OPSZ_4, 4); On 2016/07/26 03:04:36, bruening wrote: > Maybe better to have scratch2 instead of writing the same value twice. Done. https://codereview.appspot.com/297700043/diff/40001/ext/drx/drx.dox#newcode145 ext/drx/drx.dox:145: reg_tmp, sizeof(int)*2); On 2016/07/26 03:04:36, bruening wrote: > This is a little unclear: it says "2x the size of scratch", which implies > sizeof(reg_t), not sizeof(int) -- i.e., 16 bytes for x64_64. Please either > change "whatever is in the scratch reg" and "size of scratch" to make it clear > it's the bottom 4 bytes, or change OPSZ_4 to OPSZ_PTR and sizeof(int) to > sizeof(reg_t). Done. https://codereview.appspot.com/297700043/diff/40001/ext/drx/drx.dox#newcode152 ext/drx/drx.dox:152: only, to ensure that an xl8 is set for each instruction. If a user writes to On 2016/07/26 03:04:36, bruening wrote: > Spell out "xl8" Done. https://codereview.appspot.com/297700043/diff/40001/ext/drx/drx.h File ext/drx/drx.h (right): https://codereview.appspot.com/297700043/diff/40001/ext/drx/drx.h#newcode274 ext/drx/drx.h:274: /** Opaque handle which represents a buffer for use by the drx_buf framework */ On 2016/07/26 03:04:36, bruening wrote: > nit: . Done. https://codereview.appspot.com/297700043/diff/40001/ext/drx/drx.h#newcode291 ext/drx/drx.h:291: * 65536 bytes is optimized for better performance, and is equivalent On 2016/07/26 03:04:36, bruening wrote: > Add a ref to DRX_BUF_FAST_CIRCULAR_BUFSZ Done. https://codereview.appspot.com/297700043/diff/40001/ext/drx/drx.h#newcode292 ext/drx/drx.h:292: * to calling drx_buf_create_circular_buffer_fast(). On 2016/07/26 03:04:36, bruening wrote: > Comment needs updating: _fast() doesn't exist. Done. https://codereview.appspot.com/297700043/diff/40001/ext/drx/drx.h#newcode329 ext/drx/drx.h:329: * \note \p scratch is only necessary on ARM, in the case of the fast circular On 2016/07/26 03:04:36, bruening wrote: > Best to explicitly say "and is referred to as a 'fast circular buffer'" up > above. Done. https://codereview.appspot.com/297700043/diff/40001/ext/drx/drx.h#newcode369 ext/drx/drx.h:369: drx_buf_set_buffer_ptr(void *drcontext, drx_buf_t *buf, app_pc new_ptr); On 2016/07/26 03:04:36, bruening wrote: > app_pc is for an instruction: should match the get_ routine and be void*, right? Done. https://codereview.appspot.com/297700043/diff/40001/ext/drx/drx.h#newcode377 ext/drx/drx.h:377: /** Retrieves the size of the buffer. */ On 2016/07/26 03:04:36, bruening wrote: > Clarify whether this is the capacity or the currently-used size. Done. https://codereview.appspot.com/297700043/diff/40001/ext/drx/drx_buf.c File ext/drx/drx_buf.c (right): https://codereview.appspot.com/297700043/diff/40001/ext/drx/drx_buf.c#newcode49 ext/drx/drx_buf.c:49: #endif On 2016/07/26 03:04:36, bruening wrote: > This can be removed: it's in the DR API headers Done.
Sign in to reply to this message.
On 2016/07/27 01:45:24, toshi wrote: > Done w/ all, also changed the commit message. > > https://codereview.appspot.com/297700043/diff/40001/ext/drmgr/drmgr.h > File ext/drmgr/drmgr.h (right): > > https://codereview.appspot.com/297700043/diff/40001/ext/drmgr/drmgr.h#newcode777 > ext/drmgr/drmgr.h:777: * creates a new thread. \return whether successful. > On 2016/07/26 03:04:36, bruening wrote: > > The drmgr changes should be a separate CL, with a test (even if just a sanity > > check call), and an addition to the changelist in api/docs/release.dox. > > Just for clarification, when you say that the changes should be in a separate > CL, you mean within api/docs/release.dox, right? You don't mean that I should > upload a new issue/patchset just for the drmgr stuff? > Done. > > https://codereview.appspot.com/297700043/diff/40001/ext/drx/drx.c > File ext/drx/drx.c (right): > > https://codereview.appspot.com/297700043/diff/40001/ext/drx/drx.c#newcode138 > ext/drx/drx.c:138: if (!drvector_init(&clients, 1, false/*!synch*/, NULL) || > On 2016/07/26 03:04:36, bruening wrote: > > Another way to do it would be to invoke drx_buf_init(), to keep all this > > localized in drx_buf.c. > > Just making sure, but since drx_buf_init() constructs a buffer as well, should I > instead write a new method, drx_buf_only_once_init() that only sets up the > state? Or should I have it so that on the first invocation of drx_buf_init() we > don't construct a buffer, but on subsequent invocations we do? > > Done. > > https://codereview.appspot.com/297700043/diff/40001/ext/drx/drx.dox > File ext/drx/drx.dox (right): > > https://codereview.appspot.com/297700043/diff/40001/ext/drx/drx.dox#newcode140 > ext/drx/drx.dox:140: DR_REG_NULL, opnd_create_reg(scratch), OPSZ_4, 4); > On 2016/07/26 03:04:36, bruening wrote: > > Maybe better to have scratch2 instead of writing the same value twice. > > Done. > > https://codereview.appspot.com/297700043/diff/40001/ext/drx/drx.dox#newcode145 > ext/drx/drx.dox:145: reg_tmp, sizeof(int)*2); > On 2016/07/26 03:04:36, bruening wrote: > > This is a little unclear: it says "2x the size of scratch", which implies > > sizeof(reg_t), not sizeof(int) -- i.e., 16 bytes for x64_64. Please either > > change "whatever is in the scratch reg" and "size of scratch" to make it clear > > it's the bottom 4 bytes, or change OPSZ_4 to OPSZ_PTR and sizeof(int) to > > sizeof(reg_t). > > Done. > > https://codereview.appspot.com/297700043/diff/40001/ext/drx/drx.dox#newcode152 > ext/drx/drx.dox:152: only, to ensure that an xl8 is set for each instruction. If > a user writes to > On 2016/07/26 03:04:36, bruening wrote: > > Spell out "xl8" > > Done. > > https://codereview.appspot.com/297700043/diff/40001/ext/drx/drx.h > File ext/drx/drx.h (right): > > https://codereview.appspot.com/297700043/diff/40001/ext/drx/drx.h#newcode274 > ext/drx/drx.h:274: /** Opaque handle which represents a buffer for use by the > drx_buf framework */ > On 2016/07/26 03:04:36, bruening wrote: > > nit: . > > Done. > > https://codereview.appspot.com/297700043/diff/40001/ext/drx/drx.h#newcode291 > ext/drx/drx.h:291: * 65536 bytes is optimized for better performance, and is > equivalent > On 2016/07/26 03:04:36, bruening wrote: > > Add a ref to DRX_BUF_FAST_CIRCULAR_BUFSZ > > Done. > > https://codereview.appspot.com/297700043/diff/40001/ext/drx/drx.h#newcode292 > ext/drx/drx.h:292: * to calling drx_buf_create_circular_buffer_fast(). > On 2016/07/26 03:04:36, bruening wrote: > > Comment needs updating: _fast() doesn't exist. > > Done. > > https://codereview.appspot.com/297700043/diff/40001/ext/drx/drx.h#newcode329 > ext/drx/drx.h:329: * \note \p scratch is only necessary on ARM, in the case of > the fast circular > On 2016/07/26 03:04:36, bruening wrote: > > Best to explicitly say "and is referred to as a 'fast circular buffer'" up > > above. > > Done. > > https://codereview.appspot.com/297700043/diff/40001/ext/drx/drx.h#newcode369 > ext/drx/drx.h:369: drx_buf_set_buffer_ptr(void *drcontext, drx_buf_t *buf, > app_pc new_ptr); > On 2016/07/26 03:04:36, bruening wrote: > > app_pc is for an instruction: should match the get_ routine and be void*, > right? > > Done. > > https://codereview.appspot.com/297700043/diff/40001/ext/drx/drx.h#newcode377 > ext/drx/drx.h:377: /** Retrieves the size of the buffer. */ > On 2016/07/26 03:04:36, bruening wrote: > > Clarify whether this is the capacity or the currently-used size. > > Done. > > https://codereview.appspot.com/297700043/diff/40001/ext/drx/drx_buf.c > File ext/drx/drx_buf.c (right): > > https://codereview.appspot.com/297700043/diff/40001/ext/drx/drx_buf.c#newcode49 > ext/drx/drx_buf.c:49: #endif > On 2016/07/26 03:04:36, bruening wrote: > > This can be removed: it's in the DR API headers > > Done. Seems I got the same problem as Branden when submitting my patch set, where I timed out in the upload.py script. Unfortunately it looks like there are no deltas from the previous patch set. Should I re-upload or should I let it be?
Sign in to reply to this message.
Comments on the rest. The only serious issue is what looks like erroneous synchronization of the vector. https://codereview.appspot.com/297700043/diff/40001/ext/drx/drx_buf.c File ext/drx/drx_buf.c (right): https://codereview.appspot.com/297700043/diff/40001/ext/drx/drx_buf.c#newcode62 ext/drx/drx_buf.c:62: TRACE style: use a prefix on each https://codereview.appspot.com/297700043/diff/40001/ext/drx/drx_buf.c#newcode74 ext/drx/drx_buf.c:74: size_t buf_size; Is there a different name for either per_thread_t.buf_size or this one to avoid confusion? https://codereview.appspot.com/297700043/diff/40001/ext/drx/drx_buf.c#newcode76 ext/drx/drx_buf.c:76: drx_buf_full_cb_t full; suggest: "full_cb" (just "full" sounds like a bool) https://codereview.appspot.com/297700043/diff/40001/ext/drx/drx_buf.c#newcode79 ext/drx/drx_buf.c:79: uint tls_offs; So we use both a drmgr TLS slot and a raw TLS slot -- we could eliminate the drmgr slot by storing a pointer prior to the actual buffer or sthg (i.e., at a negative offset). The raw ones are the scarce ones so I suppose it doesn't save us much. Never mind. https://codereview.appspot.com/297700043/diff/40001/ext/drx/drx_buf.c#newcode84 ext/drx/drx_buf.c:84: drvector_t clients; nit: extra spaces https://codereview.appspot.com/297700043/diff/40001/ext/drx/drx_buf.c#newcode143 ext/drx/drx_buf.c:143: int tls_idx; nit: a bunch of these var decls seem odd: extra spaces in some places which is presumably trying to align the var name, yet not aligning adjacent names (tls_seg here) https://codereview.appspot.com/297700043/diff/40001/ext/drx/drx_buf.c#newcode156 ext/drx/drx_buf.c:156: new_client = dr_global_alloc(sizeof(drx_buf_t)); Safer to do sizeof(*new_client) https://codereview.appspot.com/297700043/diff/40001/ext/drx/drx_buf.c#newcode164 ext/drx/drx_buf.c:164: new_client->vec_idx = clients.entries; drx_buf_free sets the entry to NULL, but it never gets reused -- we don't anticipate much churn at all, so it seems ok, but it's worth a comment that it's a deliberate choice https://codereview.appspot.com/297700043/diff/40001/ext/drx/drx_buf.c#newcode176 ext/drx/drx_buf.c:176: if (!(buf && drvector_get_entry(&clients, buf->vec_idx) == buf)) style: buf != NULL https://codereview.appspot.com/297700043/diff/40001/ext/drx/drx_buf.c#newcode176 ext/drx/drx_buf.c:176: if (!(buf && drvector_get_entry(&clients, buf->vec_idx) == buf)) bug: Please explain the synch policy for "clients" in a comment by its declaration: you ask for external synch, and grab the lock to append on init, but there is no lock on any read here (or below) or on the NULL-out below. If it never resized, a lockless read approach could be made to work, but it can resize itself, making this unsafe. https://codereview.appspot.com/297700043/diff/40001/ext/drx/drx_buf.c#newcode232 ext/drx/drx_buf.c:232: if (buf->buf_type == CIRCULAR_FAST) bug: drx_buf_free sets entries to NULL which will result in a crash here https://codereview.appspot.com/297700043/diff/40001/ext/drx/drx_buf.c#newcode247 ext/drx/drx_buf.c:247: per_thread_t *data = drmgr_get_tls_field(drcontext, buf->tls_idx); bug: same thing: crash on NULL https://codereview.appspot.com/297700043/diff/40001/ext/drx/drx_buf.c#newcode250 ext/drx/drx_buf.c:250: if (buf->full) { style: != NULL https://codereview.appspot.com/297700043/diff/40001/ext/drx/drx_buf.c#newcode250 ext/drx/drx_buf.c:250: if (buf->full) { This is confusing: it really looks like a bool (esp w/o the != NULL) -- I would suggest renaming to full_cb everywhere. https://codereview.appspot.com/297700043/diff/40001/ext/drx/drx_buf.c#newcode256 ext/drx/drx_buf.c:256: } So if the client makes a per-thread file and writes the buffer contents there, the client can close the file (maybe aggregating into a global file first) in its own thread exit event because this event will go first? It may still be best to expose the priority of this exit event in the header and talk about it in the docs so a client can be sure of the ordering. https://codereview.appspot.com/297700043/diff/40001/ext/drx/drx_buf.c#newcode289 ext/drx/drx_buf.c:289: /* We construct a buffer right before a fault by reading in as "reading in"? https://codereview.appspot.com/297700043/diff/40001/ext/drx/drx_buf.c#newcode298 ext/drx/drx_buf.c:298: dr_memory_protect(ret + per_thread->buf_size - PAGE_SIZE, Best to check the return value even if it's just a debug-build assert https://codereview.appspot.com/297700043/diff/40001/ext/drx/drx_buf.c#newcode458 ext/drx/drx_buf.c:458: ptr_int_t pc = opnd_get_immed_int(opnd); Calling it "pc" is misleading https://codereview.appspot.com/297700043/diff/40001/ext/drx/drx_buf.c#newcode461 ext/drx/drx_buf.c:461: OPND_CREATE_MEMPTR(buf_ptr, offset), I don't understand: if this is x86_64 and I want to store 4 bytes, you're going to store 8 bytes instead? https://codereview.appspot.com/297700043/diff/40001/ext/drx/drx_buf.c#newcode476 ext/drx/drx_buf.c:476: instr_set_meta(mov1); This is unnecessary (other code doing it has been removed very recently) https://codereview.appspot.com/297700043/diff/40001/ext/drx/drx_buf.c#newcode478 ext/drx/drx_buf.c:478: if (mov2 != NULL) { This needs to be a loop now (see very recent commit) https://codereview.appspot.com/297700043/diff/40001/ext/drx/drx_buf.c#newcode479 ext/drx/drx_buf.c:479: instr_set_meta(mov2); Ditto https://codereview.appspot.com/297700043/diff/40001/ext/drx/drx_buf.c#newcode564 ext/drx/drx_buf.c:564: static bool Comment what the return value means https://codereview.appspot.com/297700043/diff/40001/suite/tests/client-interf... File suite/tests/client-interface/drx_buf-test.c (right): https://codereview.appspot.com/297700043/diff/40001/suite/tests/client-interf... suite/tests/client-interface/drx_buf-test.c:97: /* FIXME: We can also trigger a segfault by trying to execute the buffer. We could s/FIXME/XXX/ (since optional improvement, not correctness issue)
Sign in to reply to this message.
https://codereview.appspot.com/297700043/diff/40001/ext/drmgr/drmgr.h File ext/drmgr/drmgr.h (right): https://codereview.appspot.com/297700043/diff/40001/ext/drmgr/drmgr.h#newcode777 ext/drmgr/drmgr.h:777: * creates a new thread. \return whether successful. On 2016/07/27 01:45:23, toshi wrote: > On 2016/07/26 03:04:36, bruening wrote: > > The drmgr changes should be a separate CL, with a test (even if just a sanity > > check call), and an addition to the changelist in api/docs/release.dox. > > Just for clarification, when you say that the changes should be in a separate > CL, you mean within api/docs/release.dox, right? You don't mean that I should > upload a new issue/patchset just for the drmgr stuff? > Done. "CL" is "Change List" which is the equivalent of a commit in git: I do mean a separate code review and separate git commit (generally it's best to split new features in other libraries, esp when the rest of this is large). https://codereview.appspot.com/297700043/diff/40001/ext/drx/drx.c File ext/drx/drx.c (right): https://codereview.appspot.com/297700043/diff/40001/ext/drx/drx.c#newcode138 ext/drx/drx.c:138: if (!drvector_init(&clients, 1, false/*!synch*/, NULL) || On 2016/07/27 01:45:24, toshi wrote: > On 2016/07/26 03:04:36, bruening wrote: > > Another way to do it would be to invoke drx_buf_init(), to keep all this > > localized in drx_buf.c. > > Just making sure, but since drx_buf_init() constructs a buffer as well, should I > instead write a new method, drx_buf_only_once_init() that only sets up the > state? Or should I have it so that on the first invocation of drx_buf_init() we > don't construct a buffer, but on subsequent invocations we do? > > Done. The former, though something like drx_buf_init_library() sounds better to me, or rename drx_buf_init() to drx_buf_create().
Sign in to reply to this message.
On Wed, Jul 27, 2016 at 1:51 AM, <bruening@google.com> wrote: > "CL" is "Change List" which is the equivalent of a commit in git: I do > mean a separate code review and separate git commit (generally it's best > to split new features in other libraries, esp when the rest of this is > large). > Will do!
Sign in to reply to this message.
Done w/ all, also changed the commit message. https://codereview.appspot.com/297700043/diff/40001/ext/drmgr/drmgr.h File ext/drmgr/drmgr.h (right): https://codereview.appspot.com/297700043/diff/40001/ext/drmgr/drmgr.h#newcode777 ext/drmgr/drmgr.h:777: * creates a new thread. \return whether successful. On 2016/07/27 05:51:49, bruening wrote: > On 2016/07/27 01:45:23, toshi wrote: > > On 2016/07/26 03:04:36, bruening wrote: > > > The drmgr changes should be a separate CL, with a test (even if just a > sanity > > > check call), and an addition to the changelist in api/docs/release.dox. > > > > Just for clarification, when you say that the changes should be in a separate > > CL, you mean within api/docs/release.dox, right? You don't mean that I should > > upload a new issue/patchset just for the drmgr stuff? > > Done. > > "CL" is "Change List" which is the equivalent of a commit in git: I do mean a > separate code review and separate git commit (generally it's best to split new > features in other libraries, esp when the rest of this is large). Done. https://codereview.appspot.com/297700043/diff/40001/ext/drx/drx.c File ext/drx/drx.c (right): https://codereview.appspot.com/297700043/diff/40001/ext/drx/drx.c#newcode138 ext/drx/drx.c:138: if (!drvector_init(&clients, 1, false/*!synch*/, NULL) || On 2016/07/27 05:51:49, bruening wrote: > On 2016/07/27 01:45:24, toshi wrote: > > On 2016/07/26 03:04:36, bruening wrote: > > > Another way to do it would be to invoke drx_buf_init(), to keep all this > > > localized in drx_buf.c. > > > > Just making sure, but since drx_buf_init() constructs a buffer as well, should > I > > instead write a new method, drx_buf_only_once_init() that only sets up the > > state? Or should I have it so that on the first invocation of drx_buf_init() > we > > don't construct a buffer, but on subsequent invocations we do? > > > > Done. > > The former, though something like drx_buf_init_library() sounds better to me, or > rename drx_buf_init() to drx_buf_create(). Done. https://codereview.appspot.com/297700043/diff/40001/ext/drx/drx_buf.c File ext/drx/drx_buf.c (right): https://codereview.appspot.com/297700043/diff/40001/ext/drx/drx_buf.c#newcode62 ext/drx/drx_buf.c:62: TRACE On 2016/07/27 05:45:43, bruening wrote: > style: use a prefix on each Done. https://codereview.appspot.com/297700043/diff/40001/ext/drx/drx_buf.c#newcode74 ext/drx/drx_buf.c:74: size_t buf_size; On 2016/07/27 05:45:43, bruening wrote: > Is there a different name for either per_thread_t.buf_size or this one to avoid > confusion? Done. https://codereview.appspot.com/297700043/diff/40001/ext/drx/drx_buf.c#newcode76 ext/drx/drx_buf.c:76: drx_buf_full_cb_t full; On 2016/07/27 05:45:42, bruening wrote: > suggest: "full_cb" (just "full" sounds like a bool) Done. https://codereview.appspot.com/297700043/diff/40001/ext/drx/drx_buf.c#newcode79 ext/drx/drx_buf.c:79: uint tls_offs; On 2016/07/27 05:45:43, bruening wrote: > So we use both a drmgr TLS slot and a raw TLS slot -- we could eliminate the > drmgr slot by storing a pointer prior to the actual buffer or sthg (i.e., at a > negative offset). The raw ones are the scarce ones so I suppose it doesn't save > us much. Never mind. Acknowledged. https://codereview.appspot.com/297700043/diff/40001/ext/drx/drx_buf.c#newcode84 ext/drx/drx_buf.c:84: drvector_t clients; On 2016/07/27 05:45:43, bruening wrote: > nit: extra spaces Done. https://codereview.appspot.com/297700043/diff/40001/ext/drx/drx_buf.c#newcode143 ext/drx/drx_buf.c:143: int tls_idx; On 2016/07/27 05:45:42, bruening wrote: > nit: a bunch of these var decls seem odd: extra spaces in some places which is > presumably trying to align the var name, yet not aligning adjacent names > (tls_seg here) Done. https://codereview.appspot.com/297700043/diff/40001/ext/drx/drx_buf.c#newcode156 ext/drx/drx_buf.c:156: new_client = dr_global_alloc(sizeof(drx_buf_t)); On 2016/07/27 05:45:42, bruening wrote: > Safer to do sizeof(*new_client) Done. https://codereview.appspot.com/297700043/diff/40001/ext/drx/drx_buf.c#newcode164 ext/drx/drx_buf.c:164: new_client->vec_idx = clients.entries; On 2016/07/27 05:45:42, bruening wrote: > drx_buf_free sets the entry to NULL, but it never gets reused -- we don't > anticipate much churn at all, so it seems ok, but it's worth a comment that it's > a deliberate choice Done. https://codereview.appspot.com/297700043/diff/40001/ext/drx/drx_buf.c#newcode176 ext/drx/drx_buf.c:176: if (!(buf && drvector_get_entry(&clients, buf->vec_idx) == buf)) On 2016/07/27 05:45:42, bruening wrote: > bug: Please explain the synch policy for "clients" in a comment by its > declaration: you ask for external synch, and grab the lock to append on init, > but there is no lock on any read here (or below) or on the NULL-out below. If > it never resized, a lockless read approach could be made to work, but it can > resize itself, making this unsafe. Whoops, forgot to protect all the vector accesses by a lock after I changed it above for the append. Done. https://codereview.appspot.com/297700043/diff/40001/ext/drx/drx_buf.c#newcode176 ext/drx/drx_buf.c:176: if (!(buf && drvector_get_entry(&clients, buf->vec_idx) == buf)) On 2016/07/27 05:45:43, bruening wrote: > style: buf != NULL Done. https://codereview.appspot.com/297700043/diff/40001/ext/drx/drx_buf.c#newcode232 ext/drx/drx_buf.c:232: if (buf->buf_type == CIRCULAR_FAST) On 2016/07/27 05:45:43, bruening wrote: > bug: drx_buf_free sets entries to NULL which will result in a crash here Done. https://codereview.appspot.com/297700043/diff/40001/ext/drx/drx_buf.c#newcode247 ext/drx/drx_buf.c:247: per_thread_t *data = drmgr_get_tls_field(drcontext, buf->tls_idx); On 2016/07/27 05:45:43, bruening wrote: > bug: same thing: crash on NULL Done. https://codereview.appspot.com/297700043/diff/40001/ext/drx/drx_buf.c#newcode250 ext/drx/drx_buf.c:250: if (buf->full) { On 2016/07/27 05:45:43, bruening wrote: > style: != NULL Done. https://codereview.appspot.com/297700043/diff/40001/ext/drx/drx_buf.c#newcode250 ext/drx/drx_buf.c:250: if (buf->full) { On 2016/07/27 05:45:43, bruening wrote: > This is confusing: it really looks like a bool (esp w/o the != NULL) -- I would > suggest renaming to full_cb everywhere. Done. https://codereview.appspot.com/297700043/diff/40001/ext/drx/drx_buf.c#newcode256 ext/drx/drx_buf.c:256: } On 2016/07/27 05:45:43, bruening wrote: > So if the client makes a per-thread file and writes the buffer contents there, > the client can close the file (maybe aggregating into a global file first) in > its own thread exit event because this event will go first? It may still be > best to expose the priority of this exit event in the header and talk about it > in the docs so a client can be sure of the ordering. Gotcha, that makes sense. Done. Also put the names of the priorities into the header file, just like drreg. https://codereview.appspot.com/297700043/diff/40001/ext/drx/drx_buf.c#newcode289 ext/drx/drx_buf.c:289: /* We construct a buffer right before a fault by reading in as On 2016/07/27 05:45:42, bruening wrote: > "reading in"? s/reading in/allocating/, Done. https://codereview.appspot.com/297700043/diff/40001/ext/drx/drx_buf.c#newcode298 ext/drx/drx_buf.c:298: dr_memory_protect(ret + per_thread->buf_size - PAGE_SIZE, On 2016/07/27 05:45:43, bruening wrote: > Best to check the return value even if it's just a debug-build assert Done. https://codereview.appspot.com/297700043/diff/40001/ext/drx/drx_buf.c#newcode458 ext/drx/drx_buf.c:458: ptr_int_t pc = opnd_get_immed_int(opnd); On 2016/07/27 05:45:43, bruening wrote: > Calling it "pc" is misleading Done. https://codereview.appspot.com/297700043/diff/40001/ext/drx/drx_buf.c#newcode461 ext/drx/drx_buf.c:461: OPND_CREATE_MEMPTR(buf_ptr, offset), On 2016/07/27 05:45:43, bruening wrote: > I don't understand: if this is x86_64 and I want to store 4 bytes, you're going > to store 8 bytes instead? Whoops, good catch. I wonder why this didn't fail in the test. Done. https://codereview.appspot.com/297700043/diff/40001/ext/drx/drx_buf.c#newcode476 ext/drx/drx_buf.c:476: instr_set_meta(mov1); On 2016/07/27 05:45:43, bruening wrote: > This is unnecessary (other code doing it has been removed very recently) Done. https://codereview.appspot.com/297700043/diff/40001/ext/drx/drx_buf.c#newcode478 ext/drx/drx_buf.c:478: if (mov2 != NULL) { On 2016/07/27 05:45:42, bruening wrote: > This needs to be a loop now (see very recent commit) Done. https://codereview.appspot.com/297700043/diff/40001/ext/drx/drx_buf.c#newcode479 ext/drx/drx_buf.c:479: instr_set_meta(mov2); On 2016/07/27 05:45:42, bruening wrote: > Ditto Done. https://codereview.appspot.com/297700043/diff/40001/ext/drx/drx_buf.c#newcode564 ext/drx/drx_buf.c:564: static bool On 2016/07/27 05:45:42, bruening wrote: > Comment what the return value means Done. https://codereview.appspot.com/297700043/diff/40001/suite/tests/client-interf... File suite/tests/client-interface/drx_buf-test.c (right): https://codereview.appspot.com/297700043/diff/40001/suite/tests/client-interf... suite/tests/client-interface/drx_buf-test.c:97: /* FIXME: We can also trigger a segfault by trying to execute the buffer. We could On 2016/07/27 05:45:43, bruening wrote: > s/FIXME/XXX/ (since optional improvement, not correctness issue) Done.
Sign in to reply to this message.
Commit log for latest patchset: --------------- i#513 drx_buf Part 2: Adds drx_buf code and tests Constructs a convenience API for initializing and filling three different kinds of buffers: a fast and a slow circular buffer, and a trace buffer which notifies the client when the buffer fills. Currently has only been tested for x86 and x86_64. Compilation on ARM has been verified via cross-compiler. We disable tests for drx_buf on ARM and AARCH64. Review-URL: https://codereview.appspot.com/297700043 ---------------
Sign in to reply to this message.
https://codereview.appspot.com/297700043/diff/80001/ext/drx/drx_buf.c File ext/drx/drx_buf.c (right): https://codereview.appspot.com/297700043/diff/80001/ext/drx/drx_buf.c#newcode670 ext/drx/drx_buf.c:670: buf = drvector_get_entry(&clients, i); Whoops, forgot to lock on these other vector accesses (the ones in the thread events and here). I'll address these in a future patchset.
Sign in to reply to this message.
Missing locks (as you noted) and missing doxygen links https://codereview.appspot.com/297700043/diff/80001/api/docs/release.dox File api/docs/release.dox (right): https://codereview.appspot.com/297700043/diff/80001/api/docs/release.dox#newc... api/docs/release.dox:146: - Added drx_buf() library. drx_buf is not a function name, so why ()? () is what gets you a Doxygen link for a function but there is no such function. suggest: describe it, with direct links to some key routines: "Added a \ref sec_drx_buf to drx: drx_buf_create_circular_buffer(), drx_buf_create_trace_buffer(), and more." https://codereview.appspot.com/297700043/diff/80001/ext/drx/drx.h File ext/drx/drx.h (right): https://codereview.appspot.com/297700043/diff/80001/ext/drx/drx.h#newcode287 ext/drx/drx.h:287: * of drx_buf can use the name DRMGR_PRIORITY_NAME_DRX_BUF in the You need # to get a doxygen link ("#DRMGR_PRIORITY_NAME_DRX_BUF") https://codereview.appspot.com/297700043/diff/80001/ext/drx/drx.h#newcode293 ext/drx/drx.h:293: /** Priority of drx_buf thread init */ thread init event. https://codereview.appspot.com/297700043/diff/80001/ext/drx/drx.h#newcode295 ext/drx/drx.h:295: /** Priority of drx_buf thread exit */ thread exit event. https://codereview.appspot.com/297700043/diff/80001/ext/drx/drx.h#newcode311 ext/drx/drx.h:311: * \p DRX_BUF_FAST_CIRCULAR_BUFSZ bytes is specially optimized for performance. # https://codereview.appspot.com/297700043/diff/80001/ext/drx/drx_buf.c File ext/drx/drx_buf.c (right): https://codereview.appspot.com/297700043/diff/80001/ext/drx/drx_buf.c#newcode54 ext/drx/drx_buf.c:54: DR_CIRCULAR_FAST, DRX_BUF_ seems like a better prefix than DR_ https://codereview.appspot.com/297700043/diff/80001/ext/drx/drx_buf.c#newcode279 ext/drx/drx_buf.c:279: drx_buf_t *buf = drvector_get_entry(&clients, i); Missing lock https://codereview.appspot.com/297700043/diff/80001/ext/drx/drx_buf.c#newcode296 ext/drx/drx_buf.c:296: drx_buf_t *buf = drvector_get_entry(&clients, i); Missing lock https://codereview.appspot.com/297700043/diff/80001/ext/drx/drx_buf.c#newcode562 ext/drx/drx_buf.c:562: ilist, where, &first, &last); We need the loop for first & last, right? (Share w/ X86?) https://codereview.appspot.com/297700043/diff/80001/ext/drx/drx_buf.c#newcode606 ext/drx/drx_buf.c:606: DR_ASSERT(false); How about having a bool return value? https://codereview.appspot.com/297700043/diff/80001/ext/drx/drx_buf.c#newcode670 ext/drx/drx_buf.c:670: buf = drvector_get_entry(&clients, i); Yes, missing lock
Sign in to reply to this message.
Done w/ all, however before I upload the patchset I want to resolve my question at https://github.com/DynamoRIO/dynamorio/issues/1982 (it's not a blocker). That is the last issue I need to work through to get the tests to pass on ARM, so this coming patchset will include the review concerns mentioned above, but also a fully tested ARM implementation. https://codereview.appspot.com/297700043/diff/80001/api/docs/release.dox File api/docs/release.dox (right): https://codereview.appspot.com/297700043/diff/80001/api/docs/release.dox#newc... api/docs/release.dox:146: - Added drx_buf() library. On 2016/08/01 16:13:55, bruening wrote: > drx_buf is not a function name, so why ()? () is what gets you a Doxygen link > for a function but there is no such function. > > suggest: describe it, with direct links to some key routines: "Added a \ref > sec_drx_buf to drx: drx_buf_create_circular_buffer(), > drx_buf_create_trace_buffer(), and more." Done. https://codereview.appspot.com/297700043/diff/80001/ext/drx/drx.h File ext/drx/drx.h (right): https://codereview.appspot.com/297700043/diff/80001/ext/drx/drx.h#newcode287 ext/drx/drx.h:287: * of drx_buf can use the name DRMGR_PRIORITY_NAME_DRX_BUF in the On 2016/08/01 16:13:55, bruening wrote: > You need # to get a doxygen link ("#DRMGR_PRIORITY_NAME_DRX_BUF") Done. https://codereview.appspot.com/297700043/diff/80001/ext/drx/drx.h#newcode293 ext/drx/drx.h:293: /** Priority of drx_buf thread init */ On 2016/08/01 16:13:55, bruening wrote: > thread init event. Done. https://codereview.appspot.com/297700043/diff/80001/ext/drx/drx.h#newcode295 ext/drx/drx.h:295: /** Priority of drx_buf thread exit */ On 2016/08/01 16:13:55, bruening wrote: > thread exit event. Done. https://codereview.appspot.com/297700043/diff/80001/ext/drx/drx.h#newcode311 ext/drx/drx.h:311: * \p DRX_BUF_FAST_CIRCULAR_BUFSZ bytes is specially optimized for performance. On 2016/08/01 16:13:55, bruening wrote: > # Done. https://codereview.appspot.com/297700043/diff/80001/ext/drx/drx_buf.c File ext/drx/drx_buf.c (right): https://codereview.appspot.com/297700043/diff/80001/ext/drx/drx_buf.c#newcode54 ext/drx/drx_buf.c:54: DR_CIRCULAR_FAST, On 2016/08/01 16:13:55, bruening wrote: > DRX_BUF_ seems like a better prefix than DR_ Done. https://codereview.appspot.com/297700043/diff/80001/ext/drx/drx_buf.c#newcode279 ext/drx/drx_buf.c:279: drx_buf_t *buf = drvector_get_entry(&clients, i); On 2016/08/01 16:13:55, bruening wrote: > Missing lock Done. https://codereview.appspot.com/297700043/diff/80001/ext/drx/drx_buf.c#newcode296 ext/drx/drx_buf.c:296: drx_buf_t *buf = drvector_get_entry(&clients, i); On 2016/08/01 16:13:55, bruening wrote: > Missing lock Done. https://codereview.appspot.com/297700043/diff/80001/ext/drx/drx_buf.c#newcode562 ext/drx/drx_buf.c:562: ilist, where, &first, &last); On 2016/08/01 16:13:55, bruening wrote: > We need the loop for first & last, right? (Share w/ X86?) Why do we need the loop here for the ARM case? We're only moving an immediate into a register on ARM, which can't fault, so I moved the INSTR_XL8() code above. Previously, the loop was doing two things: setting the XL8, and also setting the instruction to be a meta instruction. ARM needed the meta, and x86 needed both. But since setting the the instruction as a meta instruction is unnecessary now, in my understanding the loop is also unnecessary on ARM. https://codereview.appspot.com/297700043/diff/80001/ext/drx/drx_buf.c#newcode606 ext/drx/drx_buf.c:606: DR_ASSERT(false); On 2016/08/01 16:13:55, bruening wrote: > How about having a bool return value? Done. https://codereview.appspot.com/297700043/diff/80001/ext/drx/drx_buf.c#newcode670 ext/drx/drx_buf.c:670: buf = drvector_get_entry(&clients, i); On 2016/08/01 16:13:55, bruening wrote: > Yes, missing lock Done.
Sign in to reply to this message.
https://codereview.appspot.com/297700043/diff/80001/ext/drx/drx_buf.c File ext/drx/drx_buf.c (right): https://codereview.appspot.com/297700043/diff/80001/ext/drx/drx_buf.c#newcode562 ext/drx/drx_buf.c:562: ilist, where, &first, &last); On 2016/08/02 01:14:32, toshi wrote: > On 2016/08/01 16:13:55, bruening wrote: > > We need the loop for first & last, right? (Share w/ X86?) > > Why do we need the loop here for the ARM case? We're only moving an immediate > into a register on ARM, which can't fault, so I moved the INSTR_XL8() code > above. > > Previously, the loop was doing two things: setting the XL8, and also setting the > instruction to be a meta instruction. ARM needed the meta, and x86 needed both. > But since setting the the instruction as a meta instruction is unnecessary now, > in my understanding the loop is also unnecessary on ARM. You're right, agreed.
Sign in to reply to this message.
Commit log for latest patchset: --------------- i#513 drx_buf Part 2: Adds drx_buf code and tests Constructs a convenience API for initializing and filling three different kinds of buffers: a fast and a slow circular buffer, and a trace buffer which notifies the client when the buffer fills. Review-URL: https://codereview.appspot.com/297700043 ---------------
Sign in to reply to this message.
https://codereview.appspot.com/297700043/diff/100001/ext/drx/drx_buf.c File ext/drx/drx_buf.c (right): https://codereview.appspot.com/297700043/diff/100001/ext/drx/drx_buf.c#newcod... ext/drx/drx_buf.c:404: MINSERT(ilist, where, XINST_CREATE_load_int I noticed when submitting and looking into #1982 that on ARM XINST_CREATE_add() does not support 2-byte immediates. Lots of places in the code (notably drx_buf_insert_update_buf_ptr_fault(), which was based on the various memtrace samples) use the XINST_CREATE_add(dc, reg, imm) instruction however, so I was wondering if I should do so here. Currently the only place where a > 255 stride is used is in testing, to test the wrapping behavior. Removing the load int would break my test, but would probably bring a performance boost for all ARM users. Also I doubt anyone needs a stride that large. Should I spend the time to remove the load_int and fix my test, or should we just wait until we come up with a much more optimized sequence to revisit this? There were some suggestions in #1982 to look at too, though this sequence was just the first one that came to mind, and regardless of the sequence the question of whether to support the full 2-byte range still remains.
Sign in to reply to this message.
https://codereview.appspot.com/297700043/diff/100001/ext/drx/drx_buf.c File ext/drx/drx_buf.c (right): https://codereview.appspot.com/297700043/diff/100001/ext/drx/drx_buf.c#newcod... ext/drx/drx_buf.c:404: MINSERT(ilist, where, XINST_CREATE_load_int On 2016/08/04 00:31:19, toshi wrote: > I noticed when submitting and looking into #1982 that on ARM XINST_CREATE_add() > does not support 2-byte immediates. Lots of places in the code (notably > drx_buf_insert_update_buf_ptr_fault(), which was based on the various memtrace > samples) use the XINST_CREATE_add(dc, reg, imm) instruction however, so I was > wondering if I should do so here. > > Currently the only place where a > 255 stride is used is in testing, to test the > wrapping behavior. Removing the load int would break my test, but would probably > bring a performance boost for all ARM users. Also I doubt anyone needs a stride > that large. > > Should I spend the time to remove the load_int and fix my test, or should we > just wait until we come up with a much more optimized sequence to revisit this? > There were some suggestions in #1982 to look at too, though this sequence was > just the first one that came to mind, and regardless of the sequence the > question of whether to support the full 2-byte range still remains. How about adding a conditional on ARM that only uses load_int if stride is > 255
Sign in to reply to this message.
LGTM
Sign in to reply to this message.
https://codereview.appspot.com/297700043/diff/100001/ext/drx/drx_buf.c File ext/drx/drx_buf.c (right): https://codereview.appspot.com/297700043/diff/100001/ext/drx/drx_buf.c#newcod... ext/drx/drx_buf.c:404: MINSERT(ilist, where, XINST_CREATE_load_int On 2016/08/04 14:11:53, bruening wrote: > On 2016/08/04 00:31:19, toshi wrote: > > I noticed when submitting and looking into #1982 that on ARM > XINST_CREATE_add() > > does not support 2-byte immediates. Lots of places in the code (notably > > drx_buf_insert_update_buf_ptr_fault(), which was based on the various memtrace > > samples) use the XINST_CREATE_add(dc, reg, imm) instruction however, so I was > > wondering if I should do so here. > > > > Currently the only place where a > 255 stride is used is in testing, to test > the > > wrapping behavior. Removing the load int would break my test, but would > probably > > bring a performance boost for all ARM users. Also I doubt anyone needs a > stride > > that large. > > > > Should I spend the time to remove the load_int and fix my test, or should we > > just wait until we come up with a much more optimized sequence to revisit > this? > > There were some suggestions in #1982 to look at too, though this sequence was > > just the first one that came to mind, and regardless of the sequence the > > question of whether to support the full 2-byte range still remains. > > How about adding a conditional on ARM that only uses load_int if stride is > 255 Done.
Sign in to reply to this message.
Committed as https://github.com/DynamoRIO/dynamorio/commit/914d4df6ca324b8b37bcfec8567fe9e... Final commit log: --------------- i#513 drx_buf Part 2: Adds drx_buf code and tests Constructs a convenience API for initializing and filling three different kinds of buffers: a fast and a slow circular buffer, and a trace buffer which notifies the client when the buffer fills. Review-URL: https://codereview.appspot.com/297700043 ---------------
Sign in to reply to this message.
|