|
|
Created:
7 years, 3 months ago by toshi Modified:
7 years, 2 months ago CC:
dynamorio-devs_googlegroups.com Visibility:
Public. |
DescriptionCommit log for first patchset:
---------------
i#1834 memval: Adds memval sample
Introduces a sample client which inserts instrumentation after the
current instruction to dereference app memory, and to fill a per-thread
buffer.
Fixes #1834
---------------
Patch Set 1 #
Total comments: 66
Patch Set 2 : Addresses reviewer concerns #Patch Set 3 : Addresses optimization for gpr-sized writes only #
Total comments: 5
Patch Set 4 : Introduces correct load/store behavior in presence of E{BP,SI,DI} #
Total comments: 3
MessagesTotal messages: 23
https://codereview.appspot.com/317090043/diff/1/api/samples/memval.c File api/samples/memval.c (right): https://codereview.appspot.com/317090043/diff/1/api/samples/memval.c#newcode222 api/samples/memval.c:222: dr_insert_clean_call(drcontext, ilist, where, (void *)memcpy, false, 3, Should we use memcpy here? It simplifies the code, but we don't know whether it is compatible with drx_buf. If we have too many large writes in a row, faulting the buffer, will the application crash? Should we write our own drx_buf_insert_create_memcpy()?
Sign in to reply to this message.
https://codereview.appspot.com/317090043/diff/1/api/samples/CMakeLists.txt File api/samples/CMakeLists.txt (right): https://codereview.appspot.com/317090043/diff/1/api/samples/CMakeLists.txt#ne... api/samples/CMakeLists.txt:214: add_sample_client(memval "memval.c;utils.c" "drmgr;drreg;drutil;drx") nit: align to match line below https://codereview.appspot.com/317090043/diff/1/api/samples/CMakeLists.txt#ne... api/samples/CMakeLists.txt:214: add_sample_client(memval "memval.c;utils.c" "drmgr;drreg;drutil;drx") nit: group with memtrace+instrace https://codereview.appspot.com/317090043/diff/1/api/samples/memval.c File api/samples/memval.c (right): https://codereview.appspot.com/317090043/diff/1/api/samples/memval.c#newcode36 api/samples/memval.c:36: * Tracks all app writes and dumps them to a file. suggest: elaborate on what is being recorded+dumped: the address *and* the value https://codereview.appspot.com/317090043/diff/1/api/samples/memval.c#newcode38 api/samples/memval.c:38: * (1) It fills two per-thread-buffers with inlined instrumentation. Explain why two (see related comments below.) https://codereview.appspot.com/317090043/diff/1/api/samples/memval.c#newcode39 api/samples/memval.c:39: * (2) It calls a clean call to dump the buffers into a file. This doesn't seem accurate: I see a fault handler for that https://codereview.appspot.com/317090043/diff/1/api/samples/memval.c#newcode49 api/samples/memval.c:49: * manner Isn't the key thing in this sample, getting the value post-instr, missing from this list? https://codereview.appspot.com/317090043/diff/1/api/samples/memval.c#newcode67 api/samples/memval.c:67: app_pc addr; /* mem ref addr */ Without thinking hard, most people reading this would say "why not put the value here and have just one buffer". Explaining why not in comments would help. How hard would it be to have a var-len component? Explain how the code correlates values between the buffers (i.e., how to find the value in the other buffer for a given mem_ref_t entry in the trace buffer). https://codereview.appspot.com/317090043/diff/1/api/samples/memval.c#newcode75 api/samples/memval.c:75: * but we give ourselves some leeway and say a write should generally be under 32 bytes. Some are quite large: e.g., OP_xsave is hundreds of bytes. So we're assuming here that the *average* size for any 4096-long consecutive sequence is <= 32 bytes. Seems reasonable for a sample: can you update the comment: "generally" should be more about the "average". https://codereview.appspot.com/317090043/diff/1/api/samples/memval.c#newcode102 api/samples/memval.c:102: static void suggest: a comment explaining "fault" (i.e., it's not an unexpected fatal crash) https://codereview.appspot.com/317090043/diff/1/api/samples/memval.c#newcode115 api/samples/memval.c:115: for (mem_ref = trace_base; mem_ref < trace_ptr; mem_ref++) style violation: {} for multi-line body https://codereview.appspot.com/317090043/diff/1/api/samples/memval.c#newcode127 api/samples/memval.c:127: write_hexdump(hex_buf, write_base, mem_ref)); Use libc fprintf (see memtrace_simple.c and i#1929) https://codereview.appspot.com/317090043/diff/1/api/samples/memval.c#newcode127 api/samples/memval.c:127: write_hexdump(hex_buf, write_base, mem_ref)); Add comments that a binary dump is *much* faster than fprintf. Should this be renamed memval_simple.c? memtrace and instrace have the _simple use text, while the _x86 uses binary by default (along with having other instru opts). https://codereview.appspot.com/317090043/diff/1/api/samples/memval.c#newcode145 api/samples/memval.c:145: if (drreg_reserve_register(drcontext, ilist, where, NULL, ®_tmp) != DRREG_SUCCESS) { style: line too long https://codereview.appspot.com/317090043/diff/1/api/samples/memval.c#newcode149 api/samples/memval.c:149: if (drreg_reserve_register(drcontext, ilist, where, NULL, ®_ptr) != DRREG_SUCCESS) { style: line too long https://codereview.appspot.com/317090043/diff/1/api/samples/memval.c#newcode162 api/samples/memval.c:162: opnd_create_reg(reg_tmp), OPSZ_PTR, offsetof(mem_ref_t, addr)); style: line too long https://codereview.appspot.com/317090043/diff/1/api/samples/memval.c#newcode165 api/samples/memval.c:165: dr_save_reg(drcontext, ilist, where, reg_tmp, slot_tmp); Not sure that this is needed, will comment down below... https://codereview.appspot.com/317090043/diff/1/api/samples/memval.c#newcode174 api/samples/memval.c:174: OPND_CREATE_INT16(size), OPSZ_2, offsetof(mem_ref_t, size)); style: line too long https://codereview.appspot.com/317090043/diff/1/api/samples/memval.c#newcode178 api/samples/memval.c:178: if (instr_is_call(where)) { This should be x86-only as on ARM a call writes to the link register, not memory. I guess this function won't be called on ARM: so maybe an assert or comment if not an actual ifdef? Either way. https://codereview.appspot.com/317090043/diff/1/api/samples/memval.c#newcode186 api/samples/memval.c:186: drx_buf_insert_buf_store(drcontext, trace_buffer, ilist, where, reg_ptr, DR_REG_NULL, style: line too long https://codereview.appspot.com/317090043/diff/1/api/samples/memval.c#newcode198 api/samples/memval.c:198: dr_restore_reg(drcontext, ilist, where, reg_tmp, slot_tmp); This is unnecessary. It's only *DR* spill slots that have that prohibition: *drreg* slots are valid across app instrs. If you can think of a way to augment the drreg (or DR) docs to make this clear, please add it to the CL or send as a separate CL. https://codereview.appspot.com/317090043/diff/1/api/samples/memval.c#newcode212 api/samples/memval.c:212: if (drreg_reserve_register(drcontext, ilist, where, NULL, ®_ptr) != DRREG_SUCCESS) { style: line too long https://codereview.appspot.com/317090043/diff/1/api/samples/memval.c#newcode220 api/samples/memval.c:220: * opt to simply use memcpy to copy the memory into the write buffer. What I would do is have fast single-store cases for the common sizes: 1, 2, 4, 8 (for 64-bit) bytes, and fall back to this only for larger or unusual sizes. Otherwise this is going to dominate the overhead. https://codereview.appspot.com/317090043/diff/1/api/samples/memval.c#newcode222 api/samples/memval.c:222: dr_insert_clean_call(drcontext, ilist, where, (void *)memcpy, false, 3, On 2017/01/12 12:12:57, toshi wrote: > Should we use memcpy here? It simplifies the code, but we don't know whether it > is compatible with drx_buf. If we have too many large writes in a row, faulting > the buffer, will the application crash? > > Should we write our own drx_buf_insert_create_memcpy()? First, this is a circular buffer, which can't handle var-len writes, right? You're risking overflow instead of wrapping around. Re: faults: do circular buffers have a redzone for faults at all? For faults in memcpy, good question: today a fault in a "client lib" results in an abort. If memcpy were inlined into the client lib rather than in a private libc, it won't work. But, it seems like we should give the client a chance to handle all faults, even in its own lib -- xref i#50. The risk is that if the client state is compromised it might flail while we try to call its event handler. Prob still worth it: I would say let's put in calling of the handler (separate CL). https://codereview.appspot.com/317090043/diff/1/api/samples/memval.c#newcode292 api/samples/memval.c:292: * assume no instruction has multiple distinct memory operands. AVX2's VSIB unfortunately breaks this: it can reference 4 different doubles scattered around memory. In DR we decided not to try and split it in some way. Dr. Memory does not yet handle them (https://github.com/DynamoRIO/drmemory/issues/1375) so fine to punt here, but at least talk about them with a XXX comment. https://codereview.appspot.com/317090043/diff/1/api/samples/memval.c#newcode292 api/samples/memval.c:292: * assume no instruction has multiple distinct memory operands. You mean "no instruction has multiple distinct *stores*", right? Certainly several have a load and a store (push, movs, etc.) https://codereview.appspot.com/317090043/diff/1/api/samples/memval.c#newcode317 api/samples/memval.c:317: * instrumentation after the last instruction in a basic block. We don't need this if we just have a check for drmgr_is_last_instr() that unreserves the reg, right? https://codereview.appspot.com/317090043/diff/1/api/samples/memval.c#newcode378 api/samples/memval.c:378: if(drreg_init(&ops) != DRREG_SUCCESS) style: "if (" https://codereview.appspot.com/317090043/diff/1/ext/drx/drx.h File ext/drx/drx.h (right): https://codereview.appspot.com/317090043/diff/1/ext/drx/drx.h#newcode441 ext/drx/drx.h:441: * instrumentation after every instruction. See earlier comment: I don't think we need this
Sign in to reply to this message.
Responded to your immediate questions, and will mark Done to the style issues and comment requests soon. https://codereview.appspot.com/317090043/diff/1/api/samples/memval.c File api/samples/memval.c (right): https://codereview.appspot.com/317090043/diff/1/api/samples/memval.c#newcode165 api/samples/memval.c:165: dr_save_reg(drcontext, ilist, where, reg_tmp, slot_tmp); On 2017/01/12 20:31:49, bruening wrote: > Not sure that this is needed, will comment down below... Acknowledged. https://codereview.appspot.com/317090043/diff/1/api/samples/memval.c#newcode198 api/samples/memval.c:198: dr_restore_reg(drcontext, ilist, where, reg_tmp, slot_tmp); On 2017/01/12 20:31:50, bruening wrote: > This is unnecessary. It's only *DR* spill slots that have that prohibition: > *drreg* slots are valid across app instrs. > > If you can think of a way to augment the drreg (or DR) docs to make this clear, > please add it to the CL or send as a separate CL. reg_tmp is a drreg-reserved register, so yeah it's value is persisted across app instructions. But, the problem shown here is that, after we call drutil_insert_get_mem_addr(), which loads the write address into reg_tmp, we subsequently clobber it later when loading type/size. However, we need that value in a reg to persist till after the current instruction. That was the motivation for performing this spill/restore. I also considered re-calling drutil_insert_get_mem_addr(), but it's also non-trivial to do so because the reserved registers could possibly be the same as the ones used in the memref, which have since been clobbered. https://codereview.appspot.com/317090043/diff/1/api/samples/memval.c#newcode220 api/samples/memval.c:220: * opt to simply use memcpy to copy the memory into the write buffer. On 2017/01/12 20:31:50, bruening wrote: > What I would do is have fast single-store cases for the common sizes: 1, 2, 4, 8 > (for 64-bit) bytes, and fall back to this only for larger or unusual sizes. > Otherwise this is going to dominate the overhead. Can do. Should I add a method in drx.c? https://codereview.appspot.com/317090043/diff/1/api/samples/memval.c#newcode222 api/samples/memval.c:222: dr_insert_clean_call(drcontext, ilist, where, (void *)memcpy, false, 3, On 2017/01/12 20:31:49, bruening wrote: > On 2017/01/12 12:12:57, toshi wrote: > > Should we use memcpy here? It simplifies the code, but we don't know whether > it > > is compatible with drx_buf. If we have too many large writes in a row, > faulting > > the buffer, will the application crash? > > > > Should we write our own drx_buf_insert_create_memcpy()? > > First, this is a circular buffer, which can't handle var-len writes, right? > You're risking overflow instead of wrapping around. The circular buffer, without the 16K buffer size optimization uses faults as well. However, I highly doubt that if memcpy() goes over, we'd be able to handle it, as our handler expects some faulting instruction of the form mov $imm/%reg, (%base + $disp). I'm assuming memcpy() does some sort of word copy, i.e. *p++ = *q++, which probably wouldn't produce the necessary instrumentation for us to catch it. Also, circular buffer does allow variable-length writes. > > Re: faults: do circular buffers have a redzone for faults at all? > > For faults in memcpy, good question: today a fault in a "client lib" results in > an abort. If memcpy were inlined into the client lib rather than in a private > libc, it won't work. But, it seems like we should give the client a chance to > handle all faults, even in its own lib -- xref i#50. The risk is that if the > client state is compromised it might flail while we try to call its event > handler. Prob still worth it: I would say let's put in calling of the handler > (separate CL). Acknowledged. https://codereview.appspot.com/317090043/diff/1/api/samples/memval.c#newcode292 api/samples/memval.c:292: * assume no instruction has multiple distinct memory operands. On 2017/01/12 20:31:49, bruening wrote: > AVX2's VSIB unfortunately breaks this: it can reference 4 different doubles > scattered around memory. In DR we decided not to try and split it in some way. > Dr. Memory does not yet handle them > (https://github.com/DynamoRIO/drmemory/issues/1375) so fine to punt here, but at > least talk about them with a XXX comment. Understood. If it's not worth it, I won't pursue it. Though, one way this would be possible to implement would be to read the last few elements from the trace_buffer (as many as there are distinct dest operands) and extract the write address from each of them, then perform the memcpy per operand. https://codereview.appspot.com/317090043/diff/1/api/samples/memval.c#newcode317 api/samples/memval.c:317: * instrumentation after the last instruction in a basic block. On 2017/01/12 20:31:49, bruening wrote: > We don't need this if we just have a check for drmgr_is_last_instr() that > unreserves the reg, right? If I understand correctly, instead of adding a NOP, we would just always unreserve all outstanding registers on the last instruction, right? Usually, this would work fine, since a terminating instruction exists for most basic blocks. But I've been bitten by this where there was a basic block that was not terminated by any special instruction -- it was a simple move that wrote memory. If we blindly unreserved the register, then we'd never write it's memref to disk (and we can get other insidious errors because the trace buffer has an entry that's not in the write buffer).
Sign in to reply to this message.
https://codereview.appspot.com/317090043/diff/1/api/samples/memval.c File api/samples/memval.c (right): https://codereview.appspot.com/317090043/diff/1/api/samples/memval.c#newcode198 api/samples/memval.c:198: dr_restore_reg(drcontext, ilist, where, reg_tmp, slot_tmp); On 2017/01/12 21:33:20, toshi wrote: > On 2017/01/12 20:31:50, bruening wrote: > > This is unnecessary. It's only *DR* spill slots that have that prohibition: > > *drreg* slots are valid across app instrs. > > > > If you can think of a way to augment the drreg (or DR) docs to make this > clear, > > please add it to the CL or send as a separate CL. > > reg_tmp is a drreg-reserved register, so yeah it's value is persisted across app > instructions. But, the problem shown here is that, after we call > drutil_insert_get_mem_addr(), which loads the write address into reg_tmp, we > subsequently clobber it later when loading type/size. However, we need that > value in a reg to persist till after the current instruction. That was the > motivation for performing this spill/restore. > > I also considered re-calling drutil_insert_get_mem_addr(), but it's also > non-trivial to do so because the reserved registers could possibly be the same > as the ones used in the memref, which have since been clobbered. I see. It may be more efficient to ask drreg for a 3rd reg and copy the address there, as drreg will avoid extra spill+restore memrefs if there are dead regs. reg_tmp is only clobbered for ARM, right? It would be nice to automate the ARM-vs-x86 scratch w/o ifdefs here, but that's not easy to do w/o moving some drreg reservations inside drx_buf which is a whole different approach from the current library register model. https://codereview.appspot.com/317090043/diff/1/api/samples/memval.c#newcode220 api/samples/memval.c:220: * opt to simply use memcpy to copy the memory into the write buffer. On 2017/01/12 21:33:20, toshi wrote: > On 2017/01/12 20:31:50, bruening wrote: > > What I would do is have fast single-store cases for the common sizes: 1, 2, 4, > 8 > > (for 64-bit) bytes, and fall back to this only for larger or unusual sizes. > > Otherwise this is going to dominate the overhead. > > Can do. Should I add a method in drx.c? Hmm, since it hasn't come up before maybe just here instead of a library routine. OTOH maybe that would be useful. https://codereview.appspot.com/317090043/diff/1/api/samples/memval.c#newcode222 api/samples/memval.c:222: dr_insert_clean_call(drcontext, ilist, where, (void *)memcpy, false, 3, On 2017/01/12 21:33:20, toshi wrote: > On 2017/01/12 20:31:49, bruening wrote: > > On 2017/01/12 12:12:57, toshi wrote: > > > Should we use memcpy here? It simplifies the code, but we don't know whether > > it > > > is compatible with drx_buf. If we have too many large writes in a row, > > faulting > > > the buffer, will the application crash? > > > > > > Should we write our own drx_buf_insert_create_memcpy()? > > > > First, this is a circular buffer, which can't handle var-len writes, right? > > You're risking overflow instead of wrapping around. > > The circular buffer, without the 16K buffer size optimization uses faults as > well. However, I highly doubt that if memcpy() goes over, we'd be able to handle > it, as our handler expects some faulting instruction of the form mov $imm/%reg, > (%base + $disp). I'm assuming memcpy() does some sort of word copy, i.e. *p++ = > *q++, which probably wouldn't produce the necessary instrumentation for us to > catch it. Right, that's another issue with memcpy and the fault: we'd have to relax the fault handler pattern. OK, so this is why you suggested a drx_buf_*_memcpy. Makes more sense now. Seems reasonable to add it. > Also, circular buffer does allow variable-length writes. For the 16K wraparound how does it handle a single var-len store that goes off the end? It only wraps on the buf ptr update, not within each write. https://codereview.appspot.com/317090043/diff/1/api/samples/memval.c#newcode317 api/samples/memval.c:317: * instrumentation after the last instruction in a basic block. On 2017/01/12 21:33:20, toshi wrote: > On 2017/01/12 20:31:49, bruening wrote: > > We don't need this if we just have a check for drmgr_is_last_instr() that > > unreserves the reg, right? > > If I understand correctly, instead of adding a NOP, we would just always > unreserve all outstanding registers on the last instruction, right? > > Usually, this would work fine, since a terminating instruction exists for most > basic blocks. But I've been bitten by this where there was a basic block that > was not terminated by any special instruction -- it was a simple move that wrote > memory. If we blindly unreserved the register, then we'd never write it's memref > to disk (and we can get other insidious errors because the trace buffer has an > entry that's not in the write buffer). Yes, right. So either you have to specially post-insert when on the last instr, or you insert sthg -- but can you just insert a label instead of a nop? The is-app check is afterward (document that users of the drx routine need that ordering). Then there's zero cost in executed instrs.
Sign in to reply to this message.
Done w/ all except the discussion on drx_buf_*_memcpy(). I'll write that up within the next few patchsets. https://codereview.appspot.com/317090043/diff/1/api/samples/CMakeLists.txt File api/samples/CMakeLists.txt (right): https://codereview.appspot.com/317090043/diff/1/api/samples/CMakeLists.txt#ne... api/samples/CMakeLists.txt:214: add_sample_client(memval "memval.c;utils.c" "drmgr;drreg;drutil;drx") On 2017/01/12 20:31:49, bruening wrote: > nit: align to match line below Done. https://codereview.appspot.com/317090043/diff/1/api/samples/CMakeLists.txt#ne... api/samples/CMakeLists.txt:214: add_sample_client(memval "memval.c;utils.c" "drmgr;drreg;drutil;drx") On 2017/01/12 20:31:49, bruening wrote: > nit: group with memtrace+instrace Done. https://codereview.appspot.com/317090043/diff/1/api/samples/memval.c File api/samples/memval.c (right): https://codereview.appspot.com/317090043/diff/1/api/samples/memval.c#newcode36 api/samples/memval.c:36: * Tracks all app writes and dumps them to a file. On 2017/01/12 20:31:50, bruening wrote: > suggest: elaborate on what is being recorded+dumped: the address *and* the value Done. https://codereview.appspot.com/317090043/diff/1/api/samples/memval.c#newcode38 api/samples/memval.c:38: * (1) It fills two per-thread-buffers with inlined instrumentation. On 2017/01/12 20:31:49, bruening wrote: > Explain why two (see related comments below.) Done. https://codereview.appspot.com/317090043/diff/1/api/samples/memval.c#newcode39 api/samples/memval.c:39: * (2) It calls a clean call to dump the buffers into a file. On 2017/01/12 20:31:49, bruening wrote: > This doesn't seem accurate: I see a fault handler for that Done. https://codereview.appspot.com/317090043/diff/1/api/samples/memval.c#newcode49 api/samples/memval.c:49: * manner On 2017/01/12 20:31:49, bruening wrote: > Isn't the key thing in this sample, getting the value post-instr, missing from > this list? Done. https://codereview.appspot.com/317090043/diff/1/api/samples/memval.c#newcode67 api/samples/memval.c:67: app_pc addr; /* mem ref addr */ On 2017/01/12 20:31:50, bruening wrote: > Without thinking hard, most people reading this would say "why not put the value > here and have just one buffer". Explaining why not in comments would help. > How hard would it be to have a var-len component? > > Explain how the code correlates values between the buffers (i.e., how to find > the value in the other buffer for a given mem_ref_t entry in the trace buffer). Right now, I believe the only reason why we cannot use a single buffer is because we don't know how well memcpy() will work with drx_buf(). If and when we get around to writing drx_buf_*_memcpy(), would you think it a good idea to consolidate the buffers? https://codereview.appspot.com/317090043/diff/1/api/samples/memval.c#newcode75 api/samples/memval.c:75: * but we give ourselves some leeway and say a write should generally be under 32 bytes. On 2017/01/12 20:31:49, bruening wrote: > Some are quite large: e.g., OP_xsave is hundreds of bytes. So we're assuming > here that the *average* size for any 4096-long consecutive sequence is <= 32 > bytes. Seems reasonable for a sample: can you update the comment: "generally" > should be more about the "average". Done. https://codereview.appspot.com/317090043/diff/1/api/samples/memval.c#newcode102 api/samples/memval.c:102: static void On 2017/01/12 20:31:50, bruening wrote: > suggest: a comment explaining "fault" (i.e., it's not an unexpected fatal crash) Done. https://codereview.appspot.com/317090043/diff/1/api/samples/memval.c#newcode115 api/samples/memval.c:115: for (mem_ref = trace_base; mem_ref < trace_ptr; mem_ref++) On 2017/01/12 20:31:50, bruening wrote: > style violation: {} for multi-line body Done. https://codereview.appspot.com/317090043/diff/1/api/samples/memval.c#newcode127 api/samples/memval.c:127: write_hexdump(hex_buf, write_base, mem_ref)); On 2017/01/12 20:31:49, bruening wrote: > Use libc fprintf (see memtrace_simple.c and i#1929) Done. https://codereview.appspot.com/317090043/diff/1/api/samples/memval.c#newcode127 api/samples/memval.c:127: write_hexdump(hex_buf, write_base, mem_ref)); On 2017/01/12 20:31:49, bruening wrote: > Add comments that a binary dump is *much* faster than fprintf. > > Should this be renamed memval_simple.c? memtrace and instrace have the _simple > use text, while the _x86 uses binary by default (along with having other instru > opts). Done. https://codereview.appspot.com/317090043/diff/1/api/samples/memval.c#newcode145 api/samples/memval.c:145: if (drreg_reserve_register(drcontext, ilist, where, NULL, ®_tmp) != DRREG_SUCCESS) { On 2017/01/12 20:31:50, bruening wrote: > style: line too long Done. https://codereview.appspot.com/317090043/diff/1/api/samples/memval.c#newcode149 api/samples/memval.c:149: if (drreg_reserve_register(drcontext, ilist, where, NULL, ®_ptr) != DRREG_SUCCESS) { On 2017/01/12 20:31:49, bruening wrote: > style: line too long Done. https://codereview.appspot.com/317090043/diff/1/api/samples/memval.c#newcode162 api/samples/memval.c:162: opnd_create_reg(reg_tmp), OPSZ_PTR, offsetof(mem_ref_t, addr)); On 2017/01/12 20:31:50, bruening wrote: > style: line too long Done. https://codereview.appspot.com/317090043/diff/1/api/samples/memval.c#newcode174 api/samples/memval.c:174: OPND_CREATE_INT16(size), OPSZ_2, offsetof(mem_ref_t, size)); On 2017/01/12 20:31:50, bruening wrote: > style: line too long Done. https://codereview.appspot.com/317090043/diff/1/api/samples/memval.c#newcode178 api/samples/memval.c:178: if (instr_is_call(where)) { On 2017/01/12 20:31:50, bruening wrote: > This should be x86-only as on ARM a call writes to the link register, not > memory. I guess this function won't be called on ARM: so maybe an assert or > comment if not an actual ifdef? Either way. Done. https://codereview.appspot.com/317090043/diff/1/api/samples/memval.c#newcode186 api/samples/memval.c:186: drx_buf_insert_buf_store(drcontext, trace_buffer, ilist, where, reg_ptr, DR_REG_NULL, On 2017/01/12 20:31:50, bruening wrote: > style: line too long Done. https://codereview.appspot.com/317090043/diff/1/api/samples/memval.c#newcode198 api/samples/memval.c:198: dr_restore_reg(drcontext, ilist, where, reg_tmp, slot_tmp); On 2017/01/12 23:11:01, bruening wrote: > On 2017/01/12 21:33:20, toshi wrote: > > On 2017/01/12 20:31:50, bruening wrote: > > > This is unnecessary. It's only *DR* spill slots that have that prohibition: > > > *drreg* slots are valid across app instrs. > > > > > > If you can think of a way to augment the drreg (or DR) docs to make this > > clear, > > > please add it to the CL or send as a separate CL. > > > > reg_tmp is a drreg-reserved register, so yeah it's value is persisted across > app > > instructions. But, the problem shown here is that, after we call > > drutil_insert_get_mem_addr(), which loads the write address into reg_tmp, we > > subsequently clobber it later when loading type/size. However, we need that > > value in a reg to persist till after the current instruction. That was the > > motivation for performing this spill/restore. > > > > I also considered re-calling drutil_insert_get_mem_addr(), but it's also > > non-trivial to do so because the reserved registers could possibly be the same > > as the ones used in the memref, which have since been clobbered. > > I see. It may be more efficient to ask drreg for a 3rd reg and copy the address > there, as drreg will avoid extra spill+restore memrefs if there are dead regs. > > reg_tmp is only clobbered for ARM, right? > > It would be nice to automate the ARM-vs-x86 scratch w/o ifdefs here, but that's > not easy to do w/o moving some drreg reservations inside drx_buf which is a > whole different approach from the current library register model. Done. https://codereview.appspot.com/317090043/diff/1/api/samples/memval.c#newcode212 api/samples/memval.c:212: if (drreg_reserve_register(drcontext, ilist, where, NULL, ®_ptr) != DRREG_SUCCESS) { On 2017/01/12 20:31:50, bruening wrote: > style: line too long Done. https://codereview.appspot.com/317090043/diff/1/api/samples/memval.c#newcode220 api/samples/memval.c:220: * opt to simply use memcpy to copy the memory into the write buffer. On 2017/01/12 23:11:01, bruening wrote: > On 2017/01/12 21:33:20, toshi wrote: > > On 2017/01/12 20:31:50, bruening wrote: > > > What I would do is have fast single-store cases for the common sizes: 1, 2, > 4, > > 8 > > > (for 64-bit) bytes, and fall back to this only for larger or unusual sizes. > > > Otherwise this is going to dominate the overhead. > > > > Can do. Should I add a method in drx.c? > > Hmm, since it hasn't come up before maybe just here instead of a library > routine. OTOH maybe that would be useful. Acknowledged, see comment below. https://codereview.appspot.com/317090043/diff/1/api/samples/memval.c#newcode222 api/samples/memval.c:222: dr_insert_clean_call(drcontext, ilist, where, (void *)memcpy, false, 3, On 2017/01/12 23:11:01, bruening wrote: > On 2017/01/12 21:33:20, toshi wrote: > > On 2017/01/12 20:31:49, bruening wrote: > > > On 2017/01/12 12:12:57, toshi wrote: > > > > Should we use memcpy here? It simplifies the code, but we don't know > whether > > > it > > > > is compatible with drx_buf. If we have too many large writes in a row, > > > faulting > > > > the buffer, will the application crash? > > > > > > > > Should we write our own drx_buf_insert_create_memcpy()? > > > > > > First, this is a circular buffer, which can't handle var-len writes, right? > > > You're risking overflow instead of wrapping around. > > > > The circular buffer, without the 16K buffer size optimization uses faults as > > well. However, I highly doubt that if memcpy() goes over, we'd be able to > handle > > it, as our handler expects some faulting instruction of the form mov > $imm/%reg, > > (%base + $disp). I'm assuming memcpy() does some sort of word copy, i.e. *p++ > = > > *q++, which probably wouldn't produce the necessary instrumentation for us to > > catch it. > > Right, that's another issue with memcpy and the fault: we'd have to relax the > fault handler pattern. OK, so this is why you suggested a drx_buf_*_memcpy. > Makes more sense now. Seems reasonable to add it. Gotcha, will add drx_buf_*_memcpy(). This will also potentially have the optimization for common write sizes mentioned above (2, 4, 8 etc). > > > Also, circular buffer does allow variable-length writes. > > For the 16K wraparound how does it handle a single var-len store that goes off > the end? It only wraps on the buf ptr update, not within each write. Yup, we were thinking of adding some slack to the 16K buffer, but we also admonish in the docs that you should only write fixed-length elements whose sizes are all divisors of the 16K buffer size, so I didn't add any slack region. https://codereview.appspot.com/317090043/diff/1/api/samples/memval.c#newcode292 api/samples/memval.c:292: * assume no instruction has multiple distinct memory operands. On 2017/01/12 20:31:50, bruening wrote: > You mean "no instruction has multiple distinct *stores*", right? Certainly > several have a load and a store (push, movs, etc.) Done. https://codereview.appspot.com/317090043/diff/1/api/samples/memval.c#newcode317 api/samples/memval.c:317: * instrumentation after the last instruction in a basic block. On 2017/01/12 23:11:01, bruening wrote: > On 2017/01/12 21:33:20, toshi wrote: > > On 2017/01/12 20:31:49, bruening wrote: > > > We don't need this if we just have a check for drmgr_is_last_instr() that > > > unreserves the reg, right? > > > > If I understand correctly, instead of adding a NOP, we would just always > > unreserve all outstanding registers on the last instruction, right? > > > > Usually, this would work fine, since a terminating instruction exists for most > > basic blocks. But I've been bitten by this where there was a basic block that > > was not terminated by any special instruction -- it was a simple move that > wrote > > memory. If we blindly unreserved the register, then we'd never write it's > memref > > to disk (and we can get other insidious errors because the trace buffer has an > > entry that's not in the write buffer). > > Yes, right. So either you have to specially post-insert when on the last instr, > or you insert sthg -- but can you just insert a label instead of a nop? The > is-app check is afterward (document that users of the drx routine need that > ordering). Then there's zero cost in executed instrs. Done, great idea. https://codereview.appspot.com/317090043/diff/1/api/samples/memval.c#newcode378 api/samples/memval.c:378: if(drreg_init(&ops) != DRREG_SUCCESS) On 2017/01/12 20:31:49, bruening wrote: > style: "if (" Done.
Sign in to reply to this message.
Commit log for latest patchset: --------------- i#1834 memval: Adds memval sample Introduces a sample client which inserts instrumentation after the current instruction to dereference app memory, and to fill a per-thread buffer. Fixes #1834 Review-URL: https://codereview.appspot.com/317090043 ---------------
Sign in to reply to this message.
https://codereview.appspot.com/317090043/diff/1/api/samples/memval.c File api/samples/memval.c (right): https://codereview.appspot.com/317090043/diff/1/api/samples/memval.c#newcode222 api/samples/memval.c:222: dr_insert_clean_call(drcontext, ilist, where, (void *)memcpy, false, 3, On 2017/01/13 09:53:44, toshi wrote: > On 2017/01/12 23:11:01, bruening wrote: > > On 2017/01/12 21:33:20, toshi wrote: > > > On 2017/01/12 20:31:49, bruening wrote: > > > > On 2017/01/12 12:12:57, toshi wrote: > > > > > Should we use memcpy here? It simplifies the code, but we don't know > > whether > > > > it > > > > > is compatible with drx_buf. If we have too many large writes in a row, > > > > faulting > > > > > the buffer, will the application crash? > > > > > > > > > > Should we write our own drx_buf_insert_create_memcpy()? > > > > > > > > First, this is a circular buffer, which can't handle var-len writes, > right? > > > > You're risking overflow instead of wrapping around. > > > > > > The circular buffer, without the 16K buffer size optimization uses faults as > > > well. However, I highly doubt that if memcpy() goes over, we'd be able to > > handle > > > it, as our handler expects some faulting instruction of the form mov > > $imm/%reg, > > > (%base + $disp). I'm assuming memcpy() does some sort of word copy, i.e. > *p++ > > = > > > *q++, which probably wouldn't produce the necessary instrumentation for us > to > > > catch it. > > > > Right, that's another issue with memcpy and the fault: we'd have to relax the > > fault handler pattern. OK, so this is why you suggested a drx_buf_*_memcpy. > > Makes more sense now. Seems reasonable to add it. > > Gotcha, will add drx_buf_*_memcpy(). This will also potentially have the > optimization for common write sizes mentioned above (2, 4, 8 etc). > > > > > > Also, circular buffer does allow variable-length writes. > > > > For the 16K wraparound how does it handle a single var-len store that goes off > > the end? It only wraps on the buf ptr update, not within each write. > > Yup, we were thinking of adding some slack to the 16K buffer, but we also > admonish in the docs that you should only write fixed-length elements whose > sizes are all divisors of the 16K buffer size, so I didn't add any slack region. So you're retracting your comment "circular buffer does allow variable-length writes" :) ? Didn't look at new patchset yet, presumably it does not use a circular anymore.
Sign in to reply to this message.
https://codereview.appspot.com/317090043/diff/1/api/samples/memval.c File api/samples/memval.c (right): https://codereview.appspot.com/317090043/diff/1/api/samples/memval.c#newcode222 api/samples/memval.c:222: dr_insert_clean_call(drcontext, ilist, where, (void *)memcpy, false, 3, On 2017/01/13 17:26:59, bruening wrote: > On 2017/01/13 09:53:44, toshi wrote: > > On 2017/01/12 23:11:01, bruening wrote: > > > On 2017/01/12 21:33:20, toshi wrote: > > > > On 2017/01/12 20:31:49, bruening wrote: > > > > > On 2017/01/12 12:12:57, toshi wrote: > > > > > > Should we use memcpy here? It simplifies the code, but we don't know > > > whether > > > > > it > > > > > > is compatible with drx_buf. If we have too many large writes in a row, > > > > > faulting > > > > > > the buffer, will the application crash? > > > > > > > > > > > > Should we write our own drx_buf_insert_create_memcpy()? > > > > > > > > > > First, this is a circular buffer, which can't handle var-len writes, > > right? > > > > > You're risking overflow instead of wrapping around. > > > > > > > > The circular buffer, without the 16K buffer size optimization uses faults > as > > > > well. However, I highly doubt that if memcpy() goes over, we'd be able to > > > handle > > > > it, as our handler expects some faulting instruction of the form mov > > > $imm/%reg, > > > > (%base + $disp). I'm assuming memcpy() does some sort of word copy, i.e. > > *p++ > > > = > > > > *q++, which probably wouldn't produce the necessary instrumentation for us > > to > > > > catch it. > > > > > > Right, that's another issue with memcpy and the fault: we'd have to relax > the > > > fault handler pattern. OK, so this is why you suggested a drx_buf_*_memcpy. > > > > Makes more sense now. Seems reasonable to add it. > > > > Gotcha, will add drx_buf_*_memcpy(). This will also potentially have the > > optimization for common write sizes mentioned above (2, 4, 8 etc). > > > > > > > > > Also, circular buffer does allow variable-length writes. > > > > > > For the 16K wraparound how does it handle a single var-len store that goes > off > > > the end? It only wraps on the buf ptr update, not within each write. > > > > Yup, we were thinking of adding some slack to the 16K buffer, but we also > > admonish in the docs that you should only write fixed-length elements whose > > sizes are all divisors of the 16K buffer size, so I didn't add any slack > region. > > So you're retracting your comment "circular buffer does allow variable-length > writes" :) ? Didn't look at new patchset yet, presumably it does not use a > circular anymore. Sent too soon. Further discussion with myself to be sure I understand: so we both agree 16K circular buffer does not handle var-len. But for non-16K circular buffer that uses faults: for a single-instr var-len write, the fault handling code re-executes the store with the buf ptr set back to the top? So it's up to the user to know where the final write at the bottom ends (could be a gap) -- which could work here if the code reading the buffer checks for the end and moves to the start if it doesn't fit. But for memcpy with multiple stores, it's going to split and wrap around. I guess that can be made to work too, if the code here checks for the end and merges together the two pieces. But imagine a 4-byte gap at the bottom, and we want to write 8 bytes. How does the user know whether the memcpy optimized and did a single 8-byte store which faulted and got moved (leaving a gap) or put 4 bytes at the bottom and 4 at the top? The fault handler has to be consistent I guess.
Sign in to reply to this message.
On 2017/01/13 17:35:50, bruening wrote: > https://codereview.appspot.com/317090043/diff/1/api/samples/memval.c > File api/samples/memval.c (right): > > https://codereview.appspot.com/317090043/diff/1/api/samples/memval.c#newcode222 > api/samples/memval.c:222: dr_insert_clean_call(drcontext, ilist, where, (void > *)memcpy, false, 3, > On 2017/01/13 17:26:59, bruening wrote: > > On 2017/01/13 09:53:44, toshi wrote: > > > On 2017/01/12 23:11:01, bruening wrote: > > > > On 2017/01/12 21:33:20, toshi wrote: > > > > > On 2017/01/12 20:31:49, bruening wrote: > > > > > > On 2017/01/12 12:12:57, toshi wrote: > > > > > > > Should we use memcpy here? It simplifies the code, but we don't know > > > > whether > > > > > > it > > > > > > > is compatible with drx_buf. If we have too many large writes in a > row, > > > > > > faulting > > > > > > > the buffer, will the application crash? > > > > > > > > > > > > > > Should we write our own drx_buf_insert_create_memcpy()? > > > > > > > > > > > > First, this is a circular buffer, which can't handle var-len writes, > > > right? > > > > > > You're risking overflow instead of wrapping around. > > > > > > > > > > The circular buffer, without the 16K buffer size optimization uses > faults > > as > > > > > well. However, I highly doubt that if memcpy() goes over, we'd be able > to > > > > handle > > > > > it, as our handler expects some faulting instruction of the form mov > > > > $imm/%reg, > > > > > (%base + $disp). I'm assuming memcpy() does some sort of word copy, i.e. > > > *p++ > > > > = > > > > > *q++, which probably wouldn't produce the necessary instrumentation for > us > > > to > > > > > catch it. > > > > > > > > Right, that's another issue with memcpy and the fault: we'd have to relax > > the > > > > fault handler pattern. OK, so this is why you suggested a > drx_buf_*_memcpy. > > > > > > Makes more sense now. Seems reasonable to add it. > > > > > > Gotcha, will add drx_buf_*_memcpy(). This will also potentially have the > > > optimization for common write sizes mentioned above (2, 4, 8 etc). > > > > > > > > > > > > Also, circular buffer does allow variable-length writes. > > > > > > > > For the 16K wraparound how does it handle a single var-len store that goes > > off > > > > the end? It only wraps on the buf ptr update, not within each write. > > > > > > Yup, we were thinking of adding some slack to the 16K buffer, but we also > > > admonish in the docs that you should only write fixed-length elements whose > > > sizes are all divisors of the 16K buffer size, so I didn't add any slack > > region. > > > > So you're retracting your comment "circular buffer does allow variable-length > > writes" :) ? Didn't look at new patchset yet, presumably it does not use a > > circular anymore. > > Sent too soon. Further discussion with myself to be sure I understand: so we > both agree 16K circular buffer does not handle var-len. But for non-16K > circular buffer that uses faults: for a single-instr var-len write, the fault > handling code re-executes the store with the buf ptr set back to the top? So > it's up to the user to know where the final write at the bottom ends (could be a > gap) -- which could work here if the code reading the buffer checks for the end > and moves to the start if it doesn't fit. > > But for memcpy with multiple stores, it's going to split and wrap around. I > guess that can be made to work too, if the code here checks for the end and > merges together the two pieces. > > But imagine a 4-byte gap at the bottom, and we want to write 8 bytes. How does > the user know whether the memcpy optimized and did a single 8-byte store which > faulted and got moved (leaving a gap) or put 4 bytes at the bottom and 4 at the > top? The fault handler has to be consistent I guess. Right, we're using the faulting buffer which allows var-length writes. For the fast circular buffer we advise not to. And yes, I think memcpy() would have this problem -- in our implementation of drx_buf_memcpy(), we would instead have to incrementally update the buffer pointer each iteration of the loop, as a sort of "commit" operation to let the fault handler know where the last partial write ended. That way we would know exactly where the partial write left off, and we could fix it in the handler. example instrumentation for drx_buf_memcpy(): loop: test ecx, ecx jz done ; edx holds write_addr ; ebx holds buf_ptr mov eax, [edx + ecx*4] mov [ebx], eax ; instrumentation for drx_insert_update_buf_ptr() add ebx, 4 mov tls_slot, ebx ; acts as a "commit" operation to write back to TLS jmp loop done: ...
Sign in to reply to this message.
Commit log for latest patchset: --------------- i#1834 memval: Adds memval sample Introduces a sample client which inserts instrumentation after the current instruction to dereference app memory, and to fill a per-thread buffer. Fixes #1834 Review-URL: https://codereview.appspot.com/317090043 ---------------
Sign in to reply to this message.
Patch set message says it all, I implemented the optimization for gpr-sized writes instead of performing the memcpy blindly. However, I have not yet implemented the IR-built memcpy() that we discussed in the previous mail chain. I also have some other questions, below. https://codereview.appspot.com/317090043/diff/40001/ext/drx/drx_buf.c File ext/drx/drx_buf.c (right): https://codereview.appspot.com/317090043/diff/40001/ext/drx/drx_buf.c#newcode653 ext/drx/drx_buf.c:653: insert_load(void *drcontext, instrlist_t *ilist, instr_t *where, I've come into this problem a few times before, where I wanted a generic load/store that worked on any gpr. Maybe in a separate CL we change drx_buf_insert_buf_store() to drx_insert_store(), and make drx_buf_insert_buf_store() just redirect to drx_insert_store() so there's no API change. Conceivably we could also write a drx_insert_load() with insert_load() as the base (though it needs a bit of work done on it, as it's currently tailored to fit our needs as of now). https://codereview.appspot.com/317090043/diff/40001/ext/drx/drx_buf.c#newcode695 ext/drx/drx_buf.c:695: /* This could happen if, for example we tried to resize the base Is this a good solution? Should we instead exclude ebp and some other registers from drreg()? Not sure which is worse for performance: drreg excluding some registers in 32-bit and thus thrashing trying to spill/restore registers, or drreg allowing ebp and us having to fall back onto the clean call for many (how many?) 1-byte writes.
Sign in to reply to this message.
https://codereview.appspot.com/317090043/diff/40001/ext/drx/drx_buf.c File ext/drx/drx_buf.c (right): https://codereview.appspot.com/317090043/diff/40001/ext/drx/drx_buf.c#newcode695 ext/drx/drx_buf.c:695: /* This could happen if, for example we tried to resize the base On 2017/01/15 07:24:34, toshi wrote: > Is this a good solution? Should we instead exclude ebp and some other registers > from drreg()? Not sure which is worse for performance: drreg excluding some > registers in 32-bit and thus thrashing trying to spill/restore registers, or > drreg allowing ebp and us having to fall back onto the clean call for many (how > many?) 1-byte writes. This is actually a problem with drx_buf_insert_buf_store() as well. No client that I know of uses XINST_CREATE_store_1byte(), so I've never run into this problem before. Should we put this burden on the client to explicitly disallow reserving esi, edi and ebp? This would fix all our problems, but in that case maybe we should create a method drreg_init_vector_at_least_size() which takes an opnd_size_t and will fill a drvector with only registers which can be resized to that operand size (a no-op on AArchXX). This way clients only hit a potential slowdown when they know that they require a register which can be resized to it's 1-byte counterpart. I can do the above in a separate CL before moving on here, if need be.
Sign in to reply to this message.
https://codereview.appspot.com/317090043/diff/40001/ext/drx/drx_buf.c File ext/drx/drx_buf.c (right): https://codereview.appspot.com/317090043/diff/40001/ext/drx/drx_buf.c#newcode653 ext/drx/drx_buf.c:653: insert_load(void *drcontext, instrlist_t *ilist, instr_t *where, On 2017/01/15 07:24:34, toshi wrote: > I've come into this problem a few times before, where I wanted a generic > load/store that worked on any gpr. Maybe in a separate CL we change > drx_buf_insert_buf_store() to drx_insert_store(), and make > drx_buf_insert_buf_store() just redirect to drx_insert_store() so there's no API > change. > > Conceivably we could also write a drx_insert_load() with insert_load() as the > base (though it needs a bit of work done on it, as it's currently tailored to > fit our needs as of now). For a load, see below re: movzx. https://codereview.appspot.com/317090043/diff/40001/ext/drx/drx_buf.c#newcode695 ext/drx/drx_buf.c:695: /* This could happen if, for example we tried to resize the base On 2017/01/16 01:55:43, toshi wrote: > On 2017/01/15 07:24:34, toshi wrote: > > Is this a good solution? Should we instead exclude ebp and some other > registers > > from drreg()? Not sure which is worse for performance: drreg excluding some > > registers in 32-bit and thus thrashing trying to spill/restore registers, or > > drreg allowing ebp and us having to fall back onto the clean call for many > (how > > many?) 1-byte writes. > > This is actually a problem with drx_buf_insert_buf_store() as well. No client > that I know of uses XINST_CREATE_store_1byte(), so I've never run into this > problem before. > > Should we put this burden on the client to explicitly disallow reserving esi, > edi and ebp? This would fix all our problems, but in that case maybe we should > create a method drreg_init_vector_at_least_size() which takes an opnd_size_t and > will fill a drvector with only registers which can be resized to that operand > size (a no-op on AArchXX). This way clients only hit a potential slowdown when > they know that they require a register which can be resized to it's 1-byte > counterpart. > > I can do the above in a separate CL before moving on here, if need be. For a load, often the simplest solution is to use movzx to load 1 byte into the full register. This matches the behavior on ARM where the ISA only provides this type of 1-byte or 2-byte load that zero-expands and writes the whole register. Maybe there should be an XINST_CREATE_load_1byte_zextend() or sthg for this. For this particular store to the buffer, on x86, we may be able to get away with writing the whole word, rely on it being little-endian, and only update the buf ptr by 1 (or 2). And on ARM there's no problem. Another alternative for x86 is OP_movs but it takes too much setup as you need 3 specific regs.
Sign in to reply to this message.
On 2017/01/16 20:22:51, bruening wrote: > https://codereview.appspot.com/317090043/diff/40001/ext/drx/drx_buf.c > File ext/drx/drx_buf.c (right): > > https://codereview.appspot.com/317090043/diff/40001/ext/drx/drx_buf.c#newcode653 > ext/drx/drx_buf.c:653: insert_load(void *drcontext, instrlist_t *ilist, instr_t > *where, > On 2017/01/15 07:24:34, toshi wrote: > > I've come into this problem a few times before, where I wanted a generic > > load/store that worked on any gpr. Maybe in a separate CL we change > > drx_buf_insert_buf_store() to drx_insert_store(), and make > > drx_buf_insert_buf_store() just redirect to drx_insert_store() so there's no > API > > change. > > > > Conceivably we could also write a drx_insert_load() with insert_load() as the > > base (though it needs a bit of work done on it, as it's currently tailored to > > fit our needs as of now). > > For a load, see below re: movzx. > > https://codereview.appspot.com/317090043/diff/40001/ext/drx/drx_buf.c#newcode695 > ext/drx/drx_buf.c:695: /* This could happen if, for example we tried to resize > the base > On 2017/01/16 01:55:43, toshi wrote: > > On 2017/01/15 07:24:34, toshi wrote: > > > Is this a good solution? Should we instead exclude ebp and some other > > registers > > > from drreg()? Not sure which is worse for performance: drreg excluding some > > > registers in 32-bit and thus thrashing trying to spill/restore registers, or > > > drreg allowing ebp and us having to fall back onto the clean call for many > > (how > > > many?) 1-byte writes. > > > > This is actually a problem with drx_buf_insert_buf_store() as well. No client > > that I know of uses XINST_CREATE_store_1byte(), so I've never run into this > > problem before. > > > > Should we put this burden on the client to explicitly disallow reserving esi, > > edi and ebp? This would fix all our problems, but in that case maybe we should > > create a method drreg_init_vector_at_least_size() which takes an opnd_size_t > and > > will fill a drvector with only registers which can be resized to that operand > > size (a no-op on AArchXX). This way clients only hit a potential slowdown when > > they know that they require a register which can be resized to it's 1-byte > > counterpart. > > > > I can do the above in a separate CL before moving on here, if need be. > > For a load, often the simplest solution is to use movzx to load 1 byte into the > full register. This matches the behavior on ARM where the ISA only provides > this type of 1-byte or 2-byte load that zero-expands and writes the whole > register. Maybe there should be an XINST_CREATE_load_1byte_zextend() or sthg > for this. > > For this particular store to the buffer, on x86, we may be able to get away with > writing the whole word, rely on it being little-endian, and only update the buf > ptr by 1 (or 2). And on ARM there's no problem. > > Another alternative for x86 is OP_movs but it takes too much setup as you need 3 > specific regs. Done.
Sign in to reply to this message.
Commit log for latest patchset: --------------- i#1834 memval: Adds memval sample Introduces a sample client which inserts instrumentation after the current instruction to dereference app memory, and to fill a per-thread buffer. Fixes #1834 Review-URL: https://codereview.appspot.com/317090043 ---------------
Sign in to reply to this message.
https://codereview.appspot.com/317090043/diff/60001/core/arch/x86/instr_create.h File core/arch/x86/instr_create.h (right): https://codereview.appspot.com/317090043/diff/60001/core/arch/x86/instr_creat... core/arch/x86/instr_create.h:172: * followed by a zextend. Oops, thinking about it, this comment is a little out of date. I just perform the movzx directly. Will change.
Sign in to reply to this message.
It's cleaner to separate each feature into its own CL: so the drx addition on its own (with a test in drx-test); the XINST on its own (w/ a test in ir_x86.c); and memval on its own. (Sound good? Hopefully not making too much work for you here: just generally isolating each feature+test is best, for later bisect, cherry-picking, code history understanding, etc.) Also, our convention so far is to add a line for each feature manually in its CL into api/docs/release.dox.
Sign in to reply to this message.
On 2017/01/18 04:32:11, bruening wrote: > It's cleaner to separate each feature into its own CL: so the drx addition on > its own (with a test in drx-test); the XINST on its own (w/ a test in ir_x86.c); > and memval on its own. (Sound good? Hopefully not making too much work for you > here: just generally isolating each feature+test is best, for later bisect, > cherry-picking, code history understanding, etc.) Also, our convention so far > is to add a line for each feature manually in its CL into api/docs/release.dox. No problem, can do!
Sign in to reply to this message.
https://codereview.appspot.com/317090043/diff/60001/api/samples/memval_simple.c File api/samples/memval_simple.c (right): https://codereview.appspot.com/317090043/diff/60001/api/samples/memval_simple... api/samples/memval_simple.c:142: fprintf(data->logf, ""PFX": %s %2d %s\n", Putting no space between a string constant and a macro has caused problems before, for example, most recently: https://github.com/DynamoRIO/dynamorio/commit/ec2bbce298fb46ad91b279c9a5d2375... This instance is in a .c file, so it's presumably not a problem, but I thought it worth mentioning anyway.
Sign in to reply to this message.
Derek: For memval Part 3, should I upload only the patchset related to drx_buf_insert_memcpy(), or should I upload all of memval_simple (i.e. this patch w/o the zextend and drx padding stuff). https://codereview.appspot.com/317090043/diff/60001/api/samples/memval_simple.c File api/samples/memval_simple.c (right): https://codereview.appspot.com/317090043/diff/60001/api/samples/memval_simple... api/samples/memval_simple.c:142: fprintf(data->logf, ""PFX": %s %2d %s\n", On 2017/01/19 09:15:26, Edmund.Grimley.Evans wrote: > Putting no space between a string constant and a macro has caused problems > before, for example, most recently: > https://github.com/DynamoRIO/dynamorio/commit/ec2bbce298fb46ad91b279c9a5d2375... > > This instance is in a .c file, so it's presumably not a problem, but I thought > it worth mentioning anyway. Acknowledged.
Sign in to reply to this message.
On 2017/01/23 18:29:52, toshi wrote: > Derek: For memval Part 3, should I upload only the patchset related to > drx_buf_insert_memcpy(), or should I upload all of memval_simple (i.e. this > patch w/o the zextend and drx padding stuff). I guess it seems cleanest to have the drx_buf feature be separate with a test of its own in suite/ and then have the new memval sample added in its own CL. Of course that's easy for me to say, you're the one doing the work -- doesn't seem so bad to combine them, but I would vote for separate.
Sign in to reply to this message.
On 2017/01/23 18:59:04, bruening wrote: > On 2017/01/23 18:29:52, toshi wrote: > > Derek: For memval Part 3, should I upload only the patchset related to > > drx_buf_insert_memcpy(), or should I upload all of memval_simple (i.e. this > > patch w/o the zextend and drx padding stuff). > > I guess it seems cleanest to have the drx_buf feature be separate with a test of > its own in suite/ and then have the new memval sample added in its own CL. Of > course that's easy for me to say, you're the one doing the work -- doesn't seem > so bad to combine them, but I would vote for separate. Moved to GitHub UI: https://github.com/DynamoRIO/dynamorio/pull/2174
Sign in to reply to this message.
|