|
|
Created:
7 years, 4 months ago by Edmund.Grimley.Evans Modified:
7 years, 4 months ago CC:
dynamorio-devs_googlegroups.com Visibility:
Public. |
DescriptionCommit log for first patchset:
---------------
i#2091: Implement execve of a "#!" script with early injection.
The implementation carefully imitates what a recent Linux kernel was
observed to do with levels of recursion and long lines, but there are
some race conditions and transparency limitations: see the comment
in the code.
Fixes #2091
---------------
Patch Set 1 #Patch Set 2 : Fixed #Patch Set 3 : dr_helper #
Total comments: 52
Patch Set 4 : set_failure_return_val #
Total comments: 4
Patch Set 5 : MacOS #Patch Set 6 : Committed #
MessagesTotal messages: 41
Sign in to reply to this message.
There's a serious problem with this. Don't review yet...
Sign in to reply to this message.
Commit log for latest patchset: --------------- i#2091: Implement execve of a "#!" script with early injection. The implementation carefully imitates what a recent Linux kernel was observed to do with levels of recursion and long lines, but there are some race conditions and transparency limitations: see the comment in the code. Fixes #2091 Review-URL: https://codereview.appspot.com/318920043 ---------------
Sign in to reply to this message.
On 2016/12/07 11:05:15, Edmund.Grimley.Evans wrote: > Fixes #2091 It looks like this will not handle initial invocation of a script with drrun, so the "Fixes #2091" seems premature.
Sign in to reply to this message.
On 2016/12/07 15:42:22, bruening wrote: > On 2016/12/07 11:05:15, Edmund.Grimley.Evans wrote: > > Fixes #2091 > > It looks like this will not handle initial invocation of a script with drrun, so > the "Fixes #2091" seems premature. I don't see any mention of that in i#2091, but it would be good to have that feature, so I'll delete the "Fixes" line from the log message.
Sign in to reply to this message.
I wonder if a different design would better handle both initial and from-app execve: moving the script parsing to post-execve in loader.c. I guess this has the problem of 32-bit vs 64-bit final ELF file and would require a re-execve there -- if it weren't for that, I wonder if pinning drconfig files to the scripts themselves is reasonable or whether all app checks (bitwidth, drconfig) should be on the final ELF.
Sign in to reply to this message.
On 2016/12/07 16:04:11, bruening wrote: > I wonder if a different design would better handle both initial and from-app > execve: moving the script parsing to post-execve in loader.c. I guess this has > the problem of 32-bit vs 64-bit final ELF file and would require a re-execve > there -- if it weren't for that, I wonder if pinning drconfig files to the > scripts themselves is reasonable or whether all app checks (bitwidth, drconfig) > should be on the final ELF. How would recursive script interpreters fit into that? There are several ways of interpreting a command within a command line and this is a common cause of confusion. Some commands feed their arguments into /bin/sh, some do their own search of the PATH, some just use the argument as a (relative) file path,... Often the man page gives no clue.
Sign in to reply to this message.
I don't mind using "bin64/drrun -- sh -c the_script" or "bin64/drrun -- env the_script", by the way. You can't gdb the_script, either, though you can strace the_script. Note that if you put commands in an executable text file without the "#!" then you run that file from a shell: $ echo echo foo > s $ chmod +x s $ ./s foo $ But you can't "exec" that file, and you can't strace it either: $ strace -o log ./s strace: exec: Exec format error Whatever you do, some user will be expecting something different...
Sign in to reply to this message.
On 2016/12/07 16:59:50, Edmund.Grimley.Evans wrote: > I don't mind using "bin64/drrun -- sh -c the_script" or "bin64/drrun -- env > the_script", by the way. You can't gdb the_script, either, though you can strace > the_script. > > Note that if you put commands in an executable text file without the "#!" then > you run that file from a shell: > > $ echo echo foo > s > $ chmod +x s > $ ./s > foo > $ > > But you can't "exec" that file, and you can't strace it either: > > $ strace -o log ./s > strace: exec: Exec format error > > Whatever you do, some user will be expecting something different... Typical usage is to take some command line and prefix it with "drrun -- ". It can be a pain as a user to have to go figure out whether the initial target is a script or not and put in a special prefix in that case (e.g., with /usr/bin/firefox).
Sign in to reply to this message.
On 2016/12/07 17:07:00, bruening wrote: > Typical usage is to take some command line and prefix it with "drrun -- ". It > can be a pain as a user to have to go figure out whether the initial target is a > script or not and put in a special prefix in that case (e.g., with > /usr/bin/firefox). Yes, I understand that point, but life can never be that simple. What about shell built-ins, for example? Note, also, that "foo" and "drrun -- foo" can execute different programs if you have an ELF executable called "foo" in your current directory but a different "foo" is found on your PATH...
Sign in to reply to this message.
Maybe the best thing is for the parent to always resolve down to an ELF, with shared code between SYS_execve handling and drinjectlib. I was just throwing out the idea of waiting, since it's post-execve when we actually load the app, and for drconfig file options that works but for the meta stuff like blacklisting it has to be in the parent (unless we put in some go-native scheme post-execve), so unless we want to go the route of, as mentioned, drconfig on script files somehow (first one, if recursive, or each one), I suppose the parent is better. Qin, any thoughts?
Sign in to reply to this message.
On 2016/12/07 17:26:52, Edmund.Grimley.Evans wrote: > On 2016/12/07 17:07:00, bruening wrote: > > Typical usage is to take some command line and prefix it with "drrun -- ". It > > can be a pain as a user to have to go figure out whether the initial target is > a > > script or not and put in a special prefix in that case (e.g., with > > /usr/bin/firefox). > > Yes, I understand that point, but life can never be that simple. What about > shell built-ins, for example? We should support that case, of drrun on /usr/bin/firefox. Rarer things like shell-builtins seem ok to require the user to change. > Note, also, that "foo" and "drrun -- foo" can > execute different programs if you have an ELF executable called "foo" in your > current directory but a different "foo" is found on your PATH... ? If . is not on your PATH then "foo" in your current directory will not be found in the first place.
Sign in to reply to this message.
On 2016/12/07 17:34:19, bruening wrote: > On 2016/12/07 17:26:52, Edmund.Grimley.Evans wrote: > > On 2016/12/07 17:07:00, bruening wrote: > > > Typical usage is to take some command line and prefix it with "drrun -- ". > It > > > can be a pain as a user to have to go figure out whether the initial target > is > > a > > > script or not and put in a special prefix in that case (e.g., with > > > /usr/bin/firefox). > > > > Yes, I understand that point, but life can never be that simple. What about > > shell built-ins, for example? > > We should support that case, of drrun on /usr/bin/firefox. Rarer things like > shell-builtins seem ok to require the user to change. Yes, but I'm not sure how it should be done or whether it belongs in this patch. > > Note, also, that "foo" and "drrun -- foo" can > > execute different programs if you have an ELF executable called "foo" in your > > current directory but a different "foo" is found on your PATH... > > ? If . is not on your PATH then "foo" in your current directory will not be > found in the first place. But DR finds it: $ cp -i /usr/bin/uptime date $ date Wed 7 Dec 18:09:04 GMT 2016 $ bin64/drrun -- date 18:09:07 up 84 days, 5:13, 3 users, load average: 4.92, 3.76, 2.23 $
Sign in to reply to this message.
On 2016/12/07 18:11:03, Edmund.Grimley.Evans wrote: > On 2016/12/07 17:34:19, bruening wrote: > > On 2016/12/07 17:26:52, Edmund.Grimley.Evans wrote: > > > On 2016/12/07 17:07:00, bruening wrote: > > > > Typical usage is to take some command line and prefix it with "drrun -- ". > > > It > > > > can be a pain as a user to have to go figure out whether the initial > target > > is > > > a > > > > script or not and put in a special prefix in that case (e.g., with > > > > /usr/bin/firefox). > > > > > > Yes, I understand that point, but life can never be that simple. What about > > > shell built-ins, for example? > > > > We should support that case, of drrun on /usr/bin/firefox. Rarer things like > > shell-builtins seem ok to require the user to change. > > Yes, but I'm not sure how it should be done or whether it belongs in this patch. > > > > Note, also, that "foo" and "drrun -- foo" can > > > execute different programs if you have an ELF executable called "foo" in > your > > > current directory but a different "foo" is found on your PATH... > > > > ? If . is not on your PATH then "foo" in your current directory will not be > > found in the first place. > > But DR finds it: > > $ cp -i /usr/bin/uptime date > $ date > Wed 7 Dec 18:09:04 GMT 2016 > $ bin64/drrun -- date > 18:09:07 up 84 days, 5:13, 3 users, load average: 4.92, 3.76, 2.23 > $ Sounds like a separate, unrelated bug in how drdeploy.c uses drfront_get_absolute_path(). Please file an issue.
Sign in to reply to this message.
On 2016/12/07 18:53:45, bruening wrote: > Sounds like a separate, unrelated bug in how drdeploy.c uses > drfront_get_absolute_path(). Please file an issue. Done: i#2097
Sign in to reply to this message.
There are three kinds of recursion potentially involved in executing a file: script interpreters, ELF interpreters (PT_INTERP), and good old symbolic links. I'm not sure whether ELF interpreters are allowed to be recursive but you still might want to configure different ELF interpreters differently, I suppose.
Sign in to reply to this message.
We should make sure the code is in a place that can be shared with drinjectlib, so we don't have to move it later. I don't remember what gets linked into drinjectlib (long-term we wanted to clean things up and stop multi-building core files w/ crazy NOT_ defines).
Sign in to reply to this message.
On 2016/12/10 20:42:39, bruening wrote: > We should make sure the code is in a place that can be shared with drinjectlib, > so we don't have to move it later. I don't remember what gets linked into > drinjectlib (long-term we wanted to clean things up and stop multi-building core > files w/ crazy NOT_ defines). Does drinjectlib use the same kind of memory allocation as core/unix/os.c, or would that have to be virtualised?
Sign in to reply to this message.
On 2016/12/10 22:04:56, Edmund.Grimley.Evans wrote: > On 2016/12/10 20:42:39, bruening wrote: > > We should make sure the code is in a place that can be shared with > drinjectlib, > > so we don't have to move it later. I don't remember what gets linked into > > drinjectlib (long-term we wanted to clean things up and stop multi-building > core > > files w/ crazy NOT_ defines). > > Does drinjectlib use the same kind of memory allocation as core/unix/os.c, or > would that have to be virtualised? drinjectlib does not link in the DR allocator.
Sign in to reply to this message.
On 2016/12/11 02:32:47, bruening wrote: > On 2016/12/10 22:04:56, Edmund.Grimley.Evans wrote: > > On 2016/12/10 20:42:39, bruening wrote: > > > We should make sure the code is in a place that can be shared with > > drinjectlib, > > > so we don't have to move it later. I don't remember what gets linked into > > > drinjectlib (long-term we wanted to clean things up and stop multi-building > > core > > > files w/ crazy NOT_ defines). > > > > Does drinjectlib use the same kind of memory allocation as core/unix/os.c, or > > would that have to be virtualised? > > drinjectlib does not link in the DR allocator. So the new functions take the allocator function (malloc, or a wrapper round malloc, or whatever) as an additional argument?
Sign in to reply to this message.
On 2016/12/11 08:17:09, Edmund.Grimley.Evans wrote: > On 2016/12/11 02:32:47, bruening wrote: > > On 2016/12/10 22:04:56, Edmund.Grimley.Evans wrote: > > > On 2016/12/10 20:42:39, bruening wrote: > > > > We should make sure the code is in a place that can be shared with > > > drinjectlib, > > > > so we don't have to move it later. I don't remember what gets linked into > > > > drinjectlib (long-term we wanted to clean things up and stop > multi-building > > > core > > > > files w/ crazy NOT_ defines). > > > > > > Does drinjectlib use the same kind of memory allocation as core/unix/os.c, > or > > > would that have to be virtualised? > > > > drinjectlib does not link in the DR allocator. > > So the new functions take the allocator function (malloc, or a wrapper round > malloc, or whatever) as an additional argument? Or are passed a buffer -- drinjectlib could put it on the stack.
Sign in to reply to this message.
On 2016/12/11 21:59:19, bruening wrote: > > So the new functions take the allocator function (malloc, or a wrapper round > > malloc, or whatever) as an additional argument? > > Or are passed a buffer -- drinjectlib could put it on the stack. The maximum required for the strings is about 512 bytes, so the strings could be allocated into a buffer allocated by the caller. The current API requires an unbounded amount of memory for argv, however, so the API would have to be changed.
Sign in to reply to this message.
The different contexts don't just require different memory allocation; they also, I presume, require different approaches to accessing files. However, the core functionality could probably be provided by the following in dr_helper.[ch]. Does it look reasonable? // dr_helper.h: #ifdef UNIX typedef struct _script_interpreter_t { int argc; /* number of additional arguments */ char *argv[11]; /* null terminated list of arguments */ char strbuf[512]; /* a buffer for allocating the strings */ } script_interpreter_t; bool find_script_interpreter(script_interpreter_t *result, const char *fname, ssize_t (*read)(const char *pathname, void *buf, size_t count)); #endif /* UNIX */ // dr_helper.c: #ifdef UNIX /* If "fname" is a "#!" script, fill in "result" and return true; otherwise return * false. The script may use recursive script interpreters, up to four levels. * The "result" will contain the additional arguments supplied by the script file; * the caller is responsible for appending the original filepath "fname" and any * additional arguments. The function "read" is a callback used for reading the start * of "fname" and any recursive interpreters; it should also check that the files are * executable. */ bool find_script_interpreter(script_interpreter_t *result, const char *fname, ssize_t (*read)(const char *pathname, void *buf, size_t count)) { /* NYI */ return false; } #endif
Sign in to reply to this message.
tl;dr -- yes. If drhelper is the right place (as opposed to a 2nd lib). Xref i#1409. If there's no reason to separate module-specific code (in which this would be included, along w/ various ELF, etc. parsing stuff used in drfrontendlib and other places and which should eventually be provided in some lib) from the current asm contents of drhelper then yes it is the right place. Even if in drhelper, it might be best to have separate headers and .c files for the module-specific stuff (as mentioned here and in i#1409 we plan to move more of it over) rather than everything in dr_helper.*. Though perhaps we can just start out there and wait until we move more stuff. I don't know where the hardcoded [11] and [512] come from -- are we sure those are sufficient or should they be separately allocated to have no limit. On Tue, Dec 13, 2016 at 4:48 AM, <Edmund.Grimley.Evans@gmail.com> wrote: > The different contexts don't just require different memory allocation; > they also, I presume, require different approaches to accessing files. > However, the core functionality could probably be provided by the > following in dr_helper.[ch]. Does it look reasonable? > > // dr_helper.h: > > #ifdef UNIX > typedef struct _script_interpreter_t { > int argc; /* number of additional arguments */ > char *argv[11]; /* null terminated list of arguments */ > char strbuf[512]; /* a buffer for allocating the strings */ > } script_interpreter_t; > > bool > find_script_interpreter(script_interpreter_t *result, > const char *fname, > ssize_t (*read)(const char *pathname, void *buf, > size_t count)); > #endif /* UNIX */ > > // dr_helper.c: > > #ifdef UNIX > /* If "fname" is a "#!" script, fill in "result" and return true; > otherwise return > * false. The script may use recursive script interpreters, up to four > levels. > * The "result" will contain the additional arguments supplied by the > script file; > * the caller is responsible for appending the original filepath "fname" > and any > * additional arguments. The function "read" is a callback used for > reading the start > * of "fname" and any recursive interpreters; it should also check that > the files are > * executable. > */ > bool > find_script_interpreter(script_interpreter_t *result, > const char *fname, > ssize_t (*read)(const char *pathname, void *buf, > size_t count)) > { > /* NYI */ > return false; > } > #endif > > > > https://codereview.appspot.com/318920043/ > > -- > You received this message because you are subscribed to the Google Groups > "DynamoRIO Devs" group. > To unsubscribe from this group and stop receiving emails from it, send an > email to dynamorio-devs+unsubscribe@googlegroups.com. > To post to this group, send email to dynamorio-devs@googlegroups.com. > Visit this group at https://groups.google.com/group/dynamorio-devs. > For more options, visit https://groups.google.com/d/optout. >
Sign in to reply to this message.
On 2016/12/13 16:31:06, bruening wrote: > I don't know where the hardcoded [11] and [512] come from -- are we sure > those are sufficient or should they be separately allocated to have no > limit. The 11 should perhaps be 9, 2 * 4 + 1: each of the 4 possible levels of recursion supplies an interpreter and an optional argument, plus 1 for the NULL terminator. The 512 is 4 * 128: at each of the 4 levels we only look at the first 127 bytes of the file, so the interpreter plus the optional argument fit in 128 bytes (126 bytes, really, including NULL terminators, because the first two are "#!"). Provided I've got those numbers right in the first place (to be confirmed and tested), they are unlikely to change, I think. If they did change, would it matter? This is an internal API, rather than an external ABI that has to be kept compatible, isn't it?
Sign in to reply to this message.
On Tue, Dec 13, 2016 at 1:19 PM, <Edmund.Grimley.Evans@gmail.com> wrote: > On 2016/12/13 16:31:06, bruening wrote: > >> I don't know where the hardcoded [11] and [512] come from -- are we >> > sure > >> those are sufficient or should they be separately allocated to have no >> limit. >> > > The 11 should perhaps be 9, 2 * 4 + 1: each of the 4 possible levels of > recursion supplies an interpreter and an optional argument, plus 1 for > the NULL terminator. The 512 is 4 * 128: at each of the 4 levels we only > look at the first 127 bytes of the file, so the interpreter plus the > optional argument fit in 128 bytes (126 bytes, really, including NULL > terminators, because the first two are "#!"). > > Provided I've got those numbers right in the first place (to be > confirmed and tested), they are unlikely to change, I think. If they did > change, would it matter? This is an internal API, rather than an > external ABI that has to be kept compatible, isn't it? It sounds like there are justified hard limits, which was the question.
Sign in to reply to this message.
Commit log for latest patchset: --------------- i#2091: Implement execve of a "#!" script with early injection. The implementation carefully imitates what a recent Linux kernel was observed to do with levels of recursion and long lines, but there are some race conditions and transparency limitations: see the comment in the code. Fixes #2091 Review-URL: https://codereview.appspot.com/318920043 ---------------
Sign in to reply to this message.
https://codereview.appspot.com/318920043/diff/40001/core/lib/dr_helper.c File core/lib/dr_helper.c (right): https://codereview.appspot.com/318920043/diff/40001/core/lib/dr_helper.c#newc... core/lib/dr_helper.c:99: is_shebang(IN char *str, OUT char **interp, OUT char **arg) str can be const https://codereview.appspot.com/318920043/diff/40001/core/lib/dr_helper.c#newc... core/lib/dr_helper.c:106: while (*p == ' ' || *p == '\t') A comment explaining what is assumed about the format would help. This is assuming no other type of whitespace (\v, \r, non-ascii), which I believe is correct. https://codereview.appspot.com/318920043/diff/40001/core/lib/dr_helper.c#newc... core/lib/dr_helper.c:108: if (*p == '\n' || *p == 0) style: don't mix types: use '\0' https://codereview.appspot.com/318920043/diff/40001/core/lib/dr_helper.c#newc... core/lib/dr_helper.c:112: while (*p != ' ' && *p != '\t' && *p != '\n' && *p != 0) Path with a space is not supported, which again seems correct. See earlier comment: a comment up front explaining the assumed format would make it easier to understand and validate the code. https://codereview.appspot.com/318920043/diff/40001/core/lib/dr_helper.c#newc... core/lib/dr_helper.c:112: while (*p != ' ' && *p != '\t' && *p != '\n' && *p != 0) style again -- and many places below https://codereview.appspot.com/318920043/diff/40001/core/lib/dr_helper.c#newc... core/lib/dr_helper.c:119: *p++ = 0; ok so str is modified here -- worth mentioning in a comment before func decl https://codereview.appspot.com/318920043/diff/40001/core/lib/dr_helper.c#newc... core/lib/dr_helper.c:147: * executable. How about moving this comment to the header https://codereview.appspot.com/318920043/diff/40001/core/lib/dr_helper.c#newc... core/lib/dr_helper.c:152: ssize_t (*read)(const char *pathname, void *buf, size_t count)) Maybe use a different name to avoid shadowing libc read() https://codereview.appspot.com/318920043/diff/40001/core/lib/dr_helper.c#newc... core/lib/dr_helper.c:167: result->buffer[0][len] = 0; style: use char types for chars: '\0' https://codereview.appspot.com/318920043/diff/40001/core/lib/dr_helper.c#newc... core/lib/dr_helper.c:174: *argv++ = interp; This is backward: the arg should go first. https://codereview.appspot.com/318920043/diff/40001/core/lib/dr_helper.c#newc... core/lib/dr_helper.c:198: /* Reverse order of arguments and null-terminate. */ OK, there needs to be a comment up above explaining that it's *deliberately* adding in backward order. https://codereview.appspot.com/318920043/diff/40001/core/lib/dr_helper.h File core/lib/dr_helper.h (right): https://codereview.appspot.com/318920043/diff/40001/core/lib/dr_helper.h#newc... core/lib/dr_helper.h:79: /* Linux allows five levels of script interpreter, and reads no more than 127 It's ifdef UNIX but it seems to only consider Linux. Does MacOS differ? How about Android? https://codereview.appspot.com/318920043/diff/40001/core/lib/dr_helper.h#newc... core/lib/dr_helper.h:84: char *argv[11]; /* null terminated list of arguments */ 11 needs to be explained here -- best as a named constant with comment https://codereview.appspot.com/318920043/diff/40001/core/lib/dr_helper.h#newc... core/lib/dr_helper.h:85: char buffer[5][128]; /* buffers for allocating strings */ Seems better to have these each be named constants, for better readability and to make it easy to change (maybe MacOS has a different value or sthg). https://codereview.appspot.com/318920043/diff/40001/core/lib/dr_helper.h#newc... core/lib/dr_helper.h:88: bool The comment that explains what this function does would be more useful here in the header so potential users will see it https://codereview.appspot.com/318920043/diff/40001/core/lib/dr_helper.h#newc... core/lib/dr_helper.h:89: find_script_interpreter(script_interpreter_t *result, Use IN + OUT on these https://codereview.appspot.com/318920043/diff/40001/core/unix/os.c File core/unix/os.c (right): https://codereview.appspot.com/318920043/diff/40001/core/unix/os.c#newcode5545 core/unix/os.c:5545: * do a (non-failing) execve and therefore not to have to free the memory. We have a returned-execve path in place for handling failure: best to free it there. Unless you mean that the execve cannot fail in that case -- are you sure all checks are prior to that point? Please elaborate. https://codereview.appspot.com/318920043/diff/40001/core/unix/os.c#newcode5547 core/unix/os.c:5547: * executable ELF file. This needs to handle all UNIX that we support, so not just ELF. https://codereview.appspot.com/318920043/diff/40001/core/unix/os.c#newcode5563 core/unix/os.c:5563: * handle_execve. It seems little effort to add a free in the post-execve, to avoid all of these corner cases. https://codereview.appspot.com/318920043/diff/40001/core/unix/os.c#newcode5582 core/unix/os.c:5582: /* Check that the final interpreter is an ELF file. */ This is not ELF-specific: please update comment. https://codereview.appspot.com/318920043/diff/40001/core/unix/os.c#newcode6773 core/unix/os.c:6773: set_success_return_val(dcontext, ret); bug: set_failure_return_val (else will break MacOS)
Sign in to reply to this message.
https://codereview.appspot.com/318920043/diff/40001/core/lib/dr_helper.c File core/lib/dr_helper.c (right): https://codereview.appspot.com/318920043/diff/40001/core/lib/dr_helper.c#newc... core/lib/dr_helper.c:119: *p++ = 0; On 2016/12/16 05:03:16, bruening wrote: > ok so str is modified here -- worth mentioning in a comment before func decl (b/c const often forgotten) and it should be INOUT, it seems, not IN
Sign in to reply to this message.
https://codereview.appspot.com/318920043/diff/40001/core/lib/dr_helper.c File core/lib/dr_helper.c (right): https://codereview.appspot.com/318920043/diff/40001/core/lib/dr_helper.c#newc... core/lib/dr_helper.c:99: is_shebang(IN char *str, OUT char **interp, OUT char **arg) On 2016/12/16 05:03:16, bruening wrote: > str can be const Changed to INOUT. https://codereview.appspot.com/318920043/diff/40001/core/lib/dr_helper.c#newc... core/lib/dr_helper.c:106: while (*p == ' ' || *p == '\t') On 2016/12/16 05:03:16, bruening wrote: > A comment explaining what is assumed about the format would help. This is > assuming no other type of whitespace (\v, \r, non-ascii), which I believe is > correct. Comment added. https://codereview.appspot.com/318920043/diff/40001/core/lib/dr_helper.c#newc... core/lib/dr_helper.c:108: if (*p == '\n' || *p == 0) On 2016/12/16 05:03:16, bruening wrote: > style: don't mix types: use '\0' Changed. You might want to update https://github.com/DynamoRIO/dynamorio/wiki/Code-Style as there's no mention of '\0' there. You'll have to be a bit careful how you express the rule, however, because '\0' has the same type as 0: they're both int; sizeof('\0') == sizeof(int). https://codereview.appspot.com/318920043/diff/40001/core/lib/dr_helper.c#newc... core/lib/dr_helper.c:112: while (*p != ' ' && *p != '\t' && *p != '\n' && *p != 0) On 2016/12/16 05:03:16, bruening wrote: > Path with a space is not supported, which again seems correct. See earlier > comment: a comment up front explaining the assumed format would make it easier > to understand and validate the code. Done. https://codereview.appspot.com/318920043/diff/40001/core/lib/dr_helper.c#newc... core/lib/dr_helper.c:112: while (*p != ' ' && *p != '\t' && *p != '\n' && *p != 0) On 2016/12/16 05:03:16, bruening wrote: > style again -- and many places below Done. https://codereview.appspot.com/318920043/diff/40001/core/lib/dr_helper.c#newc... core/lib/dr_helper.c:119: *p++ = 0; On 2016/12/16 05:09:58, bruening wrote: > On 2016/12/16 05:03:16, bruening wrote: > > ok so str is modified here -- worth mentioning in a comment before func decl > > (b/c const often forgotten) > > and it should be INOUT, it seems, not IN Done. https://codereview.appspot.com/318920043/diff/40001/core/lib/dr_helper.c#newc... core/lib/dr_helper.c:147: * executable. On 2016/12/16 05:03:16, bruening wrote: > How about moving this comment to the header Done. I wasn't sure which place is best. https://codereview.appspot.com/318920043/diff/40001/core/lib/dr_helper.c#newc... core/lib/dr_helper.c:152: ssize_t (*read)(const char *pathname, void *buf, size_t count)) On 2016/12/16 05:03:16, bruening wrote: > Maybe use a different name to avoid shadowing libc read() Done. https://codereview.appspot.com/318920043/diff/40001/core/lib/dr_helper.c#newc... core/lib/dr_helper.c:167: result->buffer[0][len] = 0; On 2016/12/16 05:03:16, bruening wrote: > style: use char types for chars: '\0' Done. https://codereview.appspot.com/318920043/diff/40001/core/lib/dr_helper.c#newc... core/lib/dr_helper.c:174: *argv++ = interp; On 2016/12/16 05:03:16, bruening wrote: > This is backward: the arg should go first. Done. https://codereview.appspot.com/318920043/diff/40001/core/lib/dr_helper.c#newc... core/lib/dr_helper.c:198: /* Reverse order of arguments and null-terminate. */ On 2016/12/16 05:03:16, bruening wrote: > OK, there needs to be a comment up above explaining that it's *deliberately* > adding in backward order. Done. https://codereview.appspot.com/318920043/diff/40001/core/lib/dr_helper.h File core/lib/dr_helper.h (right): https://codereview.appspot.com/318920043/diff/40001/core/lib/dr_helper.h#newc... core/lib/dr_helper.h:79: /* Linux allows five levels of script interpreter, and reads no more than 127 On 2016/12/16 05:03:16, bruening wrote: > It's ifdef UNIX but it seems to only consider Linux. Does MacOS differ? How > about Android? I don't see why Android would have changed that part of the Linux kernel. I don't know about MacOS. If it turns out that MacOS allows arbitrary levels of recursion, then we have a problem. https://codereview.appspot.com/318920043/diff/40001/core/lib/dr_helper.h#newc... core/lib/dr_helper.h:84: char *argv[11]; /* null terminated list of arguments */ On 2016/12/16 05:03:16, bruening wrote: > 11 needs to be explained here -- best as a named constant with comment I'll extend the comment, but I'm wary of cluttering up the global namespace by putting something here that ends up being visible in almost every source file in the tree... https://codereview.appspot.com/318920043/diff/40001/core/lib/dr_helper.h#newc... core/lib/dr_helper.h:88: bool On 2016/12/16 05:03:16, bruening wrote: > The comment that explains what this function does would be more useful here in > the header so potential users will see it Done. https://codereview.appspot.com/318920043/diff/40001/core/lib/dr_helper.h#newc... core/lib/dr_helper.h:89: find_script_interpreter(script_interpreter_t *result, On 2016/12/16 05:03:16, bruening wrote: > Use IN + OUT on these Done. There's no point in putting IN on the function pointer argument, right? :-)
Sign in to reply to this message.
https://codereview.appspot.com/318920043/diff/40001/core/lib/dr_helper.c File core/lib/dr_helper.c (right): https://codereview.appspot.com/318920043/diff/40001/core/lib/dr_helper.c#newc... core/lib/dr_helper.c:108: if (*p == '\n' || *p == 0) If you really want a char-sized 0, there's always *""!
Sign in to reply to this message.
https://codereview.appspot.com/318920043/diff/40001/core/unix/os.c File core/unix/os.c (right): https://codereview.appspot.com/318920043/diff/40001/core/unix/os.c#newcode5545 core/unix/os.c:5545: * do a (non-failing) execve and therefore not to have to free the memory. On 2016/12/16 05:03:17, bruening wrote: > We have a returned-execve path in place for handling failure: best to free it > there. > > Unless you mean that the execve cannot fail in that case -- are you sure all > checks are prior to that point? Please elaborate. It's an exec of libdynamorio.so so it should only fail if there's some kind of misconfiguration or a resource problem. I've modified the comment. https://codereview.appspot.com/318920043/diff/40001/core/unix/os.c#newcode5547 core/unix/os.c:5547: * executable ELF file. On 2016/12/16 05:03:17, bruening wrote: > This needs to handle all UNIX that we support, so not just ELF. I've changed "ELF file" to "executable binary" in a couple of places. https://codereview.appspot.com/318920043/diff/40001/core/unix/os.c#newcode5563 core/unix/os.c:5563: * handle_execve. On 2016/12/16 05:03:17, bruening wrote: > It seems little effort to add a free in the post-execve, to avoid all of these > corner cases. I think the comment is probably wrong and there's no need to add a free in the post-execve: you should only get there with "script" still allocated when something has gone badly wrong and leaking memory is the least of your worries. However, there may well be some bugs involving children that are not followed. I'm not at all familiar with how that stuff works. In fact, I'm not sure it's even been decided how DR will determine whether to follow the child when it's a script. https://codereview.appspot.com/318920043/diff/40001/core/unix/os.c#newcode5582 core/unix/os.c:5582: /* Check that the final interpreter is an ELF file. */ On 2016/12/16 05:03:17, bruening wrote: > This is not ELF-specific: please update comment. Done. https://codereview.appspot.com/318920043/diff/40001/core/unix/os.c#newcode6773 core/unix/os.c:6773: set_success_return_val(dcontext, ret); On 2016/12/16 05:03:17, bruening wrote: > bug: set_failure_return_val (else will break MacOS) Done.
Sign in to reply to this message.
Commit log for latest patchset: --------------- i#2091: Implement execve of a "#!" script with early injection. The implementation carefully imitates what a recent Linux kernel was observed to do with levels of recursion and long lines, but there are some race conditions and transparency limitations: see the comment in the code. Fixes #2091 Review-URL: https://codereview.appspot.com/318920043 ---------------
Sign in to reply to this message.
Could we add a test? https://codereview.appspot.com/318920043/diff/40001/core/lib/dr_helper.c File core/lib/dr_helper.c (right): https://codereview.appspot.com/318920043/diff/40001/core/lib/dr_helper.c#newc... core/lib/dr_helper.c:108: if (*p == '\n' || *p == 0) On 2016/12/16 09:56:53, Edmund.Grimley.Evans wrote: > On 2016/12/16 05:03:16, bruening wrote: > > style: don't mix types: use '\0' > > Changed. > > You might want to update https://github.com/DynamoRIO/dynamorio/wiki/Code-Style > as there's no mention of '\0' there. You'll have to be a bit careful how you > express the rule, however, because '\0' has the same type as 0: they're both > int; sizeof('\0') == sizeof(int). I updated the style guide. It's not so much the underlying type: it's the logical type. Like bool in C, it's just an int, but it is logically consistent to keep the types clean at an abstract level. This is a common style convention -- e.g., see Google's: https://google.github.io/styleguide/cppguide.html#0_and_nullptr/NULL https://codereview.appspot.com/318920043/diff/40001/core/lib/dr_helper.h File core/lib/dr_helper.h (right): https://codereview.appspot.com/318920043/diff/40001/core/lib/dr_helper.h#newc... core/lib/dr_helper.h:79: /* Linux allows five levels of script interpreter, and reads no more than 127 On 2016/12/16 09:56:53, Edmund.Grimley.Evans wrote: > On 2016/12/16 05:03:16, bruening wrote: > > It's ifdef UNIX but it seems to only consider Linux. Does MacOS differ? How > > about Android? > > I don't see why Android would have changed that part of the Linux kernel. I > don't know about MacOS. If it turns out that MacOS allows arbitrary levels of > recursion, then we have a problem. If Mac has not been considered at all there should be a comment about that, an ASSERT_NOT_TESTED, or this should be ifdef LINUX. The first hit on a search for MacOS says the line length is 512 and there's no recursion at all: https://books.google.com/books?id=K8vUkpOXhN4C&pg=PA826&lpg=PA826&dq=mac+os+e... Another quick search for BSD finds this interesting table: http://www.in-ulm.de/~mascheck/various/shebang/ https://codereview.appspot.com/318920043/diff/40001/core/lib/dr_helper.h#newc... core/lib/dr_helper.h:84: char *argv[11]; /* null terminated list of arguments */ On 2016/12/16 09:56:53, Edmund.Grimley.Evans wrote: > On 2016/12/16 05:03:16, bruening wrote: > > 11 needs to be explained here -- best as a named constant with comment > > I'll extend the comment, but I'm wary of cluttering up the global namespace by > putting something here that ends up being visible in almost every source file in > the tree... I think it's much clearer to have: #define SCRIPT_RECURSION_MAX IF_MACOS_ELSE(1, 5) #define SCRIPT_LINE_MAX IF_MACOS_ELSE(512, 128) char *argv[SCRIPT_RECURSION_MAX * 2 + 1]; char buffer[SCRIPT_RECURSION_MAX][SCRIPT_LINE_MAX]; The code is clearer, magic constants are pulled out and recognized up front, it is easy to have different values for MacOS or BSD, and it matches the style of the rest of the code base. https://codereview.appspot.com/318920043/diff/40001/core/lib/dr_helper.h#newc... core/lib/dr_helper.h:85: char buffer[5][128]; /* buffers for allocating strings */ On 2016/12/16 05:03:17, bruening wrote: > Seems better to have these each be named constants, for better readability and > to make it easy to change (maybe MacOS has a different value or sthg). And indeed, MacOS does have different values. https://codereview.appspot.com/318920043/diff/40001/core/unix/os.c File core/unix/os.c (right): https://codereview.appspot.com/318920043/diff/40001/core/unix/os.c#newcode5563 core/unix/os.c:5563: * handle_execve. On 2016/12/16 12:00:21, Edmund.Grimley.Evans wrote: > On 2016/12/16 05:03:17, bruening wrote: > > It seems little effort to add a free in the post-execve, to avoid all of these > > corner cases. > > I think the comment is probably wrong and there's no need to add a free in the > post-execve: you should only get there with "script" still allocated when > something has gone badly wrong and leaking memory is the least of your worries. > However, there may well be some bugs involving children that are not followed. > I'm not at all familiar with how that stuff works. In fact, I'm not sure it's > even been decided how DR will determine whether to follow the child when it's a > script. It's all based on config files whose name comes from the target app binary, as described in the user-facing docs. I will put a comment on the line below where it checks. https://codereview.appspot.com/318920043/diff/60001/core/unix/os.c File core/unix/os.c (right): https://codereview.appspot.com/318920043/diff/60001/core/unix/os.c#newcode5745 core/unix/os.c:5745: should_inject = rununder_on; This is where it determines whether to take over the child. So if there's a path where we allocate memory for a script that will get here and decide not to run the child under DR, and it sure looks like that's the case, we'll probably want some free code in the post-execve path b/c it's not exec-ing libdynamorio.so anymore but the app binary which has a higher likelihood of failure. We did do an existence check already I suppose, which rules out the majority of failures, but it still seems like a relatively easy thing to do, to add a free?
Sign in to reply to this message.
https://codereview.appspot.com/318920043/diff/40001/core/lib/dr_helper.h File core/lib/dr_helper.h (right): https://codereview.appspot.com/318920043/diff/40001/core/lib/dr_helper.h#newc... core/lib/dr_helper.h:79: /* Linux allows five levels of script interpreter, and reads no more than 127 On 2016/12/16 19:07:11, bruening wrote: > Another quick search for BSD finds this interesting table: > > http://www.in-ulm.de/~mascheck/various/shebang/ That's some impressively painstaking research! I should probably just stick in the numbers for MacOS taken from there and ASSERT_NOT_TESTED. https://codereview.appspot.com/318920043/diff/60001/core/unix/os.c File core/unix/os.c (right): https://codereview.appspot.com/318920043/diff/60001/core/unix/os.c#newcode5745 core/unix/os.c:5745: should_inject = rununder_on; On 2016/12/16 19:07:11, bruening wrote: > This is where it determines whether to take over the child. So if there's a > path where we allocate memory for a script that will get here and decide not to > run the child under DR, and it sure looks like that's the case, we'll probably > want some free code in the post-execve path b/c it's not exec-ing > libdynamorio.so anymore but the app binary which has a higher likelihood of > failure. We did do an existence check already I suppose, which rules out the > majority of failures, but it still seems like a relatively easy thing to do, to > add a free? I don't see that a free is required in post-execve in this case because once you've decided to go native (not to follow) you can throw away the script struct (that is, free it before you exec) and then simply exec the top-level executable exactly as if it weren't a script. As I see it, the only time you'd want to free in post-execve is when you've tried, but failed, to exec libdynamorio.so, which will only happen on a configuration error or a resource problem. So some changes are required to do this - presumably pointers to the allocated memory need to be returned from handle_execve_script - but no freeing in post-execve. This is all getting a bit too complicated. Can we leave some of this for a future commit? I need to concentrate on AArch64-specific stuff...
Sign in to reply to this message.
https://codereview.appspot.com/318920043/diff/40001/core/lib/dr_helper.h File core/lib/dr_helper.h (right): https://codereview.appspot.com/318920043/diff/40001/core/lib/dr_helper.h#newc... core/lib/dr_helper.h:79: /* Linux allows five levels of script interpreter, and reads no more than 127 On 2016/12/16 21:22:16, Edmund.Grimley.Evans wrote: > On 2016/12/16 19:07:11, bruening wrote: > > Another quick search for BSD finds this interesting table: > > > > http://www.in-ulm.de/~mascheck/various/shebang/ > > That's some impressively painstaking research! > > I should probably just stick in the numbers for MacOS taken from there and > ASSERT_NOT_TESTED. Sure. https://codereview.appspot.com/318920043/diff/60001/core/unix/os.c File core/unix/os.c (right): https://codereview.appspot.com/318920043/diff/60001/core/unix/os.c#newcode5745 core/unix/os.c:5745: should_inject = rununder_on; On 2016/12/16 21:22:17, Edmund.Grimley.Evans wrote: > On 2016/12/16 19:07:11, bruening wrote: > > This is where it determines whether to take over the child. So if there's a > > path where we allocate memory for a script that will get here and decide not > to > > run the child under DR, and it sure looks like that's the case, we'll probably > > want some free code in the post-execve path b/c it's not exec-ing > > libdynamorio.so anymore but the app binary which has a higher likelihood of > > failure. We did do an existence check already I suppose, which rules out the > > majority of failures, but it still seems like a relatively easy thing to do, > to > > add a free? > > I don't see that a free is required in post-execve in this case because once > you've decided to go native (not to follow) you can throw away the script struct > (that is, free it before you exec) and then simply exec the top-level executable > exactly as if it weren't a script. Yes, makes sense, you can free pre-execve. > As I see it, the only time you'd want to free in post-execve is when you've > tried, but failed, to exec libdynamorio.so, which will only happen on a > configuration error or a resource problem. > > So some changes are required to do this - presumably pointers to the allocated > memory need to be returned from handle_execve_script - but no freeing in > post-execve. > > This is all getting a bit too complicated. Can we leave some of this for a > future commit? I need to concentrate on AArch64-specific stuff... You mean 1) adding tests and 2) returning a pointer here for freeing? OK, but please list the missing pieces in the issue and in comments here so we can track it.
Sign in to reply to this message.
https://codereview.appspot.com/318920043/diff/40001/core/lib/dr_helper.h File core/lib/dr_helper.h (right): https://codereview.appspot.com/318920043/diff/40001/core/lib/dr_helper.h#newc... core/lib/dr_helper.h:84: char *argv[11]; /* null terminated list of arguments */ On 2016/12/16 19:07:11, bruening wrote: > On 2016/12/16 09:56:53, Edmund.Grimley.Evans wrote: > > On 2016/12/16 05:03:16, bruening wrote: > > > 11 needs to be explained here -- best as a named constant with comment > > > > I'll extend the comment, but I'm wary of cluttering up the global namespace by > > putting something here that ends up being visible in almost every source file > in > > the tree... > > I think it's much clearer to have: > > #define SCRIPT_RECURSION_MAX IF_MACOS_ELSE(1, 5) IF_MACOS_ELSE is not defined, presumably because "globals_shared.h" is only conditionally included.
Sign in to reply to this message.
https://codereview.appspot.com/318920043/diff/40001/core/lib/dr_helper.h File core/lib/dr_helper.h (right): https://codereview.appspot.com/318920043/diff/40001/core/lib/dr_helper.h#newc... core/lib/dr_helper.h:84: char *argv[11]; /* null terminated list of arguments */ On 2016/12/19 15:17:04, Edmund.Grimley.Evans wrote: > IF_MACOS_ELSE is not defined, but I can use #if defined(MACOS).
Sign in to reply to this message.
Commit log for latest patchset: --------------- i#2091: Implement execve of a "#!" script with early injection. The implementation carefully imitates what a recent Linux kernel was observed to do with levels of recursion and long lines, but there are some race conditions and transparency limitations: see the comment in the code. Still to do: Handle script interpreters that are not followed, and add a test. Review-URL: https://codereview.appspot.com/318920043 ---------------
Sign in to reply to this message.
If a fixme comment about the known leak is added then LGTM https://codereview.appspot.com/318920043/diff/60001/core/unix/os.c File core/unix/os.c (right): https://codereview.appspot.com/318920043/diff/60001/core/unix/os.c#newcode5745 core/unix/os.c:5745: should_inject = rununder_on; On 2016/12/16 21:40:17, bruening wrote: > On 2016/12/16 21:22:17, Edmund.Grimley.Evans wrote: > > On 2016/12/16 19:07:11, bruening wrote: > > > This is where it determines whether to take over the child. So if there's a > > > path where we allocate memory for a script that will get here and decide not > > to > > > run the child under DR, and it sure looks like that's the case, we'll > probably > > > want some free code in the post-execve path b/c it's not exec-ing > > > libdynamorio.so anymore but the app binary which has a higher likelihood of > > > failure. We did do an existence check already I suppose, which rules out > the > > > majority of failures, but it still seems like a relatively easy thing to do, > > to > > > add a free? > > > > I don't see that a free is required in post-execve in this case because once > > you've decided to go native (not to follow) you can throw away the script > struct > > (that is, free it before you exec) and then simply exec the top-level > executable > > exactly as if it weren't a script. > > Yes, makes sense, you can free pre-execve. > > > As I see it, the only time you'd want to free in post-execve is when you've > > tried, but failed, to exec libdynamorio.so, which will only happen on a > > configuration error or a resource problem. > > > > So some changes are required to do this - presumably pointers to the allocated > > memory need to be returned from handle_execve_script - but no freeing in > > post-execve. > > > > This is all getting a bit too complicated. Can we leave some of this for a > > future commit? I need to concentrate on AArch64-specific stuff... > > You mean 1) adding tests and 2) returning a pointer here for freeing? OK, but > please list the missing pieces in the issue and in comments here so we can track > it. Please add a comment with i# about the missing code (probably up where the memory is allocated). This is currently incorrect, leaky code and needs acknowledgement that we're aware of it.
Sign in to reply to this message.
Committed as https://github.com/DynamoRIO/dynamorio/commit/1b763ba775b8f43fff347ff90eaa628... Final commit log: --------------- i#2091: Implement execve of a "#!" script with early injection. The implementation carefully imitates what a recent Linux kernel was observed to do with levels of recursion and long lines, but there are some race conditions and transparency limitations: see the comment in the code. Still to do: Handle script interpreters that are not followed, and add a test. Review-URL: https://codereview.appspot.com/318920043 ---------------
Sign in to reply to this message.
|