|
|
DescriptionUpload patch and base files in parallel.
Patch Set 1 #
Total comments: 11
Patch Set 2 : Address review comments #
Total comments: 5
Patch Set 3 : change shortcut to -j #Patch Set 4 : remove action=store #
Total comments: 7
Patch Set 5 : code review comments #MessagesTotal messages: 18
Yay! Robbie should still review as I don't know enough about Python, but lgtm! https://codereview.appspot.com/12629043/diff/1/upload.py File upload.py (right): https://codereview.appspot.com/12629043/diff/1/upload.py#newcode2256 upload.py:2256: t = threading.Thread(target=UploadFile, args=(output_queue, filename, data, options.download_base)) Wrap https://codereview.appspot.com/12629043/diff/1/upload.py#newcode2261 upload.py:2261: t.join() Add some timeout and raise error on t.isAlive() ?
Sign in to reply to this message.
lg2m % 2 things: Thing 1: This should be upstreamed to code.google.com/p/rietveld first (this upload.py is a copy-pasta of that one. Yes, it's annoying) Thing 2: I'm curious how this will impact the apparent stability of the patch upload process. In particular, making it worse would be bad :/. I still haven't gotten to drill into the particulars of why patch upload is so bad, but I have the suspicion that more connections will not improve the issue (in particular, I suspect we may be making Blobstore grumpy on the appengine side). https://codereview.appspot.com/12629043/diff/1/upload.py File upload.py (right): https://codereview.appspot.com/12629043/diff/1/upload.py#newcode481 upload.py:481: # for that at the moment. Yay, flake. Can we maybe print a message when we're in verbose mode here? StatusUpdate() looks like the right thing. https://codereview.appspot.com/12629043/diff/1/upload.py#newcode1181 upload.py:1181: except: Can we glean useful info like the HTTP status code from the exception? https://codereview.appspot.com/12629043/diff/1/upload.py#newcode1207 upload.py:1207: file_id, new_content, is_binary, status, False)) Seems like this is a perfect opportunity to combine these two if clauses? https://codereview.appspot.com/12629043/diff/1/upload.py#newcode1209 upload.py:1209: t.start() I think you won't be guaranteed to have a valid `t` here, since it's possible to have a non-base, non-new-content file (i.e. updating from patchset 1 to patchset 2 without a rebase.) I may be wrong here. t should be set to None before the if statement(s), and we should look for it to be not None before start()/append()'ing it.
Sign in to reply to this message.
I very much like the idea of this change. However, the fact that you're doing this manually with threads and a Queue makes me a bit nervous. I would encourage you to use a ThreadPool instead (or possibly even a Multiprocessing Pool), e.g.: http://stackoverflow.com/questions/3033952/python-thread-pool-similar-to-the-... It'll probably simplify your implementation some, and will allow you to much more easily control the amount of parallelism.
Sign in to reply to this message.
On 2013/08/08 00:22:33, dpranke wrote: > I very much like the idea of this change. > > However, the fact that you're doing this manually with threads and a Queue makes > me a bit nervous. > > I would encourage you to use a ThreadPool instead (or possibly even a > Multiprocessing Pool), e.g.: > > http://stackoverflow.com/questions/3033952/python-thread-pool-similar-to-the-... > Wow... I didn't even know about that somehow. I've looked for a stdlib threadpool in python before. Thanks python documentation (or lack thereof)! +1 for this. > It'll probably simplify your implementation some, and will allow you to much > more easily control the amount of parallelism.
Sign in to reply to this message.
> Thing 1: This should be upstreamed to code.google.com/p/rietveld first (this > upload.py is a copy-pasta of that one. Yes, it's annoying) I did this in the rietveld hg checkout. Did it upload it incorrectly or to the wrong place? > Thing 2: I'm curious how this will impact the apparent stability of the patch > upload process. In particular, making it worse would be bad :/. I still haven't > gotten to drill into the particulars of why patch upload is so bad, but I have > the suspicion that more connections will not improve the issue (in particular, I > suspect we may be making Blobstore grumpy on the appengine side). Yup. All I can say is that this makes uploading ~5x faster in my (not very scientific) testing. ThreadPool is definitely better! It also lets me control the number of threads, so I added a commandline-flag so we can tweak it if we need to. Again some unscientific testing led me to the conclusion that 25 threads is optimal, but I don't have too much faith in that. https://codereview.appspot.com/12629043/diff/1/upload.py File upload.py (right): https://codereview.appspot.com/12629043/diff/1/upload.py#newcode481 upload.py:481: # for that at the moment. On 2013/08/07 23:55:12, iannucci wrote: > Yay, flake. Can we maybe print a message when we're in verbose mode here? > StatusUpdate() looks like the right thing. Done. https://codereview.appspot.com/12629043/diff/1/upload.py#newcode1181 upload.py:1181: except: On 2013/08/07 23:55:12, iannucci wrote: > Can we glean useful info like the HTTP status code from the exception? Sure. Added the status code. Not sure what other useful information there is to glean. The 500 error response body the reitveld returns is generic and not helpful. https://codereview.appspot.com/12629043/diff/1/upload.py#newcode1207 upload.py:1207: file_id, new_content, is_binary, status, False)) On 2013/08/07 23:55:12, iannucci wrote: > Seems like this is a perfect opportunity to combine these two if clauses? Can't both of them be != None, i.e. don't we need to do two separate uploads sometimes? https://codereview.appspot.com/12629043/diff/1/upload.py#newcode1209 upload.py:1209: t.start() On 2013/08/07 23:55:12, iannucci wrote: > I think you won't be guaranteed to have a valid `t` here, since it's possible to > have a non-base, non-new-content file (i.e. updating from patchset 1 to patchset > 2 without a rebase.) I may be wrong here. > > t should be set to None before the if statement(s), and we should look for it to > be not None before start()/append()'ing it. New code avoids this. https://codereview.appspot.com/12629043/diff/1/upload.py#newcode2261 upload.py:2261: t.join() On 2013/08/07 22:47:35, jln wrote: > Add some timeout and raise error on t.isAlive() ? I added a timeout of 60 seconds. Not sure what a reasonable value here is given that we're doing an HTTP post. Open to suggestions on a better value.
Sign in to reply to this message.
iannucci is OOO for another week. maruel, do you maybe have time to review this?
Sign in to reply to this message.
+ codereview-discuss@ in case someone is not monitoring codereview-list@ IMHO, the main drawback of this CL is that it's now effectively python 2.6+. So I just want to know if anyone still need 2.5 support. I don't think so but it's worth asking first. https://codereview.appspot.com/12629043/diff/9001/upload.py File upload.py (right): https://codereview.appspot.com/12629043/diff/9001/upload.py#newcode57 upload.py:57: from multiprocessing.pool import ThreadPool This means upload.py becomes 2.6+. While the Chromium team is now all 2.6+, it may not be true for everyone. As a matter of fact, I don't even see this class being documented. https://codereview.appspot.com/12629043/diff/9001/upload.py#newcode624 upload.py:624: group.add_option("-n", "--number-parallel-uploads", action="store", action="store" is the default so it's not necessary.
Sign in to reply to this message.
I thought we've been requiring 2.6 in depot_tools for a long time ... -- Dirk https://codereview.appspot.com/12629043/diff/9001/upload.py File upload.py (right): https://codereview.appspot.com/12629043/diff/9001/upload.py#newcode624 upload.py:624: group.add_option("-n", "--number-parallel-uploads", action="store", I'd probably use -j/--jobs here for consistency w/ many other tools.
Sign in to reply to this message.
I think requiring python 2.6 is reasonable, but I can change the code back to not using multiprocessing if needed. https://codereview.appspot.com/12629043/diff/9001/upload.py File upload.py (right): https://codereview.appspot.com/12629043/diff/9001/upload.py#newcode624 upload.py:624: group.add_option("-n", "--number-parallel-uploads", action="store", On 2013/08/22 20:03:10, dpranke wrote: > I'd probably use -j/--jobs here for consistency w/ many other tools. Done. https://codereview.appspot.com/12629043/diff/9001/upload.py#newcode624 upload.py:624: group.add_option("-n", "--number-parallel-uploads", action="store", On 2013/08/22 18:30:23, M-A wrote: > action="store" is the default so it's not necessary. Done.
Sign in to reply to this message.
lgtm with nits. Nobody had issue on the ML for 3 weeks for good to go with python 2.6+. https://codereview.appspot.com/12629043/diff/18001/upload.py File upload.py (right): https://codereview.appspot.com/12629043/diff/18001/upload.py#newcode482 upload.py:482: StatusUpdate("Upload got a 500 response: %d" % e.code) Prefer ' for new strings instead of " https://codereview.appspot.com/12629043/diff/18001/upload.py#newcode625 upload.py:625: dest="num_upload_threads", default=25, 25 is a tad high as the default, I wouldn't default to more than 8. Not everything is in MTV... https://codereview.appspot.com/12629043/diff/18001/upload.py#newcode1164 upload.py:1164: elif options.verbose > 0: elif options.verbose: https://codereview.appspot.com/12629043/diff/18001/upload.py#newcode1183 upload.py:1183: response_body = ("Failed to upload file for %s. Got %d status code." % Eventually, we need retries there.
Sign in to reply to this message.
https://codereview.appspot.com/12629043/diff/18001/upload.py File upload.py (right): https://codereview.appspot.com/12629043/diff/18001/upload.py#newcode625 upload.py:625: dest="num_upload_threads", default=25, On 2013/09/13 15:20:06, M-A wrote: > 25 is a tad high as the default, I wouldn't default to more than 8. Not > everything is in MTV... Done. https://codereview.appspot.com/12629043/diff/18001/upload.py#newcode1164 upload.py:1164: elif options.verbose > 0: On 2013/09/13 15:20:06, M-A wrote: > elif options.verbose: Done. https://codereview.appspot.com/12629043/diff/18001/upload.py#newcode1183 upload.py:1183: response_body = ("Failed to upload file for %s. Got %d status code." % On 2013/09/13 15:20:06, M-A wrote: > Eventually, we need retries there. rpc_server.Send already does retries.
Sign in to reply to this message.
I don't have commit rights. Can someone commit this for me?
Sign in to reply to this message.
On 2013/09/13 23:38:04, ojan wrote: > I don't have commit rights. Can someone commit this for me? I want to get something else in first, will take a look at this one right after. Sorry for the delays.
Sign in to reply to this message.
Committed as 355577b0ddd8.
Sign in to reply to this message.
Message was sent while issue was closed.
Sign in to reply to this message.
Message was sent while issue was closed.
Sign in to reply to this message.
|