|
|
Created:
5 years, 6 months ago by Mike Schwartz Modified:
5 years, 6 months ago Reviewers:
houglum CC:
tjennison_google.com, gsutil-crs_google.com Visibility:
Public. |
DescriptionDisable multiprocessing (threading-only) under Alpine Linux
Patch Set 1 #Patch Set 2 : Fix comment #
Total comments: 4
Patch Set 3 : Changes per review comments #
Total comments: 7
Patch Set 4 : Fix config generation #Patch Set 5 : more review comments #
Total comments: 2
Patch Set 6 : fix missing logging import #MessagesTotal messages: 10
A couple comments, PTAL. FYI for discussion contributors: this review was not created with the `--private` flag, so it's open to the public. https://codereview.appspot.com/349790043/diff/20001/gslib/command.py File gslib/command.py (right): https://codereview.appspot.com/349790043/diff/20001/gslib/command.py#newcode1377 gslib/command.py:1377: if RunningOnAlpineLinux(): So, in writing this comment, I've found a few scattered/problematic things with the multiprocessing logic we have in place... - I don't think we even want to initialize (and thus require the corresponding teardown of) the multiprocessing data structures if running on Alpine. We should also be checking if we're running on Alpine in gslib.utils.parallelism_framework_util.CheckMultiprocessingAvailableAndInit() too, since that's the method used to determine whether or not we set up all the parallelism stuff in the first place. - The check on this line should instead be placed alongside the IS_WINDOWS check in `_GetProcessAndThreadCount()`. That grabs the default value for number of processes, as set in gslib.commands.config.DEFAULT_PARALLEL_PROCESS_COUNT, and returns 1 (or errors out and tells users to unset the value in their boto config or set it to 1). - The default value in gslib.commands.config.DEFAULT_PARALLEL_PROCESS_COUNT is only set to >1 if we're running on Linux... We state there that there are problems with Windows *and Mac* w.r.t. running multiple processes, but in the rest of the code, we don't prohibit multiprocessing on Macs. So either that comment is outdated, or we should also have multiprocessing restrictions throughout the code for Macs too...? - Sort of a continuation of the point above -- I think we should probably have a helper method, called something like `ShouldProhibitMultiprocessing()`, that returns True if we can safely use multiprocessing on the given environment. And we can check that function in the places where we're calling system_util.IS_WINDOWS to limit parallelism right now ( defining constants in gslib.commands.config, inside _GetProcessAndThreadCount(), and inside gslib.utils.parallelism_framework_util.CheckMultiprocessingAvailableAndInit() ). https://codereview.appspot.com/349790043/diff/20001/gslib/utils/system_util.py File gslib/utils/system_util.py (right): https://codereview.appspot.com/349790043/diff/20001/gslib/utils/system_util.p... gslib/utils/system_util.py:59: with open('/etc/os-release', 'r') as f: It's possible that this might raise an exception (file doesn't exist, we can't read it, etc.) -- can you put this in a `try:` / `except IOError as e:` block?
Sign in to reply to this message.
https://codereview.appspot.com/349790043/diff/20001/gslib/command.py File gslib/command.py (right): https://codereview.appspot.com/349790043/diff/20001/gslib/command.py#newcode1377 gslib/command.py:1377: if RunningOnAlpineLinux(): On 2018/10/12 08:28:13, houglum wrote: > So, in writing this comment, I've found a few scattered/problematic things with > the multiprocessing logic we have in place... > > - I don't think we even want to initialize (and thus require the corresponding > teardown of) the multiprocessing data structures if running on Alpine. We > should also be checking if we're running on Alpine in > gslib.utils.parallelism_framework_util.CheckMultiprocessingAvailableAndInit() > too, since that's the method used to determine whether or not we set up all the > parallelism stuff in the first place. > > - The check on this line should instead be placed alongside the IS_WINDOWS check > in `_GetProcessAndThreadCount()`. That grabs the default value for number of > processes, as set in gslib.commands.config.DEFAULT_PARALLEL_PROCESS_COUNT, and > returns 1 (or errors out and tells users to unset the value in their boto config > or set it to 1). > > - The default value in gslib.commands.config.DEFAULT_PARALLEL_PROCESS_COUNT is > only set to >1 if we're running on Linux... We state there that there are > problems with Windows *and Mac* w.r.t. running multiple processes, but in the > rest of the code, we don't prohibit multiprocessing on Macs. So either that > comment is outdated, or we should also have multiprocessing restrictions > throughout the code for Macs too...? > > - Sort of a continuation of the point above -- I think we should probably have a > helper method, called something like `ShouldProhibitMultiprocessing()`, that > returns True if we can safely use multiprocessing on the given environment. And > we can check that function in the places where we're calling > system_util.IS_WINDOWS to limit parallelism right now ( defining constants in > gslib.commands.config, inside _GetProcessAndThreadCount(), and inside > gslib.utils.parallelism_framework_util.CheckMultiprocessingAvailableAndInit() ). Good catches. Done. I also reworked the config file list printing because I noticed when I tested this on docker (where I had no config file) the error message brokenly suggested fixing parallel_process_count in the config files located at (nothing printed). https://codereview.appspot.com/349790043/diff/20001/gslib/utils/system_util.py File gslib/utils/system_util.py (right): https://codereview.appspot.com/349790043/diff/20001/gslib/utils/system_util.p... gslib/utils/system_util.py:59: with open('/etc/os-release', 'r') as f: On 2018/10/12 08:28:13, houglum wrote: > It's possible that this might raise an exception (file doesn't exist, we can't > read it, etc.) -- can you put this in a `try:` / `except IOError as e:` block? Done.
Sign in to reply to this message.
https://codereview.appspot.com/349790043/diff/40001/gslib/utils/parallelism_f... File gslib/utils/parallelism_framework_util.py (right): https://codereview.appspot.com/349790043/diff/40001/gslib/utils/parallelism_f... gslib/utils/parallelism_framework_util.py:254: raise IOError('Unable to open /etc/os-release to determine OS release') Do we want to raise an exception in this case? I'd opt for just returning (False, "") or (False, "Unknown") and emitting a debug-level log message if it's not an environment where we *know* we shouldn't use multiprocessing. https://codereview.appspot.com/349790043/diff/40001/gslib/utils/parallelism_f... gslib/utils/parallelism_framework_util.py:255: else: The "else:" isn't necessary here.
Sign in to reply to this message.
https://codereview.appspot.com/349790043/diff/40001/gslib/utils/parallelism_f... File gslib/utils/parallelism_framework_util.py (right): https://codereview.appspot.com/349790043/diff/40001/gslib/utils/parallelism_f... gslib/utils/parallelism_framework_util.py:254: raise IOError('Unable to open /etc/os-release to determine OS release') On 2018/10/12 21:47:23, houglum wrote: > Do we want to raise an exception in this case? I'd opt for just returning > (False, "") or (False, "Unknown") and emitting a debug-level log message if it's > not an environment where we *know* we shouldn't use multiprocessing. Done. https://codereview.appspot.com/349790043/diff/40001/gslib/utils/parallelism_f... gslib/utils/parallelism_framework_util.py:255: else: On 2018/10/12 21:47:23, houglum wrote: > The "else:" isn't necessary here. Sure, but it seems marginally more readable to me (makes the logical relationship explicit). I don't feel strongly about it, so if you want to save 1 line of text I'll do it ;-)
Sign in to reply to this message.
https://codereview.appspot.com/349790043/diff/40001/gslib/utils/parallelism_f... File gslib/utils/parallelism_framework_util.py (right): https://codereview.appspot.com/349790043/diff/40001/gslib/utils/parallelism_f... gslib/utils/parallelism_framework_util.py:254: raise IOError('Unable to open /etc/os-release to determine OS release') On 2018/10/12 22:06:34, Mike Schwartz wrote: > On 2018/10/12 21:47:23, houglum wrote: > > Do we want to raise an exception in this case? I'd opt for just returning > > (False, "") or (False, "Unknown") and emitting a debug-level log message if > it's > > not an environment where we *know* we shouldn't use multiprocessing. > > Done. I believe you, but I don't see the patch where this was added :P Did you forget to run the upload script on it? https://codereview.appspot.com/349790043/diff/40001/gslib/utils/parallelism_f... gslib/utils/parallelism_framework_util.py:255: else: On 2018/10/12 22:06:34, Mike Schwartz wrote: > On 2018/10/12 21:47:23, houglum wrote: > > The "else:" isn't necessary here. > > Sure, but it seems marginally more readable to me (makes the logical > relationship explicit). > > I don't feel strongly about it, so if you want to save 1 line of text I'll do it > ;-) Meh, I don't feel strongly. And if we're not raising in the block above anymore, the else becomes necessary anyway, so this comment thread isn't applicable anymore :)
Sign in to reply to this message.
https://codereview.appspot.com/349790043/diff/40001/gslib/utils/parallelism_f... File gslib/utils/parallelism_framework_util.py (right): https://codereview.appspot.com/349790043/diff/40001/gslib/utils/parallelism_f... gslib/utils/parallelism_framework_util.py:254: raise IOError('Unable to open /etc/os-release to determine OS release') On 2018/10/12 23:07:34, houglum wrote: > On 2018/10/12 22:06:34, Mike Schwartz wrote: > > On 2018/10/12 21:47:23, houglum wrote: > > > Do we want to raise an exception in this case? I'd opt for just returning > > > (False, "") or (False, "Unknown") and emitting a debug-level log message if > > it's > > > not an environment where we *know* we shouldn't use multiprocessing. > > > > Done. > > I believe you, but I don't see the patch where this was added :P Did you forget > to run the upload script on it? I uploaded to the wrong rietveld issue. I just tried again, PTAL.
Sign in to reply to this message.
https://codereview.appspot.com/349790043/diff/80001/gslib/utils/parallelism_f... File gslib/utils/parallelism_framework_util.py (right): https://codereview.appspot.com/349790043/diff/80001/gslib/utils/parallelism_f... gslib/utils/parallelism_framework_util.py:254: logger.debug('Unable to open /etc/os-release to determine whether OS ' `logger` isn't defined in this scope or at the module level, so this would crash. Import the logging module at the top and use `logging.debug` here instead. ^ I really need to make a pylintrc for this project and get linting set up as a presubmit check on GitHub - it catches references to undefined variables. I've made this same mistake more times than I can count :(
Sign in to reply to this message.
https://codereview.appspot.com/349790043/diff/80001/gslib/utils/parallelism_f... File gslib/utils/parallelism_framework_util.py (right): https://codereview.appspot.com/349790043/diff/80001/gslib/utils/parallelism_f... gslib/utils/parallelism_framework_util.py:254: logger.debug('Unable to open /etc/os-release to determine whether OS ' On 2018/10/12 23:36:33, houglum wrote: > `logger` isn't defined in this scope or at the module level, so this would > crash. Import the logging module at the top and use `logging.debug` here > instead. > > ^ I really need to make a pylintrc for this project and get linting set up as a > presubmit check on GitHub - it catches references to undefined variables. I've > made this same mistake more times than I can count :( Thanks for catching that. Done.
Sign in to reply to this message.
|