|
|
Created:
7 years, 9 months ago by michelzel Modified:
7 years, 8 months ago CC:
gsutil-crs_google.com Visibility:
Public. |
DescriptionAdd Message track
Patch Set 1 #
Total comments: 51
Patch Set 2 : CR feedback round 1 thobrla #
Total comments: 20
Patch Set 3 : CR feedback round 2 thobrla #
Total comments: 28
Patch Set 4 : CR feedback round 3 thobrla #
Total comments: 12
Patch Set 5 : CR feedback round 4 thobrla #
MessagesTotal messages: 14
Planned final commit message: Add Message track
Sign in to reply to this message.
https://codereview.appspot.com/296520043/diff/1/gslib/copy_helper.py File gslib/copy_helper.py (right): https://codereview.appspot.com/296520043/diff/1/gslib/copy_helper.py#newcode944 gslib/copy_helper.py:944: def _GetComponentNumber(component): This is only for uploads, right, and relies on the naming of components by _N? Add documentation, and, ideally, move this next to the function where we choose the component name? https://codereview.appspot.com/296520043/diff/1/gslib/copy_helper.py#newcode997 gslib/copy_helper.py:997: message_type='Component To Upload', time=time.time())) It seems like message_types should be an enum somewhere, since I assume you'll be parsing them in the UIThread. https://codereview.appspot.com/296520043/diff/1/gslib/copy_helper.py#newcode1006 gslib/copy_helper.py:1006: if component.generation: Here and below, when will component.generation not be present? https://codereview.appspot.com/296520043/diff/1/gslib/copy_helper.py#newcode2170 gslib/copy_helper.py:2170: size = component.end_byte-component.start_byte How did you test this? I suspect this variable name is invalid (the '-' should be an '_') https://codereview.appspot.com/296520043/diff/1/gslib/parallelism_framework_u... File gslib/parallelism_framework_util.py (right): https://codereview.appspot.com/296520043/diff/1/gslib/parallelism_framework_u... gslib/parallelism_framework_util.py:50: def __len__(self): I'm curious, why is this needed? https://codereview.appspot.com/296520043/diff/1/gslib/progress_callback.py File gslib/progress_callback.py (right): https://codereview.appspot.com/296520043/diff/1/gslib/progress_callback.py#ne... gslib/progress_callback.py:122: """Outputs progress info for large operations like file copy or hash.""" This comment should be amended now, since this class now posts messages for output and tracking. https://codereview.appspot.com/296520043/diff/1/gslib/progress_callback.py#ne... gslib/progress_callback.py:134: src_url: FileUrl or StorageUrl representing the source file. In which cases are each of these values expected to be None? Add comments about this. https://codereview.appspot.com/296520043/diff/1/gslib/progress_callback.py#ne... gslib/progress_callback.py:156: Actual message is printed to stderr by UIThread End sentences with punctuation. https://codereview.appspot.com/296520043/diff/1/gslib/thread_message.py File gslib/thread_message.py (right): https://codereview.appspot.com/296520043/diff/1/gslib/thread_message.py#newco... gslib/thread_message.py:24: class Message(object): Use a more specific class name; "Message" is very ambiguous. How about "StatusMessage"? https://codereview.appspot.com/296520043/diff/1/gslib/thread_message.py#newco... gslib/thread_message.py:32: time: Time that this message was created. In what format? https://codereview.appspot.com/296520043/diff/1/gslib/thread_message.py#newco... gslib/thread_message.py:42: It includes the size of a file/component besides the components of a Message. the use of the word "component" is overloaded here. Given that this inherits from message, I don't think you need to document that those members are included. https://codereview.appspot.com/296520043/diff/1/gslib/thread_message.py#newco... gslib/thread_message.py:49: size: File/component size. integer, or string? https://codereview.appspot.com/296520043/diff/1/gslib/thread_message.py#newco... gslib/thread_message.py:57: """Message class for files/components being processed. What does it mean to "be processed"? https://codereview.appspot.com/296520043/diff/1/gslib/thread_message.py#newco... gslib/thread_message.py:59: This class only contains general information about each file/component I'm confused by the language here. It reads to me as "the only reason that this contains general information is that total size and time remaining can be calculated." Is that even true? https://codereview.appspot.com/296520043/diff/1/gslib/thread_message.py#newco... gslib/thread_message.py:69: name: Name of the file/component to be processed. Is this the destination or the source? https://codereview.appspot.com/296520043/diff/1/gslib/thread_message.py#newco... gslib/thread_message.py:70: size: Number of bytes that compose this file/component. It can be How does this differ from DataMessage size? If it's the same, why is it documented differently. Also, this refers to the total size, right? https://codereview.appspot.com/296520043/diff/1/gslib/thread_message.py#newco... gslib/thread_message.py:71: converted into a readable format by using gslib.util.MakeHumanReadable. I don't think it's necessary to document a function that you might use with this message. https://codereview.appspot.com/296520043/diff/1/gslib/thread_message.py#newco... gslib/thread_message.py:72: finished: Boolean to indicate whether this is a start or end message. specifically, the start of the transfer of the file/object or the end? https://codereview.appspot.com/296520043/diff/1/gslib/thread_message.py#newco... gslib/thread_message.py:88: It is mainly called from copy_helper (also called from commands/hash.py and I don't think it's necessary to document where it's used - users can search for FileProgressCallbackHandler. But really, couldn't this be produced anywhere? I think it's better to state what the class accomplishes and how it should be used. https://codereview.appspot.com/296520043/diff/1/gslib/thread_message.py#newco... gslib/thread_message.py:91: This class contains specific information about the current progress of a file "file component"? or: "file or component"? https://codereview.appspot.com/296520043/diff/1/gslib/thread_message.py#newco... gslib/thread_message.py:92: component. The class merely tracks the progress of the file. "merely" doesn't add meaning here https://codereview.appspot.com/296520043/diff/1/gslib/thread_message.py#newco... gslib/thread_message.py:105: src_url: FileUrl/CloudUrl representing the source file. I assume this is overloaded for unary operations like hashing - would be useful to call that out here. https://codereview.appspot.com/296520043/diff/1/gslib/thread_message.py#newco... gslib/thread_message.py:127: queue. It merely tracks the number of files remaining. "merely" doesn't add meaning here https://codereview.appspot.com/296520043/diff/1/gslib/thread_message.py#newco... gslib/thread_message.py:135: remaining. This is really the total number, not the number remaining, correct?
Sign in to reply to this message.
https://codereview.appspot.com/296520043/diff/1/gslib/copy_helper.py File gslib/copy_helper.py (right): https://codereview.appspot.com/296520043/diff/1/gslib/copy_helper.py#newcode944 gslib/copy_helper.py:944: def _GetComponentNumber(component): On 2016/06/22 01:05:34, thobrla wrote: > This is only for uploads, right, and relies on the naming of components by _N? > > Add documentation, and, ideally, move this next to the function where we choose > the component name? Done. https://codereview.appspot.com/296520043/diff/1/gslib/copy_helper.py#newcode997 gslib/copy_helper.py:997: message_type='Component To Upload', time=time.time())) On 2016/06/22 01:05:34, thobrla wrote: > It seems like message_types should be an enum somewhere, since I assume you'll > be parsing them in the UIThread. I did add integer fields to specify each type of message and changed all the message_type calls to those integers, but I was wondering why would that be better? I'm assuming it's because it is faster, but I wanted to confirm. https://codereview.appspot.com/296520043/diff/1/gslib/copy_helper.py#newcode1006 gslib/copy_helper.py:1006: if component.generation: On 2016/06/22 01:05:34, thobrla wrote: > Here and below, when will component.generation not be present? It just a sort of precautionary measure, but they should have a generation field. On the end message, however, it mixes with 'Component to Upload' messages, which don't have a generation. I can probably take those off for those 2 for's here. https://codereview.appspot.com/296520043/diff/1/gslib/copy_helper.py#newcode2170 gslib/copy_helper.py:2170: size = component.end_byte-component.start_byte On 2016/06/22 01:05:35, thobrla wrote: > How did you test this? I suspect this variable name is invalid (the '-' should > be an '_') I'm not sure why would that be the case, this is just a subtraction and seems to be working fine. I will space it, though. https://codereview.appspot.com/296520043/diff/1/gslib/parallelism_framework_u... File gslib/parallelism_framework_util.py (right): https://codereview.appspot.com/296520043/diff/1/gslib/parallelism_framework_u... gslib/parallelism_framework_util.py:50: def __len__(self): On 2016/06/22 01:05:35, thobrla wrote: > I'm curious, why is this needed? It isn't, but while I was making some tests with the UIThread I realized we did not have len(atomic_dict) set, so I thought I would create it if anybody needs it in the future. But it is totally removable https://codereview.appspot.com/296520043/diff/1/gslib/progress_callback.py File gslib/progress_callback.py (right): https://codereview.appspot.com/296520043/diff/1/gslib/progress_callback.py#ne... gslib/progress_callback.py:122: """Outputs progress info for large operations like file copy or hash.""" On 2016/06/22 01:05:35, thobrla wrote: > This comment should be amended now, since this class now posts messages for > output and tracking. Done. https://codereview.appspot.com/296520043/diff/1/gslib/progress_callback.py#ne... gslib/progress_callback.py:134: src_url: FileUrl or StorageUrl representing the source file. On 2016/06/22 01:05:35, thobrla wrote: > In which cases are each of these values expected to be None? Add comments about > this. Done. https://codereview.appspot.com/296520043/diff/1/gslib/progress_callback.py#ne... gslib/progress_callback.py:156: Actual message is printed to stderr by UIThread On 2016/06/22 01:05:35, thobrla wrote: > End sentences with punctuation. Done, thanks! https://codereview.appspot.com/296520043/diff/1/gslib/thread_message.py File gslib/thread_message.py (right): https://codereview.appspot.com/296520043/diff/1/gslib/thread_message.py#newco... gslib/thread_message.py:24: class Message(object): On 2016/06/22 01:05:36, thobrla wrote: > Use a more specific class name; "Message" is very ambiguous. How about > "StatusMessage"? Sounds good, I changed it. https://codereview.appspot.com/296520043/diff/1/gslib/thread_message.py#newco... gslib/thread_message.py:32: time: Time that this message was created. On 2016/06/22 01:05:35, thobrla wrote: > In what format? Added "(since Epoch)" for all occurrences of time as argument. https://codereview.appspot.com/296520043/diff/1/gslib/thread_message.py#newco... gslib/thread_message.py:42: It includes the size of a file/component besides the components of a Message. On 2016/06/22 01:05:35, thobrla wrote: > the use of the word "component" is overloaded here. > > Given that this inherits from message, I don't think you need to document that > those members are included. Ok, changed it to just "It includes the size of a file/component." https://codereview.appspot.com/296520043/diff/1/gslib/thread_message.py#newco... gslib/thread_message.py:49: size: File/component size. On 2016/06/22 01:05:35, thobrla wrote: > integer, or string? Solved. https://codereview.appspot.com/296520043/diff/1/gslib/thread_message.py#newco... gslib/thread_message.py:57: """Message class for files/components being processed. On 2016/06/22 01:05:35, thobrla wrote: > What does it mean to "be processed"? Rephrased it. https://codereview.appspot.com/296520043/diff/1/gslib/thread_message.py#newco... gslib/thread_message.py:59: This class only contains general information about each file/component On 2016/06/22 01:05:35, thobrla wrote: > I'm confused by the language here. It reads to me as "the only reason that this > contains general information is that total size and time remaining can be > calculated." Is that even true? > No, it was supposed to be "this class contains only" instead of "this class only contains". This would make more sense, but I'm rephrasing it to make it clearer. https://codereview.appspot.com/296520043/diff/1/gslib/thread_message.py#newco... gslib/thread_message.py:69: name: Name of the file/component to be processed. On 2016/06/22 01:05:35, thobrla wrote: > Is this the destination or the source? It is the source for all files or a download component, and the destination for a upload component. I'll make sure to include that. https://codereview.appspot.com/296520043/diff/1/gslib/thread_message.py#newco... gslib/thread_message.py:70: size: Number of bytes that compose this file/component. It can be On 2016/06/22 01:05:35, thobrla wrote: > How does this differ from DataMessage size? If it's the same, why is it > documented differently. Also, this refers to the total size, right? Now they have the same documentation. https://codereview.appspot.com/296520043/diff/1/gslib/thread_message.py#newco... gslib/thread_message.py:71: converted into a readable format by using gslib.util.MakeHumanReadable. On 2016/06/22 01:05:35, thobrla wrote: > I don't think it's necessary to document a function that you might use with this > message. Removed in all occurrences. https://codereview.appspot.com/296520043/diff/1/gslib/thread_message.py#newco... gslib/thread_message.py:72: finished: Boolean to indicate whether this is a start or end message. On 2016/06/22 01:05:35, thobrla wrote: > specifically, the start of the transfer of the file/object or the end? I didn't quite understand, but I'm supposing the description of finished was not clear enough. I changed it. https://codereview.appspot.com/296520043/diff/1/gslib/thread_message.py#newco... gslib/thread_message.py:88: It is mainly called from copy_helper (also called from commands/hash.py and On 2016/06/22 01:05:35, thobrla wrote: > I don't think it's necessary to document where it's used - users can search for > FileProgressCallbackHandler. > > But really, couldn't this be produced anywhere? I think it's better to state > what the class accomplishes and how it should be used. Totally agree. I honestly don't know why I put this in the first place. My bad. https://codereview.appspot.com/296520043/diff/1/gslib/thread_message.py#newco... gslib/thread_message.py:91: This class contains specific information about the current progress of a file On 2016/06/22 01:05:35, thobrla wrote: > "file component"? > or: > "file or component"? file or component, thanks. https://codereview.appspot.com/296520043/diff/1/gslib/thread_message.py#newco... gslib/thread_message.py:92: component. The class merely tracks the progress of the file. On 2016/06/22 01:05:36, thobrla wrote: > "merely" doesn't add meaning here Removed. https://codereview.appspot.com/296520043/diff/1/gslib/thread_message.py#newco... gslib/thread_message.py:105: src_url: FileUrl/CloudUrl representing the source file. On 2016/06/22 01:05:35, thobrla wrote: > I assume this is overloaded for unary operations like hashing - would be useful > to call that out here. Is this referring to dst_url? https://codereview.appspot.com/296520043/diff/1/gslib/thread_message.py#newco... gslib/thread_message.py:127: queue. It merely tracks the number of files remaining. On 2016/06/22 01:05:36, thobrla wrote: > "merely" doesn't add meaning here Removed. https://codereview.appspot.com/296520043/diff/1/gslib/thread_message.py#newco... gslib/thread_message.py:135: remaining. On 2016/06/22 01:05:35, thobrla wrote: > This is really the total number, not the number remaining, correct? Yes, I had put this a while ago (when I thought the SeekAheadThread only looked at the tasks not included in the task_queue). I updated the description.
Sign in to reply to this message.
https://codereview.appspot.com/296520043/diff/1/gslib/copy_helper.py File gslib/copy_helper.py (right): https://codereview.appspot.com/296520043/diff/1/gslib/copy_helper.py#newcode2170 gslib/copy_helper.py:2170: size = component.end_byte-component.start_byte On 2016/06/23 00:16:20, michelzel wrote: > On 2016/06/22 01:05:35, thobrla wrote: > > How did you test this? I suspect this variable name is invalid (the '-' > should > > be an '_') > > I'm not sure why would that be the case, this is just a subtraction and seems to > be working fine. I will space it, though. Sorry, I got very confused by the lack of spacing! https://codereview.appspot.com/296520043/diff/1/gslib/parallelism_framework_u... File gslib/parallelism_framework_util.py (right): https://codereview.appspot.com/296520043/diff/1/gslib/parallelism_framework_u... gslib/parallelism_framework_util.py:50: def __len__(self): On 2016/06/23 00:16:21, michelzel wrote: > On 2016/06/22 01:05:35, thobrla wrote: > > I'm curious, why is this needed? > > It isn't, but while I was making some tests with the UIThread I realized we did > not have len(atomic_dict) set, so I thought I would create it if anybody needs > it in the future. But it is totally removable If it's not necessary for this changelist, don't include it here. In general, we defer defining functions until they're used/needed. https://codereview.appspot.com/296520043/diff/1/gslib/thread_message.py File gslib/thread_message.py (right): https://codereview.appspot.com/296520043/diff/1/gslib/thread_message.py#newco... gslib/thread_message.py:105: src_url: FileUrl/CloudUrl representing the source file. On 2016/06/23 00:16:22, michelzel wrote: > On 2016/06/22 01:05:35, thobrla wrote: > > I assume this is overloaded for unary operations like hashing - would be > useful > > to call that out here. > > Is this referring to dst_url? Sorry, yes. https://codereview.appspot.com/296520043/diff/20001/gslib/copy_helper.py File gslib/copy_helper.py (right): https://codereview.appspot.com/296520043/diff/20001/gslib/copy_helper.py#newc... gslib/copy_helper.py:988: def _GetComponentNumber(component): Any particular reason to define this function inline? I'd suggest moving it to the module scope. https://codereview.appspot.com/296520043/diff/20001/gslib/copy_helper.py#newc... gslib/copy_helper.py:989: """Gets component number from component CloudUrl. Used during parallel composite upload. https://codereview.appspot.com/296520043/diff/20001/gslib/progress_callback.py File gslib/progress_callback.py (right): https://codereview.appspot.com/296520043/diff/20001/gslib/progress_callback.p... gslib/progress_callback.py:140: this might also be under under src_url.url_string for uploads. None if I don't think calling out its relation to src_url.url_string is necessary. https://codereview.appspot.com/296520043/diff/20001/gslib/progress_callback.p... gslib/progress_callback.py:141: not divided into components. Nit: here and above, use the language pattern "this, or None if condition", i.e., "... representing the destination file, or None if representing" https://codereview.appspot.com/296520043/diff/20001/gslib/thread_message.py File gslib/thread_message.py (right): https://codereview.appspot.com/296520043/diff/20001/gslib/thread_message.py#n... gslib/thread_message.py:53: SeekAheadMessage, this is the sum of the size of all objects. It seems strange for a parent class to mention its subclass here. While SeekAheadMessage and FileMessage/ProgressMessage happen to share size as a member variable, their semantics are quite different. Consider having SeekAheadMessage as a separate class. https://codereview.appspot.com/296520043/diff/20001/gslib/thread_message.py#n... gslib/thread_message.py:68: FILE = 1 Comment that these are message types? https://codereview.appspot.com/296520043/diff/20001/gslib/thread_message.py#n... gslib/thread_message.py:79: name: Name of the file/component to be processed. If a file, always refers This semantic is a little confusing to me, since it sometimes refers to the source and other times to the destination. Is this the destination of the final file for uploads, or the component? How will we associate components with their parents? https://codereview.appspot.com/296520043/diff/20001/gslib/thread_message.py#n... gslib/thread_message.py:101: or a single component, depending on the approach taken. "depending on the approach taken" doesn't add meaning here. https://codereview.appspot.com/296520043/diff/20001/gslib/thread_message.py#n... gslib/thread_message.py:109: size: Number of bytes that compose this file/component, in bytes. Simpler: "Total size of this file/component, in bytes." https://codereview.appspot.com/296520043/diff/20001/gslib/thread_message.py#n... gslib/thread_message.py:114: dst_url: FileUrl/CloudUrl representing the destination file. Not useful ", or None for unary operations like hashing."
Sign in to reply to this message.
It took longer than I thought it would, I apologize for that! https://codereview.appspot.com/296520043/diff/20001/gslib/copy_helper.py File gslib/copy_helper.py (right): https://codereview.appspot.com/296520043/diff/20001/gslib/copy_helper.py#newc... gslib/copy_helper.py:988: def _GetComponentNumber(component): On 2016/06/23 01:45:09, thobrla wrote: > Any particular reason to define this function inline? I'd suggest moving it to > the module scope. No, it's just that it was originally inline, but I can surely remove it. However, I tried to use this for ProgressMessage, but it would return wrong results and generate wrong test cases, so I decided to keep it for FileMessages (describing components) only. https://codereview.appspot.com/296520043/diff/20001/gslib/copy_helper.py#newc... gslib/copy_helper.py:989: """Gets component number from component CloudUrl. On 2016/06/23 01:45:09, thobrla wrote: > Used during parallel composite upload. Done. https://codereview.appspot.com/296520043/diff/20001/gslib/progress_callback.py File gslib/progress_callback.py (right): https://codereview.appspot.com/296520043/diff/20001/gslib/progress_callback.p... gslib/progress_callback.py:140: this might also be under under src_url.url_string for uploads. None if On 2016/06/23 01:45:09, thobrla wrote: > I don't think calling out its relation to src_url.url_string is necessary. Done. https://codereview.appspot.com/296520043/diff/20001/gslib/progress_callback.p... gslib/progress_callback.py:141: not divided into components. On 2016/06/23 01:45:09, thobrla wrote: > Nit: here and above, use the language pattern "this, or None if condition", > i.e., "... representing the destination file, or None if representing" Done. https://codereview.appspot.com/296520043/diff/20001/gslib/thread_message.py File gslib/thread_message.py (right): https://codereview.appspot.com/296520043/diff/20001/gslib/thread_message.py#n... gslib/thread_message.py:53: SeekAheadMessage, this is the sum of the size of all objects. On 2016/06/23 01:45:09, thobrla wrote: > It seems strange for a parent class to mention its subclass here. While > SeekAheadMessage and FileMessage/ProgressMessage happen to share size as a > member variable, their semantics are quite different. Consider having > SeekAheadMessage as a separate class. Yeah, that makes sense. However, since I am creating DataMessage only to handle size for FileMessage and ProgressMessage, maybe it would be better to erase DataMessage after all and make all of our classes inherit StatusMessage directly? https://codereview.appspot.com/296520043/diff/20001/gslib/thread_message.py#n... gslib/thread_message.py:68: FILE = 1 On 2016/06/23 01:45:09, thobrla wrote: > Comment that these are message types? Done. https://codereview.appspot.com/296520043/diff/20001/gslib/thread_message.py#n... gslib/thread_message.py:79: name: Name of the file/component to be processed. If a file, always refers On 2016/06/23 01:45:09, thobrla wrote: > This semantic is a little confusing to me, since it sometimes refers to the > source and other times to the destination. > > Is this the destination of the final file for uploads, or the component? How > will we associate components with their parents? Agree it is confusing and maybe insufficient. I am adding src_url and dst_url fields here. dst_url will always exist, but src_url might be None (we lost our source for existing components when uploading, for instance) https://codereview.appspot.com/296520043/diff/20001/gslib/thread_message.py#n... gslib/thread_message.py:101: or a single component, depending on the approach taken. On 2016/06/23 01:45:09, thobrla wrote: > "depending on the approach taken" doesn't add meaning here. Done. https://codereview.appspot.com/296520043/diff/20001/gslib/thread_message.py#n... gslib/thread_message.py:109: size: Number of bytes that compose this file/component, in bytes. On 2016/06/23 01:45:09, thobrla wrote: > Simpler: "Total size of this file/component, in bytes." Done. https://codereview.appspot.com/296520043/diff/20001/gslib/thread_message.py#n... gslib/thread_message.py:114: dst_url: FileUrl/CloudUrl representing the destination file. Not useful On 2016/06/23 01:45:09, thobrla wrote: > ", or None for unary operations like hashing." Done.
Sign in to reply to this message.
https://codereview.appspot.com/296520043/diff/40001/gslib/copy_helper.py File gslib/copy_helper.py (right): https://codereview.appspot.com/296520043/diff/40001/gslib/copy_helper.py#newc... gslib/copy_helper.py:1427: try: Here and below, this isn't an exceptional case, and thus we shouldn't use try/except to handle it. Is the problem that we don't know at this point if it is a component or not? This seems problematic - what if a user uploaded an object named "cloudobject_1"? You'll need to find a better way of plumbing through whether this is a component or not. https://codereview.appspot.com/296520043/diff/40001/gslib/copy_helper.py#newc... gslib/copy_helper.py:2230: try: Sorry I didn't catch this try/except previously. https://codereview.appspot.com/296520043/diff/40001/gslib/thread_message.py File gslib/thread_message.py (right): https://codereview.appspot.com/296520043/diff/40001/gslib/thread_message.py#n... gslib/thread_message.py:42: """Indicates beginning or end of a operation onto a file/component. This also applies to cloud objects, right? Also, "operation onto" doesn't sound correct, syntactically https://codereview.appspot.com/296520043/diff/40001/gslib/thread_message.py#n... gslib/thread_message.py:57: def __init__(self, dst_url, src_url=None, size=None, finished=False, Depending on what you decide to do with the src_url below, it would be nice if FileMessage and ProgressMessage had the same ordering of src/dst args. https://codereview.appspot.com/296520043/diff/40001/gslib/thread_message.py#n... gslib/thread_message.py:58: component_num=None, message_type=None, time=None): Here and in other classes, at what point can time be None? It seems like this is a required argument. https://codereview.appspot.com/296520043/diff/40001/gslib/thread_message.py#n... gslib/thread_message.py:63: src_url: FileUrl/CloudUrl representing the source file. None if Wouldn't an existing component still have the same source file, ultimately? https://codereview.appspot.com/296520043/diff/40001/gslib/thread_message.py#n... gslib/thread_message.py:64: describing an existing component or existing object to delete. If this is an existing object to delete, that seems like a metadata operation. https://codereview.appspot.com/296520043/diff/40001/gslib/thread_message.py#n... gslib/thread_message.py:70: time: Time that this message was created (since Epoch). Here and in other classes, is this a datetime object, an integer, etc? https://codereview.appspot.com/296520043/diff/40001/gslib/thread_message.py#n... gslib/thread_message.py:86: or a single component. Similarly, this affects cloud objects https://codereview.appspot.com/296520043/diff/40001/gslib/thread_message.py#n... gslib/thread_message.py:99: dst_url: FileUrl/CloudUrl representing the destination file, or None. extra punctuation here
Sign in to reply to this message.
Since I had a couple of questions, I will wait to hear what should be my approach before submitting the code, but everything that I knew for sure what to do has already been implemented. PS: I felt I was not very productive today (you can see by the time I'm finally submitting this), and I think it has something to do with the fact that I had to sleep here in the office the last couple of days (I managed to lost both my apartment and room keys, and the leasing office has yet to provide me a replacement). I will try to get some rest now, so I might enter a bit late tomorrow morning. https://codereview.appspot.com/296520043/diff/40001/gslib/copy_helper.py File gslib/copy_helper.py (right): https://codereview.appspot.com/296520043/diff/40001/gslib/copy_helper.py#newc... gslib/copy_helper.py:1427: try: On 2016/06/24 00:09:03, thobrla wrote: > Here and below, this isn't an exceptional case, and thus we shouldn't use > try/except to handle it. Is the problem that we don't know at this point if it > is a component or not? This seems problematic - what if a user uploaded an > object named "cloudobject_1"? > > You'll need to find a better way of plumbing through whether this is a component > or not. That was the original intent. However, I remembered that we had discussed about NonResumableUpload not being broken down into components, right? So I could just remove that. As for your concern on "cloudobject_1", although I do not think we are going to face this issue, wouldn't that be overcome by the fact that the url_string of "cloudobject_1" would contain a hash for this object and would not finish with _1 (unless it was the first component)? I had this impression https://codereview.appspot.com/296520043/diff/40001/gslib/copy_helper.py#newc... gslib/copy_helper.py:2230: try: On 2016/06/24 00:09:03, thobrla wrote: > Sorry I didn't catch this try/except previously. Is something wrong with it? This was more of a precautionary thing, because cp_result should always have bytes_transferred and component_num. https://codereview.appspot.com/296520043/diff/40001/gslib/thread_message.py File gslib/thread_message.py (right): https://codereview.appspot.com/296520043/diff/40001/gslib/thread_message.py#n... gslib/thread_message.py:42: """Indicates beginning or end of a operation onto a file/component. On 2016/06/24 00:09:04, thobrla wrote: > This also applies to cloud objects, right? > > Also, "operation onto" doesn't sound correct, syntactically Yeah, that didn't feel very right when I was writing it. Is 'upon' better? https://codereview.appspot.com/296520043/diff/40001/gslib/thread_message.py#n... gslib/thread_message.py:57: def __init__(self, dst_url, src_url=None, size=None, finished=False, On 2016/06/24 00:09:04, thobrla wrote: > Depending on what you decide to do with the src_url below, it would be nice if > FileMessage and ProgressMessage had the same ordering of src/dst args. The problem is I might not always have src_url. If we manage to get this, I agree they should be in the same order, as they previously were. https://codereview.appspot.com/296520043/diff/40001/gslib/thread_message.py#n... gslib/thread_message.py:58: component_num=None, message_type=None, time=None): On 2016/06/24 00:09:03, thobrla wrote: > Here and in other classes, at what point can time be None? It seems like this > is a required argument. Changed it to be a required argument for all classes. https://codereview.appspot.com/296520043/diff/40001/gslib/thread_message.py#n... gslib/thread_message.py:63: src_url: FileUrl/CloudUrl representing the source file. None if On 2016/06/24 00:09:04, thobrla wrote: > Wouldn't an existing component still have the same source file, ultimately? Yes, but I do not seem able to retrieve it easily. Any suggestions? https://codereview.appspot.com/296520043/diff/40001/gslib/thread_message.py#n... gslib/thread_message.py:64: describing an existing component or existing object to delete. On 2016/06/24 00:09:04, thobrla wrote: > If this is an existing object to delete, that seems like a metadata operation. Yes. I will change that. https://codereview.appspot.com/296520043/diff/40001/gslib/thread_message.py#n... gslib/thread_message.py:70: time: Time that this message was created (since Epoch). On 2016/06/24 00:09:03, thobrla wrote: > Here and in other classes, is this a datetime object, an integer, etc? Integer, I will add that. https://codereview.appspot.com/296520043/diff/40001/gslib/thread_message.py#n... gslib/thread_message.py:86: or a single component. On 2016/06/24 00:09:03, thobrla wrote: > Similarly, this affects cloud objects Changed. https://codereview.appspot.com/296520043/diff/40001/gslib/thread_message.py#n... gslib/thread_message.py:99: dst_url: FileUrl/CloudUrl representing the destination file, or None. On 2016/06/24 00:09:03, thobrla wrote: > extra punctuation here Done.
Sign in to reply to this message.
I don't actually see a patch for the most recent round of comments. https://codereview.appspot.com/296520043/diff/40001/gslib/copy_helper.py File gslib/copy_helper.py (right): https://codereview.appspot.com/296520043/diff/40001/gslib/copy_helper.py#newc... gslib/copy_helper.py:1427: try: On 2016/06/24 08:29:14, michelzel wrote: > On 2016/06/24 00:09:03, thobrla wrote: > > Here and below, this isn't an exceptional case, and thus we shouldn't use > > try/except to handle it. Is the problem that we don't know at this point if > it > > is a component or not? This seems problematic - what if a user uploaded an > > object named "cloudobject_1"? > > > > You'll need to find a better way of plumbing through whether this is a > component > > or not. > > That was the original intent. However, I remembered that we had discussed about > NonResumableUpload not being broken down into components, right? So I could just > remove that. > > As for your concern on "cloudobject_1", although I do not think we are going to > face this issue, wouldn't that be overcome by the fact that the url_string of > "cloudobject_1" would contain a hash for this object and would not finish with > _1 (unless it was the first component)? I had this impression We should confirm (and comment) if non-resumable upload is guaranteed not to be broken down into components. URL strings representing object names are hashed only for tracker files, but not for the purpose of passing the filename around. https://codereview.appspot.com/296520043/diff/40001/gslib/copy_helper.py#newc... gslib/copy_helper.py:2230: try: On 2016/06/24 08:29:14, michelzel wrote: > On 2016/06/24 00:09:03, thobrla wrote: > > Sorry I didn't catch this try/except previously. > > Is something wrong with it? This was more of a precautionary thing, because > cp_result should always have bytes_transferred and component_num. Yes, here and in the cases above, this is doing one of two things: 1. Attempt to hedge against a code bug (in which case we should fix the bug) 2. Handling an expected case as an exceptional one (in which case we should pass the right arguments to handle the exceptional case). More generally, it seems like we're using try/except as an if statement, because we're doing a very similar thing in the "exceptional" case. That's not a good coding practice. https://codereview.appspot.com/296520043/diff/40001/gslib/thread_message.py File gslib/thread_message.py (right): https://codereview.appspot.com/296520043/diff/40001/gslib/thread_message.py#n... gslib/thread_message.py:42: """Indicates beginning or end of a operation onto a file/component. On 2016/06/24 08:29:14, michelzel wrote: > On 2016/06/24 00:09:04, thobrla wrote: > > This also applies to cloud objects, right? > > > > Also, "operation onto" doesn't sound correct, syntactically > > Yeah, that didn't feel very right when I was writing it. Is 'upon' better? "upon" sounds like it refers to the destination. Perhaps "for"? Also, grammar: "an operation" https://codereview.appspot.com/296520043/diff/40001/gslib/thread_message.py#n... gslib/thread_message.py:63: src_url: FileUrl/CloudUrl representing the source file. None if On 2016/06/24 08:29:14, michelzel wrote: > On 2016/06/24 00:09:04, thobrla wrote: > > Wouldn't an existing component still have the same source file, ultimately? > > Yes, but I do not seem able to retrieve it easily. Any suggestions? We must have the source URL at some point to choose to resume the upload from existing components, right? We have it in _DoParallelCompositeUpload.
Sign in to reply to this message.
https://codereview.appspot.com/296520043/diff/40001/gslib/copy_helper.py File gslib/copy_helper.py (right): https://codereview.appspot.com/296520043/diff/40001/gslib/copy_helper.py#newc... gslib/copy_helper.py:1427: try: On 2016/06/24 16:14:38, thobrla wrote: > On 2016/06/24 08:29:14, michelzel wrote: > > On 2016/06/24 00:09:03, thobrla wrote: > > > Here and below, this isn't an exceptional case, and thus we shouldn't use > > > try/except to handle it. Is the problem that we don't know at this point if > > it > > > is a component or not? This seems problematic - what if a user uploaded an > > > object named "cloudobject_1"? > > > > > > You'll need to find a better way of plumbing through whether this is a > > component > > > or not. > > > > That was the original intent. However, I remembered that we had discussed > about > > NonResumableUpload not being broken down into components, right? So I could > just > > remove that. > > > > As for your concern on "cloudobject_1", although I do not think we are going > to > > face this issue, wouldn't that be overcome by the fact that the url_string of > > "cloudobject_1" would contain a hash for this object and would not finish with > > _1 (unless it was the first component)? I had this impression > > We should confirm (and comment) if non-resumable upload is guaranteed not to be > broken down into components. > > URL strings representing object names are hashed only for tracker files, but not > for the purpose of passing the filename around. Ok, I added a comment to both _UploadFileToObjectNonResumable and _DownloadObjectToFileNonResumable. https://codereview.appspot.com/296520043/diff/40001/gslib/copy_helper.py#newc... gslib/copy_helper.py:2230: try: On 2016/06/24 16:14:38, thobrla wrote: > On 2016/06/24 08:29:14, michelzel wrote: > > On 2016/06/24 00:09:03, thobrla wrote: > > > Sorry I didn't catch this try/except previously. > > > > Is something wrong with it? This was more of a precautionary thing, because > > cp_result should always have bytes_transferred and component_num. > > Yes, here and in the cases above, this is doing one of two things: > 1. Attempt to hedge against a code bug (in which case we should fix the bug) > 2. Handling an expected case as an exceptional one (in which case we should pass > the right arguments to handle the exceptional case). > > More generally, it seems like we're using try/except as an if statement, because > we're doing a very similar thing in the "exceptional" case. That's not a good > coding practice. Ok, noted. I was able to remove all the try/excepts I had added. I had to send an extra argument to _UploadFileToObjectResumable named is_component, which will indicate whether or not we should get the component number. In this try/except statement here, I just removed the try/except because every cp_result is supposed to have bytes_transferred and component_num fields. https://codereview.appspot.com/296520043/diff/40001/gslib/thread_message.py File gslib/thread_message.py (right): https://codereview.appspot.com/296520043/diff/40001/gslib/thread_message.py#n... gslib/thread_message.py:42: """Indicates beginning or end of a operation onto a file/component. On 2016/06/24 16:14:38, thobrla wrote: > On 2016/06/24 08:29:14, michelzel wrote: > > On 2016/06/24 00:09:04, thobrla wrote: > > > This also applies to cloud objects, right? > > > > > > Also, "operation onto" doesn't sound correct, syntactically > > > > Yeah, that didn't feel very right when I was writing it. Is 'upon' better? > > "upon" sounds like it refers to the destination. Perhaps "for"? Also, grammar: > "an operation" Yes, of course. My English is bad, but not that much, sorry. https://codereview.appspot.com/296520043/diff/40001/gslib/thread_message.py#n... gslib/thread_message.py:63: src_url: FileUrl/CloudUrl representing the source file. None if On 2016/06/24 16:14:38, thobrla wrote: > On 2016/06/24 08:29:14, michelzel wrote: > > On 2016/06/24 00:09:04, thobrla wrote: > > > Wouldn't an existing component still have the same source file, ultimately? > > > > Yes, but I do not seem able to retrieve it easily. Any suggestions? > > We must have the source URL at some point to choose to resume the upload from > existing components, right? We have it in _DoParallelCompositeUpload. I added this, and now we have src_url, dst_url and time as required arguments, keeping the order (src_url, dst_url) similar in all the classes here.
Sign in to reply to this message.
https://codereview.appspot.com/296520043/diff/60001/gslib/copy_helper.py File gslib/copy_helper.py (right): https://codereview.appspot.com/296520043/diff/60001/gslib/copy_helper.py#newc... gslib/copy_helper.py:1501: if is_component: Nit: Here and below, you could avoid duplicating arguments and calling this function twice by initializing component_num conditionally: component_num = _GetComponentNumber(dst_url) if is_component else None https://codereview.appspot.com/296520043/diff/60001/gslib/copy_helper.py#newc... gslib/copy_helper.py:1737: is_component=(src_obj_metadata is None)) I'm concerned that if we later decide to pass src_obj_metadata for components, this will be an easy bug to miss. Can we make it more explicit via function args or some other approach? https://codereview.appspot.com/296520043/diff/60001/gslib/copy_helper.py#newc... gslib/copy_helper.py:2943: if src_obj_metadata: Should this use src_obj_size as calculated below? https://codereview.appspot.com/296520043/diff/60001/gslib/tests/test_update.py File gslib/tests/test_update.py (right): https://codereview.appspot.com/296520043/diff/60001/gslib/tests/test_update.p... gslib/tests/test_update.py:48: @unittest.skip('This test performs a large copy but remains here for ' I don't think we should skip this test by default. It performs a large copy only if there is a large populated git repo. I'd rather add a #TODO to fix the test to ignore the .git parts. https://codereview.appspot.com/296520043/diff/60001/gslib/thread_message.py File gslib/thread_message.py (right): https://codereview.appspot.com/296520043/diff/60001/gslib/thread_message.py#n... gslib/thread_message.py:50: FILE = 1 Does this mean an upload or a download? How will the UI thread know. Presently we embed this in the announce text for the progress callback. https://codereview.appspot.com/296520043/diff/60001/gslib/thread_message.py#n... gslib/thread_message.py:62: time: Integer representing when message was created (seconds since Epoch). time.time() is used everywhere to populate this value, and it returns a float
Sign in to reply to this message.
https://codereview.appspot.com/296520043/diff/60001/gslib/copy_helper.py File gslib/copy_helper.py (right): https://codereview.appspot.com/296520043/diff/60001/gslib/copy_helper.py#newc... gslib/copy_helper.py:1501: if is_component: On 2016/06/24 21:31:00, thobrla wrote: > Nit: Here and below, you could avoid duplicating arguments and calling this > function twice by initializing component_num conditionally: > component_num = _GetComponentNumber(dst_url) if is_component else None Yes, makes total sense. https://codereview.appspot.com/296520043/diff/60001/gslib/copy_helper.py#newc... gslib/copy_helper.py:1737: is_component=(src_obj_metadata is None)) On 2016/06/24 21:31:00, thobrla wrote: > I'm concerned that if we later decide to pass src_obj_metadata for components, > this will be an easy bug to miss. Can we make it more explicit via function args > or some other approach? Added is_component field to avoid this issue here and below. https://codereview.appspot.com/296520043/diff/60001/gslib/copy_helper.py#newc... gslib/copy_helper.py:2943: if src_obj_metadata: On 2016/06/24 21:31:00, thobrla wrote: > Should this use src_obj_size as calculated below? Yes, this is better. Thanks for the tip! https://codereview.appspot.com/296520043/diff/60001/gslib/tests/test_update.py File gslib/tests/test_update.py (right): https://codereview.appspot.com/296520043/diff/60001/gslib/tests/test_update.p... gslib/tests/test_update.py:48: @unittest.skip('This test performs a large copy but remains here for ' On 2016/06/24 21:31:01, thobrla wrote: > I don't think we should skip this test by default. It performs a large copy > only if there is a large populated git repo. I'd rather add a #TODO to fix the > test to ignore the .git parts. No sorry, I just forgot to remove this. It's actually been a while since I can't finish this test. https://codereview.appspot.com/296520043/diff/60001/gslib/thread_message.py File gslib/thread_message.py (right): https://codereview.appspot.com/296520043/diff/60001/gslib/thread_message.py#n... gslib/thread_message.py:50: FILE = 1 On 2016/06/24 21:31:01, thobrla wrote: > Does this mean an upload or a download? How will the UI thread know. Presently > we embed this in the announce text for the progress callback. Added types for all possible operations. https://codereview.appspot.com/296520043/diff/60001/gslib/thread_message.py#n... gslib/thread_message.py:62: time: Integer representing when message was created (seconds since Epoch). On 2016/06/24 21:31:01, thobrla wrote: > time.time() is used everywhere to populate this value, and it returns a float Done.
Sign in to reply to this message.
LGTM Please get review from yarbrough@ also. As discussed, once you have LGTMs from both of us, squash changes so that you can start the UI branch, but don't push to master.
Sign in to reply to this message.
Ok, cool! On Fri, Jun 24, 2016 at 4:44 PM, <thobrla@google.com> wrote: > LGTM > > Please get review from yarbrough@ also. > > As discussed, once you have LGTMs from both of us, squash changes so > that you can start the UI branch, but don't push to master. > > https://codereview.appspot.com/296520043/ >
Sign in to reply to this message.
|