|
|
Created:
7 years, 9 months ago by michelzel Modified:
7 years, 8 months ago CC:
gsutil-crs_google.com Visibility:
Public. |
DescriptionFile UI Design
Patch Set 1 #
Total comments: 125
Patch Set 2 : CR feedback round 1 thobrla #
Total comments: 113
Patch Set 3 : CR feedback round 2 thobrla #
Total comments: 2
Patch Set 4 : Style fixes #
Total comments: 105
Patch Set 5 : CR feedback round 3 thobrla #Patch Set 6 : Added time definition fixes. #
Total comments: 50
Patch Set 7 : CR feedback round 4 thobrla #
Total comments: 22
Patch Set 8 : CR feedback round 5 thobrla #
Total comments: 17
Patch Set 9 : CR feedback round 6 thobrla #Patch Set 10 : Fix Broken UI Check #
Total comments: 2
Patch Set 11 : CR feedback round 7 thobrla #
Total comments: 2
Patch Set 12 : CR feedback round 1 yarbrough + tests for rewrite and hash #
Total comments: 4
Patch Set 13 : Add size to rewrite NameExpansionResult #MessagesTotal messages: 25
File UI Design.
Sign in to reply to this message.
https://codereview.appspot.com/305770043/diff/1/gslib/command.py File gslib/command.py (right): https://codereview.appspot.com/305770043/diff/1/gslib/command.py#newcode541 gslib/command.py:541: ui_controller = UIController() This is used only in the main process and main thread, right? Could it instead be a class variable on the command instance (that we disable for pickling in _SetUpPerCallerState) https://codereview.appspot.com/305770043/diff/1/gslib/command.py#newcode1668 gslib/command.py:1668: def _DecimalShort(num): In general, command.py is quite long and the below functions are utility function. Could we move these to util.py? We could also move MakeHumanReadable and other formatting functions to a new format_util.py, since we have several such functions. https://codereview.appspot.com/305770043/diff/1/gslib/command.py#newcode1672 gslib/command.py:1672: num: The number to be shortened. Does this support floats and ints? https://codereview.appspot.com/305770043/diff/1/gslib/command.py#newcode1674 gslib/command.py:1674: shortened string version for num. How short? What are the guarantees on the return value? Will it be of fixed size? https://codereview.appspot.com/305770043/diff/1/gslib/command.py#newcode1682 gslib/command.py:1682: if num < 1000000000: Is this number implicitly tied to the maximum size of a GCS object being 5 terabytes? It seems like if we use this function to shorten total size for larger transfers it could be problematic. At the minimum we should document the limitation (but I'd prefer to remove it altogether). This also seems like it's doing some duplication of MakeHumanReadable and could be refactored. https://codereview.appspot.com/305770043/diff/1/gslib/command.py#newcode1698 gslib/command.py:1698: %H:%M:%D time remaining format. %H might be higher than 23. If %H is >= 100 does that affect the maximum line length? https://codereview.appspot.com/305770043/diff/1/gslib/command.py#newcode1733 gslib/command.py:1733: class UIController(object): Move this class to its own file. https://codereview.appspot.com/305770043/diff/1/gslib/command.py#newcode1734 gslib/command.py:1734: """Controller UI class to integrate _MainThreadUIQueue and _UIThread.""" "integrate" here means functionality common to MainThreadUIQueue and _UIThread? More description on the class's responsibilities would be useful. It looks like this: - processes incoming StatusMessages - stores all necessary data about the current and past states of the system necessary to display to the UI - provides methods for actually displaying messages to the UI Is that all true? Anything I missed? https://codereview.appspot.com/305770043/diff/1/gslib/command.py#newcode1747 gslib/command.py:1747: self.update_time_frequency = 1 These substantially impact the UI behavior, so I think they warrant detailed documentation in the comments. https://codereview.appspot.com/305770043/diff/1/gslib/command.py#newcode1766 gslib/command.py:1766: self.individual_file_progress = [] Why is this a list instead of a dict keyed by filename? It seems like lookup could be expensive in the case that there are many files progressing, because list_variable[n] is O(n). This would also simplify lookup, i.e., I don't see that you would need a separate index variable. https://codereview.appspot.com/305770043/diff/1/gslib/command.py#newcode1768 gslib/command.py:1768: # Enum for individual_file_progress Why not instead define a namedtuple so that you can reference these values via names instead of index-based lookup? https://codereview.appspot.com/305770043/diff/1/gslib/command.py#newcode1796 gslib/command.py:1796: status_item: the item to be processed. status_message? Is this guaranteed to be a StatusMessage type? https://codereview.appspot.com/305770043/diff/1/gslib/command.py#newcode1797 gslib/command.py:1797: stream: stream to print messages. Here only for SeekAheadThread Given that we're providing time-remaining estimates I don't think it's necessary for the SeekAheadThread to print this message anymore. https://codereview.appspot.com/305770043/diff/1/gslib/command.py#newcode1816 gslib/command.py:1816: status_item.message_type < FileMessage.COMPONENT_TO_UPLOAD and Using a < operator on a enum seems very brittle. Here and below, can we call out explicitly what types we expect this to be (or not to be)? https://codereview.appspot.com/305770043/diff/1/gslib/command.py#newcode1834 gslib/command.py:1834: [file_name, 0, {}, status_item.size]) What are the types of each of these values? Do they all need to be mutable? https://codereview.appspot.com/305770043/diff/1/gslib/command.py#newcode1837 gslib/command.py:1837: self.total_size += status_item.size Once the producer thread finishes, would we want to adjust the number from the seek-ahead thread in case there is any discrepancy? For example, if during a long-running operation the user adds files, our final number should represent what was actually transferred. https://codereview.appspot.com/305770043/diff/1/gslib/command.py#newcode1847 gslib/command.py:1847: file_progress = self.individual_file_progress[file_index] It seems like when the file is finished, we should clean up/delete its entries or else index/progress will grow arbitrarily large. https://codereview.appspot.com/305770043/diff/1/gslib/command.py#newcode1858 gslib/command.py:1858: status_item.message_type == FileMessage.EXISTING_COMPONENT and I think the implication here is that an existing component could also be a non-finalized component that is being resumed? I think this is different than the notion of an "existing component" in copy_helper, which implies that the component is fully complete. We should clarify that in the FileMessage. https://codereview.appspot.com/305770043/diff/1/gslib/command.py#newcode1912 gslib/command.py:1912: # Some progresses are being called without a proper FileMessage. When does this happen? This seems like a bug. https://codereview.appspot.com/305770043/diff/1/gslib/command.py#newcode1915 gslib/command.py:1915: # Checking if this specific component has been processed Is the world component overloaded here? This seems like it could be referring to an entire file or a component of a file, based on the message type. https://codereview.appspot.com/305770043/diff/1/gslib/command.py#newcode1928 gslib/command.py:1928: """Throughput tracking. What does this function actually do? It looks like it updates self.throughput and self.throughput time. https://codereview.appspot.com/305770043/diff/1/gslib/command.py#newcode1947 gslib/command.py:1947: """Prints progress. Describe the conditions under which progress is printed. It also looks like this updates class state, which implies this function does a lot more than just printing. Would it be worth separating the state and the printing? https://codereview.appspot.com/305770043/diff/1/gslib/command.py#newcode1952 gslib/command.py:1952: stream: stream to print messages. Under what conditions do we print to a stream other than sys.stderr? I assume this is for unit testing only, so here and elsewhere it would be good to call that out. https://codereview.appspot.com/305770043/diff/1/gslib/command.py#newcode1955 gslib/command.py:1955: if ((cur_time-self.refresh_time > self.update_time_frequency or It looks like we're comparing a time value to a frequency value here, which indicates that one of these variables is not named correctly. https://codereview.appspot.com/305770043/diff/1/gslib/command.py#newcode1957 gslib/command.py:1957: # Time to update all informations s/informations/information/ https://codereview.appspot.com/305770043/diff/1/gslib/command.py#newcode1958 gslib/command.py:1958: total_remaining = self.total_size-self.total_progress space in between arithmetic operators on variables https://codereview.appspot.com/305770043/diff/1/gslib/command.py#newcode1961 gslib/command.py:1961: self.time_remaining = total_remaining/self.throughput space in between arithmetic operators on variables https://codereview.appspot.com/305770043/diff/1/gslib/command.py#newcode1969 gslib/command.py:1969: percentage = '%d' % int(100*float(self.total_progress)/self.total_size) Nit: percentage_completed? https://codereview.appspot.com/305770043/diff/1/gslib/command.py#newcode1971 gslib/command.py:1971: percentage = 'N/A' Does it make sense to print 'N/A% Done'? https://codereview.appspot.com/305770043/diff/1/gslib/command.py#newcode1983 gslib/command.py:1983: 's, ETA ' + _PrettyTime(self.time_remaining)) What does this print if time_remaining is 'Undefined'? It seems like we have are inferring that it is defined from other variables, which would make it easy to author a bug. https://codereview.appspot.com/305770043/diff/1/gslib/command.py#newcode1985 gslib/command.py:1985: stream.write(string_to_print+ '\r') space in between arithmetic operators https://codereview.appspot.com/305770043/diff/1/gslib/command.py#newcode1993 gslib/command.py:1993: self.current_char = (self.current_char+1)%len(self.progress_char) space between operators https://codereview.appspot.com/305770043/diff/1/gslib/command.py#newcode2035 gslib/command.py:2035: self.ui_controller.CalculateThroughput(cur_time) Why does this UI thread (or MainThreadUIQueue) need to call CalculateThroughput or PrintProgress? These seem like implementation details. https://codereview.appspot.com/305770043/diff/1/gslib/command.py#newcode2067 gslib/command.py:2067: self.logger = logger if logger else CreateGsutilLogger('UI') Is this used anymore? https://codereview.appspot.com/305770043/diff/1/gslib/command.py#newcode2080 gslib/command.py:2080: # Item from MainThread to indicate we are done ProducerThread, right? https://codereview.appspot.com/305770043/diff/1/gslib/commands/rewrite.py File gslib/commands/rewrite.py (right): https://codereview.appspot.com/305770043/diff/1/gslib/commands/rewrite.py#new... gslib/commands/rewrite.py:385: (fixed) position. Why do we want these in a fixed position as opposed to outputting \n? Aren't we managing these via the UI thread? https://codereview.appspot.com/305770043/diff/1/gslib/copy_helper.py File gslib/copy_helper.py (right): https://codereview.appspot.com/305770043/diff/1/gslib/copy_helper.py#newcode1000 gslib/copy_helper.py:1000: # Dict to track component info so we allign FileMessage values s/allign/align/ https://codereview.appspot.com/305770043/diff/1/gslib/copy_helper.py#newcode1093 gslib/copy_helper.py:1093: for component in existing_objects_to_delete: We're posting finish messages for existing objects to delete, but Apply is deleting the newly-uploaded components as well. Why do we only cover existing ones here? https://codereview.appspot.com/305770043/diff/1/gslib/copy_helper.py#newcode1427 gslib/copy_helper.py:1427: This function does not support component transfers. Add a line between this and Args: https://codereview.appspot.com/305770043/diff/1/gslib/copy_helper.py#newcode1680 gslib/copy_helper.py:1680: src_obj_metadata: Metadata for source object For source file? What does it include? https://codereview.appspot.com/305770043/diff/1/gslib/copy_helper.py#newcode1803 gslib/copy_helper.py:1803: size = src_obj_metadata.size if src_obj_metadata else None In what case will we not know the size? Also, if size is all we care about, should we pass that instead of src_obj_metadata? https://codereview.appspot.com/305770043/diff/1/gslib/copy_helper.py#newcode2152 gslib/copy_helper.py:2152: api_selector, decryption_key=None, gsutil_api=None): If we are only using the status_queue from gsutil_api, just pass that instead of gsutil_api? https://codereview.appspot.com/305770043/diff/1/gslib/copy_helper.py#newcode2702 gslib/copy_helper.py:2702: gsutil_api: gsutil Cloud API instance to use for the copy. Again, if we just need the status queue, just use that? https://codereview.appspot.com/305770043/diff/1/gslib/copy_helper.py#newcode2703 gslib/copy_helper.py:2703: src_obj_metadata: If source URL is a cloud object, source object metadata How could the source URL be a cloud object for a file-to-file copy? https://codereview.appspot.com/305770043/diff/1/gslib/copy_helper.py#newcode2725 gslib/copy_helper.py:2725: size=src_obj_metadata.size, This will blow up if src_obj_metadata is None, yes? We have access to src_obj_size in the caller, so it seems like we could pass that. https://codereview.appspot.com/305770043/diff/1/gslib/copy_helper.py#newcode3435 gslib/copy_helper.py:3435: uploaded_components.append([url, dst_arg.file_length]) If we're not modifying these elements, use a tuple instead of a list. https://codereview.appspot.com/305770043/diff/1/gslib/progress_callback.py File gslib/progress_callback.py (right): https://codereview.appspot.com/305770043/diff/1/gslib/progress_callback.py#ne... gslib/progress_callback.py:34: """Makes progress callbacks at least once every _TIMEOUT_SECONDS. This is only true if Progress() is called, yes? I don't see that this is guaranteed to report every _TIMEOUT_SECONDS. https://codereview.appspot.com/305770043/diff/1/gslib/seek_ahead_thread.py File gslib/seek_ahead_thread.py (right): https://codereview.appspot.com/305770043/diff/1/gslib/seek_ahead_thread.py#ne... gslib/seek_ahead_thread.py:113: estimate_message += ', total size: %s' % MakeHumanReadable(num_data_bytes) As mentioned elsewhere, I think you can just drop this. https://codereview.appspot.com/305770043/diff/1/gslib/thread_message.py File gslib/thread_message.py (right): https://codereview.appspot.com/305770043/diff/1/gslib/thread_message.py#newco... gslib/thread_message.py:101: processed_bytes: Number of bytes already processed from that specific Integer? https://codereview.appspot.com/305770043/diff/1/gslib/thread_message.py#newco... gslib/thread_message.py:103: into a readable format by using gslib.util.MakeHumanReadable. I don't think it's necessary to call out MakeHumanReadable here, as that's a UI construct. https://codereview.appspot.com/305770043/diff/1/gslib/thread_message.py#newco... gslib/thread_message.py:128: queue. It tracks the number of files remaining. It tracks the total number of files, not just those not yet covered by the ProducerThread, and not the number of files remaining. Right?
Sign in to reply to this message.
https://codereview.appspot.com/305770043/diff/1/gslib/command.py File gslib/command.py (right): https://codereview.appspot.com/305770043/diff/1/gslib/command.py#newcode541 gslib/command.py:541: ui_controller = UIController() On 2016/07/12 21:32:34, thobrla wrote: > This is used only in the main process and main thread, right? Could it instead > be a class variable on the command instance (that we disable for pickling in > _SetUpPerCallerState) Just to make sure I got it right, are you saying to create self.ui_controller on the Command class instead of using it as global? If that's the case, we could definitely do that. I'll just wait for your confirmation and add on the next round, ok? https://codereview.appspot.com/305770043/diff/1/gslib/command.py#newcode1668 gslib/command.py:1668: def _DecimalShort(num): On 2016/07/12 21:32:36, thobrla wrote: > In general, command.py is quite long and the below functions are utility > function. Could we move these to util.py? We could also move MakeHumanReadable > and other formatting functions to a new format_util.py, since we have several > such functions. Yes, that makes sense. I will leave them all in util, then we can discuss which ones to move over to his new file. https://codereview.appspot.com/305770043/diff/1/gslib/command.py#newcode1672 gslib/command.py:1672: num: The number to be shortened. On 2016/07/12 21:32:35, thobrla wrote: > Does this support floats and ints? No, it's only integers. This is used to abbreviate the total number of files, I don't think that was clear. This is also why I created this instead of using MakeHumanReadable. https://codereview.appspot.com/305770043/diff/1/gslib/command.py#newcode1674 gslib/command.py:1674: shortened string version for num. On 2016/07/12 21:32:36, thobrla wrote: > How short? What are the guarantees on the return value? Will it be of fixed > size? Unless we have more than 10^12 objects, we are talking about 6 characters at most (3 for integer part, 1 for '.', 1 for decimal value, and one for the letter). https://codereview.appspot.com/305770043/diff/1/gslib/command.py#newcode1682 gslib/command.py:1682: if num < 1000000000: On 2016/07/12 21:32:35, thobrla wrote: > Is this number implicitly tied to the maximum size of a GCS object being 5 > terabytes? It seems like if we use this function to shorten total size for > larger transfers it could be problematic. At the minimum we should document the > limitation (but I'd prefer to remove it altogether). > > This also seems like it's doing some duplication of MakeHumanReadable and could > be refactored. Is there a limit on the number of objects (again, I apologize for not making it clear before)? Even if we do not, I think 10^12 is a pretty solid limit, right? https://codereview.appspot.com/305770043/diff/1/gslib/command.py#newcode1698 gslib/command.py:1698: %H:%M:%D time remaining format. %H might be higher than 23. On 2016/07/12 21:32:35, thobrla wrote: > If %H is >= 100 does that affect the maximum line length? Yes. Is that something usual with all those large uploads and downloads that our clients make? I could change for 'Over 100 hours' if so (I don't think they need a very precise estimate if we are talking about 100+ hours, right?). https://codereview.appspot.com/305770043/diff/1/gslib/command.py#newcode1733 gslib/command.py:1733: class UIController(object): On 2016/07/12 21:32:34, thobrla wrote: > Move this class to its own file. Ok, I'll do that. What about the _UIThread and _MainThreadUIQueue? I think it would make sense to keep them all together, right? https://codereview.appspot.com/305770043/diff/1/gslib/command.py#newcode1734 gslib/command.py:1734: """Controller UI class to integrate _MainThreadUIQueue and _UIThread.""" On 2016/07/12 21:32:34, thobrla wrote: > "integrate" here means functionality common to MainThreadUIQueue and _UIThread? > > More description on the class's responsibilities would be useful. It looks like > this: > - processes incoming StatusMessages > - stores all necessary data about the current and past states of the system > necessary to display to the UI > - provides methods for actually displaying messages to the UI > > Is that all true? Anything I missed? I updated the description. Also added that we calculate throughput and time remaining https://codereview.appspot.com/305770043/diff/1/gslib/command.py#newcode1747 gslib/command.py:1747: self.update_time_frequency = 1 On 2016/07/12 21:32:35, thobrla wrote: > These substantially impact the UI behavior, so I think they warrant detailed > documentation in the comments. Done. https://codereview.appspot.com/305770043/diff/1/gslib/command.py#newcode1766 gslib/command.py:1766: self.individual_file_progress = [] On 2016/07/12 21:32:34, thobrla wrote: > Why is this a list instead of a dict keyed by filename? It seems like lookup > could be expensive in the case that there are many files progressing, because > list_variable[n] is O(n). This would also simplify lookup, i.e., I don't see > that you would need a separate index variable. Wait, would it be O(n) even with the individual_file_index dict? Regardless, it does make sense to use the filename as the key. https://codereview.appspot.com/305770043/diff/1/gslib/command.py#newcode1768 gslib/command.py:1768: # Enum for individual_file_progress On 2016/07/12 21:32:34, thobrla wrote: > Why not instead define a namedtuple so that you can reference these values via > names instead of index-based lookup? Because we ideally want to edit those values, I would have to create new namedtuple everytime we change something. Do you think it is still worth the change? Since this would involve creating a copy of our dictionary every time we update a key, I decided to keep it this way for now. https://codereview.appspot.com/305770043/diff/1/gslib/command.py#newcode1796 gslib/command.py:1796: status_item: the item to be processed. On 2016/07/12 21:32:35, thobrla wrote: > status_message? Is this guaranteed to be a StatusMessage type? That's our ultimate goal. As of right now, only test_seek_ahead_thread puts something other than a StatusMessage as status_item, and I'll be correcting that soon, so I think the change to status_message is valid. https://codereview.appspot.com/305770043/diff/1/gslib/command.py#newcode1797 gslib/command.py:1797: stream: stream to print messages. Here only for SeekAheadThread On 2016/07/12 21:32:34, thobrla wrote: > Given that we're providing time-remaining estimates I don't think it's necessary > for the SeekAheadThread to print this message anymore. I don't know about that. It seems a good way of enforcing that this should take a while. Furthermore (and I'm not sure whether or not this is in fact a good reason), a lot of tests depend on this message to succeed. Since it would not hurt to keep it, but it would certainly cause some headache to remove it, I decided for the former. Does that make sense or am I being too lazy? https://codereview.appspot.com/305770043/diff/1/gslib/command.py#newcode1816 gslib/command.py:1816: status_item.message_type < FileMessage.COMPONENT_TO_UPLOAD and On 2016/07/12 21:32:35, thobrla wrote: > Using a < operator on a enum seems very brittle. Here and below, can we call > out explicitly what types we expect this to be (or not to be)? We could, but there are 6 different types of messages describing files (although we seem to be currently treating them all equally). Should I just list them regardless? As an alternative, I could maybe create a comment on FileMessage informing that all File-like values must come before components, and the first component-like value should be FileMessage.COMPONENT_TO_UPLOAD https://codereview.appspot.com/305770043/diff/1/gslib/command.py#newcode1834 gslib/command.py:1834: [file_name, 0, {}, status_item.size]) On 2016/07/12 21:32:35, thobrla wrote: > What are the types of each of these values? Do they all need to be mutable? I described them (and I updated this description to make it clearer) when declaring self.individual_file_progress. Do I also need to put it here? https://codereview.appspot.com/305770043/diff/1/gslib/command.py#newcode1837 gslib/command.py:1837: self.total_size += status_item.size On 2016/07/12 21:32:35, thobrla wrote: > Once the producer thread finishes, would we want to adjust the number from the > seek-ahead thread in case there is any discrepancy? For example, if during a > long-running operation the user adds files, our final number should represent > what was actually transferred. I'm not sure I understand. If I get what you are saying (one example would be trying to upload all the files of a local folder to a bucket, and during the operation adding more files to this folder, right?), would our operation actually upload this newly-added file? If so, I'm having a hard time imagining what we could in that case. I see two possible scenarios: trying to store on seek_ahead_thread the name of the files that we passed through (and we would have to add this argument to the SeekAheadMessage), or updating total_size as we get new files. https://codereview.appspot.com/305770043/diff/1/gslib/command.py#newcode1847 gslib/command.py:1847: file_progress = self.individual_file_progress[file_index] On 2016/07/12 21:32:35, thobrla wrote: > It seems like when the file is finished, we should clean up/delete its entries > or else index/progress will grow arbitrarily large. It will grow up to the number of objects, right? I could maybe create another compact dict or list of finished objects and erase all the component-specific information from those files. https://codereview.appspot.com/305770043/diff/1/gslib/command.py#newcode1858 gslib/command.py:1858: status_item.message_type == FileMessage.EXISTING_COMPONENT and On 2016/07/12 21:32:36, thobrla wrote: > I think the implication here is that an existing component could also be a > non-finalized component that is being resumed? I think this is different than > the notion of an "existing component" in copy_helper, which implies that the > component is fully complete. We should clarify that in the FileMessage. That is actually the same thing as in copy_helper. The thing is that we create start and end messages for this component, even though it is already completed by the time it starts. Thus, when we receive the start message for an EXISTING_COMPONENT, which indicates a fully complete component, we can already process all the information. An EXISTING_COMPONENT message with finished==True will just be ignored, as we have already processed this component. We do send this message because when deleting all the components, _DoParallelCompositeUpload gathers existing_components and uploaded_components (https://github.com/GoogleCloudPlatform/gsutil/blob/master/gslib/copy_helper.p...) https://codereview.appspot.com/305770043/diff/1/gslib/command.py#newcode1912 gslib/command.py:1912: # Some progresses are being called without a proper FileMessage. On 2016/07/12 21:32:35, thobrla wrote: > When does this happen? This seems like a bug. It happens on hash tests. I will try to track it down https://codereview.appspot.com/305770043/diff/1/gslib/command.py#newcode1915 gslib/command.py:1915: # Checking if this specific component has been processed On 2016/07/12 21:32:35, thobrla wrote: > Is the world component overloaded here? This seems like it could be referring > to an entire file or a component of a file, based on the message type. I updated the description, let me know if it is still unclear. https://codereview.appspot.com/305770043/diff/1/gslib/command.py#newcode1928 gslib/command.py:1928: """Throughput tracking. On 2016/07/12 21:32:36, thobrla wrote: > What does this function actually do? It looks like it updates self.throughput > and self.throughput time. Done. https://codereview.appspot.com/305770043/diff/1/gslib/command.py#newcode1947 gslib/command.py:1947: """Prints progress. On 2016/07/12 21:32:36, thobrla wrote: > Describe the conditions under which progress is printed. It also looks like > this updates class state, which implies this function does a lot more than just > printing. Would it be worth separating the state and the printing? Do you think so? I'm not so sure about this one. The only class variables we update are self.time_remaining (which, as a matter of fact, is only used here, so we do not have to declare as a class variable), self.refresh_time and self.current_char. I really don't see a need to separate this method into two, but let me know if you think otherwise. https://codereview.appspot.com/305770043/diff/1/gslib/command.py#newcode1952 gslib/command.py:1952: stream: stream to print messages. On 2016/07/12 21:32:35, thobrla wrote: > Under what conditions do we print to a stream other than sys.stderr? I assume > this is for unit testing only, so here and elsewhere it would be good to call > that out. Yes, it is. I will add that note. https://codereview.appspot.com/305770043/diff/1/gslib/command.py#newcode1955 gslib/command.py:1955: if ((cur_time-self.refresh_time > self.update_time_frequency or On 2016/07/12 21:32:34, thobrla wrote: > It looks like we're comparing a time value to a frequency value here, which > indicates that one of these variables is not named correctly. Done. https://codereview.appspot.com/305770043/diff/1/gslib/command.py#newcode1957 gslib/command.py:1957: # Time to update all informations On 2016/07/12 21:32:35, thobrla wrote: > s/informations/information/ Acknowledged. https://codereview.appspot.com/305770043/diff/1/gslib/command.py#newcode1958 gslib/command.py:1958: total_remaining = self.total_size-self.total_progress On 2016/07/12 21:32:34, thobrla wrote: > space in between arithmetic operators on variables Done. https://codereview.appspot.com/305770043/diff/1/gslib/command.py#newcode1961 gslib/command.py:1961: self.time_remaining = total_remaining/self.throughput On 2016/07/12 21:32:35, thobrla wrote: > space in between arithmetic operators on variables Sorry about those. https://codereview.appspot.com/305770043/diff/1/gslib/command.py#newcode1969 gslib/command.py:1969: percentage = '%d' % int(100*float(self.total_progress)/self.total_size) On 2016/07/12 21:32:35, thobrla wrote: > Nit: percentage_completed? Done. https://codereview.appspot.com/305770043/diff/1/gslib/command.py#newcode1971 gslib/command.py:1971: percentage = 'N/A' On 2016/07/12 21:32:34, thobrla wrote: > Does it make sense to print 'N/A% Done'? Is it better to just not show anything, or maybe 'Could not calculate progress percentage?' https://codereview.appspot.com/305770043/diff/1/gslib/command.py#newcode1983 gslib/command.py:1983: 's, ETA ' + _PrettyTime(self.time_remaining)) On 2016/07/12 21:32:35, thobrla wrote: > What does this print if time_remaining is 'Undefined'? It seems like we have > are inferring that it is defined from other variables, which would make it easy > to author a bug. I didn't quite understand what you meant by "It seems like we have are inferring that it is defined from other variables", but _PrettyTime returns what it receives if it is a string. I do have to adjust the documentation to include that. Does that seem enough? https://codereview.appspot.com/305770043/diff/1/gslib/command.py#newcode1985 gslib/command.py:1985: stream.write(string_to_print+ '\r') On 2016/07/12 21:32:35, thobrla wrote: > space in between arithmetic operators Done. https://codereview.appspot.com/305770043/diff/1/gslib/command.py#newcode1993 gslib/command.py:1993: self.current_char = (self.current_char+1)%len(self.progress_char) On 2016/07/12 21:32:35, thobrla wrote: > space between operators Done. I did some regex search to ensure all operators are separated by spaces, I think it should be fine now. https://codereview.appspot.com/305770043/diff/1/gslib/command.py#newcode2035 gslib/command.py:2035: self.ui_controller.CalculateThroughput(cur_time) On 2016/07/12 21:32:35, thobrla wrote: > Why does this UI thread (or MainThreadUIQueue) need to call CalculateThroughput > or PrintProgress? These seem like implementation details. Ok, so maybe it makes more sense to have a Call method inside the Controller which calls those three functions, right? https://codereview.appspot.com/305770043/diff/1/gslib/command.py#newcode2067 gslib/command.py:2067: self.logger = logger if logger else CreateGsutilLogger('UI') On 2016/07/12 21:32:34, thobrla wrote: > Is this used anymore? No, but since we talked about having a logger mode later on, I thought it wouldn't hurt to keep it. Do you think we should remove for now? https://codereview.appspot.com/305770043/diff/1/gslib/command.py#newcode2080 gslib/command.py:2080: # Item from MainThread to indicate we are done On 2016/07/12 21:32:35, thobrla wrote: > ProducerThread, right? Is it? The portion of code that puts it into the status_queue is on Apply: if is_main_thread: glob_status_queue.put(ZERO_TASKS_TO_DO_ARGUMENT) https://codereview.appspot.com/305770043/diff/1/gslib/commands/rewrite.py File gslib/commands/rewrite.py (right): https://codereview.appspot.com/305770043/diff/1/gslib/commands/rewrite.py#new... gslib/commands/rewrite.py:385: (fixed) position. On 2016/07/12 21:32:36, thobrla wrote: > Why do we want these in a fixed position as opposed to outputting \n? Aren't we > managing these via the UI thread? I think we should definitely output \n in this case, but those aren't treated by the UIThread directly. This is actually similar to keeping the 'Estimated number of objects ...' message for the UIThread: I kept it because it did not seem to affect us, and tests would fail otherwise. I will update the description here. https://codereview.appspot.com/305770043/diff/1/gslib/copy_helper.py File gslib/copy_helper.py (right): https://codereview.appspot.com/305770043/diff/1/gslib/copy_helper.py#newcode1000 gslib/copy_helper.py:1000: # Dict to track component info so we allign FileMessage values On 2016/07/12 21:32:36, thobrla wrote: > s/allign/align/ Done. https://codereview.appspot.com/305770043/diff/1/gslib/copy_helper.py#newcode1093 gslib/copy_helper.py:1093: for component in existing_objects_to_delete: On 2016/07/12 21:32:36, thobrla wrote: > We're posting finish messages for existing objects to delete, but Apply is > deleting the newly-uploaded components as well. Why do we only cover existing > ones here? I'm not sure I understand what you are saying. We have basically 3 types of messages here: COMPONENT_TO_UPLOAD, EXISTING_COMPONENT and EXISTING_OBJECT_TO_DELETE. The start messages are all created above (separated above). After that, we gather the first two types on the 'components' list, which we use to post finish messages (right above). After that, we post finish messages for those of the EXISTING_OBJECT_TO_DELETE type. Does that make sense? https://codereview.appspot.com/305770043/diff/1/gslib/copy_helper.py#newcode1427 gslib/copy_helper.py:1427: This function does not support component transfers. On 2016/07/12 21:32:36, thobrla wrote: > Add a line between this and Args: Done. https://codereview.appspot.com/305770043/diff/1/gslib/copy_helper.py#newcode1680 gslib/copy_helper.py:1680: src_obj_metadata: Metadata for source object On 2016/07/12 21:32:36, thobrla wrote: > For source file? What does it include? Edited the description. https://codereview.appspot.com/305770043/diff/1/gslib/copy_helper.py#newcode1803 gslib/copy_helper.py:1803: size = src_obj_metadata.size if src_obj_metadata else None On 2016/07/12 21:32:36, thobrla wrote: > In what case will we not know the size? Also, if size is all we care about, > should we pass that instead of src_obj_metadata? I'll quote from the PerformCopy docstring: "src_obj_metadata:(...)If source URL is a file, an apitools Object that may contain file size, or None." That's why I decided not to blindly trust we have size. As for the second question, I agree that there's no good reason for not just sending the size. I also just realized this is already being passed to _UploadFileToObject, so I'll just delete the src_obj_metadata field. It seems that upload_size stores the correct size (the difference from src_obj_size is when the _CompressFileForUpload method is called, which updates upload_size). I did still leave the try statement, however, as upload_size could be equal to src_obj_size, which is taken from the possibly None src_obj_metadata in the first place. https://codereview.appspot.com/305770043/diff/1/gslib/copy_helper.py#newcode2152 gslib/copy_helper.py:2152: api_selector, decryption_key=None, gsutil_api=None): On 2016/07/12 21:32:36, thobrla wrote: > If we are only using the status_queue from gsutil_api, just pass that instead of > gsutil_api? Done. https://codereview.appspot.com/305770043/diff/1/gslib/copy_helper.py#newcode2702 gslib/copy_helper.py:2702: gsutil_api: gsutil Cloud API instance to use for the copy. On 2016/07/12 21:32:36, thobrla wrote: > Again, if we just need the status queue, just use that? Agreed. I did use gsutil_api because I saw that was being used on many different functions, but they do seem to have multiple purposes for it, so it makes sense to just send the status_queue here. https://codereview.appspot.com/305770043/diff/1/gslib/copy_helper.py#newcode2703 gslib/copy_helper.py:2703: src_obj_metadata: If source URL is a cloud object, source object metadata On 2016/07/12 21:32:36, thobrla wrote: > How could the source URL be a cloud object for a file-to-file copy? My bad! https://codereview.appspot.com/305770043/diff/1/gslib/copy_helper.py#newcode2725 gslib/copy_helper.py:2725: size=src_obj_metadata.size, On 2016/07/12 21:32:36, thobrla wrote: > This will blow up if src_obj_metadata is None, yes? We have access to > src_obj_size in the caller, so it seems like we could pass that. Yes, this is potentially the same problem as in _UploadFileToObject https://codereview.appspot.com/305770043/diff/1/gslib/copy_helper.py#newcode3435 gslib/copy_helper.py:3435: uploaded_components.append([url, dst_arg.file_length]) On 2016/07/12 21:32:36, thobrla wrote: > If we're not modifying these elements, use a tuple instead of a list. Does that offer a gain in performance, or is it just a way to show the elements will be immutable? Either way, changed it. https://codereview.appspot.com/305770043/diff/1/gslib/progress_callback.py File gslib/progress_callback.py (right): https://codereview.appspot.com/305770043/diff/1/gslib/progress_callback.py#ne... gslib/progress_callback.py:34: """Makes progress callbacks at least once every _TIMEOUT_SECONDS. On 2016/07/12 21:32:36, thobrla wrote: > This is only true if Progress() is called, yes? I don't see that this is > guaranteed to report every _TIMEOUT_SECONDS. Yes, that is indeed a problem. I'm not sure how we could solve it, though. Any ideas? I saw that Progress() is usually called after sending/receiving a given number of bytes (like here: https://github.com/GoogleCloudPlatform/gsutil/blob/437be5d09d2eb76221c7259a58...). If I'm not mistaken (and at least for this case, but I would think is similar for other Progress calls), our buffer size is 8 KiB, right? If so, the only way we would not receive a call after _TIMEOUT_SECONDS would the connection is lower than 8/_TIMEOUT_SECONDS Kibs. Do we need to worry about that case (or maybe if something goes wrong)? It seems that the connection would be really slow either way, so not having one ProgressMessage would be fairly irrelevant. PS: Given I just argued that we are fairly certain to receive a Progress call more often than we need, I wondered why we actually need the _TIMEOUT_SECONDS after all. The answer is that without checking for timeout, while we are fairly certain to receive at least one callback from 8 KiB or so, it is more of a stretch to assume we will receive START_BYTES_PER_CALLBACK (which is currently set at 256 KiB) on this interval, which would prevent us from sending a ProgressMessage. Again, the connection would have to be slow, but that seems a bit more reasonable, and justifies adding _TIMEOUT_SECONDS https://codereview.appspot.com/305770043/diff/1/gslib/seek_ahead_thread.py File gslib/seek_ahead_thread.py (right): https://codereview.appspot.com/305770043/diff/1/gslib/seek_ahead_thread.py#ne... gslib/seek_ahead_thread.py:113: estimate_message += ', total size: %s' % MakeHumanReadable(num_data_bytes) On 2016/07/12 21:32:36, thobrla wrote: > As mentioned elsewhere, I think you can just drop this. I first thought I had forgotten about this, but I actually kept it so test_seek_ahead_thread would keep working, as it depends on those UI messages. I will change the test so it creates a UIController and gets the message from there. https://codereview.appspot.com/305770043/diff/1/gslib/thread_message.py File gslib/thread_message.py (right): https://codereview.appspot.com/305770043/diff/1/gslib/thread_message.py#newco... gslib/thread_message.py:101: processed_bytes: Number of bytes already processed from that specific On 2016/07/12 21:32:36, thobrla wrote: > Integer? Done. https://codereview.appspot.com/305770043/diff/1/gslib/thread_message.py#newco... gslib/thread_message.py:103: into a readable format by using gslib.util.MakeHumanReadable. On 2016/07/12 21:32:36, thobrla wrote: > I don't think it's necessary to call out MakeHumanReadable here, as that's a UI > construct. Yes, we talked about that, don't know why it's still there. Sorry. https://codereview.appspot.com/305770043/diff/1/gslib/thread_message.py#newco... gslib/thread_message.py:128: queue. It tracks the number of files remaining. On 2016/07/12 21:32:37, thobrla wrote: > It tracks the total number of files, not just those not yet covered by the > ProducerThread, and not the number of files remaining. Right? Yes, fixed it.
Sign in to reply to this message.
https://codereview.appspot.com/305770043/diff/1/gslib/command.py File gslib/command.py (right): https://codereview.appspot.com/305770043/diff/1/gslib/command.py#newcode541 gslib/command.py:541: ui_controller = UIController() On 2016/07/13 22:54:09, michelzel wrote: > On 2016/07/12 21:32:34, thobrla wrote: > > This is used only in the main process and main thread, right? Could it > instead > > be a class variable on the command instance (that we disable for pickling in > > _SetUpPerCallerState) > > Just to make sure I got it right, are you saying to create self.ui_controller on > the Command class instead of using it as global? If that's the case, we could > definitely do that. I'll just wait for your confirmation and add on the next > round, ok? Yes, that's correct. https://codereview.appspot.com/305770043/diff/1/gslib/command.py#newcode1698 gslib/command.py:1698: %H:%M:%D time remaining format. %H might be higher than 23. On 2016/07/13 22:54:09, michelzel wrote: > On 2016/07/12 21:32:35, thobrla wrote: > > If %H is >= 100 does that affect the maximum line length? > > Yes. Is that something usual with all those large uploads and downloads that our > clients make? I could change for 'Over 100 hours' if so (I don't think they need > a very precise estimate if we are talking about 100+ hours, right?). Over 100 hours is definitely a possibility, and I don't think "over 100 hours" is sufficient. A user might be able to live with a week-long operation, but not a month. I do think we can cap at 99 days, since at that point they probably should look for a way to parallelize across machines. https://codereview.appspot.com/305770043/diff/1/gslib/command.py#newcode1766 gslib/command.py:1766: self.individual_file_progress = [] On 2016/07/13 22:54:08, michelzel wrote: > On 2016/07/12 21:32:34, thobrla wrote: > > Why is this a list instead of a dict keyed by filename? It seems like lookup > > could be expensive in the case that there are many files progressing, because > > list_variable[n] is O(n). This would also simplify lookup, i.e., I don't see > > that you would need a separate index variable. > > Wait, would it be O(n) even with the individual_file_index dict? Regardless, it > does make sense to use the filename as the key. Sorry, I was incorrect because Python lists are array-backed. Still, it seems like we will want to delete sequences of elements from the list to prevent unbounded memory growth, and that will be O(n) - see https://wiki.python.org/moin/TimeComplexity https://codereview.appspot.com/305770043/diff/1/gslib/command.py#newcode1768 gslib/command.py:1768: # Enum for individual_file_progress On 2016/07/13 22:54:08, michelzel wrote: > On 2016/07/12 21:32:34, thobrla wrote: > > Why not instead define a namedtuple so that you can reference these values via > > names instead of index-based lookup? > > Because we ideally want to edit those values, I would have to create new > namedtuple everytime we change something. Do you think it is still worth the > change? > > Since this would involve creating a copy of our dictionary every time we update > a key, I decided to keep it this way for now. You're correct about namedtuple because that is immutable. But what about a container class? You could retrieve that from the dictionary and set individual members. https://codereview.appspot.com/305770043/diff/1/gslib/command.py#newcode1797 gslib/command.py:1797: stream: stream to print messages. Here only for SeekAheadThread On 2016/07/13 22:54:09, michelzel wrote: > On 2016/07/12 21:32:34, thobrla wrote: > > Given that we're providing time-remaining estimates I don't think it's > necessary > > for the SeekAheadThread to print this message anymore. > > I don't know about that. It seems a good way of enforcing that this should take > a while. Furthermore (and I'm not sure whether or not this is in fact a good > reason), a lot of tests depend on this message to succeed. Since it would not > hurt to keep it, but it would certainly cause some headache to remove it, I > decided for the former. Does that make sense or am I being too lazy? I guess the question is whether it provides value to the user or not. Having a UI element just to make our testing easier doesn't make sense. https://codereview.appspot.com/305770043/diff/1/gslib/command.py#newcode1816 gslib/command.py:1816: status_item.message_type < FileMessage.COMPONENT_TO_UPLOAD and On 2016/07/13 22:54:09, michelzel wrote: > On 2016/07/12 21:32:35, thobrla wrote: > > Using a < operator on a enum seems very brittle. Here and below, can we call > > out explicitly what types we expect this to be (or not to be)? > > We could, but there are 6 different types of messages describing files (although > we seem to be currently treating them all equally). Should I just list them > regardless? > > As an alternative, I could maybe create a comment on FileMessage informing that > all File-like values must come before components, and the first component-like > value should be FileMessage.COMPONENT_TO_UPLOAD I think this kind of programming is very brittle and unreadable. What is this part of the statement actually asserting? That it's a whole file as opposed to a component? That it's a download? It's not clear at all when you read it. If this part of the if-statement is only designed to handle certain types of FileMessages, we should either make those their own class or be explicit about what types we handle. https://codereview.appspot.com/305770043/diff/1/gslib/command.py#newcode1834 gslib/command.py:1834: [file_name, 0, {}, status_item.size]) On 2016/07/13 22:54:08, michelzel wrote: > On 2016/07/12 21:32:35, thobrla wrote: > > What are the types of each of these values? Do they all need to be mutable? > > I described them (and I updated this description to make it clearer) when > declaring self.individual_file_progress. Do I also need to put it here? I think you have a good start although per my other comment you probably want a class to store this state. https://codereview.appspot.com/305770043/diff/1/gslib/command.py#newcode1837 gslib/command.py:1837: self.total_size += status_item.size On 2016/07/13 22:54:08, michelzel wrote: > On 2016/07/12 21:32:35, thobrla wrote: > > Once the producer thread finishes, would we want to adjust the number from the > > seek-ahead thread in case there is any discrepancy? For example, if during a > > long-running operation the user adds files, our final number should represent > > what was actually transferred. > > I'm not sure I understand. If I get what you are saying (one example would be > trying to upload all the files of a local folder to a bucket, and during the > operation adding more files to this folder, right?), would our operation > actually upload this newly-added file? If so, I'm having a hard time imagining > what we could in that case. > I see two possible scenarios: trying to store on seek_ahead_thread the name of > the files that we passed through (and we would have to add this argument to the > SeekAheadMessage), or updating total_size as we get new files. I think describing the actual total is better. The ProducerThread will operate on its own list results, and that will reflect what we actually transfer. My suggestion is once the ProducerThread finishes enumerating all tasks, we use that as the final value instead of the SeekAheadThread since the source could have changed in between those two times and the SeekAheadThread is there for estimation only. https://codereview.appspot.com/305770043/diff/1/gslib/command.py#newcode1847 gslib/command.py:1847: file_progress = self.individual_file_progress[file_index] On 2016/07/13 22:54:09, michelzel wrote: > On 2016/07/12 21:32:35, thobrla wrote: > > It seems like when the file is finished, we should clean up/delete its entries > > or else index/progress will grow arbitrarily large. > > It will grow up to the number of objects, right? I could maybe create another > compact dict or list of finished objects and erase all the component-specific > information from those files. Yes, and there will potentially be millions or even billions of objects. It doesn't seem right that this should take up multiple megabytes of memory if we're only using 20 threads. https://codereview.appspot.com/305770043/diff/1/gslib/command.py#newcode1858 gslib/command.py:1858: status_item.message_type == FileMessage.EXISTING_COMPONENT and On 2016/07/13 22:54:09, michelzel wrote: > On 2016/07/12 21:32:36, thobrla wrote: > > I think the implication here is that an existing component could also be a > > non-finalized component that is being resumed? I think this is different than > > the notion of an "existing component" in copy_helper, which implies that the > > component is fully complete. We should clarify that in the FileMessage. > > That is actually the same thing as in copy_helper. The thing is that we create > start and end messages for this component, even though it is already completed > by the time it starts. Thus, when we receive the start message for an > EXISTING_COMPONENT, which indicates a fully complete component, we can already > process all the information. An EXISTING_COMPONENT message with finished==True > will just be ignored, as we have already processed this component. We do send > this message because when deleting all the components, > _DoParallelCompositeUpload gathers existing_components and uploaded_components > (https://github.com/GoogleCloudPlatform/gsutil/blob/master/gslib/copy_helper.p...) I see. This makes sense, but maybe a comment in FileMessage on EXISTING_COMPONENT enum about the expected start/finish behavior. https://codereview.appspot.com/305770043/diff/1/gslib/command.py#newcode1947 gslib/command.py:1947: """Prints progress. On 2016/07/13 22:54:08, michelzel wrote: > On 2016/07/12 21:32:36, thobrla wrote: > > Describe the conditions under which progress is printed. It also looks like > > this updates class state, which implies this function does a lot more than > just > > printing. Would it be worth separating the state and the printing? > > Do you think so? I'm not so sure about this one. The only class variables we > update are self.time_remaining (which, as a matter of fact, is only used here, > so we do not have to declare as a class variable), self.refresh_time and > self.current_char. I really don't see a need to separate this method into two, > but let me know if you think otherwise. I don't think you necessarily need to separate it, but you should update the name the function and document its behavior outside of printing, particularly if any other functions rely on state set by this function. https://codereview.appspot.com/305770043/diff/1/gslib/command.py#newcode1971 gslib/command.py:1971: percentage = 'N/A' On 2016/07/13 22:54:09, michelzel wrote: > On 2016/07/12 21:32:34, thobrla wrote: > > Does it make sense to print 'N/A% Done'? > > Is it better to just not show anything, or maybe 'Could not calculate progress > percentage?' If we don't have the ability to make an estimate, I don't see the point of displaying one. https://codereview.appspot.com/305770043/diff/1/gslib/command.py#newcode1983 gslib/command.py:1983: 's, ETA ' + _PrettyTime(self.time_remaining)) On 2016/07/13 22:54:09, michelzel wrote: > On 2016/07/12 21:32:35, thobrla wrote: > > What does this print if time_remaining is 'Undefined'? It seems like we have > > are inferring that it is defined from other variables, which would make it > easy > > to author a bug. > > I didn't quite understand what you meant by "It seems like we have > are inferring that it is defined from other variables", but _PrettyTime returns > what it receives if it is a string. I do have to adjust the documentation to > include that. Does that seem enough? My point is if time_remaining is 'undefined', do we want to print "undefined"? That doesn't seem helpful. Does PrettyTime expect the string 'Undefined' specially as an input? Because it looks to me like it expects "a string/None value". It seems like we either have an estimate or we don't, and in the case where we don't, we shouldn't even call PrettyTime. https://codereview.appspot.com/305770043/diff/1/gslib/command.py#newcode2067 gslib/command.py:2067: self.logger = logger if logger else CreateGsutilLogger('UI') On 2016/07/13 22:54:09, michelzel wrote: > On 2016/07/12 21:32:34, thobrla wrote: > > Is this used anymore? > > No, but since we talked about having a logger mode later on, I thought it > wouldn't hurt to keep it. Do you think we should remove for now? In general, avoid adding code unless you need it now. https://codereview.appspot.com/305770043/diff/1/gslib/command.py#newcode2080 gslib/command.py:2080: # Item from MainThread to indicate we are done On 2016/07/13 22:54:09, michelzel wrote: > On 2016/07/12 21:32:35, thobrla wrote: > > ProducerThread, right? > > Is it? The portion of code that puts it into the status_queue is on Apply: > if is_main_thread: > glob_status_queue.put(ZERO_TASKS_TO_DO_ARGUMENT) > Acknowledged. https://codereview.appspot.com/305770043/diff/1/gslib/commands/rewrite.py File gslib/commands/rewrite.py (right): https://codereview.appspot.com/305770043/diff/1/gslib/commands/rewrite.py#new... gslib/commands/rewrite.py:385: (fixed) position. On 2016/07/13 22:54:10, michelzel wrote: > On 2016/07/12 21:32:36, thobrla wrote: > > Why do we want these in a fixed position as opposed to outputting \n? Aren't > we > > managing these via the UI thread? > > I think we should definitely output \n in this case, but those aren't treated by > the UIThread directly. This is actually similar to keeping the 'Estimated number > of objects ...' message for the UIThread: I kept it because it did not seem to > affect us, and tests would fail otherwise. > I will update the description here. I see - these are data operations so I think we need to handle them in the UI thread. However, this can be done in a future CL. Can you add a TODO? https://codereview.appspot.com/305770043/diff/1/gslib/copy_helper.py File gslib/copy_helper.py (right): https://codereview.appspot.com/305770043/diff/1/gslib/copy_helper.py#newcode1093 gslib/copy_helper.py:1093: for component in existing_objects_to_delete: On 2016/07/13 22:54:10, michelzel wrote: > On 2016/07/12 21:32:36, thobrla wrote: > > We're posting finish messages for existing objects to delete, but Apply is > > deleting the newly-uploaded components as well. Why do we only cover existing > > ones here? > > I'm not sure I understand what you are saying. We have basically 3 types of > messages here: COMPONENT_TO_UPLOAD, EXISTING_COMPONENT and > EXISTING_OBJECT_TO_DELETE. The start messages are all created above (separated > above). After that, we gather the first two types on the 'components' list, > which we use to post finish messages (right above). After that, we post finish > messages for those of the EXISTING_OBJECT_TO_DELETE type. Does that make sense? Thanks, that makes sense. https://codereview.appspot.com/305770043/diff/1/gslib/copy_helper.py#newcode2725 gslib/copy_helper.py:2725: size=src_obj_metadata.size, On 2016/07/13 22:54:10, michelzel wrote: > On 2016/07/12 21:32:36, thobrla wrote: > > This will blow up if src_obj_metadata is None, yes? We have access to > > src_obj_size in the caller, so it seems like we could pass that. > > Yes, this is potentially the same problem as in _UploadFileToObject Needs to be fixed, then. https://codereview.appspot.com/305770043/diff/1/gslib/copy_helper.py#newcode3435 gslib/copy_helper.py:3435: uploaded_components.append([url, dst_arg.file_length]) On 2016/07/13 22:54:10, michelzel wrote: > On 2016/07/12 21:32:36, thobrla wrote: > > If we're not modifying these elements, use a tuple instead of a list. > > Does that offer a gain in performance, or is it just a way to show the elements > will be immutable? Either way, changed it. It's to show the elements will be immutable. https://codereview.appspot.com/305770043/diff/20001/gslib/command.py File gslib/command.py (right): https://codereview.appspot.com/305770043/diff/20001/gslib/command.py#newcode34 gslib/command.py:34: import random This seems unused? https://codereview.appspot.com/305770043/diff/20001/gslib/command.py#newcode558 gslib/command.py:558: self.logger, MainThreadUIQueue(sys.stderr, ui_controller, Whitespace is off here - please run lint on every changed file before submitting a patch. There are several whitespace and other lint errors in the other files in this patch. https://codereview.appspot.com/305770043/diff/20001/gslib/copy_helper.py File gslib/copy_helper.py (right): https://codereview.appspot.com/305770043/diff/20001/gslib/copy_helper.py#newc... gslib/copy_helper.py:2193: size = component.end_byte - component.start_byte+1 Space between operators. https://codereview.appspot.com/305770043/diff/20001/gslib/copy_helper.py#newc... gslib/copy_helper.py:2195: ReadOrCreateDownloadTrackerFile(src_obj_metadata, dst_url, logger, Why do we need to read/create a tracker file here when it's already done in _MaintainSlicedDownloadTrackerFiles? Is this just to get download_start_byte? It seems dangerous to potentially create this file here. https://codereview.appspot.com/305770043/diff/20001/gslib/copy_helper.py#newc... gslib/copy_helper.py:2202: bytes_already_downloaded= Style: if you can't fit a keyword arg on a line, use parenthesis or assign it to a variable prior to making the call. https://codereview.appspot.com/305770043/diff/20001/gslib/tests/test_ui.py File gslib/tests/test_ui.py (right): https://codereview.appspot.com/305770043/diff/20001/gslib/tests/test_ui.py#ne... gslib/tests/test_ui.py:2: # Copyright 2013 Google Inc. All Rights Reserved. 2016 https://codereview.appspot.com/305770043/diff/20001/gslib/tests/test_ui.py#ne... gslib/tests/test_ui.py:15: # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS Grab the apache license and use that (ex. command.py) https://codereview.appspot.com/305770043/diff/20001/gslib/tests/test_ui.py#ne... gslib/tests/test_ui.py:51: ZERO_TASKS_TO_DO_ARGUMENT = ('There were no', 'tasks to do') Duplication from command - move it to a utility file (maybe parallelism_framework_util)? https://codereview.appspot.com/305770043/diff/20001/gslib/tests/test_ui.py#ne... gslib/tests/test_ui.py:54: """Tests for ui functions.""" Capitalize acronyms like UI. https://codereview.appspot.com/305770043/diff/20001/gslib/tests/test_ui.py#ne... gslib/tests/test_ui.py:59: def test_EmptyList(self): Make this a unit test since it doesn't interact with buckets/objects/commands https://codereview.appspot.com/305770043/diff/20001/gslib/tests/test_ui.py#ne... gslib/tests/test_ui.py:71: def test_SeekAheadMessage(self): Make this a unit test since it doesn't interact with buckets/objects/commands https://codereview.appspot.com/305770043/diff/20001/gslib/tests/test_ui.py#ne... gslib/tests/test_ui.py:110: def test_UIUtilityFunctions(self): One test case for each function. Also should be unit tests https://codereview.appspot.com/305770043/diff/20001/gslib/tests/test_ui.py#ne... gslib/tests/test_ui.py:123: num_objects = random.randint(3, 12) Avoid using random numbers in tests as it makes them harder to debug and reproduce issues https://codereview.appspot.com/305770043/diff/20001/gslib/tests/test_ui.py#ne... gslib/tests/test_ui.py:146: def test_DownloadMutlipleObjects(self): Duplicate test https://codereview.appspot.com/305770043/diff/20001/gslib/tests/test_ui.py#ne... gslib/tests/test_ui.py:170: def test_UploadMutlipleObjects(self): This looks like it's only testing a single object? Please avoid submitting incomplete code in CLs. https://codereview.appspot.com/305770043/diff/20001/gslib/thread_message.py File gslib/thread_message.py (right): https://codereview.appspot.com/305770043/diff/20001/gslib/thread_message.py#n... gslib/thread_message.py:75: bytes_transferred: Specific field for resuming downloads. At the end, it "At the end" doesn't add meaning https://codereview.appspot.com/305770043/diff/20001/gslib/thread_message.py#n... gslib/thread_message.py:77: bytes_already_downloaded: Specific field for resuming downloads. When If we know bytes_transferred and total size, can't we infer this value? https://codereview.appspot.com/305770043/diff/20001/gslib/thread_message.py#n... gslib/thread_message.py:78: starting a component download, it tells how many bytes were already s/tells how/tells how/ https://codereview.appspot.com/305770043/diff/20001/gslib/ui_controller.py File gslib/ui_controller.py (right): https://codereview.appspot.com/305770043/diff/20001/gslib/ui_controller.py#ne... gslib/ui_controller.py:2: # Copyright 2010 Google Inc. All Rights Reserved. 2016 https://codereview.appspot.com/305770043/diff/20001/gslib/ui_controller.py#ne... gslib/ui_controller.py:17: """ This can be moved to one line with the comment. Also, this class now contains the UI thread as well, yes? https://codereview.appspot.com/305770043/diff/20001/gslib/ui_controller.py#ne... gslib/ui_controller.py:23: import codecs Seems like there are many unused imports here (lint should detect this). https://codereview.appspot.com/305770043/diff/20001/gslib/ui_controller.py#ne... gslib/ui_controller.py:60: time remaining. Finally, it actually displays the messages to the UI. It provides methods for displaying messages to the UI, right? In other words, it doesn't display them automatically. https://codereview.appspot.com/305770043/diff/20001/gslib/ui_controller.py#ne... gslib/ui_controller.py:78: # Minimum period for refreshing information What is the difference between refreshing information and refreshing status? Please be more precise and descriptive in these comments. https://codereview.appspot.com/305770043/diff/20001/gslib/ui_controller.py#ne... gslib/ui_controller.py:84: # Minimum waiting time before actually throughput info. Sentence doesn't parse. Did you mean "actually displaying"? https://codereview.appspot.com/305770043/diff/20001/gslib/ui_controller.py#ne... gslib/ui_controller.py:95: self.existing_progress = 0 How is this different than old_sum? https://codereview.appspot.com/305770043/diff/20001/gslib/ui_controller.py#ne... gslib/ui_controller.py:114: self.num_iter = 0 What does this track? https://codereview.appspot.com/305770043/diff/20001/gslib/ui_controller.py#ne... gslib/ui_controller.py:118: self.throughput_time = self.refresh_time This statement is duplicated. https://codereview.appspot.com/305770043/diff/20001/gslib/ui_controller.py#ne... gslib/ui_controller.py:125: self.last_progress_time = 0 Last time we reported progress, or received a progress message? https://codereview.appspot.com/305770043/diff/20001/gslib/ui_controller.py#ne... gslib/ui_controller.py:126: self.file_report_change = False What's the semantic of this variable? It's possible this is described in the overall algorithm instead of here, but it seems to trigger updates not based on time. https://codereview.appspot.com/305770043/diff/20001/gslib/ui_controller.py#ne... gslib/ui_controller.py:128: def Call(self, status_message, stream, cur_time): Should this just be called ProcessMessage? It also optionally prints, so that should be documented. https://codereview.appspot.com/305770043/diff/20001/gslib/ui_controller.py#ne... gslib/ui_controller.py:132: status_message: Message to be processed. This could be None according to the UIThread? https://codereview.appspot.com/305770043/diff/20001/gslib/ui_controller.py#ne... gslib/ui_controller.py:153: if isinstance(status_message, SeekAheadMessage): This function is quite long. Do you think it would be more readable to break up the cases into individual functions? _HandleNewFileStarted() _HandleFileFinished() _HandleComponentStarted() etc. https://codereview.appspot.com/305770043/diff/20001/gslib/ui_controller.py#ne... gslib/ui_controller.py:161: self.total_size = status_message.size Do we want to set this even if (not status_message.size) ? https://codereview.appspot.com/305770043/diff/20001/gslib/ui_controller.py#ne... gslib/ui_controller.py:165: elif isinstance(status_message, FileMessage) and ( We repeat the "is FileMessage" check a lot - could we use nested ifs? https://codereview.appspot.com/305770043/diff/20001/gslib/ui_controller.py#ne... gslib/ui_controller.py:186: # This guarantees the file has not been counted on SeekAheadThread This isn't necessarily true - it just means the SeekAheadThread isn't done processing yet. https://codereview.appspot.com/305770043/diff/20001/gslib/ui_controller.py#ne... gslib/ui_controller.py:252: (status_message.message_type == FileMessage.COMPONENT_TO_UPLOAD or We already checked the message type above, so could we use a nested if-statement that just checks finished/not finished? https://codereview.appspot.com/305770043/diff/20001/gslib/ui_controller.py#ne... gslib/ui_controller.py:280: # Some progresses are being called without a proper FileMessage. Still need to fix this (bug) https://codereview.appspot.com/305770043/diff/20001/gslib/ui_controller.py#ne... gslib/ui_controller.py:300: """Updates throughput if the required period for calculation has passed. I think it would be useful to document the top-level algorithm used to calculate throughput somewhere as well as the variables that can be adjusted to control it. https://codereview.appspot.com/305770043/diff/20001/gslib/ui_controller.py#ne... gslib/ui_controller.py:311: self.throughput = (self.throughput * (cur_time - self.throughput_time) / Why do we assign the same variable twice in a row? Does one indicate a different, temporary value? https://codereview.appspot.com/305770043/diff/20001/gslib/ui_controller.py#ne... gslib/ui_controller.py:314: self.old_sum = self.new_progress Nit: consistent naming - pick "sum" or "progress" https://codereview.appspot.com/305770043/diff/20001/gslib/ui_controller.py#ne... gslib/ui_controller.py:318: """Prints progress. This doesn't always print progress, so the name is misleading. Could the code that updates this be separated so that you get something more readable like: if _ShouldPrintProgress(): _PrintProgress() https://codereview.appspot.com/305770043/diff/20001/gslib/ui_controller.py#ne... gslib/ui_controller.py:346: string_to_print = ( Might be useful to include a sample print in comments here, or in the function docs. https://codereview.appspot.com/305770043/diff/20001/gslib/ui_controller.py#ne... gslib/ui_controller.py:379: Presently, this class sends the messages it receives to UIController, which Drop "Presently, "? It implies you intend to change this behavior later. https://codereview.appspot.com/305770043/diff/20001/gslib/ui_controller.py#ne... gslib/ui_controller.py:400: if status_message == ZERO_TASKS_TO_DO_ARGUMENT: This is only used in _ParallelApply, so the MainThreadUIQueue should never receive this message. https://codereview.appspot.com/305770043/diff/20001/gslib/ui_controller.py#ne... gslib/ui_controller.py:444: except: We should only handle Queue.Empty here, yes? Other exceptions would need to propagate up. https://codereview.appspot.com/305770043/diff/20001/gslib/util.py File gslib/util.py (right): https://codereview.appspot.com/305770043/diff/20001/gslib/util.py#newcode728 gslib/util.py:728: shortened string version for this number. Document the limitations and the expected "shortness". In general, I think it makes sense to re-use _EXP_STRINGS here, instead of defining your own powers of 10. https://codereview.appspot.com/305770043/diff/20001/gslib/util.py#newcode748 gslib/util.py:748: seconds remaining. Trailing whitespace here and below, please run lint https://codereview.appspot.com/305770043/diff/20001/gslib/util.py#newcode762 gslib/util.py:762: hours = remaining_time//3600 Why use floor division here and below (// instead of /)? Will remaining time ever be negative? https://codereview.appspot.com/305770043/diff/20001/gslib/util.py#newcode776 gslib/util.py:776: String representing a readable format for number with two trailing Isn't this up to two trailing zeroes, as opposed to exactly two? Is the goal to pad the number to an exact string size, or just to add trailing zeroes and make no guarantees about the size value being returned? In either case, would it make more sense to accept a length / number of zeroes as an argument instead of making this function special-purpose? https://codereview.appspot.com/305770043/diff/20001/gslib/util.py#newcode782 gslib/util.py:782: if num%100 == 0: Space between operators if you keep this approach. https://codereview.appspot.com/305770043/diff/20001/gslib/util.py#newcode1528 gslib/util.py:1528: def CreateGsutilLogger(command_name): If you don't need this elsewhere, leave it in command.py for now.
Sign in to reply to this message.
I decided to only publish the comments before submitting the code because there are a couple of points I want to clarify first (I put a [Q] on the beginning of each comment that contains a question). I would also like to apologize for this previous round. It was taking a long time, and I tried to finish it as soon as possible so you could review it, but I didn't realize I had made such a poor job on many different places. It ended up wasting both of our times, and I am sincerely sorry. I will make sure to be highly confident with my code and comments from now on before sending it to review. In particular, the style issues will drop considerably: I had no idea I would commit that many mistakes in just a couple of days after running lint. I might get to work a bit later tomorrow as I just finished this now (it should had been a bit earlier, but I had to stop by at my apartment). Once I get the comments back from you, it should be relatively straightforward to submit the code (it is mostly lint-clean, I've changed a lot on the documentation and the tests are working!). Thanks and once again, sorry for the poor job! https://codereview.appspot.com/305770043/diff/20001/gslib/command.py File gslib/command.py (right): https://codereview.appspot.com/305770043/diff/20001/gslib/command.py#newcode34 gslib/command.py:34: import random On 2016/07/14 19:57:48, thobrla wrote: > This seems unused? Acknowledged. https://codereview.appspot.com/305770043/diff/20001/gslib/command.py#newcode558 gslib/command.py:558: self.logger, MainThreadUIQueue(sys.stderr, ui_controller, On 2016/07/14 19:57:48, thobrla wrote: > Whitespace is off here - please run lint on every changed file before submitting > a patch. There are several whitespace and other lint errors in the other files > in this patch. Done. As I told you, I had run lint shortly before submitting the CL. I had no idea I would make that many mistakes in such a short spam. I really apologize for it, from now on I will always deliver lint-clean code! https://codereview.appspot.com/305770043/diff/20001/gslib/copy_helper.py File gslib/copy_helper.py (right): https://codereview.appspot.com/305770043/diff/20001/gslib/copy_helper.py#newc... gslib/copy_helper.py:2193: size = component.end_byte - component.start_byte+1 On 2016/07/14 19:57:48, thobrla wrote: > Space between operators. [Q] Once again, sorry. Besides lint, I also checked occurrences of [^ ][op][^ ]for every operator in all files to make sure I am spacing operators. I also encountered a couple of places throughout the code that were not mine and had no spaces, and took the liberty of adjusting them. Should I also adjust in this case? What's the difference? _START_BYTES_PER_CALLBACK = 1024*256 _MAX_BYTES_PER_CALLBACK = 1024*1024*100 What about this? s3_key = self.CreateObject(bucket_uri=s3_bucket, contents='f'*1024*1024) https://codereview.appspot.com/305770043/diff/20001/gslib/copy_helper.py#newc... gslib/copy_helper.py:2195: ReadOrCreateDownloadTrackerFile(src_obj_metadata, dst_url, logger, On 2016/07/14 19:57:49, thobrla wrote: > Why do we need to read/create a tracker file here when it's already done in > _MaintainSlicedDownloadTrackerFiles? Is this just to get download_start_byte? > It seems dangerous to potentially create this file here. [Q] Agreed. Do you have any suggestions on how to do this? https://codereview.appspot.com/305770043/diff/20001/gslib/copy_helper.py#newc... gslib/copy_helper.py:2202: bytes_already_downloaded= On 2016/07/14 19:57:49, thobrla wrote: > Style: if you can't fit a keyword arg on a line, use parenthesis or assign it to > a variable prior to making the call. Got it, thanks. https://codereview.appspot.com/305770043/diff/20001/gslib/tests/test_ui.py File gslib/tests/test_ui.py (right): https://codereview.appspot.com/305770043/diff/20001/gslib/tests/test_ui.py#ne... gslib/tests/test_ui.py:2: # Copyright 2013 Google Inc. All Rights Reserved. On 2016/07/14 19:57:49, thobrla wrote: > 2016 Done. https://codereview.appspot.com/305770043/diff/20001/gslib/tests/test_ui.py#ne... gslib/tests/test_ui.py:15: # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS On 2016/07/14 19:57:49, thobrla wrote: > Grab the apache license and use that (ex. command.py) Done. https://codereview.appspot.com/305770043/diff/20001/gslib/tests/test_ui.py#ne... gslib/tests/test_ui.py:51: ZERO_TASKS_TO_DO_ARGUMENT = ('There were no', 'tasks to do') On 2016/07/14 19:57:49, thobrla wrote: > Duplication from command - move it to a utility file (maybe > parallelism_framework_util)? Done. https://codereview.appspot.com/305770043/diff/20001/gslib/tests/test_ui.py#ne... gslib/tests/test_ui.py:54: """Tests for ui functions.""" On 2016/07/14 19:57:49, thobrla wrote: > Capitalize acronyms like UI. [Q] What about on TestUi? Should I leave it like this or use TestUI? https://codereview.appspot.com/305770043/diff/20001/gslib/tests/test_ui.py#ne... gslib/tests/test_ui.py:59: def test_EmptyList(self): On 2016/07/14 19:57:49, thobrla wrote: > Make this a unit test since it doesn't interact with buckets/objects/commands Done. https://codereview.appspot.com/305770043/diff/20001/gslib/tests/test_ui.py#ne... gslib/tests/test_ui.py:71: def test_SeekAheadMessage(self): On 2016/07/14 19:57:49, thobrla wrote: > Make this a unit test since it doesn't interact with buckets/objects/commands Done. https://codereview.appspot.com/305770043/diff/20001/gslib/tests/test_ui.py#ne... gslib/tests/test_ui.py:110: def test_UIUtilityFunctions(self): On 2016/07/14 19:57:49, thobrla wrote: > One test case for each function. Also should be unit tests Ok. I also put them on test_util since they are now there. https://codereview.appspot.com/305770043/diff/20001/gslib/tests/test_ui.py#ne... gslib/tests/test_ui.py:123: num_objects = random.randint(3, 12) On 2016/07/14 19:57:49, thobrla wrote: > Avoid using random numbers in tests as it makes them harder to debug and > reproduce issues Acknowledged. I will put a fixed number. https://codereview.appspot.com/305770043/diff/20001/gslib/tests/test_ui.py#ne... gslib/tests/test_ui.py:146: def test_DownloadMutlipleObjects(self): On 2016/07/14 19:57:49, thobrla wrote: > Duplicate test Done. https://codereview.appspot.com/305770043/diff/20001/gslib/tests/test_ui.py#ne... gslib/tests/test_ui.py:170: def test_UploadMutlipleObjects(self): On 2016/07/14 19:57:49, thobrla wrote: > This looks like it's only testing a single object? > > Please avoid submitting incomplete code in CLs. It is, I actually didn't realize it. Sorry for that. Given your advice, I will not be submitting the throughput tests this time, although they are already in progress. https://codereview.appspot.com/305770043/diff/20001/gslib/thread_message.py File gslib/thread_message.py (right): https://codereview.appspot.com/305770043/diff/20001/gslib/thread_message.py#n... gslib/thread_message.py:75: bytes_transferred: Specific field for resuming downloads. At the end, it On 2016/07/14 19:57:49, thobrla wrote: > "At the end" doesn't add meaning I meant "On end messages", I think it wasn't clear. https://codereview.appspot.com/305770043/diff/20001/gslib/thread_message.py#n... gslib/thread_message.py:77: bytes_already_downloaded: Specific field for resuming downloads. When On 2016/07/14 19:57:49, thobrla wrote: > If we know bytes_transferred and total size, can't we infer this value? Yes. Actually, I'm not using bytes_transferred in the UIController, just bytes_already_downloaded (which comes first), so I'll remove it. https://codereview.appspot.com/305770043/diff/20001/gslib/thread_message.py#n... gslib/thread_message.py:78: starting a component download, it tells how many bytes were already On 2016/07/14 19:57:49, thobrla wrote: > s/tells how/tells how/ Done. https://codereview.appspot.com/305770043/diff/20001/gslib/ui_controller.py File gslib/ui_controller.py (right): https://codereview.appspot.com/305770043/diff/20001/gslib/ui_controller.py#ne... gslib/ui_controller.py:2: # Copyright 2010 Google Inc. All Rights Reserved. On 2016/07/14 19:57:51, thobrla wrote: > 2016 Done. https://codereview.appspot.com/305770043/diff/20001/gslib/ui_controller.py#ne... gslib/ui_controller.py:17: """ On 2016/07/14 19:57:50, thobrla wrote: > This can be moved to one line with the comment. > > Also, this class now contains the UI thread as well, yes? [Q] I added to the description, which was just under one line, and I wasn't sure whether I should put the """ to finish the comment in this line and surpass the 80-character limit or in the following line. What is the procedure here? https://codereview.appspot.com/305770043/diff/20001/gslib/ui_controller.py#ne... gslib/ui_controller.py:23: import codecs On 2016/07/14 19:57:51, thobrla wrote: > Seems like there are many unused imports here (lint should detect this). Done. https://codereview.appspot.com/305770043/diff/20001/gslib/ui_controller.py#ne... gslib/ui_controller.py:60: time remaining. Finally, it actually displays the messages to the UI. On 2016/07/14 19:57:49, thobrla wrote: > It provides methods for displaying messages to the UI, right? In other words, > it doesn't display them automatically. Done. https://codereview.appspot.com/305770043/diff/20001/gslib/ui_controller.py#ne... gslib/ui_controller.py:78: # Minimum period for refreshing information On 2016/07/14 19:57:51, thobrla wrote: > What is the difference between refreshing information and refreshing status? > Please be more precise and descriptive in these comments. [Q] Status was a reference to the first character that keeps spinning to indicate we are running. I think I heard you using a specific term for it, but I can't seem to remember it. Do you know what it is? https://codereview.appspot.com/305770043/diff/20001/gslib/ui_controller.py#ne... gslib/ui_controller.py:84: # Minimum waiting time before actually throughput info. On 2016/07/14 19:57:50, thobrla wrote: > Sentence doesn't parse. Did you mean "actually displaying"? Yes, I saw that right after submitting, sorry. https://codereview.appspot.com/305770043/diff/20001/gslib/ui_controller.py#ne... gslib/ui_controller.py:95: self.existing_progress = 0 On 2016/07/14 19:57:50, thobrla wrote: > How is this different than old_sum? [Q] I think I should be clearer: existing_progress is specific to resuming operations (how many bytes were already transferred before this operation began?). Should I call it resuming_progress? pre_resuming_progress? I'm having a hard time to come up with many short yet clear names. https://codereview.appspot.com/305770043/diff/20001/gslib/ui_controller.py#ne... gslib/ui_controller.py:114: self.num_iter = 0 On 2016/07/14 19:57:49, thobrla wrote: > What does this track? This is remaining from when everything was in the UIThread, hadn't realized it. https://codereview.appspot.com/305770043/diff/20001/gslib/ui_controller.py#ne... gslib/ui_controller.py:118: self.throughput_time = self.refresh_time On 2016/07/14 19:57:50, thobrla wrote: > This statement is duplicated. Done. https://codereview.appspot.com/305770043/diff/20001/gslib/ui_controller.py#ne... gslib/ui_controller.py:125: self.last_progress_time = 0 On 2016/07/14 19:57:50, thobrla wrote: > Last time we reported progress, or received a progress message? It is for the progress message. I will call it last_progress_message_time. https://codereview.appspot.com/305770043/diff/20001/gslib/ui_controller.py#ne... gslib/ui_controller.py:126: self.file_report_change = False On 2016/07/14 19:57:50, thobrla wrote: > What's the semantic of this variable? It's possible this is described in the > overall algorithm instead of here, but it seems to trigger updates not based on > time. That's exactly it, I'll include a brief description. https://codereview.appspot.com/305770043/diff/20001/gslib/ui_controller.py#ne... gslib/ui_controller.py:128: def Call(self, status_message, stream, cur_time): On 2016/07/14 19:57:50, thobrla wrote: > Should this just be called ProcessMessage? It also optionally prints, so that > should be documented. I think ProcessMessage is best suited for this first method Call calls. I updated the description, and I'll give some thought if there's a better name. https://codereview.appspot.com/305770043/diff/20001/gslib/ui_controller.py#ne... gslib/ui_controller.py:132: status_message: Message to be processed. On 2016/07/14 19:57:49, thobrla wrote: > This could be None according to the UIThread? Yes, I hadn't thought about it here, thanks! https://codereview.appspot.com/305770043/diff/20001/gslib/ui_controller.py#ne... gslib/ui_controller.py:153: if isinstance(status_message, SeekAheadMessage): On 2016/07/14 19:57:51, thobrla wrote: > This function is quite long. Do you think it would be more readable to break up > the cases into individual functions? > > _HandleNewFileStarted() > _HandleFileFinished() > _HandleComponentStarted() > etc. I totally agree. It also makes some of the if conditions smaller. I'll see the best way to do it. https://codereview.appspot.com/305770043/diff/20001/gslib/ui_controller.py#ne... gslib/ui_controller.py:161: self.total_size = status_message.size On 2016/07/14 19:57:50, thobrla wrote: > Do we want to set this even if (not status_message.size) ? I don't think so, good catch. https://codereview.appspot.com/305770043/diff/20001/gslib/ui_controller.py#ne... gslib/ui_controller.py:165: elif isinstance(status_message, FileMessage) and ( On 2016/07/14 19:57:50, thobrla wrote: > We repeat the "is FileMessage" check a lot - could we use nested ifs? This gets way better with our new approach of dividing into methods. https://codereview.appspot.com/305770043/diff/20001/gslib/ui_controller.py#ne... gslib/ui_controller.py:186: # This guarantees the file has not been counted on SeekAheadThread On 2016/07/14 19:57:51, thobrla wrote: > This isn't necessarily true - it just means the SeekAheadThread isn't done > processing yet. [Q] Ok, but even if this happens, once the SeekAheadThread is done processing it should all be good, right? Are you saying I should just edit the comment? https://codereview.appspot.com/305770043/diff/20001/gslib/ui_controller.py#ne... gslib/ui_controller.py:252: (status_message.message_type == FileMessage.COMPONENT_TO_UPLOAD or On 2016/07/14 19:57:50, thobrla wrote: > We already checked the message type above, so could we use a nested if-statement > that just checks finished/not finished? This was made in this new approach. I believe it is much more clear now, thanks! https://codereview.appspot.com/305770043/diff/20001/gslib/ui_controller.py#ne... gslib/ui_controller.py:280: # Some progresses are being called without a proper FileMessage. On 2016/07/14 19:57:50, thobrla wrote: > Still need to fix this (bug) Done. https://codereview.appspot.com/305770043/diff/20001/gslib/ui_controller.py#ne... gslib/ui_controller.py:300: """Updates throughput if the required period for calculation has passed. On 2016/07/14 19:57:50, thobrla wrote: > I think it would be useful to document the top-level algorithm used to calculate > throughput somewhere as well as the variables that can be adjusted to control > it. Ok, I'll add that once I submit the new version. https://codereview.appspot.com/305770043/diff/20001/gslib/ui_controller.py#ne... gslib/ui_controller.py:311: self.throughput = (self.throughput * (cur_time - self.throughput_time) / On 2016/07/14 19:57:51, thobrla wrote: > Why do we assign the same variable twice in a row? Does one indicate a > different, temporary value? The first one is the underestimation. The second one takes into consideration the time between this measurement and the last observed progress. However, I just realized we are dividing and them multiplying by the same factor, which was not my smartest thought. I will just make sure the calculations are right, but it seems as if we can use only operation. https://codereview.appspot.com/305770043/diff/20001/gslib/ui_controller.py#ne... gslib/ui_controller.py:314: self.old_sum = self.new_progress On 2016/07/14 19:57:50, thobrla wrote: > Nit: consistent naming - pick "sum" or "progress" Done. https://codereview.appspot.com/305770043/diff/20001/gslib/ui_controller.py#ne... gslib/ui_controller.py:318: """Prints progress. On 2016/07/14 19:57:50, thobrla wrote: > This doesn't always print progress, so the name is misleading. Could the code > that updates this be separated so that you get something more readable like: > > if _ShouldPrintProgress(): > _PrintProgress() Sure thing! So I split this function between printing progress and the 'status' character (once again, I couldn't remember the term you used to refer to it). From Call(), we now have: if self._ShouldPrintProgress(cur_time): self._PrintProgress(stream) self.refresh_time = cur_time elif self._ShouldPrintStatus(cur_time): self._PrintStatus(stream) This is also good because now the time is only used by the boolean functions that check, whereas the actual printing methods only need the stream. https://codereview.appspot.com/305770043/diff/20001/gslib/ui_controller.py#ne... gslib/ui_controller.py:346: string_to_print = ( On 2016/07/14 19:57:50, thobrla wrote: > Might be useful to include a sample print in comments here, or in the function > docs. Ok, I'll add that once I submit the new version. https://codereview.appspot.com/305770043/diff/20001/gslib/ui_controller.py#ne... gslib/ui_controller.py:379: Presently, this class sends the messages it receives to UIController, which On 2016/07/14 19:57:50, thobrla wrote: > Drop "Presently, "? It implies you intend to change this behavior later. Changed here and in the UIThread https://codereview.appspot.com/305770043/diff/20001/gslib/ui_controller.py#ne... gslib/ui_controller.py:400: if status_message == ZERO_TASKS_TO_DO_ARGUMENT: On 2016/07/14 19:57:50, thobrla wrote: > This is only used in _ParallelApply, so the MainThreadUIQueue should never > receive this message. Oh ok, I hadn't realized that. https://codereview.appspot.com/305770043/diff/20001/gslib/ui_controller.py#ne... gslib/ui_controller.py:444: except: On 2016/07/14 19:57:50, thobrla wrote: > We should only handle Queue.Empty here, yes? Other exceptions would need to > propagate up. [Q] I think I understood what you said, but just to make sure, it is basically to use Except Queue.Empty rather than the broad Except, right? https://codereview.appspot.com/305770043/diff/20001/gslib/util.py File gslib/util.py (right): https://codereview.appspot.com/305770043/diff/20001/gslib/util.py#newcode728 gslib/util.py:728: shortened string version for this number. On 2016/07/14 19:57:51, thobrla wrote: > Document the limitations and the expected "shortness". In general, I think it > makes sense to re-use _EXP_STRINGS here, instead of defining your own powers of > 10. [Q] Isn't EXP_STRINGS created on base 1024? I think those are slightly different, right? I also realized the order I'm making this is suboptimal. The exponents should be in descending order. https://codereview.appspot.com/305770043/diff/20001/gslib/util.py#newcode748 gslib/util.py:748: seconds remaining. On 2016/07/14 19:57:51, thobrla wrote: > Trailing whitespace here and below, please run lint Done. https://codereview.appspot.com/305770043/diff/20001/gslib/util.py#newcode762 gslib/util.py:762: hours = remaining_time//3600 On 2016/07/14 19:57:51, thobrla wrote: > Why use floor division here and below (// instead of /)? Will remaining time > ever be negative? I mistakenly defined remaining_time as an integer, but it is actually a float. The idea was to transform it in an integer. I will just use int(remaining_time/60) then. https://codereview.appspot.com/305770043/diff/20001/gslib/util.py#newcode776 gslib/util.py:776: String representing a readable format for number with two trailing On 2016/07/14 19:57:51, thobrla wrote: > Isn't this up to two trailing zeroes, as opposed to exactly two? > > Is the goal to pad the number to an exact string size, or just to add trailing > zeroes and make no guarantees about the size value being returned? > > In either case, would it make more sense to accept a length / number of zeroes > as an argument instead of making this function special-purpose? The goal is to have a fixed number of decimal places (I was incorrectly saying fixed number of trailing zeroes, sorry) so we don't change our output much (it would change when the integer part goes from 9 to 10 or 99 to 100, for instance. I prefer doing this than going from 9.999 to 10.00 or 99.99 to 100.0, which would keep the total number of characters the same, but does not sound very natural to me). I do agree we should receive the number of decimal places as a parameter, I'll do that. However, we must keep in mind that, as a user of MakeHumanReadable, this method works best with 2 or less decimal places (since that's what we get from MakeHumanReadable) https://codereview.appspot.com/305770043/diff/20001/gslib/util.py#newcode782 gslib/util.py:782: if num%100 == 0: On 2016/07/14 19:57:51, thobrla wrote: > Space between operators if you keep this approach. Acknowledged. https://codereview.appspot.com/305770043/diff/20001/gslib/util.py#newcode1528 gslib/util.py:1528: def CreateGsutilLogger(command_name): On 2016/07/14 19:57:51, thobrla wrote: > If you don't need this elsewhere, leave it in command.py for now. I use it in the UIThread in case we don't receive a logger (although we are not quite using logger right now).
Sign in to reply to this message.
https://codereview.appspot.com/305770043/diff/20001/gslib/copy_helper.py File gslib/copy_helper.py (right): https://codereview.appspot.com/305770043/diff/20001/gslib/copy_helper.py#newc... gslib/copy_helper.py:2193: size = component.end_byte - component.start_byte+1 On 2016/07/15 09:44:31, michelzel wrote: > On 2016/07/14 19:57:48, thobrla wrote: > > Space between operators. > > [Q] Once again, sorry. Besides lint, I also checked occurrences of [^ ][op][^ > ]for every operator in all files to make sure I am spacing operators. I also > encountered a couple of places throughout the code that were not mine and had no > spaces, and took the liberty of adjusting them. > > Should I also adjust in this case? What's the difference? > _START_BYTES_PER_CALLBACK = 1024*256 > _MAX_BYTES_PER_CALLBACK = 1024*1024*100 > What about this? > s3_key = self.CreateObject(bucket_uri=s3_bucket, contents='f'*1024*1024) > Done. https://codereview.appspot.com/305770043/diff/20001/gslib/copy_helper.py#newc... gslib/copy_helper.py:2195: ReadOrCreateDownloadTrackerFile(src_obj_metadata, dst_url, logger, On 2016/07/15 09:44:31, michelzel wrote: > On 2016/07/14 19:57:49, thobrla wrote: > > Why do we need to read/create a tracker file here when it's already done in > > _MaintainSlicedDownloadTrackerFiles? Is this just to get download_start_byte? > > > It seems dangerous to potentially create this file here. > > [Q] Agreed. Do you have any suggestions on how to do this? Ok, so I found out that the Apply works is that it ultimately calls ReadOrCreateDownloadTrackerFile. I create a GetStartDownloadByte function that works pretty much the same way as the former, but it just returns the start download byte and does not create a tracker file if there is no existing tracker file. https://codereview.appspot.com/305770043/diff/20001/gslib/tests/test_ui.py File gslib/tests/test_ui.py (right): https://codereview.appspot.com/305770043/diff/20001/gslib/tests/test_ui.py#ne... gslib/tests/test_ui.py:54: """Tests for ui functions.""" On 2016/07/15 09:44:31, michelzel wrote: > On 2016/07/14 19:57:49, thobrla wrote: > > Capitalize acronyms like UI. > > [Q] What about on TestUi? Should I leave it like this or use TestUI? Done. https://codereview.appspot.com/305770043/diff/20001/gslib/ui_controller.py File gslib/ui_controller.py (right): https://codereview.appspot.com/305770043/diff/20001/gslib/ui_controller.py#ne... gslib/ui_controller.py:17: """ On 2016/07/15 09:44:33, michelzel wrote: > On 2016/07/14 19:57:50, thobrla wrote: > > This can be moved to one line with the comment. > > > > Also, this class now contains the UI thread as well, yes? > > [Q] I added to the description, which was just under one line, and I wasn't sure > whether I should put the """ to finish the comment in this line and surpass the > 80-character limit or in the following line. What is the procedure here? Done. https://codereview.appspot.com/305770043/diff/20001/gslib/ui_controller.py#ne... gslib/ui_controller.py:78: # Minimum period for refreshing information On 2016/07/15 09:44:33, michelzel wrote: > On 2016/07/14 19:57:51, thobrla wrote: > > What is the difference between refreshing information and refreshing status? > > Please be more precise and descriptive in these comments. > > [Q] Status was a reference to the first character that keeps spinning to > indicate we are running. I think I heard you using a specific term for it, but I > can't seem to remember it. Do you know what it is? Spinner. Done. https://codereview.appspot.com/305770043/diff/20001/gslib/ui_controller.py#ne... gslib/ui_controller.py:95: self.existing_progress = 0 On 2016/07/15 09:44:33, michelzel wrote: > On 2016/07/14 19:57:50, thobrla wrote: > > How is this different than old_sum? > > [Q] I think I should be clearer: existing_progress is specific to resuming > operations (how many bytes were already transferred before this operation > began?). Should I call it resuming_progress? pre_resuming_progress? I'm having a > hard time to come up with many short yet clear names. Done. https://codereview.appspot.com/305770043/diff/20001/gslib/ui_controller.py#ne... gslib/ui_controller.py:186: # This guarantees the file has not been counted on SeekAheadThread On 2016/07/15 09:44:32, michelzel wrote: > On 2016/07/14 19:57:51, thobrla wrote: > > This isn't necessarily true - it just means the SeekAheadThread isn't done > > processing yet. > > [Q] Ok, but even if this happens, once the SeekAheadThread is done processing it > should all be good, right? Are you saying I should just edit the comment? Added a ProducerThreadMessage, that sends updates for each new 1000 files processed. Time remaining and throughput are now calculated only when estimation from either ProducerThread or SeekAheadThread has come. https://codereview.appspot.com/305770043/diff/20001/gslib/ui_controller.py#ne... gslib/ui_controller.py:444: except: On 2016/07/15 09:44:32, michelzel wrote: > On 2016/07/14 19:57:50, thobrla wrote: > > We should only handle Queue.Empty here, yes? Other exceptions would need to > > propagate up. > > [Q] I think I understood what you said, but just to make sure, it is basically > to use Except Queue.Empty rather than the broad Except, right? Done. https://codereview.appspot.com/305770043/diff/20001/gslib/util.py File gslib/util.py (right): https://codereview.appspot.com/305770043/diff/20001/gslib/util.py#newcode728 gslib/util.py:728: shortened string version for this number. On 2016/07/15 09:44:33, michelzel wrote: > On 2016/07/14 19:57:51, thobrla wrote: > > Document the limitations and the expected "shortness". In general, I think it > > makes sense to re-use _EXP_STRINGS here, instead of defining your own powers > of > > 10. > > [Q] Isn't EXP_STRINGS created on base 1024? I think those are slightly > different, right? > > I also realized the order I'm making this is suboptimal. The exponents should be > in descending order. Created EXP_TEN_STRINGS to use in DecimalShort. Done.
Sign in to reply to this message.
https://codereview.appspot.com/305770043/diff/20001/gslib/util.py File gslib/util.py (right): https://codereview.appspot.com/305770043/diff/20001/gslib/util.py#newcode1528 gslib/util.py:1528: def CreateGsutilLogger(command_name): On 2016/07/15 09:44:33, michelzel wrote: > On 2016/07/14 19:57:51, thobrla wrote: > > If you don't need this elsewhere, leave it in command.py for now. > > I use it in the UIThread in case we don't receive a logger (although we are not > quite using logger right now). If we're not using it, we should save it for the CL when we do use it. https://codereview.appspot.com/305770043/diff/40001/gslib/ui_controller.py File gslib/ui_controller.py (right): https://codereview.appspot.com/305770043/diff/40001/gslib/ui_controller.py#ne... gslib/ui_controller.py:56: update_time_period: minimum period for refreshing information. Here and below, what's the behavior of 0 values? https://codereview.appspot.com/305770043/diff/60001/gslib/tests/test_ui.py File gslib/tests/test_ui.py (right): https://codereview.appspot.com/305770043/diff/60001/gslib/tests/test_ui.py#ne... gslib/tests/test_ui.py:49: download_size = 30000 How did you choose these size values? Are they related to any other constants? If so, base them on these constants. If not, minimize the amount of data the tests transfer. https://codereview.appspot.com/305770043/diff/60001/gslib/tests/test_ui.py#ne... gslib/tests/test_ui.py:107: fpath = self.CreateTempFile(file_name='looks-zipped.gz', Are we testing something specific about gzip? https://codereview.appspot.com/305770043/diff/60001/gslib/tests/test_ui.py#ne... gslib/tests/test_ui.py:275: upload_size = 40000 This is the same as the previous test, so should exist at the module level. https://codereview.appspot.com/305770043/diff/60001/gslib/tests/test_ui.py#ne... gslib/tests/test_ui.py:286: start_time = time.time() Don't use current time in unit tests - pick a fixed value. https://codereview.appspot.com/305770043/diff/60001/gslib/tests/test_ui.py#ne... gslib/tests/test_ui.py:288: # Adds a file. Because this message was already theoretically processed Should this comment go with the code below? The newline makes it appear that it goes with the SeekAheadMessage. https://codereview.appspot.com/305770043/diff/60001/gslib/tests/test_ui.py#ne... gslib/tests/test_ui.py:292: fpath = self.CreateTempFile(file_name='looks-zipped.gz', We're just using the file path, yes? Do we actually need to write the contents? https://codereview.appspot.com/305770043/diff/60001/gslib/tests/test_ui.py#ne... gslib/tests/test_ui.py:304: ui_thread.join(self.thread_wait_time) Here and elsewhere, call isAlive() to ensure the thread didn't time out? https://docs.python.org/2/library/threading.html May be useful to write a function like "JoinThreadAndRaiseOnTimeout" https://codereview.appspot.com/305770043/diff/60001/gslib/tests/test_ui.py#ne... gslib/tests/test_ui.py:371: Add tests for resumption and existing components? https://codereview.appspot.com/305770043/diff/60001/gslib/tests/test_ui.py#ne... gslib/tests/test_ui.py:413: # otherwise the UI will not print We'll print some UI, right? Or did you mean it won't print estimates? https://codereview.appspot.com/305770043/diff/60001/gslib/tests/test_ui.py#ne... gslib/tests/test_ui.py:417: message_type=FileMessage.FILE_UPLOAD)) Should we have similar tests (may just be able to use a for loop on the type) for componentized download? for file_message_type, component_message_type in ( (FileMessage.FILE_UPLOAD, FileMessage.COMPONENT_TO_UPLOAD), (FileMessage.FILE_DOWNLOAD, FileMessage.COMPONENT_TO_DOWNLOAD)): etc. How about non-componentized file download/upload? https://codereview.appspot.com/305770043/diff/60001/gslib/tests/test_ui.py#ne... gslib/tests/test_ui.py:482: # were reported. The throughput here will be file1_progress Nit: end sentences with period. https://codereview.appspot.com/305770043/diff/60001/gslib/tests/test_ui.py#ne... gslib/tests/test_ui.py:492: return return statement not useful. https://codereview.appspot.com/305770043/diff/60001/gslib/ui_controller.py File gslib/ui_controller.py (right): https://codereview.appspot.com/305770043/diff/60001/gslib/ui_controller.py#ne... gslib/ui_controller.py:56: update_time_period: minimum period for refreshing information. Nit: start sentences with capital letters. https://codereview.appspot.com/305770043/diff/60001/gslib/ui_controller.py#ne... gslib/ui_controller.py:56: update_time_period: minimum period for refreshing information. Does refreshing include displaying, or just updating the internal implementation? https://codereview.appspot.com/305770043/diff/60001/gslib/ui_controller.py#ne... gslib/ui_controller.py:70: # It has priority over self.information_from_producer It seems like we have three states: 1. Preliminary size before ProducerThread finishes enumerating. 2. Size from SeekAheadThread 3. Final size after ProducerThread has enumerated all tasks As opposed to using two bools, do we want to track this state more explicitly? https://codereview.appspot.com/305770043/diff/60001/gslib/ui_controller.py#ne... gslib/ui_controller.py:81: self.throughput = 0.0 What are the units? Bytes/sec? https://codereview.appspot.com/305770043/diff/60001/gslib/ui_controller.py#ne... gslib/ui_controller.py:93: # Each individual progress has [new progress sum For the dictionary values, why use list+enums instead of a class with named values? This also has the advantage of making it easier to document. For example, does "new progress sum" mean bytes transferred this gsutil session, or new progress since the last call to Call()? Is "existing progress" only nonzero in the resume case? https://codereview.appspot.com/305770043/diff/60001/gslib/ui_controller.py#ne... gslib/ui_controller.py:115: # This overrides time constraints for updating important information. Do you mean that this forces a display update? Or does it just force an internal representation update? https://codereview.appspot.com/305770043/diff/60001/gslib/ui_controller.py#ne... gslib/ui_controller.py:126: cur_time: Message time. Used to determine if it is time to refresh What is the meaning if this is None? Why would we use the status message's time instead? https://codereview.appspot.com/305770043/diff/60001/gslib/ui_controller.py#ne... gslib/ui_controller.py:179: # status_message.message_type < FileMessage.COMPONENT_TO_UPLOAD means This comment no longer seems to apply? https://codereview.appspot.com/305770043/diff/60001/gslib/ui_controller.py#ne... gslib/ui_controller.py:180: # this is a new file Nit: capitalize sentences https://codereview.appspot.com/305770043/diff/60001/gslib/ui_controller.py#ne... gslib/ui_controller.py:234: file_dict = file_progress[self.file_dict_num] We do this same lookup here and below a few times. Consider abstracting to a function. https://codereview.appspot.com/305770043/diff/60001/gslib/ui_controller.py#ne... gslib/ui_controller.py:274: file_dict[key] = (status_message.size - last_update[1], last_update[1]) Class would let you avoid indexing this tuple. https://codereview.appspot.com/305770043/diff/60001/gslib/ui_controller.py#ne... gslib/ui_controller.py:277: """Handles a ProgressMessage that describes a component. Is this docstring correct? It seems like this handles files and components. https://codereview.appspot.com/305770043/diff/60001/gslib/ui_controller.py#ne... gslib/ui_controller.py:287: except KeyError: Still a bug that needs fixing. https://codereview.appspot.com/305770043/diff/60001/gslib/ui_controller.py#ne... gslib/ui_controller.py:297: status_message.processed_bytes = max(status_message.processed_bytes - max() seems like it is really trying to switch on whether this is a component or not. Should we handle that case separately in a conditional so that it's clear? https://codereview.appspot.com/305770043/diff/60001/gslib/ui_controller.py#ne... gslib/ui_controller.py:327: if status_message.message_type < FileMessage.COMPONENT_TO_UPLOAD: Still a brittle less-than comparison on an enum. Be explicit about the types and then we don't need to even have a comment. https://codereview.appspot.com/305770043/diff/60001/gslib/ui_controller.py#ne... gslib/ui_controller.py:345: which are defined by throughput_time and last_progress_message_time, Would this better be called last_throughput_time, for consistency? https://codereview.appspot.com/305770043/diff/60001/gslib/ui_controller.py#ne... gslib/ui_controller.py:353: period, it means that it is over, or it is on an extremely slow progress What does "it is over" mean? That it was completed? It seems like we would get a finished message for that. https://codereview.appspot.com/305770043/diff/60001/gslib/ui_controller.py#ne... gslib/ui_controller.py:363: # Just to avoid -0.00 Bs Super nit: B/s https://codereview.appspot.com/305770043/diff/60001/gslib/ui_controller.py#ne... gslib/ui_controller.py:374: whether or not we should print the progress. Nit: Capitalize sentences. https://codereview.appspot.com/305770043/diff/60001/gslib/ui_controller.py#ne... gslib/ui_controller.py:384: the current progress and the total size, besides the percentage it Does "besides" intend to indicate something about the position in the UI? If so, I think a diagram or example would communicate it better. If not, revise the language to use "and". https://codereview.appspot.com/305770043/diff/60001/gslib/ui_controller.py#ne... gslib/ui_controller.py:387: completed, the current progress and total size(which might be updated), Add space between size and ( https://codereview.appspot.com/305770043/diff/60001/gslib/ui_controller.py#ne... gslib/ui_controller.py:388: with no percentage as we have no guarantees that no extra files are coming. suggest "as we don't know if more files are coming." https://codereview.appspot.com/305770043/diff/60001/gslib/ui_controller.py#ne... gslib/ui_controller.py:389: It could also include time estimation (available only given s/could/may/ https://codereview.appspot.com/305770043/diff/60001/gslib/ui_controller.py#ne... gslib/ui_controller.py:392: seconds having been passed since we began controlling the UI, and that "since we began controlling" - does this mean since the UI controller started? https://codereview.appspot.com/305770043/diff/60001/gslib/ui_controller.py#ne... gslib/ui_controller.py:394: number of files and total size (for time remaining only). You already mention "for time remaining only" in time estimation above, so I think it is redundant. https://codereview.appspot.com/305770043/diff/60001/gslib/ui_controller.py#ne... gslib/ui_controller.py:406: time_remaining = 'Undefined' Why undefined instead of None? https://codereview.appspot.com/305770043/diff/60001/gslib/ui_controller.py#ne... gslib/ui_controller.py:413: percentage_completed = '%d' % int( Do we want to 0-pad this int so it takes up a fixed width on the screen? https://codereview.appspot.com/305770043/diff/60001/gslib/ui_controller.py#ne... gslib/ui_controller.py:415: self.total_size) + '% Done' It seems like this could fit on the line above it. https://codereview.appspot.com/305770043/diff/60001/gslib/ui_controller.py#ne... gslib/ui_controller.py:445: # '\ [2/3 completed files] [101.00 MiB/1.01 GiB] 10% Done, 82.34 MiBs' Is it possible we'll have % done without throughput? It would be nice to have one guaranteed to precede the other so that the UI doesn't jump around on the user's screen. https://codereview.appspot.com/305770043/diff/60001/gslib/ui_controller.py#ne... gslib/ui_controller.py:479: char_to_print = self.spinner_char_list[self.current_spinner_char] These lines are duplicated above, can you abstract to something like "GetSpinnerChar"? https://codereview.appspot.com/305770043/diff/60001/gslib/util.py File gslib/util.py (right): https://codereview.appspot.com/305770043/diff/60001/gslib/util.py#newcode735 gslib/util.py:735: shortened string version for this number. Still need more documentation - what's the length of the string that's returned? https://codereview.appspot.com/305770043/diff/60001/gslib/util.py#newcode755 gslib/util.py:755: the nearest integer from remaining_time (%H might be higher than 23). Trailing whitespace - please run lint. https://codereview.appspot.com/305770043/diff/60001/gslib/util.py#newcode759 gslib/util.py:759: if not remaining_time: I think this code is all working around remaining time "undefined", but we shouldn't even call this function if it's undefined. https://codereview.appspot.com/305770043/diff/60001/gslib/util.py#newcode764 gslib/util.py:764: hours = int(remaining_time / 3600) What about the case where hours > 99?
Sign in to reply to this message.
https://codereview.appspot.com/305770043/diff/60001/gslib/tests/test_ui.py File gslib/tests/test_ui.py (right): https://codereview.appspot.com/305770043/diff/60001/gslib/tests/test_ui.py#ne... gslib/tests/test_ui.py:271: class TestUiUnitTests(testcase.GsUtilUnitTestCase): Add unit tests for the various utility functions (PrettyTime, HumanReadableToDecimal, etc.?)
Sign in to reply to this message.
https://codereview.appspot.com/305770043/diff/60001/gslib/tests/test_ui.py File gslib/tests/test_ui.py (right): https://codereview.appspot.com/305770043/diff/60001/gslib/tests/test_ui.py#ne... gslib/tests/test_ui.py:49: download_size = 30000 On 2016/07/18 21:45:26, thobrla wrote: > How did you choose these size values? Are they related to any other constants? > If so, base them on these constants. If not, minimize the amount of data the > tests transfer. It's the second option; I'll do that. https://codereview.appspot.com/305770043/diff/60001/gslib/tests/test_ui.py#ne... gslib/tests/test_ui.py:107: fpath = self.CreateTempFile(file_name='looks-zipped.gz', On 2016/07/18 21:45:26, thobrla wrote: > Are we testing something specific about gzip? No, it was just an example. Is it better to use a different extension? https://codereview.appspot.com/305770043/diff/60001/gslib/tests/test_ui.py#ne... gslib/tests/test_ui.py:271: class TestUiUnitTests(testcase.GsUtilUnitTestCase): On 2016/07/18 22:13:26, thobrla wrote: > Add unit tests for the various utility functions (PrettyTime, > HumanReadableToDecimal, etc.?) Those can be found on test_util.py https://codereview.appspot.com/305770043/diff/60001/gslib/tests/test_ui.py#ne... gslib/tests/test_ui.py:275: upload_size = 40000 On 2016/07/18 21:45:26, thobrla wrote: > This is the same as the previous test, so should exist at the module level. Makes sense. https://codereview.appspot.com/305770043/diff/60001/gslib/tests/test_ui.py#ne... gslib/tests/test_ui.py:286: start_time = time.time() On 2016/07/18 21:45:26, thobrla wrote: > Don't use current time in unit tests - pick a fixed value. Ok. Added a custom_time argument to UIController() to ensure that if we want to use a fixed value, there is no call to time.time() on UIController() that would prevent us from printing the throughput. https://codereview.appspot.com/305770043/diff/60001/gslib/tests/test_ui.py#ne... gslib/tests/test_ui.py:288: # Adds a file. Because this message was already theoretically processed On 2016/07/18 21:45:26, thobrla wrote: > Should this comment go with the code below? The newline makes it appear that it > goes with the SeekAheadMessage. Yes, my bad. https://codereview.appspot.com/305770043/diff/60001/gslib/tests/test_ui.py#ne... gslib/tests/test_ui.py:292: fpath = self.CreateTempFile(file_name='looks-zipped.gz', On 2016/07/18 21:45:26, thobrla wrote: > We're just using the file path, yes? Do we actually need to write the contents? We don't. This was just to create a file that had the same size as we say it does on the FileMessage https://codereview.appspot.com/305770043/diff/60001/gslib/tests/test_ui.py#ne... gslib/tests/test_ui.py:304: ui_thread.join(self.thread_wait_time) On 2016/07/18 21:45:26, thobrla wrote: > Here and elsewhere, call isAlive() to ensure the thread didn't time out? > https://docs.python.org/2/library/threading.html > > May be useful to write a function like "JoinThreadAndRaiseOnTimeout" Got it. https://codereview.appspot.com/305770043/diff/60001/gslib/tests/test_ui.py#ne... gslib/tests/test_ui.py:371: On 2016/07/18 21:45:27, thobrla wrote: > Add tests for resumption and existing components? I added two tests adapted from test_cp_resumable_upload. I think we should also test for downloads, but I'll need some help on the best way to do that. In addition to that, one important change I've made to the code was to add two functions CheckUiOutputWithMFlag and CheckUiOutputWithNoMFlag that basically do all the asserting for the tests. This not only makes the code cleaner, but it is also much more reusable if we decide to change something in the UI output. https://codereview.appspot.com/305770043/diff/60001/gslib/tests/test_ui.py#ne... gslib/tests/test_ui.py:413: # otherwise the UI will not print On 2016/07/18 21:45:26, thobrla wrote: > We'll print some UI, right? Or did you mean it won't print estimates? It was supposed to be estimates, but now I realize we don't even need it for throughput, just for time remaining. https://codereview.appspot.com/305770043/diff/60001/gslib/tests/test_ui.py#ne... gslib/tests/test_ui.py:417: message_type=FileMessage.FILE_UPLOAD)) On 2016/07/18 21:45:26, thobrla wrote: > Should we have similar tests (may just be able to use a for loop on the type) > for componentized download? > > for file_message_type, component_message_type in ( > (FileMessage.FILE_UPLOAD, FileMessage.COMPONENT_TO_UPLOAD), > (FileMessage.FILE_DOWNLOAD, FileMessage.COMPONENT_TO_DOWNLOAD)): > > etc. > > How about non-componentized file download/upload? I like the idea for download as well, I'll add that, and then add a separate case for non-componentized. https://codereview.appspot.com/305770043/diff/60001/gslib/tests/test_ui.py#ne... gslib/tests/test_ui.py:482: # were reported. The throughput here will be file1_progress On 2016/07/18 21:45:26, thobrla wrote: > Nit: end sentences with period. Done. https://codereview.appspot.com/305770043/diff/60001/gslib/tests/test_ui.py#ne... gslib/tests/test_ui.py:492: return On 2016/07/18 21:45:26, thobrla wrote: > return statement not useful. Done. https://codereview.appspot.com/305770043/diff/60001/gslib/ui_controller.py File gslib/ui_controller.py (right): https://codereview.appspot.com/305770043/diff/60001/gslib/ui_controller.py#ne... gslib/ui_controller.py:56: update_time_period: minimum period for refreshing information. On 2016/07/18 21:45:27, thobrla wrote: > Nit: start sentences with capital letters. Done. https://codereview.appspot.com/305770043/diff/60001/gslib/ui_controller.py#ne... gslib/ui_controller.py:56: update_time_period: minimum period for refreshing information. On 2016/07/18 21:45:27, thobrla wrote: > Does refreshing include displaying, or just updating the internal > implementation? It does include displaying, I added that. https://codereview.appspot.com/305770043/diff/60001/gslib/ui_controller.py#ne... gslib/ui_controller.py:70: # It has priority over self.information_from_producer On 2016/07/18 21:45:27, thobrla wrote: > It seems like we have three states: > 1. Preliminary size before ProducerThread finishes enumerating. > 2. Size from SeekAheadThread > 3. Final size after ProducerThread has enumerated all tasks > > As opposed to using two bools, do we want to track this state more explicitly? That's exactly it. To make it clearer, I created 4 variables: total_size_from_file_message(=3), total_size_from_producer_thread(=2), total_size_from_seek_ahead_thread(=1) and total_size_source. total_size_source starts as total_size_from_file_message. As we reach messages that give us a higher priority, we update total_size_source for this new value. When trying to use this information for deciding whether or not to print time_remaining, for instance, we check if the priority is at least that of the first source that allows us to estimate the time_remaining (which is total_size_from_producer_thread, in our case). I adopted this model because it allows us to easily insert new priority types if we ever need to, and I think doing the comparison here (self.total_size_source <= self.total_size_from_producer_thread) makes more logical sense than what we were doing in status_message.type < FileMessage, because we actually comparing the priority of the messages. This seems cleaner and even more correct than listing all sources from total size with higher priority. https://codereview.appspot.com/305770043/diff/60001/gslib/ui_controller.py#ne... gslib/ui_controller.py:81: self.throughput = 0.0 On 2016/07/18 21:45:27, thobrla wrote: > What are the units? Bytes/sec? Yes, added this information. https://codereview.appspot.com/305770043/diff/60001/gslib/ui_controller.py#ne... gslib/ui_controller.py:93: # Each individual progress has [new progress sum On 2016/07/18 21:45:27, thobrla wrote: > For the dictionary values, why use list+enums instead of a class with named > values? This also has the advantage of making it easier to document. For > example, does "new progress sum" mean bytes transferred this gsutil session, or > new progress since the last call to Call()? Is "existing progress" only nonzero > in the resume case? I think that's very valid. I will implement it. https://codereview.appspot.com/305770043/diff/60001/gslib/ui_controller.py#ne... gslib/ui_controller.py:115: # This overrides time constraints for updating important information. On 2016/07/18 21:45:27, thobrla wrote: > Do you mean that this forces a display update? Or does it just force an > internal representation update? Display update, I'll change that as well. https://codereview.appspot.com/305770043/diff/60001/gslib/ui_controller.py#ne... gslib/ui_controller.py:126: cur_time: Message time. Used to determine if it is time to refresh On 2016/07/18 21:45:27, thobrla wrote: > What is the meaning if this is None? Why would we use the status message's time > instead? This is more for testing. We should be always using the time from the StatusMessage. https://codereview.appspot.com/305770043/diff/60001/gslib/ui_controller.py#ne... gslib/ui_controller.py:179: # status_message.message_type < FileMessage.COMPONENT_TO_UPLOAD means On 2016/07/18 21:45:27, thobrla wrote: > This comment no longer seems to apply? Yes, sorry. https://codereview.appspot.com/305770043/diff/60001/gslib/ui_controller.py#ne... gslib/ui_controller.py:180: # this is a new file On 2016/07/18 21:45:28, thobrla wrote: > Nit: capitalize sentences This was not capitalized because it was supposed to be a continuation of the previous message ('it means\n this is a new file'). I'll capitalize it now since we are erasing the previous line. https://codereview.appspot.com/305770043/diff/60001/gslib/ui_controller.py#ne... gslib/ui_controller.py:234: file_dict = file_progress[self.file_dict_num] On 2016/07/18 21:45:27, thobrla wrote: > We do this same lookup here and below a few times. Consider abstracting to a > function. I think that now that we are using smaller names, I can just use file_progress.dict instead of file_dict and we can avoid creating a function for that. https://codereview.appspot.com/305770043/diff/60001/gslib/ui_controller.py#ne... gslib/ui_controller.py:274: file_dict[key] = (status_message.size - last_update[1], last_update[1]) On 2016/07/18 21:45:27, thobrla wrote: > Class would let you avoid indexing this tuple. I didn't quite understand why that would be the case. The class allows us to write simpler and clearer code, but this particular tuple will still be used. https://codereview.appspot.com/305770043/diff/60001/gslib/ui_controller.py#ne... gslib/ui_controller.py:277: """Handles a ProgressMessage that describes a component. On 2016/07/18 21:45:28, thobrla wrote: > Is this docstring correct? It seems like this handles files and components. Done. https://codereview.appspot.com/305770043/diff/60001/gslib/ui_controller.py#ne... gslib/ui_controller.py:287: except KeyError: On 2016/07/18 21:45:27, thobrla wrote: > Still a bug that needs fixing. It's not happening anymore. I'll just remove the try/except statement. https://codereview.appspot.com/305770043/diff/60001/gslib/ui_controller.py#ne... gslib/ui_controller.py:297: status_message.processed_bytes = max(status_message.processed_bytes - On 2016/07/18 21:45:27, thobrla wrote: > max() seems like it is really trying to switch on whether this is a component or > not. Should we handle that case separately in a conditional so that it's clear? I think I had this for a previous bug. I tested it again and it worked fine without max. https://codereview.appspot.com/305770043/diff/60001/gslib/ui_controller.py#ne... gslib/ui_controller.py:327: if status_message.message_type < FileMessage.COMPONENT_TO_UPLOAD: On 2016/07/18 21:45:27, thobrla wrote: > Still a brittle less-than comparison on an enum. Be explicit about the types > and then we don't need to even have a comment. Sorry, I'm not sure if I got it. Are you saying to list all the values that come before COMPONENT_TO_UPLOAD? There are 7 of them, I thought the code would get confusing. I'll create a method _isFile() and put it all over there, what do you think? https://codereview.appspot.com/305770043/diff/60001/gslib/ui_controller.py#ne... gslib/ui_controller.py:345: which are defined by throughput_time and last_progress_message_time, On 2016/07/18 21:45:27, thobrla wrote: > Would this better be called last_throughput_time, for consistency? Yes, sure. https://codereview.appspot.com/305770043/diff/60001/gslib/ui_controller.py#ne... gslib/ui_controller.py:353: period, it means that it is over, or it is on an extremely slow progress On 2016/07/18 21:45:27, thobrla wrote: > What does "it is over" mean? That it was completed? It seems like we would get > a finished message for that. We will eventually. Is it guaranteed to be right after, though? https://codereview.appspot.com/305770043/diff/60001/gslib/ui_controller.py#ne... gslib/ui_controller.py:363: # Just to avoid -0.00 Bs On 2016/07/18 21:45:27, thobrla wrote: > Super nit: B/s We are using 'Bs' on the throughput. Should I add a '/' there as well? (I'm assuming yes, let me know otherwise) https://codereview.appspot.com/305770043/diff/60001/gslib/ui_controller.py#ne... gslib/ui_controller.py:374: whether or not we should print the progress. On 2016/07/18 21:45:28, thobrla wrote: > Nit: Capitalize sentences. Done. https://codereview.appspot.com/305770043/diff/60001/gslib/ui_controller.py#ne... gslib/ui_controller.py:384: the current progress and the total size, besides the percentage it On 2016/07/18 21:45:28, thobrla wrote: > Does "besides" intend to indicate something about the position in the UI? If > so, I think a diagram or example would communicate it better. > > If not, revise the language to use "and". No. I think I did not use 'and' initially because I added this later on, and there was already an 'and'. Fixed it. https://codereview.appspot.com/305770043/diff/60001/gslib/ui_controller.py#ne... gslib/ui_controller.py:387: completed, the current progress and total size(which might be updated), On 2016/07/18 21:45:27, thobrla wrote: > Add space between size and ( Done. https://codereview.appspot.com/305770043/diff/60001/gslib/ui_controller.py#ne... gslib/ui_controller.py:388: with no percentage as we have no guarantees that no extra files are coming. On 2016/07/18 21:45:28, thobrla wrote: > suggest "as we don't know if more files are coming." Yeah, that was supposed to be "no guarantees that extra files are coming", which would make more sense. But I'll use your suggestion. https://codereview.appspot.com/305770043/diff/60001/gslib/ui_controller.py#ne... gslib/ui_controller.py:389: It could also include time estimation (available only given On 2016/07/18 21:45:27, thobrla wrote: > s/could/may/ Done. https://codereview.appspot.com/305770043/diff/60001/gslib/ui_controller.py#ne... gslib/ui_controller.py:392: seconds having been passed since we began controlling the UI, and that On 2016/07/18 21:45:27, thobrla wrote: > "since we began controlling" - does this mean since the UI controller started? Yes. Sometimes I complicate sentences, sorry. https://codereview.appspot.com/305770043/diff/60001/gslib/ui_controller.py#ne... gslib/ui_controller.py:394: number of files and total size (for time remaining only). On 2016/07/18 21:45:28, thobrla wrote: > You already mention "for time remaining only" in time estimation above, so I > think it is redundant. Agreed, I thought about that but preferred to be redundant rather than unclear. Glad it is clear, though. https://codereview.appspot.com/305770043/diff/60001/gslib/ui_controller.py#ne... gslib/ui_controller.py:406: time_remaining = 'Undefined' On 2016/07/18 21:45:28, thobrla wrote: > Why undefined instead of None? No reason, actually. I'll use None instead. https://codereview.appspot.com/305770043/diff/60001/gslib/ui_controller.py#ne... gslib/ui_controller.py:413: percentage_completed = '%d' % int( On 2016/07/18 21:45:28, thobrla wrote: > Do we want to 0-pad this int so it takes up a fixed width on the screen? While I agree we should have a fixed width on the screen for everything we can, I think it is better to use %3d as opposed to %03d here, because 06% does not seem very natural, and we achieve our goal in both cases. https://codereview.appspot.com/305770043/diff/60001/gslib/ui_controller.py#ne... gslib/ui_controller.py:415: self.total_size) + '% Done' On 2016/07/18 21:45:28, thobrla wrote: > It seems like this could fit on the line above it. Done. https://codereview.appspot.com/305770043/diff/60001/gslib/ui_controller.py#ne... gslib/ui_controller.py:445: # '\ [2/3 completed files] [101.00 MiB/1.01 GiB] 10% Done, 82.34 MiBs' On 2016/07/18 21:45:28, thobrla wrote: > Is it possible we'll have % done without throughput? It would be nice to have > one guaranteed to precede the other so that the UI doesn't jump around on the > user's screen. You are right that we might have something like progress + throughput and then progress + percentage + throughput. However, both scenarios can happen (the first being only displaying throughput and then throughput and percentage, and the second being only displaying percentage and then percentage and throughput), so I don't think we have much of a choice here. It will only happen once though, so I don't see it as a very big deal. Let me know if you have any suggestion on how we may approach this. https://codereview.appspot.com/305770043/diff/60001/gslib/ui_controller.py#ne... gslib/ui_controller.py:479: char_to_print = self.spinner_char_list[self.current_spinner_char] On 2016/07/18 21:45:28, thobrla wrote: > These lines are duplicated above, can you abstract to something like > "GetSpinnerChar"? Done. https://codereview.appspot.com/305770043/diff/60001/gslib/util.py File gslib/util.py (right): https://codereview.appspot.com/305770043/diff/60001/gslib/util.py#newcode735 gslib/util.py:735: shortened string version for this number. On 2016/07/18 21:45:28, thobrla wrote: > Still need more documentation - what's the length of the string that's returned? Done. https://codereview.appspot.com/305770043/diff/60001/gslib/util.py#newcode755 gslib/util.py:755: the nearest integer from remaining_time (%H might be higher than 23). On 2016/07/18 21:45:28, thobrla wrote: > Trailing whitespace - please run lint. I did run it right before sending it, don't know what happened. https://codereview.appspot.com/305770043/diff/60001/gslib/util.py#newcode759 gslib/util.py:759: if not remaining_time: On 2016/07/18 21:45:28, thobrla wrote: > I think this code is all working around remaining time "undefined", but we > shouldn't even call this function if it's undefined. Ok, I can easily adjust that. https://codereview.appspot.com/305770043/diff/60001/gslib/util.py#newcode764 gslib/util.py:764: hours = int(remaining_time / 3600) On 2016/07/18 21:45:28, thobrla wrote: > What about the case where hours > 99? Oh yes, I forgot about it, we had agreed on displaying a "Over 100 hours" message, right?
Sign in to reply to this message.
https://codereview.appspot.com/305770043/diff/60001/gslib/tests/test_ui.py File gslib/tests/test_ui.py (right): https://codereview.appspot.com/305770043/diff/60001/gslib/tests/test_ui.py#ne... gslib/tests/test_ui.py:107: fpath = self.CreateTempFile(file_name='looks-zipped.gz', On 2016/07/19 08:13:44, michelzel wrote: > On 2016/07/18 21:45:26, thobrla wrote: > > Are we testing something specific about gzip? > > No, it was just an example. Is it better to use a different extension? Nit: Generally speaking, don't add additional information to a testcase unless it's needed. In this case, the file name implies that you might be testing something related to gzip, when actually the file extension doesn't matter. https://codereview.appspot.com/305770043/diff/60001/gslib/tests/test_ui.py#ne... gslib/tests/test_ui.py:371: On 2016/07/19 08:13:43, michelzel wrote: > On 2016/07/18 21:45:27, thobrla wrote: > > Add tests for resumption and existing components? > > I added two tests adapted from test_cp_resumable_upload. I think we should also > test for downloads, but I'll need some help on the best way to do that. > > > In addition to that, one important change I've made to the code was to add two > functions CheckUiOutputWithMFlag and CheckUiOutputWithNoMFlag that basically do > all the asserting for the tests. This not only makes the code cleaner, but it is > also much more reusable if we decide to change something in the UI output. Sounds good - still need download tests, though. https://codereview.appspot.com/305770043/diff/60001/gslib/ui_controller.py File gslib/ui_controller.py (right): https://codereview.appspot.com/305770043/diff/60001/gslib/ui_controller.py#ne... gslib/ui_controller.py:274: file_dict[key] = (status_message.size - last_update[1], last_update[1]) On 2016/07/19 08:13:44, michelzel wrote: > On 2016/07/18 21:45:27, thobrla wrote: > > Class would let you avoid indexing this tuple. > > I didn't quite understand why that would be the case. The class allows us to > write simpler and clearer code, but this particular tuple will still be used. It's a matter of being able to say last_update.existing_progress as opposed to last_update[1]. https://codereview.appspot.com/305770043/diff/60001/gslib/ui_controller.py#ne... gslib/ui_controller.py:327: if status_message.message_type < FileMessage.COMPONENT_TO_UPLOAD: On 2016/07/19 08:13:44, michelzel wrote: > On 2016/07/18 21:45:27, thobrla wrote: > > Still a brittle less-than comparison on an enum. Be explicit about the types > > and then we don't need to even have a comment. > > Sorry, I'm not sure if I got it. Are you saying to list all the values that come > before COMPONENT_TO_UPLOAD? There are 7 of them, I thought the code would get > confusing. > I'll create a method _isFile() and put it all over there, what do you think? IsFile is a good start, you could use it here, right? https://codereview.appspot.com/305770043/diff/60001/gslib/ui_controller.py#ne... gslib/ui_controller.py:353: period, it means that it is over, or it is on an extremely slow progress On 2016/07/19 08:13:44, michelzel wrote: > On 2016/07/18 21:45:27, thobrla wrote: > > What does "it is over" mean? That it was completed? It seems like we would > get > > a finished message for that. > > We will eventually. Is it guaranteed to be right after, though? I am looking for clarification on what "it is over" means in the comments. https://codereview.appspot.com/305770043/diff/60001/gslib/ui_controller.py#ne... gslib/ui_controller.py:363: # Just to avoid -0.00 Bs On 2016/07/19 08:13:44, michelzel wrote: > On 2016/07/18 21:45:27, thobrla wrote: > > Super nit: B/s > > We are using 'Bs' on the throughput. Should I add a '/' there as well? (I'm > assuming yes, let me know otherwise) Yes, since the unit is bytes/sec, not bytes*seconds https://codereview.appspot.com/305770043/diff/60001/gslib/ui_controller.py#ne... gslib/ui_controller.py:413: percentage_completed = '%d' % int( On 2016/07/19 08:13:44, michelzel wrote: > On 2016/07/18 21:45:28, thobrla wrote: > > Do we want to 0-pad this int so it takes up a fixed width on the screen? > > While I agree we should have a fixed width on the screen for everything we can, > I think it is better to use %3d as opposed to %03d here, because 06% does not > seem very natural, and we achieve our goal in both cases. Good choice. https://codereview.appspot.com/305770043/diff/60001/gslib/ui_controller.py#ne... gslib/ui_controller.py:445: # '\ [2/3 completed files] [101.00 MiB/1.01 GiB] 10% Done, 82.34 MiBs' On 2016/07/19 08:13:45, michelzel wrote: > On 2016/07/18 21:45:28, thobrla wrote: > > Is it possible we'll have % done without throughput? It would be nice to have > > one guaranteed to precede the other so that the UI doesn't jump around on the > > user's screen. > > You are right that we might have something like progress + throughput and then > progress + percentage + throughput. However, both scenarios can happen (the > first being only displaying throughput and then throughput and percentage, and > the second being only displaying percentage and then percentage and throughput), > so I don't think we have much of a choice here. It will only happen once though, > so I don't see it as a very big deal. Let me know if you have any suggestion on > how we may approach this. One possibility is choosing a strict ordering for printing them and not printing the second one until we have the first. That doesn't seem like it would be consistent, though. Another possibility would be to print placeholders: ----- MiB/s ---% done https://codereview.appspot.com/305770043/diff/100001/gslib/tests/test_ui.py File gslib/tests/test_ui.py (right): https://codereview.appspot.com/305770043/diff/100001/gslib/tests/test_ui.py#n... gslib/tests/test_ui.py:48: HALT_SIZE = 131072 Why is this value chosen? Needs a comment. https://codereview.appspot.com/305770043/diff/100001/gslib/tests/test_ui.py#n... gslib/tests/test_ui.py:67: def CheckUiOutputWithMFlag(test_case, content, num_objects, total_size): This seems like a superset of CheckUiOutputWithNoMFlag, could we reuse it? https://codereview.appspot.com/305770043/diff/100001/gslib/tests/test_ui.py#n... gslib/tests/test_ui.py:106: download_size = DOWNLOAD_SIZE Why not just use these constants directly from the module? https://codereview.appspot.com/305770043/diff/100001/gslib/tests/test_ui.py#n... gslib/tests/test_ui.py:276: """Tests that UI behaves correctly on a resumed operation with -m flag. What is actually being resumed here? We're using a resumable upload but still starting from scratch. https://codereview.appspot.com/305770043/diff/100001/gslib/tests/test_ui.py#n... gslib/tests/test_ui.py:461: for i in range(component_number1): Does "component_number1" mean the total amount of components? This could be confused with a single numbered component. https://codereview.appspot.com/305770043/diff/100001/gslib/tests/test_ui.py#n... gslib/tests/test_ui.py:462: # Each update has size size1 / (component_number1*progress_calls_number) pylint: disable=line-too-long Just fix the long line, here and below https://codereview.appspot.com/305770043/diff/100001/gslib/tests/test_ui.py#n... gslib/tests/test_ui.py:466: start_time + 300 + j * (component_number1 + component_number2) Would it make more sense to assign start_time separately since its value is a complex computation and is reused? https://codereview.appspot.com/305770043/diff/100001/gslib/tests/test_ui.py#n... gslib/tests/test_ui.py:498: # There were 2-second periods when no progress was reported. The Are the periods above really 2 seconds long? It seems like start time varies by 100 https://codereview.appspot.com/305770043/diff/100001/gslib/tests/test_ui.py#n... gslib/tests/test_ui.py:501: zero = MakeHumanReadable(0) In the comment, we say we will use HumanReadableWithDecimalPlaces but here we use MakeHumanReadable? https://codereview.appspot.com/305770043/diff/100001/gslib/tests/test_ui.py#n... gslib/tests/test_ui.py:513: # There were 2-second periods when one progress from each file was reported. pylint: disable=line-too-long Unless there's an unbreakable URL or the like, you shouldn't use long lines for comments. Just move the remainder of the comment to the next line. https://codereview.appspot.com/305770043/diff/100001/gslib/ui_controller.py File gslib/ui_controller.py (right): https://codereview.appspot.com/305770043/diff/100001/gslib/ui_controller.py#n... gslib/ui_controller.py:50: class _ProgressInformation(object): Nice job with this class, the docs are clear. https://codereview.appspot.com/305770043/diff/100001/gslib/ui_controller.py#n... gslib/ui_controller.py:86: update_throughput_period: Minimum period for updating throughput Nit: end sentences with periods. https://codereview.appspot.com/305770043/diff/100001/gslib/ui_controller.py#n... gslib/ui_controller.py:100: # It has priority over self.total_size_from_producer_thread and over If these are constant enum priorities, define an enum class (example at gslib.tracker_file.TrackerFileType) instead of class member variables? https://codereview.appspot.com/305770043/diff/100001/gslib/ui_controller.py#n... gslib/ui_controller.py:327: def _isFile(self, file_message): # pylint: disable=invalid-name Should be _IsFile instead of a linter disable https://codereview.appspot.com/305770043/diff/100001/gslib/ui_controller.py#n... gslib/ui_controller.py:331: Args: Add newline before "Args:" https://codereview.appspot.com/305770043/diff/100001/gslib/ui_controller.py#n... gslib/ui_controller.py:364: # this refers to a file. Nit: capitalization https://codereview.appspot.com/305770043/diff/100001/gslib/ui_controller.py#n... gslib/ui_controller.py:463: # '\ [2/3 completed files] [101.00 MiB/1.01 GiB] 10% Done'. 10% would be padded by an additional space, yes? Do we want to space-pad the number of completed files so it is in a fixed location? https://codereview.appspot.com/305770043/diff/100001/gslib/ui_controller.py#n... gslib/ui_controller.py:484: # '\ [2/3 completed files] [101.00 MiB/1.01 GiB] 10% Done, 82.34 MiBs' Should be MiB/s here and below https://codereview.appspot.com/305770043/diff/100001/gslib/ui_controller.py#n... gslib/ui_controller.py:499: string_to_print += ', ETA ' + str(time_remaining) Is it helpful to print None for an ETA? Why not just wait until we have something meaningful to report (or print a fixed-width placeholder) https://codereview.appspot.com/305770043/diff/100001/gslib/util.py File gslib/util.py (right): https://codereview.appspot.com/305770043/diff/100001/gslib/util.py#newcode736 gslib/util.py:736: scale (thousand, million or billion) smaller than the number and divides it So this produces a string with maximum length 6? https://codereview.appspot.com/305770043/diff/100001/gslib/util.py#newcode763 gslib/util.py:763: if hours >= 100: 'Over 100 hours' takes up more space that the other estimates, which makes things lining up harder. What about: '>XXX hrs' where 100 <= X <= 999? '05:32:99' <-- lines up and has the same length
Sign in to reply to this message.
https://codereview.appspot.com/305770043/diff/100001/gslib/tests/test_ui.py File gslib/tests/test_ui.py (right): https://codereview.appspot.com/305770043/diff/100001/gslib/tests/test_ui.py#n... gslib/tests/test_ui.py:48: HALT_SIZE = 131072 On 2016/07/19 22:34:29, thobrla wrote: > Why is this value chosen? Needs a comment. Done. https://codereview.appspot.com/305770043/diff/100001/gslib/tests/test_ui.py#n... gslib/tests/test_ui.py:67: def CheckUiOutputWithMFlag(test_case, content, num_objects, total_size): On 2016/07/19 22:34:29, thobrla wrote: > This seems like a superset of CheckUiOutputWithNoMFlag, could we reuse it? Not exactly. For instance, while we have '3 completed files' for CheckUiOutputWithNoMFlag, we have '3/5 completed files' for CheckUiOutputWithMFlag.The total_size_string assert is the same, but that's about it. https://codereview.appspot.com/305770043/diff/100001/gslib/tests/test_ui.py#n... gslib/tests/test_ui.py:106: download_size = DOWNLOAD_SIZE On 2016/07/19 22:34:29, thobrla wrote: > Why not just use these constants directly from the module? No reason, I used the format from test_cp, but I don't see any reason for not using directly from the module. I made the change. https://codereview.appspot.com/305770043/diff/100001/gslib/tests/test_ui.py#n... gslib/tests/test_ui.py:276: """Tests that UI behaves correctly on a resumed operation with -m flag. On 2016/07/19 22:34:29, thobrla wrote: > What is actually being resumed here? We're using a resumable upload but still > starting from scratch. Yes. I used a different test case now (also editing parallel_composite_upload_component_size on boto). It seems to make a lot more sense now, at least for me. https://codereview.appspot.com/305770043/diff/100001/gslib/tests/test_ui.py#n... gslib/tests/test_ui.py:461: for i in range(component_number1): On 2016/07/19 22:34:29, thobrla wrote: > Does "component_number1" mean the total amount of components? This could be > confused with a single numbered component. Yes, I'll use component_num_file1 instead. https://codereview.appspot.com/305770043/diff/100001/gslib/tests/test_ui.py#n... gslib/tests/test_ui.py:462: # Each update has size size1 / (component_number1*progress_calls_number) pylint: disable=line-too-long On 2016/07/19 22:34:29, thobrla wrote: > Just fix the long line, here and below Here I made the long line to ensure the expression would stay in the same line, but I'll instead break before the expression starts. https://codereview.appspot.com/305770043/diff/100001/gslib/tests/test_ui.py#n... gslib/tests/test_ui.py:466: start_time + 300 + j * (component_number1 + component_number2) On 2016/07/19 22:34:29, thobrla wrote: > Would it make more sense to assign start_time separately since its value is a > complex computation and is reused? I'm not sure I got it. The variable start_time is assigned only once. I do think we could have a progress_base_start_time defined as start_time + 300 + j * (component_num_file1 + component_num_file2), and then use base_start_time + i and base_start_time + i + component_num_file1 as the time field. https://codereview.appspot.com/305770043/diff/100001/gslib/tests/test_ui.py#n... gslib/tests/test_ui.py:498: # There were 2-second periods when no progress was reported. The On 2016/07/19 22:34:29, thobrla wrote: > Are the periods above really 2 seconds long? It seems like start time varies by > 100 The time varies by 100 at most time, but between ProgressMessages, where I'm testing the throughput, it only varies by one. https://codereview.appspot.com/305770043/diff/100001/gslib/tests/test_ui.py#n... gslib/tests/test_ui.py:501: zero = MakeHumanReadable(0) On 2016/07/19 22:34:29, thobrla wrote: > In the comment, we say we will use HumanReadableWithDecimalPlaces but here we > use MakeHumanReadable? Yes, I saw that. It was supposed to be HumanReadableWithDecimalPlaces on both places. Fixed it. https://codereview.appspot.com/305770043/diff/100001/gslib/tests/test_ui.py#n... gslib/tests/test_ui.py:513: # There were 2-second periods when one progress from each file was reported. pylint: disable=line-too-long On 2016/07/19 22:34:29, thobrla wrote: > Unless there's an unbreakable URL or the like, you shouldn't use long lines for > comments. Just move the remainder of the comment to the next line. Understood, thanks. https://codereview.appspot.com/305770043/diff/100001/gslib/ui_controller.py File gslib/ui_controller.py (right): https://codereview.appspot.com/305770043/diff/100001/gslib/ui_controller.py#n... gslib/ui_controller.py:50: class _ProgressInformation(object): On 2016/07/19 22:34:30, thobrla wrote: > Nice job with this class, the docs are clear. Thank you! https://codereview.appspot.com/305770043/diff/100001/gslib/ui_controller.py#n... gslib/ui_controller.py:86: update_throughput_period: Minimum period for updating throughput On 2016/07/19 22:34:30, thobrla wrote: > Nit: end sentences with periods. Done. https://codereview.appspot.com/305770043/diff/100001/gslib/ui_controller.py#n... gslib/ui_controller.py:100: # It has priority over self.total_size_from_producer_thread and over On 2016/07/19 22:34:30, thobrla wrote: > If these are constant enum priorities, define an enum class (example at > gslib.tracker_file.TrackerFileType) instead of class member variables? Done. Thanks for the example! https://codereview.appspot.com/305770043/diff/100001/gslib/ui_controller.py#n... gslib/ui_controller.py:327: def _isFile(self, file_message): # pylint: disable=invalid-name On 2016/07/19 22:34:29, thobrla wrote: > Should be _IsFile instead of a linter disable Oh yeah, that makes sense. I thought it was something like the put from MainThreadUIQueue. https://codereview.appspot.com/305770043/diff/100001/gslib/ui_controller.py#n... gslib/ui_controller.py:331: Args: On 2016/07/19 22:34:29, thobrla wrote: > Add newline before "Args:" Yes, thanks. https://codereview.appspot.com/305770043/diff/100001/gslib/ui_controller.py#n... gslib/ui_controller.py:364: # this refers to a file. On 2016/07/19 22:34:30, thobrla wrote: > Nit: capitalization Done. https://codereview.appspot.com/305770043/diff/100001/gslib/ui_controller.py#n... gslib/ui_controller.py:463: # '\ [2/3 completed files] [101.00 MiB/1.01 GiB] 10% Done'. On 2016/07/19 22:34:30, thobrla wrote: > 10% would be padded by an additional space, yes? > Do we want to space-pad the number of completed files so it is in a fixed > location? Yes, I'll correct that. I'm not so sure about space-padding for the number of completed files: let's say we have a operation of 100.9k files. We would stay a long time displaying [ X.Yk/100.9k completed files]. I don't have a very strong position about this, though. https://codereview.appspot.com/305770043/diff/100001/gslib/ui_controller.py#n... gslib/ui_controller.py:484: # '\ [2/3 completed files] [101.00 MiB/1.01 GiB] 10% Done, 82.34 MiBs' On 2016/07/19 22:34:29, thobrla wrote: > Should be MiB/s here and below Done. https://codereview.appspot.com/305770043/diff/100001/gslib/ui_controller.py#n... gslib/ui_controller.py:499: string_to_print += ', ETA ' + str(time_remaining) On 2016/07/19 22:34:29, thobrla wrote: > Is it helpful to print None for an ETA? Why not just wait until we have > something meaningful to report (or print a fixed-width placeholder) Sounds good. https://codereview.appspot.com/305770043/diff/100001/gslib/util.py File gslib/util.py (right): https://codereview.appspot.com/305770043/diff/100001/gslib/util.py#newcode736 gslib/util.py:736: scale (thousand, million or billion) smaller than the number and divides it On 2016/07/19 22:34:30, thobrla wrote: > So this produces a string with maximum length 6? Supposing we are dealing with <10^12 files, yes. https://codereview.appspot.com/305770043/diff/100001/gslib/util.py#newcode763 gslib/util.py:763: if hours >= 100: On 2016/07/19 22:34:30, thobrla wrote: > 'Over 100 hours' takes up more space that the other estimates, which makes > things lining up harder. > > What about: > '>XXX hrs' where 100 <= X <= 999? > > '05:32:99' <-- lines up and has the same length I think the second has actually one more character, right (7 vs. 8)? What about '100+ hrs' if X >= 100? Or perhaps '%d+ hrs' % min(999, X)? (Currently implementing '%d+ hrs' % min(999, X))
Sign in to reply to this message.
https://codereview.appspot.com/305770043/diff/100001/gslib/tests/test_ui.py File gslib/tests/test_ui.py (right): https://codereview.appspot.com/305770043/diff/100001/gslib/tests/test_ui.py#n... gslib/tests/test_ui.py:48: HALT_SIZE = 131072 On 2016/07/20 17:35:50, michelzel wrote: > On 2016/07/19 22:34:29, thobrla wrote: > > Why is this value chosen? Needs a comment. > > Done. No, I mean, what is the meaning of this size - I assume this has to do with HaltingCopyCallbackHandler, so can we define it alongside that instead? https://codereview.appspot.com/305770043/diff/100001/gslib/tests/test_ui.py#n... gslib/tests/test_ui.py:67: def CheckUiOutputWithMFlag(test_case, content, num_objects, total_size): On 2016/07/20 17:35:50, michelzel wrote: > On 2016/07/19 22:34:29, thobrla wrote: > > This seems like a superset of CheckUiOutputWithNoMFlag, could we reuse it? > > Not exactly. For instance, while we have '3 completed files' for > CheckUiOutputWithNoMFlag, we have '3/5 completed files' for > CheckUiOutputWithMFlag.The total_size_string assert is the same, but that's > about it. Acknowledged. https://codereview.appspot.com/305770043/diff/100001/gslib/tests/test_ui.py#n... gslib/tests/test_ui.py:498: # There were 2-second periods when no progress was reported. The On 2016/07/20 17:35:50, michelzel wrote: > On 2016/07/19 22:34:29, thobrla wrote: > > Are the periods above really 2 seconds long? It seems like start time varies > by > > 100 > > The time varies by 100 at most time, but between ProgressMessages, where I'm > testing the throughput, it only varies by one. If it only varies by one, how is the comment correct that there were 2-second periods where no progress was reported? https://codereview.appspot.com/305770043/diff/100001/gslib/ui_controller.py File gslib/ui_controller.py (right): https://codereview.appspot.com/305770043/diff/100001/gslib/ui_controller.py#n... gslib/ui_controller.py:100: # It has priority over self.total_size_from_producer_thread and over On 2016/07/20 17:35:51, michelzel wrote: > On 2016/07/19 22:34:30, thobrla wrote: > > If these are constant enum priorities, define an enum class (example at > > gslib.tracker_file.TrackerFileType) instead of class member variables? > > Done. Thanks for the example! These are much clearer, thanks. https://codereview.appspot.com/305770043/diff/100001/gslib/ui_controller.py#n... gslib/ui_controller.py:463: # '\ [2/3 completed files] [101.00 MiB/1.01 GiB] 10% Done'. On 2016/07/20 17:35:51, michelzel wrote: > On 2016/07/19 22:34:30, thobrla wrote: > > 10% would be padded by an additional space, yes? > > Do we want to space-pad the number of completed files so it is in a fixed > > location? > > Yes, I'll correct that. I'm not so sure about space-padding for the number of > completed files: let's say we have a operation of 100.9k files. We would stay a > long time displaying [ X.Yk/100.9k completed files]. I don't have a very strong > position about this, though. Couldn't you choose to display something like: [ 10/100.9k completed files] [2,359/100.9k completed files] [12.3k/100.9k completed files] ? https://codereview.appspot.com/305770043/diff/100001/gslib/util.py File gslib/util.py (right): https://codereview.appspot.com/305770043/diff/100001/gslib/util.py#newcode763 gslib/util.py:763: if hours >= 100: On 2016/07/20 17:35:51, michelzel wrote: > On 2016/07/19 22:34:30, thobrla wrote: > > 'Over 100 hours' takes up more space that the other estimates, which makes > > things lining up harder. > > > > What about: > > '>XXX hrs' where 100 <= X <= 999? > > > > '05:32:99' <-- lines up and has the same length > > I think the second has actually one more character, right (7 vs. 8)? What about > '100+ hrs' if X >= 100? Or perhaps '%d+ hrs' % min(999, X)? (Currently > implementing '%d+ hrs' % min(999, X)) Looks good. https://codereview.appspot.com/305770043/diff/140001/gslib/tests/test_ui.py File gslib/tests/test_ui.py (right): https://codereview.appspot.com/305770043/diff/140001/gslib/tests/test_ui.py#n... gslib/tests/test_ui.py:272: @SkipForS3('No resumable upload support for S3.') Still missing tests for: resumable download resumable upload with components resumable download with components https://codereview.appspot.com/305770043/diff/140001/gslib/tests/test_ui.py#n... gslib/tests/test_ui.py:553: # except for having to wait at least 2 seconds(considering the time informed Nit: space after paren https://codereview.appspot.com/305770043/diff/140001/gslib/tests/test_ui.py#n... gslib/tests/test_ui.py:560: # processed by the UIController. However, the start_time does not influence "does not influence this test" https://codereview.appspot.com/305770043/diff/140001/gslib/tests/test_ui.py#n... gslib/tests/test_ui.py:562: # difference between two messages, which is fixed in this text. s/text/test/ https://codereview.appspot.com/305770043/diff/140001/gslib/tests/test_ui.py#n... gslib/tests/test_ui.py:589: # We will send progress_calls_number ProgressMessages for each component These aren't components anymore, right? https://codereview.appspot.com/305770043/diff/140001/gslib/tests/test_ui.py#n... gslib/tests/test_ui.py:592: size1, j * size1 / 4, src_url1, In this case and the case above, we're testing a constant throughput. Should we also test a non-constant throughput case? https://codereview.appspot.com/305770043/diff/140001/gslib/tests/test_ui.py#n... gslib/tests/test_ui.py:601: # Time to finish the components and files. Again, components are not used in this test. https://codereview.appspot.com/305770043/diff/140001/gslib/util.py File gslib/util.py (right): https://codereview.appspot.com/305770043/diff/140001/gslib/util.py#newcode738 gslib/util.py:738: create a string of at most 6 characters, assuming num < 10^12. What about num > 10^12? Would it be hard to add a few more EXP_TEN_STRINGS to cover up to very large numbers?
Sign in to reply to this message.
https://codereview.appspot.com/305770043/diff/1/gslib/command.py File gslib/command.py (right): https://codereview.appspot.com/305770043/diff/1/gslib/command.py#newcode1837 gslib/command.py:1837: self.total_size += status_item.size On 2016/07/14 19:57:47, thobrla wrote: > On 2016/07/13 22:54:08, michelzel wrote: > > On 2016/07/12 21:32:35, thobrla wrote: > > > Once the producer thread finishes, would we want to adjust the number from > the > > > seek-ahead thread in case there is any discrepancy? For example, if during > a > > > long-running operation the user adds files, our final number should > represent > > > what was actually transferred. > > > > I'm not sure I understand. If I get what you are saying (one example would be > > trying to upload all the files of a local folder to a bucket, and during the > > operation adding more files to this folder, right?), would our operation > > actually upload this newly-added file? If so, I'm having a hard time imagining > > what we could in that case. > > I see two possible scenarios: trying to store on seek_ahead_thread the name of > > the files that we passed through (and we would have to add this argument to > the > > SeekAheadMessage), or updating total_size as we get new files. > > I think describing the actual total is better. The ProducerThread will operate > on its own list results, and that will reflect what we actually transfer. My > suggestion is once the ProducerThread finishes enumerating all tasks, we use > that as the final value instead of the SeekAheadThread since the source could > have changed in between those two times and the SeekAheadThread is there for > estimation only. I see. I'll add a finished field on ProducerThreadMessage to be used by the last ProducerThreadMessage we send. The good thing about creating that priority model was that we can easily add this new total size source and keep most of the code intact. https://codereview.appspot.com/305770043/diff/1/gslib/command.py#newcode1847 gslib/command.py:1847: file_progress = self.individual_file_progress[file_index] On 2016/07/14 19:57:47, thobrla wrote: > On 2016/07/13 22:54:09, michelzel wrote: > > On 2016/07/12 21:32:35, thobrla wrote: > > > It seems like when the file is finished, we should clean up/delete its > entries > > > or else index/progress will grow arbitrarily large. > > > > It will grow up to the number of objects, right? I could maybe create another > > compact dict or list of finished objects and erase all the component-specific > > information from those files. > > Yes, and there will potentially be millions or even billions of objects. It > doesn't seem right that this should take up multiple megabytes of memory if > we're only using 20 threads. Ok, I am deleting the file progress once we receive and end file FileMessage, and I'll add the file src_url on a dict that contains the src_url for all files finished. Using a dict instead of a list to make query faster (in case we need to check whether or not a file has finished). https://codereview.appspot.com/305770043/diff/1/gslib/command.py#newcode2067 gslib/command.py:2067: self.logger = logger if logger else CreateGsutilLogger('UI') On 2016/07/14 19:57:48, thobrla wrote: > On 2016/07/13 22:54:09, michelzel wrote: > > On 2016/07/12 21:32:34, thobrla wrote: > > > Is this used anymore? > > > > No, but since we talked about having a logger mode later on, I thought it > > wouldn't hurt to keep it. Do you think we should remove for now? > > In general, avoid adding code unless you need it now. Acknowledged. https://codereview.appspot.com/305770043/diff/1/gslib/commands/rewrite.py File gslib/commands/rewrite.py (right): https://codereview.appspot.com/305770043/diff/1/gslib/commands/rewrite.py#new... gslib/commands/rewrite.py:385: (fixed) position. On 2016/07/14 19:57:48, thobrla wrote: > On 2016/07/13 22:54:10, michelzel wrote: > > On 2016/07/12 21:32:36, thobrla wrote: > > > Why do we want these in a fixed position as opposed to outputting \n? > Aren't > > we > > > managing these via the UI thread? > > > > I think we should definitely output \n in this case, but those aren't treated > by > > the UIThread directly. This is actually similar to keeping the 'Estimated > number > > of objects ...' message for the UIThread: I kept it because it did not seem to > > affect us, and tests would fail otherwise. > > I will update the description here. > > I see - these are data operations so I think we need to handle them in the UI > thread. However, this can be done in a future CL. Can you add a TODO? Acknowledged. https://codereview.appspot.com/305770043/diff/20001/gslib/util.py File gslib/util.py (right): https://codereview.appspot.com/305770043/diff/20001/gslib/util.py#newcode1528 gslib/util.py:1528: def CreateGsutilLogger(command_name): On 2016/07/18 21:45:26, thobrla wrote: > On 2016/07/15 09:44:33, michelzel wrote: > > On 2016/07/14 19:57:51, thobrla wrote: > > > If you don't need this elsewhere, leave it in command.py for now. > > > > I use it in the UIThread in case we don't receive a logger (although we are > not > > quite using logger right now). > > If we're not using it, we should save it for the CL when we do use it. Ok, changed it. https://codereview.appspot.com/305770043/diff/40001/gslib/ui_controller.py File gslib/ui_controller.py (right): https://codereview.appspot.com/305770043/diff/40001/gslib/ui_controller.py#ne... gslib/ui_controller.py:56: update_time_period: minimum period for refreshing information. On 2016/07/18 21:45:26, thobrla wrote: > Here and below, what's the behavior of 0 values? I'll add the following comment: "A non-positive value will ignore any time restrictions imposed by this field." https://codereview.appspot.com/305770043/diff/60001/gslib/tests/test_ui.py File gslib/tests/test_ui.py (right): https://codereview.appspot.com/305770043/diff/60001/gslib/tests/test_ui.py#ne... gslib/tests/test_ui.py:107: fpath = self.CreateTempFile(file_name='looks-zipped.gz', On 2016/07/19 22:34:28, thobrla wrote: > On 2016/07/19 08:13:44, michelzel wrote: > > On 2016/07/18 21:45:26, thobrla wrote: > > > Are we testing something specific about gzip? > > > > No, it was just an example. Is it better to use a different extension? > > Nit: Generally speaking, don't add additional information to a testcase unless > it's needed. In this case, the file name implies that you might be testing > something related to gzip, when actually the file extension doesn't matter. Ok, got it. Thanks https://codereview.appspot.com/305770043/diff/60001/gslib/ui_controller.py File gslib/ui_controller.py (right): https://codereview.appspot.com/305770043/diff/60001/gslib/ui_controller.py#ne... gslib/ui_controller.py:353: period, it means that it is over, or it is on an extremely slow progress On 2016/07/19 22:34:28, thobrla wrote: > On 2016/07/19 08:13:44, michelzel wrote: > > On 2016/07/18 21:45:27, thobrla wrote: > > > What does "it is over" mean? That it was completed? It seems like we would > > get > > > a finished message for that. > > > > We will eventually. Is it guaranteed to be right after, though? > > I am looking for clarification on what "it is over" means in the comments. I changed the documentation to "it means that the file has finished but has yet to receive a FileMessage indicating so". https://codereview.appspot.com/305770043/diff/60001/gslib/ui_controller.py#ne... gslib/ui_controller.py:445: # '\ [2/3 completed files] [101.00 MiB/1.01 GiB] 10% Done, 82.34 MiBs' On 2016/07/19 22:34:28, thobrla wrote: > On 2016/07/19 08:13:45, michelzel wrote: > > On 2016/07/18 21:45:28, thobrla wrote: > > > Is it possible we'll have % done without throughput? It would be nice to > have > > > one guaranteed to precede the other so that the UI doesn't jump around on > the > > > user's screen. > > > > You are right that we might have something like progress + throughput and then > > progress + percentage + throughput. However, both scenarios can happen (the > > first being only displaying throughput and then throughput and percentage, and > > the second being only displaying percentage and then percentage and > throughput), > > so I don't think we have much of a choice here. It will only happen once > though, > > so I don't see it as a very big deal. Let me know if you have any suggestion > on > > how we may approach this. > > One possibility is choosing a strict ordering for printing them and not printing > the second one until we have the first. That doesn't seem like it would be > consistent, though. Another possibility would be to print placeholders: > ----- MiB/s > ---% done Ok. However, what about operations where we won't show some of these at all? Do you think it is better to put a placeholder and possibly not display any actual information as opposed to adding the information, only when it is ready to be printed? https://codereview.appspot.com/305770043/diff/100001/gslib/tests/test_ui.py File gslib/tests/test_ui.py (right): https://codereview.appspot.com/305770043/diff/100001/gslib/tests/test_ui.py#n... gslib/tests/test_ui.py:48: HALT_SIZE = 131072 On 2016/07/20 17:56:14, thobrla wrote: > On 2016/07/20 17:35:50, michelzel wrote: > > On 2016/07/19 22:34:29, thobrla wrote: > > > Why is this value chosen? Needs a comment. > > > > Done. > > No, I mean, what is the meaning of this size - I assume this has to do with > HaltingCopyCallbackHandler, so can we define it alongside that instead? Added a comment (I actually changed the value since we changed _START_BYTES_PER_CALLBACK in progress_callback). https://codereview.appspot.com/305770043/diff/100001/gslib/tests/test_ui.py#n... gslib/tests/test_ui.py:498: # There were 2-second periods when no progress was reported. The On 2016/07/20 17:56:14, thobrla wrote: > On 2016/07/20 17:35:50, michelzel wrote: > > On 2016/07/19 22:34:29, thobrla wrote: > > > Are the periods above really 2 seconds long? It seems like start time > varies > > by > > > 100 > > > > The time varies by 100 at most time, but between ProgressMessages, where I'm > > testing the throughput, it only varies by one. > > If it only varies by one, how is the comment correct that there were 2-second > periods where no progress was reported? You are right. In this case, it is not a 2-second period. This was for the others, I didn't think it through. https://codereview.appspot.com/305770043/diff/140001/gslib/tests/test_ui.py File gslib/tests/test_ui.py (right): https://codereview.appspot.com/305770043/diff/140001/gslib/tests/test_ui.py#n... gslib/tests/test_ui.py:272: @SkipForS3('No resumable upload support for S3.') On 2016/07/20 17:56:14, thobrla wrote: > Still missing tests for: > resumable download > resumable upload with components > resumable download with components Done. https://codereview.appspot.com/305770043/diff/140001/gslib/tests/test_ui.py#n... gslib/tests/test_ui.py:553: # except for having to wait at least 2 seconds(considering the time informed On 2016/07/20 17:56:14, thobrla wrote: > Nit: space after paren Acknowledged. https://codereview.appspot.com/305770043/diff/140001/gslib/tests/test_ui.py#n... gslib/tests/test_ui.py:560: # processed by the UIController. However, the start_time does not influence On 2016/07/20 17:56:14, thobrla wrote: > "does not influence this test" Acknowledged. https://codereview.appspot.com/305770043/diff/140001/gslib/tests/test_ui.py#n... gslib/tests/test_ui.py:562: # difference between two messages, which is fixed in this text. On 2016/07/20 17:56:14, thobrla wrote: > s/text/test/ That was ugly! https://codereview.appspot.com/305770043/diff/140001/gslib/tests/test_ui.py#n... gslib/tests/test_ui.py:589: # We will send progress_calls_number ProgressMessages for each component On 2016/07/20 17:56:14, thobrla wrote: > These aren't components anymore, right? Acknowledged. https://codereview.appspot.com/305770043/diff/140001/gslib/tests/test_ui.py#n... gslib/tests/test_ui.py:592: size1, j * size1 / 4, src_url1, On 2016/07/20 17:56:14, thobrla wrote: > In this case and the case above, we're testing a constant throughput. Should we > also test a non-constant throughput case? I'm not really concerned about that because that's not something the UI knows about. The only possible error would be if we were getting wrong values for both the progress and the time, at the same wrong proportion (getting progress on a 1-second basis instead of a 2-second basis, in this case). However, this is already covered by the last assert (the one with average_progress), which means we have covered the right progress for the right timespan. https://codereview.appspot.com/305770043/diff/140001/gslib/tests/test_ui.py#n... gslib/tests/test_ui.py:601: # Time to finish the components and files. On 2016/07/20 17:56:14, thobrla wrote: > Again, components are not used in this test. Acknowledged. https://codereview.appspot.com/305770043/diff/140001/gslib/util.py File gslib/util.py (right): https://codereview.appspot.com/305770043/diff/140001/gslib/util.py#newcode738 gslib/util.py:738: create a string of at most 6 characters, assuming num < 10^12. On 2016/07/20 17:56:15, thobrla wrote: > What about num > 10^12? Would it be hard to add a few more EXP_TEN_STRINGS to > cover up to very large numbers? Will add trillion ('t') and quadrillion ('q') to EXP_TEN_STRINGS.
Sign in to reply to this message.
https://codereview.appspot.com/305770043/diff/140001/gslib/tests/test_ui.py File gslib/tests/test_ui.py (right): https://codereview.appspot.com/305770043/diff/140001/gslib/tests/test_ui.py#n... gslib/tests/test_ui.py:589: # We will send progress_calls_number ProgressMessages for each component On 2016/07/21 14:48:57, michelzel wrote: > On 2016/07/20 17:56:14, thobrla wrote: > > These aren't components anymore, right? > > Acknowledged. But not fixed? https://codereview.appspot.com/305770043/diff/140001/gslib/tests/test_ui.py#n... gslib/tests/test_ui.py:601: # Time to finish the components and files. On 2016/07/21 14:48:57, michelzel wrote: > On 2016/07/20 17:56:14, thobrla wrote: > > Again, components are not used in this test. > > Acknowledged. But not fixed? https://codereview.appspot.com/305770043/diff/140001/gslib/tests/util.py File gslib/tests/util.py (right): https://codereview.appspot.com/305770043/diff/140001/gslib/tests/util.py#newc... gslib/tests/util.py:489: # Custom test callbacks must be pickleable, and therefore at global scope. Nit: I think this comment is still useful. https://codereview.appspot.com/305770043/diff/160001/gslib/tests/test_cp.py File gslib/tests/test_cp.py (right): https://codereview.appspot.com/305770043/diff/160001/gslib/tests/test_cp.py#n... gslib/tests/test_cp.py:2226: for _ in xrange(ONE_KIB * 512)]) Why did you change this? https://codereview.appspot.com/305770043/diff/160001/gslib/tests/test_ui.py File gslib/tests/test_ui.py (right): https://codereview.appspot.com/305770043/diff/160001/gslib/tests/test_ui.py#n... gslib/tests/test_ui.py:347: This was adapted from _test_cp_resumable_download_break_helper Nit: Here and below, end sentences in periods, add whitespace line before "Args:" https://codereview.appspot.com/305770043/diff/160001/gslib/tests/test_ui.py#n... gslib/tests/test_ui.py:352: gsutil_flags: List of flags to run gsutil with. Or None? https://codereview.appspot.com/305770043/diff/160001/gslib/tests/test_ui.py#n... gslib/tests/test_ui.py:371: self.assertIn('Artifically halting download.', stderr) Here and below, what do we expect the UI to look like during this first run? https://codereview.appspot.com/305770043/diff/160001/gslib/tests/test_ui.py#n... gslib/tests/test_ui.py:401: def _test_ui_composite_encrypted_upload_resume_helper(self, You should drop the encryption part from this, I think, as it shouldn't impact the UI or calls. https://codereview.appspot.com/305770043/diff/160001/gslib/thread_message.py File gslib/thread_message.py (right): https://codereview.appspot.com/305770043/diff/160001/gslib/thread_message.py#... gslib/thread_message.py:60: # was to process the component alongside FileMessages from other components s/was/is/ https://codereview.appspot.com/305770043/diff/160001/gslib/ui_controller.py File gslib/ui_controller.py (right): https://codereview.appspot.com/305770043/diff/160001/gslib/ui_controller.py#n... gslib/ui_controller.py:261: del file_progress This won't actually remove it from the individual_file_progress. You'll need to do: del self.individual_file_progress[file_name] Additionally, the goal is not to save any of the finished file names in memory - why do we need to remember them? files_finished_dict is completely unused. https://codereview.appspot.com/305770043/diff/160001/gslib/util.py File gslib/util.py (right): https://codereview.appspot.com/305770043/diff/160001/gslib/util.py#newcode743 gslib/util.py:743: for i in range(len(_EXP_TEN_STRING)-1, 0, -1): Nit: Consider: for divisor, abbrev in reversed(_EXP_TEN_STRING):
Sign in to reply to this message.
https://codereview.appspot.com/305770043/diff/140001/gslib/tests/test_ui.py File gslib/tests/test_ui.py (right): https://codereview.appspot.com/305770043/diff/140001/gslib/tests/test_ui.py#n... gslib/tests/test_ui.py:589: # We will send progress_calls_number ProgressMessages for each component On 2016/07/21 16:55:43, thobrla wrote: > On 2016/07/21 14:48:57, michelzel wrote: > > On 2016/07/20 17:56:14, thobrla wrote: > > > These aren't components anymore, right? > > > > Acknowledged. > > But not fixed? Changed in the test with components, sorry! https://codereview.appspot.com/305770043/diff/140001/gslib/tests/test_ui.py#n... gslib/tests/test_ui.py:601: # Time to finish the components and files. On 2016/07/21 16:55:43, thobrla wrote: > On 2016/07/21 14:48:57, michelzel wrote: > > On 2016/07/20 17:56:14, thobrla wrote: > > > Again, components are not used in this test. > > > > Acknowledged. > > But not fixed? Changed in the test with components, sorry! https://codereview.appspot.com/305770043/diff/140001/gslib/tests/util.py File gslib/tests/util.py (right): https://codereview.appspot.com/305770043/diff/140001/gslib/tests/util.py#newc... gslib/tests/util.py:489: # Custom test callbacks must be pickleable, and therefore at global scope. On 2016/07/21 16:55:43, thobrla wrote: > Nit: I think this comment is still useful. I'll put it back. https://codereview.appspot.com/305770043/diff/160001/gslib/tests/test_cp.py File gslib/tests/test_cp.py (right): https://codereview.appspot.com/305770043/diff/160001/gslib/tests/test_cp.py#n... gslib/tests/test_cp.py:2226: for _ in xrange(ONE_KIB * 512)]) On 2016/07/21 16:55:44, thobrla wrote: > Why did you change this? Because we changed the halt size. I might be wrong, but from what I understood, the 128 was there because it was the halt_size, which was 2 times the start_bytes_per_callback, which was changed. Should I put self.halt_size instead of 512? I think it makes more sense, right? https://codereview.appspot.com/305770043/diff/160001/gslib/tests/test_ui.py File gslib/tests/test_ui.py (right): https://codereview.appspot.com/305770043/diff/160001/gslib/tests/test_ui.py#n... gslib/tests/test_ui.py:347: This was adapted from _test_cp_resumable_download_break_helper On 2016/07/21 16:55:44, thobrla wrote: > Nit: Here and below, end sentences in periods, add whitespace line before > "Args:" Acknowledged. https://codereview.appspot.com/305770043/diff/160001/gslib/tests/test_ui.py#n... gslib/tests/test_ui.py:352: gsutil_flags: List of flags to run gsutil with. On 2016/07/21 16:55:44, thobrla wrote: > Or None? Yes, I'll update it. I had actually put [] as the default value, but pylint suggested it was dangerous, and I forgot to put None in the comments. https://codereview.appspot.com/305770043/diff/160001/gslib/tests/test_ui.py#n... gslib/tests/test_ui.py:371: self.assertIn('Artifically halting download.', stderr) On 2016/07/21 16:55:44, thobrla wrote: > Here and below, what do we expect the UI to look like during this first run? I'm guessing we basically need to check if we do not have any signal of the UI having finished, basically the opposite of CheckUIOutputWith(No)MFlag https://codereview.appspot.com/305770043/diff/160001/gslib/tests/test_ui.py#n... gslib/tests/test_ui.py:401: def _test_ui_composite_encrypted_upload_resume_helper(self, On 2016/07/21 16:55:44, thobrla wrote: > You should drop the encryption part from this, I think, as it shouldn't impact > the UI or calls. Done. https://codereview.appspot.com/305770043/diff/160001/gslib/thread_message.py File gslib/thread_message.py (right): https://codereview.appspot.com/305770043/diff/160001/gslib/thread_message.py#... gslib/thread_message.py:60: # was to process the component alongside FileMessages from other components On 2016/07/21 16:55:44, thobrla wrote: > s/was/is/ Done. https://codereview.appspot.com/305770043/diff/160001/gslib/ui_controller.py File gslib/ui_controller.py (right): https://codereview.appspot.com/305770043/diff/160001/gslib/ui_controller.py#n... gslib/ui_controller.py:261: del file_progress On 2016/07/21 16:55:44, thobrla wrote: > This won't actually remove it from the individual_file_progress. You'll need to > do: > del self.individual_file_progress[file_name] > > Additionally, the goal is not to save any of the finished file names in memory - > why do we need to remember them? files_finished_dict is completely unused. Thanks for the del part. We really are not using files_finished_dict, but I thought it could be useful in the future. But, again, if that's the case, it should be on a future CL. https://codereview.appspot.com/305770043/diff/160001/gslib/util.py File gslib/util.py (right): https://codereview.appspot.com/305770043/diff/160001/gslib/util.py#newcode743 gslib/util.py:743: for i in range(len(_EXP_TEN_STRING)-1, 0, -1): On 2016/07/21 16:55:44, thobrla wrote: > Nit: Consider: > for divisor, abbrev in reversed(_EXP_TEN_STRING): That's cool, did not know about this reversed.
Sign in to reply to this message.
Getting very close! https://codereview.appspot.com/305770043/diff/160001/gslib/tests/test_cp.py File gslib/tests/test_cp.py (right): https://codereview.appspot.com/305770043/diff/160001/gslib/tests/test_cp.py#n... gslib/tests/test_cp.py:2226: for _ in xrange(ONE_KIB * 512)]) On 2016/07/21 18:58:26, michelzel wrote: > On 2016/07/21 16:55:44, thobrla wrote: > > Why did you change this? > > Because we changed the halt size. I might be wrong, but from what I understood, > the 128 was there because it was the halt_size, which was 2 times the > start_bytes_per_callback, which was changed. Should I put self.halt_size instead > of 512? I think it makes more sense, right? using halt_size is fine. https://codereview.appspot.com/305770043/diff/200001/gslib/util.py File gslib/util.py (right): https://codereview.appspot.com/305770043/diff/200001/gslib/util.py#newcode155 gslib/util.py:155: (12, 't'), Add tests for the new 't' and 'q' cases?
Sign in to reply to this message.
Hopefully that should do it! :) https://codereview.appspot.com/305770043/diff/200001/gslib/util.py File gslib/util.py (right): https://codereview.appspot.com/305770043/diff/200001/gslib/util.py#newcode155 gslib/util.py:155: (12, 't'), On 2016/07/21 20:20:09, thobrla wrote: > Add tests for the new 't' and 'q' cases? Done.
Sign in to reply to this message.
LGTM
Sign in to reply to this message.
Looks good, one nit. https://codereview.appspot.com/305770043/diff/220001/gslib/ui_controller.py File gslib/ui_controller.py (right): https://codereview.appspot.com/305770043/diff/220001/gslib/ui_controller.py#n... gslib/ui_controller.py:477: string_to_print = ( It might be nicer if you used format strings with names here: FORMAT_STRING = "[{progress_char} [{finished_file_count}/{total_files} completed files][{bytes_completed}/{total_bytes}]..." string_to_print = FORMAT_STRING.format( progress_char = char_to_print, finished_file_count= _DecimalShort(self.files_finished), total_files = _DecimalShort(self.file_number), ...) Ditto below.
Sign in to reply to this message.
https://codereview.appspot.com/305770043/diff/220001/gslib/ui_controller.py File gslib/ui_controller.py (right): https://codereview.appspot.com/305770043/diff/220001/gslib/ui_controller.py#n... gslib/ui_controller.py:477: string_to_print = ( On 2016/07/25 23:47:51, yarbrough wrote: > It might be nicer if you used format strings with names here: > > FORMAT_STRING = "[{progress_char} [{finished_file_count}/{total_files} completed > files][{bytes_completed}/{total_bytes}]..." > string_to_print = FORMAT_STRING.format( > progress_char = char_to_print, > finished_file_count= _DecimalShort(self.files_finished), > total_files = _DecimalShort(self.file_number), > ...) > > Ditto below. Ok, nice, will do that.
Sign in to reply to this message.
https://codereview.appspot.com/305770043/diff/240001/gslib/tests/test_ui.py File gslib/tests/test_ui.py (right): https://codereview.appspot.com/305770043/diff/240001/gslib/tests/test_ui.py#n... gslib/tests/test_ui.py:101: command does not give us a total_size, we cannot infer Why do we not have the total size on rewrite?
Sign in to reply to this message.
https://codereview.appspot.com/305770043/diff/240001/gslib/tests/test_ui.py File gslib/tests/test_ui.py (right): https://codereview.appspot.com/305770043/diff/240001/gslib/tests/test_ui.py#n... gslib/tests/test_ui.py:101: command does not give us a total_size, we cannot infer On 2016/07/27 20:45:16, thobrla wrote: > Why do we not have the total size on rewrite? It does not have the 'size' field, so total_size sums 0 on the ProducerThread. I made some tests here, and the result of args.expanded_result when doing a rewrite operation is {"name": "gsutil-test-test_ui_rewrite_with_m_flag-obj-640134e2"}, which means we are not able to retrieve the size from there. Suggestions?
Sign in to reply to this message.
https://codereview.appspot.com/305770043/diff/240001/gslib/tests/test_ui.py File gslib/tests/test_ui.py (right): https://codereview.appspot.com/305770043/diff/240001/gslib/tests/test_ui.py#n... gslib/tests/test_ui.py:101: command does not give us a total_size, we cannot infer On 2016/07/27 21:03:24, michelzel wrote: > On 2016/07/27 20:45:16, thobrla wrote: > > Why do we not have the total size on rewrite? > > It does not have the 'size' field, so total_size sums 0 on the ProducerThread. > I made some tests here, and the result of args.expanded_result when doing a > rewrite operation is {"name": > "gsutil-test-test_ui_rewrite_with_m_flag-obj-640134e2"}, which means we are not > able to retrieve the size from there. Suggestions? You can include the size in the NameExpansionIterator fields list in rewrite.py. NameExpansionIterator(..., bucket_listing_fields=['name', 'size'])
Sign in to reply to this message.
https://codereview.appspot.com/305770043/diff/240001/gslib/tests/test_ui.py File gslib/tests/test_ui.py (right): https://codereview.appspot.com/305770043/diff/240001/gslib/tests/test_ui.py#n... gslib/tests/test_ui.py:101: command does not give us a total_size, we cannot infer On 2016/07/27 21:08:14, thobrla wrote: > On 2016/07/27 21:03:24, michelzel wrote: > > On 2016/07/27 20:45:16, thobrla wrote: > > > Why do we not have the total size on rewrite? > > > > It does not have the 'size' field, so total_size sums 0 on the ProducerThread. > > I made some tests here, and the result of args.expanded_result when doing a > > rewrite operation is {"name": > > "gsutil-test-test_ui_rewrite_with_m_flag-obj-640134e2"}, which means we are > not > > able to retrieve the size from there. Suggestions? > > You can include the size in the NameExpansionIterator fields list in rewrite.py. > > NameExpansionIterator(..., bucket_listing_fields=['name', 'size']) Awesome, it's already working!
Sign in to reply to this message.
|