On 2018/04/01 17:32:39, Joachim Metz wrote:
I don't think "to merge" is the name to go with here. Having "merge" and "to
merge" directory names is very confusing. From what I can tell, the idea of this
change was to reduce the amount of time spent iterating over filenames of
completed tasks, for situations where there are a large number of files waiting
to be merged. What sort of performance increase do you get? My profiling
suggested most of the time spent in the "pending merge" check was in string
format() calls, not listdir() calls, though clearly, reducing the number of
files to iterate over will help.
Memoizing task paths might be a less complex way of improving performance,
compared to adding a new directory and task state. We've had a lot of issues
nailing down different task transitions, so I'm not too excited about changing
them again.
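
As a rough sketch of what I mean by memoizing: build each task's path once and
reuse it on later checks, instead of calling format() every time. The class
name, the method names and the ".plaso" filename pattern below are assumptions
for illustration only, not the current task manager code.

```python
import os


class TaskPathCache(object):
  """Hypothetical helper that memoizes task storage file paths."""

  def __init__(self, merge_directory):
    """Initializes the cache.

    Args:
      merge_directory (str): directory that holds files waiting to be merged.
    """
    self._merge_directory = merge_directory
    # Maps a task identifier to the path that was built for it.
    self._paths = {}

  def GetPath(self, task_identifier):
    """Retrieves the path for a task, formatting it only on first use.

    Args:
      task_identifier (str): identifier of the task.

    Returns:
      str: path of the task storage file.
    """
    path = self._paths.get(task_identifier)
    if path is None:
      # Assumed filename pattern; the real one may differ.
      filename = '{0:s}.plaso'.format(task_identifier)
      path = os.path.join(self._merge_directory, filename)
      self._paths[task_identifier] = path
    return path

  def Forget(self, task_identifier):
    """Drops the cached path, for example once the task has been merged."""
    self._paths.pop(task_identifier, None)
```

The cache would need Forget() to be called when a task is merged or abandoned,
otherwise it grows with the number of tasks. That is still only one extra
dictionary, with no new directory and no new task state.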

https://codereview.appspot.com/338670043/diff/1/plaso/multi_processing/task_manager.py
File plaso/multi_processing/task_manager.py (right):

https://codereview.appspot.com/338670043/diff/1/plaso/multi_processing/task_manager.py#newcode276
plaso/multi_processing/task_manager.py:276: """Retrieves the task when it is mergeable.
You seem ...

https://codereview.appspot.com/338670043/diff/1/plaso/multi_processing/task_manager.py
File plaso/multi_processing/task_manager.py (right):

https://codereview.appspot.com/338670043/diff/1/plaso/multi_processing/task_manager.py#newcode276
plaso/multi_processing/task_manager.py:276: """Retrieves the task when it is mergeable.
Done.

https://codereview.appspot.com/338670043/diff/1/plaso/storage/interface.py ...

Issue 338670043: [plaso] Added processed directory to speed up listdir
(Closed)
Created 6 years ago by Joachim Metz
Modified 5 years, 11 months ago
Reviewers: onager
Comments: 42