|
|
Created:
8 years, 8 months ago by Joachim Metz Modified:
8 years, 7 months ago CC:
log2timeline-dev_googlegroups.com Visibility:
Public. |
DescriptionData stream support and special metadata file handling #316 #199
Requires:
* dfvfs 235930043
Patch Set 1 : Clean up. #
Total comments: 44
Patch Set 2 : Changes after review. #
Total comments: 22
Patch Set 3 : Changes after review. #
MessagesTotal messages: 18
Code updated.
Sign in to reply to this message.
On 2015/08/31 14:00:10, whorebrittany wrote: fuck me
Sign in to reply to this message.
I think FUD-bots comments need some more work ;)
Sign in to reply to this message.
although brittany has already replied to your CL I'll still pitch in with few remarks, https://codereview.appspot.com/258650043/diff/20001/plaso/engine/collector.py File plaso/engine/collector.py (right): https://codereview.appspot.com/258650043/diff/20001/plaso/engine/collector.py... plaso/engine/collector.py:184: self._hashlist = {} and no need to track the number of file entries anymore? https://codereview.appspot.com/258650043/diff/20001/plaso/engine/collector.py... plaso/engine/collector.py:185: why removing the __enter__ and __exit__? https://codereview.appspot.com/258650043/diff/20001/plaso/engine/mediator.py File plaso/engine/mediator.py (right): https://codereview.appspot.com/258650043/diff/20001/plaso/engine/mediator.py#... plaso/engine/mediator.py:97: file_entry.path_spec.type_indicator, relative_path) here we are repeating https://github.com/log2timeline/plaso/issues/310 and why creating an engine mediator and not just use the parser mediator that already has logic for getting display names? Are we removing the parser mediator or ?
Sign in to reply to this message.
*16* of *3,477* [image: Collapse all] [image: Print all] [image: In new window] Re: Data stream support. (issue 258650043 by joachim.metz@gmail.com) Inbox x brittany@completemedcare.net via <https://support.google.com/mail/answer/1311182?hl=en> m3kw2wvrgufz5godrsrytgd7.apphosting.bounces.google.com 9:00 AM (7 hours ago) Reply to joachim.metz, kiddi, onager, log2timeline-d., reply https://codereview.appspot.com/258650043/ brittany@completemedcare.net via <https://support.google.com/mail/answer/1311182?hl=en> m3kw2wvrgufz5godrsrytgd7.apphosting.bounces.google.com 9:04 AM (7 hours ago) Reply to joachim.metz, kiddi, onager, log2timeline-d., reply On 2015/08/31 14:00:10, whorebrittany wrote: fuck me https://codereview.appspot.com/258650043/ *Brittany Alaniz NR-CMA, NCT* *Complete Med Care* 8989 Forest Ln # 146 Dallas, TX 75243 (972)-792-7777 On Mon, Aug 31, 2015 at 9:04 AM, <brittany@completemedcare.net> wrote: > On 2015/08/31 14:00:10, whorebrittany wrote: > > fuck me > > https://codereview.appspot.com/258650043/ >
Sign in to reply to this message.
https://codereview.appspot.com/258650043/diff/20001/plaso/engine/collector.py File plaso/engine/collector.py (right): https://codereview.appspot.com/258650043/diff/20001/plaso/engine/collector.py... plaso/engine/collector.py:184: self._hashlist = {} On 2015/08/31 20:36:26, kiddi wrote: > and no need to track the number of file entries anymore? We don't use it. Dead code https://codereview.appspot.com/258650043/diff/20001/plaso/engine/collector.py... plaso/engine/collector.py:185: On 2015/08/31 20:36:26, kiddi wrote: > why removing the __enter__ and __exit__? We don't use it. Dead code https://codereview.appspot.com/258650043/diff/20001/plaso/engine/mediator.py File plaso/engine/mediator.py (right): https://codereview.appspot.com/258650043/diff/20001/plaso/engine/mediator.py#... plaso/engine/mediator.py:97: file_entry.path_spec.type_indicator, relative_path) The changes from issue 310 need to be merged. This is a split off of common code to make the parser mediator less bloated.
Sign in to reply to this message.
I'm not sure the design is right here. Some comments about the mediator(s) specifically. https://codereview.appspot.com/258650043/diff/20001/plaso/engine/collector.py File plaso/engine/collector.py (right): https://codereview.appspot.com/258650043/diff/20001/plaso/engine/collector.py... plaso/engine/collector.py:243: file_entry: a file entry (instance of dfvfs.FileEntry). the file entry that refers to the directory to process. https://codereview.appspot.com/258650043/diff/20001/plaso/engine/mediator.py File plaso/engine/mediator.py (right): https://codereview.appspot.com/258650043/diff/20001/plaso/engine/mediator.py#... plaso/engine/mediator.py:2: """The engine file entry mediator object.""" "mediator" isn't the right design pattern here. This is just a wrapper for a file entry, whereas the parser mediator provides access to a range of functionality provided by a range of different subsystems (queue interfaces, the KB etc.) https://codereview.appspot.com/258650043/diff/20001/plaso/engine/mediator.py#... plaso/engine/mediator.py:10: """Class that implements the file entry mediator.""" Which does what? "A file entry mediator provides convenience methods for interacting with a single file entry at a time." https://codereview.appspot.com/258650043/diff/20001/plaso/engine/mediator.py#... plaso/engine/mediator.py:18: """Retrieves the relative path. Retrieves the path from the pathspec relative to... https://codereview.appspot.com/258650043/diff/20001/plaso/engine/mediator.py#... plaso/engine/mediator.py:24: A string containing the relative path or None. When will this return None? https://codereview.appspot.com/258650043/diff/20001/plaso/engine/mediator.py#... plaso/engine/mediator.py:55: """Retrieves the display name for the active file entry. Generates a human readable string describing a file entry object. Nothing is being "retrieved", the string is built here. Also, we don't describe what a DisplayName is anywhere. https://codereview.appspot.com/258650043/diff/20001/plaso/engine/worker.py File plaso/engine/worker.py (right): https://codereview.appspot.com/258650043/diff/20001/plaso/engine/worker.py#ne... plaso/engine/worker.py:128: file_object: the file-like object to be hashed. "whose contents will be hashed" https://codereview.appspot.com/258650043/diff/20001/plaso/engine/worker.py#ne... plaso/engine/worker.py:151: The resulting digest hashes are set in the parser mediator as attribute +s https://codereview.appspot.com/258650043/diff/20001/plaso/engine/worker.py#ne... plaso/engine/worker.py:153: to have data streams, e.g. NTFS. So? The file entry arg may actually be a directory, not just a file? https://codereview.appspot.com/258650043/diff/20001/plaso/engine/worker.py#ne... plaso/engine/worker.py:166: self._current_display_name)) Recurrent theme - this log line doesn't account for different data streams in/on the same file. It's going to be important to know when we run into an error if its in a particular stream on a file. https://codereview.appspot.com/258650043/diff/20001/plaso/engine/worker.py#ne... plaso/engine/worker.py:192: digest_hash_string, self._current_display_name)) Need to do something with the stream name here too. https://codereview.appspot.com/258650043/diff/20001/plaso/engine/worker.py#ne... plaso/engine/worker.py:221: will use the parser mediator to open the file-like object. So this will open the default data stream of the file_entry set in the mediator, right? https://codereview.appspot.com/258650043/diff/20001/plaso/engine/worker.py#ne... plaso/engine/worker.py:436: """Processes a specific data stream of a file entry. Duplicate docstring? If not, how is this method different to _ProcessDataStream? https://codereview.appspot.com/258650043/diff/20001/plaso/engine/worker.py#ne... plaso/engine/worker.py:440: data_stream_name: optional data stream name. The default is Do we need this arg here? https://codereview.appspot.com/258650043/diff/20001/plaso/engine/worker.py#ne... plaso/engine/worker.py:524: data_stream_name = getattr(path_spec, u'data_stream', None) It seems like our abstractions are breaking down a little here - we have to go from pathspec->file_entry->pathspec+file_entry->file_object, which is pretty messy. Any thoughts on a better way to do this? Some other class that encapsulates a file_entry with the data stream information? https://codereview.appspot.com/258650043/diff/20001/plaso/hashers/manager.py File plaso/hashers/manager.py (right): https://codereview.appspot.com/258650043/diff/20001/plaso/hashers/manager.py#... plaso/hashers/manager.py:140: def HashFileObject(cls, hasher_names_string, file_object, buffer_size=4096): Lets not duplicate the default buffer size here - it's already set in the worker. https://codereview.appspot.com/258650043/diff/20001/plaso/parsers/interface.py File plaso/parsers/interface.py (right): https://codereview.appspot.com/258650043/diff/20001/plaso/parsers/interface.p... plaso/parsers/interface.py:247: # The initial file offset set to None if not set. This isn't None https://codereview.appspot.com/258650043/diff/20001/plaso/parsers/interface.p... plaso/parsers/interface.py:250: def Parse(self, parser_mediator, file_object=None, **kwargs): This method is now very convoluted. Can it be refactored, perhaps separate out the parse-specific-file-object case from the use-file-object-from-mediator case? https://codereview.appspot.com/258650043/diff/20001/plaso/parsers/interface.p... plaso/parsers/interface.py:299: will use the parser mediator to open the file-like object. which file-like object? https://codereview.appspot.com/258650043/diff/20001/plaso/parsers/mediator.py File plaso/parsers/mediator.py (right): https://codereview.appspot.com/258650043/diff/20001/plaso/parsers/mediator.py... plaso/parsers/mediator.py:14: class ParserMediator(file_entry_mediator.FileEntryMediator): As noted in the engine mediator file, this isn't right - the parser mediator "has a" file entry, but it isn't a file entry. And the file entry mediator is just a wrapper, whereas the parser mediator simplifies a bunch of interfaces for the clients (parsers).
Sign in to reply to this message.
onager as discussed wrapping dfvfs file entry into a plaso one could solve our needs as well
Sign in to reply to this message.
https://codereview.appspot.com/258650043/diff/20001/plaso/engine/collector.py File plaso/engine/collector.py (right): https://codereview.appspot.com/258650043/diff/20001/plaso/engine/collector.py... plaso/engine/collector.py:243: file_entry: a file entry (instance of dfvfs.FileEntry). On 2015/09/01 12:47:46, onager wrote: > the file entry that refers to the directory to process. Done. https://codereview.appspot.com/258650043/diff/20001/plaso/engine/mediator.py File plaso/engine/mediator.py (right): https://codereview.appspot.com/258650043/diff/20001/plaso/engine/mediator.py#... plaso/engine/mediator.py:2: """The engine file entry mediator object.""" On 2015/09/01 12:47:47, onager wrote: > "mediator" isn't the right design pattern here. This is just a wrapper for a > file entry, whereas the parser mediator provides access to a range of > functionality provided by a range of different subsystems (queue interfaces, the > KB etc.) Acknowledged. https://codereview.appspot.com/258650043/diff/20001/plaso/engine/mediator.py#... plaso/engine/mediator.py:10: """Class that implements the file entry mediator.""" On 2015/09/01 12:47:47, onager wrote: > Which does what? > > "A file entry mediator provides convenience methods for interacting with a > single file entry at a time." Acknowledged. https://codereview.appspot.com/258650043/diff/20001/plaso/engine/mediator.py#... plaso/engine/mediator.py:18: """Retrieves the relative path. On 2015/09/01 12:47:46, onager wrote: > Retrieves the path from the pathspec relative to... Acknowledged. https://codereview.appspot.com/258650043/diff/20001/plaso/engine/mediator.py#... plaso/engine/mediator.py:24: A string containing the relative path or None. On 2015/09/01 12:47:47, onager wrote: > When will this return None? Acknowledged. https://codereview.appspot.com/258650043/diff/20001/plaso/engine/mediator.py#... plaso/engine/mediator.py:55: """Retrieves the display name for the active file entry. On 2015/09/01 12:47:47, onager wrote: > Generates a human readable string describing a file entry object. > > Nothing is being "retrieved", the string is built here. Also, we don't describe > what a DisplayName is anywhere. Acknowledged. https://codereview.appspot.com/258650043/diff/20001/plaso/engine/worker.py File plaso/engine/worker.py (right): https://codereview.appspot.com/258650043/diff/20001/plaso/engine/worker.py#ne... plaso/engine/worker.py:128: file_object: the file-like object to be hashed. On 2015/09/01 12:47:47, onager wrote: > "whose contents will be hashed" Acknowledged. https://codereview.appspot.com/258650043/diff/20001/plaso/engine/worker.py#ne... plaso/engine/worker.py:151: The resulting digest hashes are set in the parser mediator as attribute On 2015/09/01 12:47:47, onager wrote: > +s Done. https://codereview.appspot.com/258650043/diff/20001/plaso/engine/worker.py#ne... plaso/engine/worker.py:153: to have data streams, e.g. NTFS. Yes. File entry is short for file system entry, which can be a link, directory or regular file. https://codereview.appspot.com/258650043/diff/20001/plaso/engine/worker.py#ne... plaso/engine/worker.py:166: self._current_display_name)) On 2015/09/01 12:47:47, onager wrote: > Recurrent theme - this log line doesn't account for different data streams in/on > the same file. It's going to be important to know when we run into an error if > its in a particular stream on a file. Acknowledged. Probably best to do so in the display name. https://codereview.appspot.com/258650043/diff/20001/plaso/engine/worker.py#ne... plaso/engine/worker.py:192: digest_hash_string, self._current_display_name)) On 2015/09/01 12:47:47, onager wrote: > Need to do something with the stream name here too. Acknowledged. https://codereview.appspot.com/258650043/diff/20001/plaso/engine/worker.py#ne... plaso/engine/worker.py:221: will use the parser mediator to open the file-like object. The mediator only has understanding of a file entry not of a data stream. https://codereview.appspot.com/258650043/diff/20001/plaso/engine/worker.py#ne... plaso/engine/worker.py:440: data_stream_name: optional data stream name. The default is yes https://codereview.appspot.com/258650043/diff/20001/plaso/engine/worker.py#ne... plaso/engine/worker.py:524: data_stream_name = getattr(path_spec, u'data_stream', None) No, file entry is metadata file object is data. Pathspec can point to both. https://codereview.appspot.com/258650043/diff/20001/plaso/parsers/interface.py File plaso/parsers/interface.py (right): https://codereview.appspot.com/258650043/diff/20001/plaso/parsers/interface.p... plaso/parsers/interface.py:247: # The initial file offset set to None if not set. Actually something different is meant here. https://codereview.appspot.com/258650043/diff/20001/plaso/parsers/mediator.py File plaso/parsers/mediator.py (right): https://codereview.appspot.com/258650043/diff/20001/plaso/parsers/mediator.py... plaso/parsers/mediator.py:14: class ParserMediator(file_entry_mediator.FileEntryMediator): I'll move this to a separate CL
Sign in to reply to this message.
Code updated.
Sign in to reply to this message.
https://codereview.appspot.com/258650043/diff/20001/plaso/engine/worker.py File plaso/engine/worker.py (right): https://codereview.appspot.com/258650043/diff/20001/plaso/engine/worker.py#ne... plaso/engine/worker.py:153: to have data streams, e.g. NTFS. On 2015/09/15 15:44:32, Joachim Metz wrote: > Yes. File entry is short for file system entry, which can be a link, directory > or regular file. But why mention this? Why is this important to know here? https://codereview.appspot.com/258650043/diff/20001/plaso/engine/worker.py#ne... plaso/engine/worker.py:221: will use the parser mediator to open the file-like object. On 2015/09/15 15:44:32, Joachim Metz wrote: > The mediator only has understanding of a file entry not of a data stream. So why not change this? Have the parser mediator operate per file object, rather than per file entry. https://codereview.appspot.com/258650043/diff/40001/plaso/engine/worker.py File plaso/engine/worker.py (right): https://codereview.appspot.com/258650043/diff/40001/plaso/engine/worker.py#ne... plaso/engine/worker.py:28: and directories (file entries) for which events need to be extracted. files, directories and data streams https://codereview.appspot.com/258650043/diff/40001/plaso/engine/worker.py#ne... plaso/engine/worker.py:31: for parsing a particular file is available. All extracted event objects -file +pathspec https://codereview.appspot.com/258650043/diff/40001/plaso/engine/worker.py#ne... plaso/engine/worker.py:125: """Determines if a file matches one of the known signatures. -file +file like object https://codereview.appspot.com/258650043/diff/40001/plaso/engine/worker.py#ne... plaso/engine/worker.py:149: def _HashFileEntry(self, file_entry, data_stream_name=u''): This doesn't actually hash a file entry - it hashes a data stream of a file entry. Rename the method? https://codereview.appspot.com/258650043/diff/40001/plaso/engine/worker.py#ne... plaso/engine/worker.py:154: allow directories to have data streams, e.g. NTFS. Why note this? https://codereview.appspot.com/258650043/diff/40001/plaso/engine/worker.py#ne... plaso/engine/worker.py:157: file_entry: the file entry object to be hashed (instance of the file entry whose data stream will be hashed, or "the file entry relating to the data to be hashed" or something https://codereview.appspot.com/258650043/diff/40001/plaso/engine/worker.py#ne... plaso/engine/worker.py:222: will use the parser mediator to open the file-like object. to open the file_entry's default data stream as a file-like object? https://codereview.appspot.com/258650043/diff/40001/plaso/engine/worker.py#ne... plaso/engine/worker.py:470: # We always want to use the filestat parser if set. ...but we only want to invoke it once per file entry, so we only use it if we're not processing a named data stream? https://codereview.appspot.com/258650043/diff/40001/plaso/engine/worker.py#ne... plaso/engine/worker.py:525: data_stream_name = getattr(path_spec, u'data_stream', None) Why not have this default to the empty string, then you can make the data stream arg mandatory for process file entry etc.? https://codereview.appspot.com/258650043/diff/40001/plaso/parsers/mediator.py File plaso/parsers/mediator.py (right): https://codereview.appspot.com/258650043/diff/40001/plaso/parsers/mediator.py... plaso/parsers/mediator.py:149: """Retrieves the display name for the active file entry. -for the active +a If a specific file_entry is passed in, the display name for the active entry isn't retrieved. https://codereview.appspot.com/258650043/diff/40001/plaso/parsers/mediator.py... plaso/parsers/mediator.py:205: """Provides a file-like object for the active file entry. +the default data stream
Sign in to reply to this message.
https://codereview.appspot.com/258650043/diff/40001/plaso/engine/worker.py File plaso/engine/worker.py (right): https://codereview.appspot.com/258650043/diff/40001/plaso/engine/worker.py#ne... plaso/engine/worker.py:28: and directories (file entries) for which events need to be extracted. On 2015/09/22 15:16:27, onager wrote: > files, directories and data streams Done. https://codereview.appspot.com/258650043/diff/40001/plaso/engine/worker.py#ne... plaso/engine/worker.py:31: for parsing a particular file is available. All extracted event objects On 2015/09/22 15:16:27, onager wrote: > -file +pathspec Acknowledged. https://codereview.appspot.com/258650043/diff/40001/plaso/engine/worker.py#ne... plaso/engine/worker.py:125: """Determines if a file matches one of the known signatures. On 2015/09/22 15:16:27, onager wrote: > -file +file like object Done. https://codereview.appspot.com/258650043/diff/40001/plaso/engine/worker.py#ne... plaso/engine/worker.py:149: def _HashFileEntry(self, file_entry, data_stream_name=u''): On 2015/09/22 15:16:27, onager wrote: > This doesn't actually hash a file entry - it hashes a data stream of a file > entry. Rename the method? Done. https://codereview.appspot.com/258650043/diff/40001/plaso/engine/worker.py#ne... plaso/engine/worker.py:154: allow directories to have data streams, e.g. NTFS. On 2015/09/22 15:16:27, onager wrote: > Why note this? The directory ADS part you mean? To make sure we don't ignore directory based file_entries. https://codereview.appspot.com/258650043/diff/40001/plaso/engine/worker.py#ne... plaso/engine/worker.py:157: file_entry: the file entry object to be hashed (instance of On 2015/09/22 15:16:27, onager wrote: > the file entry whose data stream will be hashed, or "the file entry relating to > the data to be hashed" or something Done. https://codereview.appspot.com/258650043/diff/40001/plaso/engine/worker.py#ne... plaso/engine/worker.py:222: will use the parser mediator to open the file-like object. On 2015/09/22 15:16:27, onager wrote: > to open the file_entry's default data stream as a file-like object? Done. https://codereview.appspot.com/258650043/diff/40001/plaso/engine/worker.py#ne... plaso/engine/worker.py:470: # We always want to use the filestat parser if set. On 2015/09/22 15:16:27, onager wrote: > ...but we only want to invoke it once per file entry, so we only use it if we're > not processing a named data stream? Done. https://codereview.appspot.com/258650043/diff/40001/plaso/engine/worker.py#ne... plaso/engine/worker.py:525: data_stream_name = getattr(path_spec, u'data_stream', None) On 2015/09/22 15:16:27, onager wrote: > Why not have this default to the empty string, then you can make the data stream > arg mandatory for process file entry etc.? Good point. https://codereview.appspot.com/258650043/diff/40001/plaso/parsers/mediator.py File plaso/parsers/mediator.py (right): https://codereview.appspot.com/258650043/diff/40001/plaso/parsers/mediator.py... plaso/parsers/mediator.py:149: """Retrieves the display name for the active file entry. On 2015/09/22 15:16:27, onager wrote: > -for the active +a > > If a specific file_entry is passed in, the display name for the active entry > isn't retrieved. Done. https://codereview.appspot.com/258650043/diff/40001/plaso/parsers/mediator.py... plaso/parsers/mediator.py:205: """Provides a file-like object for the active file entry. On 2015/09/22 15:16:27, onager wrote: > +the default data stream Done.
Sign in to reply to this message.
Code updated.
Sign in to reply to this message.
On 2015/09/22 21:40:56, Joachim Metz wrote: > Code updated. I'm still a little concerned about the extra complexity this is adding, but I don't think there's a better way around it at the moment. I suspect the solution is more and better segmentation of plugins into categories (file entry parser, file object parser, multiple file parser, hasher, submitter etc.) LGTM
Sign in to reply to this message.
When time permits I'll write a plan to address the complexity. This likely ties together with other ideas like phased processing to make it less complex.
Sign in to reply to this message.
To track this activity: https://github.com/log2timeline/plaso/issues/338
Sign in to reply to this message.
|