|
|
Created:
11 years, 11 months ago by david.nides Modified:
11 years, 10 months ago Reviewers:
kiddi CC:
log2timeline-dev_googlegroups.com Visibility:
Public. |
DescriptionGoogle Drive Parser (my first parser!)
Patch Set 1 #
Total comments: 42
Patch Set 2 : . #
Total comments: 1
Patch Set 3 : . #Patch Set 4 : . #Patch Set 5 : . #
Total comments: 4
Patch Set 6 : . #Patch Set 7 : . #Patch Set 8 : . #Patch Set 9 : . #
MessagesTotal messages: 17
few comments https://codereview.appspot.com/6963043/diff/1/parsers/__init__.py File parsers/__init__.py (right): https://codereview.appspot.com/6963043/diff/1/parsers/__init__.py#newcode26 parsers/__init__.py:26: from plaso.parsers import winreg don't copy paste this line, no need to have it twice ;) https://codereview.appspot.com/6963043/diff/1/parsers/__init__.py#newcode27 parsers/__init__.py:27: from plaso.parsers import gdrive this should not be here (think alphabetical) https://codereview.appspot.com/6963043/diff/1/parsers/gdrive.py File parsers/gdrive.py (right): https://codereview.appspot.com/6963043/diff/1/parsers/gdrive.py#newcode22 parsers/gdrive.py:22: class GDrive(parser.SQLiteParser): perhaps name it more descriptive, as in GoogleDrive ? https://codereview.appspot.com/6963043/diff/1/parsers/gdrive.py#newcode28 parsers/gdrive.py:28: QUERIES = [(('SELECT filename, modified, ' I see you do not pull the resource_type any reason for that? there is a valuable piece of information stored there (as in is this a folder, spreadsheet, etc...) as well as the actual document identification (despite being in the url too). https://codereview.appspot.com/6963043/diff/1/parsers/gdrive.py#newcode45 parsers/gdrive.py:45: '0': 'LINK', LINK, isn't this a folder? https://codereview.appspot.com/6963043/diff/1/parsers/gdrive.py#newcode46 parsers/gdrive.py:46: '1': 'TYPED', shouldn't this be a "FILE"? That is how the type is stored in the resource_id, and refers to file being uploaded? of various types... https://codereview.appspot.com/6963043/diff/1/parsers/gdrive.py#newcode47 parsers/gdrive.py:47: '2': 'AUTO_BOOKMARK', auto bookmark? I think you need to revisit the entire list... are you copying the list from Chrome...??? this looks awfully a lot like the Chrome page transition table.... https://codereview.appspot.com/6963043/diff/1/parsers/gdrive.py#newcode64 parsers/gdrive.py:64: text_long = u'File name: {0} size: {1}'.format( \ why do you use this \? https://codereview.appspot.com/6963043/diff/1/parsers/gdrive.py#newcode65 parsers/gdrive.py:65: row['filename'],\ weird indentation https://codereview.appspot.com/6963043/diff/1/parsers/gdrive.py#newcode68 parsers/gdrive.py:68: date = int(row['modified']) * self.DATE_MULTIPLIER what about entries that do not have a modified field? (happens with folders) https://codereview.appspot.com/6963043/diff/1/parsers/gdrive.py#newcode70 parsers/gdrive.py:70: evt = event.SQLiteEvent(date, descripton, text_long, text_short, \ is there a reason you don't try to extract the full path to the file? (it is stored there, inside the local_relations table, so you can get full paths to files). https://codereview.appspot.com/6963043/diff/1/parsers/gdrive.py#newcode73 parsers/gdrive.py:73: evt.download_size = str(row['size']) any particular reason you want to store this attribute specifically? Do you see the need to filter based on download size later on? https://codereview.appspot.com/6963043/diff/1/parsers/gdrive.py#newcode82 parsers/gdrive.py:82: doc_types = self.DOC_TYPES.get(doc_type, '') shouldn't the default "if not found" behavior be different than ''? something like "n/a" or "UNKNOWN" or something like that? https://codereview.appspot.com/6963043/diff/1/parsers/gdrive.py#newcode83 parsers/gdrive.py:83: text_long = u'File name: {0} size: {1} URL: {2} shared: {3} \ why the \ ? and weird indentation.. https://codereview.appspot.com/6963043/diff/1/parsers/gdrive.py#newcode90 parsers/gdrive.py:90: date = int(row['created']) * self.DATE_MULTIPLIER what about when the created field is empty? (hint common) https://codereview.appspot.com/6963043/diff/1/parsers/gdrive.py#newcode92 parsers/gdrive.py:92: evt = event.SQLiteEvent(date, descripton, text_long, text_short, \ I see that you don't create the full path here either (hint the cloud_relations table) and again the \ https://codereview.appspot.com/6963043/diff/1/parsers/gdrive.py#newcode95 parsers/gdrive.py:95: evt.download_size = str(row['size']) again, any particular reason you want to specifically store this attribute? https://codereview.appspot.com/6963043/diff/1/parsers/gdrive.py#newcode104 parsers/gdrive.py:104: evt.downloaded_file = u'%s' % row['filename'] \?
Sign in to reply to this message.
See comments https://codereview.appspot.com/6963043/diff/1/parsers/__init__.py File parsers/__init__.py (right): https://codereview.appspot.com/6963043/diff/1/parsers/__init__.py#newcode26 parsers/__init__.py:26: from plaso.parsers import winreg On 2012/12/20 17:01:42, kiddi wrote: > don't copy paste this line, no need to have it twice ;) Done. https://codereview.appspot.com/6963043/diff/1/parsers/__init__.py#newcode27 parsers/__init__.py:27: from plaso.parsers import gdrive On 2012/12/20 17:01:42, kiddi wrote: > this should not be here (think alphabetical) Done. https://codereview.appspot.com/6963043/diff/1/parsers/gdrive.py File parsers/gdrive.py (right): https://codereview.appspot.com/6963043/diff/1/parsers/gdrive.py#newcode22 parsers/gdrive.py:22: class GDrive(parser.SQLiteParser): On 2012/12/20 17:01:42, kiddi wrote: > perhaps name it more descriptive, as in GoogleDrive ? Done. https://codereview.appspot.com/6963043/diff/1/parsers/gdrive.py#newcode28 parsers/gdrive.py:28: QUERIES = [(('SELECT filename, modified, ' On 2012/12/20 17:01:42, kiddi wrote: > I see you do not pull the resource_type any reason for that? > > there is a valuable piece of information stored there (as in is this a folder, > spreadsheet, etc...) as well as the actual document identification (despite > being in the url too). Done. https://codereview.appspot.com/6963043/diff/1/parsers/gdrive.py#newcode45 parsers/gdrive.py:45: '0': 'LINK', On 2012/12/20 17:01:42, kiddi wrote: > LINK, isn't this a folder? Done. https://codereview.appspot.com/6963043/diff/1/parsers/gdrive.py#newcode46 parsers/gdrive.py:46: '1': 'TYPED', On 2012/12/20 17:01:42, kiddi wrote: > shouldn't this be a "FILE"? That is how the type is stored in the resource_id, > and refers to file being uploaded? of various types... Done. https://codereview.appspot.com/6963043/diff/1/parsers/gdrive.py#newcode46 parsers/gdrive.py:46: '1': 'TYPED', On 2012/12/20 17:01:42, kiddi wrote: > shouldn't this be a "FILE"? That is how the type is stored in the resource_id, > and refers to file being uploaded? of various types... Done. https://codereview.appspot.com/6963043/diff/1/parsers/gdrive.py#newcode47 parsers/gdrive.py:47: '2': 'AUTO_BOOKMARK', On 2012/12/20 17:01:42, kiddi wrote: > auto bookmark? I think you need to revisit the entire list... are you copying > the list from Chrome...??? this looks awfully a lot like the Chrome page > transition table.... I need to stop coding and drinking at the same time. Fixed. https://codereview.appspot.com/6963043/diff/1/parsers/gdrive.py#newcode64 parsers/gdrive.py:64: text_long = u'File name: {0} size: {1}'.format( \ On 2012/12/20 17:01:42, kiddi wrote: > why do you use this \? Done. https://codereview.appspot.com/6963043/diff/1/parsers/gdrive.py#newcode65 parsers/gdrive.py:65: row['filename'],\ On 2012/12/20 17:01:42, kiddi wrote: > weird indentation Done. https://codereview.appspot.com/6963043/diff/1/parsers/gdrive.py#newcode68 parsers/gdrive.py:68: date = int(row['modified']) * self.DATE_MULTIPLIER On 2012/12/20 17:01:42, kiddi wrote: > what about entries that do not have a modified field? (happens with folders) Well as-is my SQL query does not return rows that have a blank modified time. However in my testing, folders (with the exception of root) all have a modified entry. Do you have any instances where it would be blank? https://codereview.appspot.com/6963043/diff/1/parsers/gdrive.py#newcode70 parsers/gdrive.py:70: evt = event.SQLiteEvent(date, descripton, text_long, text_short, \ On 2012/12/20 17:01:42, kiddi wrote: > is there a reason you don't try to extract the full path to the file? > > (it is stored there, inside the local_relations table, so you can get full paths > to files). I am looking at the local_relations table and only see 2 fields: child_inode_number and a parent inode_number. Are you referring to the inode number and doing a lookup to see what the file path is or just including the inode number? I guess that could, can we discuss offline about it if you see value in adding it, I could use help with this. https://codereview.appspot.com/6963043/diff/1/parsers/gdrive.py#newcode73 parsers/gdrive.py:73: evt.download_size = str(row['size']) On 2012/12/20 17:01:42, kiddi wrote: > any particular reason you want to store this attribute specifically? Do you see > the need to filter based on download size later on? Nothing specifically, I just think it's a "nice to have" attribute to include since its there. Someone might have a use case for filtering on it. https://codereview.appspot.com/6963043/diff/1/parsers/gdrive.py#newcode82 parsers/gdrive.py:82: doc_types = self.DOC_TYPES.get(doc_type, '') On 2012/12/20 17:01:42, kiddi wrote: > shouldn't the default "if not found" behavior be different than ''? something > like "n/a" or "UNKNOWN" or something like that? Done. https://codereview.appspot.com/6963043/diff/1/parsers/gdrive.py#newcode83 parsers/gdrive.py:83: text_long = u'File name: {0} size: {1} URL: {2} shared: {3} \ On 2012/12/20 17:01:42, kiddi wrote: > why the \ ? and weird indentation.. Done. https://codereview.appspot.com/6963043/diff/1/parsers/gdrive.py#newcode90 parsers/gdrive.py:90: date = int(row['created']) * self.DATE_MULTIPLIER On 2012/12/20 17:01:42, kiddi wrote: > what about when the created field is empty? (hint common) I can change the SQL query so it does not exclude blank modified fields. However, what do you want the time to be set to then? https://codereview.appspot.com/6963043/diff/1/parsers/gdrive.py#newcode92 parsers/gdrive.py:92: evt = event.SQLiteEvent(date, descripton, text_long, text_short, \ On 2012/12/20 17:01:42, kiddi wrote: > I see that you don't create the full path here either (hint the cloud_relations > table) > > and again the \ See above. https://codereview.appspot.com/6963043/diff/1/parsers/gdrive.py#newcode95 parsers/gdrive.py:95: evt.download_size = str(row['size']) On 2012/12/20 17:01:42, kiddi wrote: > again, any particular reason you want to specifically store this attribute? See above https://codereview.appspot.com/6963043/diff/1/parsers/gdrive.py#newcode104 parsers/gdrive.py:104: evt.downloaded_file = u'%s' % row['filename'] On 2012/12/20 17:01:42, kiddi wrote: > \? Done.
Sign in to reply to this message.
.
Sign in to reply to this message.
there seems to be an issue with the upload .... the gdrive seems to have "vanished" yet is listed twice. https://codereview.appspot.com/6963043/diff/6001/parsers/__init__.py File parsers/__init__.py (right): https://codereview.appspot.com/6963043/diff/6001/parsers/__init__.py#newcode27 parsers/__init__.py:27: from plaso.parsers import winreg winreg is still twice?
Sign in to reply to this message.
.
Sign in to reply to this message.
.
Sign in to reply to this message.
.
Sign in to reply to this message.
few more comments.... sending code samples in mail https://codereview.appspot.com/6963043/diff/1/parsers/gdrive.py File parsers/gdrive.py (right): https://codereview.appspot.com/6963043/diff/1/parsers/gdrive.py#newcode47 parsers/gdrive.py:47: '2': 'AUTO_BOOKMARK', that might not be such a bad idea..... On 2012/12/20 21:46:10, david.nides wrote: > On 2012/12/20 17:01:42, kiddi wrote: > > auto bookmark? I think you need to revisit the entire list... are you copying > > the list from Chrome...??? this looks awfully a lot like the Chrome page > > transition table.... > > I need to stop coding and drinking at the same time. Fixed. > https://codereview.appspot.com/6963043/diff/1/parsers/gdrive.py#newcode68 parsers/gdrive.py:68: date = int(row['modified']) * self.DATE_MULTIPLIER hi sorry, you had already included the "WHERE modified IS NOT NULL" so this is fine... not it should be the root only.. On 2012/12/20 21:46:10, david.nides wrote: > On 2012/12/20 17:01:42, kiddi wrote: > > what about entries that do not have a modified field? (happens with folders) > > Well as-is my SQL query does not return rows that have a blank modified time. > However in my testing, folders (with the exception of root) all have a modified > entry. Do you have any instances where it would be blank? https://codereview.appspot.com/6963043/diff/1/parsers/gdrive.py#newcode70 parsers/gdrive.py:70: evt = event.SQLiteEvent(date, descripton, text_long, text_short, \ ack... sending you a code example. On 2012/12/20 21:46:10, david.nides wrote: > On 2012/12/20 17:01:42, kiddi wrote: > > is there a reason you don't try to extract the full path to the file? > > > > (it is stored there, inside the local_relations table, so you can get full > paths > > to files). > > I am looking at the local_relations table and only see 2 fields: > child_inode_number and a parent inode_number. Are you referring to the inode > number and doing a lookup to see what the file path is or just including the > inode number? I guess that could, can we discuss offline about it if you see > value in adding it, I could use help with this. https://codereview.appspot.com/6963043/diff/1/parsers/gdrive.py#newcode73 parsers/gdrive.py:73: evt.download_size = str(row['size']) perhaps change it something more generic then? as in size? On 2012/12/20 21:46:10, david.nides wrote: > On 2012/12/20 17:01:42, kiddi wrote: > > any particular reason you want to store this attribute specifically? Do you > see > > the need to filter based on download size later on? > > Nothing specifically, I just think it's a "nice to have" attribute to include > since its there. Someone might have a use case for filtering on it. https://codereview.appspot.com/6963043/diff/1/parsers/gdrive.py#newcode90 parsers/gdrive.py:90: date = int(row['created']) * self.DATE_MULTIPLIER so....you already excluded the blank modified times... however the CREATED time can still be blank, and you don't want to also include an exclusion for blank created times, because then you will miss some modified timestamps (as in there are entries with blank created time, yet have a modified timestamp). So you need to make sure there is a test here catching blank created tiemstamps, and if so SKIP yielding that value, yet still yield the modified value. On 2012/12/20 21:46:10, david.nides wrote: > On 2012/12/20 17:01:42, kiddi wrote: > > what about when the created field is empty? (hint common) > > I can change the SQL query so it does not exclude blank modified fields. > However, what do you want the time to be set to then? https://codereview.appspot.com/6963043/diff/13004/parsers/gdrive.py File parsers/gdrive.py (right): https://codereview.appspot.com/6963043/diff/13004/parsers/gdrive.py#newcode61 parsers/gdrive.py:61: row['filename'], wrong indentation, from the style guide (http://google-styleguide.googlecode.com/svn/trunk/pyguide.html#Indentation): "; or using a hanging indent of 4 spaces, in which " https://codereview.appspot.com/6963043/diff/13004/parsers/gdrive.py#newcode79 parsers/gdrive.py:79: text_long = u'File name: {0} size: {1} URL: {2} shared: {3} \ weird indentation text_long = (u'sdfsadfasdf' 'asdfasdfasdfs')
Sign in to reply to this message.
Thank you. Odd, when I click on the urls to reply to the comments I get the error again "Bad content. Try to upload again." Does not allow me to respond. On Fri, Dec 21, 2012 at 12:12 PM, <kiddi@kiddaland.net> wrote: > few more comments.... sending code samples in mail > > > > https://codereview.appspot.**com/6963043/diff/1/parsers/**gdrive.py<https://c... > File parsers/gdrive.py (right): > > https://codereview.appspot.**com/6963043/diff/1/parsers/** > gdrive.py#newcode47<https://codereview.appspot.com/6963043/diff/1/parsers/gdrive.py#newcode47> > parsers/gdrive.py:47: '2': 'AUTO_BOOKMARK', > that might not be such a bad idea..... > > On 2012/12/20 21:46:10, david.nides wrote: > >> On 2012/12/20 17:01:42, kiddi wrote: >> > auto bookmark? I think you need to revisit the entire list... are >> > you copying > >> > the list from Chrome...??? this looks awfully a lot like the Chrome >> > page > >> > transition table.... >> > > I need to stop coding and drinking at the same time. Fixed. >> > > > https://codereview.appspot.**com/6963043/diff/1/parsers/** > gdrive.py#newcode68<https://codereview.appspot.com/6963043/diff/1/parsers/gdrive.py#newcode68> > parsers/gdrive.py:68: date = int(row['modified']) * self.DATE_MULTIPLIER > hi > > sorry, you had already included the "WHERE modified IS NOT NULL" so this > is fine... not it should be the root only.. > > > On 2012/12/20 21:46:10, david.nides wrote: > >> On 2012/12/20 17:01:42, kiddi wrote: >> > what about entries that do not have a modified field? (happens with >> > folders) > > Well as-is my SQL query does not return rows that have a blank >> > modified time. > >> However in my testing, folders (with the exception of root) all have a >> > modified > >> entry. Do you have any instances where it would be blank? >> > > https://codereview.appspot.**com/6963043/diff/1/parsers/** > gdrive.py#newcode70<https://codereview.appspot.com/6963043/diff/1/parsers/gdrive.py#newcode70> > parsers/gdrive.py:70: evt = event.SQLiteEvent(date, descripton, > text_long, text_short, \ > ack... sending you a code example. > > > On 2012/12/20 21:46:10, david.nides wrote: > >> On 2012/12/20 17:01:42, kiddi wrote: >> > is there a reason you don't try to extract the full path to the >> > file? > >> > >> > (it is stored there, inside the local_relations table, so you can >> > get full > >> paths >> > to files). >> > > I am looking at the local_relations table and only see 2 fields: >> child_inode_number and a parent inode_number. Are you referring to the >> > inode > >> number and doing a lookup to see what the file path is or just >> > including the > >> inode number? I guess that could, can we discuss offline about it if >> > you see > >> value in adding it, I could use help with this. >> > > https://codereview.appspot.**com/6963043/diff/1/parsers/** > gdrive.py#newcode73<https://codereview.appspot.com/6963043/diff/1/parsers/gdrive.py#newcode73> > parsers/gdrive.py:73: evt.download_size = str(row['size']) > perhaps change it something more generic then? as in size? > > > On 2012/12/20 21:46:10, david.nides wrote: > >> On 2012/12/20 17:01:42, kiddi wrote: >> > any particular reason you want to store this attribute specifically? >> > Do you > >> see >> > the need to filter based on download size later on? >> > > Nothing specifically, I just think it's a "nice to have" attribute to >> > include > >> since its there. Someone might have a use case for filtering on it. >> > > https://codereview.appspot.**com/6963043/diff/1/parsers/** > gdrive.py#newcode90<https://codereview.appspot.com/6963043/diff/1/parsers/gdrive.py#newcode90> > parsers/gdrive.py:90: date = int(row['created']) * self.DATE_MULTIPLIER > so....you already excluded the blank modified times... however the > CREATED time can still be blank, and you don't want to also include an > exclusion for blank created times, because then you will miss some > modified timestamps (as in there are entries with blank created time, > yet have a modified timestamp). > > So you need to make sure there is a test here catching blank created > tiemstamps, and if so SKIP yielding that value, yet still yield the > modified value. > > On 2012/12/20 21:46:10, david.nides wrote: > >> On 2012/12/20 17:01:42, kiddi wrote: >> > what about when the created field is empty? (hint common) >> > > I can change the SQL query so it does not exclude blank modified >> > fields. > >> However, what do you want the time to be set to then? >> > > https://codereview.appspot.**com/6963043/diff/13004/**parsers/gdrive.py<https... > File parsers/gdrive.py (right): > > https://codereview.appspot.**com/6963043/diff/13004/** > parsers/gdrive.py#newcode61<https://codereview.appspot.com/6963043/diff/13004/parsers/gdrive.py#newcode61> > parsers/gdrive.py:61: row['filename'], > wrong indentation, from the style guide > (http://google-styleguide.**googlecode.com/svn/trunk/** > pyguide.html#Indentation<http://google-styleguide.googlecode.com/svn/trunk/pyguide.html#Indentation> > ): > > "; or using a hanging indent of 4 spaces, in which " > > https://codereview.appspot.**com/6963043/diff/13004/** > parsers/gdrive.py#newcode79<https://codereview.appspot.com/6963043/diff/13004/parsers/gdrive.py#newcode79> > parsers/gdrive.py:79: text_long = u'File name: {0} size: {1} URL: {2} > shared: {3} \ > weird indentation > > text_long = (u'sdfsadfasdf' > 'asdfasdfasdfs') > > https://codereview.appspot.**com/6963043/<https://codereview.appspot.com/6963... >
Sign in to reply to this message.
.
Sign in to reply to this message.
You need to upload again. This weird "error in upload" is happening again, preventing me from seeing the latest updates.
Sign in to reply to this message.
.
Sign in to reply to this message.
https://codereview.appspot.com/6963043/diff/13004/parsers/gdrive.py File parsers/gdrive.py (right): https://codereview.appspot.com/6963043/diff/13004/parsers/gdrive.py#newcode61 parsers/gdrive.py:61: row['filename'], On 2012/12/21 18:12:05, kiddi wrote: > wrong indentation, from the style guide > (http://google-styleguide.googlecode.com/svn/trunk/pyguide.html#Indentation%29: > "; or using a hanging indent of 4 spaces, in which " Done. https://codereview.appspot.com/6963043/diff/13004/parsers/gdrive.py#newcode79 parsers/gdrive.py:79: text_long = u'File name: {0} size: {1} URL: {2} shared: {3} \ On 2012/12/21 18:12:05, kiddi wrote: > weird indentation > > text_long = (u'sdfsadfasdf' > 'asdfasdfasdfs') Done.
Sign in to reply to this message.
.
Sign in to reply to this message.
.
Sign in to reply to this message.
|