|
|
DescriptionThese files are missing unit tests:
+ plaso/formatters/__init__.py + plaso/formatters/apachelog.py + plaso/parsers/__init__.py
Patch Set 1 #
Total comments: 42
Patch Set 2 : Uploading changes made to code. #
Total comments: 19
Patch Set 3 : Uploading changes made to code. #
Total comments: 24
MessagesTotal messages: 10
Few comments to start your day https://codereview.appspot.com/76230045/diff/1/plaso/formatters/apachelog.py File plaso/formatters/apachelog.py (right): https://codereview.appspot.com/76230045/diff/1/plaso/formatters/apachelog.py#... plaso/formatters/apachelog.py:28: u'Source IP: {ipAddr}', don't use camelCase, use ip_address here instead. or better yet, use the same notation as in pcap parser, which is source_ip and dest_ip https://codereview.appspot.com/76230045/diff/1/plaso/formatters/apachelog.py#... plaso/formatters/apachelog.py:33: u'UserAgent: {clientSfw}'] same here... clientSfw is not a good argument name, why not just use user_agent? https://codereview.appspot.com/76230045/diff/1/plaso/parsers/apachelog.py File plaso/parsers/apachelog.py (right): https://codereview.appspot.com/76230045/diff/1/plaso/parsers/apachelog.py#new... plaso/parsers/apachelog.py:28: two empty lines here https://codereview.appspot.com/76230045/diff/1/plaso/parsers/apachelog.py#new... plaso/parsers/apachelog.py:43: self.ipAddr = structure.ipAddr so same comments as before regarding naming conventions (comment in the formatter) https://codereview.appspot.com/76230045/diff/1/plaso/parsers/apachelog.py#new... plaso/parsers/apachelog.py:65: Time = pyparsing.Combine(text_parser.PyparsingConstants.INTEGER + ":" + s/Time/TIME/ https://codereview.appspot.com/76230045/diff/1/plaso/parsers/apachelog.py#new... plaso/parsers/apachelog.py:68: timeZoneOffset = pyparsing.Word("+-", pyparsing.nums) this is also a class constant and needs to be named according to style guide TIME_ZONE_OFFSET https://codereview.appspot.com/76230045/diff/1/plaso/parsers/apachelog.py#new... plaso/parsers/apachelog.py:96: ] have the end bracket on the previous line https://codereview.appspot.com/76230045/diff/1/plaso/parsers/apachelog.py#new... plaso/parsers/apachelog.py:98: DATA_TYPE = 'apache:log:line' data type should be defined in the event itself, not the parser https://codereview.appspot.com/76230045/diff/1/plaso/parsers/apachelog.py#new... plaso/parsers/apachelog.py:111: # """Verify that this file is a apache acces log file.""" why the # and then """ just stick with """ since this is the docstring https://codereview.appspot.com/76230045/diff/1/plaso/parsers/apachelog.py#new... plaso/parsers/apachelog.py:120: if self._GetTimestamp(line.day, indentation is wrong if self._GetTimestamp( line.day, month..... and this could also be just if self._GetTime....: return True return False https://codereview.appspot.com/76230045/diff/1/plaso/parsers/apachelog.py#new... plaso/parsers/apachelog.py:127: #return self._ParseLogline(structure) remove this line https://codereview.appspot.com/76230045/diff/1/plaso/parsers/apachelog.py#new... plaso/parsers/apachelog.py:128: month = timelib.MONTH_DICT.get(structure.month.lower()) I would do a timeli....get(struc..., 0) if not month: return timestamp ..... https://codereview.appspot.com/76230045/diff/1/plaso/parsers/apachelog.py#new... plaso/parsers/apachelog.py:139: timestamp, structure) this can be on the same line, not long enough to need to split https://codereview.appspot.com/76230045/diff/1/plaso/parsers/apachelog.py#new... plaso/parsers/apachelog.py:140: return event_object and just do return ApacheLogEvent(timestamp, structure) https://codereview.appspot.com/76230045/diff/1/plaso/parsers/apachelog.py#new... plaso/parsers/apachelog.py:150: timestamp_string: The pyparsing ParseResults object the arg section does not match the actual arguments https://codereview.appspot.com/76230045/diff/1/plaso/parsers/apachelog.py#new... plaso/parsers/apachelog.py:153: day: An integer representing the day. the return section does not match what the function returns https://codereview.appspot.com/76230045/diff/1/plaso/parsers/apachelog.py#new... plaso/parsers/apachelog.py:162: timestamp = timelib.Timestamp.FromTimeString(timeString) why do you use this instead of just timelib.Timestamp.FromTimeParts? https://codereview.appspot.com/76230045/diff/1/plaso/parsers/apachelog_test.py File plaso/parsers/apachelog_test.py (right): https://codereview.appspot.com/76230045/diff/1/plaso/parsers/apachelog_test.p... plaso/parsers/apachelog_test.py:31: two empty lines before class https://codereview.appspot.com/76230045/diff/1/plaso/parsers/apachelog_test.p... plaso/parsers/apachelog_test.py:42: self.maxDiff = None no need for this https://codereview.appspot.com/76230045/diff/1/plaso/parsers/apachelog_test.p... plaso/parsers/apachelog_test.py:46: #test_file = os.path.join('test_data', 'access_log') remove this https://codereview.appspot.com/76230045/diff/1/plaso/parsers/apachelog_test.p... plaso/parsers/apachelog_test.py:109: 'Linux x86_64; rv:26.0) Gecko/20100101 Firefox/26.0') what's missing here is also the test for the formatter, the message string test.. self._TestGetMessageStrings(....
Sign in to reply to this message.
Few comments to start your day
Sign in to reply to this message.
Code updated.
Sign in to reply to this message.
Sorry for the delay https://codereview.appspot.com/76230045/diff/1/plaso/formatters/apachelog.py File plaso/formatters/apachelog.py (right): https://codereview.appspot.com/76230045/diff/1/plaso/formatters/apachelog.py#... plaso/formatters/apachelog.py:28: u'Source IP: {ipAddr}', On 2014/03/18 19:16:21, kiddi wrote: > don't use camelCase, use ip_address here instead. > > or better yet, use the same notation as in pcap parser, which is source_ip and > dest_ip Done. https://codereview.appspot.com/76230045/diff/1/plaso/formatters/apachelog.py#... plaso/formatters/apachelog.py:33: u'UserAgent: {clientSfw}'] On 2014/03/18 19:16:21, kiddi wrote: > same here... clientSfw is not a good argument name, why not just use user_agent? Done. https://codereview.appspot.com/76230045/diff/1/plaso/parsers/apachelog.py File plaso/parsers/apachelog.py (right): https://codereview.appspot.com/76230045/diff/1/plaso/parsers/apachelog.py#new... plaso/parsers/apachelog.py:28: On 2014/03/18 19:16:21, kiddi wrote: > two empty lines here Done. https://codereview.appspot.com/76230045/diff/1/plaso/parsers/apachelog.py#new... plaso/parsers/apachelog.py:43: self.ipAddr = structure.ipAddr On 2014/03/18 19:16:21, kiddi wrote: > so same comments as before regarding naming conventions (comment in the > formatter) Done. https://codereview.appspot.com/76230045/diff/1/plaso/parsers/apachelog.py#new... plaso/parsers/apachelog.py:65: Time = pyparsing.Combine(text_parser.PyparsingConstants.INTEGER + ":" + On 2014/03/18 19:16:21, kiddi wrote: > s/Time/TIME/ Done. https://codereview.appspot.com/76230045/diff/1/plaso/parsers/apachelog.py#new... plaso/parsers/apachelog.py:68: timeZoneOffset = pyparsing.Word("+-", pyparsing.nums) On 2014/03/18 19:16:21, kiddi wrote: > this is also a class constant and needs to be named according to style guide > > TIME_ZONE_OFFSET Done. https://codereview.appspot.com/76230045/diff/1/plaso/parsers/apachelog.py#new... plaso/parsers/apachelog.py:96: ] On 2014/03/18 19:16:21, kiddi wrote: > have the end bracket on the previous line Done. https://codereview.appspot.com/76230045/diff/1/plaso/parsers/apachelog.py#new... plaso/parsers/apachelog.py:98: DATA_TYPE = 'apache:log:line' On 2014/03/18 19:16:21, kiddi wrote: > data type should be defined in the event itself, not the parser Done. https://codereview.appspot.com/76230045/diff/1/plaso/parsers/apachelog.py#new... plaso/parsers/apachelog.py:111: # """Verify that this file is a apache acces log file.""" On 2014/03/18 19:16:21, kiddi wrote: > why the # and then """ > > just stick with """ since this is the docstring Done. https://codereview.appspot.com/76230045/diff/1/plaso/parsers/apachelog.py#new... plaso/parsers/apachelog.py:120: if self._GetTimestamp(line.day, On 2014/03/18 19:16:21, kiddi wrote: > indentation is wrong > > if self._GetTimestamp( > line.day, month..... > > and this could also be just > > if self._GetTime....: > return True > > return False Done. https://codereview.appspot.com/76230045/diff/1/plaso/parsers/apachelog.py#new... plaso/parsers/apachelog.py:127: #return self._ParseLogline(structure) On 2014/03/18 19:16:21, kiddi wrote: > remove this line Done. https://codereview.appspot.com/76230045/diff/1/plaso/parsers/apachelog.py#new... plaso/parsers/apachelog.py:128: month = timelib.MONTH_DICT.get(structure.month.lower()) On 2014/03/18 19:16:21, kiddi wrote: > I would do a > > timeli....get(struc..., 0) > > if not month: > return > > timestamp ..... Done. https://codereview.appspot.com/76230045/diff/1/plaso/parsers/apachelog.py#new... plaso/parsers/apachelog.py:139: timestamp, structure) On 2014/03/18 19:16:21, kiddi wrote: > this can be on the same line, not long enough to need to split Done. https://codereview.appspot.com/76230045/diff/1/plaso/parsers/apachelog.py#new... plaso/parsers/apachelog.py:140: return event_object On 2014/03/18 19:16:21, kiddi wrote: > and just do > > return ApacheLogEvent(timestamp, structure) Done. https://codereview.appspot.com/76230045/diff/1/plaso/parsers/apachelog.py#new... plaso/parsers/apachelog.py:150: timestamp_string: The pyparsing ParseResults object On 2014/03/18 19:16:21, kiddi wrote: > the arg section does not match the actual arguments Done. https://codereview.appspot.com/76230045/diff/1/plaso/parsers/apachelog.py#new... plaso/parsers/apachelog.py:153: day: An integer representing the day. On 2014/03/18 19:16:21, kiddi wrote: > the return section does not match what the function returns Done. https://codereview.appspot.com/76230045/diff/1/plaso/parsers/apachelog.py#new... plaso/parsers/apachelog.py:162: timestamp = timelib.Timestamp.FromTimeString(timeString) On 2014/03/18 19:16:21, kiddi wrote: > why do you use this instead of just timelib.Timestamp.FromTimeParts? I use this because of the timestamp format (2014-02-12 21:53:04 -0500) and since the timezone was -0500, it didn't work with FromTimeParts https://codereview.appspot.com/76230045/diff/1/plaso/parsers/apachelog_test.py File plaso/parsers/apachelog_test.py (right): https://codereview.appspot.com/76230045/diff/1/plaso/parsers/apachelog_test.p... plaso/parsers/apachelog_test.py:31: On 2014/03/18 19:16:21, kiddi wrote: > two empty lines before class Done. https://codereview.appspot.com/76230045/diff/1/plaso/parsers/apachelog_test.p... plaso/parsers/apachelog_test.py:42: self.maxDiff = None On 2014/03/18 19:16:21, kiddi wrote: > no need for this Done. https://codereview.appspot.com/76230045/diff/1/plaso/parsers/apachelog_test.p... plaso/parsers/apachelog_test.py:46: #test_file = os.path.join('test_data', 'access_log') On 2014/03/18 19:16:21, kiddi wrote: > remove this Done. https://codereview.appspot.com/76230045/diff/1/plaso/parsers/apachelog_test.p... plaso/parsers/apachelog_test.py:109: 'Linux x86_64; rv:26.0) Gecko/20100101 Firefox/26.0') On 2014/03/18 19:16:21, kiddi wrote: > what's missing here is also the test for the formatter, the message string > test.. > > self._TestGetMessageStrings(.... Done.
Sign in to reply to this message.
Few more comments https://codereview.appspot.com/76230045/diff/20001/plaso/formatters/apachelog.py File plaso/formatters/apachelog.py (right): https://codereview.appspot.com/76230045/diff/20001/plaso/formatters/apachelog... plaso/formatters/apachelog.py:31: u'Status Code: {statusCode}', still using statusCode instead of status_code https://codereview.appspot.com/76230045/diff/20001/plaso/parsers/apachelog.py File plaso/parsers/apachelog.py (right): https://codereview.appspot.com/76230045/diff/20001/plaso/parsers/apachelog.py... plaso/parsers/apachelog.py:47: self.statusCode = structure.statusCode this is still in camelCase same with few others... fix those https://codereview.appspot.com/76230045/diff/20001/plaso/parsers/apachelog.py... plaso/parsers/apachelog.py:66: TIME = pyparsing.Combine(text_parser.PyparsingConstants.INTEGER + ":" + indentation... I would prefer to have the first line indented as well TIME = py....( text_parser.Pyp..... instead of TIME = py...(text_parser.... text_parse.... https://codereview.appspot.com/76230045/diff/20001/plaso/parsers/apachelog.py... plaso/parsers/apachelog.py:69: TIME_ZONE_OFFSET = pyparsing.Word("+-", pyparsing.nums) we use single quote characters (this also applies to other places inthe code, for instance two lines above and few lines below... so fx it in the other places as well) https://codereview.appspot.com/76230045/diff/20001/plaso/parsers/apachelog.py... plaso/parsers/apachelog.py:108: self.DATA_TYPE = 'apache:log:line' don't store the data type in the parser, that's not needed, that's in the event object. https://codereview.appspot.com/76230045/diff/20001/plaso/parsers/apachelog.py... plaso/parsers/apachelog.py:111: """Verify that this file is a apache acces log file.""" s/acces/access/ s/apache/Apache/ https://codereview.appspot.com/76230045/diff/20001/plaso/parsers/apachelog.py... plaso/parsers/apachelog.py:115: logging.debug(u'Not a Apache log file') s/a/an/ https://codereview.appspot.com/76230045/diff/20001/plaso/parsers/apachelog.py... plaso/parsers/apachelog.py:121: return True need to have a return False then after this if you get to the timestamp check and that fails, then you return None, that should be False https://codereview.appspot.com/76230045/diff/20001/plaso/parsers/apachelog.py... plaso/parsers/apachelog.py:160: timestamp = timelib.Timestamp.FromTimeString(timeString) So this is also for efficiency reasons. Parsing a timestring is a lot slower than knowing all the parts beforehand (like you do in this case, except for perhaps the timezon). A simple check like (in pshell): [1] $ timelib.Timestamp.FromTimeString('2014-02-12 21:53:04 -0500') [1]<<< 1392259984000000 [2] $ timelib.Timestamp.FromTimeParts(2014, 02, 12, 21, 53, 04,0, 'Etc/GMT-5') [2]<<< 1392223984000000 That is you can see we get the same answer on both places (noticed the use of "Etc/GMT-5" choice of timestamps). Simple timing check reveals: [3] $ import timeit [4] $ timeit.timeit(setup='from plaso.lib import timelib', stmt="timelib.Timestamp.FromTimeParts(2014, 02, 12, 21, 53, 04,0, 'Etc/GMT-5')") [4]<<< 12.02266001701355 [5] $ timeit.timeit(setup='from plaso.lib import timelib', stmt="timelib.Timestamp.FromTimeString('2014-02-12 21:53:04 -0500')") [5]<<< 138.15482711791992 Did you notice the difference? It is quite vast... so we need to limit the use of timestrings to where we absolutely have to. But in this case you already have everything split up... the only "trouble" is the timezone. One way to "solve" that would be to transform that to a timezone understandable by pytz,... perhaps something like: try: timzone_int = int(timezone, 10) except ValueError # either return 0 or use UTC as the default fallback ... # check the timestamp, can you divide it by 100? if timezone_int % 100 == 0: timezone_string = u'Etc/GMT{:d}'.format(timeonze_int//100) else: # do something else here, is it by 30 min??? flag it/raise an error? something like this..... then you have a timezone that you can pass on to GetTimeParts And then there is the second thing.... I'm assuming the timezone is not about to change for every file passed in.. perhaps import pytz, aka import pytz and then set self._timezone = '' in the beginning of Parse and then for the first time you do this calculation to save the timezone there, as in self._timezone = pytz.timezone(timezone_string) and then from no on you would just pass in self._timezone directly to the function, thus saving that calculation... If we do that: In [1]: s = """ ...: from plaso.lib import timelib ...: from __main__ import zone ...: """ In [2]: import pytz ... In [8]: timeit.timeit(setup=s, stmt="timelib.Timestamp.FromTimeParts(2014, 02, 12, 21, 53, 04,0, zone)") Out[8]: 8.266901969909668 See now we've gotten this down from 12s to 8s by just pre-calculating that zone and storing it.
Sign in to reply to this message.
Code updated.
Sign in to reply to this message.
Sorry for the delay :) https://codereview.appspot.com/76230045/diff/20001/plaso/formatters/apachelog.py File plaso/formatters/apachelog.py (right): https://codereview.appspot.com/76230045/diff/20001/plaso/formatters/apachelog... plaso/formatters/apachelog.py:31: u'Status Code: {statusCode}', On 2014/04/21 20:27:08, kiddi wrote: > still using statusCode instead of status_code Done. https://codereview.appspot.com/76230045/diff/20001/plaso/parsers/apachelog.py File plaso/parsers/apachelog.py (right): https://codereview.appspot.com/76230045/diff/20001/plaso/parsers/apachelog.py... plaso/parsers/apachelog.py:47: self.statusCode = structure.statusCode On 2014/04/21 20:27:08, kiddi wrote: > this is still in camelCase same with few others... fix those Done. https://codereview.appspot.com/76230045/diff/20001/plaso/parsers/apachelog.py... plaso/parsers/apachelog.py:66: TIME = pyparsing.Combine(text_parser.PyparsingConstants.INTEGER + ":" + On 2014/04/21 20:27:08, kiddi wrote: > indentation... I would prefer to have the first line indented as well > > TIME = py....( > text_parser.Pyp..... > > instead of > > TIME = py...(text_parser.... > text_parse.... Done. https://codereview.appspot.com/76230045/diff/20001/plaso/parsers/apachelog.py... plaso/parsers/apachelog.py:69: TIME_ZONE_OFFSET = pyparsing.Word("+-", pyparsing.nums) On 2014/04/21 20:27:08, kiddi wrote: > we use single quote characters (this also applies to other places inthe code, > for instance two lines above and few lines below... so fx it in the other places > as well) Done. https://codereview.appspot.com/76230045/diff/20001/plaso/parsers/apachelog.py... plaso/parsers/apachelog.py:108: self.DATA_TYPE = 'apache:log:line' On 2014/04/21 20:27:08, kiddi wrote: > don't store the data type in the parser, that's not needed, that's in the event > object. Done. https://codereview.appspot.com/76230045/diff/20001/plaso/parsers/apachelog.py... plaso/parsers/apachelog.py:111: """Verify that this file is a apache acces log file.""" On 2014/04/21 20:27:08, kiddi wrote: > s/acces/access/ > > s/apache/Apache/ Done. https://codereview.appspot.com/76230045/diff/20001/plaso/parsers/apachelog.py... plaso/parsers/apachelog.py:115: logging.debug(u'Not a Apache log file') On 2014/04/21 20:27:08, kiddi wrote: > s/a/an/ Done. https://codereview.appspot.com/76230045/diff/20001/plaso/parsers/apachelog.py... plaso/parsers/apachelog.py:121: return True On 2014/04/21 20:27:08, kiddi wrote: > need to have a return False then after this > > if you get to the timestamp check and that fails, then you return None, that > should be False Done. https://codereview.appspot.com/76230045/diff/20001/plaso/parsers/apachelog.py... plaso/parsers/apachelog.py:160: timestamp = timelib.Timestamp.FromTimeString(timeString) On 2014/04/21 20:27:08, kiddi wrote: > So this is also for efficiency reasons. Parsing a timestring is a lot slower > than knowing all the parts beforehand (like you do in this case, except for > perhaps the timezon). > > A simple check like (in pshell): > [1] $ timelib.Timestamp.FromTimeString('2014-02-12 21:53:04 -0500') > [1]<<< 1392259984000000 > > [2] $ timelib.Timestamp.FromTimeParts(2014, 02, 12, 21, 53, 04,0, 'Etc/GMT-5') > [2]<<< 1392223984000000 > > That is you can see we get the same answer on both places (noticed the use of > "Etc/GMT-5" choice of timestamps). > > Simple timing check reveals: > [3] $ import timeit > > > [4] $ timeit.timeit(setup='from plaso.lib import timelib', > stmt="timelib.Timestamp.FromTimeParts(2014, 02, 12, 21, 53, 04,0, 'Etc/GMT-5')") > [4]<<< 12.02266001701355 > > [5] $ timeit.timeit(setup='from plaso.lib import timelib', > stmt="timelib.Timestamp.FromTimeString('2014-02-12 21:53:04 -0500')") > [5]<<< 138.15482711791992 > > Did you notice the difference? It is quite vast... so we need to limit the use > of timestrings to where we absolutely have to. But in this case you already have > everything split up... the only "trouble" is the timezone. > > One way to "solve" that would be to transform that to a timezone understandable > by pytz,... perhaps something like: > > try: > timzone_int = int(timezone, 10) > except ValueError > # either return 0 or use UTC as the default fallback > > ... > > # check the timestamp, can you divide it by 100? > if timezone_int % 100 == 0: > timezone_string = u'Etc/GMT{:d}'.format(timeonze_int//100) > else: > # do something else here, is it by 30 min??? flag it/raise an error? > > > something like this..... then you have a timezone that you can pass on to > GetTimeParts > > And then there is the second thing.... I'm assuming the timezone is not about to > change for every file passed in.. perhaps import pytz, aka > > import pytz > > and then set > > self._timezone = '' > > in the beginning of Parse > > and then for the first time you do this calculation to save the timezone there, > as in > > self._timezone = pytz.timezone(timezone_string) > > and then from no on you would just pass in self._timezone directly to the > function, thus saving that calculation... > > If we do that: > In [1]: s = """ > ...: from plaso.lib import timelib > ...: from __main__ import zone > ...: """ > > In [2]: import pytz > > ... > In [8]: timeit.timeit(setup=s, stmt="timelib.Timestamp.FromTimeParts(2014, 02, > 12, 21, 53, 04,0, zone)") > Out[8]: 8.266901969909668 > > See now we've gotten this down from 12s to 8s by just pre-calculating that zone > and storing it. Done. Since pytz have a bug with the Etc/GMT, I need to convert the Etc/GMT-5 to Etc/GMT+5 with timezone_int = timezone_int * -1. Here is the bug : https://bugs.launchpad.net/pytz/+bug/91671
Sign in to reply to this message.
OK, few more comments https://codereview.appspot.com/76230045/diff/20001/plaso/parsers/apachelog.py File plaso/parsers/apachelog.py (right): https://codereview.appspot.com/76230045/diff/20001/plaso/parsers/apachelog.py... plaso/parsers/apachelog.py:160: timestamp = timelib.Timestamp.FromTimeString(timeString) ok, great... reference that in a comment in the code, not just here... On 2014/06/21 14:31:46, segumarc wrote: > On 2014/04/21 20:27:08, kiddi wrote: > > So this is also for efficiency reasons. Parsing a timestring is a lot slower > > than knowing all the parts beforehand (like you do in this case, except for > > perhaps the timezon). > > > > A simple check like (in pshell): > > [1] $ timelib.Timestamp.FromTimeString('2014-02-12 21:53:04 -0500') > > [1]<<< 1392259984000000 > > > > [2] $ timelib.Timestamp.FromTimeParts(2014, 02, 12, 21, 53, 04,0, 'Etc/GMT-5') > > [2]<<< 1392223984000000 > > > > That is you can see we get the same answer on both places (noticed the use of > > "Etc/GMT-5" choice of timestamps). > > > > Simple timing check reveals: > > [3] $ import timeit > > > > > > [4] $ timeit.timeit(setup='from plaso.lib import timelib', > > stmt="timelib.Timestamp.FromTimeParts(2014, 02, 12, 21, 53, 04,0, > 'Etc/GMT-5')") > > [4]<<< 12.02266001701355 > > > > [5] $ timeit.timeit(setup='from plaso.lib import timelib', > > stmt="timelib.Timestamp.FromTimeString('2014-02-12 21:53:04 -0500')") > > [5]<<< 138.15482711791992 > > > > Did you notice the difference? It is quite vast... so we need to limit the use > > of timestrings to where we absolutely have to. But in this case you already > have > > everything split up... the only "trouble" is the timezone. > > > > One way to "solve" that would be to transform that to a timezone > understandable > > by pytz,... perhaps something like: > > > > try: > > timzone_int = int(timezone, 10) > > except ValueError > > # either return 0 or use UTC as the default fallback > > > > ... > > > > # check the timestamp, can you divide it by 100? > > if timezone_int % 100 == 0: > > timezone_string = u'Etc/GMT{:d}'.format(timeonze_int//100) > > else: > > # do something else here, is it by 30 min??? flag it/raise an error? > > > > > > something like this..... then you have a timezone that you can pass on to > > GetTimeParts > > > > And then there is the second thing.... I'm assuming the timezone is not about > to > > change for every file passed in.. perhaps import pytz, aka > > > > import pytz > > > > and then set > > > > self._timezone = '' > > > > in the beginning of Parse > > > > and then for the first time you do this calculation to save the timezone > there, > > as in > > > > self._timezone = pytz.timezone(timezone_string) > > > > and then from no on you would just pass in self._timezone directly to the > > function, thus saving that calculation... > > > > If we do that: > > In [1]: s = """ > > ...: from plaso.lib import timelib > > ...: from __main__ import zone > > ...: """ > > > > In [2]: import pytz > > > > ... > > In [8]: timeit.timeit(setup=s, stmt="timelib.Timestamp.FromTimeParts(2014, 02, > > 12, 21, 53, 04,0, zone)") > > Out[8]: 8.266901969909668 > > > > See now we've gotten this down from 12s to 8s by just pre-calculating that > zone > > and storing it. > > Done. > Since pytz have a bug with the Etc/GMT, I need to convert the Etc/GMT-5 to > Etc/GMT+5 with timezone_int = timezone_int * -1. > Here is the bug : https://bugs.launchpad.net/pytz/+bug/91671 https://codereview.appspot.com/76230045/diff/40001/plaso/parsers/apachelog.py File plaso/parsers/apachelog.py (right): https://codereview.appspot.com/76230045/diff/40001/plaso/parsers/apachelog.py... plaso/parsers/apachelog.py:113: """Verify that this file is a apache acces log file.""" acces is still with a single s s/acces/access/ https://codereview.appspot.com/76230045/diff/40001/plaso/parsers/apachelog.py... plaso/parsers/apachelog.py:117: logging.debug(u'Not a Apache log file') still a, s/a/an/ Not an Apache log file https://codereview.appspot.com/76230045/diff/40001/plaso/parsers/apachelog.py... plaso/parsers/apachelog.py:122: if self._GetTimezone(int(line.timezone_int)) == 'UTC': use base and have a test try: timezone_int = int(line.timezone_int, 10) except ValueError: return False if self._GetTimezone(timezone_int) == 'UTC' .... and why can't it be UTC???? What is "wrong" with that? What if my webserver uses UTC? https://codereview.appspot.com/76230045/diff/40001/plaso/parsers/apachelog.py... plaso/parsers/apachelog.py:124: if self._GetTimestamp(line.day, month, line.year, indentation Timestamp( line.day, m......) == 0: https://codereview.appspot.com/76230045/diff/40001/plaso/parsers/apachelog.py... plaso/parsers/apachelog.py:135: self._GetTimezone(int(structure.timezone_int)) again, let's try to surround this with a try/except and use a base 10 on it. https://codereview.appspot.com/76230045/diff/40001/plaso/parsers/apachelog.py... plaso/parsers/apachelog.py:138: month, this can be reduced to fewer lines Timestamp( structure.day, month, structure.year, structure.time) https://codereview.appspot.com/76230045/diff/40001/plaso/parsers/apachelog.py... plaso/parsers/apachelog.py:146: """Get a pytz timezone first line ends with a . Then Args: timeszone_int: A valid timezone in the format +/-HHMM Return: A .... "" That is, few spelling mistakes, missing a dot. indentation on the args section and missing a returns section (cause the Get naming scheme suggests you are about to return something, otherwise it could be SetTimeZone) AND then the timezone_int suggests that this is an integer, so can this be a positive and a negative integer? Cause the format "+/-HHMM" almost seems to suggest this could be a string? https://codereview.appspot.com/76230045/diff/40001/plaso/parsers/apachelog.py... plaso/parsers/apachelog.py:150: timezone_int = timezone_int * -1 why this? https://codereview.appspot.com/76230045/diff/40001/plaso/parsers/apachelog.py... plaso/parsers/apachelog.py:152: timezone_etc = u'Etc/GMT+{:d}' {0:d} https://codereview.appspot.com/76230045/diff/40001/plaso/parsers/apachelog.py... plaso/parsers/apachelog.py:155: if timezone_int % 100 == 0: why? https://codereview.appspot.com/76230045/diff/40001/plaso/parsers/apachelog.py... plaso/parsers/apachelog.py:165: self._timezone = pytz.timezone(timezone_string) surround this with a try/except and shouldn't this be a return timezone ?? Otherwise the function name should be def SetTimeZone instead of Get.... https://codereview.appspot.com/76230045/diff/40001/plaso/parsers/apachelog.py... plaso/parsers/apachelog.py:187: year, month, day, int(hour), int(minute), int(second), 0, self._timezone) add base to the int conversion https://codereview.appspot.com/76230045/diff/40001/plaso/parsers/apachelog_te... File plaso/parsers/apachelog_test.py (right): https://codereview.appspot.com/76230045/diff/40001/plaso/parsers/apachelog_te... plaso/parsers/apachelog_test.py:50: "2012-12-18 21:53:03-05:00") indentation and double quotes Timestamp( '2012-.... https://codereview.appspot.com/76230045/diff/40001/plaso/parsers/apachelog_te... plaso/parsers/apachelog_test.py:55: self.assertEqual(event_object.status_code, u'403') should this be an integer? https://codereview.appspot.com/76230045/diff/40001/plaso/parsers/apachelog_te... plaso/parsers/apachelog_test.py:56: self.assertEqual(event_object.num_bytes_sent, u'5043') should this be an integer? https://codereview.appspot.com/76230045/diff/40001/plaso/parsers/apachelog_te... plaso/parsers/apachelog_test.py:58: self.assertEqual(event_object.user_agent, u'Mozilla/5.0 (X11; Ubuntu; ' indentation self.assertEqual(event_object.user_agent, ( u'Mozilla......' u'.....')) https://codereview.appspot.com/76230045/diff/40001/plaso/parsers/apachelog_te... plaso/parsers/apachelog_test.py:72: "2012-12-18 21:53:07-05:00") sme here with indentation and double quotes https://codereview.appspot.com/76230045/diff/40001/plaso/parsers/apachelog_te... plaso/parsers/apachelog_test.py:79: self.assertEqual(event_object.user_agent, u'Mozilla/5.0 (X11; Ubuntu; ' same here with the indentation https://codereview.appspot.com/76230045/diff/40001/plaso/parsers/apachelog_te... plaso/parsers/apachelog_test.py:93: "2012-12-18 21:53:07-05:00") again double quotes and indentation https://codereview.appspot.com/76230045/diff/40001/plaso/parsers/apachelog_te... plaso/parsers/apachelog_test.py:101: self.assertEqual(event_object.user_agent, u'Mozilla/5.0 (X11; Ubuntu; ' indentation https://codereview.appspot.com/76230045/diff/40001/plaso/parsers/apachelog_te... plaso/parsers/apachelog_test.py:114: expected_timestamp = timelib_test.CopyStringToTimestamp( indent + double quote https://codereview.appspot.com/76230045/diff/40001/plaso/parsers/apachelog_te... plaso/parsers/apachelog_test.py:123: self.assertEqual(event_object.user_agent, u'Mozilla/5.0 (X11; Ubuntu; ' indent https://codereview.appspot.com/76230045/diff/40001/plaso/parsers/apachelog_te... plaso/parsers/apachelog_test.py:136: expected_timestamp = timelib_test.CopyStringToTimestamp( indent + double quote https://codereview.appspot.com/76230045/diff/40001/plaso/parsers/apachelog_te... plaso/parsers/apachelog_test.py:145: self.assertEqual(event_object.user_agent, u'Mozilla/5.0 (X11; Ubuntu; ' indent
Sign in to reply to this message.
|