Created: 10 years, 4 months ago by Joaquin Moreno
Modified: 9 years, 5 months ago
Reviewers: kiddi
CC: log2timeline-dev_googlegroups.com
Visibility: Public.
Description
These files are missing unit tests:
+ plaso/formatters/mac_bsdplaintext.py
+ plaso/parsers/__init__.py
Patch Set 1
Patch Set 2 : Uploading changes made to code.
Patch Set 3 : Uploading changes made to code.
Total comments: 5
Patch Set 4 : Uploading changes made to code.
Patch Set 5 : Uploading changes made to code.
Patch Set 6 : Uploading changes made to code.
Patch Set 7 : Uploading changes made to code.
Total comments: 18
Messages
Total messages: 18
Code updated.
Well, this is a proof of concept: it cannot read all the BSD plaintext logs, it has some issues, and it is not fast. I have written a small script, about a third the size of the Plaso parser, that is able to parse everything. I don't know how to do this properly in Plaso. I have checked event, Parse and text_parser, but I couldn't understand how it is supposed to deal with logs that have more than one line per entry. This is the script that can parse every ASL BSD file:

#!/usr/bin/python
# -*- coding: utf-8 -*-
# Python 2 script (print statements, pyparsing.string.uppercase).
import logging
import sys

import pyparsing

# PLASO CODE
def PyParseIntCast(dummy_string, dummy_location, tokens):
  """Casts every token to an int in place, falling back to 0 on failure."""
  for index, token in enumerate(tokens):
    try:
      tokens[index] = int(token)
    except ValueError:
      logging.error(u'Unable to cast [{}] to an int, using 0'.format(token))
      tokens[index] = 0

MONTH = pyparsing.Word(
    pyparsing.string.uppercase, pyparsing.string.lowercase, exact=3)
INTEGER = pyparsing.Word(pyparsing.nums).setParseAction(PyParseIntCast)
TWO_DIGITS = pyparsing.Word(pyparsing.nums, exact=2).setParseAction(
    PyParseIntCast)
ONE_OR_TWO_DIGITS = pyparsing.Word(
    pyparsing.nums, min=1, max=2).setParseAction(PyParseIntCast)
TIME = pyparsing.Group(
    TWO_DIGITS + pyparsing.Suppress(':') + TWO_DIGITS +
    pyparsing.Suppress(':') + TWO_DIGITS)
PID = pyparsing.Word(
    pyparsing.nums, min=1, max=5).setParseAction(PyParseIntCast)

# Regular entry: "Mon DD HH:MM:SS computer agent[pid]: message".
BSD_PLAINTEXT = (
    MONTH.setResultsName('month') +
    ONE_OR_TWO_DIGITS.setResultsName('day') +
    TIME.setResultsName('time') +
    pyparsing.Word(pyparsing.printables).setResultsName('computer_name') +
    pyparsing.CharsNotIn(u'[').setResultsName('agent') +
    pyparsing.Literal(u'[').suppress() +
    PID.setResultsName('pid') +
    pyparsing.Literal(u']').suppress() +
    pyparsing.Literal(u':') +
    pyparsing.SkipTo(pyparsing.lineEnd).setResultsName('message'))

# Repeated line.
# Note: CharsNotIn takes a set of characters, so u'---' behaves like u'-'.
REPEATED_LINE = (
    MONTH.setResultsName('month') +
    ONE_OR_TWO_DIGITS.setResultsName('day') +
    TIME.setResultsName('time') +
    pyparsing.Literal(u'---').suppress() +
    pyparsing.CharsNotIn(u'---').setResultsName('repeated_line') +
    pyparsing.Literal(u'---').suppress())

previous_structure = None

def RawToUTF8(text):
  """Decodes a byte string as UTF-8, dropping bytes that cannot be decoded."""
  try:
    text = text.decode('utf-8')
  except UnicodeDecodeError:
    print 'Undecodable bytes found, ignoring them.'
    text = text.decode('utf-8', 'ignore')
  return text

def ParseLine(line):
  """Prints one parsed entry."""
  print u'Timestamp: {} {} {}, Host: {}, Agent: {} (Pid: {}).'.format(
      line.month, line.day, line.time, line.computer_name,
      line.agent.split(' ', 1)[1], line.pid)
  print u'Message: {}'.format(line.message)

try:
  f = open(sys.argv[1])
except (IndexError, IOError):
  print 'No file, no senorito, no file.'
  sys.exit(1)

for line in f:
  try:
    data = BSD_PLAINTEXT.parseString(line)
    # A new header closes the previous entry, so print it now.
    if previous_structure is not None:
      ParseLine(previous_structure)
    data.message = RawToUTF8(data.message)
    previous_structure = data
  except pyparsing.ParseException:
    if previous_structure is None:
      # Nothing to attach this line to yet.
      continue
    try:
      data = REPEATED_LINE.parseString(line)
      # Print the previous entry, then reuse it with the new timestamp.
      ParseLine(previous_structure)
      previous_structure.month = data.month
      previous_structure.day = data.day
      previous_structure.time = data.time
      ParseLine(previous_structure)
    except pyparsing.ParseException:
      # Neither structure matched: continuation of the previous message.
      previous_structure.message += RawToUTF8(line)

# Flush the last entry.
if previous_structure is not None:
  ParseLine(previous_structure)
f.close()
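The accumulate-and-flush idea in the script above can be sketched in isolation. This is a minimal Python 3 sketch that uses only the standard `re` module rather than pyparsing or Plaso's text_parser; the `HEADER` pattern and the `group_entries` name are hypothetical and only approximate the BSD plaintext header.

```python
import re

# Hypothetical, simplified header: "Mon DD HH:MM:SS host agent[pid]: message".
HEADER = re.compile(
    r'^(?P<month>[A-Z][a-z]{2}) +(?P<day>\d{1,2}) '
    r'(?P<time>\d{2}:\d{2}:\d{2}) (?P<host>\S+) '
    r'(?P<agent>[^\[]+)\[(?P<pid>\d{1,5})\]: ?(?P<message>.*)$')


def group_entries(lines):
  """Yields one dict per log entry, folding continuation lines into it."""
  current = None
  for line in lines:
    line = line.rstrip('\n')
    match = HEADER.match(line)
    if match:
      # A new header closes the entry being accumulated.
      if current is not None:
        yield current
      current = match.groupdict()
    elif current is not None:
      # No header: treat the line as a continuation of the current message.
      current['message'] += '\n' + line
  # Flush the last entry once the input is exhausted.
  if current is not None:
    yield current
```

Because an entry is only emitted when the next header arrives or the input ends, the final flush after the loop is what keeps the last entry from being lost.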
Code updated.
New version, completely rewritten. The code is not very clear, but it can parse all the ASL BSD files. Merry Xmas.
Few comments this time, but I think you should not use the base parser but use the newly added multi line pyparsing assistant (https://code.google.com/p/plaso/source/browse/plaso/lib/text_parser.py#744)

https://codereview.appspot.com/41530045/diff/40001/plaso/parsers/mac_bsdplain...
File plaso/parsers/mac_bsdplaintext.py (right):

https://codereview.appspot.com/41530045/diff/40001/plaso/parsers/mac_bsdplain...
plaso/parsers/mac_bsdplaintext.py:58: class BsdEntry(object):
why is this necessary?

https://codereview.appspot.com/41530045/diff/40001/plaso/parsers/mac_bsdplain...
plaso/parsers/mac_bsdplaintext.py:84: def changeTimestamp(self, month, day, time):
s/changeTim.../ChangeTime/

https://codereview.appspot.com/41530045/diff/40001/plaso/parsers/mac_bsdplain...
plaso/parsers/mac_bsdplaintext.py:97: class MacBsdPlaintextLogParser(parser.BaseParser):
why are you using the base parser here? let's use the multi-line pyparsing assistant, seems like a good fit here.

https://codereview.appspot.com/41530045/diff/40001/plaso/parsers/mac_bsdplain...
plaso/parsers/mac_bsdplaintext.py:138: def Parse(self, file_object):
should not be necessary to define this function if you use the multi-line assistant.

https://codereview.appspot.com/41530045/diff/40001/plaso/parsers/mac_bsdplain...
plaso/parsers/mac_bsdplaintext.py:189: file_object.seek(0)
add here (top import os) file_object.seek(0, os.SEEK_SET)
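The last comment asks for an explicit `whence` argument when rewinding. A minimal sketch of what that looks like, using an in-memory `io.BytesIO` as a stand-in for the `file_object` handed to the parser (the stand-in and its content are assumptions for illustration):

```python
import io
import os

# Stand-in for the parser's file_object; the content is made up.
file_object = io.BytesIO(b'Dec 24 01:21:46 host kernel[0]: message\n')

first_line = file_object.readline()

# Rewind to the start of the file, stating the reference point explicitly.
file_object.seek(0, os.SEEK_SET)
assert file_object.tell() == 0

# Reading again from the start returns the same line.
assert file_object.readline() == first_line
```

`os.SEEK_SET` is the default `whence`, so the behaviour matches `seek(0)`; spelling it out only makes the intent explicit.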
Code updated.
Syslog now uses pyparsing and the multi-line support from text_parser instead of regular expressions. I have found some issues with the multi-line support and will look at them tomorrow when I have more time. As an example, of the first three lines it parses the first and the third correctly, but not the second:

$ head -3 test_data/syslog
Dec 24 01:21:46 DarkTemplar-2 kernel[0]: AppleThunderboltNHIType2::waitForOk2Go2Sx - retries = 3
Dec 24 01:21:47 --- last message repeated 1 time ---
Dec 24 01:21:48 DarkTemplar-2.local Mail[30131]: *** Assertion failure in -[MFEWSGateway fetchCopyOfSyncIssuesEntryID], /SourceCache/Mail/Mail-1822/FrameworkTargets/MailFramework/EWS/MFEWSGateway.m:1848

It might be my mistake; it is quite late. Thank you.
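One way to see why the second sample line needs its own structure is to try the two line shapes in order on the three lines quoted above. This is a minimal sketch with the standard `re` module, not pyparsing and not Plaso's text_parser; the pattern names and `classify` are hypothetical.

```python
import re

TIMESTAMP = (
    r'(?P<month>[A-Z][a-z]{2}) +(?P<day>\d{1,2}) '
    r'(?P<time>\d{2}:\d{2}:\d{2})')

# Regular entry: timestamp, host, agent[pid]: message.
ENTRY = re.compile(
    '^' + TIMESTAMP +
    r' (?P<host>\S+) (?P<agent>[^\[]+)\[(?P<pid>\d+)\]: ?(?P<message>.*)$')

# Repeated-line marker: timestamp followed by "--- text ---".
REPEATED = re.compile('^' + TIMESTAMP + r' --- (?P<repeated>.+?) ---$')


def classify(line):
  """Returns 'entry', 'repeated' or 'continuation' for a single log line."""
  if ENTRY.match(line):
    return 'entry'
  if REPEATED.match(line):
    return 'repeated'
  return 'continuation'


SAMPLE = [
    'Dec 24 01:21:46 DarkTemplar-2 kernel[0]: '
    'AppleThunderboltNHIType2::waitForOk2Go2Sx - retries = 3',
    'Dec 24 01:21:47 --- last message repeated 1 time ---',
    'Dec 24 01:21:48 DarkTemplar-2.local Mail[30131]: *** Assertion failure '
    'in -[MFEWSGateway fetchCopyOfSyncIssuesEntryID], /SourceCache/Mail/'
    'Mail-1822/FrameworkTargets/MailFramework/EWS/MFEWSGateway.m:1848',
]

kinds = [classify(line) for line in SAMPLE]
# kinds == ['entry', 'repeated', 'entry']
```

The repeated-line marker has no host or `agent[pid]` part, so the regular entry pattern cannot match it and a second alternative has to be tried.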
so are you removing the old syslog parser for this one?
On 2014/01/02 18:50:42, kiddi wrote:
> so are you removing the old syslog parser for this one?

Yes, well, if you think it is a good idea, because the old one uses regular expressions. But I have some issues with the multi-line parser: it does not parse well when there are two different possible structures. I am trying to find out why.
I'm reviewing the multi line parser right now and looking at this implementation as a PoC. I'll update this CL shortly.
Code updated.
Code updated.
First of all, sorry for the delay. The normal syslog parser works well with pyparsing and can be reviewed. Mac_Syslog (multi-line) has to wait until the refactoring; most of the logs are in the ASL binary format, so there is no rush.
Code updated.
Now only multiline support (ASL BSD plaintext).
Few comments, been a while since I looked at this CL, sorry for the extremely long wait. I think you need to upgrade this client and make sure this still works and then work on the comments.

https://codereview.appspot.com/41530045/diff/120001/plaso/parsers/mac_syslog.py
File plaso/parsers/mac_syslog.py (right):

https://codereview.appspot.com/41530045/diff/120001/plaso/parsers/mac_syslog....
plaso/parsers/mac_syslog.py:23: # for Mac OS X.
no need for this comment really. You can just say that the Mac BSD syslog format is multilined, thus the need to have a separate parser.

https://codereview.appspot.com/41530045/diff/120001/plaso/parsers/mac_syslog....
plaso/parsers/mac_syslog.py:51: computer_name: string with the name of the computer.
same comment here as last time, why not pass in the values of computer_name and pid instead of passing some values in here and then pass in the structure for the rest?

https://codereview.appspot.com/41530045/diff/120001/plaso/parsers/mac_syslog....
plaso/parsers/mac_syslog.py:69: # Regular expressions for known actions.
what regular expressions?

https://codereview.appspot.com/41530045/diff/120001/plaso/parsers/mac_syslog....
plaso/parsers/mac_syslog.py:70: # Define how a log line should look like.
u can remove this comment

https://codereview.appspot.com/41530045/diff/120001/plaso/parsers/mac_syslog....
plaso/parsers/mac_syslog.py:103: # Mac ASL Bsd plaintext repeated line support
end with a .

https://codereview.appspot.com/41530045/diff/120001/plaso/parsers/mac_syslog....
plaso/parsers/mac_syslog.py:113: return True
no more verifications here? same comment as in the regular syslog

https://codereview.appspot.com/41530045/diff/120001/plaso/parsers/mac_syslog....
plaso/parsers/mac_syslog.py:174: timestamp, structure, structure.reporter.split()[0], body)
only 4 space hanging indent

https://codereview.appspot.com/41530045/diff/120001/plaso/parsers/mac_syslog....
plaso/parsers/mac_syslog.py:174: timestamp, structure, structure.reporter.split()[0], body)
and not use the split()[0] here, rather use partition above and pass that in

https://codereview.appspot.com/41530045/diff/120001/plaso/parsers/mac_syslog....
plaso/parsers/mac_syslog.py:195: year, month, int(day, 10), hour, minute, second)
no need for the int anymore

https://codereview.appspot.com/41530045/diff/120001/plaso/parsers/mac_syslog....
plaso/parsers/mac_syslog.py:213: timestamp = datetime.datetime.fromtimestamp(time, zone)
and don't use datetime.datetime directly here, use timelib.Timestamp

https://codereview.appspot.com/41530045/diff/120001/plaso/parsers/mac_syslog_...
File plaso/parsers/mac_syslog_test.py (right):

https://codereview.appspot.com/41530045/diff/120001/plaso/parsers/mac_syslog_...
plaso/parsers/mac_syslog_test.py:20: # import pytz
why this all commented out?

https://codereview.appspot.com/41530045/diff/120001/plaso/parsers/mac_syslog_...
plaso/parsers/mac_syslog_test.py:23: # pylint: disable=W0611
and don't use a number pylint suppression

https://codereview.appspot.com/41530045/diff/120001/plaso/parsers/mac_syslog_...
plaso/parsers/mac_syslog_test.py:24: # pylint: disable=pointless-string-statement
and what is this?

https://codereview.appspot.com/41530045/diff/120001/plaso/parsers/mac_syslog_...
plaso/parsers/mac_syslog_test.py:48: # Hack only to be submitted to codereview.
this needs to be fixed ;) was there an issue with the multiline parser itself or just your parser? and if it was with the multiline parent, what about now?

https://codereview.appspot.com/41530045/diff/120001/plaso/parsers/mac_syslog_...
plaso/parsers/mac_syslog_test.py:51: '''
what is this?

https://codereview.appspot.com/41530045/diff/120001/plaso/parsers/mac_syslog_...
plaso/parsers/mac_syslog_test.py:59: u'waitForOk2Go2Sx - retries = 3')
let's use the test_lib test for message strings, so that we test both the short and long one as well.

https://codereview.appspot.com/41530045/diff/120001/plaso/parsers/mac_syslog_...
plaso/parsers/mac_syslog_test.py:61: self.assertEqual(1387848106000000, event.timestamp)
missing the comments

https://codereview.appspot.com/41530045/diff/120001/plaso/parsers/mac_syslog_...
plaso/parsers/mac_syslog_test.py:164: '''
what is this?
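Two of the comments above lend themselves to a small illustration: the expected timestamp in the test, and `partition` in place of `split()[0]`. The following is a hedged sketch using only the standard library, not Plaso's timelib; `syslog_timestamp` and `reporter_name` are hypothetical names. It assumes the year is 2013 (the syslog lines carry no year) and that the time is read as UTC, which is what the asserted value 1387848106000000 corresponds to for "Dec 24 01:21:46".

```python
import calendar

MONTHS = {
    name: number for number, name in enumerate(
        ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
         'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'], 1)}


def syslog_timestamp(year, month_name, day, time_string):
  """Returns microseconds since the POSIX epoch, reading the fields as UTC."""
  hour, minute, second = (int(part) for part in time_string.split(':'))
  seconds = calendar.timegm(
      (year, MONTHS[month_name], day, hour, minute, second, 0, 0, 0))
  return seconds * 1000000


def reporter_name(reporter):
  """Returns the text before the first space, or the whole string if none."""
  name, _, _ = reporter.partition(' ')
  return name


# The year is an assumption; the log line itself only has "Dec 24 01:21:46".
timestamp = syslog_timestamp(2013, 'Dec', 24, '01:21:46')
```

Unlike `split()[0]`, `partition(' ')` does not raise on an empty string; it returns an empty name instead, so the caller can decide how to handle a missing reporter.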