Description
Add a test for _ReadCsvDict
In 1.1.6, _ReadCsvDict preserved all whitespace in every field, while _ReadCsv strips leading and trailing whitespace. Now _ReadCsvDict uses the csv module to skip initial whitespace, including whitespace that comes before an opening quote.
Using skipinitialspace is good because it fixes http://code.google.com/p/googletransitdatafeed/issues/detail?id=36 "Feed parsing doesn't deal with quoted values correctly when surrounded by spaces"
Switching from stripping in _ReadCsv to stripping inside the csv module is good because it lets the validator warn or error about fields that contain only quoted whitespace, such as the middle field here:
myid, " ", "foo"
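A minimal sketch of the parsing behaviour described above, using only the standard csv module in Python 3. The helper name parse_line is hypothetical and is not part of the patch; it only shows how the skipinitialspace option changes the fields returned for the sample line.

```python
import csv
import io

# The sample line from the description: the middle field is quoted whitespace.
SAMPLE = 'myid, " ", "foo"\n'


def parse_line(text, skipinitialspace):
    """Parse one CSV line with the stdlib csv module and return its fields.

    Hypothetical helper for illustration only; not the transitfeed code.
    """
    reader = csv.reader(io.StringIO(text), skipinitialspace=skipinitialspace)
    return next(reader)


# With skipinitialspace, the space after each comma is dropped, so the
# quotes are recognised and the quoted whitespace survives as field content.
with_skip = parse_line(SAMPLE, True)

# Without it, the leading space makes the quote an ordinary character.
without_skip = parse_line(SAMPLE, False)

print(with_skip)
print(without_skip)
```

With skipinitialspace the middle field comes back as a single space, so a validator can still see and report it. Without the option, the quote characters stay in the field values.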
_ReadCsvDict now warns if any field in the first row of a file starts or ends with whitespace after csv has skipped initial whitespace. This lets us detect whitespace inside quotes, though that is a rare problem. It also adds a warning for people who have whitespace after a header field. Of 675 GTFS files, I found this in only 15, and only 2 partners had a file that put whitespace after every header field.
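The header check can be sketched roughly as follows. This is an illustration under assumptions, not the code in the patch: the function names read_header and header_fields_with_whitespace are hypothetical, and the real _ReadCsvDict reports through the validator's problem reporter instead of returning a list.

```python
import csv
import io


def read_header(text):
    """Return the first row of CSV text, parsed with skipinitialspace=True."""
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    return next(reader)


def header_fields_with_whitespace(header):
    """Return header names that still start or end with whitespace.

    Hypothetical helper: a caller could emit one warning per returned name.
    """
    return [name for name in header if name != name.strip()]


# First field has trailing whitespace, second has whitespace inside quotes,
# third is clean.
header = read_header('stop_id , " stop_name",stop_lat\n')
suspect = header_fields_with_whitespace(header)

for name in suspect:
    print("warning: header field %r has leading or trailing whitespace" % name)
```

Because csv has already skipped the whitespace that follows each delimiter, anything this check flags is either trailing whitespace or whitespace that was inside quotes.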
When I move stop_times to use _ReadCsvDict this will fix
http://code.google.com/p/googletransitdatafeed/issues/detail?id=97
I don't want to add tests for _ReadCsv if it is about to be deleted.
It would be nicer to have validation inside the csv parser.
Patch Set 1
Patch Set 2 : add test for quoted fields, TODO: add warning
Patch Set 3 : svn update
Patch Set 4 : strip whitespace only in header
Patch Set 5 : whitespace and update comment about whitespace in existing feeds
Messages
Total messages: 5