|
|
Created:
14 years, 6 months ago by adg Modified:
14 years, 5 months ago Reviewers:
CC:
rsc, golang-dev Visibility:
Public. |
Descriptionarchive/zip: new package for reading ZIP files
Patch Set 1 #Patch Set 2 : code review 2125042: archive/zip: new package for reading ZIP files #Patch Set 3 : code review 2125042: archive/zip: new package for reading ZIP files #Patch Set 4 : code review 2125042: archive/zip: new package for reading ZIP files #Patch Set 5 : code review 2125042: archive/zip: new package for reading ZIP files #Patch Set 6 : code review 2125042: archive/zip: new package for reading ZIP files #Patch Set 7 : code review 2125042: archive/zip: new package for reading ZIP files #Patch Set 8 : code review 2125042: archive/zip: new package for reading ZIP files #Patch Set 9 : code review 2125042: archive/zip: new package for reading ZIP files #Patch Set 10 : code review 2125042: archive/zip: new package for reading ZIP files #
Total comments: 28
Patch Set 11 : code review 2125042: archive/zip: new package for reading ZIP files #
Total comments: 2
Patch Set 12 : code review 2125042: archive/zip: new package for reading ZIP files #
Total comments: 1
Patch Set 13 : code review 2125042: archive/zip: new package for reading ZIP files #
Total comments: 23
Patch Set 14 : code review 2125042: archive/zip: new package for reading ZIP files #Patch Set 15 : code review 2125042: archive/zip: new package for reading ZIP files #
Total comments: 10
Patch Set 16 : code review 2125042: archive/zip: new package for reading ZIP files #
Total comments: 16
Patch Set 17 : code review 2125042: archive/zip: new package for reading ZIP files #
MessagesTotal messages: 27
Hello rsc, r (cc: golang-dev@googlegroups.com), I'd like you to review this change.
Sign in to reply to this message.
Requesting comments on the design. It's still incomplete; at the very least it needs support for the 'store' method, and I would like to allow it to read ZIP64 files. On 5 September 2010 15:01, <adg@golang.org> wrote: > Reviewers: rsc, r, > > Message: > Hello rsc, r (cc: golang-dev@googlegroups.com), > > I'd like you to review this change. > > > Description: > archive/zip: new package for reading ZIP files > > Please review this at http://codereview.appspot.com/2125042/ > > Affected files: > A src/pkg/archive/zip/Makefile > A src/pkg/archive/zip/testfiles/test.zip > A src/pkg/archive/zip/zip.go > A src/pkg/archive/zip/zip_test.go > > >
Sign in to reply to this message.
After some small reflection, I added support for "store" (no compression). I also added the restriction that files must be Read one at a time (although in any order) - this makes the code simpler in places (and parallel reads from separate files wasn't correctly implemented, anyway). On 5 September 2010 15:02, Andrew Gerrand <adg@golang.org> wrote: > Requesting comments on the design. > > It's still incomplete; at the very least it needs support for the > 'store' method, and I would like to allow it to read ZIP64 files. > > On 5 September 2010 15:01, <adg@golang.org> wrote: >> Reviewers: rsc, r, >> >> Message: >> Hello rsc, r (cc: golang-dev@googlegroups.com), >> >> I'd like you to review this change. >> >> >> Description: >> archive/zip: new package for reading ZIP files >> >> Please review this at http://codereview.appspot.com/2125042/ >> >> Affected files: >> A src/pkg/archive/zip/Makefile >> A src/pkg/archive/zip/testfiles/test.zip >> A src/pkg/archive/zip/zip.go >> A src/pkg/archive/zip/zip_test.go >> >> >> >
Sign in to reply to this message.
I forwarded you an email from golang-nuts in May about a possible design. It looks similar to what you're using, except that you have both zip.ZipFile zip.File which is definitely confusing. The mail I forwarded used zip.Reader and zip.File, which is a little better but still not great. Another important difference is that this CL makes the Files singletons associated with the zip file. That means there is only one read offset associated with a particular file, so you've got multiple goroutines or functions (possibly) fighting over the offset. And there's not even a seek, so the entries in the zip file are read once. Instead of this, I'd suggest having an explicit Open method as in the email. zip.Reader and zip.Writer match archive/tar, so I'd definitely suggest those as the top-level names. I also suggest splitting the package into reader.go and writer.go. I'll write more tomorrow. It's still the weekend here. Russ
Sign in to reply to this message.
Hello rsc, r (cc: golang-dev@googlegroups.com), Please take another look.
Sign in to reply to this message.
I've taken these suggestions onboard and made some changes. The downside is that now NewReader (previously Open) requires an io.ReaderAt and a file length (otherwise we don't know where to seek to). As described in your other email to hotei, there are ways we can improve the io interfaces to better support this kind of thing. On 7 September 2010 07:43, Russ Cox <rsc@golang.org> wrote: > I forwarded you an email from golang-nuts in May > about a possible design. It looks similar to what > you're using, except that you have both > > zip.ZipFile > zip.File > > which is definitely confusing. The mail I forwarded > used zip.Reader and zip.File, which is a little > better but still not great. > > Another important difference is that this CL makes > the Files singletons associated with the zip file. > That means there is only one read offset associated > with a particular file, so you've got multiple goroutines > or functions (possibly) fighting over the offset. > And there's not even a seek, so the entries in the > zip file are read once. Instead of this, I'd suggest > having an explicit Open method as in the email. > > zip.Reader and zip.Writer match archive/tar, > so I'd definitely suggest those as the top-level names. > I also suggest splitting the package into reader.go > and writer.go. > > I'll write more tomorrow. It's still the weekend here. > > Russ >
Sign in to reply to this message.
looks pretty good. a bunch of small things. http://codereview.appspot.com/2125042/diff/15006/src/pkg/archive/zip/reader.go File src/pkg/archive/zip/reader.go (right): http://codereview.appspot.com/2125042/diff/15006/src/pkg/archive/zip/reader.g... src/pkg/archive/zip/reader.go:28: directoryEnd Does this need to be in the struct? http://codereview.appspot.com/2125042/diff/15006/src/pkg/archive/zip/reader.g... src/pkg/archive/zip/reader.go:30: Files []*File we're not entirely consistent about this but most of the go code uses singular here: File []*File so that you write File[i].Name http://codereview.appspot.com/2125042/diff/15006/src/pkg/archive/zip/reader.g... src/pkg/archive/zip/reader.go:36: directoryHeader you have to expose this if you are trying to expose the fields. otherwise godoc won't tell people about it. see archive/tar. I'd suggest making this FileHeader http://codereview.appspot.com/2125042/diff/15006/src/pkg/archive/zip/reader.g... src/pkg/archive/zip/reader.go:40: Filename string It's already in type File. s/Filename/Name/ http://codereview.appspot.com/2125042/diff/15006/src/pkg/archive/zip/reader.g... src/pkg/archive/zip/reader.go:44: func ReadFile(name string) (r *Reader, err os.Error) { s/ReadFile/OpenReader/? http://codereview.appspot.com/2125042/diff/15006/src/pkg/archive/zip/reader.g... src/pkg/archive/zip/reader.go:56: // NewReader takes a ReaderAt and stream length, You can drop the "takes" notes: it's in the code. If you want to mention the names it's better to define their meanings. // NewReader returns a new Reader reading from r, which is assumed // to have the given size in bytes. http://codereview.appspot.com/2125042/diff/15006/src/pkg/archive/zip/reader.g... src/pkg/archive/zip/reader.go:60: func NewReader(r io.ReaderAt, length int64) (z *Reader, err os.Error) { s/length/size/ to match os.FileInfo http://codereview.appspot.com/2125042/diff/15006/src/pkg/archive/zip/reader.g... src/pkg/archive/zip/reader.go:62: offs, err := findDirectoryEndOffset(rs) s/offs/off/ offs sounds plural http://codereview.appspot.com/2125042/diff/15006/src/pkg/archive/zip/reader.g... src/pkg/archive/zip/reader.go:85: return nil, err I don't like using named return values in anything but trivial functions, for exactly this reason. I won't insist but please consider dropping the names from all the return values in this file. It has the added benefit that the signatures are easier for humans to read quickly: as a client I don't care about seeing "z" and "err", just "*Reader, os.Error". http://codereview.appspot.com/2125042/diff/15006/src/pkg/archive/zip/reader.g... src/pkg/archive/zip/reader.go:93: offs := int64(f.fileHeaderOffset) s/offs/off/ http://codereview.appspot.com/2125042/diff/15006/src/pkg/archive/zip/reader.g... src/pkg/archive/zip/reader.go:139: if rerr, ok := recover().(os.Error); rerr != nil && ok { rerr != nil is implied by ok http://codereview.appspot.com/2125042/diff/15006/src/pkg/archive/zip/reader.g... src/pkg/archive/zip/reader.go:165: if rerr, ok := recover().(os.Error); rerr != nil && ok { same http://codereview.appspot.com/2125042/diff/15006/src/pkg/archive/zip/struct.go File src/pkg/archive/zip/struct.go (right): http://codereview.appspot.com/2125042/diff/15006/src/pkg/archive/zip/struct.g... src/pkg/archive/zip/struct.go:9: type fileHeader struct { fileHeader and directoryHeader are really the same data structure; they just have different encodings at different points in the file. I'd suggest making a single type FileHeader struct { ... } with only exported fields, fields that will be useful to callers. Things like "signature" can be local variables with no loss in functionality. You'd still have two reading functions, one for the header next to the file and one for the TOC entry, but they'd both return (or read into) a *FileHeader. http://codereview.appspot.com/2125042/diff/15006/src/pkg/archive/zip/zip_test.go File src/pkg/archive/zip/zip_test.go (right): http://codereview.appspot.com/2125042/diff/15006/src/pkg/archive/zip/zip_test... src/pkg/archive/zip/zip_test.go:15: testZipFile = "testfiles/test.zip" directory should be named testdata to match other packages. look at the archive/tar reader test. this should probably be much more table driven. as is it's going to be painful enough to add new tests that people won't. please add http://swtch.com/r.zip to the testdata.
Sign in to reply to this message.
Hello rsc, r (cc: golang-dev@googlegroups.com), Please take another look.
Sign in to reply to this message.
PTAL http://codereview.appspot.com/2125042/diff/15006/src/pkg/archive/zip/reader.go File src/pkg/archive/zip/reader.go (right): http://codereview.appspot.com/2125042/diff/15006/src/pkg/archive/zip/reader.g... src/pkg/archive/zip/reader.go:28: directoryEnd On 2010/09/08 14:55:50, rsc1 wrote: > Does this need to be in the struct? No, removed. http://codereview.appspot.com/2125042/diff/15006/src/pkg/archive/zip/reader.g... src/pkg/archive/zip/reader.go:30: Files []*File On 2010/09/08 14:55:50, rsc1 wrote: > we're not entirely consistent about this but > most of the go code uses singular here: > > File []*File > > so that you write File[i].Name > > Done. http://codereview.appspot.com/2125042/diff/15006/src/pkg/archive/zip/reader.g... src/pkg/archive/zip/reader.go:36: directoryHeader On 2010/09/08 14:55:50, rsc1 wrote: > you have to expose this if you are trying to expose the fields. > otherwise godoc won't tell people about it. see archive/tar. > > I'd suggest making this FileHeader > Done. http://codereview.appspot.com/2125042/diff/15006/src/pkg/archive/zip/reader.g... src/pkg/archive/zip/reader.go:40: Filename string On 2010/09/08 14:55:50, rsc1 wrote: > It's already in type File. s/Filename/Name/ > Done. http://codereview.appspot.com/2125042/diff/15006/src/pkg/archive/zip/reader.g... src/pkg/archive/zip/reader.go:44: func ReadFile(name string) (r *Reader, err os.Error) { On 2010/09/08 14:55:50, rsc1 wrote: > s/ReadFile/OpenReader/? > Done. http://codereview.appspot.com/2125042/diff/15006/src/pkg/archive/zip/reader.g... src/pkg/archive/zip/reader.go:56: // NewReader takes a ReaderAt and stream length, On 2010/09/08 14:55:50, rsc1 wrote: > You can drop the "takes" notes: it's in the code. > If you want to mention the names it's better to define their meanings. > > // NewReader returns a new Reader reading from r, which is assumed > // to have the given size in bytes. Done. http://codereview.appspot.com/2125042/diff/15006/src/pkg/archive/zip/reader.g... src/pkg/archive/zip/reader.go:60: func NewReader(r io.ReaderAt, length int64) (z *Reader, err os.Error) { On 2010/09/08 14:55:50, rsc1 wrote: > s/length/size/ to match os.FileInfo > Done. http://codereview.appspot.com/2125042/diff/15006/src/pkg/archive/zip/reader.g... src/pkg/archive/zip/reader.go:62: offs, err := findDirectoryEndOffset(rs) On 2010/09/08 14:55:50, rsc1 wrote: > s/offs/off/ > offs sounds plural > Done. http://codereview.appspot.com/2125042/diff/15006/src/pkg/archive/zip/reader.g... src/pkg/archive/zip/reader.go:85: return nil, err On 2010/09/08 14:55:50, rsc1 wrote: > I don't like using named return values in > anything but trivial functions, for exactly this reason. > I won't insist but please consider dropping the names > from all the return values in this file. It has the > added benefit that the signatures are easier for > humans to read quickly: as a client I don't care > about seeing "z" and "err", just "*Reader, os.Error". > Done. http://codereview.appspot.com/2125042/diff/15006/src/pkg/archive/zip/reader.g... src/pkg/archive/zip/reader.go:93: offs := int64(f.fileHeaderOffset) On 2010/09/08 14:55:50, rsc1 wrote: > s/offs/off/ > Done. http://codereview.appspot.com/2125042/diff/15006/src/pkg/archive/zip/reader.g... src/pkg/archive/zip/reader.go:139: if rerr, ok := recover().(os.Error); rerr != nil && ok { On 2010/09/08 14:55:50, rsc1 wrote: > rerr != nil is implied by ok > Done. http://codereview.appspot.com/2125042/diff/15006/src/pkg/archive/zip/reader.g... src/pkg/archive/zip/reader.go:165: if rerr, ok := recover().(os.Error); rerr != nil && ok { On 2010/09/08 14:55:50, rsc1 wrote: > same > Done. http://codereview.appspot.com/2125042/diff/15006/src/pkg/archive/zip/struct.go File src/pkg/archive/zip/struct.go (right): http://codereview.appspot.com/2125042/diff/15006/src/pkg/archive/zip/struct.g... src/pkg/archive/zip/struct.go:9: type fileHeader struct { On 2010/09/08 14:55:50, rsc1 wrote: > fileHeader and directoryHeader are really the same > data structure; they just have different encodings > at different points in the file. I'd suggest making a single > > type FileHeader struct { ... } > > with only exported fields, fields that will be useful to callers. > Things like "signature" can be local variables with no loss > in functionality. > > You'd still have two reading functions, one for the header > next to the file and one for the TOC entry, but they'd both > return (or read into) a *FileHeader. > > Done. http://codereview.appspot.com/2125042/diff/15006/src/pkg/archive/zip/zip_test.go File src/pkg/archive/zip/zip_test.go (right): http://codereview.appspot.com/2125042/diff/15006/src/pkg/archive/zip/zip_test... src/pkg/archive/zip/zip_test.go:15: testZipFile = "testfiles/test.zip" On 2010/09/08 14:55:50, rsc1 wrote: > directory should be named testdata to match other packages. > > look at the archive/tar reader test. this should probably be much > more table driven. as is it's going to be painful enough to add new > tests that people won't. > > please add http://swtch.com/r.zip to the testdata. > Done.
Sign in to reply to this message.
http://codereview.appspot.com/2125042/diff/21006/src/pkg/archive/zip/reader.go File src/pkg/archive/zip/reader.go (right): http://codereview.appspot.com/2125042/diff/21006/src/pkg/archive/zip/reader.g... src/pkg/archive/zip/reader.go:29: File []*FileHeader I wasn't clear earlier; I intended File []*File with type File struct { FileHeader some unexported fields } type FileHeader struct { All uint32 Exported uint32 Fields uint32 } the same way that archive/tar does. That makes FileHeader something clients can manipulate, which will help when it's time to add write support. http://codereview.appspot.com/2125042/diff/21006/src/pkg/archive/zip/struct.go File src/pkg/archive/zip/struct.go (right): http://codereview.appspot.com/2125042/diff/21006/src/pkg/archive/zip/struct.g... src/pkg/archive/zip/struct.go:11: type FileHeader struct { FileHeader should be fields useful for clients, all exported. Anything necessary for the implementation should be in File instead, unexported. It might be just type FileHeader struct { Name string Comment string Size uint32 } for now, with more information added as necessary. Compare to archive/tar Header.
Sign in to reply to this message.
Hello rsc, r (cc: golang-dev@googlegroups.com), Please take another look.
Sign in to reply to this message.
http://codereview.appspot.com/2125042/diff/36001/src/pkg/archive/zip/reader.go File src/pkg/archive/zip/reader.go (right): http://codereview.appspot.com/2125042/diff/36001/src/pkg/archive/zip/reader.g... src/pkg/archive/zip/reader.go:114: if _, err = f.r.Seek(0, 0); err != nil { Instead of that sorted list above, I would just do r := io.NewSectionReader(f.zipr, f.headerOffset, f.size-f.headerOffset) and use that, and then below use f.zipr too with appropriate adjustments. (where zipr is the original io.ReaderAt passed in) and then you can get rid of one level of indirection here.
Sign in to reply to this message.
Hello rsc, r (cc: golang-dev@googlegroups.com), I'd like you to review this change.
Sign in to reply to this message.
Looking pretty good. Comments are mainly ways to improve robustness now. http://codereview.appspot.com/2125042/diff/54001/src/pkg/archive/zip/reader.go File src/pkg/archive/zip/reader.go (right): http://codereview.appspot.com/2125042/diff/54001/src/pkg/archive/zip/reader.g... src/pkg/archive/zip/reader.go:6: The zip package provides support for reading ZIP archives as described here: The first sentence is shown in the package list, which doesn't need the URL. The zip package provides support for reading ZIP archives. See http://www.pkware.com/documents/casestudies/APPNOTE.TXT. This package does not support ZIP64 or disk spanning. http://codereview.appspot.com/2125042/diff/54001/src/pkg/archive/zip/reader.g... src/pkg/archive/zip/reader.go:65: end, err := readDirectoryEnd(rs) It's kind of odd to have findDirectoryEndOffset and readDirectoryEnd be different functions. I'd roll the finding into readDirectoryEnd, and then you can pass in the ReaderAt and only read the data once (use bytes.NewBuffer to get the reader you need). http://codereview.appspot.com/2125042/diff/54001/src/pkg/archive/zip/reader.g... src/pkg/archive/zip/reader.go:77: for i := range z.File { before this line buf := bufio.NewReader(rs) http://codereview.appspot.com/2125042/diff/54001/src/pkg/archive/zip/reader.g... src/pkg/archive/zip/reader.go:79: if err := readDirectoryHeader(z.File[i], rs); err != nil { s/rs/buf/ It will matter for big zip files. All those tiny little reads add up. http://codereview.appspot.com/2125042/diff/54001/src/pkg/archive/zip/reader.g... src/pkg/archive/zip/reader.go:91: if _, err = r.Seek(0, 0); err != nil { Should not be necessary. NewSectionReader presumably returns a reader already positioned at zero. http://codereview.appspot.com/2125042/diff/54001/src/pkg/archive/zip/reader.g... src/pkg/archive/zip/reader.go:102: switch f.Method { These readers need to check the checksum once they reach the end of the file. See compress/gzip for inspiration. http://codereview.appspot.com/2125042/diff/54001/src/pkg/archive/zip/reader.g... src/pkg/archive/zip/reader.go:214: const minRecordSize = 4 + 2 + 2 + 2 + 2 + 4 + 4 + 2 minSize http://codereview.appspot.com/2125042/diff/54001/src/pkg/archive/zip/reader.g... src/pkg/archive/zip/reader.go:221: // TODO: this is very inefficient, but it works. Not if the file is a huge non-zip file. http://codereview.appspot.com/2125042/diff/54001/src/pkg/archive/zip/reader.g... src/pkg/archive/zip/reader.go:222: // Better to write the little-endian representation of Please do. You shouldn't need to seek backwards either: just read the last say 1024 bytes from the file and look in the one slice. bytes.LastIndex should suffice. I think to do that you'll want to pass in the ReaderAt and the size, not the ReadSeeker. Also the test should check that if you add the size of the record + the string length at the end of the struct you get the end of the file. Consider what happens if you run this on an ISO image that happens to have a zip file embedded in it somewhere. http://codereview.appspot.com/2125042/diff/54001/src/pkg/archive/zip/reader.g... src/pkg/archive/zip/reader.go:248: n, err := r.Read(b) _, err := io.ReadFull(r, b) then you don't need to check n; err will be set for a short read. also it fixes a bug: a single Read is not required to fill the buffer. http://codereview.appspot.com/2125042/diff/54001/src/pkg/archive/zip/reader_t... File src/pkg/archive/zip/reader_test.go (right): http://codereview.appspot.com/2125042/diff/54001/src/pkg/archive/zip/reader_t... src/pkg/archive/zip/reader_test.go:22: Contents []byte // if blank, will attempt to compare against File Content please
Sign in to reply to this message.
http://codereview.appspot.com/2125042/diff/54001/src/pkg/archive/zip/reader.go File src/pkg/archive/zip/reader.go (right): http://codereview.appspot.com/2125042/diff/54001/src/pkg/archive/zip/reader.g... src/pkg/archive/zip/reader.go:6: The zip package provides support for reading ZIP archives as described here: On 2010/09/24 04:05:43, rsc1 wrote: > The first sentence is shown in the package list, > which doesn't need the URL. > > The zip package provides support for reading ZIP archives. > See http://www.pkware.com/documents/casestudies/APPNOTE.TXT. > > This package does not support ZIP64 or disk spanning. Done. http://codereview.appspot.com/2125042/diff/54001/src/pkg/archive/zip/reader.g... src/pkg/archive/zip/reader.go:65: end, err := readDirectoryEnd(rs) On 2010/09/24 04:05:43, rsc1 wrote: > It's kind of odd to have findDirectoryEndOffset and readDirectoryEnd be > different functions. I'd roll the > finding into readDirectoryEnd, and then you can pass > in the ReaderAt and only read the data once > (use bytes.NewBuffer to get the reader you need). > Done, although the approach I took requires two reads (minimum). (more discussion below) http://codereview.appspot.com/2125042/diff/54001/src/pkg/archive/zip/reader.g... src/pkg/archive/zip/reader.go:77: for i := range z.File { On 2010/09/24 04:05:43, rsc1 wrote: > before this line > > buf := bufio.NewReader(rs) > Done. http://codereview.appspot.com/2125042/diff/54001/src/pkg/archive/zip/reader.g... src/pkg/archive/zip/reader.go:79: if err := readDirectoryHeader(z.File[i], rs); err != nil { On 2010/09/24 04:05:43, rsc1 wrote: > s/rs/buf/ > > It will matter for big zip files. > All those tiny little reads add up. > Done. http://codereview.appspot.com/2125042/diff/54001/src/pkg/archive/zip/reader.g... src/pkg/archive/zip/reader.go:91: if _, err = r.Seek(0, 0); err != nil { On 2010/09/24 04:05:43, rsc1 wrote: > Should not be necessary. NewSectionReader presumably > returns a reader already positioned at zero. You're right; this code is the legacy of an older approach. http://codereview.appspot.com/2125042/diff/54001/src/pkg/archive/zip/reader.g... src/pkg/archive/zip/reader.go:102: switch f.Method { On 2010/09/24 04:05:43, rsc1 wrote: > These readers need to check the checksum > once they reach the end of the file. > See compress/gzip for inspiration. > Done. http://codereview.appspot.com/2125042/diff/54001/src/pkg/archive/zip/reader.g... src/pkg/archive/zip/reader.go:214: const minRecordSize = 4 + 2 + 2 + 2 + 2 + 4 + 4 + 2 On 2010/09/24 04:05:43, rsc1 wrote: > minSize > Done. http://codereview.appspot.com/2125042/diff/54001/src/pkg/archive/zip/reader.g... src/pkg/archive/zip/reader.go:221: // TODO: this is very inefficient, but it works. On 2010/09/24 04:05:43, rsc1 wrote: > Not if the file is a huge non-zip file. If it's a huge non-zip file, reading it will fail somewhere, if not here. How do you know if you're reading past a huge comment or just junk? You can only check after finding the header and confirming the comment length. http://codereview.appspot.com/2125042/diff/54001/src/pkg/archive/zip/reader.g... src/pkg/archive/zip/reader.go:222: // Better to write the little-endian representation of On 2010/09/24 04:05:43, rsc1 wrote: > Please do. > You shouldn't need to seek backwards either: just read the > last say 1024 bytes from the file and look in the one slice. > bytes.LastIndex should suffice. Done. But what if the comment is longer than 1024 bytes? (Or n bytes.) I've written code to deal with this case, but this brings us to your next point: > Also the test should check that if you add the size of > the record + the string length at the end of the struct > you get the end of the file. Consider what happens if > you run this on an ISO image that happens to have a zip > file embedded in it somewhere. What happens? The zip file should be opened correctly. Is that bad? http://codereview.appspot.com/2125042/diff/54001/src/pkg/archive/zip/reader.g... src/pkg/archive/zip/reader.go:248: n, err := r.Read(b) On 2010/09/24 04:05:43, rsc1 wrote: > _, err := io.ReadFull(r, b) > > then you don't need to check n; err will be set for a short read. > also it fixes a bug: a single Read is not required to > fill the buffer. > > Done. http://codereview.appspot.com/2125042/diff/54001/src/pkg/archive/zip/reader_t... File src/pkg/archive/zip/reader_test.go (right): http://codereview.appspot.com/2125042/diff/54001/src/pkg/archive/zip/reader_t... src/pkg/archive/zip/reader_test.go:22: Contents []byte // if blank, will attempt to compare against File On 2010/09/24 04:05:43, rsc1 wrote: > Content please > Done.
Sign in to reply to this message.
http://codereview.appspot.com/2125042/diff/54001/src/pkg/archive/zip/reader.go File src/pkg/archive/zip/reader.go (right): http://codereview.appspot.com/2125042/diff/54001/src/pkg/archive/zip/reader.g... src/pkg/archive/zip/reader.go:222: // Better to write the little-endian representation of > > Also the test should check that if you add the size of > > the record + the string length at the end of the struct > > you get the end of the file. Consider what happens if > > you run this on an ISO image that happens to have a zip > > file embedded in it somewhere. > > What happens? The zip file should be opened correctly. Is that bad? I'm not sure we understand each other. Are you saying that if I call zip.OpenReader("ubuntu.iso") it should find some random zip file inside the ISO and open it? That's the case I was talking about. If it found a signature halfway through the ISO, the old code would use it, but all the offsets would be wrong since ISO file 0 != zip file 0. And if it didn't find a signature, it would first read the entire ISO image backward, which could take a while. The comment length is at most 64k. You can read the last 65k of the file and decide whether there's a trailer. Also the comment can contain the trailer bytes. Having the right signature is not enough (I will build a zip file where the CRC == the signature if you like). A trailer is only valid if the signature matches *and* adding the structure size plus the comment length gets you to the end of the file.
Sign in to reply to this message.
On 28 September 2010 12:03, <rsc@google.com> wrote: > > http://codereview.appspot.com/2125042/diff/54001/src/pkg/archive/zip/reader.go > File src/pkg/archive/zip/reader.go (right): > > http://codereview.appspot.com/2125042/diff/54001/src/pkg/archive/zip/reader.g... > src/pkg/archive/zip/reader.go:222: // Better to write the little-endian > representation of >> >> > Also the test should check that if you add the size of >> > the record + the string length at the end of the struct >> > you get the end of the file. Consider what happens if >> > you run this on an ISO image that happens to have a zip >> > file embedded in it somewhere. > >> What happens? The zip file should be opened correctly. Is that bad? > > I'm not sure we understand each other. > > Are you saying that if I call zip.OpenReader("ubuntu.iso") > it should find some random zip file inside the ISO and > open it? That's the case I was talking about. If it found > a signature halfway through the ISO, the old code would > use it, but all the offsets would be wrong since ISO file 0 > != zip file 0. And if it didn't find a signature, it would > first read the entire ISO image backward, which could take > a while. Yes, that's true. > The comment length is at most 64k. You can read the > last 65k of the file and decide whether there's a trailer. Yep, while reading the spec I'd just arrived at the same conclusion. > Also the comment can contain the trailer bytes. > Having the right signature is not enough (I will build > a zip file where the CRC == the signature if you like). > A trailer is only valid if the signature matches > *and* adding the structure size plus the comment > length gets you to the end of the file. So if it encounters the signature it should attempt to read the signature, and keep looking if it's invalid? What if the comment contains an entire valid directory end header? It seems there are lots of pathological cases, and I'm not sure where to draw the line. Thanks for your help.
Sign in to reply to this message.
On 28 September 2010 12:10, Andrew Gerrand <adg@golang.org> wrote: > On 28 September 2010 12:03, <rsc@google.com> wrote: >> >> http://codereview.appspot.com/2125042/diff/54001/src/pkg/archive/zip/reader.go >> File src/pkg/archive/zip/reader.go (right): >> >> http://codereview.appspot.com/2125042/diff/54001/src/pkg/archive/zip/reader.g... >> src/pkg/archive/zip/reader.go:222: // Better to write the little-endian >> representation of >>> >>> > Also the test should check that if you add the size of >>> > the record + the string length at the end of the struct >>> > you get the end of the file. Consider what happens if >>> > you run this on an ISO image that happens to have a zip >>> > file embedded in it somewhere. >> >>> What happens? The zip file should be opened correctly. Is that bad? >> >> I'm not sure we understand each other. >> >> Are you saying that if I call zip.OpenReader("ubuntu.iso") >> it should find some random zip file inside the ISO and >> open it? That's the case I was talking about. If it found >> a signature halfway through the ISO, the old code would >> use it, but all the offsets would be wrong since ISO file 0 >> != zip file 0. And if it didn't find a signature, it would >> first read the entire ISO image backward, which could take >> a while. > > Yes, that's true. > >> The comment length is at most 64k. You can read the >> last 65k of the file and decide whether there's a trailer. > > Yep, while reading the spec I'd just arrived at the same conclusion. > >> Also the comment can contain the trailer bytes. >> Having the right signature is not enough (I will build >> a zip file where the CRC == the signature if you like). >> A trailer is only valid if the signature matches >> *and* adding the structure size plus the comment >> length gets you to the end of the file. > > So if it encounters the signature it should attempt to read the > signature, and keep looking if it's invalid? What if the comment > contains an entire valid directory end header? It seems there are lots > of pathological cases, and I'm not sure where to draw the line. I just read through the infozip source (which, incidentally, is quite horrifying) and they don't do anything special to validate the signature. They just use the first one found, so I guess that's good enough here, too. Will validate the record length + file size and be done with it. Andrew
Sign in to reply to this message.
> So if it encounters the signature it should attempt to read the > signature, and keep looking if it's invalid? What if the comment > contains an entire valid directory end header? It seems there are lots > of pathological cases, and I'm not sure where to draw the line. True. I think one can always invent an input that will kill the algorithm; I just want to avoid non-malicious ones. It can legitimately happen, with no malicious intent, that the CRC32 will match the signature; checking the comment length avoids that case. I would suggest reading the last kilobyte, if there is that much, and then scanning backward for a correct signature with a matching comment length: func findInBlock(b []byte) int { for i := len(b)-minSize-4; i >= 0; i-- { if b[i] == 0x12 && b[i+1] == 0x23 && b[i+2] == 0x34 && b[i+3] == 0x45 { // (use hex numbers for signature) n := int(b[i+minSize-2])<<8 | int(b[i+minSize-1]) // make sure comment extends to end of file if n+minSize == i { return i } } } return -1 } You could read the last kilobyte, call findInBlock, and if that fails read the last 65k and call findInBlock again. Most of the time you'll avoid the bigger read. Russ
Sign in to reply to this message.
PTAL I still need to craft a test to trigger the 65kb read, and probably also test reading a non-zip file.
Sign in to reply to this message.
Looks good. Will wait for tests. If you want some test cases http://www/~rsc/readme.zip http://www/~rsc/readme.notzip http://codereview.appspot.com/2125042/diff/69001/src/pkg/archive/zip/reader.go File src/pkg/archive/zip/reader.go (right): http://codereview.appspot.com/2125042/diff/69001/src/pkg/archive/zip/reader.g... src/pkg/archive/zip/reader.go:213: b = make([]byte, bLen) if bLen > size { bLen = size } b = make([]byte, bLen) avoids the b[-off:] below and allocates less anyway. http://codereview.appspot.com/2125042/diff/69001/src/pkg/archive/zip/reader.g... src/pkg/archive/zip/reader.go:222: if p := findSignatureInBlock(b); p > -1 { p >= 0 is more idiomatic; either way http://codereview.appspot.com/2125042/diff/69001/src/pkg/archive/zip/reader.g... src/pkg/archive/zip/reader.go:226: if i == 1 { i == 1 || bLen == size http://codereview.appspot.com/2125042/diff/69001/src/pkg/archive/zip/reader.g... src/pkg/archive/zip/reader.go:255: if b[i+3] == 0x06 && b[i+2] == 0x05 && b[i+1] == 0x4b && b[i] == 0x50 { Looks backwards. b[i] == 'P' && b[i+1] == 'K' && b[i+2] == 0x05 && b[i+3] == 0x06 (the PK is from PKZIP) http://codereview.appspot.com/2125042/diff/69001/src/pkg/archive/zip/reader.g... src/pkg/archive/zip/reader.go:266: func readAtFull(r io.ReaderAt, b []byte, off int64) (err os.Error) { Every ReadAt is a "readAtFull". This wrapper can go away. (godoc io ReaderAt)
Sign in to reply to this message.
On 29 September 2010 00:06, <rsc@google.com> wrote: > Looks good. Will wait for tests. > > If you want some test cases > > http://www/~rsc/readme.zip > http://www/~rsc/readme.notzip I can't seem to find these.
Sign in to reply to this message.
Hello rsc, r (cc: golang-dev@googlegroups.com), Please take another look.
Sign in to reply to this message.
Pretty close. reader_test.go:145: (Rietveld is having issues) Tighten these error messages. Think about whether they'll be easy to scan quickly when the code breaks: t.Errorf("%s: data[%d] = %#x, want %#x", ft.Name, i, b, c[i]) (I know I complain a lot about error message format, but I probably spend more time than anyone else looking at test failures, when I break the compiler.) http://codereview.appspot.com/2125042/diff/76001/src/pkg/archive/zip/reader.go File src/pkg/archive/zip/reader.go (right): http://codereview.appspot.com/2125042/diff/76001/src/pkg/archive/zip/reader.g... src/pkg/archive/zip/reader.go:26: WrongSignatureError = os.NewError("Invalid header signature") error should all begin with a lower case letter and usually are not complete sentences invalid header signature signature not found unsupported compression algorithm checksum error also WrongSignatureError and NotFoundError expose internal details that are not relevant to the caller. i suggest merging them into FormatError = os.NewError("not a valid zip file") http://codereview.appspot.com/2125042/diff/76001/src/pkg/archive/zip/reader_t... File src/pkg/archive/zip/reader_test.go (right): http://codereview.appspot.com/2125042/diff/76001/src/pkg/archive/zip/reader_t... src/pkg/archive/zip/reader_test.go:65: if zt.Error != nil { All the prints and some of the tests in this file are backward from normal convention (like in the testing podcast). Please fix them (grep for expected). Also lines 65-73 boil down to: if err != zt.Error { t.Errorf(`%s: open error = %q, want %q`, zt.Name, err, zt.Error) return } http://codereview.appspot.com/2125042/diff/76001/src/pkg/archive/zip/reader_t... src/pkg/archive/zip/reader_test.go:72: t.Fatal(err) Error+return http://codereview.appspot.com/2125042/diff/76001/src/pkg/archive/zip/reader_t... src/pkg/archive/zip/reader_test.go:81: t.Errorf("incorrect comment: expected %q got %q", zt.Comment, z.Comment) Fix print order, but also add test context (zip file name): t.Errorf("%s: comment = %q, want %q", zt.Name, z.Comment, zt.Comment) http://codereview.appspot.com/2125042/diff/76001/src/pkg/archive/zip/reader_t... src/pkg/archive/zip/reader_test.go:128: t.Fatal(err) These Fatals seem like they should be Error + return. You don't want to abort the entire test, just this one case. http://codereview.appspot.com/2125042/diff/76001/src/pkg/archive/zip/reader_t... src/pkg/archive/zip/reader_test.go:134: var c []byte r.Close() http://codereview.appspot.com/2125042/diff/76001/src/pkg/archive/zip/reader_t... src/pkg/archive/zip/reader_test.go:140: if len(c) != b.Len() { b.Len() != len(c) have != want http://codereview.appspot.com/2125042/diff/76001/src/pkg/archive/zip/reader_t... src/pkg/archive/zip/reader_test.go:141: t.Errorf("file lengths do not match: expected %v got %v", len(c), b.Len()) t.Errorf("%s: len=%d, want %d", b.Len(), len(c)) return
Sign in to reply to this message.
Thanks http://codereview.appspot.com/2125042/diff/69001/src/pkg/archive/zip/reader.go File src/pkg/archive/zip/reader.go (right): http://codereview.appspot.com/2125042/diff/69001/src/pkg/archive/zip/reader.g... src/pkg/archive/zip/reader.go:213: b = make([]byte, bLen) On 2010/09/28 14:06:36, rsc1 wrote: > if bLen > size { > bLen = size > } > b = make([]byte, bLen) > > avoids the b[-off:] below and > allocates less anyway. > > > Done. http://codereview.appspot.com/2125042/diff/69001/src/pkg/archive/zip/reader.g... src/pkg/archive/zip/reader.go:222: if p := findSignatureInBlock(b); p > -1 { On 2010/09/28 14:06:36, rsc1 wrote: > p >= 0 is more idiomatic; either way > Done. http://codereview.appspot.com/2125042/diff/69001/src/pkg/archive/zip/reader.g... src/pkg/archive/zip/reader.go:226: if i == 1 { On 2010/09/28 14:06:36, rsc1 wrote: > i == 1 || bLen == size > Done. http://codereview.appspot.com/2125042/diff/69001/src/pkg/archive/zip/reader.g... src/pkg/archive/zip/reader.go:255: if b[i+3] == 0x06 && b[i+2] == 0x05 && b[i+1] == 0x4b && b[i] == 0x50 { On 2010/09/28 14:06:36, rsc1 wrote: > Looks backwards. > > b[i] == 'P' && b[i+1] == 'K' && b[i+2] == 0x05 && b[i+3] == 0x06 > > (the PK is from PKZIP) Done. http://codereview.appspot.com/2125042/diff/69001/src/pkg/archive/zip/reader.g... src/pkg/archive/zip/reader.go:266: func readAtFull(r io.ReaderAt, b []byte, off int64) (err os.Error) { On 2010/09/28 14:06:36, rsc1 wrote: > Every ReadAt is a "readAtFull". This wrapper can go away. > > (godoc io ReaderAt) > Done. http://codereview.appspot.com/2125042/diff/76001/src/pkg/archive/zip/reader.g... src/pkg/archive/zip/reader.go:26: WrongSignatureError = os.NewError("Wrong signature for this header") On 2010/09/29 14:01:08, rsc1 wrote: > error should all begin with a lower case letter > and usually are not complete sentences > > invalid header signature > signature not found > unsupported compression algorithm > checksum error > > also WrongSignatureError and NotFoundError expose > internal details that are not relevant to the caller. > i suggest merging them into > > FormatError = os.NewError("not a valid zip file") Done. http://codereview.appspot.com/2125042/diff/76001/src/pkg/archive/zip/reader_t... File src/pkg/archive/zip/reader_test.go (right): http://codereview.appspot.com/2125042/diff/76001/src/pkg/archive/zip/reader_t... src/pkg/archive/zip/reader_test.go:65: if zt.Error != nil { On 2010/09/29 14:01:08, rsc1 wrote: > All the prints and some of the tests in this file are backward > from normal convention (like in the testing podcast). > Please fix them (grep for expected). > > Also lines 65-73 boil down to: > > if err != zt.Error { > t.Errorf(`%s: open error = %q, want %q`, zt.Name, err, zt.Error) > return > } Done. http://codereview.appspot.com/2125042/diff/76001/src/pkg/archive/zip/reader_t... src/pkg/archive/zip/reader_test.go:72: t.Fatal(err) On 2010/09/29 14:01:08, rsc1 wrote: > Error+return Done. http://codereview.appspot.com/2125042/diff/76001/src/pkg/archive/zip/reader_t... src/pkg/archive/zip/reader_test.go:81: t.Errorf("incorrect comment: expected %q got %q", zt.Comment, z.Comment) On 2010/09/29 14:01:08, rsc1 wrote: > Fix print order, but also add test context (zip file name): > > t.Errorf("%s: comment = %q, want %q", zt.Name, z.Comment, zt.Comment) Done. http://codereview.appspot.com/2125042/diff/76001/src/pkg/archive/zip/reader_t... src/pkg/archive/zip/reader_test.go:128: t.Fatal(err) On 2010/09/29 14:01:08, rsc1 wrote: > These Fatals seem like they should be Error + return. > You don't want to abort the entire test, just this one case. Done. http://codereview.appspot.com/2125042/diff/76001/src/pkg/archive/zip/reader_t... src/pkg/archive/zip/reader_test.go:134: var c []byte On 2010/09/29 14:01:08, rsc1 wrote: > r.Close() Done. http://codereview.appspot.com/2125042/diff/76001/src/pkg/archive/zip/reader_t... src/pkg/archive/zip/reader_test.go:140: if len(c) != b.Len() { On 2010/09/29 14:01:08, rsc1 wrote: > b.Len() != len(c) > > have != want Done. http://codereview.appspot.com/2125042/diff/76001/src/pkg/archive/zip/reader_t... src/pkg/archive/zip/reader_test.go:141: t.Errorf("file lengths do not match: expected %v got %v", len(c), b.Len()) On 2010/09/29 14:01:08, rsc1 wrote: > t.Errorf("%s: len=%d, want %d", b.Len(), len(c)) > return Done.
Sign in to reply to this message.
LGTM add archive/zip to pkg/Makefile hg file 2125042 pkg/Makefile
Sign in to reply to this message.
*** Submitted as http://code.google.com/p/go/source/detail?r=d6a4eb7fee9d *** archive/zip: new package for reading ZIP files R=rsc CC=golang-dev http://codereview.appspot.com/2125042 Committer: Andrew Gerrand <adg@golang.org>
Sign in to reply to this message.
|