Issue 2125042: code review 2125042: archive/zip: new package for reading ZIP files

Files (+503 lines, -0 lines):
A	src/pkg/archive/zip/Makefile	+12 lines, -0 lines
A	src/pkg/archive/zip/reader.go	+278 lines, -0 lines
A	src/pkg/archive/zip/reader_test.go	+180 lines, -0 lines
A	src/pkg/archive/zip/struct.go	+33 lines, -0 lines
A	src/pkg/archive/zip/testdata/gophercolor16x16.png	Binary file
A	src/pkg/archive/zip/testdata/r.zip	Binary file
A	src/pkg/archive/zip/testdata/readme.notzip	Binary file
A	src/pkg/archive/zip/testdata/readme.zip	Binary file
A	src/pkg/archive/zip/testdata/test.zip	Binary file

Messages

Total messages: 27


adg

Hello rsc, r (cc: golang-dev@googlegroups.com), I'd like you to review this change.

14 years, 10 months ago (2010-09-05 05:01:09 UTC) #1

adg

Requesting comments on the design. It's still incomplete; at the very least it needs support ...

14 years, 10 months ago (2010-09-05 05:03:09 UTC) #2

adg

After some small reflection, I added support for "store" (no compression). I also added the ...

14 years, 10 months ago (2010-09-05 06:08:36 UTC) #3

rsc

I forwarded you an email from golang-nuts in May about a possible design. It looks ...

14 years, 10 months ago (2010-09-06 21:43:09 UTC) #4

adg

Hello rsc, r (cc: golang-dev@googlegroups.com), Please take another look.

14 years, 10 months ago (2010-09-07 12:29:53 UTC) #5

adg

I've taken these suggestions on board and made some changes. The downside is that now NewReader ...

14 years, 10 months ago (2010-09-07 12:36:00 UTC) #6

rsc1

looks pretty good. a bunch of small things. http://codereview.appspot.com/2125042/diff/15006/src/pkg/archive/zip/reader.go File src/pkg/archive/zip/reader.go (right): http://codereview.appspot.com/2125042/diff/15006/src/pkg/archive/zip/reader.go#newcode28 src/pkg/archive/zip/reader.go:28: directoryEnd ...

14 years, 10 months ago (2010-09-08 14:55:50 UTC) #7

adg

Hello rsc, r (cc: golang-dev@googlegroups.com), Please take another look.

14 years, 10 months ago (2010-09-09 10:46:25 UTC) #8

adg

PTAL http://codereview.appspot.com/2125042/diff/15006/src/pkg/archive/zip/reader.go File src/pkg/archive/zip/reader.go (right): http://codereview.appspot.com/2125042/diff/15006/src/pkg/archive/zip/reader.go#newcode28 src/pkg/archive/zip/reader.go:28: directoryEnd On 2010/09/08 14:55:50, rsc1 wrote: > Does ...

14 years, 10 months ago (2010-09-09 10:47:44 UTC) #9

rsc

http://codereview.appspot.com/2125042/diff/21006/src/pkg/archive/zip/reader.go File src/pkg/archive/zip/reader.go (right): http://codereview.appspot.com/2125042/diff/21006/src/pkg/archive/zip/reader.go#newcode29 src/pkg/archive/zip/reader.go:29: File []*FileHeader I wasn't clear earlier; I intended File ...

14 years, 10 months ago (2010-09-09 18:08:18 UTC) #10

adg

Hello rsc, r (cc: golang-dev@googlegroups.com), Please take another look.

14 years, 9 months ago (2010-09-15 11:45:33 UTC) #11

rsc1

http://codereview.appspot.com/2125042/diff/36001/src/pkg/archive/zip/reader.go File src/pkg/archive/zip/reader.go (right): http://codereview.appspot.com/2125042/diff/36001/src/pkg/archive/zip/reader.go#newcode114 src/pkg/archive/zip/reader.go:114: if _, err = f.r.Seek(0, 0); err != nil ...

14 years, 9 months ago (2010-09-16 18:06:34 UTC) #12

adg

Hello rsc, r (cc: golang-dev@googlegroups.com), I'd like you to review this change.

14 years, 9 months ago (2010-09-24 03:33:17 UTC) #13

rsc1

Looking pretty good. Comments are mainly ways to improve robustness now. http://codereview.appspot.com/2125042/diff/54001/src/pkg/archive/zip/reader.go File src/pkg/archive/zip/reader.go (right): ...

14 years, 9 months ago (2010-09-24 04:05:43 UTC) #14

adg

14 years, 9 months ago (2010-09-28 01:55:13 UTC) #15

http://codereview.appspot.com/2125042/diff/54001/src/pkg/archive/zip/reader.go
File src/pkg/archive/zip/reader.go (right):

http://codereview.appspot.com/2125042/diff/54001/src/pkg/archive/zip/reader.g...
src/pkg/archive/zip/reader.go:6: The zip package provides support for reading
ZIP archives as described here:
On 2010/09/24 04:05:43, rsc1 wrote:
> The first sentence is shown in the package list,
> which doesn't need the URL.
> 
> The zip package provides support for reading ZIP archives.
> See http://www.pkware.com/documents/casestudies/APPNOTE.TXT.
> 
> This package does not support ZIP64 or disk spanning.

Done.

http://codereview.appspot.com/2125042/diff/54001/src/pkg/archive/zip/reader.g...
src/pkg/archive/zip/reader.go:65: end, err := readDirectoryEnd(rs)
On 2010/09/24 04:05:43, rsc1 wrote:
> It's kind of odd to have findDirectoryEndOffset and readDirectoryEnd be
> different functions.  I'd roll the
> finding into readDirectoryEnd, and then you can pass
> in the ReaderAt and only read the data once
> (use bytes.NewBuffer to get the reader you need).
> 

Done, although the approach I took requires two reads (minimum). (more
discussion below)

http://codereview.appspot.com/2125042/diff/54001/src/pkg/archive/zip/reader.g...
src/pkg/archive/zip/reader.go:77: for i := range z.File {
On 2010/09/24 04:05:43, rsc1 wrote:
> before this line
> 
> buf := bufio.NewReader(rs)
> 

Done.

http://codereview.appspot.com/2125042/diff/54001/src/pkg/archive/zip/reader.g...
src/pkg/archive/zip/reader.go:79: if err := readDirectoryHeader(z.File[i], rs); err != nil {
On 2010/09/24 04:05:43, rsc1 wrote:
> s/rs/buf/
> 
> It will matter for big zip files.
> All those tiny little reads add up.
> 

Done.
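
To illustrate the point about tiny reads adding up, here is a minimal sketch, not the code from this change. It decodes many small little-endian fields and counts how many Read calls reach the underlying source with and without a bufio.Reader in between. The names countingReader, readFields and underlyingReads are made up for this illustration.

```go
package main

import (
	"bufio"
	"bytes"
	"encoding/binary"
	"fmt"
	"io"
)

// countingReader (hypothetical helper) counts the Read calls that reach
// the underlying source.
type countingReader struct {
	r     io.Reader
	calls int
}

func (c *countingReader) Read(p []byte) (int, error) {
	c.calls++
	return c.r.Read(p)
}

// readFields decodes n little-endian uint16 values from r, issuing one
// small read per field, much like decoding a directory header field by field.
func readFields(r io.Reader, n int) ([]uint16, error) {
	out := make([]uint16, n)
	for i := range out {
		if err := binary.Read(r, binary.LittleEndian, &out[i]); err != nil {
			return nil, err
		}
	}
	return out, nil
}

// underlyingReads reports how many Read calls hit the source while n fields
// are decoded, optionally through a bufio.Reader.
func underlyingReads(n int, buffered bool) int {
	src := &countingReader{r: bytes.NewReader(make([]byte, 2*n))}
	var r io.Reader = src
	if buffered {
		r = bufio.NewReader(src)
	}
	if _, err := readFields(r, n); err != nil {
		panic(err)
	}
	return src.calls
}

func main() {
	fmt.Println("unbuffered reads:", underlyingReads(1000, false))
	fmt.Println("buffered reads:  ", underlyingReads(1000, true))
}
```

With a file or network source each of those underlying reads is typically a system call, which is why the buffered variant matters for archives with many entries.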

http://codereview.appspot.com/2125042/diff/54001/src/pkg/archive/zip/reader.g...
src/pkg/archive/zip/reader.go:91: if _, err = r.Seek(0, 0); err != nil {
On 2010/09/24 04:05:43, rsc1 wrote:
> Should not be necessary.  NewSectionReader presumably
> returns a reader already positioned at zero.

You're right; this code is the legacy of an older approach.
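
A small sketch of why the Seek is redundant: a reader returned by io.NewSectionReader starts at the beginning of its section, so it can be read straight away. The helper sectionString and the sample data are assumptions made for this example only.

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// sectionString (hypothetical helper) reads bytes [off, off+n) of s through
// a freshly created SectionReader without seeking first.
func sectionString(s string, off, n int64) string {
	sr := io.NewSectionReader(strings.NewReader(s), off, n)
	b, err := io.ReadAll(sr)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	// "header" occupies the first 6 bytes; the section covers the next 7.
	fmt.Println(sectionString("headerPAYLOADtrailer", 6, 7))
}
```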

http://codereview.appspot.com/2125042/diff/54001/src/pkg/archive/zip/reader.g...
src/pkg/archive/zip/reader.go:102: switch f.Method {
On 2010/09/24 04:05:43, rsc1 wrote:
> These readers need to check the checksum
> once they reach the end of the file.
> See compress/gzip for inspiration.
> 

Done.
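
The suggestion above can be sketched roughly as follows, assuming a CRC-32 (IEEE) checksum stored alongside the data. This is an illustration of the idea, not the code that was submitted; checksumReader, errChecksum, checksumOf and verify are names invented for the example.

```go
package main

import (
	"errors"
	"fmt"
	"hash"
	"hash/crc32"
	"io"
	"strings"
)

// errChecksum is a hypothetical error value for this sketch.
var errChecksum = errors.New("checksum mismatch")

// checksumReader hashes everything read through it and compares the result
// with the expected CRC-32 once the underlying reader reports EOF.
type checksumReader struct {
	r    io.Reader
	hash hash.Hash32
	want uint32
}

func newChecksumReader(r io.Reader, want uint32) *checksumReader {
	return &checksumReader{r: r, hash: crc32.NewIEEE(), want: want}
}

func (c *checksumReader) Read(p []byte) (int, error) {
	n, err := c.r.Read(p)
	c.hash.Write(p[:n]) // hash.Hash writes never return an error
	if err == io.EOF && c.hash.Sum32() != c.want {
		return n, errChecksum
	}
	return n, err
}

// checksumOf returns the CRC-32 (IEEE) of data.
func checksumOf(data string) uint32 {
	return crc32.ChecksumIEEE([]byte(data))
}

// verify drains data through a checksumReader and returns the final error,
// which is nil when the checksum matches.
func verify(data string, want uint32) error {
	_, err := io.Copy(io.Discard, newChecksumReader(strings.NewReader(data), want))
	return err
}

func main() {
	data := "hello, zip"
	fmt.Println("matching checksum:", verify(data, checksumOf(data)))
	fmt.Println("wrong checksum:   ", verify(data, checksumOf(data)+1))
}
```

Checking only at EOF means a caller that stops reading early never sees a checksum error, which is the same trade-off a streaming decompressor makes.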

http://codereview.appspot.com/2125042/diff/54001/src/pkg/archive/zip/reader.g...
src/pkg/archive/zip/reader.go:214: const minRecordSize = 4 + 2 + 2 + 2 + 2 + 4 + 4 + 2
On 2010/09/24 04:05:43, rsc1 wrote:
> minSize
> 

Done.

http://codereview.appspot.com/2125042/diff/54001/src/pkg/archive/zip/reader.g...
src/pkg/archive/zip/reader.go:221: // TODO: this is very inefficient, but it
works.
On 2010/09/24 04:05:43, rsc1 wrote:
> Not if the file is a huge non-zip file.

If it's a huge non-zip file, reading it will fail somewhere, if not here. How do
you know if you're reading past a huge comment or just junk? You can only check
after finding the header and confirming the comment length.

http://codereview.appspot.com/2125042/diff/54001/src/pkg/archive/zip/reader.g...
src/pkg/archive/zip/reader.go:222: // Better to write the little-endian
representation of
On 2010/09/24 04:05:43, rsc1 wrote:
> Please do.
> You shouldn't need to seek backwards either: just read the
> last say 1024 bytes from the file and look in the one slice.
> bytes.LastIndex should suffice.

Done. But what if the comment is longer than 1024 bytes? (Or n bytes.) I've
written code to deal with this case, but this brings us to your next point:

> Also the test should check that if you add the size of
> the record + the string length at the end of the struct
> you get the end of the file.  Consider what happens if
> you run this on an ISO image that happens to have a zip
> file embedded in it somewhere.

What happens? The zip file should be opened correctly. Is that bad?

http://codereview.appspot.com/2125042/diff/54001/src/pkg/archive/zip/reader.g...
src/pkg/archive/zip/reader.go:248: n, err := r.Read(b)
On 2010/09/24 04:05:43, rsc1 wrote:
> _, err := io.ReadFull(r, b)
> 
> then you don't need to check n; err will be set for a short read.
> also it fixes a bug: a single Read is not required to
> fill the buffer.
> 
> 

Done.
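
A short sketch of the bug being described: a single Read may return fewer bytes than the buffer holds, while io.ReadFull keeps reading and reports a short read as an error. It uses iotest.OneByteReader from testing/iotest to force short reads; the helper names are assumptions for the example.

```go
package main

import (
	"fmt"
	"io"
	"strings"
	"testing/iotest"
)

// oneByteAtATime (hypothetical helper) wraps s in a reader that returns at
// most one byte per Read call, which the io.Reader contract permits.
func oneByteAtATime(s string) io.Reader {
	return iotest.OneByteReader(strings.NewReader(s))
}

// singleRead issues exactly one Read into an n-byte buffer and returns how
// many bytes it got.
func singleRead(s string, n int) int {
	b := make([]byte, n)
	got, _ := oneByteAtATime(s).Read(b)
	return got
}

// fullRead uses io.ReadFull, which loops until the buffer is full or the
// source runs dry.
func fullRead(s string, n int) (int, error) {
	b := make([]byte, n)
	return io.ReadFull(oneByteAtATime(s), b)
}

func main() {
	fmt.Println("single Read got:", singleRead("PK\x05\x06", 4))
	n, err := fullRead("PK\x05\x06", 4)
	fmt.Println("ReadFull got:", n, err)
	n, err = fullRead("PK", 4)
	fmt.Println("ReadFull on short input:", n, err)
}
```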

http://codereview.appspot.com/2125042/diff/54001/src/pkg/archive/zip/reader_t...
File src/pkg/archive/zip/reader_test.go (right):

http://codereview.appspot.com/2125042/diff/54001/src/pkg/archive/zip/reader_t...
src/pkg/archive/zip/reader_test.go:22: Contents []byte // if blank, will attempt to compare against File
On 2010/09/24 04:05:43, rsc1 wrote:
> Content please
> 

Done.

rsc1

http://codereview.appspot.com/2125042/diff/54001/src/pkg/archive/zip/reader.go File src/pkg/archive/zip/reader.go (right): http://codereview.appspot.com/2125042/diff/54001/src/pkg/archive/zip/reader.go#newcode222 src/pkg/archive/zip/reader.go:222: // Better to write the little-endian representation of > ...

14 years, 9 months ago (2010-09-28 02:03:10 UTC) #16

adg

14 years, 9 months ago (2010-09-28 02:10:54 UTC) #17

On 28 September 2010 12:03,  <rsc@google.com> wrote:
>
> http://codereview.appspot.com/2125042/diff/54001/src/pkg/archive/zip/reader.go
> File src/pkg/archive/zip/reader.go (right):
>
>
http://codereview.appspot.com/2125042/diff/54001/src/pkg/archive/zip/reader.g...
> src/pkg/archive/zip/reader.go:222: // Better to write the little-endian
> representation of
>>
>> > Also the test should check that if you add the size of
>> > the record + the string length at the end of the struct
>> > you get the end of the file.  Consider what happens if
>> > you run this on an ISO image that happens to have a zip
>> > file embedded in it somewhere.
>
>> What happens? The zip file should be opened correctly. Is that bad?
>
> I'm not sure we understand each other.
>
> Are you saying that if I call zip.OpenReader("ubuntu.iso")
> it should find some random zip file inside the ISO and
> open it?  That's the case I was talking about.  If it found
> a signature halfway through the ISO, the old code would
> use it, but all the offsets would be wrong since ISO file 0
> != zip file 0.  And if it didn't find a signature, it would
> first read the entire ISO image backward, which could take
> a while.

Yes, that's true.

> The comment length is at most 64k.  You can read the
> last 65k of the file and decide whether there's a trailer.

Yep, while reading the spec I'd just arrived at the same conclusion.

> Also the comment can contain the trailer bytes.
> Having the right signature is not enough (I will build
> a zip file where the CRC == the signature if you like).
> A trailer is only valid if the signature matches
> *and* adding the structure size plus the comment
> length gets you to the end of the file.

So if it encounters the signature it should attempt to read the
signature, and keep looking if it's invalid? What if the comment
contains an entire valid directory end header? It seems there are lots
of pathological cases, and I'm not sure where to draw the line.

Thanks for your help.

adg

14 years, 9 months ago (2010-09-28 02:42:47 UTC) #18

On 28 September 2010 12:10, Andrew Gerrand <adg@golang.org> wrote:
> On 28 September 2010 12:03,  <rsc@google.com> wrote:
>>
>>
http://codereview.appspot.com/2125042/diff/54001/src/pkg/archive/zip/reader.go
>> File src/pkg/archive/zip/reader.go (right):
>>
>>
http://codereview.appspot.com/2125042/diff/54001/src/pkg/archive/zip/reader.g...
>> src/pkg/archive/zip/reader.go:222: // Better to write the little-endian
>> representation of
>>>
>>> > Also the test should check that if you add the size of
>>> > the record + the string length at the end of the struct
>>> > you get the end of the file.  Consider what happens if
>>> > you run this on an ISO image that happens to have a zip
>>> > file embedded in it somewhere.
>>
>>> What happens? The zip file should be opened correctly. Is that bad?
>>
>> I'm not sure we understand each other.
>>
>> Are you saying that if I call zip.OpenReader("ubuntu.iso")
>> it should find some random zip file inside the ISO and
>> open it?  That's the case I was talking about.  If it found
>> a signature halfway through the ISO, the old code would
>> use it, but all the offsets would be wrong since ISO file 0
>> != zip file 0.  And if it didn't find a signature, it would
>> first read the entire ISO image backward, which could take
>> a while.
>
> Yes, that's true.
>
>> The comment length is at most 64k.  You can read the
>> last 65k of the file and decide whether there's a trailer.
>
> Yep, while reading the spec I'd just arrived at the same conclusion.
>
>> Also the comment can contain the trailer bytes.
>> Having the right signature is not enough (I will build
>> a zip file where the CRC == the signature if you like).
>> A trailer is only valid if the signature matches
>> *and* adding the structure size plus the comment
>> length gets you to the end of the file.
>
> So if it encounters the signature it should attempt to read the
> signature, and keep looking if it's invalid? What if the comment
> contains an entire valid directory end header? It seems there are lots
> of pathological cases, and I'm not sure where to draw the line.

I just read through the infozip source (which, incidentally, is quite
horrifying) and they don't do anything special to validate the
signature. They just use the first one found, so I guess that's good
enough here, too. Will validate the record length + file size and be
done with it.

Andrew
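
The approach discussed in this exchange can be sketched as follows: read at most the last 64k plus the fixed record size, look for the signature with bytes.LastIndex, and accept a candidate only if the record size plus its comment length reaches the end of the file. This is a hedged illustration, not the submitted code. Unlike the "use the first one found" behaviour described above, the sketch keeps scanning backward past candidates that fail the length check. The names findDirectoryEnd, makeArchive, locate and errNoDirEnd are invented here, and the record layout is reduced to the signature, 16 zeroed bytes and the 2-byte comment length.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"errors"
	"fmt"
	"io"
)

const (
	dirEndSig  = "PK\x05\x06" // end of central directory signature
	dirEndLen  = 22           // fixed record size: 4+2+2+2+2+4+4+2
	maxComment = 1<<16 - 1    // the comment length is a uint16
)

// errNoDirEnd is a hypothetical error value for this sketch.
var errNoDirEnd = errors.New("no valid directory end record")

// findDirectoryEnd returns the offset of a directory end record whose fixed
// size plus comment length ends exactly at the end of the file.
func findDirectoryEnd(r io.ReaderAt, size int64) (int64, error) {
	n := int64(dirEndLen + maxComment)
	if n > size {
		n = size
	}
	tail := make([]byte, n)
	if _, err := r.ReadAt(tail, size-n); err != nil && err != io.EOF {
		return 0, err
	}
	end := len(tail)
	for {
		i := bytes.LastIndex(tail[:end], []byte(dirEndSig))
		if i < 0 {
			return 0, errNoDirEnd
		}
		if i+dirEndLen <= len(tail) {
			commentLen := int(binary.LittleEndian.Uint16(tail[i+dirEndLen-2:]))
			if i+dirEndLen+commentLen == len(tail) {
				return size - n + int64(i), nil
			}
		}
		// Signature bytes inside a comment or trailing junk: keep looking
		// for an earlier candidate.
		end = i + len(dirEndSig) - 1
	}
}

// makeArchive (hypothetical helper) builds prefix + end record + comment +
// trailing bytes; every record field except the comment length is zero.
func makeArchive(prefix, comment, trailing string) []byte {
	rec := make([]byte, dirEndLen)
	copy(rec, dirEndSig)
	binary.LittleEndian.PutUint16(rec[dirEndLen-2:], uint16(len(comment)))
	var b []byte
	b = append(b, prefix...)
	b = append(b, rec...)
	b = append(b, comment...)
	b = append(b, trailing...)
	return b
}

// locate builds a test archive and searches it.
func locate(prefix, comment, trailing string) (int64, error) {
	data := makeArchive(prefix, comment, trailing)
	return findDirectoryEnd(bytes.NewReader(data), int64(len(data)))
}

func main() {
	fmt.Println(locate("local file data", "hi", ""))
	fmt.Println(locate("abc", "PK\x05\x06 in a comment", ""))
	fmt.Println(locate("abc", "", "junk"))
}
```

Bounding the read at 64k plus the record size also addresses the huge non-zip file case: the search touches a fixed amount of data regardless of file size.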

rsc

> So if it encounters the signature it should attempt to read the > signature, ...

14 years, 9 months ago (2010-09-28 02:43:58 UTC) #19

adg

PTAL I still need to craft a test to trigger the 65kb read, and probably ...

14 years, 9 months ago (2010-09-28 04:17:58 UTC) #20

rsc1

Looks good. Will wait for tests. If you want some test cases http://www/~rsc/readme.zip http://www/~rsc/readme.notzip http://codereview.appspot.com/2125042/diff/69001/src/pkg/archive/zip/reader.go ...

14 years, 9 months ago (2010-09-28 14:06:36 UTC) #21

adg

On 29 September 2010 00:06, <rsc@google.com> wrote: > Looks good. Will wait for tests. > ...

14 years, 9 months ago (2010-09-29 03:14:30 UTC) #22

adg

Hello rsc, r (cc: golang-dev@googlegroups.com), Please take another look.

14 years, 9 months ago (2010-09-29 04:22:56 UTC) #23

rsc1

Pretty close. reader_test.go:145: (Rietveld is having issues) Tighten these error messages. Think about whether they'll ...

14 years, 9 months ago (2010-09-29 14:01:08 UTC) #24

adg

Thanks http://codereview.appspot.com/2125042/diff/69001/src/pkg/archive/zip/reader.go File src/pkg/archive/zip/reader.go (right): http://codereview.appspot.com/2125042/diff/69001/src/pkg/archive/zip/reader.go#newcode213 src/pkg/archive/zip/reader.go:213: b = make([]byte, bLen) On 2010/09/28 14:06:36, rsc1 ...

14 years, 9 months ago (2010-09-30 00:51:48 UTC) #25

rsc1

LGTM add archive/zip to pkg/Makefile hg file 2125042 pkg/Makefile

14 years, 9 months ago (2010-09-30 01:41:10 UTC) #26

adg

14 years, 9 months ago (2010-09-30 01:59:58 UTC) #27

*** Submitted as http://code.google.com/p/go/source/detail?r=d6a4eb7fee9d ***

archive/zip: new package for reading ZIP files

R=rsc
CC=golang-dev
http://codereview.appspot.com/2125042

Committer: Andrew Gerrand <adg@golang.org>
