Created: 13 years, 7 months ago by mpl
Modified: 13 years, 3 months ago
CC: golang-dev
Visibility: Public
Description: first draft for a Lempel-Ziv-Welch compression package.

Patch Set 1
Patch Sets 2-6: code review 2510041: first draft for a Lempel-Ziv-Welch compression package.
Total comments: 15
Patch Set 7: diff -r 8280f07d4272 https://go.googlecode.com/hg/
Patch Set 8: diff -r 8280f07d4272 https://go.googlecode.com/hg/
Total comments: 8

Total messages: 31
Hello golang-dev@googlegroups.com (cc: golang-dev@googlegroups.com), I'd like you to review this change.
I know there's room for improvement. In particular, it could do some buffering, and the table lookups could be smarter. I'd like to know if the overall design is ok though; i.e. if I'm on the right track with the reader/writer way, and if it's a good idea to have one goroutine to encode/decode communicating with another one to read/write the data in the right format.
Do you know whether there are multiple file formats that use the same encoding, to make it worth having a separate package? (For example, compress/flate is used by zip, zlib, gzip, png, ...)

Wikipedia suggests that gif, tiff, and pdf all support lzw but that the bit order differs. At the least it seems you'd want to allow for that somehow. (It would be fine to make a pass over the data at the end to do a table-based substitution. GIF is probably the common case since I believe tiff and pdf more commonly use flate.)

Russ
As a first step, you should run

    hg gofmt
    hg upload 2510041

There are also various ways in which you could be using Go more effectively or efficiently. Just scanning through:

* the map[string]uint16 is overkill
* each Read and Write should not spawn two goroutines
* the buffering is not right: if I do single-byte Writes, I will get no compression at all
* consider using io.Pipe to address the last two
* the use of bytes.Add in the writer causes unnecessary mallocs
* the writer uses range over a map testing k == foo instead of writing m[foo]
* the reader should find a way to avoid the range scans too, probably by reversing the key and elem in dict or replacing it with an array
* when prev is a string, you can write prev[i] instead of []byte(prev)[i], avoiding the expensive copying of the entire string that []byte(prev) does

You might find it useful to read the source code for the compress/flate package, which must handle many of the same issues.

Russ
I agree with everything Russ said, and addressing his concerns would probably involve re-writing the code that the comments below apply to, but modulo that, here are a few more style issues, in no particular order:

A method's receiver name is usually just one letter: w instead of lzww. Similarly, if you implement io.Reader and io.Writer, the []byte argument is typically called p and not out or data.

Delete the commented-out prints. They're fine for your own debugging but they're not for submission or review.

The package doc comment is erroneously duplicated in all four .go files. Also, there should be no blank line between it and the package line. Run "godoc compress/lzw" to check it.

Consecutive var statements can be put in a block, so:

    var i int
    var s string
    var b bool

can be written as

    var (
        i int
        s string
        b bool
    )

and the result usually looks less cluttered, especially if the variables are related.

A cleaner way to write

    var words []byte = make([]byte, 2)

is just

    words := make([]byte, 2)

or, if the size is constant, just

    var words [2]byte

I'd drop the "= 0" out of "var shift uint = 0". Variables are zero-initialized by default.

Both dictSizeIni and endSignal should be constants, not variables.

Imports should be sorted alphabetically.

"for ;; { ... }" can be just "for { ... }", but since you're really iterating over a channel, use "for word := range lzwr.c { ... }" and close the channel rather than sending the endSignal sentinel.

Finally, if you're adding a new package, your change should also modify src/pkg/Makefile.
Thanks for the comments, I'm on it. I've got MSB packing order working as well now and I'm using an exposed setter function to let the user set the packing order. Should the variable representing that be a global var or rather a member of the reader/writer struct?

Thanks,
Mathieu
On 24 October 2010 03:13, <mathieu.lonjaret@gmail.com> wrote:
> I've got MSB packing order working as well now and I'm using an exposed
> setter function to let the user set the packing order. Should the
> variable representing that be a global var or rather a member of the
> reader/writer struct?

Since GIF and PDF use different packing orders, and it's not inconceivable that a program could write both sorts of output, this should not be a global variable.
Hello nigeltao_gnome, rsc (cc: golang-dev@googlegroups.com), Please take another look.
This is a new version of the reader which, I think, addresses all of your comments. The writer is in the works. I have two other questions:

1) For the same reason you gave me about the packing order variable, I think I should change the wordsize variable as well to be a member of the reader instead of a global var, right?

2) At no point do I set/return an error, but I bet there are some cases where the read operations should return one. Any advice on that?

Cheers,
Mathieu
Hello nigeltao_gnome, rsc (cc: golang-dev@googlegroups.com), Please take another look.
I have many concerns about the basic efficiency of your implementation and general programming style (e.g. lots of duplicated code). I would strongly recommend that you try to study the compress/{flate,gzip,zlib} code until you understand how it works. Their general design is pretty solid and should work equally well for the LZW algorithm.

http://codereview.appspot.com/2510041/diff/32001/src/pkg/compress/lzw/Makefile
File src/pkg/compress/lzw/Makefile (right):

src/pkg/compress/lzw/Makefile:1: # Copyright 2009 The Go Authors. All rights reserved.
s/2009/2010/

http://codereview.appspot.com/2510041/diff/32001/src/pkg/compress/lzw/reader.go
File src/pkg/compress/lzw/reader.go (right):

src/pkg/compress/lzw/reader.go:16: const dictSizeIni uint16 = 256
Drop the uint16. Constants are typically ideal numbers. Similarly for writer.go.

src/pkg/compress/lzw/reader.go:22: err os.Error
r.err is never re-used between different methods. Why is it a field and not a local variable?

src/pkg/compress/lzw/reader.go:23: dict map[uint16]string
If dict is a dense map with entries from 0 up to some n, then a map is overkill and inefficient. Just use a []string and append.

src/pkg/compress/lzw/reader.go:30: func NewReader(r io.Reader, ws uint8, order string) (io.ReadCloser, os.Error) {
All public functions need comments. For example, what do ws and order mean? What are their valid values?

src/pkg/compress/lzw/reader.go:31: lzwr := new(reader)
Rather than assigning each field in a separate statement, do

    lzwr := &reader{
        r:        r,
        wordsize: ws,
        // etcetera.
    }

src/pkg/compress/lzw/reader.go:58: func (r *reader) Read(p []byte) (n int, err os.Error) {
Rather than reader implementing ReadCloser, it would be simpler if NewReader just returned the read end of the pipe, exactly the same as what compress/flate does. You don't need to use csync to signal the write end that the read end is ready. The io.Pipe already does that. The writer goroutine should just write to the pipe, and it will block until the reader goroutine is ready.

src/pkg/compress/lzw/reader.go:75: input []byte = make([]byte, 1)
Reading from a Reader one byte at a time is terribly inefficient. Use a bufio.Reader and ReadByte.

src/pkg/compress/lzw/reader.go:122: func (r *reader) readBitsMSB() {
There is a lot of copy-and-paste between readBitsMSB and readBitsLSB. I think there's a lot of opportunity to refactor out some duplicated code.

src/pkg/compress/lzw/reader.go:191: b := make([]byte, toRead)
Allocating a new buffer each time is needless garbage. Just allocate one buffer (possibly a bytes.Buffer) and re-use it.

src/pkg/compress/lzw/reader.go:245: entry = prev + temp[0:1]
Growing a string one character at a time is O(N^2), which seems needlessly expensive. You should really be using []byte instead of string, since each conversion between one and the other involves an allocation, a copy, and a garbage cost. In fact, I don't think you should be using strings at all.

File src/pkg/compress/lzw/reader_test.go (right):

src/pkg/compress/lzw/reader_test.go:13: func TestDecompressorLSB_11_whole(t *testing.T) {
Rather than having a separate TestFooBar function for each different case, with a lot of copy-and-pasted code, I would prefer a data-driven test with one TestReader function that ranged over a test suite. Look at the tests in compress/gzip and compress/zlib for what I mean.

src/pkg/compress/lzw/reader_test.go:34: func TestDecompressorLSB_11_byte(t *testing.T) {
I don't think the _whole versus _byte distinction is winning you much. I'd much rather have inputs of different sizes to give me confidence that buffer boundaries are tested.

File src/pkg/compress/lzw/writer.go (right):

src/pkg/compress/lzw/writer.go:5: package lzw
I haven't looked closely at writer.go, but a lot of the general comments I made about reader.go apply. For example efficiency, copy-and-pasted code, the need for comments, simpler use of the io.Pipe...

File src/pkg/compress/lzw/writer_test.go (right):

src/pkg/compress/lzw/writer_test.go:12: func TestCompressorLSB_11_whole(t *testing.T) {
IIUC there can be more than one correct encoding of any particular string. Rather than testing that this implementation returns a particular golden value, I'd rather test that wrapping a lzw.Writer and lzw.Reader around the two ends of an io.Pipe is a no-op. Again, look at the tests in compress/{gzip,zlib}/*_test.go for what I mean.
Thanks for the comments, I'll work on them. There's one point I'm not sure about though:

On 2010/11/23 07:14:32, nigeltao wrote:
> src/pkg/compress/lzw/reader.go:122: func (r *reader) readBitsMSB() {
> There is a lot of copy-and-paste between readBitsMSB and readBitsLSB. I think
> there's a lot of opportunity to refactor out some duplicated code.

I agree some of their parts are similar, but I felt they were already complicated and different enough that trying to merge them would result in something quite unreadable. Maybe I can at least isolate a common denominator and put it in another function though. I'll see what I can do, but I'm not confident about that one.
Regarding Nigel's comment about duplication, please pick one byte order - the most common one, so probably what GIF uses - and hard-code that one. Code that wants the opposite byte order can run the bytes through a [256]byte lookup table as a post-processing step (for compression) or during the reading loop (for decompression).

Russ
I remember you already mentioned such a technique, so I thought about it for a while, but I didn't really understand what you meant. Like, when and from what does one build this lookup table in the first place? I'll think about it some more then, thanks.
On Mon, Nov 29, 2010 at 11:19, <mathieu.lonjaret@gmail.com> wrote:
> I remember you already mentioned such a technique, so I thought about
> it for a while but I didn't really understand what you meant. Like, when
> and from what does one build this lookup table in the first place?
> I'll think about it some more then, thanks.

The table is in src/pkg/compress/flate/reverse_bits.go. If b is msb-first, then reverseByte[b] is lsb-first and vice versa.

Russ
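The compress/flate table Russ points at is precomputed in the source; an equivalent table can be built at startup with a few lines. This sketch constructs it and is not the flate code itself:

```go
package main

import "fmt"

// reverseBits returns b with its 8 bits in the opposite order,
// e.g. 00000001 -> 10000000.
func reverseBits(b byte) byte {
	var r byte
	for i := 0; i < 8; i++ {
		r = r<<1 | b&1 // shift the lowest bit of b into r
		b >>= 1
	}
	return r
}

// reverseByte plays the role of the flate table: if b is msb-first,
// then reverseByte[b] is lsb-first, and vice versa.
var reverseByte [256]byte

func init() {
	for i := range reverseByte {
		reverseByte[i] = reverseBits(byte(i))
	}
}

func main() {
	fmt.Printf("%08b -> %08b\n", 0x2c, reverseByte[0x2c]) // 00101100 -> 00110100
}
```

Because bit reversal is an involution, applying the table twice gives back the original byte, which is what makes it usable as a post-processing step in either direction.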
On Mon, Nov 29, 2010 at 6:04 PM, Russ Cox <rsc@golang.org> wrote:
> The table is in src/pkg/compress/flate/reverse_bits.go.
> If b is msb-first, then reverseByte[b] is lsb-first and vice versa.

Sorry, I don't get it. Maybe I understood it wrongly, but from what I've read, if we have, say (encoded on 11-bit words)

    foo = 00000101 01101001

and then

    bar = 00000011 10110011

to pack, the stream will be:

    with LSB: 01101001 10011|101 XX011101 XXXXXXXX
    with MSB: 10101101 001|01110 110011XX XXXXXXXX

(the pipe is placed at the junction of foo and bar)

I don't think reversing the bits will allow going from one to the other, at least not directly in one step after compression. What am I missing here?
Ah. You are saying that the MSB/LSB setting affects both the order in which bits are inserted into the bit stream and the order in which 8-bit chunks of that bit stream are packed into bytes. I expected it to affect only the latter. I have no idea which is the actual behavior. The thing to do is to make a tiny GIF LZW stream and make a tiny PDF LZW stream and make sure you can decode both. Russ
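The LSB half of the 11-bit example above can be reproduced in a few lines. This packLSB is a hypothetical helper written for illustration, not code from the patch; it inserts each code's bits into the stream least significant bit first, which yields exactly the LSB byte sequence Mathieu wrote out (with zero padding in place of the X bits).

```go
package main

import "fmt"

// packLSB packs the low `width` bits of each code into a byte stream,
// least significant bit first. Pending bits accumulate in the low end
// of acc and are emitted a byte at a time.
func packLSB(codes []uint16, width uint) []byte {
	var out []byte
	var acc uint32 // pending bits, low bits are the oldest
	var nbits uint // how many bits of acc are valid
	for _, c := range codes {
		acc |= uint32(c) << nbits
		nbits += width
		for nbits >= 8 {
			out = append(out, byte(acc))
			acc >>= 8
			nbits -= 8
		}
	}
	if nbits > 0 {
		out = append(out, byte(acc)) // final partial byte, zero-padded
	}
	return out
}

func main() {
	// foo and bar from the example: 00000101 01101001 and 00000011 10110011.
	for _, b := range packLSB([]uint16{0x569, 0x3b3}, 11) {
		fmt.Printf("%08b ", b)
	}
	fmt.Println()
}
```

Running it prints 01101001 10011101 00011101, matching the "with LSB" line of the example, which supports the reading that the MSB/LSB setting affects bit insertion order, not just byte packing.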
Hello,

Another question: when the writer is created with a wordsize (and hence a max dictionary size) smaller than the optimum, do you think I should abort and report it as an error (so that the caller can retry with a larger size), or rather switch to a mode where the algorithm (when it has reached the max dict size) keeps using the existing codes without adding new entries to the dict? I think this is kind of a non-issue with an advanced enough gif implementation, since it's supposed to use a variable word/dict size, but I'm not there yet.

Mathieu
> when the writer is created with a wordsize (and hence a max dictionary
> size) smaller than the optimum,

I don't quite understand. What is the optimum? How would the caller know?

Russ
On 2011/01/11 16:48:24, rsc wrote:
> I don't quite understand.
> What is the optimum?
> How would the caller know?

Say your input is such that the encoding would generate 600 codes for it. If your codes start at 0 and run continuously up to 599, they all fit in a 10-bit word, so you can pack them on the final stream as 10-bit words (and not waste 6 bits per word). 11 would be a waste for all words, and 9 not enough because some of the words would not fit, hence in that case 10 would be the best.

With the variable-wordsize improvement in gifs, you would start with a very small wordsize, say 5, and increase it in steps of one as much as you need. But I think that's an improvement that can come later, and the fixed-wordsize case has to be there anyway since variable is not always used (I think).

The caller indeed can't know in advance if the chosen wordsize is too small for the input text. That's why I'm proposing to either report a meaningful error (so the wordsize can be increased by the caller), or to keep going without creating additional codes when the dictionary gets full. This last solution can of course be pretty inefficient in terms of compression, and I can't think of a case where anyone would intentionally want that behaviour.

As to the contrary (wordsize bigger than the optimum, hence bits wasted when packing the words), I'm not sure how to cope with that. I could make an automatic mode with a first full run of the encoder only to determine the optimum wordsize, and then a normal run of encoder+bitswriter. How about that?

Mathieu
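The arithmetic in the 600-codes example (codes 0..599 all fit in 10 bits) reduces to "how many bits does the largest code need". A hypothetical helper, named minWordSize here only for illustration, would look like:

```go
package main

import "fmt"

// minWordSize returns the smallest word size, in bits, that can hold
// every code from 0 through maxCode. For 600 codes (0..599) this is
// 10: 9 bits top out at 511, while 10 bits reach 1023.
func minWordSize(maxCode uint) uint {
	bits := uint(1)
	for maxCode>>bits != 0 {
		bits++
	}
	return bits
}

func main() {
	fmt.Println(minWordSize(599)) // the example from the thread
}
```

This is also why knowing the optimum requires a full encoding pass first: maxCode is only known once the dictionary has stopped growing.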
After some more reading, it seems like the original LZW used 12-bit words from the start, and simply stopped writing to the dictionary when its 4096 entries were filled, just using it as is for lookups. So I'm going to leave the possibility for the caller to set another wordsize, but I'm going to have it default to 12 and not error when the dict is filled, but rather keep on going with the dict as is. That's for a start. Then possible improvements (apart from variable wordsize) are to simply flush the dict entries when it's filled, or, even better, discard entries that are not used often. Does that sound ok?
On 20 January 2011 20:36, <mathieu.lonjaret@gmail.com> wrote:
> after some more reading, it seems like the original LZW used 12 bits
> words from the start, and simply stopped writing to the dictionary when
> its 4096 entries are filled, just using it as is for lookups.
> So I'm going to leave the possibility for the caller to set another
> wordsize but I'm going to have it default to 12 and not error when the
> dict is filled but rather keep on going with the dict as is.

That sounds OK to me. It might be worth investigating what other open source lzw (or gif or pdf) libraries do.
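The "keep going with a full dictionary" behavior agreed on above amounts to a guard on the table update. This is a small sketch of that policy; the encoder type and addEntry name are invented for illustration, not taken from the patch.

```go
package main

import "fmt"

// With 12-bit words there are codes 0..4095; once they are all
// assigned, the encoder stops adding entries and keeps emitting
// codes from the frozen table.
const maxEntries = 1 << 12

type encoder struct {
	dict     map[string]uint16
	nextCode uint16
}

func (e *encoder) addEntry(key string) {
	if int(e.nextCode) < maxEntries {
		e.dict[key] = e.nextCode
		e.nextCode++
	}
	// else: dictionary full; existing codes are still used for lookups.
}

func main() {
	// Start one short of full to show the freeze kicking in.
	e := &encoder{dict: make(map[string]uint16), nextCode: 4095}
	e.addEntry("ab") // takes the last free code, 4095
	e.addEntry("cd") // dictionary is full; silently ignored
	fmt.Println(len(e.dict), e.nextCode)
}
```

The later improvements mentioned (flushing the table, or evicting rarely used entries) would replace the empty else branch with a reset or an eviction policy.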
Hello nigeltao_gnome, nigeltao (cc: golang-dev@googlegroups.com), Please take another look.
On 2010/11/23 07:14:32, nigeltao wrote:
> src/pkg/compress/lzw/reader.go:16: const dictSizeIni uint16 = 256
> Drop the uint16. Constants are typically ideal numbers.
> Similarly for writer.go.

done.

> src/pkg/compress/lzw/reader.go:22: err os.Error
> r.err is never re-used between different methods. Why is it a field and not a
> local variable?

done.

> src/pkg/compress/lzw/reader.go:23: dict map[uint16]string
> If dict is a dense map with entries from 0 up to some n, then a map is overkill
> and inefficient. Just use a []string and append.

done.

> src/pkg/compress/lzw/reader.go:30: func NewReader(r io.Reader, ws uint8, order string) (io.ReadCloser, os.Error) {
> All public functions need comments. For example, what does ws and order mean?
> What are their valid values?

done.

> src/pkg/compress/lzw/reader.go:31: lzwr := new(reader)
> Rather than assigning each field in a separate statement, do
>     lzwr := &reader{
>         r: r,
>         wordsize: ws,
>         // etcetera.
>     }

done.

> src/pkg/compress/lzw/reader.go:58: func (r *reader) Read(p []byte) (n int, err os.Error) {
> Rather than reader implementing ReadCloser, it would be simpler if NewReader
> just returned the read end of the pipe, exactly the same as what compress/flate
> does. You don't need to use csync to signal the write end that the read end is
> ready. The io.Pipe already does that. The writer goroutine should just write to
> the pipe, and it will block until the reader goroutine is ready.

I tried going that way, but I needed to implement an explicit Read/Write to do the "breaking" of large inputs into reasonably sized chunks (but maybe there's a better way?). So NewReader/NewWriter still returns a *reader/*writer.

> src/pkg/compress/lzw/reader.go:75: input []byte = make([]byte, 1)
> Reading from a Reader one byte at a time is terribly inefficient. Use a
> bufio.Reader and ReadByte.

done.

> src/pkg/compress/lzw/reader.go:122: func (r *reader) readBitsMSB() {
> There is a lot of copy-and-paste between readBitsMSB and readBitsLSB. I think
> there's a lot of opportunity to refactor out some duplicated code.

done. Merging the two implied adding a lot of branches (switch), so it's probably (a bit?) slower than what I had before, but it is indeed more readable now.

> src/pkg/compress/lzw/reader.go:191: b := make([]byte, toRead)
> Allocating a new buffer each time is needless garbage. Just allocate one buffer
> (possibly a bytes.Buffer) and re-use it.

done.

> src/pkg/compress/lzw/reader.go:245: entry = prev + temp[0:1]
> Growing a string one character at a time is O(N^2), which seems needlessly
> expensive. You should really be using []byte instead of string, since each
> conversion between one and the other involves an allocation, a copy, and a
> garbage cost. In fact, I don't think you should be using strings at all.

done. I'm now using []byte whenever possible. Please let me know if it can be improved further regarding speed and allocations.

> src/pkg/compress/lzw/reader_test.go:13: func TestDecompressorLSB_11_whole(t *testing.T) {
> Rather than having a separate TestFooBar function for each different case, with
> a lot of copy-and-pasted code, I would prefer a data-driven test with one
> TestReader function that ranged over a test suite. Look at the tests in
> compress/gzip and compress/zlib for what I mean.

> src/pkg/compress/lzw/writer_test.go:12: func TestCompressorLSB_11_whole(t *testing.T) {
> IIUC there can be more than one correct encoding of any particular string.
> Rather than testing that this implementation returns a particular golden value,
> I'd rather test that wrapping a lzw.Writer and lzw.Reader around the two ends of
> an io.Pipe is a no-op.

done. And I also mimicked a lot of the behavior of the other compression packages' tests, as you advised.
Hello nigeltao_gnome, nigeltao (cc: golang-dev@googlegroups.com), Please take another look.
http://codereview.appspot.com/2510041/diff/57001/src/pkg/compress/lzw/Makefile
File src/pkg/compress/lzw/Makefile (right):

src/pkg/compress/lzw/Makefile:1: # Copyright 2010 The Go Authors. All rights reserved.
s/2010/2011/ and similarly in other files in this change.

File src/pkg/compress/lzw/lzw_test.go (right):

src/pkg/compress/lzw/lzw_test.go:89: _, err = compressor.Write(tt.raw)
It would be cleaner if you always used io.Copy, with the Reader either being a file reader or a bytes.Buffer.

src/pkg/compress/lzw/lzw_test.go:126: func TestIdentity(t *testing.T) {
You have some tests that read(write(x)) is the identity function, but I'd also like some reading-only tests of binary data, similar to the other test cases in compress/{flate,gzip,zlib}. For example, only testing for identity doesn't catch the case where both reading and writing have the same bug.

File src/pkg/compress/lzw/reader.go (right):

src/pkg/compress/lzw/reader.go:7: compressed with the Lempel–Ziv–Welch algorithm.
Is there a canonical reference for the algorithm? I did a quick search and could not find an RFC. The Welch (1984) "A Technique for High-Performance Data Compression" paper might be the best reference, but I didn't look too deeply.

src/pkg/compress/lzw/reader.go:33: c chan uint16
Sending the uint16 values one at a time over a channel sounds inefficient to me. Batching up slices of uint16 would be better. The analogy is that an io.Pipe is based on a chan []byte, not a chan byte.

src/pkg/compress/lzw/reader.go:107: <-r.c
Why do you have to wait? Also, I don't like how both goroutines are both writing to and reading from r.c. It would feel a lot cleaner if one end only writes, and the other end only reads.

src/pkg/compress/lzw/reader.go:132: case lsb:
Switching on a string will be expensive inside an inner loop. You should do the order == "LSB" comparison once, and save the result as a bool variable.

src/pkg/compress/lzw/reader.go:254: if b.Len() >= bufSize {
Wouldn't it be easier for b to be a bufio.Writer around r.pw, instead of managing the buffering yourself!?
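The last point above, wrapping the pipe's write end in a bufio.Writer rather than tracking a bufSize threshold by hand, can be sketched as follows. pipeThrough is an illustrative function written for this example, not code from the patch.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
)

// pipeThrough sends many small writes through an io.Pipe, letting a
// bufio.Writer batch them; the producer never manages its own buffer
// or flush threshold.
func pipeThrough(chunks []string) []byte {
	pr, pw := io.Pipe()
	go func() {
		bw := bufio.NewWriter(pw)
		for _, s := range chunks {
			bw.WriteString(s) // tiny writes accumulate in bufio's buffer
		}
		bw.Flush() // push whatever remains into the pipe
		pw.Close() // the read end then sees io.EOF
	}()
	out, _ := io.ReadAll(pr)
	return out
}

func main() {
	fmt.Printf("%s\n", pipeThrough([]string{"to", "be", "or", "not"}))
}
```

bufio.Writer only touches the underlying pipe when its internal buffer fills or Flush is called, which is exactly the batching behavior the hand-rolled bufSize check was approximating.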
After a bit of reading, it appears that both GIF and PDF use a maximum word size of 12 bits, since it enables an extremely efficient hash table representation as a []uint32, where each uint32 holds the 12 bit code, the 12 bit code of the immediate prefix, and the 8 bit byte that differentiates what the two codes represent. It is certainly feasible to do something similar in Go. Compared to that, using a map[string]uint16 to encode and a [][]byte to decode would involve many string and []byte allocations (and conversions between the two), and be very costly. I think that practicality trumps generality, and I think it's worth capping the word size at 12 bits, instead of 16 bits. I might have a crack at writing a GIF decoder later this week, which would help me understand what I would want from a LZW package.
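The packed-uint32 dictionary entry described above can be sketched in a few lines. The exact bit layout below (own code in the top 12 bits, prefix code in the middle 12, extending byte in the low 8) is an assumption for illustration; the message doesn't specify one.

```go
package main

import "fmt"

// pack stores a 12-bit code, its 12-bit prefix code, and the 8-bit
// byte that extends the prefix, all in one uint32:
// bits 20-31 = code, bits 8-19 = prefix, bits 0-7 = byte.
func pack(code, prefix uint16, b byte) uint32 {
	return uint32(code&0xfff)<<20 | uint32(prefix&0xfff)<<8 | uint32(b)
}

// unpack recovers the three fields from a packed entry.
func unpack(e uint32) (code, prefix uint16, b byte) {
	return uint16(e >> 20), uint16(e>>8) & 0xfff, byte(e)
}

func main() {
	e := pack(300, 258, 'a')
	code, prefix, b := unpack(e)
	fmt.Println(code, prefix, string(b))
}
```

A table of such entries is a flat []uint32, so the encoder's hash lookups and the decoder's prefix walks involve no string or []byte allocation at all, which is the efficiency argument being made for capping the word size at 12 bits.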
On 2011/02/15 11:56:01, nigeltao_gnome wrote:
> After a bit of reading, it appears that both GIF and PDF use a maximum
> word size of 12 bits, since it enables an extremely efficient hash
> table representation as a []uint32, where each uint32 holds the 12 bit
> code, the 12 bit code of the immediate prefix, and the 8 bit byte that
> differentiates what the two codes represent. It is certainly feasible
> to do something similar in Go. Compared to that, using a
> map[string]uint16 to encode and a [][]byte to decode would involve
> many string and []byte allocations (and conversions between the two),
> and be very costly. I think that practicality trumps generality, and I
> think it's worth capping the word size at 12 bits, instead of 16 bits.

This is especially true in that case since I don't see lzw being used for many things other than pdf and gif (although I saw at least one audio format, I think; I'll try to dig it back up).

> I might have a crack at writing a GIF decoder later this week, which
> would help me understand what I would want from a LZW package.

Ok, I'll wait for some more input from you then, thanks. I'm still gonna try and work on your previous comments in the meanwhile though, in case some of the code can be kept.
On Tue, Feb 15, 2011 at 14:45, <mathieu.lonjaret@gmail.com> wrote:
> This is especially true in that case since I don't see lzw being used
> for many things other than pdf and gif. (although I saw at least one
> audio format I think, I'll try to dig it back).

tiff! I am writing a tiff decoder, and I am waiting for the LZW package to be added so that we will be able to read LZW-compressed tiff images.

--Benny.