|
|
Description

utils: added Tailer for tailing of logs in API

The Tailer is the initial component of the debug logging command
of the API. It allows the filtered tailing of any ReadSeeker.
If no filter is passed, all lines are written to the passed
Writer; otherwise only those for which the filter function returns
true. The initial number of lines can also be specified, and the
filter applies there as well. So if a File (which is a ReadSeeker)
contains 100 lines, 10 lines are requested, and only 5 match the
filter, only those 5 lines are returned.
https://code.launchpad.net/~themue/juju-core/057-tailer/+merge/197522
(do not edit description out of merge proposal)
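The filtering behaviour described above can be sketched in isolation. This is a minimal, hypothetical illustration, not code from the branch: the []byte filter signature follows the later review discussion, and errorsOnly is an invented example filter.

```go
package main

import (
	"bytes"
	"fmt"
)

// TailerFilterFunc mirrors the filter idea described above: it reports
// whether a line should be passed through to the writer. (Hypothetical
// sketch; the exact type in the branch may differ.)
type TailerFilterFunc func(line []byte) bool

// errorsOnly is a hypothetical example filter keeping only ERROR lines.
func errorsOnly(line []byte) bool {
	return bytes.Contains(line, []byte("ERROR"))
}

func main() {
	input := [][]byte{
		[]byte("INFO starting unit"),
		[]byte("ERROR connection lost"),
		[]byte("INFO retrying"),
	}
	// With a filter, only matching lines reach the writer;
	// with no filter, every line would be written.
	for _, line := range input {
		if errorsOnly(line) {
			fmt.Printf("%s\n", line)
		}
	}
}
```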
Patch Set 1 #
Patch Set 2 : utils: added Tailer for tailing of logs in API #
Total comments: 30
Patch Set 3 : utils: added Tailer for tailing of logs in API #
Total comments: 20
Patch Set 4 : utils: added Tailer for tailing of logs in API #
Total comments: 13
Patch Set 5 : utils: added Tailer for tailing of logs in API #
Total comments: 13
Patch Set 6 : utils: added Tailer for tailing of logs in API #
Total comments: 4
Patch Set 7 : utils: added Tailer for tailing of logs in API #
Messages

Total messages: 12
Please take a look.
Please take a look.
This is a good start, but I have a few comments and suggestions. I wonder if it might sit well inside its own package rather than being added to the grab-bag of stuff in utils.

https://codereview.appspot.com/36540043/diff/20001/utils/tailer.go
File utils/tailer.go (right):

https://codereview.appspot.com/36540043/diff/20001/utils/tailer.go#newcode30
utils/tailer.go:30: func StartFileTailer(filename string, lines int, filter TailerFilterFunc,
I think this function is unnecessary. It's trivial for other code to open a file.

https://codereview.appspot.com/36540043/diff/20001/utils/tailer.go#newcode40
utils/tailer.go:40: func StartTailer(readSeeker io.ReadSeeker, lines int, filter TailerFilterFunc,
Please document this function properly.

https://codereview.appspot.com/36540043/diff/20001/utils/tailer.go#newcode87
utils/tailer.go:87: println("> error:", err.Error())
d

https://codereview.appspot.com/36540043/diff/20001/utils/tailer.go#newcode98
utils/tailer.go:98: if buffer[i] == '\n' {
This isn't strictly accurate (what happens when we have an unterminated line at the end of the file?) but it's probably ok if we never filter unterminated lines. A comment about why it's ok might prevent future puzzling over the correctness of the code.

https://codereview.appspot.com/36540043/diff/20001/utils/tailer.go#newcode100
utils/tailer.go:100: if foundNewlines-1 == t.lines {
s/==/>=/ defensively.

Also, if we move the if statement before the increment, then you can lose the "-1".

https://codereview.appspot.com/36540043/diff/20001/utils/tailer.go#newcode127
utils/tailer.go:127: line, err := reader.ReadString('\n')
I think I'd be tempted to use ReadSlice here (changing the type of filter to func([]byte) bool). That way the tailer won't need to generate any garbage at all AFAICS.

Unfortunately that limits the line size to the size of the bufio buffer, which may be acceptable. An alternative is to use ReadLine and allocate only if the line is too long.

Here's some code I wrote a while ago to do that, in case it might be useful:

    // readLine reads a line from r.
    // The returned byte slice is only valid until the
    // next read call on r.
    func readLine(r *bufio.Reader) ([]byte, error) {
        line, isPrefix, err := r.ReadLine()
        if !isPrefix {
            return line, err
        }
        buf := append([]byte(nil), line...)
        for isPrefix && err == nil {
            line, isPrefix, err = r.ReadLine()
            buf = append(buf, line...)
        }
        return buf, err
    }

https://codereview.appspot.com/36540043/diff/20001/utils/tailer.go#newcode147
utils/tailer.go:147: for {
This code is exactly the same as the code above. Can't we do with just this? (If we just delete the loop above and use NewTimer(0), I think it might just work.)

https://codereview.appspot.com/36540043/diff/20001/utils/tailer.go#newcode149
utils/tailer.go:149: if len(line) > 0 {
I think we should ignore the line at this point if it is unterminated, something that's quite possible if the file is being written in arbitrary chunks. If we find an unterminated line, we should seek back to the start of the line before we start reading again.

That means we could pass the line to filter without the trailing \n, meaning we can be more resilient if something starts producing lines ending in \r\n.

https://codereview.appspot.com/36540043/diff/20001/utils/tailer.go#newcode151
utils/tailer.go:151: writer.WriteString(line)
We should definitely check the error here and in Flush below - I don't think we want to carry on trying to write to a dead network connection or full filesystem.

https://codereview.appspot.com/36540043/diff/20001/utils/tailer.go#newcode162
utils/tailer.go:162: t.readSeeker.Seek(0, os.SEEK_END)
This is wrong, I think - we could miss data if some has been written in between seeing the EOF and this seek. I think the only time we want to seek is if we get an unterminated line.

That said, there's another tricky case - what happens if the file gets truncated? I wonder if that should be done with separate logic to detect whether the file has suddenly reduced in size. We'll need separate logic anyway to decide when to reopen the file when log files roll over, so that might work out ok.

https://codereview.appspot.com/36540043/diff/20001/utils/tailer_test.go
File utils/tailer_test.go (right):

https://codereview.appspot.com/36540043/diff/20001/utils/tailer_test.go#newco...
utils/tailer_test.go:25: buffer := bytes.NewBuffer([]byte{})
var buffer bytes.Buffer

is simpler and does almost the same thing.

https://codereview.appspot.com/36540043/diff/20001/utils/tailer_test.go#newco...
utils/tailer_test.go:124: func assertCollected(c *gc.C, buffer *bytes.Buffer, collected []string, addon func([]string), timeout bool) {
Please could we have a comment describing what this function is doing?

https://codereview.appspot.com/36540043/diff/20001/utils/tailer_test.go#newco...
utils/tailer_test.go:162: var rs *readSeeker = new(readSeeker)
var rs readSeeker

https://codereview.appspot.com/36540043/diff/20001/utils/tailer_test.go#newco...
utils/tailer_test.go:206: if r.pos == len(r.buffer) {
I think we can usefully use the result of copy here:

    if r.pos >= len(r.buffer) {
        return 0, io.EOF
    }

    n := copy(p, r.buffer[r.pos:])
    r.pos += n
    return n, nil

https://codereview.appspot.com/36540043/diff/20001/utils/tailer_test.go#newco...
utils/tailer_test.go:244: func signal(c *gc.C, sigc chan struct{}) {
Given that the signal channel is buffered, can't we just send on the channel?
Please take a look.

https://codereview.appspot.com/36540043/diff/20001/utils/tailer.go
File utils/tailer.go (right):

https://codereview.appspot.com/36540043/diff/20001/utils/tailer.go#newcode30
utils/tailer.go:30: func StartFileTailer(filename string, lines int, filter TailerFilterFunc,
On 2013/12/05 16:20:37, rog wrote:
> I think this function is unnecessary. It's trivial for other code to open a
> file.

Done.

https://codereview.appspot.com/36540043/diff/20001/utils/tailer.go#newcode40
utils/tailer.go:40: func StartTailer(readSeeker io.ReadSeeker, lines int, filter TailerFilterFunc,
On 2013/12/05 16:20:37, rog wrote:
> Please document this function properly.

Done.

https://codereview.appspot.com/36540043/diff/20001/utils/tailer.go#newcode87
utils/tailer.go:87: println("> error:", err.Error())
On 2013/12/05 16:20:37, rog wrote:
> d

Done.

https://codereview.appspot.com/36540043/diff/20001/utils/tailer.go#newcode98
utils/tailer.go:98: if buffer[i] == '\n' {
On 2013/12/05 16:20:37, rog wrote:
> This isn't strictly accurate (what happens when we have an unterminated line at
> the end of the file?) but it's probably ok if we never filter unterminated
> lines. A comment about why it's ok might prevent future puzzling over the
> correctness of the code.

Added a comment at the type declaration.

https://codereview.appspot.com/36540043/diff/20001/utils/tailer.go#newcode100
utils/tailer.go:100: if foundNewlines-1 == t.lines {
On 2013/12/05 16:20:37, rog wrote:
> s/==/>=/
>
> defensively.
>
> Also, if we move the if statement before the increment,
> then you can lose the "-1".

Done.

https://codereview.appspot.com/36540043/diff/20001/utils/tailer.go#newcode127
utils/tailer.go:127: line, err := reader.ReadString('\n')
On 2013/12/05 16:20:37, rog wrote:
> I think I'd be tempted to use ReadSlice here (changing the type of filter to
> func([]byte) bool). That way the tailer won't need to generate any garbage at
> all AFAICS.
>
> Unfortunately that limits the line size to the size of the bufio buffer, which
> may be acceptable. An alternative is to use ReadLine and allocate only if the
> line is too long.
>
> Here's some code I wrote a while ago to do that, in case it might be useful:
>
>     // readLine reads a line from r.
>     // The returned byte slice is only valid until the
>     // next read call on r.
>     func readLine(r *bufio.Reader) ([]byte, error) {
>         line, isPrefix, err := r.ReadLine()
>         if !isPrefix {
>             return line, err
>         }
>         buf := append([]byte(nil), line...)
>         for isPrefix && err == nil {
>             line, isPrefix, err = r.ReadLine()
>             buf = append(buf, line...)
>         }
>         return buf, err
>     }

Good hint, using it.

https://codereview.appspot.com/36540043/diff/20001/utils/tailer.go#newcode147
utils/tailer.go:147: for {
On 2013/12/05 16:20:37, rog wrote:
> This code is exactly the same as the code above.
> Can't we do with just this? (if we just delete the
> loop above and use NewTimer(0), I think it might
> just work).

Yep, the split has been due to the initial approach of scanning all up to the end first. Removed. Thanks.

https://codereview.appspot.com/36540043/diff/20001/utils/tailer.go#newcode149
utils/tailer.go:149: if len(line) > 0 {
On 2013/12/05 16:20:37, rog wrote:
> I think we should ignore the line at this point if it is unterminated, something
> that's quite possible if the file is being written in arbitrary chunks. If we
> find an unterminated line, we should seek back to the start of the line before
> we start reading again.
>
> That means we could pass the line to filter without the trailing \n, meaning we
> can be more resilient if something starts producing lines ending in \r\n.

Found a solution for unterminated lines, but currently ignoring possible \r\n and passing the termination to the filter too. For working with delimiters other than \n, more changes are needed.

https://codereview.appspot.com/36540043/diff/20001/utils/tailer.go#newcode151
utils/tailer.go:151: writer.WriteString(line)
On 2013/12/05 16:20:37, rog wrote:
> We should definitely check the error here and in Flush below - I don't think we
> want to carry on trying to write to a dead network connection or full
> filesystem.

Done.

https://codereview.appspot.com/36540043/diff/20001/utils/tailer.go#newcode162
utils/tailer.go:162: t.readSeeker.Seek(0, os.SEEK_END)
On 2013/12/05 16:20:37, rog wrote:
> This is wrong, I think - we could miss data if some has been
> written in between seeing the EOF and this seek.
> I think the only time we want to seek is if we get an unterminated line.
>
> That said, there's another tricky case - what happens if the file gets
> truncated? I wonder if that should be done with separate logic to detect whether
> the file has suddenly reduced in size. We'll need separate logic anyway to
> decide when to reopen the file when log files roll over, so that might work out
> ok.

Removed the seek and keeping the truncation in mind.

https://codereview.appspot.com/36540043/diff/20001/utils/tailer_test.go
File utils/tailer_test.go (right):

https://codereview.appspot.com/36540043/diff/20001/utils/tailer_test.go#newco...
utils/tailer_test.go:25: buffer := bytes.NewBuffer([]byte{})
On 2013/12/05 16:20:37, rog wrote:
> var buffer bytes.Buffer
>
> is simpler and does almost the same thing.

Almost, but I need the pointer (for io.Writer in NewTailer() and in the assertion).

https://codereview.appspot.com/36540043/diff/20001/utils/tailer_test.go#newco...
utils/tailer_test.go:124: func assertCollected(c *gc.C, buffer *bytes.Buffer, collected []string, addon func([]string), timeout bool) {
On 2013/12/05 16:20:37, rog wrote:
> Please could we have a comment describing what this function is doing, please?

Done.

https://codereview.appspot.com/36540043/diff/20001/utils/tailer_test.go#newco...
utils/tailer_test.go:162: var rs *readSeeker = new(readSeeker)
On 2013/12/05 16:20:37, rog wrote:
> var rs readSeeker

Done.

https://codereview.appspot.com/36540043/diff/20001/utils/tailer_test.go#newco...
utils/tailer_test.go:206: if r.pos == len(r.buffer) {
On 2013/12/05 16:20:37, rog wrote:
> I think we can usefully use the result of copy here:
>
>     if r.pos >= len(r.buffer) {
>         return 0, io.EOF
>     }
>
>     n := copy(p, r.buffer[r.pos:])
>     r.pos += n
>     return n, nil

Awesome, totally forgot the return value of copy. *sigh* Have to get better regarding slice operations.

https://codereview.appspot.com/36540043/diff/20001/utils/tailer_test.go#newco...
utils/tailer_test.go:244: func signal(c *gc.C, sigc chan struct{}) {
On 2013/12/05 16:20:37, rog wrote:
> Given that the signal channel is buffered, can't we just send on the channel?

Oh, yes, that's left from an older approach.
Getting there! A few more thoughts and suggestions below.

https://codereview.appspot.com/36540043/diff/40001/utils/tailer/tailer.go
File utils/tailer/tailer.go (right):

https://codereview.appspot.com/36540043/diff/40001/utils/tailer/tailer.go#new...
utils/tailer/tailer.go:41: // Writer. The reading beginns the specified number of matching lines
s/beginns/begins/

https://codereview.appspot.com/36540043/diff/40001/utils/tailer/tailer.go#new...
utils/tailer/tailer.go:43: func NewStandardTailer(readSeeker io.ReadSeeker, writer io.Writer, lines int, filter TailerFilterFunc) *Tailer {
Given that this function is going to be the one that everyone calls, I think I'd name it NewTailer, but do we actually need to expose buffer size and poll interval publicly except to the tailer tests?

https://codereview.appspot.com/36540043/diff/40001/utils/tailer/tailer.go#new...
utils/tailer/tailer.go:126: readBuffer := make([]byte, t.bufsize)
This seems rather complex to me - I don't fully trust myself to vet the logic.

Also, I don't think we need to have a separate line buffer and read buffer - we can use a single buffer and read directly into it. That way we can avoid allocating a new buffer every time round the loop too, although it is necessary to copy something from the start to the end.

I started trying to explain a better possibility, but got carried away... Here's the kind of thing I mean. Still somewhat complex, but I've tried to keep the invariants simple. It needs testing (I've left in one deliberate mistake that I hope to see a test for). I suggest some internal tests that test this function in isolation with quite a few different inputs.

    // seekLastLines sets the read position of the ReadSeeker to the
    // wanted number of filtered lines before the end.
    func (t *Tailer) seekLastLines() error {
        offset, err := t.readSeeker.Seek(0, os.SEEK_END)
        if err != nil {
            return err
        }
        seekPos := int64(0)
        found := 0
        buf := make([]byte, minRead)
    SeekLoop:
        for offset > 0 {
            // buf contains the data left over from the
            // previous iteration.
            space := cap(buf) - len(buf)
            if space < minRead {
                // grow buffer
                newBuf := make([]byte, len(buf), cap(buf)*2)
                copy(newBuf, buf)
                buf = newBuf
                space = cap(buf) - len(buf)
            }
            if int64(space) > offset {
                // Use exactly the right amount of space if there's
                // only a small amount remaining.
                space = int(offset)
            }
            // copy data remaining from last time to the end of the buffer,
            // so we can read into the right place.
            copy(buf[space:cap(buf)], buf)
            buf = buf[0 : len(buf)+space]
            offset -= int64(space)
            _, err := t.readSeeker.Seek(offset, os.SEEK_SET)
            if err != nil {
                return err
            }
            _, err = io.ReadFull(t.readSeeker, buf[0:space])
            if err != nil {
                return err
            }
            // Find the end of the last line in the buffer.
            // This will discard any unterminated line at the end
            // of the file.
            end := bytes.LastIndex(buf, delim)
            if end == -1 {
                // No end of line found - discard incomplete
                // line and continue looking. If this happens
                // at the beginning of the file, we don't care
                // because we're going to stop anyway.
                buf = buf[:0]
                continue
            }
            end++
            for {
                start := bytes.LastIndex(buf[0:end-1], delim)
                if start == -1 {
                    break
                }
                start++
                if t.isValid(buf[start:end]) {
                    found++
                    if found >= t.lines {
                        seekPos = offset + int64(start)
                        break SeekLoop
                    }
                }
                end = start - 1
            }
            // Leave the last line in buf, as we don't know whether
            // it's complete or not.
            buf = buf[0:end]
        }
        // Final positioning.
        t.readSeeker.Seek(seekPos, os.SEEK_SET)
        return nil
    }

https://codereview.appspot.com/36540043/diff/40001/utils/tailer/tailer.go#new...
utils/tailer/tailer.go:198: // Reached beginnig of data.
s/beginnig/beginning/

https://codereview.appspot.com/36540043/diff/40001/utils/tailer/tailer.go#new...
utils/tailer/tailer.go:212: buffer, err := t.reader.ReadBytes(delimiter)
ReadBytes can't return ErrBufferFull.

I think you want to use ReadSlice. Also, if the first call succeeds, we should just return the slice that we got from ReadSlice - that way in the usual case that the line is shorter than the bufio buffer we can avoid any allocation.

https://codereview.appspot.com/36540043/diff/40001/utils/tailer/tailer.go#new...
utils/tailer/tailer.go:226: if len(line) == 0 || line[len(line)-1] == delimiter {
This can't happen. (From the docs: "returns err != nil if and only if the returned data does not end in delim".)

https://codereview.appspot.com/36540043/diff/40001/utils/tailer/tailer.go#new...
utils/tailer/tailer.go:234: t.readSeeker.Seek(offset, os.SEEK_END)
This is racy. We can't seek from the end because the end might be constantly changing. I think we need to keep track of where we're reading in the file and seek back to the absolute offset of the start of the partial line.

https://codereview.appspot.com/36540043/diff/40001/utils/tailer/tailer.go#new...
utils/tailer/tailer.go:247: return false
Why?

https://codereview.appspot.com/36540043/diff/40001/utils/tailer/tailer.go#new...
utils/tailer/tailer.go:252: return t.filter(line)
Consider trimming \r?\n from the end of the line before calling filter?

https://codereview.appspot.com/36540043/diff/40001/utils/tailer/tailer_test.go
File utils/tailer/tailer_test.go (right):

https://codereview.appspot.com/36540043/diff/40001/utils/tailer/tailer_test.g...
utils/tailer/tailer_test.go:4: package tailer_test
I'd like to see some unit tests of seekLastLines in here. There's a lot of logic in there that is untested here (with respect to long lines, line boundaries, read errors etc).
Please take a look.

https://codereview.appspot.com/36540043/diff/40001/utils/tailer/tailer.go
File utils/tailer/tailer.go (right):

https://codereview.appspot.com/36540043/diff/40001/utils/tailer/tailer.go#new...
utils/tailer/tailer.go:41: // Writer. The reading beginns the specified number of matching lines
On 2013/12/10 15:14:25, rog wrote:
> s/beginns/begins/

Done.

https://codereview.appspot.com/36540043/diff/40001/utils/tailer/tailer.go#new...
utils/tailer/tailer.go:43: func NewStandardTailer(readSeeker io.ReadSeeker, writer io.Writer, lines int, filter TailerFilterFunc) *Tailer {
On 2013/12/10 15:14:25, rog wrote:
> Given that this function is going to be the one that everyone calls, I think I'd
> name it NewTailer, but do we actually need to expose buffer size and poll
> interval publicly except to the tailer tests?

Done.

https://codereview.appspot.com/36540043/diff/40001/utils/tailer/tailer.go#new...
utils/tailer/tailer.go:126: readBuffer := make([]byte, t.bufsize)
On 2013/12/10 15:14:25, rog wrote:
> This seems rather complex to me - I don't fully trust myself to vet the logic.

Used it, even if I have the same troubles following your array logic and the needed debugging. Thankfully the comments helped to follow the idea. And at least it has fewer lines. ;) Thx.

https://codereview.appspot.com/36540043/diff/40001/utils/tailer/tailer.go#new...
utils/tailer/tailer.go:198: // Reached beginnig of data.
On 2013/12/10 15:14:25, rog wrote:
> s/beginnig/beginning/

Not needed anymore due to new function.

https://codereview.appspot.com/36540043/diff/40001/utils/tailer/tailer.go#new...
utils/tailer/tailer.go:212: buffer, err := t.reader.ReadBytes(delimiter)
On 2013/12/10 15:14:25, rog wrote:
> ReadBytes can't return ErrBufferFull.
>
> I think you want to use ReadSlice.
> Also, if the first call succeeds, we should just return the slice that we got
> from ReadSlice - that way in the usual case that the line is shorter than the
> bufio buffer we can avoid any allocation.

Done.

https://codereview.appspot.com/36540043/diff/40001/utils/tailer/tailer.go#new...
utils/tailer/tailer.go:226: if len(line) == 0 || line[len(line)-1] == delimiter {
On 2013/12/10 15:14:25, rog wrote:
> This can't happen. (from the docs: "returns err != nil if and only if the
> returned data does not end in delim")

Done.

https://codereview.appspot.com/36540043/diff/40001/utils/tailer/tailer.go#new...
utils/tailer/tailer.go:234: t.readSeeker.Seek(offset, os.SEEK_END)
On 2013/12/10 15:14:25, rog wrote:
> This is racy. We can't seek from the end because the end might be constantly
> changing. I think we need to keep track of where we're reading in the file and
> seek back to the absolute offset of the start of the partial line.

Done.

https://codereview.appspot.com/36540043/diff/40001/utils/tailer/tailer.go#new...
utils/tailer/tailer.go:247: return false
On 2013/12/10 15:14:25, rog wrote:
> Why?

Yeah, too much automatic filtering here. The receiver of the data should decide. Removed it.

https://codereview.appspot.com/36540043/diff/40001/utils/tailer/tailer.go#new...
utils/tailer/tailer.go:252: return t.filter(line)
On 2013/12/10 15:14:25, rog wrote:
> Consider trimming \r?\n from the end of the line before calling filter?

Why?
Another round of suggestions, with one or two things still to fix. Thanks for bearing with me!

https://codereview.appspot.com/36540043/diff/40001/utils/tailer/tailer.go
File utils/tailer/tailer.go (right):

https://codereview.appspot.com/36540043/diff/40001/utils/tailer/tailer.go#new...
utils/tailer/tailer.go:252: return t.filter(line)
On 2013/12/11 17:13:05, mue wrote:
> On 2013/12/10 15:14:25, rog wrote:
> > Consider trimming \r?\n from the end of the line before calling filter?
>
> Why?

Then the filter functions don't need to worry about line termination characters at all and we know that we're safe even when we have windows stuff generating \r\n lines.

https://codereview.appspot.com/36540043/diff/60001/utils/tailer/tailer.go
File utils/tailer/tailer.go (right):

https://codereview.appspot.com/36540043/diff/60001/utils/tailer/tailer.go#new...
utils/tailer/tailer.go:131: buffer := make([]byte, t.bufsize)
If we've got t.bufsize, "buf" would seem logical as a name. Not that it matters much though.

https://codereview.appspot.com/36540043/diff/60001/utils/tailer/tailer.go#new...
utils/tailer/tailer.go:211: line = append(line, slice...)
Unfortunately I think this is wrong, as we may be appending to the internal bufio.Reader buffer, which may be overwritten by the ReadSlice call.

How about separating the first-slice case from the appending logic? Something like this, perhaps?

    // readLine reads the next valid line from the reader, even if it is
    // larger than the reader buffer.
    func (t *Tailer) readLine() ([]byte, error) {
        for {
            slice, err := t.reader.ReadSlice(delimiter)
            if err == nil {
                if t.isValid(slice) {
                    return slice, nil
                }
                continue
            }
            line := append([]byte(nil), slice...)
            for err == bufio.ErrBufferFull {
                slice, err = t.reader.ReadSlice(delimiter)
                line = append(line, slice...)
            }
            switch err {
            case nil:
                if t.isValid(line) {
                    return line, nil
                }
            case io.EOF:
                // EOF without delimiter, step back.
                t.readSeeker.Seek(-int64(len(line)), os.SEEK_CUR)
                return nil, err
            default:
                return nil, err
            }
        }
    }

https://codereview.appspot.com/36540043/diff/60001/utils/tailer/tailer.go#new...
utils/tailer/tailer.go:224: t.readSeeker.Seek(-offset, os.SEEK_CUR)
Ah, that's nicer than my suggestion, thanks.

https://codereview.appspot.com/36540043/diff/60001/utils/tailer/tailer_test.go
File utils/tailer/tailer_test.go (right):

https://codereview.appspot.com/36540043/diff/60001/utils/tailer/tailer_test.g...
utils/tailer/tailer_test.go:30: buffer := bytes.NewBuffer(nil)
I'm afraid the way you're using this is racy (and in all the other tests too) because you're reading from the buffer (with assertCollected) concurrently with the tailer writing to it.

You could use io.Pipe instead of bytes.Buffer to avoid the problem.

BTW whenever you've got goroutine-based code, it's worth running go test -race to check this kind of stuff.

https://codereview.appspot.com/36540043/diff/60001/utils/tailer/tailer_test.g...
utils/tailer/tailer_test.go:47: func (tailerSuite) TestLaggedTermination(c *gc.C) {
I see 10 of these tests are almost identical. This suggests to me that a table-based test might work well here, and make it easier to see what's actually being tested.

Then we could easily add quite a few more tests (for example, I'd like to see tests with blank lines in various places, and probably some more too).

https://codereview.appspot.com/36540043/diff/60001/utils/tailer/tailer_test.g...
utils/tailer/tailer_test.go:264: disturber := func(lines []string) {
This should probably be defined just before it's used.

https://codereview.appspot.com/36540043/diff/60001/utils/tailer/tailer_test.g...
utils/tailer/tailer_test.go:331: sigc <- struct{}{}
What's the point of this? We know the data has been written when this function returns.
Please take a look.

https://codereview.appspot.com/36540043/diff/60001/utils/tailer/tailer.go
File utils/tailer/tailer.go (right):

https://codereview.appspot.com/36540043/diff/60001/utils/tailer/tailer.go#new...
utils/tailer/tailer.go:131: buffer := make([]byte, t.bufsize)
On 2013/12/11 18:27:41, rog wrote:
> If we've got t.bufsize, "buf" would seem logical as a name.
> Not that it matters much though.

Ack, but in the other direction. ;)

https://codereview.appspot.com/36540043/diff/60001/utils/tailer/tailer.go#new...
utils/tailer/tailer.go:211: line = append(line, slice...)
On 2013/12/11 18:27:41, rog wrote:
> Unfortunately I think this is wrong, as we may be appending to the internal
> bufio.Reader buffer, which may be overwritten by the ReadSlice call.
>
> How about separating the first-slice case from the appending logic?
>
> Something like this, perhaps?
>
>     // readLine reads the next valid line from the reader, even if it is
>     // larger than the reader buffer.
>     func (t *Tailer) readLine() ([]byte, error) {
>         for {
>             slice, err := t.reader.ReadSlice(delimiter)
>             if err == nil {
>                 if t.isValid(slice) {
>                     return slice, nil
>                 }
>                 continue
>             }
>             line := append([]byte(nil), slice...)
>             for err == bufio.ErrBufferFull {
>                 slice, err = t.reader.ReadSlice(delimiter)
>                 line = append(line, slice...)
>             }
>             switch err {
>             case nil:
>                 if t.isValid(line) {
>                     return line, nil
>                 }
>             case io.EOF:
>                 // EOF without delimiter, step back.
>                 t.readSeeker.Seek(-int64(len(line)), os.SEEK_CUR)
>                 return nil, err
>             default:
>                 return nil, err
>             }
>         }
>     }

Yeah, that's better, thx.

https://codereview.appspot.com/36540043/diff/60001/utils/tailer/tailer_test.go
File utils/tailer/tailer_test.go (right):

https://codereview.appspot.com/36540043/diff/60001/utils/tailer/tailer_test.g...
utils/tailer/tailer_test.go:30: buffer := bytes.NewBuffer(nil)
On 2013/12/11 18:27:41, rog wrote:
> I'm afraid the way you're using this is racy (and in all the other tests too)
> because you're reading from the buffer (with assertCollected) concurrently with
> the tailer writing to it.
>
> You could use io.Pipe instead of bytes.Buffer to avoid the problem.
>
> BTW whenever you've got goroutine-based code, it's worth running go test -race
> to check this kind of stuff.

Done.

https://codereview.appspot.com/36540043/diff/60001/utils/tailer/tailer_test.g...
utils/tailer/tailer_test.go:47: func (tailerSuite) TestLaggedTermination(c *gc.C) {
On 2013/12/11 18:27:41, rog wrote:
> I see 10 of these tests are almost identical.
> This suggests to me that a table-based test might work well here, and make it
> easier to see what's actually being tested.
>
> Then we could easily add quite a few more tests (for example, I'd like to see
> tests with blank lines in various places, and probably some more too).

Done.

https://codereview.appspot.com/36540043/diff/60001/utils/tailer/tailer_test.g...
utils/tailer/tailer_test.go:264: disturber := func(lines []string) {
On 2013/12/11 18:27:41, rog wrote:
> This should probably be defined just before it's used.

Now a different approach with table-driven tests.

https://codereview.appspot.com/36540043/diff/60001/utils/tailer/tailer_test.g...
utils/tailer/tailer_test.go:331: sigc <- struct{}{}
On 2013/12/11 18:27:41, rog wrote:
> What's the point of this?
> We know the data has been written when this function returns.

Ouch, yes.
Another round. Hopefully done after this one!

https://codereview.appspot.com/36540043/diff/80001/utils/tailer/tailer.go
File utils/tailer/tailer.go (right):

https://codereview.appspot.com/36540043/diff/80001/utils/tailer/tailer.go#new...
utils/tailer/tailer.go:49: func NewTailer(readSeeker io.ReadSeeker, writeCloser io.WriteCloser, lines int, filter TailerFilterFunc) *Tailer {
I don't think there's any particular reason we want to give this code the responsibility for closing the writer. It's quite possible we might have several tailers writing to the same writer, for example.

Better would be a Wait method (and perhaps a Dead method) to wait until it's finished. Then the caller has the option of closing the writer.

https://codereview.appspot.com/36540043/diff/80001/utils/tailer/tailer_test.go
File utils/tailer/tailer_test.go (right):

https://codereview.appspot.com/36540043/diff/80001/utils/tailer/tailer_test.g...
utils/tailer/tailer_test.go:40: {
If we move this brace onto the previous line, we can save a level of indentation in all these tests.

https://codereview.appspot.com/36540043/diff/80001/utils/tailer/tailer_test.g...
utils/tailer/tailer_test.go:42: data: data[26:29],
I like the new tests much better now. One thought though:

I'm not convinced that referring to "data" in all these tests makes them more readable - I have no idea what's in data[26:29] without counting or using grep -n.

Something like the below shows me exactly what's going on without me needing to refer to multiple places, and is independent of any changes elsewhere in the code. It would make it considerably easier for me to scan down the tests to ensure that each one represents reasonable behaviour.

    {
        description: "lines are longer than buffer size",
        data: []string{
            "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz\n",
            "0123456789012345678901234567890123456789012345678901\n",
            "the quick brown fox ",
        },
        initialLinesWritten:   1,
        initialLinesRequested: 1,
        bufferSize:            5,
        initialCollectedData: []string{
            "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz\n",
        },
        appendedCollectedData: []string{
            "0123456789012345678901234567890123456789012345678901\n",
        },
    },

https://codereview.appspot.com/36540043/diff/80001/utils/tailer/tailer_test.g...
utils/tailer/tailer_test.go:215: line, err := buffer.ReadString('\n')
If the tailer doesn't produce enough data, we'll block here forever I think, regardless of the timeout.

I think you probably want something more like the below:

    func assertCollected(c *gc.C, reader io.Reader, compare []string, injection func([]string)) {
        lineChan := make(chan string)
        go func() {
            defer close(lineChan)
            reader := bufio.NewReader(reader)
            for {
                line, err := reader.ReadString('\n')
                if !c.Check(err, gc.IsNil) {
                    return
                }
                lineChan <- line
            }
        }()
        timeout := time.After(testing.LongWait)
        for {
            select {
            case line, ok := <-lineChan:
                c.Assert(ok, jc.IsTrue)
                // deal with line as currently
            case <-timeout:
                // timed out
            }
        }
    }

https://codereview.appspot.com/36540043/diff/80001/utils/tailer/tailer_test.g...
utils/tailer/tailer_test.go:222: for i := 0; i < len(lines); i++ {
c.Assert(lines, gc.DeepEquals, compare) ?

https://codereview.appspot.com/36540043/diff/80001/utils/tailer/tailer_test.g...
utils/tailer/tailer_test.go:228: if err == io.EOF {
Once we've got EOF once on a pipe, I don't think we can get any more data, can we?
Please take a look.

https://codereview.appspot.com/36540043/diff/80001/utils/tailer/tailer.go
File utils/tailer/tailer.go (right):

https://codereview.appspot.com/36540043/diff/80001/utils/tailer/tailer.go#new...
utils/tailer/tailer.go:49: func NewTailer(readSeeker io.ReadSeeker, writeCloser io.WriteCloser, lines int, filter TailerFilterFunc) *Tailer {
On 2013/12/12 14:46:39, rog wrote:
> I don't think there's any particular reason we want to give this code the
> responsibility for closing the writer. It's quite possible we might have
> several tailers writing to the same writer, for example.
>
> Better would be a Wait method (and perhaps a Dead method) to wait until
> it's finished. Then the caller has the option of closing the writer.

Done as discussed.

https://codereview.appspot.com/36540043/diff/80001/utils/tailer/tailer_test.go
File utils/tailer/tailer_test.go (right):

https://codereview.appspot.com/36540043/diff/80001/utils/tailer/tailer_test.g...
utils/tailer/tailer_test.go:40: {
On 2013/12/12 14:46:39, rog wrote:
> If we move this brace onto the previous line, we can save a level of
> indentation in all these tests.

Done.

https://codereview.appspot.com/36540043/diff/80001/utils/tailer/tailer_test.g...
utils/tailer/tailer_test.go:42: data: data[26:29],
On 2013/12/12 14:46:39, rog wrote:
> I like the new tests much better now. One thought though:
>
> I'm not convinced that referring to "data" in all these tests makes them more
> readable - I have no idea what's in data[26:29] without counting or using grep
> -n.
>
> Something like the below shows me exactly what's going
> on without me needing to refer to multiple places,
> and is independent of any changes else in the code.
>
> It would make it considerably easier for me to scan down
> the tests to ensure that each one represents reasonable
> behaviour.
>
> {
>     description: "lines are longer than buffer size",
>     data: []string{
>         "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz\n",
>         "0123456789012345678901234567890123456789012345678901\n",
>         "the quick brown fox ",
>     },
>     initialLinesWritten: 1,
>     initialLinesRequested: 1,
>     bufferSize: 5,
>     initialCollectedData: []string{
>         "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz\n",
>     },
>     appendedCollectedData: []string{
>         "0123456789012345678901234567890123456789012345678901\n",
>     },
> },

In this case the test table would blow up again and would also be inconsistent (e.g. when using the standard larger group of lines). So I'm now using three data variables with more descriptive names as a compromise, and have located them near the table.

https://codereview.appspot.com/36540043/diff/80001/utils/tailer/tailer_test.g...
utils/tailer/tailer_test.go:215: line, err := buffer.ReadString('\n')
On 2013/12/12 14:46:39, rog wrote:
> If the tailer doesn't produce enough data, we'll block here forever I think,
> regardless of the timeout.
>
> I think you probably want something more like the below:
>
> func assertCollected(c *gc.C, reader io.Reader, compare []string, injection func([]string)) {
>     lineChan := make(chan string)
>     go func() {
>         defer close(lineChan)
>         reader := bufio.NewReader(reader)
>         for {
>             line, err := reader.ReadString('\n')
>             if !c.Check(err, gc.IsNil) {
>                 return
>             }
>             lineChan <- line
>         }
>     }()
>     timeout := time.After(testing.LongWait)
>     for {
>         select {
>         case line, ok := <-lineChan:
>             c.Assert(ok, jc.IsTrue)
>             // deal with line as currently
>         case <-timeout:
>             // timed out
>         }
>     }
> }

Done with an external reading goroutine.

https://codereview.appspot.com/36540043/diff/80001/utils/tailer/tailer_test.g...
utils/tailer/tailer_test.go:222: for i := 0; i < len(lines); i++ {
On 2013/12/12 14:46:39, rog wrote:
> c.Assert(lines, gc.DeepEquals, compare) ?

Done.

https://codereview.appspot.com/36540043/diff/80001/utils/tailer/tailer_test.g...
utils/tailer/tailer_test.go:228: if err == io.EOF {
On 2013/12/12 14:46:39, rog wrote:
> Once we've got EOF once on a pipe, I don't think we can get any more data,
> can we?

Yeah, I missed that when moving from the buffer to the Pipe.
LGTM with the tests fixed, thanks!

https://codereview.appspot.com/36540043/diff/80001/utils/tailer/tailer_test.go
File utils/tailer/tailer_test.go (right):

https://codereview.appspot.com/36540043/diff/80001/utils/tailer/tailer_test.g...
utils/tailer/tailer_test.go:42: data: data[26:29],
On 2013/12/13 11:23:12, mue wrote:
> On 2013/12/12 14:46:39, rog wrote:
> > I like the new tests much better now. One thought though:
> >
> > I'm not convinced that referring to "data" in all these tests makes them
> > more readable - I have no idea what's in data[26:29] without counting or
> > using grep -n.
> >
> > Something like the below shows me exactly what's going
> > on without me needing to refer to multiple places,
> > and is independent of any changes else in the code.
> >
> > It would make it considerably easier for me to scan down
> > the tests to ensure that each one represents reasonable
> > behaviour.
> >
> > {
> >     description: "lines are longer than buffer size",
> >     data: []string{
> >         "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz\n",
> >         "0123456789012345678901234567890123456789012345678901\n",
> >         "the quick brown fox ",
> >     },
> >     initialLinesWritten: 1,
> >     initialLinesRequested: 1,
> >     bufferSize: 5,
> >     initialCollectedData: []string{
> >         "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz\n",
> >     },
> >     appendedCollectedData: []string{
> >         "0123456789012345678901234567890123456789012345678901\n",
> >     },
> > },
>
> In this case the test table would blow up again and would also be
> inconsistent (e.g. when using the standard larger group of lines). So I'm
> now using three data variables with more descriptive names as a compromise,
> and have located them near the table.

I'm with you about alphabetData (which is also nicely memorable, which helps), but most of them really do benefit from having the data visible in the test.

I processed my local copy to move much of the test data into the tests, and two issues became immediately obvious, though I'd missed them when cross-referencing - which I think is a reasonable indication that the tests are not currently that clear.

Here's my modified version of the test. Only alphabetData remains. I'd prefer it if we could use this form.

http://paste.ubuntu.com/6566524/

https://codereview.appspot.com/36540043/diff/100001/utils/tailer/tailer_test.go
File utils/tailer/tailer_test.go (right):

https://codereview.appspot.com/36540043/diff/100001/utils/tailer/tailer_test....
utils/tailer/tailer_test.go:100: description: "lines are longer than buffer size, missing termination of last line",
How is the last line missing termination here?

https://codereview.appspot.com/36540043/diff/100001/utils/tailer/tailer_test....
utils/tailer/tailer_test.go:117: data: unterminatedData[0:2],
The last line doesn't seem to be missing termination here.
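[Editor's note: a minimal, hypothetical sketch of the table style being argued for - `lastLines` is an illustrative stand-in, not the real Tailer code - where each case carries its data inline so the expected behaviour can be checked at a glance:]

```go
package main

import (
	"fmt"
	"strings"
)

// lastLines returns at most n trailing newline-terminated lines of s,
// loosely mirroring what the Tailer's initial read does. An
// unterminated final line is ignored.
func lastLines(s string, n int) []string {
	lines := strings.SplitAfter(s, "\n")
	// SplitAfter leaves a trailing element that is either empty or an
	// unterminated line; drop it in both cases.
	if len(lines) > 0 && !strings.HasSuffix(lines[len(lines)-1], "\n") {
		lines = lines[:len(lines)-1]
	}
	if len(lines) > n {
		lines = lines[len(lines)-n:]
	}
	return lines
}

func main() {
	// Inline data keeps each case self-describing: the reader sees
	// exactly what goes in and what must come out, with no need to
	// cross-reference a shared data variable.
	tests := []struct {
		description string
		data        string
		requested   int
		expected    []string
	}{{
		description: "fewer lines than requested",
		data:        "alpha\n",
		requested:   3,
		expected:    []string{"alpha\n"},
	}, {
		description: "unterminated last line is ignored",
		data:        "alpha\nbeta\nthe quick brown fox ",
		requested:   2,
		expected:    []string{"alpha\n", "beta\n"},
	}}
	for i, t := range tests {
		got := lastLines(t.data, t.requested)
		fmt.Printf("test %d (%s): %q\n", i, t.description, got)
	}
}
```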
Please take a look.

https://codereview.appspot.com/36540043/diff/100001/utils/tailer/tailer_test.go
File utils/tailer/tailer_test.go (right):

https://codereview.appspot.com/36540043/diff/100001/utils/tailer/tailer_test....
utils/tailer/tailer_test.go:100: description: "lines are longer than buffer size, missing termination of last line",
On 2013/12/13 12:52:14, rog wrote:
> How is the last line missing termination here?

Done.

https://codereview.appspot.com/36540043/diff/100001/utils/tailer/tailer_test....
utils/tailer/tailer_test.go:117: data: unterminatedData[0:2],
On 2013/12/13 12:52:14, rog wrote:
> The last line doesn't seem to be missing termination here.

Done.