|
|
Descriptionbufio: new Scanner interface
Add a new, simple interface for scanning (probably textual) data,
based on a new type called Scanner. It does its own internal buffering,
so should be plausibly efficient even without injecting a bufio.Reader.
The format of the input is defined by a "split function", by default
splitting into lines. Other implemented split functions include single
bytes, single runes, and space-separated words.
Here's the loop to scan stdin as a file of lines:
s := bufio.NewScanner(os.Stdin)
for s.Scan() {
fmt.Printf("%s\n", s.Bytes())
}
if s.Err() != nil {
log.Fatal(s.Err())
}
While we're dealing with spaces, define what space means to strings.Fields.
Fixes issue 4802.
Patch Set 1 #
Total comments: 39
Patch Set 2 : diff -r 0adf91947752 https://code.google.com/p/go/ #
Total comments: 12
Patch Set 3 : diff -r 8b51b7f3d2a6 https://code.google.com/p/go #
Total comments: 26
Patch Set 4 : diff -r 8b51b7f3d2a6 https://code.google.com/p/go #Patch Set 5 : diff -r 8b51b7f3d2a6 https://code.google.com/p/go #Patch Set 6 : diff -r 8b51b7f3d2a6 https://code.google.com/p/go #Patch Set 7 : diff -r 9c3930413c1b https://code.google.com/p/go #Patch Set 8 : diff -r 9c3930413c1b https://code.google.com/p/go #Patch Set 9 : diff -r 9c3930413c1b https://code.google.com/p/go #Patch Set 10 : diff -r a81ef8e0cc05 https://code.google.com/p/go #Patch Set 11 : diff -r abd0feffdac3 https://code.google.com/p/go #
MessagesTotal messages: 23
Hello golang-dev@googlegroups.com, I'd like you to review this change to https://code.google.com/p/go/
Sign in to reply to this message.
https://codereview.appspot.com/7322088/diff/1/src/pkg/bufio/scan.go File src/pkg/bufio/scan.go (right): https://codereview.appspot.com/7322088/diff/1/src/pkg/bufio/scan.go#newcode23 src/pkg/bufio/scan.go:23: // Scanning stops, unrecoverably, at EOF, the first I/O error, or a remove commas around "unrecoverably"? confuses with the list that follows https://codereview.appspot.com/7322088/diff/1/src/pkg/bufio/scan.go#newcode29 src/pkg/bufio/scan.go:29: // A typical scan loop over text lines looks like this: add a testable example too? (or instead) https://codereview.appspot.com/7322088/diff/1/src/pkg/bufio/scan.go#newcode68 src/pkg/bufio/scan.go:68: ErrNegativeAdvance = errors.New("Scanner: SplitFunc returns negative advance count") This error and the next feel like programmer error and worthy of a panic. Would people want to catch these errors programmatically? I can't think of a reason why. https://codereview.appspot.com/7322088/diff/1/src/pkg/bufio/scan.go#newcode113 src/pkg/bufio/scan.go:113: // occurred during scanning, except that if it was a plain EOF, Err s/a plain EOF/an io.EOF/ https://codereview.appspot.com/7322088/diff/1/src/pkg/bufio/scan.go#newcode113 src/pkg/bufio/scan.go:113: // occurred during scanning, except that if it was a plain EOF, Err This doc comment says more about Err than the comment on Err. https://codereview.appspot.com/7322088/diff/1/src/pkg/bufio/scan.go#newcode163 src/pkg/bufio/scan.go:163: if n == 0 { // Don't loop forever if Reader doesn't deliver EOF. This seems an unconventional precaution. Is there precedent? https://codereview.appspot.com/7322088/diff/1/src/pkg/bufio/scan.go#newcode174 src/pkg/bufio/scan.go:174: s.setErr(ErrNegativeAdvance) Yeah, I really reckon these should be panics. The only reason they'll happen is if the SplitFunc is busted. https://codereview.appspot.com/7322088/diff/1/src/pkg/bufio/scan.go#newcode233 src/pkg/bufio/scan.go:233: // We know it's an error: we have width==1and implicitly r==utf8.RuneError. s/1and/1 and/ https://codereview.appspot.com/7322088/diff/1/src/pkg/bufio/scan.go#newcode256 src/pkg/bufio/scan.go:256: } instead of checking for the carriage return twice, you could write this here defer func() { if len(token) > 0 && token[len(token)-1] == '\r' { token = token[:len(token)-1] } }() or make that a function splitCr(token []byte) []byte and return like return i + 1, splitCr(data[:i]) https://codereview.appspot.com/7322088/diff/1/src/pkg/bufio/scan.go#newcode262 src/pkg/bufio/scan.go:262: return i + 1, data[0 : i-1] this line could be just data = data[:i-1] https://codereview.appspot.com/7322088/diff/1/src/pkg/bufio/scan.go#newcode271 src/pkg/bufio/scan.go:271: return len(data), data[0 : len(data)-1] ditto https://codereview.appspot.com/7322088/diff/1/src/pkg/bufio/scan.go#newcode309 src/pkg/bufio/scan.go:309: var width, start int delete https://codereview.appspot.com/7322088/diff/1/src/pkg/bufio/scan.go#newcode310 src/pkg/bufio/scan.go:310: for width, start = 0, 0; start < len(data); start += width { s/=/:=/
Sign in to reply to this message.
https://codereview.appspot.com/7322088/diff/1/src/pkg/bufio/scan.go File src/pkg/bufio/scan.go (right): https://codereview.appspot.com/7322088/diff/1/src/pkg/bufio/scan.go#newcode282 src/pkg/bufio/scan.go:282: func isWhiteSpace(r rune) bool { isSpace? https://codereview.appspot.com/7322088/diff/1/src/pkg/bufio/scan_test.go File src/pkg/bufio/scan_test.go (right): https://codereview.appspot.com/7322088/diff/1/src/pkg/bufio/scan_test.go#newc... src/pkg/bufio/scan_test.go:43: if len(s.Bytes()) != 1 || s.Bytes()[0] != test[i] { b := s.Bytes() ? https://codereview.appspot.com/7322088/diff/1/src/pkg/bufio/scan_test.go#newc... src/pkg/bufio/scan_test.go:98: " abc\tdef\nghi\rjkl\fmno\vpqr\u0085stu\u00a0\n", how about a test with more than one space character? https://codereview.appspot.com/7322088/diff/1/src/pkg/bufio/scan_test.go#newc... src/pkg/bufio/scan_test.go:138: const max = 7 it would be good to test even lower numbers, like 1 could make max a field of slowReader
Sign in to reply to this message.
https://codereview.appspot.com/7322088/diff/1/src/pkg/bufio/export_test.go File src/pkg/bufio/export_test.go (right): https://codereview.appspot.com/7322088/diff/1/src/pkg/bufio/export_test.go#ne... src/pkg/bufio/export_test.go:7: // Exported for testing only. Imported? https://codereview.appspot.com/7322088/diff/1/src/pkg/bufio/export_test.go#ne... src/pkg/bufio/export_test.go:12: func (s *Scanner) MaxTokenSize(n int) { But why are these exported? https://codereview.appspot.com/7322088/diff/1/src/pkg/bufio/export_test.go#ne... src/pkg/bufio/export_test.go:23: func IsWhiteSpace(r rune) bool { IsSpace?
Sign in to reply to this message.
https://codereview.appspot.com/7322088/diff/1/src/pkg/bufio/export_test.go File src/pkg/bufio/export_test.go (right): https://codereview.appspot.com/7322088/diff/1/src/pkg/bufio/export_test.go#ne... src/pkg/bufio/export_test.go:7: // Exported for testing only. On 2013/02/17 01:36:37, adg wrote: > Imported? no, exported. this is export_test.go. its purpose is to export internals. https://codereview.appspot.com/7322088/diff/1/src/pkg/bufio/export_test.go#ne... src/pkg/bufio/export_test.go:12: func (s *Scanner) MaxTokenSize(n int) { On 2013/02/17 01:36:37, adg wrote: > But why are these exported? so they can be used in the test. https://codereview.appspot.com/7322088/diff/1/src/pkg/bufio/scan.go File src/pkg/bufio/scan.go (right): https://codereview.appspot.com/7322088/diff/1/src/pkg/bufio/scan.go#newcode23 src/pkg/bufio/scan.go:23: // Scanning stops, unrecoverably, at EOF, the first I/O error, or a On 2013/02/17 01:27:32, adg wrote: > remove commas around "unrecoverably"? confuses with the list that follows Done. https://codereview.appspot.com/7322088/diff/1/src/pkg/bufio/scan.go#newcode29 src/pkg/bufio/scan.go:29: // A typical scan loop over text lines looks like this: On 2013/02/17 01:27:32, adg wrote: > add a testable example too? (or instead) in another CL. there should be a simple intro here, and complete examples elsewhere. the CL is long enough already. https://codereview.appspot.com/7322088/diff/1/src/pkg/bufio/scan.go#newcode68 src/pkg/bufio/scan.go:68: ErrNegativeAdvance = errors.New("Scanner: SplitFunc returns negative advance count") On 2013/02/17 01:27:32, adg wrote: > This error and the next feel like programmer error and worthy of a panic. Would > people want to catch these errors programmatically? I can't think of a reason > why. i thought of that, but then thought panics were out of place when there's a good error mechanism in place. i could be talked into changing my mind again. https://codereview.appspot.com/7322088/diff/1/src/pkg/bufio/scan.go#newcode113 src/pkg/bufio/scan.go:113: // occurred during scanning, except that if it was a plain EOF, Err On 2013/02/17 01:27:32, adg wrote: > s/a plain EOF/an io.EOF/ Done. https://codereview.appspot.com/7322088/diff/1/src/pkg/bufio/scan.go#newcode113 src/pkg/bufio/scan.go:113: // occurred during scanning, except that if it was a plain EOF, Err On 2013/02/17 01:27:32, adg wrote: > This doc comment says more about Err than the comment on Err. Done. https://codereview.appspot.com/7322088/diff/1/src/pkg/bufio/scan.go#newcode174 src/pkg/bufio/scan.go:174: s.setErr(ErrNegativeAdvance) On 2013/02/17 01:27:32, adg wrote: > Yeah, I really reckon these should be panics. The only reason they'll happen is > if the SplitFunc is busted. i'll let others weigh in. i don't like panics. https://codereview.appspot.com/7322088/diff/1/src/pkg/bufio/scan.go#newcode233 src/pkg/bufio/scan.go:233: // We know it's an error: we have width==1and implicitly r==utf8.RuneError. On 2013/02/17 01:27:32, adg wrote: > s/1and/1 and/ Done. https://codereview.appspot.com/7322088/diff/1/src/pkg/bufio/scan.go#newcode256 src/pkg/bufio/scan.go:256: } i certainly don't want a defer. a function might be nice. https://codereview.appspot.com/7322088/diff/1/src/pkg/bufio/scan.go#newcode262 src/pkg/bufio/scan.go:262: return i + 1, data[0 : i-1] On 2013/02/17 01:27:32, adg wrote: > this line could be just > data = data[:i-1] Done. https://codereview.appspot.com/7322088/diff/1/src/pkg/bufio/scan.go#newcode271 src/pkg/bufio/scan.go:271: return len(data), data[0 : len(data)-1] On 2013/02/17 01:27:32, adg wrote: > ditto Done. https://codereview.appspot.com/7322088/diff/1/src/pkg/bufio/scan.go#newcode282 src/pkg/bufio/scan.go:282: func isWhiteSpace(r rune) bool { On 2013/02/17 01:35:10, adg wrote: > isSpace? it is Unicode White_Space, so no. https://codereview.appspot.com/7322088/diff/1/src/pkg/bufio/scan.go#newcode310 src/pkg/bufio/scan.go:310: for width, start = 0, 0; start < len(data); start += width { On 2013/02/17 01:27:32, adg wrote: > s/=/:=/ start is used after the loop. in fact, the purpose of this loop is to set start. https://codereview.appspot.com/7322088/diff/1/src/pkg/bufio/scan_test.go File src/pkg/bufio/scan_test.go (right): https://codereview.appspot.com/7322088/diff/1/src/pkg/bufio/scan_test.go#newc... src/pkg/bufio/scan_test.go:43: if len(s.Bytes()) != 1 || s.Bytes()[0] != test[i] { On 2013/02/17 01:35:10, adg wrote: > b := s.Bytes() ? Done. https://codereview.appspot.com/7322088/diff/1/src/pkg/bufio/scan_test.go#newc... src/pkg/bufio/scan_test.go:98: " abc\tdef\nghi\rjkl\fmno\vpqr\u0085stu\u00a0\n", On 2013/02/17 01:35:10, adg wrote: > how about a test with more than one space character? Done. https://codereview.appspot.com/7322088/diff/1/src/pkg/bufio/scan_test.go#newc... src/pkg/bufio/scan_test.go:138: const max = 7 On 2013/02/17 01:35:10, adg wrote: > it would be good to test even lower numbers, like 1 > could make max a field of slowReader Done.
Sign in to reply to this message.
Hello golang-dev@googlegroups.com, adg@golang.org (cc: golang-dev@googlegroups.com), Please take another look.
Sign in to reply to this message.
LGTM https://codereview.appspot.com/7322088/diff/1/src/pkg/bufio/scan.go File src/pkg/bufio/scan.go (right): https://codereview.appspot.com/7322088/diff/1/src/pkg/bufio/scan.go#newcode282 src/pkg/bufio/scan.go:282: func isWhiteSpace(r rune) bool { On 2013/02/17 02:23:22, r wrote: > On 2013/02/17 01:35:10, adg wrote: > > isSpace? > > it is Unicode White_Space, so no. But in the unicode package it's IsSpace... I thought it was your call originally to avoid the "white" part. No big deal though. https://codereview.appspot.com/7322088/diff/1/src/pkg/bufio/scan.go#newcode310 src/pkg/bufio/scan.go:310: for width, start = 0, 0; start < len(data); start += width { On 2013/02/17 02:23:22, r wrote: > On 2013/02/17 01:27:32, adg wrote: > > s/=/:=/ > > start is used after the loop. in fact, the purpose of this loop is to set start. ah, I missed that. In that case, it might be clearer to write: start := 0 for width := 0; start < len(data); start += width { ...
Sign in to reply to this message.
LGTM with a few remarks below. https://codereview.appspot.com/7322088/diff/3002/src/pkg/bufio/export_test.go File src/pkg/bufio/export_test.go (right): https://codereview.appspot.com/7322088/diff/3002/src/pkg/bufio/export_test.go... src/pkg/bufio/export_test.go:15: } else { lose the else? https://codereview.appspot.com/7322088/diff/3002/src/pkg/bufio/scan.go File src/pkg/bufio/scan.go (right): https://codereview.appspot.com/7322088/diff/3002/src/pkg/bufio/scan.go#newcode84 src/pkg/bufio/scan.go:84: buf: make([]byte, 4096), // Plausible starting size; needn't be large. it would be nice if we could use the reader's buffer when it's a bufio.Reader, but perhaps that's too much chumminess between types. https://codereview.appspot.com/7322088/diff/3002/src/pkg/bufio/scan.go#newcod... src/pkg/bufio/scan.go:152: newBuf := make([]byte, len(s.buf)*2) i wonder if the new buffer size should be limited to maxTokenSize. https://codereview.appspot.com/7322088/diff/3002/src/pkg/bufio/scan.go#newcod... src/pkg/bufio/scan.go:266: for i, c := range data { i'm concerned that for very long lines this will do more work than necessary. for instance, if we have a 1MB-long line and the underlying reader is returning data in 4K chunks, we'll end up scanning 128MB rather than 1MB. if there was an argument to SplitFunc that provided the amount of previously seen data, this would be easy to avoid without changing the rest of the interface. also we should probably use bytes.IndexByte. https://codereview.appspot.com/7322088/diff/3002/src/pkg/bufio/scan.go#newcod... src/pkg/bufio/scan.go:311: for width, start = 0, 0; start < len(data); start += width { you could lose the initialization if you wanted.
Sign in to reply to this message.
https://codereview.appspot.com/7322088/diff/3002/src/pkg/bufio/scan.go File src/pkg/bufio/scan.go (right): https://codereview.appspot.com/7322088/diff/3002/src/pkg/bufio/scan.go#newcod... src/pkg/bufio/scan.go:201: // Bytes is a split function that returns each byte as a token. This will look weird in godoc for pkg/bufio. If this is going to be named like this, the docs should at least reference the Scanner type.
Sign in to reply to this message.
https://codereview.appspot.com/7322088/diff/3002/src/pkg/bufio/export_test.go File src/pkg/bufio/export_test.go (right): https://codereview.appspot.com/7322088/diff/3002/src/pkg/bufio/export_test.go... src/pkg/bufio/export_test.go:15: } else { On 2013/02/18 15:04:11, rog wrote: > lose the else? Done. https://codereview.appspot.com/7322088/diff/3002/src/pkg/bufio/scan.go File src/pkg/bufio/scan.go (right): https://codereview.appspot.com/7322088/diff/3002/src/pkg/bufio/scan.go#newcode84 src/pkg/bufio/scan.go:84: buf: make([]byte, 4096), // Plausible starting size; needn't be large. the purpose of this buffer is very different from bufio's. i'm not even sure how to share. https://codereview.appspot.com/7322088/diff/3002/src/pkg/bufio/scan.go#newcod... src/pkg/bufio/scan.go:152: newBuf := make([]byte, len(s.buf)*2) On 2013/02/18 15:04:11, rog wrote: > i wonder if the new buffer size should be limited to maxTokenSize. Done. https://codereview.appspot.com/7322088/diff/3002/src/pkg/bufio/scan.go#newcod... src/pkg/bufio/scan.go:201: // Bytes is a split function that returns each byte as a token. On 2013/02/18 20:09:25, bradfitz wrote: > This will look weird in godoc for pkg/bufio. If this is going to be named like > this, the docs should at least reference the Scanner type. Done. https://codereview.appspot.com/7322088/diff/3002/src/pkg/bufio/scan.go#newcod... src/pkg/bufio/scan.go:266: for i, c := range data { no we won't, because the maximum token size is 64K. i thought about this issue but decided for a reasonable token maximum, the cost is affordable and dealing with the problem adds significant complexity to the signature and implementation of split functions. i'm comfortable with the current design. the IndexByte idea is good, though. i hadn't noticed that the package already depends on "bytes".
Sign in to reply to this message.
Hello golang-dev@googlegroups.com, adg@golang.org, rogpeppe@gmail.com, bradfitz@golang.org (cc: golang-dev@googlegroups.com), Please take another look.
Sign in to reply to this message.
I think your recent update came in while I was looking at this older copy. https://codereview.appspot.com/7322088/diff/3003/src/pkg/bufio/export_test.go File src/pkg/bufio/export_test.go (right): https://codereview.appspot.com/7322088/diff/3003/src/pkg/bufio/export_test.go... src/pkg/bufio/export_test.go:22: func IsSpace(r rune) bool { fine but even easier is: var IsSpace = isSpace https://codereview.appspot.com/7322088/diff/3003/src/pkg/bufio/scan.go File src/pkg/bufio/scan.go (right): https://codereview.appspot.com/7322088/diff/3003/src/pkg/bufio/scan.go#newcode16 src/pkg/bufio/scan.go:16: // Scanner.Scan will step through the 'tokens' of a file, skipping s/Scanner.// There is no Scanner.Scan; there's a (*Scanner).Scan. https://codereview.appspot.com/7322088/diff/3003/src/pkg/bufio/scan.go#newcode25 src/pkg/bufio/scan.go:25: // token too large to fit in the buffer. When a scan stops, the Reader Unqualified Reader here should refer to bufio.Reader but it refers to io.Reader. I think you can sidestep the issue by s/R/r/. https://codereview.appspot.com/7322088/diff/3003/src/pkg/bufio/scan.go#newcode28 src/pkg/bufio/scan.go:28: // scans on a Reader, should use bufio.Reader instead. Same problem. s/ on a Reader// Technically, should s/bufio.// too but I guess it's fine. https://codereview.appspot.com/7322088/diff/3003/src/pkg/bufio/scan.go#newcode30 src/pkg/bufio/scan.go:30: // A typical scan loop over text lines looks like this: Move this into a func ExampleScanner. https://codereview.appspot.com/7322088/diff/3003/src/pkg/bufio/scan.go#newcode64 src/pkg/bufio/scan.go:64: type SplitFunc func(data []byte, atEOF bool) (advance int, token []byte) I wonder if SplitFunc should be allowed to return an error. https://codereview.appspot.com/7322088/diff/3003/src/pkg/bufio/scan.go#newcode68 src/pkg/bufio/scan.go:68: ErrTooLong = errors.New("Scanner: token too long") s/Scanner/bufio/? Usually our errors say the package name. https://codereview.appspot.com/7322088/diff/3003/src/pkg/bufio/scan.go#newcode89 src/pkg/bufio/scan.go:89: // Err returns the first error encountered by the Scanner unless that error is io.EOF, // Err returns the first non-EOF error encountered by the Scanner. https://codereview.appspot.com/7322088/diff/3003/src/pkg/bufio/scan.go#newcod... src/pkg/bufio/scan.go:105: // String returns the most recent token generated by a call to Scan s/String/Text/ https://codereview.appspot.com/7322088/diff/3003/src/pkg/bufio/scan.go#newcod... src/pkg/bufio/scan.go:214: var errorRune = []byte{'\xEF', '\xBF', '\xBD'} // []byte(string(utf8.RuneError)) I think the compiler will do var errorRune = []byte(string(utf8.RuneError)) just fine. https://codereview.appspot.com/7322088/diff/3003/src/pkg/bufio/scan.go#newcod... src/pkg/bufio/scan.go:216: // Runes is a split function for a Scanner that returns each UTF-8-encoded The names Runes, Lines, etc seem like they might be a bit too short, at least in the context of package bufio as opposed to a separate package just for scanning. Perhaps ScanRunes, ScanLines, ScanWords? That has two benefits: it suggests looking for Scan (found in Scanner) and it will put the three functions next to each other in the documentation. https://codereview.appspot.com/7322088/diff/3003/src/pkg/bufio/scan.go#newcod... src/pkg/bufio/scan.go:265: // one mandatory newline. In regular expression notation, it is `\r?\n', s/,/./ next line s/although the/The/ I don't believe the end-of-input behavior is related to the regexp.
Sign in to reply to this message.
Actually, looks like I did comment on the right (latest) version.
Sign in to reply to this message.
https://codereview.appspot.com/7322088/diff/3003/src/pkg/bufio/export_test.go File src/pkg/bufio/export_test.go (right): https://codereview.appspot.com/7322088/diff/3003/src/pkg/bufio/export_test.go... src/pkg/bufio/export_test.go:22: func IsSpace(r rune) bool { done, but seems tricky https://codereview.appspot.com/7322088/diff/3003/src/pkg/bufio/scan.go File src/pkg/bufio/scan.go (right): https://codereview.appspot.com/7322088/diff/3003/src/pkg/bufio/scan.go#newcode16 src/pkg/bufio/scan.go:16: // Scanner.Scan will step through the 'tokens' of a file, skipping On 2013/02/19 17:53:44, rsc wrote: > s/Scanner.// > > There is no Scanner.Scan; there's a (*Scanner).Scan. Done. https://codereview.appspot.com/7322088/diff/3003/src/pkg/bufio/scan.go#newcode25 src/pkg/bufio/scan.go:25: // token too large to fit in the buffer. When a scan stops, the Reader On 2013/02/19 17:53:44, rsc wrote: > Unqualified Reader here should refer to bufio.Reader but it refers to io.Reader. > I think you can sidestep the issue by s/R/r/. Done. https://codereview.appspot.com/7322088/diff/3003/src/pkg/bufio/scan.go#newcode28 src/pkg/bufio/scan.go:28: // scans on a Reader, should use bufio.Reader instead. On 2013/02/19 17:53:44, rsc wrote: > Same problem. s/ on a Reader// > Technically, should s/bufio.// too but I guess it's fine. > Done. https://codereview.appspot.com/7322088/diff/3003/src/pkg/bufio/scan.go#newcode30 src/pkg/bufio/scan.go:30: // A typical scan loop over text lines looks like this: i weakly disagree. i will provide examples later, but since this simple interface is for simpletons, and for the vast majority of uses this loop has all the usage information you need, is it worth putting it elsewhere? https://codereview.appspot.com/7322088/diff/3003/src/pkg/bufio/scan.go#newcode64 src/pkg/bufio/scan.go:64: type SplitFunc func(data []byte, atEOF bool) (advance int, token []byte) On 2013/02/19 17:53:44, rsc wrote: > I wonder if SplitFunc should be allowed to return an error. it's more complicated, and in the four i wrote the issue never came up. is it important? https://codereview.appspot.com/7322088/diff/3003/src/pkg/bufio/scan.go#newcode68 src/pkg/bufio/scan.go:68: ErrTooLong = errors.New("Scanner: token too long") On 2013/02/19 17:53:44, rsc wrote: > s/Scanner/bufio/? > Usually our errors say the package name. Done. https://codereview.appspot.com/7322088/diff/3003/src/pkg/bufio/scan.go#newcode89 src/pkg/bufio/scan.go:89: // Err returns the first error encountered by the Scanner unless that error is io.EOF, On 2013/02/19 17:53:44, rsc wrote: > // Err returns the first non-EOF error encountered by the Scanner. Done. That's exactly the text I had before adg complained it was too short. https://codereview.appspot.com/7322088/diff/3003/src/pkg/bufio/scan.go#newcod... src/pkg/bufio/scan.go:105: // String returns the most recent token generated by a call to Scan On 2013/02/19 17:53:44, rsc wrote: > s/String/Text/ Done. https://codereview.appspot.com/7322088/diff/3003/src/pkg/bufio/scan.go#newcod... src/pkg/bufio/scan.go:214: var errorRune = []byte{'\xEF', '\xBF', '\xBD'} // []byte(string(utf8.RuneError)) On 2013/02/19 17:53:44, rsc wrote: > I think the compiler will do var errorRune = []byte(string(utf8.RuneError)) just > fine. Done. https://codereview.appspot.com/7322088/diff/3003/src/pkg/bufio/scan.go#newcod... src/pkg/bufio/scan.go:216: // Runes is a split function for a Scanner that returns each UTF-8-encoded changed to ScanRunes etc. https://codereview.appspot.com/7322088/diff/3003/src/pkg/bufio/scan.go#newcod... src/pkg/bufio/scan.go:265: // one mandatory newline. In regular expression notation, it is `\r?\n', On 2013/02/19 17:53:44, rsc wrote: > s/,/./ > next line > s/although the/The/ > > I don't believe the end-of-input behavior is related to the regexp. Done.
Sign in to reply to this message.
Hello golang-dev@googlegroups.com, adg@golang.org, rogpeppe@gmail.com, bradfitz@golang.org, rsc@golang.org (cc: golang-dev@googlegroups.com), Please take another look.
Sign in to reply to this message.
https://codereview.appspot.com/7322088/diff/3002/src/pkg/bufio/scan.go File src/pkg/bufio/scan.go (right): https://codereview.appspot.com/7322088/diff/3002/src/pkg/bufio/scan.go#newcod... src/pkg/bufio/scan.go:266: for i, c := range data { On 2013/02/19 17:33:05, r wrote: > no we won't, because the maximum token size is 64K. i thought about this issue > but decided for a reasonable token maximum, the cost is affordable and dealing > with the problem adds significant complexity to the signature and implementation > of split functions. i'm comfortable with the current design. > > the IndexByte idea is good, though. i hadn't noticed that the package already > depends on "bytes". it seems a pity to wire 64K as a maximum, given the general nature of the scanning. we don't have many hard limits elsewhere. i'd assumed that a later CL might add a Scanner.MaxTokenSize method or NewScannerSize function.
Sign in to reply to this message.
Again, and strongly, this is supposed to be convenient for workaday jobs. You are pushing to make it something of greater engineering power, and that requires complexity I am trying to avoid. -rob
Sign in to reply to this message.
https://codereview.appspot.com/7322088/diff/3003/src/pkg/bufio/scan.go File src/pkg/bufio/scan.go (right): https://codereview.appspot.com/7322088/diff/3003/src/pkg/bufio/scan.go#newcode30 src/pkg/bufio/scan.go:30: // A typical scan loop over text lines looks like this: On 2013/02/19 18:14:36, r wrote: > i weakly disagree. i will provide examples later, but since this simple > interface is for simpletons, and for the vast majority of uses this loop has all > the usage information you need, is it worth putting it elsewhere? Yes, because if it's an executable example it will not drift out of date, and also people will be able to edit and play when viewing it on golang.org. https://codereview.appspot.com/7322088/diff/3003/src/pkg/bufio/scan.go#newcode64 src/pkg/bufio/scan.go:64: type SplitFunc func(data []byte, atEOF bool) (advance int, token []byte) On 2013/02/19 18:14:36, r wrote: > On 2013/02/19 17:53:44, rsc wrote: > > I wonder if SplitFunc should be allowed to return an error. > > it's more complicated, and in the four i wrote the issue never came up. is it > important? no, i think it's fine to leave out.
Sign in to reply to this message.
now i'm leaning towards putting an error in. it's trivial to handle in Scan and makes it possible to do things like scan numerical data but catch errors neatly. any error would terminate the scan immediately.
Sign in to reply to this message.
I don't have a strong opinion on whether SplitFunc returns an error or doesn't. I just thought it was worth bringing up. Either way is fine with me.
Sign in to reply to this message.
Hello adg@golang.org, rogpeppe@gmail.com, bradfitz@golang.org, rsc@golang.org (cc: golang-dev@googlegroups.com), Please take another look.
Sign in to reply to this message.
*** Submitted as https://code.google.com/p/go/source/detail?r=f685026a2d38 *** bufio: new Scanner interface Add a new, simple interface for scanning (probably textual) data, based on a new type called Scanner. It does its own internal buffering, so should be plausibly efficient even without injecting a bufio.Reader. The format of the input is defined by a "split function", by default splitting into lines. Other implemented split functions include single bytes, single runes, and space-separated words. Here's the loop to scan stdin as a file of lines: s := bufio.NewScanner(os.Stdin) for s.Scan() { fmt.Printf("%s\n", s.Bytes()) } if s.Err() != nil { log.Fatal(s.Err()) } While we're dealing with spaces, define what space means to strings.Fields. Fixes issue 4802. R=adg, rogpeppe, bradfitz, rsc CC=golang-dev https://codereview.appspot.com/7322088
Sign in to reply to this message.
|