|
|
Created:
15 years, 3 months ago by aam Modified:
15 years, 3 months ago Reviewers:
CC:
rsc, r, phf, golang-dev Visibility:
Public. |
Descriptionbytes, strings: add new function Fields
Patch Set 1 #Patch Set 2 : code review 170046: Add SplitWhite() to strings; splits its argument into a... #
Total comments: 6
Patch Set 3 : code review 170046: Add SplitSpace() to strings; splits its argument into a... #
Total comments: 6
Patch Set 4 : code review 170046: Add SplitSpace() to strings; splits its argument into a... #
Total comments: 5
Patch Set 5 : code review 170046: Add SplitSpace() to strings and bytes; splits its argum... #Patch Set 6 : code review 170046: Add SplitSpace() to strings and bytes; splits its argum... #
Total comments: 2
Patch Set 7 : code review 170046: Changed SplitFields() to Fields(); Re-implemented strin... #Patch Set 8 : code review 170046: Added a few more tests suggested by rsc to strings_test... #Patch Set 9 : code review 170046: bytes, strings: add new function Fields #
Total comments: 5
Patch Set 10 : code review 170046: bytes, strings: add new function Fields #
MessagesTotal messages: 29
Hello rsc, I'd like you to review the following change.
Sign in to reply to this message.
Once this one is ready we'll want to put the same code in bytes. http://codereview.appspot.com/170046/diff/1001/6 File src/pkg/strings/strings.go (right): http://codereview.appspot.com/170046/diff/1001/6#newcode52 src/pkg/strings/strings.go:52: /* unicode whitespace characters: should call unicode.IsSpace instead. the fundamental property, by the way, is that they are spaces, not that they are white, so if you only pick half the word, the right half is "space". http://codereview.appspot.com/170046/diff/1001/6#newcode77 src/pkg/strings/strings.go:77: // CountWhite counts the number of non-overlapping, non-consecutive whitespace characters in s. this seems like a very special-purpose function to export. i suggest making it func countSpace or inlining it below. inlining it is probably fine. http://codereview.appspot.com/170046/diff/1001/6#newcode170 src/pkg/strings/strings.go:170: // SplitWhite splits the string s around each instance of one or more consecutive whitespace characters, returning an array of substrings of s. This function should be moved down after SplitAfter, so that Split and SplitAfter, which are very closely related, stay adjacent. It would be SplitSpace, not SplitWhite (see above), but I'm not sure that's the right name either. Note that Split("", 0) returns []string{""}. Perhaps this function should return []string{} and have a different name entirely. (In Python, ''.split('x') is [''] but ''.split() is []. I think that's a bit confusing.) Maybe Fields: // Fields splits s into sequences of non-space characters, // returning an array of substrings of s func Fields(s string) { http://codereview.appspot.com/170046/diff/1001/6#newcode176 src/pkg/strings/strings.go:176: whiting := false; inSpace := false; http://codereview.appspot.com/170046/diff/1001/6#newcode179 src/pkg/strings/strings.go:179: if iswhite(rune) && !whiting { I think the loop gets cleaner by tracking whether it is currently in a field: fieldStart := -1; for i := 0; i <= len(s); { rune, size := ... if fieldStart < 0 && size > 0 && !isSpace(rune) { fieldStart = i; continue; } if fieldStart >= 0 && (size == 0 || isSpace(rune)) { a[na] = s[fieldStart:i]; na++; fieldStart = -1; } if size == 0 { break; } i += size; } http://codereview.appspot.com/170046/diff/1001/7 File src/pkg/strings/strings_test.go (right): http://codereview.appspot.com/170046/diff/1001/7#newcode164 src/pkg/strings/strings_test.go:164: SplitWhiteTest{faces, []string{faces}}, Add a test for "" and []string{}.
Sign in to reply to this message.
+golang-dev
Sign in to reply to this message.
Hello rsc (cc: golang-dev@googlegroups.com), I'd like you to review the following change.
Sign in to reply to this message.
http://codereview.appspot.com/170046/diff/2004/3002 File src/pkg/strings/strings.go (right): http://codereview.appspot.com/170046/diff/2004/3002#newcode140 src/pkg/strings/strings.go:140: inSpace := false; I suggest making this loop the same sense as the one below. inField := false; for _, rune := range s { // no need for parens wasInField := inField; inField := !unicode.IsSpace(rune); if inField && !wasInField { n++; } } http://codereview.appspot.com/170046/diff/2004/3002#newcode155 src/pkg/strings/strings.go:155: for i := 0; i < len(s) && na < n; { using i <= len(s) means you can drop the if after the for loop. also you should know the exact number of fields and thus drop && na < n. http://codereview.appspot.com/170046/diff/2004/3003 File src/pkg/strings/strings_test.go (right): http://codereview.appspot.com/170046/diff/2004/3003#newcode165 src/pkg/strings/strings_test.go:165: SplitSpaceTest{faces, []string{faces}}, test empty string, leading spaces, only spaces.
Sign in to reply to this message.
Hello rsc (cc: golang-dev@googlegroups.com), I'd like you to review the following change.
Sign in to reply to this message.
http://codereview.appspot.com/170046/diff/2004/3002 File src/pkg/strings/strings.go (right): http://codereview.appspot.com/170046/diff/2004/3002#newcode155 src/pkg/strings/strings.go:155: for i := 0; i < len(s) && na < n; { On 2009/12/11 20:40:33, rsc wrote: > using i <= len(s) means you can > drop the if after the for loop. > also you should know the exact number of > fields and thus drop && na < n. not addressed: i <= len(s). http://codereview.appspot.com/170046/diff/2004/3003 File src/pkg/strings/strings_test.go (right): http://codereview.appspot.com/170046/diff/2004/3003#newcode165 src/pkg/strings/strings_test.go:165: SplitSpaceTest{faces, []string{faces}}, On 2009/12/11 20:40:33, rsc wrote: > test empty string, leading spaces, only spaces. not addressed http://codereview.appspot.com/170046/diff/3008/3009 File src/pkg/strings/strings.go (right): http://codereview.appspot.com/170046/diff/3008/3009#newcode137 src/pkg/strings/strings.go:137: // SplitSpace splits the string s around each instance of one or more consecutive whitespace characters, returning an array of substrings of s. i really think this should be Fields, so that you can return an empty slice for an empty string. http://codereview.appspot.com/170046/diff/3008/3009#newcode139 src/pkg/strings/strings.go:139: n := 1; then this is n = 0. http://codereview.appspot.com/170046/diff/3008/3009#newcode141 src/pkg/strings/strings.go:141: wasInField := false; then this goes away http://codereview.appspot.com/170046/diff/3008/3009#newcode149 src/pkg/strings/strings.go:149: if !inField && wasInField { then this goes away http://codereview.appspot.com/170046/diff/3008/3009#newcode173 src/pkg/strings/strings.go:173: if fieldStart != -1 { then this goes away
Sign in to reply to this message.
Hello rsc (cc: golang-dev@googlegroups.com), I'd like you to review the following change.
Sign in to reply to this message.
http://codereview.appspot.com/170046/diff/2009/2012 File src/pkg/strings/strings.go (right): http://codereview.appspot.com/170046/diff/2009/2012#newcode139 src/pkg/strings/strings.go:139: func SplitFields(s string) []string { i would call this Fields. strings.Fields() seems like a fine name. yes, it's a splitter but the shorter name is good enough http://codereview.appspot.com/170046/diff/2009/2012#newcode153 src/pkg/strings/strings.go:153: for i := 0; i <= len(s); { did you try writing this using a range over the string? it might not be better but it might.
Sign in to reply to this message.
I think "Fields" will be misleading to some extent. This is only for whitespace-separated strings, right? So you can't, for instance, use it for CSV files, which certainly consist of "fields". I was writing a more general splitter that you could pass the set of characters to split on to, that one could be called "fields" a little more appropriately. Sadly I got distracted, but I'd recommend either adding a parameter with all runes to split on, or implementing a function that does that and then calling it from SplitSpace(). Actually, come to think of it, I'd probably call it SplitOnSpaces() or something.
Sign in to reply to this message.
Hello rsc, r, phf (cc: golang-dev@googlegroups.com), I'd like you to review the following change.
Sign in to reply to this message.
I've made the changes requested by Rob. I don't feel strongly about the name either way, but the functionality may be an issue. I had written an earlier version that accepted a string as an argument and split across one or more consecutive instances of the string, then I thought that the string could be treated as a set of characters to use when splitting (with maybe a flag to indicate preferences about repetition) and things started sounding complicated. How to distill all this functionality into a couple of functions without copying python's split nasties (the function acts one way when presented with arguments, but completely differently when called without)? The current version is simple enough and does the job well enough that I am fine with it -- i'm using it to parse tabulated streams of text similar to what tabwriter outputs.
Sign in to reply to this message.
On Mon, Dec 14, 2009 at 11:46 AM, andrey mirtchovski <mirtchovski@gmail.com> wrote: > split across one or more consecutive instances of the string, then I > thought that the string could be treated as a set of characters to use > when splitting That's what I had too. I also translated that string into a map (hoping to gain speed for large sets) but that's about where I got distracted. :-/ I think your function is good the way it is, and it doesn't preclude adding the more general one latter with yours being a wrapper (or just a more efficient version).
Sign in to reply to this message.
This looks fine but I still want to define and test the behavior when there is nothing but spaces in the input. Fields is a good name. The doc comment probably needs to be more different from Split, because (I hope) Fields will return an empty list for an empty string but Split returns a 1-element list. // Fields returns a slice containing the sequences of // non-space characters in s. http://codereview.appspot.com/170046/diff/2004/3003 File src/pkg/strings/strings_test.go (right): http://codereview.appspot.com/170046/diff/2004/3003#newcode165 src/pkg/strings/strings_test.go:165: SplitSpaceTest{faces, []string{faces}}, On 2009/12/11 21:40:07, rsc wrote: > On 2009/12/11 20:40:33, rsc wrote: > > test empty string, leading spaces, only spaces. > > not addressed still not addressed
Sign in to reply to this message.
On 15/12/2009, at 2:58 AM, peter.hans.froehlich@gmail.com wrote: > I think "Fields" will be misleading to some extent. This is only for > whitespace-separated strings, right? So you can't, for instance, use > it > for CSV files, which certainly consist of "fields". I was writing a > more > general splitter that you could pass the set of characters to split on > to, that one could be called "fields" a little more appropriately. > Sadly > I got distracted, but I'd recommend either adding a parameter with all > runes to split on, or implementing a function that does that and then > calling it from SplitSpace(). Actually, come to think of it, I'd > probably call it SplitOnSpaces() or something. > > http://codereview.appspot.com/170046 I'd rather see something like Plan 9's tokenize. A more general interface like getfields's would be useful too, but in practice tokenize was almost always the solution. http://plan9.bell-labs.com/magic/man2html/2/getfields -rob
Sign in to reply to this message.
> This looks fine but I still want to define and test > the behavior when there is nothing but spaces > in the input. I thought I adressed this. The latest revision shows the following test for 'all spaces' string, leading and trailing spaces: http://codereview.appspot.com/170046/diff2/2009:3030/2021 FieldsTest{"\u2000\u2001\u2002", []string{}}, FieldsTest{"\n™\t™", []string{"™", "™"}}, FieldsTest{"™\n™\n™\n", []string{"™", "™", "™"}}, Am I not submitting something correctly? (The space characters in the case of an all-whitespace string are a bit unorthodox, but this also checks handling of unicode spaces.) I've changed the comment to say that an empty list is returned for an all-spaces string/byte array (in the case of bytes.go) but will hold off from submitting it in case you want the tests above done differently.
Sign in to reply to this message.
> I'd rather see something like Plan 9's tokenize. A more general interface > like getfields's would be useful too, but in practice tokenize was almost > always the solution. > > http://plan9.bell-labs.com/magic/man2html/2/getfields > > -rob > I realized that we're moving closer to tokenize too. I'll get on it. andrey
Sign in to reply to this message.
Please add some simpler tests too: "" " " " \t " " abc " It's good to test Unicode too, but there's no harm in tests we can read. ;-) The specific test I was looking for and not finding was "". Thanks. Russ
Sign in to reply to this message.
Fields as currently defined handles most inputs well. Moving to tokenize forces a particular quoting syntax, and I don't think Plan 9's quoting syntax makes any sense here. It might make sense to use Go's quoting syntax. If we do add something more like tokenize I think it should be a separate function, like QuotedFields. I'd like to keep Fields as it is, with no interpretation of quotes. Russ
Sign in to reply to this message.
On 15/12/2009, at 7:03 AM, Russ Cox wrote: > Fields as currently defined handles most inputs well. > Moving to tokenize forces a particular quoting syntax, > and I don't think Plan 9's quoting syntax makes > any sense here. It might make sense to use Go's > quoting syntax. > > If we do add something more like tokenize I think it should > be a separate function, like QuotedFields. > > I'd like to keep Fields as it is, with no interpretation of quotes. Sure. I meant the quote-free form anyway. I'd forgotten about rc's influence on tokenization. -rob
Sign in to reply to this message.
Hello rsc, r, phf (cc: golang-dev@googlegroups.com), I'd like you to review the following change.
Sign in to reply to this message.
By the way, the change description is supposed to be what the change log entry will be, not just what happened since the last upload. Every time it changes, the subject of the mails being sent out changes, which breaks threading in gmail and other mail readers. Instead of changing the description and running hg mail, you can just reply to the review messages and say something like PTAL Added a few more tests suggested by rsc. Please do run hg change one more time to edit the description to be more like a typical change log description (see http://golang.org/doc/contribute.html), something like bytes, strings: add new function Fields Thanks.
Sign in to reply to this message.
> Instead of changing the description and running > hg mail, you can just reply to the review messages > and say something like > > PTAL > > Added a few more tests suggested by rsc. (And, since you're not running hg mail, you'd run "hg upload" to get the new patch uploaded.) Russ
Sign in to reply to this message.
> Please do run hg change one more time to edit > the description to be more like a typical change > log description (see http://golang.org/doc/contribute.html), > something like > > bytes, strings: add new function Fields > > Thanks. That is done now. I should've paid slightly more attention to all the other messages flying through golang-dev to get a hand on the conventions, sorry.
Sign in to reply to this message.
http://codereview.appspot.com/170046/diff/8001/8002 File src/pkg/bytes/bytes.go (right): http://codereview.appspot.com/170046/diff/8001/8002#newcode166 src/pkg/bytes/bytes.go:166: // Fields splits the array s around each instance of one or more consecutive whitespace s/whitespace/white space/ http://codereview.appspot.com/170046/diff/8001/8002#newcode167 src/pkg/bytes/bytes.go:167: // characters, returning an array of subarrays of s or an empty list if s contains only spaces. s/array/slice/g s/spaces/white space/ http://codereview.appspot.com/170046/diff/8001/8003 File src/pkg/bytes/bytes_test.go (right): http://codereview.appspot.com/170046/diff/8001/8003#newcode267 src/pkg/bytes/bytes_test.go:267: FieldsTest{" \t ", []string{}}, put the trivial cases first http://codereview.appspot.com/170046/diff/8001/8004 File src/pkg/strings/strings.go (right): http://codereview.appspot.com/170046/diff/8001/8004#newcode138 src/pkg/strings/strings.go:138: // characters, returning an array of substrings of s or an empty list if s contains only spaces. s/whitespace/white space/ s/spaces/white space/ http://codereview.appspot.com/170046/diff/8001/8005 File src/pkg/strings/strings_test.go (right): http://codereview.appspot.com/170046/diff/8001/8005#newcode194 src/pkg/strings/strings_test.go:194: FieldsTest{" abc ", []string{"abc"}}, put trivial cases first.
Sign in to reply to this message.
PTAL Incorporated latest comments, no other changes.
Sign in to reply to this message.
the code still seems too complicated to me but logically ok. i'll let rsc decide how to proceed.
Sign in to reply to this message.
LGTM thanks You will probably want to run cd $GOROOT/src/pkg hg revert bytes strings hg sync to update; the revert will keep from getting a merge conflict when you sync, because of the semicolon stripping.
Sign in to reply to this message.
*** Submitted as http://code.google.com/p/go/source/detail?r=251d2be5c71e *** bytes, strings: add new function Fields R=rsc, r, phf CC=golang-dev http://codereview.appspot.com/170046 Committer: Russ Cox <rsc@golang.org>
Sign in to reply to this message.
|