|
|
Descriptionencoding/xml: Allow comments in DOCTYPE
I mentioned on the golang-nuts mailing list that comments in directives like DOCTYPE can break XML parsing: https://groups.google.com/forum/?fromgroups#!topic/golang-nuts/DgThEhl8fv0
That thread also explains how to recreate the problem I encountered.
This patch hopefully fixes that problem, as well as adds a test for making sure directives with comments in them get parsed correctly.
Patch Set 1 #Patch Set 2 : diff -r 7abccc974354 https://code.google.com/p/go #Patch Set 3 : diff -r d8bd45866999 https://code.google.com/p/go #Patch Set 4 : diff -r 63d9b9b595c5 https://code.google.com/p/go #
Total comments: 8
Patch Set 5 : diff -r 422b4a8160bc https://code.google.com/p/go #Patch Set 6 : diff -r 422b4a8160bc https://code.google.com/p/go #Patch Set 7 : diff -r 422b4a8160bc https://code.google.com/p/go #MessagesTotal messages: 34
Hello golang-dev@googlegroups.com, I'd like you to review this change to https://code.google.com/p/go
Sign in to reply to this message.
I'll leave reviewing this change to someone more familiar with the xml package, but for now could you please update the change description to be better in line with our guidelines? http://golang.org/doc/contribute.html#tmp_65
Sign in to reply to this message.
On 2012/07/02 18:52:08, adg wrote: > I'll leave reviewing this change to someone more familiar with the xml package, > but for now could you please update the change description to be better in line > with our guidelines? > > http://golang.org/doc/contribute.html#tmp_65 Updated, sorry I missed that.
Sign in to reply to this message.
On 3 July 2012 04:52, <adg@golang.org> wrote: > I'll leave reviewing this change to someone more familiar with the xml > package, I think that rsc is the most familiar with package xml, but he's on vaction. Shawn, can you ping this mail thread again in a couple of weeks?
Sign in to reply to this message.
On 2012/07/02 23:00:01, nigeltao wrote: > On 3 July 2012 04:52, <mailto:adg@golang.org> wrote: > > I'll leave reviewing this change to someone more familiar with the xml > > package, > > I think that rsc is the most familiar with package xml, but he's on vaction. > > Shawn, can you ping this mail thread again in a couple of weeks? I pinged Gustavo Niemeyer about this a few days ago, since I found his name in this particular block of code using hg annotate -u, and he agreed to give me some feedback within the next couple of days, if that's all right. Otherwise, sure, I'll open up the thread again in a couple of weeks.
Sign in to reply to this message.
If Gustavo wants to take a look, that's also fine. Andrew On 2 July 2012 16:14, <shawn.p.smith@gmail.com> wrote: > On 2012/07/02 23:00:01, nigeltao wrote: > > On 3 July 2012 04:52, <mailto:adg@golang.org> wrote: >> > I'll leave reviewing this change to someone more familiar with the >> > xml > >> > package, >> > > I think that rsc is the most familiar with package xml, but he's on >> > vaction. > > Shawn, can you ping this mail thread again in a couple of weeks? >> > > I pinged Gustavo Niemeyer about this a few days ago, since I found his > name in this particular block of code using hg annotate -u, and he > agreed to give me some feedback within the next couple of days, if > that's all right. > > Otherwise, sure, I'll open up the thread again in a couple of weeks. > > http://codereview.appspot.com/**6330061/<http://codereview.appspot.com/6330061/> >
Sign in to reply to this message.
Hello golang-dev@googlegroups.com, adg@golang.org, nigeltao@golang.org (cc: golang-dev@googlegroups.com), Please take another look.
Sign in to reply to this message.
On 2012/07/03 00:36:03, adg wrote: > If Gustavo wants to take a look, that's also fine. > > Andrew > > > On 2 July 2012 16:14, <mailto:shawn.p.smith@gmail.com> wrote: > > > On 2012/07/02 23:00:01, nigeltao wrote: > > > > On 3 July 2012 04:52, <mailto:adg@golang.org> wrote: > >> > I'll leave reviewing this change to someone more familiar with the > >> > > xml > > > >> > package, > >> > > > > I think that rsc is the most familiar with package xml, but he's on > >> > > vaction. > > > > Shawn, can you ping this mail thread again in a couple of weeks? > >> > > > > I pinged Gustavo Niemeyer about this a few days ago, since I found his > > name in this particular block of code using hg annotate -u, and he > > agreed to give me some feedback within the next couple of days, if > > that's all right. > > > > Otherwise, sure, I'll open up the thread again in a couple of weeks. > > > > > http://codereview.appspot.com/**6330061/%3Chttp://codereview.appspot.com/6330...> > > Removed unnecessary "var b byte" declaration from line 617 that I had added previously.
Sign in to reply to this message.
On 2012/07/11 07:16:01, shawnps wrote: > On 2012/07/03 00:36:03, adg wrote: > > If Gustavo wants to take a look, that's also fine. > > > > Andrew > > > > > > On 2 July 2012 16:14, <mailto:shawn.p.smith@gmail.com> wrote: > > > > > On 2012/07/02 23:00:01, nigeltao wrote: > > > > > > On 3 July 2012 04:52, <mailto:adg@golang.org> wrote: > > >> > I'll leave reviewing this change to someone more familiar with the > > >> > > > xml > > > > > >> > package, > > >> > > > > > > I think that rsc is the most familiar with package xml, but he's on > > >> > > > vaction. > > > > > > Shawn, can you ping this mail thread again in a couple of weeks? > > >> > > > > > > I pinged Gustavo Niemeyer about this a few days ago, since I found his > > > name in this particular block of code using hg annotate -u, and he > > > agreed to give me some feedback within the next couple of days, if > > > that's all right. > > > > > > Otherwise, sure, I'll open up the thread again in a couple of weeks. > > > > > > > > > http://codereview.appspot.com/**6330061/%253Chttp://codereview.appspot.com/63...> > > > > > Removed unnecessary "var b byte" declaration from line 617 that I had added > previously. By the way, sorry for a potential mercurial newbie mistake, but it's a bit alarming to me that there are lines in the diff that I never added or removed...I've been running "hg sync" but haven't gotten any conflicts.
Sign in to reply to this message.
On 2012/07/11 07:15:21, shawnps wrote: > Hello mailto:golang-dev@googlegroups.com, mailto:adg@golang.org, mailto:nigeltao@golang.org (cc: > mailto:golang-dev@googlegroups.com), > > Please take another look. rsc is still on vacation.
Sign in to reply to this message.
http://codereview.appspot.com/6330061/diff/11002/src/pkg/encoding/xml/xml.go File src/pkg/encoding/xml/xml.go (left): http://codereview.appspot.com/6330061/diff/11002/src/pkg/encoding/xml/xml.go#... src/pkg/encoding/xml/xml.go:853: var semicolon bool This does not appear to have anything to do with the CL's description. I wonder if you overwrote someone else's update? Please revert these changes. http://codereview.appspot.com/6330061/diff/11002/src/pkg/encoding/xml/xml.go File src/pkg/encoding/xml/xml.go (right): http://codereview.appspot.com/6330061/diff/11002/src/pkg/encoding/xml/xml.go#... src/pkg/encoding/xml/xml.go:587: d.buf.WriteByte(b) HandleB: d.buf.WriteByte(b) http://codereview.appspot.com/6330061/diff/11002/src/pkg/encoding/xml/xml.go#... src/pkg/encoding/xml/xml.go:605: d.buf.WriteByte(b) This code appears to be mishandling quotes and > and < that are encountered while trying to peek ahead to find the <!--. I don't know what that would mean, but it's something the code used to allow and no longer does. case b == '<' && inquote == 0: // Look for <!-- to begin comment. s := "!--" for i := 0; i < len(s); i++ { if b, ok = d.mustgetc(); !ok { return nil, d.err } if b != s[i] { for j := 0; j < i; j++ { d.buf.WriteByte(s[j]) } depth++ goto HandleB } } // Remove < that was written above. d.buf.Truncate(d.buf.Len() - 1) // Look for terminator. var b0, b1 byte ... (no depth++, no need for continue at end) http://codereview.appspot.com/6330061/diff/11002/src/pkg/encoding/xml/xml_tes... File src/pkg/encoding/xml/xml_test.go (left): http://codereview.appspot.com/6330061/diff/11002/src/pkg/encoding/xml/xml_tes... src/pkg/encoding/xml/xml_test.go:160: } More apparent overwriting of someone else's changes. http://codereview.appspot.com/6330061/diff/11002/src/pkg/encoding/xml/xml_tes... File src/pkg/encoding/xml/xml_test.go (right): http://codereview.appspot.com/6330061/diff/11002/src/pkg/encoding/xml/xml_tes... src/pkg/encoding/xml/xml_test.go:585: <!DOCTYPE [<!ENTITY go "Golang"><!-- a comment-->]> <!DOCTYPE <!-> <!> <!----> <!-->--> <!--->--> [<!ENTITY...
Sign in to reply to this message.
http://codereview.appspot.com/6330061/diff/11002/src/pkg/encoding/xml/xml.go File src/pkg/encoding/xml/xml.go (left): http://codereview.appspot.com/6330061/diff/11002/src/pkg/encoding/xml/xml.go#... src/pkg/encoding/xml/xml.go:853: var semicolon bool On 2012/07/29 22:31:41, rsc wrote: > This does not appear to have anything to do with the CL's description. I wonder > if you overwrote someone else's update? Please revert these changes. Thanks for taking a look. This is probably due to a misunderstanding I'm experiencing regarding Mercurial. None of this shows up locally when I do "hg diff", but for some reason it is in here. If I save a copy of my modified xml.go, then do "hg revert xml.go", and diff the two, the only lines that show up in the diff are the lines that I added. Is it reasonable for me to do "hg mail" after that, and expect it to work? Or might this have something to do with the fact that there are multiple Patch Sets, and I should revert all of them and start over?
Sign in to reply to this message.
On 2012/07/30 01:30:29, shawnps wrote: > http://codereview.appspot.com/6330061/diff/11002/src/pkg/encoding/xml/xml.go > File src/pkg/encoding/xml/xml.go (left): > > http://codereview.appspot.com/6330061/diff/11002/src/pkg/encoding/xml/xml.go#... > src/pkg/encoding/xml/xml.go:853: var semicolon bool > On 2012/07/29 22:31:41, rsc wrote: > > This does not appear to have anything to do with the CL's description. I > wonder > > if you overwrote someone else's update? Please revert these changes. > > Thanks for taking a look. > > This is probably due to a misunderstanding I'm experiencing regarding Mercurial. > > None of this shows up locally when I do "hg diff", but for some reason it is in > here. > > If I save a copy of my modified xml.go, then do "hg revert xml.go", and diff the > two, the only lines that show up in the diff are the lines that I added. > > Is it reasonable for me to do "hg mail" after that, and expect it to work? Or > might this have something to do with the fact that there are multiple Patch > Sets, and I should revert all of them and start over? Sorry, what I meant to say was, is it reasonable for me to do "hg mail" after moving my saved copy of xml.go into go/src/pkg/encoding/xml and seeing that "hg diff" only shows my changes?
Sign in to reply to this message.
> This is probably due to a misunderstanding I'm experiencing regarding > Mercurial. > > None of this shows up locally when I do "hg diff", but for some reason > it is in here. What does 'hg sum' say? It sounds like you are synced to an earlier point in the tree, or perhaps you are sitting on a release branch. But I don't understand why those would result in what we're seeing. Russ
Sign in to reply to this message.
> Sorry, what I meant to say was, is it reasonable for me to do "hg mail" > after moving my saved copy of xml.go into go/src/pkg/encoding/xml and > seeing that "hg diff" only shows my changes? I would suggest using hg upload and then view the changes on the web to see what they look like. Once they look good, hg mail. Russ
Sign in to reply to this message.
Here is the output of 'hg sum' after copying my modified xml.go and xml_test.go back into the encoding dir: parent: 13230:5e806355a9e1 go1.0.2 release [release-branch.go1] syscall: fix windows build branch: release-branch.go1 commit: 2 modified update: (current) If I hg revert xml.go and xml_test.go, the output for 'hg sum' is the same except for this line: commit: (clean) On Sun, Jul 29, 2012 at 6:32 PM, Russ Cox <rsc@golang.org> wrote: >> This is probably due to a misunderstanding I'm experiencing regarding >> Mercurial. >> >> None of this shows up locally when I do "hg diff", but for some reason >> it is in here. > > What does 'hg sum' say? It sounds like you are synced to an earlier > point in the tree, or perhaps you are sitting on a release branch. But > I don't understand why those would result in what we're seeing. > > Russ
Sign in to reply to this message.
Ah, should I have done "hg update weekly" instead of "hg update release"? On Sun, Jul 29, 2012 at 6:37 PM, Shawn Smith <shawn.p.smith@gmail.com> wrote: > Here is the output of 'hg sum' after copying my modified xml.go and > xml_test.go back into the encoding dir: > > parent: 13230:5e806355a9e1 go1.0.2 release > [release-branch.go1] syscall: fix windows build > branch: release-branch.go1 > commit: 2 modified > update: (current) > > If I hg revert xml.go and xml_test.go, the output for 'hg sum' is the > same except for this line: > > commit: (clean) > > On Sun, Jul 29, 2012 at 6:32 PM, Russ Cox <rsc@golang.org> wrote: >>> This is probably due to a misunderstanding I'm experiencing regarding >>> Mercurial. >>> >>> None of this shows up locally when I do "hg diff", but for some reason >>> it is in here. >> >> What does 'hg sum' say? It sounds like you are synced to an earlier >> point in the tree, or perhaps you are sitting on a release branch. But >> I don't understand why those would result in what we're seeing. >> >> Russ
Sign in to reply to this message.
Please copy your xml.go changes somewhere, and then 'hg update default' to switch to the development branch. Then reapply your changes. You'll see the diffs now. Russ
Sign in to reply to this message.
I see them now, yes. Should I delete the old patch sets, reapply my changes, and then resubmit? On Sun, Jul 29, 2012 at 6:38 PM, Russ Cox <rsc@golang.org> wrote: > Please copy your xml.go changes somewhere, and then 'hg update > default' to switch to the development branch. Then reapply your > changes. You'll see the diffs now. > > Russ
Sign in to reply to this message.
On Sun, Jul 29, 2012 at 9:45 PM, Shawn Smith <shawn.p.smith@gmail.com> wrote: > I see them now, yes. Should I delete the old patch sets, reapply my > changes, and then resubmit? You don't need to worry about the old patch sets. They're good for history since they have the old comments on them. Just adjust the files until hg diff is happy and then hg mail again. Russ
Sign in to reply to this message.
http://codereview.appspot.com/6330061/diff/11002/src/pkg/encoding/xml/xml.go File src/pkg/encoding/xml/xml.go (right): http://codereview.appspot.com/6330061/diff/11002/src/pkg/encoding/xml/xml.go#... src/pkg/encoding/xml/xml.go:587: d.buf.WriteByte(b) On 2012/07/29 22:31:41, rsc wrote: > HandleB: > d.buf.WriteByte(b) Sorry, I'm a bit confused by this comment - could you please explain?
Sign in to reply to this message.
http://codereview.appspot.com/6330061/diff/11002/src/pkg/encoding/xml/xml.go File src/pkg/encoding/xml/xml.go (right): http://codereview.appspot.com/6330061/diff/11002/src/pkg/encoding/xml/xml.go#... src/pkg/encoding/xml/xml.go:587: d.buf.WriteByte(b) On 2012/07/29 22:31:41, rsc wrote: > HandleB: > d.buf.WriteByte(b) Ah, I see, it's a label for the goto.
Sign in to reply to this message.
On 2012/07/31 04:40:07, shawnps wrote: > http://codereview.appspot.com/6330061/diff/11002/src/pkg/encoding/xml/xml.go > File src/pkg/encoding/xml/xml.go (right): > > http://codereview.appspot.com/6330061/diff/11002/src/pkg/encoding/xml/xml.go#... > src/pkg/encoding/xml/xml.go:587: d.buf.WriteByte(b) > On 2012/07/29 22:31:41, rsc wrote: > > HandleB: > > d.buf.WriteByte(b) > > Ah, I see, it's a label for the goto. Indeed, sorry for the delayed response.
Sign in to reply to this message.
Hello rsc@golang.org, n13m3y3r@gmail.com (cc: golang-dev@googlegroups.com), Please take another look.
Sign in to reply to this message.
On 2012/08/04 08:12:01, shawnps wrote: > Hello mailto:rsc@golang.org, mailto:n13m3y3r@gmail.com (cc: mailto:golang-dev@googlegroups.com), > > Please take another look. I'm not sure what I'm doing wrong, but I made changes to xml.go as well and they didn't show up in the patch set. This is the output from hg diff: diff -r 422b4a8160bc src/pkg/encoding/xml/xml.go --- a/src/pkg/encoding/xml/xml.go Fri Aug 03 18:08:43 2012 -0700 +++ b/src/pkg/encoding/xml/xml.go Sat Aug 04 01:15:50 2012 -0700 @@ -584,6 +584,7 @@ if inquote == 0 && b == '>' && depth == 0 { break } + HandleB: d.buf.WriteByte(b) switch { case b == inquote: @@ -599,7 +600,35 @@ depth-- case b == '<' && inquote == 0: - depth++ + // Look for <!-- to begin comment. + s := "!--" + for i := 0; i < len(s); i++ { + if b, ok = d.mustgetc(); !ok { + return nil, d.err + } + if b != s[i] { + for j := 0; j < i; j++ { + d.buf.WriteByte(s[j]) + } + depth++ + goto HandleB + } + } + + // Remove < that was written above. + d.buf.Truncate(d.buf.Len() - 1) + + // Look for terminator. + var b0, b1 byte + for { + if b, ok = d.mustgetc(); !ok { + return nil, d.err + } + if b0 == '-' && b1 == '-' && b == '>' { + break + } + b0, b1 = b1, b + } } } return Directive(d.buf.Bytes()), nil diff -r 422b4a8160bc src/pkg/encoding/xml/xml_test.go --- a/src/pkg/encoding/xml/xml_test.go Fri Aug 03 18:08:43 2012 -0700 +++ b/src/pkg/encoding/xml/xml_test.go Sat Aug 04 01:15:50 2012 -0700 @@ -621,3 +621,36 @@ } } } + +// Ensure that directives with comments include the complete +// text of any nested directives. + +var directivesWithCommentsInput = ` +<!DOCTYPE [<!-- a comment --><!ENTITY rdf "http://www.w3.org/1999/02/22-rdf-syntax-ns#">]> +<!DOCTYPE [<!ENTITY go "Golang"><!-- a comment-->]> +<!DOCTYPE <!-> <!> <!----> <!-->--> <!--->--> [<!ENTITY go "Golang"><!-- a comment-->]> +` + +var directivesWithCommentsTokens = []Token{ + CharData("\n"), + Directive(`DOCTYPE [<!ENTITY rdf "http://www.w3.org/1999/02/22-rdf-syntax-ns#">]`), + CharData("\n"), + Directive(`DOCTYPE [<!ENTITY go "Golang">]`), + CharData("\n"), + Directive(`DOCTYPE <!-> <!> [<!ENTITY go "Golang">]`), + CharData("\n"), +} + +func TestDirectivesWithComments(t *testing.T) { + d := NewDecoder(strings.NewReader(directivesWithCommentsInput)) + + for i, want := range directivesWithCommentsTokens { + have, err := d.Token() + if err != nil { + t.Fatalf("token %d: unexpected error: %s", i, err) + } + if !reflect.DeepEqual(have, want) { + t.Errorf("token %d = %#v want %#v", i, have, want) + } + } +} Should I try again..?
Sign in to reply to this message.
Hello rsc@golang.org, n13m3y3r@gmail.com (cc: golang-dev@googlegroups.com), Please take another look.
Sign in to reply to this message.
On 2012/08/04 08:20:20, shawnps wrote: > Hello mailto:rsc@golang.org, mailto:n13m3y3r@gmail.com (cc: mailto:golang-dev@googlegroups.com), > > Please take another look. Thought I'd try it again. Guessing I'm doing something wrong, sorry about that.
Sign in to reply to this message.
When you ran hg revert xml.go it removed it from this CL. Now that it's modified again, you need to reassign the modified file to this CL, either by running 'hg change 6330061' and editing the file list in your editor, or by running 'hg file 6330061 xml.go'.
Sign in to reply to this message.
Hello rsc@golang.org, n13m3y3r@gmail.com (cc: golang-dev@googlegroups.com), Please take another look.
Sign in to reply to this message.
LGTM
Sign in to reply to this message.
Sorry for the delay here. This looks good but please complete a CLA as described at http://golang.org/doc/contribute.html#copyright. Thanks. Russ
Sign in to reply to this message.
On 2012/08/30 14:39:34, rsc wrote: > Sorry for the delay here. This looks good but please complete a CLA as > described at http://golang.org/doc/contribute.html#copyright. > > Thanks. > Russ No problem, thanks for taking another look. I submitted an individual CLA a few minutes ago. Thanks, Shawn
Sign in to reply to this message.
LGTM Will submit shortly.
Sign in to reply to this message.
*** Submitted as http://code.google.com/p/go/source/detail?r=875f1254c382 *** encoding/xml: parse comments in DOCTYPE R=rsc, n13m3y3r CC=golang-dev http://codereview.appspot.com/6330061 Committer: Russ Cox <rsc@golang.org>
Sign in to reply to this message.
|