http://codereview.appspot.com/5218041/diff/4006/src/pkg/html/render.go File src/pkg/html/render.go (right): http://codereview.appspot.com/5218041/diff/4006/src/pkg/html/render.go#newcode76 src/pkg/html/render.go:76: // TODO: figure out what to do with <script> ...
13 years, 10 months ago
(2011-10-06 12:16:49 UTC)
#2
http://codereview.appspot.com/5218041/diff/4006/src/pkg/html/render.go
File src/pkg/html/render.go (right):
http://codereview.appspot.com/5218041/diff/4006/src/pkg/html/render.go#newcode76
src/pkg/html/render.go:76: // TODO: figure out what to do with <script>
elements.
For script, style, no... elements I would
1. render the <xxx> opening tag as normal.
2. maybe error out if any child is not a text node.
3. escape the text nodes.
4. maybe error out if `</xxx` is a case-insensitive substring of the
concatenation of the children's data.
5. maybe error out if the concatenation of the children's data contains
unbalanced an escaping text span start ("<!--") not followed by an end ("-->").
6. render the closing tag as normal.
http://codereview.appspot.com/5218041/diff/4006/src/pkg/html/render.go#newcode82
src/pkg/html/render.go:82: if err := escape(w, n.Data); err != nil {
The tag name states don't allow entities. It may be safest, but there could be
problems if someone decided to use a custom tag name that contained a non-latin
letter and escape were changed to use numeric entities for non-latin codepoints.
http://codereview.appspot.com/5218041/diff/4006/src/pkg/html/render.go#newcod...
src/pkg/html/render.go:103: _, err := w.WriteString("/>")
<blockquote />foo
displays foo indented in Chrome.
<script src="bogus.js"/>Hello, World!
displays a blank page
x<select />y<select><option>foo</option></select>
displays x and a select with no options because the HTML5 tree building algo
treats the second <select> as a close tag for the first select.
13 years, 10 months ago
(2011-10-06 12:23:11 UTC)
#3
http://codereview.appspot.com/5218041/diff/4006/src/pkg/html/render.go
File src/pkg/html/render.go (right):
http://codereview.appspot.com/5218041/diff/4006/src/pkg/html/render.go#newcode82
src/pkg/html/render.go:82: if err := escape(w, n.Data); err != nil {
On 2011/10/06 12:16:49, MikeSamuel wrote:
> The tag name states don't allow entities. It may be safest, but there could
be
> problems if someone decided to use a custom tag name that contained a
non-latin
> letter and escape were changed to use numeric entities for non-latin
codepoints.
IIUC, you're saying to use "w.WriteString(n.Data)", right? Or is it more subtle
than that?
http://codereview.appspot.com/5218041/diff/4006/src/pkg/html/render.go#newcod...
src/pkg/html/render.go:103: _, err := w.WriteString("/>")
On 2011/10/06 12:16:49, MikeSamuel wrote:
> <blockquote />foo
> displays foo indented in Chrome.
>
> <script src="bogus.js"/>Hello, World!
> displays a blank page
>
> x<select />y<select><option>foo</option></select>
> displays x and a select with no options because the HTML5 tree building algo
> treats the second <select> as a close tag for the first select.
Do you know of a blacklist of tags that cannot be self-closing, then? Or perhaps
a whitelist of tags that should be (like <hr/>)?
13 years, 10 months ago
(2011-10-06 12:51:41 UTC)
#4
http://codereview.appspot.com/5218041/diff/4006/src/pkg/html/render.go
File src/pkg/html/render.go (right):
http://codereview.appspot.com/5218041/diff/4006/src/pkg/html/render.go#newcode82
src/pkg/html/render.go:82: if err := escape(w, n.Data); err != nil {
On 2011/10/06 12:23:11, nigeltao wrote:
> On 2011/10/06 12:16:49, MikeSamuel wrote:
> > The tag name states don't allow entities. It may be safest, but there could
> be
> > problems if someone decided to use a custom tag name that contained a
> non-latin
> > letter and escape were changed to use numeric entities for non-latin
> codepoints.
>
> IIUC, you're saying to use "w.WriteString(n.Data)", right? Or is it more
subtle
> than that?
I'd just do w.WriteString(n.Data) here, for attribute names, and closing tag
names, but I don't feel very strongly about it if you want to leave it in as a
safety measure.
http://codereview.appspot.com/5218041/diff/4006/src/pkg/html/render.go#newcod...
src/pkg/html/render.go:103: _, err := w.WriteString("/>")
On 2011/10/06 12:23:11, nigeltao wrote:
> On 2011/10/06 12:16:49, MikeSamuel wrote:
> > <blockquote />foo
> > displays foo indented in Chrome.
> >
> > <script src="bogus.js"/>Hello, World!
> > displays a blank page
> >
> > x<select />y<select><option>foo</option></select>
> > displays x and a select with no options because the HTML5 tree building algo
> > treats the second <select> as a close tag for the first select.
>
> Do you know of a blacklist of tags that cannot be self-closing, then? Or
perhaps
> a whitelist of tags that should be (like <hr/>)?
I do not know of a blacklist. It's possible to derive one from the tokenization
algo.
But 8.1.2.1 ( http://www.w3.org/TR/html5/syntax.html#start-tags ) says
"""
6. Then, if the element is one of the void elements, or if the element is a
foreign element, then there may be a single U+002F SOLIDUS character (/). This
character has no effect on void elements, but on foreign elements it marks the
start tag as self-closing.
"""
So I'm not sure the / does anything for non-foreign elements.
For a whitelist, http://www.w3.org/TR/html5/syntax.html#void-elements
area, base, br, col, command, embed, hr, img, input, keygen, link, meta, param,
source, track, wbr
which is all the element names from
http://www.w3.org/TR/html5/index.html#elements-1 where the children column is
"empty".
PTAL. As for naming, I see three options: Render, WriteTo, and Marshal. I'm uncomfortable with ...
13 years, 10 months ago
(2011-10-07 05:49:11 UTC)
#5
PTAL.
As for naming, I see three options: Render, WriteTo, and Marshal.
I'm uncomfortable with WriteTo because a Node is not an io.Reader or an
io.Writer the way a bytes.Buffer is, and if it was WriteTo, it would be odd to
not then rename Parse to ReadFrom, but that just sounds weird.
Package xml has Marshal and Unmarshal, although Unmarshal takes a destination
argument, but html.Parse returns a new value. xml.Marshal is also a function,
not a method. Maybe html.Render should be a function.
This is all just thinking out loud.
http://codereview.appspot.com/5218041/diff/8001/src/pkg/html/render.go File src/pkg/html/render.go (right): http://codereview.appspot.com/5218041/diff/8001/src/pkg/html/render.go#newcode54 src/pkg/html/render.go:54: case DocumentNode: Should scopeMarkerNode be handled in the same ...
13 years, 10 months ago
(2011-10-07 06:45:14 UTC)
#6
On 2011/10/07 05:49:11, nigeltao wrote: > PTAL. > > As for naming, I see three ...
13 years, 10 months ago
(2011-10-07 08:04:05 UTC)
#7
On 2011/10/07 05:49:11, nigeltao wrote:
> PTAL.
>
> As for naming, I see three options: Render, WriteTo, and Marshal.
I like render.
> I'm uncomfortable with WriteTo because a Node is not an io.Reader or an
> io.Writer the way a bytes.Buffer is, and if it was WriteTo, it would be odd to
> not then rename Parse to ReadFrom, but that just sounds weird.
>
> Package xml has Marshal and Unmarshal, although Unmarshal takes a destination
> argument, but html.Parse returns a new value. xml.Marshal is also a function,
> not a method. Maybe html.Render should be a function.
I don't think Node can be changed to an interface type without API-breaking
changes, so we don't have much greater flexibility because it is a method.
Making Render a function sounds fine to me.
> This is all just thinking out loud.
Render is now a function, not a method. http://codereview.appspot.com/5218041/diff/8001/src/pkg/html/render.go File src/pkg/html/render.go (right): http://codereview.appspot.com/5218041/diff/8001/src/pkg/html/render.go#newcode54 src/pkg/html/render.go:54: case ...
13 years, 10 months ago
(2011-10-07 08:15:44 UTC)
#8
http://codereview.appspot.com/5218041/diff/14001/src/pkg/html/render.go File src/pkg/html/render.go (right): http://codereview.appspot.com/5218041/diff/14001/src/pkg/html/render.go#newcode35 src/pkg/html/render.go:35: // any <!--comment--> input. I'd like to see comments ...
13 years, 10 months ago
(2011-10-07 21:58:39 UTC)
#11
I'm thinking about renaming Parse/Render to Unmarshal/Marshal, as part of making the standard package library ...
13 years, 10 months ago
(2011-10-08 00:44:02 UTC)
#12
I'm thinking about renaming Parse/Render to Unmarshal/Marshal, as part of making
the standard package library consistent in the way it uses the terms Write,
Marshal, Encode, Put and Render. But a rename would require a gofix, and even if
it was just for http, it would still be outside the scope of this CL.
http://codereview.appspot.com/5218041/diff/14001/src/pkg/html/render.go
File src/pkg/html/render.go (right):
http://codereview.appspot.com/5218041/diff/14001/src/pkg/html/render.go#newco...
src/pkg/html/render.go:35: // any <!--comment--> input.
On 2011/10/07 21:58:39, andybalholm wrote:
> I'd like to see comments included in the output if they're in the tree. They
> could be sanitized by making sure the content begins and ends with spaces and
> replacing any "--" sequences with something else (maybe em dashes?).
Parse elides comments, and it's simpler if Render does likewise.
http://codereview.appspot.com/5218041/diff/14001/src/pkg/html/render.go#newco...
src/pkg/html/render.go:53: return escape(w, n.Data)
On 2011/10/07 16:22:13, andybalholm wrote:
> If the parent node is a <script> or <style> element, the text should not be
> escaped. Character references are not interpreted inside those elements. (By
> compliant parsers, that is.)
I think that that's covered by the script/style TODO below.
http://codereview.appspot.com/5218041/diff/14001/src/pkg/html/render.go#newco...
src/pkg/html/render.go:69: if err := escape(w, n.Data); err != nil {
On 2011/10/07 16:22:13, andybalholm wrote:
> I don't think entities are interpreted in doctypes either.
Done.
http://codereview.appspot.com/5218041/diff/14001/src/pkg/html/render.go#newco...
src/pkg/html/render.go:99: if err := escape(w, a.Key); err != nil {
On 2011/10/07 16:22:13, andybalholm wrote:
> another place where entities won't be interpreted
Done.
http://codereview.appspot.com/5218041/diff/14001/src/pkg/html/render.go#newco...
src/pkg/html/render.go:102: if _, err := w.WriteString(`="`); err != nil {
On 2011/10/07 16:22:13, andybalholm wrote:
> In my opinion, it would look a little cleaner to leave off the ="" for empty
> attributes. But it's not a big deal.
I think it's more regular if every attribute has an '=', even if the attribute
value is empty.
On Oct 7, 2011, at 5:44 PM, nigeltao@golang.org wrote: > I'm thinking about renaming Parse/Render ...
13 years, 10 months ago
(2011-10-08 15:02:00 UTC)
#14
On Oct 7, 2011, at 5:44 PM, nigeltao@golang.org wrote:
> I'm thinking about renaming Parse/Render to Unmarshal/Marshal, as part
> of making the standard package library consistent in the way it uses the
> terms Write, Marshal, Encode, Put and Render. But a rename would require
> a gofix, and even if it was just for http, it would still be outside the
> scope of this CL.
I see a psychological difference between HTML and some of the formats that use
Marshal/Unmarshal. Marshal makes sense to me when the in-memory data structure
is seen as primary, and the serialization is seen as a way to store or
communicate it. But in the case of HTML, the textual form is seen as primary,
and the element tree is seen as a way to process it in memory.
There is no technical difference, but I think there is a difference in how
people think about the formats. So using the same terminology might not be
appropriate.
(Render does have the problem, though, that people might think about rendering
the page to an image file.)
Andy
This isn't really about Render, but it's relevant to a comment on this review. http://codereview.appspot.com/5218041/diff/14001/src/pkg/html/render.go ...
13 years, 10 months ago
(2011-10-08 20:41:43 UTC)
#15
This isn't really about Render, but it's relevant to a comment on this review.
http://codereview.appspot.com/5218041/diff/14001/src/pkg/html/render.go
File src/pkg/html/render.go (right):
http://codereview.appspot.com/5218041/diff/14001/src/pkg/html/render.go#newco...
src/pkg/html/render.go:35: // any <!--comment--> input.
On 2011/10/08 00:44:02, nigeltao wrote:
> Parse elides comments, and it's simpler if Render does likewise.
I just discovered that the test data we're using for Parse assumes that comments
aren't elided. I've been working on parser improvements, and when I got to test
27, here is what I got:
--- FAIL: html.TestParser (0.00 seconds)
tests1.dat test #27 "<!--><div>--<!-->", actual vs expected:
----
| <html>
| <head>
| <body>
| <div>
| "--"
----
| <!-- -->
| <html>
| <head>
| <body>
| <div>
| "--"
| <!-- -->
----
On 2011/10/08 00:44:02, nigeltao wrote: > I'm thinking about renaming Parse/Render to Unmarshal/Marshal, as part ...
13 years, 10 months ago
(2011-10-09 06:57:00 UTC)
#16
On 2011/10/08 00:44:02, nigeltao wrote:
> I'm thinking about renaming Parse/Render to Unmarshal/Marshal, as part of
making
> the standard package library consistent in the way it uses the terms Write,
> Marshal, Encode, Put and Render. But a rename would require a gofix, and even
if
> it was just for http, it would still be outside the scope of this CL.
I think that using common terms makes for a more learnable API and more readable
code so should be preferred unless there are good definitions for both terms and
the obscure term's definition is a significantly better match.
If it's any help, searching codesearch for the pair parse and render yields
approx. 1M hits..
http://www.google.com/codesearch#search/&q=%5Cbparse.*%5Cbrender%7C%5Cbrender...
A similar search for marshal and unmarshal yields 150k hits.
\bmarshal.*\bunmarshal|\bunmarshal.*\bmarshal
Standard disclaimer: estimated result counts are heuristic.
Issue 5218041: code review 5218041: html: add a Node.Render method.
(Closed)
Created 13 years, 10 months ago by nigeltao
Modified 13 years, 10 months ago
Reviewers:
Base URL:
Comments: 23