Rietveld Code Review Tool
Help | Bug tracker | Discussion group | Source code | Sign in
(964)

Issue 4964045: code review 4964045: exp/template/html: Grammar rules for HTML comments and ... (Closed)

Can't Edit
Can't Publish+Mail
Start Review
Created:
12 years, 8 months ago by MikeSamuel
Modified:
12 years, 8 months ago
Reviewers:
CC:
nigeltao, golang-dev
Visibility:
Public.

Description

exp/template/html: Grammar rules for HTML comments and special tags. Augments type context and adds grammatical rules to handle special HTML constructs: <!-- comments --> <script>raw text</script> <textarea>no tags here</textarea> This CL does not elide comment content. I recommend we do that but have not done it in this CL. I used a codesearch tool over a codebase in another template language. Based on the below I think we should definitely recognize <script>, <style>, <textarea>, and <title> as each of these appears frequently enough that there are few template using apps that do not use most of them. Of the other special tags, <xmp>, <noscript> are used but infrequently, and <noframe> and friend, <listing> do not appear at all. We could support <xmp> even though it is obsolete in HTML5 because we already have the machinery, but I suggest we do not support noscript since it is a normal tag in some browser configurations. I suggest recognizing and eliding <!-- comments --> (but not escaping text spans) as they are widely used to embed comments in template source. Not eliding them increases the size of content sent over the network, and risks leaking code and project internal details. The template language I tested elides them so there are no instance of IE conditional compilation directives in the codebase but that could be a source of confusion. The codesearch does the equivalent of $ find . -name \*.file-extension \ | perl -ne 'print "\L$1\n" while s@<([a-z][a-z0-9])@@i' \ | sort | uniq -c | sort The 5 uses of <plaintext> seem to be in tricky code and can be ignored. The 2 uses of <xmp> appear in the same tricky code and can be ignored. I also ignored end tags to avoid biasing against unary elements and threw out some nonsense names since since the long tail is dominated by uses of < as a comparison operator in the template languages expression language. I have added asterisks next to abnormal elements. 26765 div 7432 span 7414 td 4233 a 3730 tr 3238 input 2102 br 1756 li 1755 img 1674 table 1388 p 1311 th 1064 option 992 b 891 label 714 script * 519 ul 446 tbody 412 button 381 form 377 h2 358 select 353 strong 318 h3 314 body 303 html 266 link 262 textarea * 261 head 258 meta 225 title * 189 h1 176 col 156 style * 151 hr 119 iframe 103 h4 101 pre 100 dt 98 thead 90 dd 83 map 80 i 69 object 66 ol 65 em 60 param 60 font 57 fieldset 51 string 51 field 51 center 44 bidi 37 kbd 35 legend 30 nobr 29 dl 28 var 26 small 21 cite 21 base 20 embed 19 colgroup 12 u 12 canvas 10 sup 10 rect 10 optgroup 10 noscript * 9 wbr 9 blockquote 8 tfoot 8 code 8 caption 8 abbr 7 msg 6 tt 6 text 6 h5 5 svg 5 plaintext * 5 article 4 shortquote 4 number 4 menu 4 ins 3 progress 3 header 3 content 3 bool 3 audio 3 attribute 3 acronym 2 xmp * 2 overwrite 2 objects 2 nobreak 2 metadata 2 description 2 datasource 2 category 2 action

Patch Set 1 #

Patch Set 2 : diff -r 7697a84ee19f https://go.googlecode.com/hg/ #

Patch Set 3 : diff -r e21f17f809f1 https://go.googlecode.com/hg/ #

Patch Set 4 : diff -r e21f17f809f1 https://go.googlecode.com/hg/ #

Patch Set 5 : diff -r e21f17f809f1 https://go.googlecode.com/hg/ #

Patch Set 6 : diff -r 4189b98b96cf https://go.googlecode.com/hg/ #

Patch Set 7 : diff -r 68c83c6fed16 https://go.googlecode.com/hg/ #

Patch Set 8 : diff -r 2a748f320ba5 https://go.googlecode.com/hg/ #

Patch Set 9 : diff -r 504f4e9b079c https://go.googlecode.com/hg/ #

Total comments: 12

Patch Set 10 : diff -r 2892d1e242e6 https://go.googlecode.com/hg/ #

Patch Set 11 : diff -r 2892d1e242e6 https://go.googlecode.com/hg/ #

Unified diffs Side-by-side diffs Delta from patch set Stats (+268 lines, -39 lines) Patch
M src/pkg/exp/template/html/context.go View 1 2 3 4 5 6 7 8 9 4 chunks +50 lines, -1 line 0 comments Download
M src/pkg/exp/template/html/escape.go View 1 2 3 4 5 6 7 8 9 16 chunks +95 lines, -32 lines 0 comments Download
M src/pkg/exp/template/html/escape_test.go View 1 2 3 4 5 6 7 8 7 chunks +123 lines, -6 lines 0 comments Download

Messages

Total messages: 6
MikeSamuel
Hello nigeltao@golang.org (cc: golang-dev@googlegroups.com), I'd like you to review this change to https://go.googlecode.com/hg/
12 years, 8 months ago (2011-08-26 22:02:32 UTC) #1
MikeSamuel
On 2011/08/26 22:02:32, MikeSamuel wrote: > Hello mailto:nigeltao@golang.org (cc: mailto:golang-dev@googlegroups.com), > > I'd like you ...
12 years, 8 months ago (2011-08-30 02:32:11 UTC) #2
MikeSamuel
Merged, updated CL description, and uploaded.
12 years, 8 months ago (2011-09-08 21:53:22 UTC) #3
nigeltao
LGTM. http://codereview.appspot.com/4964045/diff/19001/src/pkg/exp/template/html/context.go File src/pkg/exp/template/html/context.go (right): http://codereview.appspot.com/4964045/diff/19001/src/pkg/exp/template/html/context.go#newcode30 src/pkg/exp/template/html/context.go:30: return c.state == d.state && c.delim == d.delim ...
12 years, 8 months ago (2011-09-09 03:00:07 UTC) #4
MikeSamuel
http://codereview.appspot.com/4964045/diff/19001/src/pkg/exp/template/html/context.go File src/pkg/exp/template/html/context.go (right): http://codereview.appspot.com/4964045/diff/19001/src/pkg/exp/template/html/context.go#newcode30 src/pkg/exp/template/html/context.go:30: return c.state == d.state && c.delim == d.delim && ...
12 years, 8 months ago (2011-09-09 07:02:40 UTC) #5
MikeSamuel
12 years, 8 months ago (2011-09-09 07:07:46 UTC) #6
*** Submitted as http://code.google.com/p/go/source/detail?r=9054e6fb57bc ***

exp/template/html: Grammar rules for HTML comments and special tags.

Augments type context and adds grammatical rules to handle special HTML
constructs:
    <!-- comments -->
    <script>raw text</script>
    <textarea>no tags here</textarea>

This CL does not elide comment content.  I recommend we do that but
have not done it in this CL.

I used a codesearch tool over a codebase in another template language.

Based on the below I think we should definitely recognize
  <script>, <style>, <textarea>, and <title>
as each of these appears frequently enough that there are few
template using apps that do not use most of them.

Of the other special tags,
  <xmp>, <noscript>
are used but infrequently, and
  <noframe> and friend, <listing>
do not appear at all.

We could support <xmp> even though it is obsolete in HTML5
because we already have the machinery, but I suggest we do not
support noscript since it is a normal tag in some browser
configurations.

I suggest recognizing and eliding <!-- comments -->
(but not escaping text spans) as they are widely used to
embed comments in template source.  Not eliding them increases
the size of content sent over the network, and risks leaking
code and project internal details.
The template language I tested elides them so there are
no instance of IE conditional compilation directives in the
codebase but that could be a source of confusion.

The codesearch does the equivalent of
$ find . -name \*.file-extension \
  | perl -ne 'print "\L$1\n" while s@<([a-z][a-z0-9])@@i' \
  | sort | uniq -c | sort

The 5 uses of <plaintext> seem to be in tricky code and can be ignored.
The 2 uses of <xmp> appear in the same tricky code and can be ignored.
I also ignored end tags to avoid biasing against unary
elements and threw out some nonsense names since since the
long tail is dominated by uses of < as a comparison operator
in the template languages expression language.

I have added asterisks next to abnormal elements.

  26765 div
   7432 span
   7414 td
   4233 a
   3730 tr
   3238 input
   2102 br
   1756 li
   1755 img
   1674 table
   1388 p
   1311 th
   1064 option
    992 b
    891 label
    714 script *
    519 ul
    446 tbody
    412 button
    381 form
    377 h2
    358 select
    353 strong
    318 h3
    314 body
    303 html
    266 link
    262 textarea *
    261 head
    258 meta
    225 title *
    189 h1
    176 col
    156 style *
    151 hr
    119 iframe
    103 h4
    101 pre
    100 dt
     98 thead
     90 dd
     83 map
     80 i
     69 object
     66 ol
     65 em
     60 param
     60 font
     57 fieldset
     51 string
     51 field
     51 center
     44 bidi
     37 kbd
     35 legend
     30 nobr
     29 dl
     28 var
     26 small
     21 cite
     21 base
     20 embed
     19 colgroup
     12 u
     12 canvas
     10 sup
     10 rect
     10 optgroup
     10 noscript *
      9 wbr
      9 blockquote
      8 tfoot
      8 code
      8 caption
      8 abbr
      7 msg
      6 tt
      6 text
      6 h5
      5 svg
      5 plaintext *
      5 article
      4 shortquote
      4 number
      4 menu
      4 ins
      3 progress
      3 header
      3 content
      3 bool
      3 audio
      3 attribute
      3 acronym
      2 xmp *
      2 overwrite
      2 objects
      2 nobreak
      2 metadata
      2 description
      2 datasource
      2 category
      2 action

R=nigeltao
CC=golang-dev
http://codereview.appspot.com/4964045
Sign in to reply to this message.

Powered by Google App Engine
RSS Feeds Recent Issues | This issue
This is Rietveld f62528b