|
|
|
Created:
14 years, 3 months ago by mpvl Modified:
14 years, 2 months ago Reviewers:
CC:
r, bsiegert, r2, brainman, golang-dev Visibility:
Public. |
Descriptionexp/norm: maketables tool for generating tables for normalization.
Patch Set 1 #Patch Set 2 : diff -r 743cfe1095ed https://go.googlecode.com/hg/go-pub #Patch Set 3 : diff -r 743cfe1095ed https://go.googlecode.com/hg/go-pub #
Total comments: 3
Patch Set 4 : diff -r 743cfe1095ed https://go.googlecode.com/hg/go-pub #Patch Set 5 : diff -r 743cfe1095ed https://go.googlecode.com/hg/go-pub #
Total comments: 2
Patch Set 6 : diff -r 743cfe1095ed https://go.googlecode.com/hg/go-pub #Patch Set 7 : diff -r 743cfe1095ed https://go.googlecode.com/hg/go-pub #Patch Set 8 : diff -r 743cfe1095ed https://go.googlecode.com/hg/go-pub #Patch Set 9 : diff -r 743cfe1095ed https://go.googlecode.com/hg/go-pub #Patch Set 10 : diff -r 220cd3510c65 https://go.googlecode.com/hg/go-pub #Patch Set 11 : diff -r 89c7137ea35c https://go.googlecode.com/hg #Patch Set 12 : diff -r 89c7137ea35c https://go.googlecode.com/hg #Patch Set 13 : diff -r 89c7137ea35c https://go.googlecode.com/hg #Patch Set 14 : diff -r 89c7137ea35c https://go.googlecode.com/hg #Patch Set 15 : diff -r 89c7137ea35c https://go.googlecode.com/hg #Patch Set 16 : diff -r 89c7137ea35c https://go.googlecode.com/hg #
Total comments: 109
Patch Set 17 : diff -r 99e4d069d03f https://go.googlecode.com/hg #Patch Set 18 : diff -r 99e4d069d03f https://go.googlecode.com/hg #Patch Set 19 : diff -r 99e4d069d03f https://go.googlecode.com/hg #
Total comments: 1
Patch Set 20 : diff -r 2165d97e8e19 https://go.googlecode.com/hg #Patch Set 21 : diff -r 2165d97e8e19 https://go.googlecode.com/hg #Patch Set 22 : diff -r 2165d97e8e19 https://go.googlecode.com/hg #Patch Set 23 : diff -r 2165d97e8e19 https://go.googlecode.com/hg #Patch Set 24 : diff -r 2165d97e8e19 https://go.googlecode.com/hg #
MessagesTotal messages: 20
Hello r@golang.org (cc: golang-dev@googlegroups.com), I'd like you to review this change to https://go.googlecode.com/hg/go-pub
Sign in to reply to this message.
I am not making any comments on the usefulness of this CL. Here are just a few comments that crossed my mind while reading this. http://codereview.appspot.com/4662080/diff/4001/src/pkg/exp/form/Makefile File src/pkg/exp/form/Makefile (right): http://codereview.appspot.com/4662080/diff/4001/src/pkg/exp/form/Makefile#new... src/pkg/exp/form/Makefile:1: # Copyright 2009 The Go Authors. All rights reserved. 2011 http://codereview.appspot.com/4662080/diff/4001/src/pkg/exp/form/Makefile#new... src/pkg/exp/form/Makefile:20: ./maketables > tables.go This will break under GOOS=windows, you need to account for the extension of the executable. http://codereview.appspot.com/4662080/diff/4001/src/pkg/exp/form/maketables.go File src/pkg/exp/form/maketables.go (right): http://codereview.appspot.com/4662080/diff/4001/src/pkg/exp/form/maketables.g... src/pkg/exp/form/maketables.go:1: // Copyright 2009 The Go Authors. All rights reserved. 2011 The whole program looks to me as if it could be written in more concise form as one or more awk scripts.
Sign in to reply to this message.
http://codereview.appspot.com/4662080/diff/4002/src/pkg/exp/form/tables.go File src/pkg/exp/form/tables.go (right): http://codereview.appspot.com/4662080/diff/4002/src/pkg/exp/form/tables.go#ne... src/pkg/exp/form/tables.go:11: var _Dec00000 = []int{118, 105} this way of making data using lots of little variables puts tons of noise into the symbol table and bloats the binary. the only reason to declare these is if they're used more than once, and i think most are used only once. if the reuses are few, it might not be worth making any of these. also int is probably not what you want. int32 maybe? uint32? maybe even uint16
Sign in to reply to this message.
On Jul 7, 2011, at 12:57 AM, bsiegert@gmail.com wrote: > I am not making any comments on the usefulness of this CL. Here are just > a few comments that crossed my mind while reading this. > > > http://codereview.appspot.com/4662080/diff/4001/src/pkg/exp/form/Makefile > File src/pkg/exp/form/Makefile (right): > > http://codereview.appspot.com/4662080/diff/4001/src/pkg/exp/form/Makefile#new... > src/pkg/exp/form/Makefile:1: # Copyright 2009 The Go Authors. All rights > reserved. > 2011 > > http://codereview.appspot.com/4662080/diff/4001/src/pkg/exp/form/Makefile#new... > src/pkg/exp/form/Makefile:20: ./maketables > tables.go > This will break under GOOS=windows, you need to account for the > extension of the executable. > > http://codereview.appspot.com/4662080/diff/4001/src/pkg/exp/form/maketables.go > File src/pkg/exp/form/maketables.go (right): > > http://codereview.appspot.com/4662080/diff/4001/src/pkg/exp/form/maketables.g... > src/pkg/exp/form/maketables.go:1: // Copyright 2009 The Go Authors. All > rights reserved. > 2011 > > The whole program looks to me as if it could be written in more concise > form as one or more awk scripts. But it's also a perfectly fine as a Go program. -rob
Sign in to reply to this message.
On 2011/07/06 14:57:13, bsiegert wrote: > > src/pkg/exp/form/Makefile:20: ./maketables > tables.go > This will break under GOOS=windows, you need to account for the extension of the > executable. > It should be fine inside Makefile. The recipe gets executed by bash, and the code should run fine, even with Windows version of bash. In fact, we already doing something like that, see pkg/unicode/Makefile. Alex
Sign in to reply to this message.
http://codereview.appspot.com/4662080/diff/4002/src/pkg/exp/form/tables.go File src/pkg/exp/form/tables.go (right): http://codereview.appspot.com/4662080/diff/4002/src/pkg/exp/form/tables.go#ne... src/pkg/exp/form/tables.go:11: var _Dec00000 = []int{118, 105} There is quite a bit of reuse, as these are used by both the decompNFC and decompNFKC maps. I'll change it to an array or arrays and store an index. The use of int is consistent with usage of int for rune throughout the code. uint32 would be more appropriate. I plan to change this table to hold utf8 character, after which this point becomes moot. Leaving it to be int for now.
Sign in to reply to this message.
On 07/07/2011, at 9:06 PM, mpvl@golang.org wrote: > > http://codereview.appspot.com/4662080/diff/4002/src/pkg/exp/form/tables.go > File src/pkg/exp/form/tables.go (right): > > http://codereview.appspot.com/4662080/diff/4002/src/pkg/exp/form/tables.go#ne... > src/pkg/exp/form/tables.go:11: var _Dec00000 = []int{118, 105} > There is quite a bit of reuse, as these are used by both the decompNFC > and decompNFKC maps. I'll change it to an array or arrays and store an > index. > > The use of int is consistent with usage of int for rune throughout the > code. uint32 would be more appropriate. I plan to change this table to > hold utf8 character, after which this point becomes moot. Leaving it to > be int for now. ok, but a change to 64-bit ints on amd64 will happen one day so for data like this it's good to keep the size under control if possible. since this is a work in progress it's not a worry at the moment. still, have a look at unicode.RangeTable, which uses sized ints for a similar purpose. yes, you need casts, but they're hidden under the API. -rob
Sign in to reply to this message.
> ok, but a change to 64-bit ints on amd64 will happen one day so for data like > this it's good to keep the size under control if possible. since this is a work > in progress it's not a worry at the moment. still, have a look at > unicode.RangeTable, which uses sized ints for a similar purpose. yes, you need > casts, but they're hidden under the API. Yes that makes sense. As I may change these lists to utf8 anyway, I prefer to leave it as is for now.
Sign in to reply to this message.
Okay, as it is easy enough, I've modified it to be int32. Marcel On 2011/07/20 09:57:30, mpvl wrote: > > ok, but a change to 64-bit ints on amd64 will happen one day so for data like > > this it's good to keep the size under control if possible. since this is a > work > > in progress it's not a worry at the moment. still, have a look at > > unicode.RangeTable, which uses sized ints for a similar purpose. yes, you need > > casts, but they're hidden under the API. > Yes that makes sense. As I may change these lists to utf8 anyway, I prefer to > leave it as is for now.
Sign in to reply to this message.
Hello r@golang.org, bsiegert@gmail.com, r@google.com, alex.brainman@gmail.com (cc: golang-dev@googlegroups.com), Please take another look.
Sign in to reply to this message.
Hello r@golang.org, bsiegert@gmail.com, r@google.com, alex.brainman@gmail.com (cc: golang-dev@googlegroups.com), Please take another look.
Sign in to reply to this message.
i'm at OSCON this week without much code review bandwidth. the tables you're generating are enormous. not just the numbers; each slice header is two more words. i think it all adds up to several hundred kilobytes. can you find a more compact representation?
Sign in to reply to this message.
Yes, I was already working on this. This was just an intermediate working solution. I'm now implementing the decomposition data as a trie akin the trie for the qc and ccc data. I'm also storing the decomposition data as utf8 instead of utf32, which further reduces the size. The total structure size for decomposition for all forms is about 50K, of which 17K is decomposition data and the rest is index. Later, this can be further optimized, for example by eliminating values that could be algorithmically determined, merging the trie indexes, or optimizing the trie structure. On Wed, Jul 27, 2011 at 6:57 PM, <r@golang.org> wrote: > i'm at OSCON this week without much code review bandwidth. > > the tables you're generating are enormous. not just the numbers; each > slice header is two more words. i think it all adds up to several > hundred kilobytes. can you find a more compact representation? > > > http://codereview.appspot.com/**4662080/<http://codereview.appspot.com/4662080/> >
Sign in to reply to this message.
Hello r@golang.org, bsiegert@gmail.com, r@google.com, alex.brainman@gmail.com (cc: golang-dev@googlegroups.com), Please take another look.
Sign in to reply to this message.
Made a few changes to the tables: - Decompositions are now pre-expanded. - Decompositions are now represented in utf8 and without slices. - CCC table is removed (no longer needed; data is already in charInfo table) These new representations also result in some significant performance boosts for some of the benchmarks. The total size of all tables is now roughly 80K. Note that there is still plenty of room for further compaction, but I prefer to do this incrementally, so I can keep tabs on performance . On 2011/07/27 16:57:30, r wrote: > i'm at OSCON this week without much code review bandwidth. > > the tables you're generating are enormous. not just the numbers; each slice > header is two more words. i think it all adds up to several hundred kilobytes. > can you find a more compact representation?
Sign in to reply to this message.
apologies for the slow response. i'll be quicker in future. it's coming together nicely. http://codereview.appspot.com/4662080/diff/30007/src/pkg/Makefile File src/pkg/Makefile (right): http://codereview.appspot.com/4662080/diff/30007/src/pkg/Makefile#newcode85 src/pkg/Makefile:85: >>>>>>> other resolve http://codereview.appspot.com/4662080/diff/30007/src/pkg/exp/norm/Makefile File src/pkg/exp/norm/Makefile (right): http://codereview.appspot.com/4662080/diff/30007/src/pkg/exp/norm/Makefile#ne... src/pkg/exp/norm/Makefile:25: test: maketables make this be testshort and the full one be test. http://codereview.appspot.com/4662080/diff/30007/src/pkg/exp/norm/maketables.go File src/pkg/exp/norm/maketables.go (right): http://codereview.appspot.com/4662080/diff/30007/src/pkg/exp/norm/maketables.... src/pkg/exp/norm/maketables.go:53: var logger = log.New(os.Stderr, "", log.Lshortfile) you might want to mimic the -local flag i put into unicode/maketables http://codereview.appspot.com/4662080/diff/30007/src/pkg/exp/norm/maketables.... src/pkg/exp/norm/maketables.go:81: type QCResult int comment explaining this type. not obvious when encountered that QC means Quick Check, or even what that means http://codereview.appspot.com/4662080/diff/30007/src/pkg/exp/norm/maketables.... src/pkg/exp/norm/maketables.go:84: QCUnknown = iota shouldn't this be QCUnknown QCResult = iota ? http://codereview.appspot.com/4662080/diff/30007/src/pkg/exp/norm/maketables.... src/pkg/exp/norm/maketables.go:102: const ( comment - what piece to these refer to? i think they correspond to NF and NFK but it's worth saying - and using them below http://codereview.appspot.com/4662080/diff/30007/src/pkg/exp/norm/maketables.... src/pkg/exp/norm/maketables.go:115: forms [2]FormInfo // 0: NF[CD], 1: NFK[CD] that 2 could be a constant defined in the const block above and the 0 and 1 could be names http://codereview.appspot.com/4662080/diff/30007/src/pkg/exp/norm/maketables.... src/pkg/exp/norm/maketables.go:123: buf := bytes.NewBuffer(make([]byte, 0)) var b = new(bytes.Buf) NewBuffer is for initializing for read http://codereview.appspot.com/4662080/diff/30007/src/pkg/exp/norm/maketables.... src/pkg/exp/norm/maketables.go:158: MComposed = iota again, what are these for? http://codereview.appspot.com/4662080/diff/30007/src/pkg/exp/norm/maketables.... src/pkg/exp/norm/maketables.go:169: inDecomp bool // Some decompositons result in this char. s/decompositons/decompositions/ nice word, though - the atoms you get from decomposing? http://codereview.appspot.com/4662080/diff/30007/src/pkg/exp/norm/maketables.... src/pkg/exp/norm/maketables.go:199: return s[:len(s)-1] + "]" this whole loop etc. could be return fmt.Sprintf("%.4X", d) http://codereview.appspot.com/4662080/diff/30007/src/pkg/exp/norm/maketables.... src/pkg/exp/norm/maketables.go:203: var b [100 * utf8.UTFMax]byte where does 100 come from? http://codereview.appspot.com/4662080/diff/30007/src/pkg/exp/norm/maketables.... src/pkg/exp/norm/maketables.go:205: for i := 0; i < len(d); i++ { for _, c := range d { n += utf.EncodeRune(b[n:], c) } except i think this whole function can be return string([]int(d)) http://codereview.appspot.com/4662080/diff/30007/src/pkg/exp/norm/maketables.... src/pkg/exp/norm/maketables.go:213: if len(*url) == 0 || (*url)[0] == '.' || (*url)[0] == '/' { oh, i see, you're already close to my local flag. the -url doc string should say that, or just use my -local notion. you might take a look at it anyway. http://codereview.appspot.com/4662080/diff/30007/src/pkg/exp/norm/maketables.... src/pkg/exp/norm/maketables.go:233: if len(s) == 0 { this test is fine but i'm certain you don't need it unless you're worried about performance for a common empty case http://codereview.appspot.com/4662080/diff/30007/src/pkg/exp/norm/maketables.... src/pkg/exp/norm/maketables.go:263: if point > MaxChar { is this worth logging? http://codereview.appspot.com/4662080/diff/30007/src/pkg/exp/norm/maketables.... src/pkg/exp/norm/maketables.go:266: var state State clearer to say state := SNormal http://codereview.appspot.com/4662080/diff/30007/src/pkg/exp/norm/maketables.... src/pkg/exp/norm/maketables.go:290: logger.Fatalf("%U: bad decomp |%v|: \"%s\"", int(x), decmap, e) use `` for the main string and you don't need to quote the "" http://codereview.appspot.com/4662080/diff/30007/src/pkg/exp/norm/maketables.... src/pkg/exp/norm/maketables.go:322: line, err := input.ReadString('\n') use ReadLine and you don't need to drop the '\n' below, although the code is fine as is http://codereview.appspot.com/4662080/diff/30007/src/pkg/exp/norm/maketables.... src/pkg/exp/norm/maketables.go:398: return 0xAC00 <= rune && rune <= 0xD7A3 D7A3 is a magic number. is it defined by unicode as a permanent cutoff? in any case i'd name both these numbers as constants with comments so they are easy to find and fix if necesary http://codereview.appspot.com/4662080/diff/30007/src/pkg/exp/norm/maketables.... src/pkg/exp/norm/maketables.go:401: func _ccc(rune int) uint8 { why the _? http://codereview.appspot.com/4662080/diff/30007/src/pkg/exp/norm/maketables.... src/pkg/exp/norm/maketables.go:411: // Use bubble sort. how long can d be? http://codereview.appspot.com/4662080/diff/30007/src/pkg/exp/norm/maketables.... src/pkg/exp/norm/maketables.go:429: dn := len(dcomp) there's no need to store a variable like this. len is extremely cheap and registerizable. http://codereview.appspot.com/4662080/diff/30007/src/pkg/exp/norm/maketables.... src/pkg/exp/norm/maketables.go:433: for i := 0; i < dn; i++ { for _, c := range dcomp { } http://codereview.appspot.com/4662080/diff/30007/src/pkg/exp/norm/maketables.... src/pkg/exp/norm/maketables.go:441: // Phase 0: pre-expand decomposition s/$/./ http://codereview.appspot.com/4662080/diff/30007/src/pkg/exp/norm/maketables.... src/pkg/exp/norm/maketables.go:442: for i := 0; i < len(chars); i++ { for _, char := range chars { http://codereview.appspot.com/4662080/diff/30007/src/pkg/exp/norm/maketables.... src/pkg/exp/norm/maketables.go:449: for i := 0; i < len(d); i++ { for _. c := range d { http://codereview.appspot.com/4662080/diff/30007/src/pkg/exp/norm/maketables.... src/pkg/exp/norm/maketables.go:455: // Phase 1: composition exclusion, mark decomposition s/$/./ http://codereview.appspot.com/4662080/diff/30007/src/pkg/exp/norm/maketables.... src/pkg/exp/norm/maketables.go:456: for i := 0; i < len(chars); i++ { for _, char := range chars { http://codereview.appspot.com/4662080/diff/30007/src/pkg/exp/norm/maketables.... src/pkg/exp/norm/maketables.go:460: // Marks script-specific exclusions and version restricted s/$/./ and so on, for consistency http://codereview.appspot.com/4662080/diff/30007/src/pkg/exp/norm/maketables.... src/pkg/exp/norm/maketables.go:479: for j := 0; j < len(f.decomp); j++ { for _, decomp := range f.decomp { http://codereview.appspot.com/4662080/diff/30007/src/pkg/exp/norm/maketables.... src/pkg/exp/norm/maketables.go:484: // Phase 2: forward and backward combining s/$/./ http://codereview.appspot.com/4662080/diff/30007/src/pkg/exp/norm/maketables.... src/pkg/exp/norm/maketables.go:485: for i := 0; i < len(chars); i++ { and so on http://codereview.appspot.com/4662080/diff/30007/src/pkg/exp/norm/maketables.... src/pkg/exp/norm/maketables.go:502: for i := 0; i < len(chars); i++ { for i, char := range chars { http://codereview.appspot.com/4662080/diff/30007/src/pkg/exp/norm/maketables.... src/pkg/exp/norm/maketables.go:519: if 0x1161 <= i && i < 0x1176 { // Jamo V magic numbers http://codereview.appspot.com/4662080/diff/30007/src/pkg/exp/norm/maketables.... src/pkg/exp/norm/maketables.go:523: if 0x11A8 <= i && i < 0x11C3 { // Jamo T magic numbers http://codereview.appspot.com/4662080/diff/30007/src/pkg/exp/norm/maketables.... src/pkg/exp/norm/maketables.go:537: table map[int]*trieNode is this worth a map as opposed to an array (not slice)? if you used uint8 for the index, it's only 2K per entry on amd64 and you can avoid index checks and other unpleasantness. i think it's not much bigger and it's much much faster. if you expect mostly sparse arrays, you could make a variable-sized slice of pairs so if memory's not constrained i'd go with array. also you could make it a pointer to array and then leaf is a method that returns n.table==nil http://codereview.appspot.com/4662080/diff/30007/src/pkg/exp/norm/maketables.... src/pkg/exp/norm/maketables.go:564: func internal(n *trieNode) bool { method on *trieNode? also i'd call this isInternal http://codereview.appspot.com/4662080/diff/30007/src/pkg/exp/norm/maketables.... src/pkg/exp/norm/maketables.go:577: func insert(n *trieNode, rune int, value uint16) { method on *trieNode? http://codereview.appspot.com/4662080/diff/30007/src/pkg/exp/norm/maketables.... src/pkg/exp/norm/maketables.go:608: binary.Write(hasher, binary.LittleEndian, v) this is overkill. i'd just write the two bytes by hand, but there's certainly nothing wrong with what you have and leave it if you prefer. http://codereview.appspot.com/4662080/diff/30007/src/pkg/exp/norm/maketables.... src/pkg/exp/norm/maketables.go:662: for i := 0; i < len(b); i++ { for i, c := range b { (you could use for i, b := range b { and that's good Go but it might confuse you) http://codereview.appspot.com/4662080/diff/30007/src/pkg/exp/norm/maketables.... src/pkg/exp/norm/maketables.go:674: // returns the size of the generated tables s/returns/printTrieTables returns/ s/$/./ more of these functions could use such comments. (http://go/gocomments#TOC-CommentSentences) http://codereview.appspot.com/4662080/diff/30007/src/pkg/exp/norm/maketables.... src/pkg/exp/norm/maketables.go:718: var e uint16 = 0 s/ = 0// not wrong but unnecessary another idiomatic way (either way is fine) is e := uint16(0) that makes it clear you want a zero http://codereview.appspot.com/4662080/diff/30007/src/pkg/exp/norm/maketables.... src/pkg/exp/norm/maketables.go:720: e = e | 0x8 e |= 0x8 etc. http://codereview.appspot.com/4662080/diff/30007/src/pkg/exp/norm/maketables.... src/pkg/exp/norm/maketables.go:732: log.Fatal("Illegal quickcheck value.") drop the period, give more information about what went wrong (which value?) http://codereview.appspot.com/4662080/diff/30007/src/pkg/exp/norm/maketables.... src/pkg/exp/norm/maketables.go:751: for i := 0; i < len(chars); i++ { for i, char := range chars { http://codereview.appspot.com/4662080/diff/30007/src/pkg/exp/norm/maketables.... src/pkg/exp/norm/maketables.go:769: for i := 0; i < len(chars); i++ { for _, c := range chars { http://codereview.appspot.com/4662080/diff/30007/src/pkg/exp/norm/maketables.... src/pkg/exp/norm/maketables.go:805: for i := 0; i < len(sa); i++ { for _, a := range sa { http://codereview.appspot.com/4662080/diff/30007/src/pkg/exp/norm/maketables.... src/pkg/exp/norm/maketables.go:816: fields := strings.Split(*url, "/") isn't it better defined than that? third-last field? http://codereview.appspot.com/4662080/diff/30007/src/pkg/exp/norm/maketables.... src/pkg/exp/norm/maketables.go:835: "// Generated by running\n"+ use `` and this gets much nicer http://codereview.appspot.com/4662080/diff/30007/src/pkg/exp/norm/maketables.... src/pkg/exp/norm/maketables.go:871: for i := 0; i < len(chars); i++ { for i, char := range chars { http://codereview.appspot.com/4662080/diff/30007/src/pkg/exp/norm/maketables.... src/pkg/exp/norm/maketables.go:886: fmt.Printf("// Total size of tables: %dKB (%d bytes)\n", size/1024, size) (size + 512)/1024 http://codereview.appspot.com/4662080/diff/30007/src/pkg/exp/norm/maketables.... src/pkg/exp/norm/maketables.go:891: for i := 0; i < len(chars); i++ { for _, c := range chars { http://codereview.appspot.com/4662080/diff/30007/src/pkg/exp/norm/maketables.... src/pkg/exp/norm/maketables.go:971: ftype, mode = FCompatibility, MDecomposed default? http://codereview.appspot.com/4662080/diff/30007/src/pkg/exp/norm/maketables.... src/pkg/exp/norm/maketables.go:980: qr = QCMaybe default? http://codereview.appspot.com/4662080/diff/30007/src/pkg/exp/norm/maketables.... src/pkg/exp/norm/maketables.go:1007: m := "%U: FAIL F:%d M:%d (was %v need Yes) %s\n" why? anyway it's a constant, not a var
Sign in to reply to this message.
http://codereview.appspot.com/4662080/diff/30007/src/pkg/Makefile File src/pkg/Makefile (right): http://codereview.appspot.com/4662080/diff/30007/src/pkg/Makefile#newcode85 src/pkg/Makefile:85: >>>>>>> other On 2011/08/04 03:20:41, r wrote: > resolve Done. http://codereview.appspot.com/4662080/diff/30007/src/pkg/exp/norm/Makefile File src/pkg/exp/norm/Makefile (right): http://codereview.appspot.com/4662080/diff/30007/src/pkg/exp/norm/Makefile#ne... src/pkg/exp/norm/Makefile:25: test: maketables On 2011/08/04 03:20:41, r wrote: > make this be testshort and the full one be test. Done. I'm now getting this error, though: Makefile:30: warning: overriding commands for target `test' ../../../Make.pkg:65: warning: ignoring old commands for target `test' Rings a bell? http://codereview.appspot.com/4662080/diff/30007/src/pkg/exp/norm/maketables.go File src/pkg/exp/norm/maketables.go (right): http://codereview.appspot.com/4662080/diff/30007/src/pkg/exp/norm/maketables.... src/pkg/exp/norm/maketables.go:53: var logger = log.New(os.Stderr, "", log.Lshortfile) Using -local flag for consistency. On 2011/08/04 03:20:41, r wrote: > you might want to mimic the -local flag i put into unicode/maketables http://codereview.appspot.com/4662080/diff/30007/src/pkg/exp/norm/maketables.... src/pkg/exp/norm/maketables.go:81: type QCResult int On 2011/08/04 03:20:41, r wrote: > comment explaining this type. not obvious when encountered that QC means Quick > Check, or even what that means Done. http://codereview.appspot.com/4662080/diff/30007/src/pkg/exp/norm/maketables.... src/pkg/exp/norm/maketables.go:84: QCUnknown = iota Indeed. Done. On 2011/08/04 03:20:41, r wrote: > shouldn't this be > QCUnknown QCResult = iota > ? http://codereview.appspot.com/4662080/diff/30007/src/pkg/exp/norm/maketables.... src/pkg/exp/norm/maketables.go:102: const ( On 2011/08/04 03:20:41, r wrote: > comment - what piece to these refer to? i think they correspond to NF and NFK > but it's worth saying - and using them below Done. http://codereview.appspot.com/4662080/diff/30007/src/pkg/exp/norm/maketables.... src/pkg/exp/norm/maketables.go:115: forms [2]FormInfo // 0: NF[CD], 1: NFK[CD] On 2011/08/04 03:20:41, r wrote: > that 2 could be a constant defined in the const block above and the 0 and 1 > could be names Done. http://codereview.appspot.com/4662080/diff/30007/src/pkg/exp/norm/maketables.... src/pkg/exp/norm/maketables.go:123: buf := bytes.NewBuffer(make([]byte, 0)) On 2011/08/04 03:20:41, r wrote: > var b = new(bytes.Buf) > > NewBuffer is for initializing for read Done. http://codereview.appspot.com/4662080/diff/30007/src/pkg/exp/norm/maketables.... src/pkg/exp/norm/maketables.go:158: MComposed = iota Explained and moved up with the form types. On 2011/08/04 03:20:41, r wrote: > again, what are these for? http://codereview.appspot.com/4662080/diff/30007/src/pkg/exp/norm/maketables.... src/pkg/exp/norm/maketables.go:169: inDecomp bool // Some decompositons result in this char. Thanks! Maybe they'll soon be discovered in the LHC nearby here in Geneva. Done. On 2011/08/04 03:20:41, r wrote: > s/decompositons/decompositions/ > nice word, though - the atoms you get from decomposing? http://codereview.appspot.com/4662080/diff/30007/src/pkg/exp/norm/maketables.... src/pkg/exp/norm/maketables.go:199: return s[:len(s)-1] + "]" Cool! Done. On 2011/08/04 03:20:41, r wrote: > this whole loop etc. could be > return fmt.Sprintf("%.4X", d) http://codereview.appspot.com/4662080/diff/30007/src/pkg/exp/norm/maketables.... src/pkg/exp/norm/maketables.go:203: var b [100 * utf8.UTFMax]byte Removed. On 2011/08/04 03:20:41, r wrote: > where does 100 come from? http://codereview.appspot.com/4662080/diff/30007/src/pkg/exp/norm/maketables.... src/pkg/exp/norm/maketables.go:205: for i := 0; i < len(d); i++ { Cool. Removed function altogether. On 2011/08/04 03:20:41, r wrote: > for _, c := range d { > n += utf.EncodeRune(b[n:], c) > } > except i think this whole function can be > return string([]int(d)) http://codereview.appspot.com/4662080/diff/30007/src/pkg/exp/norm/maketables.... src/pkg/exp/norm/maketables.go:213: if len(*url) == 0 || (*url)[0] == '.' || (*url)[0] == '/' { Used your -local flag for consistency. On 2011/08/04 03:20:41, r wrote: > oh, i see, you're already close to my local flag. > the -url doc string should say that, or just use my -local notion. > you might take a look at it anyway. http://codereview.appspot.com/4662080/diff/30007/src/pkg/exp/norm/maketables.... src/pkg/exp/norm/maketables.go:233: if len(s) == 0 { Not worried about it. Removed. On 2011/08/04 03:20:41, r wrote: > this test is fine but i'm certain you don't need it unless you're worried about > performance for a common empty case http://codereview.appspot.com/4662080/diff/30007/src/pkg/exp/norm/maketables.... src/pkg/exp/norm/maketables.go:263: if point > MaxChar { Yes. On 2011/08/04 03:20:41, r wrote: > is this worth logging? http://codereview.appspot.com/4662080/diff/30007/src/pkg/exp/norm/maketables.... src/pkg/exp/norm/maketables.go:266: var state State On 2011/08/04 03:20:41, r wrote: > clearer to say state := SNormal Done. http://codereview.appspot.com/4662080/diff/30007/src/pkg/exp/norm/maketables.... src/pkg/exp/norm/maketables.go:290: logger.Fatalf("%U: bad decomp |%v|: \"%s\"", int(x), decmap, e) On 2011/08/04 03:20:41, r wrote: > use `` for the main string and you don't need to quote the "" Done. http://codereview.appspot.com/4662080/diff/30007/src/pkg/exp/norm/maketables.... src/pkg/exp/norm/maketables.go:322: line, err := input.ReadString('\n') On 2011/08/04 03:20:41, r wrote: > use ReadLine and you don't need to drop the '\n' below, although the code is > fine as is Leaving it as is, as it will result in various changes because of the type change of the returned values ([]byte instead of string). http://codereview.appspot.com/4662080/diff/30007/src/pkg/exp/norm/maketables.... src/pkg/exp/norm/maketables.go:398: return 0xAC00 <= rune && rune <= 0xD7A3 It is based on HangulBase + the number of Jamo combinations. Pretty much constant, unless Hangul is changed. Changed to constants. On 2011/08/04 03:20:41, r wrote: > D7A3 is a magic number. is it defined by unicode as a permanent cutoff? > in any case i'd name both these numbers as constants with comments so they are > easy to find and fix if necesary http://codereview.appspot.com/4662080/diff/30007/src/pkg/exp/norm/maketables.... src/pkg/exp/norm/maketables.go:401: func _ccc(rune int) uint8 { Legacy from when I copied it in. Removed. On 2011/08/04 03:20:41, r wrote: > why the _? http://codereview.appspot.com/4662080/diff/30007/src/pkg/exp/norm/maketables.... src/pkg/exp/norm/maketables.go:411: // Use bubble sort. On 2011/08/04 03:20:41, r wrote: > how long can d be? Maximum is 18, but that is highly exceptional. Typical is 1 or 2. For NFC|D it is no longer than 3. http://codereview.appspot.com/4662080/diff/30007/src/pkg/exp/norm/maketables.... src/pkg/exp/norm/maketables.go:429: dn := len(dcomp) On 2011/08/04 03:20:41, r wrote: > there's no need to store a variable like this. len is extremely cheap and > registerizable. Done. http://codereview.appspot.com/4662080/diff/30007/src/pkg/exp/norm/maketables.... src/pkg/exp/norm/maketables.go:433: for i := 0; i < dn; i++ { On 2011/08/04 03:20:41, r wrote: > for _, c := range dcomp { > } Done. http://codereview.appspot.com/4662080/diff/30007/src/pkg/exp/norm/maketables.... src/pkg/exp/norm/maketables.go:441: // Phase 0: pre-expand decomposition On 2011/08/04 03:20:41, r wrote: > s/$/./ Done. http://codereview.appspot.com/4662080/diff/30007/src/pkg/exp/norm/maketables.... src/pkg/exp/norm/maketables.go:442: for i := 0; i < len(chars); i++ { On 2011/08/04 03:20:41, r wrote: > for _, char := range chars { This doesn't work, though, as it assigns char by value. Using this form using the first iteration variable. http://codereview.appspot.com/4662080/diff/30007/src/pkg/exp/norm/maketables.... src/pkg/exp/norm/maketables.go:449: for i := 0; i < len(d); i++ { On 2011/08/04 03:20:41, r wrote: > for _. c := range d { Done. http://codereview.appspot.com/4662080/diff/30007/src/pkg/exp/norm/maketables.... src/pkg/exp/norm/maketables.go:456: for i := 0; i < len(chars); i++ { On 2011/08/04 03:20:41, r wrote: > for _, char := range chars { Done. http://codereview.appspot.com/4662080/diff/30007/src/pkg/exp/norm/maketables.... src/pkg/exp/norm/maketables.go:460: // Marks script-specific exclusions and version restricted On 2011/08/04 03:20:41, r wrote: > s/$/./ > and so on, for consistency Done. http://codereview.appspot.com/4662080/diff/30007/src/pkg/exp/norm/maketables.... src/pkg/exp/norm/maketables.go:479: for j := 0; j < len(f.decomp); j++ { On 2011/08/04 03:20:41, r wrote: > for _, decomp := range f.decomp { Done. http://codereview.appspot.com/4662080/diff/30007/src/pkg/exp/norm/maketables.... src/pkg/exp/norm/maketables.go:484: // Phase 2: forward and backward combining On 2011/08/04 03:20:41, r wrote: > s/$/./ Done. http://codereview.appspot.com/4662080/diff/30007/src/pkg/exp/norm/maketables.... src/pkg/exp/norm/maketables.go:485: for i := 0; i < len(chars); i++ { On 2011/08/04 03:20:41, r wrote: > and so on Done. http://codereview.appspot.com/4662080/diff/30007/src/pkg/exp/norm/maketables.... src/pkg/exp/norm/maketables.go:502: for i := 0; i < len(chars); i++ { On 2011/08/04 03:20:41, r wrote: > for i, char := range chars { Done. http://codereview.appspot.com/4662080/diff/30007/src/pkg/exp/norm/maketables.... src/pkg/exp/norm/maketables.go:519: if 0x1161 <= i && i < 0x1176 { // Jamo V On 2011/08/04 03:20:41, r wrote: > magic numbers Changed to constants. http://codereview.appspot.com/4662080/diff/30007/src/pkg/exp/norm/maketables.... src/pkg/exp/norm/maketables.go:523: if 0x11A8 <= i && i < 0x11C3 { // Jamo T On 2011/08/04 03:20:41, r wrote: > magic numbers Changed to constants. http://codereview.appspot.com/4662080/diff/30007/src/pkg/exp/norm/maketables.... src/pkg/exp/norm/maketables.go:537: table map[int]*trieNode On 2011/08/04 03:20:41, r wrote: > is this worth a map as opposed to an array (not slice)? if you used uint8 for > the index, it's only 2K per entry on amd64 and you can avoid index checks and > other unpleasantness. i think it's not much bigger and it's much much faster. > > if you expect mostly sparse arrays, you could make a variable-sized slice of > pairs > > so if memory's not constrained i'd go with array. also you could make it a > pointer to array and then leaf is a method that returns n.table==nil Memory nor speed is a constraint here. I used tables because the resulting code looked slightly better. Changed it to array, though. http://codereview.appspot.com/4662080/diff/30007/src/pkg/exp/norm/maketables.... src/pkg/exp/norm/maketables.go:564: func internal(n *trieNode) bool { On 2011/08/04 03:20:41, r wrote: > method on *trieNode? also i'd call this isInternal Done. http://codereview.appspot.com/4662080/diff/30007/src/pkg/exp/norm/maketables.... src/pkg/exp/norm/maketables.go:577: func insert(n *trieNode, rune int, value uint16) { On 2011/08/04 03:20:41, r wrote: > method on *trieNode? Done. http://codereview.appspot.com/4662080/diff/30007/src/pkg/exp/norm/maketables.... src/pkg/exp/norm/maketables.go:608: binary.Write(hasher, binary.LittleEndian, v) On 2011/08/04 03:20:41, r wrote: > this is overkill. i'd just write the two bytes by hand, but there's certainly > nothing wrong with what you have and leave it if you prefer. Replaced. The result is 1 char longer, but I can remove the import. http://codereview.appspot.com/4662080/diff/30007/src/pkg/exp/norm/maketables.... src/pkg/exp/norm/maketables.go:662: for i := 0; i < len(b); i++ { On 2011/08/04 03:20:41, r wrote: > for i, c := range b { > > (you could use > > for i, b := range b { > > and that's good Go but it might confuse you) Done. http://codereview.appspot.com/4662080/diff/30007/src/pkg/exp/norm/maketables.... src/pkg/exp/norm/maketables.go:718: var e uint16 = 0 On 2011/08/04 03:20:41, r wrote: > s/ = 0// > not wrong but unnecessary > another idiomatic way (either way is fine) is > e := uint16(0) > that makes it clear you want a zero I like the latter. http://codereview.appspot.com/4662080/diff/30007/src/pkg/exp/norm/maketables.... src/pkg/exp/norm/maketables.go:720: e = e | 0x8 On 2011/08/04 03:20:41, r wrote: > e |= 0x8 > etc. Done. http://codereview.appspot.com/4662080/diff/30007/src/pkg/exp/norm/maketables.... src/pkg/exp/norm/maketables.go:732: log.Fatal("Illegal quickcheck value.") On 2011/08/04 03:20:41, r wrote: > drop the period, give more information about what went wrong (which value?) Done. http://codereview.appspot.com/4662080/diff/30007/src/pkg/exp/norm/maketables.... src/pkg/exp/norm/maketables.go:769: for i := 0; i < len(chars); i++ { On 2011/08/04 03:20:41, r wrote: > for _, c := range chars { Done. http://codereview.appspot.com/4662080/diff/30007/src/pkg/exp/norm/maketables.... src/pkg/exp/norm/maketables.go:805: for i := 0; i < len(sa); i++ { On 2011/08/04 03:20:41, r wrote: > for _, a := range sa { Done. http://codereview.appspot.com/4662080/diff/30007/src/pkg/exp/norm/maketables.... src/pkg/exp/norm/maketables.go:816: fields := strings.Split(*url, "/") On 2011/08/04 03:20:41, r wrote: > isn't it better defined than that? third-last field? Almost verbatim copy from unicode/maketables.go. Changed the code to match the spec. http://codereview.appspot.com/4662080/diff/30007/src/pkg/exp/norm/maketables.... src/pkg/exp/norm/maketables.go:835: "// Generated by running\n"+ On 2011/08/04 03:20:41, r wrote: > use `` and this gets much nicer Cool! Done. http://codereview.appspot.com/4662080/diff/30007/src/pkg/exp/norm/maketables.... src/pkg/exp/norm/maketables.go:871: for i := 0; i < len(chars); i++ { On 2011/08/04 03:20:41, r wrote: > for i, char := range chars { Done. http://codereview.appspot.com/4662080/diff/30007/src/pkg/exp/norm/maketables.... src/pkg/exp/norm/maketables.go:886: fmt.Printf("// Total size of tables: %dKB (%d bytes)\n", size/1024, size) On 2011/08/04 03:20:41, r wrote: > (size + 512)/1024 Done. http://codereview.appspot.com/4662080/diff/30007/src/pkg/exp/norm/maketables.... src/pkg/exp/norm/maketables.go:891: for i := 0; i < len(chars); i++ { On 2011/08/04 03:20:41, r wrote: > for _, c := range chars { Done. http://codereview.appspot.com/4662080/diff/30007/src/pkg/exp/norm/maketables.... src/pkg/exp/norm/maketables.go:971: ftype, mode = FCompatibility, MDecomposed On 2011/08/04 03:20:41, r wrote: > default? Done. http://codereview.appspot.com/4662080/diff/30007/src/pkg/exp/norm/maketables.... src/pkg/exp/norm/maketables.go:980: qr = QCMaybe On 2011/08/04 03:20:41, r wrote: > default? Done. http://codereview.appspot.com/4662080/diff/30007/src/pkg/exp/norm/maketables.... src/pkg/exp/norm/maketables.go:1007: m := "%U: FAIL F:%d M:%d (was %v need Yes) %s\n" On 2011/08/04 03:20:41, r wrote: > why? > anyway it's a constant, not a var If an entry for the rune was missing in DerivedNormalizationProps.txt, it must be QCYes.
Sign in to reply to this message.
Hello r@golang.org, bsiegert@gmail.com, r@google.com, alex.brainman@gmail.com (cc: golang-dev@googlegroups.com), Please take another look.
Sign in to reply to this message.
LGTM this is a fine checkpoint http://codereview.appspot.com/4662080/diff/53001/src/pkg/Makefile File src/pkg/Makefile (right): http://codereview.appspot.com/4662080/diff/53001/src/pkg/Makefile#newcode81 src/pkg/Makefile:81: exp/eval\ d (the package is gone)
Sign in to reply to this message.
*** Submitted as http://code.google.com/p/go/source/detail?r=456c89dd1321 *** exp/norm: maketables tool for generating tables for normalization. R=r, bsiegert, r, alex.brainman CC=golang-dev http://codereview.appspot.com/4662080
Sign in to reply to this message.
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
