13 years, 10 months ago
(2011-06-14 22:27:33 UTC)
#2
for regexp.
most of the space is for the Ll and Lu exception tables.
i tried a bunch of different things but this was the
simplest i could come up with.
define fold(x) to be the runes equivalent to x.
incredibly, fold(Ll) != Ll+Lu and fold(Lu) != Lu+Ll.
in both cases the fold set is ever so slightly smaller.
russ
quick comments. i'll give another pass a little later http://codereview.appspot.com/4571074/diff/2011/src/pkg/unicode/letter.go File src/pkg/unicode/letter.go (right): http://codereview.appspot.com/4571074/diff/2011/src/pkg/unicode/letter.go#newcode281 src/pkg/unicode/letter.go:281: ...
13 years, 10 months ago
(2011-06-14 22:31:33 UTC)
#3
code looks good but needs the self-check test http://codereview.appspot.com/4571074/diff/2011/src/pkg/unicode/letter.go File src/pkg/unicode/letter.go (right): http://codereview.appspot.com/4571074/diff/2011/src/pkg/unicode/letter.go#newcode282 src/pkg/unicode/letter.go:282: func ...
13 years, 10 months ago
(2011-06-15 02:01:02 UTC)
#4
13 years, 10 months ago
(2011-06-15 02:16:10 UTC)
#5
On Tue, Jun 14, 2011 at 22:01, <r@golang.org> wrote:
> code looks good but needs the self-check test
will add
>
> http://codereview.appspot.com/4571074/diff/2011/src/pkg/unicode/letter.go#new...
> src/pkg/unicode/letter.go:282: func SimpleFold(dst []int, rune int) []int {
> can you explain the api? why are dst and a return value both needed? why
> not just return a []int of the folding? also, what about duplicates?
it appends to a slice so that a call does not imply an allocation.
in case-insensitive mode this is going to get called on every
character in a regular expression; i want to be able to reuse
the same output buffer.
SimpleFold does not consider the existing values in the slice
when appending new ones. among the new runes appended,
there are no duplicates.
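
A minimal sketch of the append-to-dst pattern described above. It is not the function from this change: appendFold is a hypothetical name, it uses rune instead of the int type in the reviewed signature, and it is built on the current standard library's unicode.SimpleFold(r rune) rune. The caller truncates and reuses one buffer, so a call allocates only when the buffer has to grow. Existing contents of dst are not inspected, and the runes appended by one call contain no duplicates because the orbit walk stops when it returns to the starting rune.

```go
package main

import (
	"fmt"
	"unicode"
)

// appendFold (hypothetical helper) appends r and every rune equivalent to r
// under simple case folding to dst and returns the extended slice.
// It does not look at what dst already holds.
func appendFold(dst []rune, r rune) []rune {
	dst = append(dst, r)
	// SimpleFold steps to the next rune in the orbit and eventually cycles
	// back to r, so each equivalent rune is appended exactly once.
	for f := unicode.SimpleFold(r); f != r; f = unicode.SimpleFold(f) {
		dst = append(dst, f)
	}
	return dst
}

func main() {
	buf := make([]rune, 0, 8) // one buffer reused for every call
	for _, r := range "Go k1" {
		buf = appendFold(buf[:0], r)
		fmt.Printf("%q -> %q\n", r, buf)
	}
}
```

Returning the slice follows the same convention as the built-in append: the backing array may be reallocated, so the caller must keep the returned value.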
13 years, 10 months ago
(2011-06-15 03:13:40 UTC)
#6
On Jun 15, 2011, at 12:15 PM, Russ Cox wrote:
> On Tue, Jun 14, 2011 at 22:01, <r@golang.org> wrote:
>> code looks good but needs the self-check test
>
> will add
>
>>
>> http://codereview.appspot.com/4571074/diff/2011/src/pkg/unicode/letter.go#new...
>> src/pkg/unicode/letter.go:282: func SimpleFold(dst []int, rune int) []int {
>> can you explain the api? why are dst and a return value both needed? why
>> not just return a []int of the folding? also, what about duplicates?
>
> it appends to a slice so that a call does not imply an allocation.
> in case-insensitive mode this is going to get called on every
> character in a regular expression; i want to be able to reuse
> the same output buffer.
that's what i figured, but it's surprising enough to be mentioned in the
comment.
> SimpleFold does not consider the existing values in the slice
> when appending new ones. among the new runes appended,
> there are no duplicates.
ditto
LGTM http://codereview.appspot.com/4571074/diff/15001/src/pkg/unicode/letter.go File src/pkg/unicode/letter.go (right): http://codereview.appspot.com/4571074/diff/15001/src/pkg/unicode/letter.go#newcode289 src/pkg/unicode/letter.go:289: // the ``simple case folding.'' Among the code ...
13 years, 10 months ago
(2011-06-16 21:23:24 UTC)
#9
13 years, 10 months ago
(2011-06-16 21:40:04 UTC)
#10
On Thu, Jun 16, 2011 at 17:23, <r@golang.org> wrote:
> LGTM
>
> http://codereview.appspot.com/4571074/diff/15001/src/pkg/unicode/letter.go
> File src/pkg/unicode/letter.go (right):
>
> http://codereview.appspot.com/4571074/diff/15001/src/pkg/unicode/letter.go#ne...
> src/pkg/unicode/letter.go:289: // the ``simple case folding.'' Among the code points equivalent to
> i still think that these quotes look terrible and are unnecessary. i'm
> not even sure why they're there.
They're there because if I write
// ... according to the Unicode-defined simple case folding
I was worried that it sounded like I was the one calling
the rules simple as opposed to referring to the rule whose
name is 'simple case folding'.
I will remove them.
Russ
13 years, 10 months ago
(2011-06-16 21:47:30 UTC)
#11
On 17/06/2011, at 7:40 AM, Russ Cox wrote:
> On Thu, Jun 16, 2011 at 17:23, <r@golang.org> wrote:
>> LGTM
>>
>> http://codereview.appspot.com/4571074/diff/15001/src/pkg/unicode/letter.go
>> File src/pkg/unicode/letter.go (right):
>>
>> http://codereview.appspot.com/4571074/diff/15001/src/pkg/unicode/letter.go#ne...
>> src/pkg/unicode/letter.go:289: // the ``simple case folding.'' Among the code points equivalent to
>> i still think that these quotes look terrible and are unnecessary. i'm
>> not even sure why they're there.
>
> They're there because if I write
>
> // ... according to the Unicode-defined simple case folding
>
> I was worried that it sounded like I was the one calling
> the rules simple as opposed to referring to the rule whose
> name is 'simple case folding'.
>
> I will remove them.
the proximity makes the parse clear. you don't need them.
-rob
*** Submitted as http://code.google.com/p/go/source/detail?r=3a81409b9013 *** unicode: add case folding tables R=r, r CC=golang-dev http://codereview.appspot.com/4571074
13 years, 10 months ago
(2011-06-16 21:56:32 UTC)
#15
Issue 4571074: code review 4571074: unicode: add case folding tables
(Closed)
Created 13 years, 10 months ago by rsc
Modified 13 years, 10 months ago
Comments: 19