Issue 8056043: code review 8056043: runtime: Implement faster equals for strings and bytes.

khr

Hello bradfitz@golang.org (cc: golang-dev@googlegroups.com), I'd like you to review this change to https://khr%40golang.org@code.google.com/p/go/

12 years, 4 months ago (2013-03-27 22:14:12 UTC) #1

r

run go vet. i think it will ask for a few tweaks to the assembly

12 years, 4 months ago (2013-03-27 22:34:05 UTC) #2

bradfitz

LGTM but I'm not qualified to review runtime/asm_arm.s On Wed, Mar 27, 2013 at 3:14 ...

12 years, 4 months ago (2013-03-27 22:51:25 UTC) #3

bradfitz

https://codereview.appspot.com/8056043/diff/11001/src/pkg/bytes/bytes_test.go File src/pkg/bytes/bytes_test.go (right): https://codereview.appspot.com/8056043/diff/11001/src/pkg/bytes/bytes_test.go#newcode64 src/pkg/bytes/bytes_test.go:64: {[]byte("abcdefgh"), []byte("abcdefgh"), 0}, comment above this line about what ...

12 years, 4 months ago (2013-03-27 23:35:20 UTC) #4

khr1

On Wed, Mar 27, 2013 at 4:35 PM, <bradfitz@golang.org> wrote: > > https://codereview.appspot.com/8056043/diff/11001/src/pkg/bytes/bytes_test.go ...

12 years, 4 months ago (2013-03-28 02:34:48 UTC) #5

dave_cheney.net

Thanks. I'll try this on some arm systems soon and get some benchmarks. I'd like ...

12 years, 4 months ago (2013-03-28 03:17:31 UTC) #6

dave_cheney.net

https://codereview.appspot.com/8056043/diff/18001/src/pkg/bytes/asm_386.s File src/pkg/bytes/asm_386.s (right): https://codereview.appspot.com/8056043/diff/18001/src/pkg/bytes/asm_386.s#newcode19 src/pkg/bytes/asm_386.s:19: TEXT ·Equal(SB),7,$12 I would like to suggest that this ...

12 years, 4 months ago (2013-03-28 03:26:05 UTC) #7

dave_cheney.net

Some excellent speedups on arm (chromebook) for Equals, but a regression on some other benchmarks. ...

12 years, 4 months ago (2013-03-28 03:55:42 UTC) #8

Some excellent speedups on arm (chromebook) for Equals, but a
regression on some other benchmarks. A very brief look at the profile
makes me want to blame the additional branch in Index although there
doesn't appear to be any stack splitting.

benchmark                        old ns/op    new ns/op    delta
BenchmarkReadString                 125100       127392   +1.83%
BenchmarkIndexByte32                    76           90  +18.74%
BenchmarkIndexByte4K                  7605         7298   -4.04%
BenchmarkIndexByte4M               7574065      7567863   -0.08%
BenchmarkIndexByte64M            120986945    120754408   -0.19%
BenchmarkIndexBytePortable32           175          175   +0.00%
BenchmarkIndexBytePortable4K         17347        17338   -0.05%
BenchmarkIndexBytePortable4M      18353531     18341798   -0.06%
BenchmarkIndexBytePortable64M    293546342    293211533   -0.11%
BenchmarkEqual32                        77           63  -17.81%
BenchmarkEqual4K                      7289         1855  -74.55%
BenchmarkEqual4M                   8367520      3209180  -61.65%
BenchmarkEqual64M                142515029     61400849  -56.92%
BenchmarkEqualPort32                   224          224   +0.00%
BenchmarkEqualPort4K                 24263        24250   -0.05%
BenchmarkEqualPortable4M          25179603     25247419   +0.27%
BenchmarkEqualPortable64M        403745450    403779625   +0.01%
BenchmarkIndex32                       816         1302  +59.56%
BenchmarkIndex4K                    120982       192998  +59.53%
BenchmarkIndex4M                 124260183    193296808  +55.56%
BenchmarkIndex64M               1988046002   3095359752  +55.70%
BenchmarkIndexEasy32                   144          161  +11.81%
BenchmarkIndexEasy4K                  7352         7369   +0.23%
BenchmarkIndexEasy4M               7578549      7575628   -0.04%
BenchmarkIndexEasy64M            120737458    120832933   +0.08%
BenchmarkCount32                       818         1241  +51.71%
BenchmarkCount4K                    120977       188073  +55.46%
BenchmarkCount4M                 124265360    192719445  +55.09%
BenchmarkCount64M               1988792002   3085427626  +55.14%
BenchmarkCountEasy32                   142          162  +14.08%
BenchmarkCountEasy4K                  7349         7361   +0.16%
BenchmarkCountEasy4M               7574511      7574003   -0.01%
BenchmarkCountEasy64M            120729902    120718660   -0.01%
BenchmarkFields                  114759929    115037127   +0.24%
BenchmarkFieldsFunc              113932977    113976645   +0.04%
BenchmarkTrimSpace                     345          339   -1.74%

benchmark                         old MB/s     new MB/s  speedup
BenchmarkReadString                 261.93       257.22    0.98x
BenchmarkIndexByte32                419.35       353.35    0.84x
BenchmarkIndexByte4K                538.54       561.22    1.04x
BenchmarkIndexByte4M                553.77       554.23    1.00x
BenchmarkIndexByte64M               554.68       555.75    1.00x
BenchmarkIndexBytePortable32        182.72       182.41    1.00x
BenchmarkIndexBytePortable4K        236.11       236.24    1.00x
BenchmarkIndexBytePortable4M        228.53       228.67    1.00x
BenchmarkIndexBytePortable64M       228.61       228.88    1.00x
BenchmarkEqual32                    412.82       502.15    1.22x
BenchmarkEqual4K                    561.90      2207.99    3.93x
BenchmarkEqual4M                    501.26      1306.97    2.61x
BenchmarkEqual64M                   470.89      1092.96    2.32x
BenchmarkEqualPort32                142.66       142.81    1.00x
BenchmarkEqualPort4K                168.82       168.91    1.00x
BenchmarkEqualPortable4M            166.58       166.13    1.00x
BenchmarkEqualPortable64M           166.22       166.20    1.00x
BenchmarkIndex32                     39.19        24.56    0.63x
BenchmarkIndex4K                     33.86        21.22    0.63x
BenchmarkIndex4M                     33.75        21.70    0.64x
BenchmarkIndex64M                    33.76        21.68    0.64x
BenchmarkIndexEasy32                221.74       198.25    0.89x
BenchmarkIndexEasy4K                557.11       555.79    1.00x
BenchmarkIndexEasy4M                553.44       553.66    1.00x
BenchmarkIndexEasy64M               555.82       555.39    1.00x
BenchmarkCount32                     39.08        25.78    0.66x
BenchmarkCount4K                     33.86        21.78    0.64x
BenchmarkCount4M                     33.75        21.76    0.64x
BenchmarkCount64M                    33.74        21.75    0.64x
BenchmarkCountEasy32                225.27       196.43    0.87x
BenchmarkCountEasy4K                557.32       556.37    1.00x
BenchmarkCountEasy4M                553.74       553.78    1.00x
BenchmarkCountEasy64M               555.86       555.91    1.00x
BenchmarkFields                       9.14         9.12    1.00x
BenchmarkFieldsFunc                   9.20         9.20    1.00x
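
The delta and speedup columns above are consistent with simple ratios of the raw numbers. A small sketch of that arithmetic follows; the helper names deltaPercent and speedup are hypothetical and are not taken from the tool that produced the tables.

```go
package main

import "fmt"

// deltaPercent returns the relative change in ns/op as a percentage.
// Negative values mean the new code is faster.
// (Hypothetical helper, named for illustration only.)
func deltaPercent(oldNs, newNs float64) float64 {
	return (newNs - oldNs) / oldNs * 100
}

// speedup returns the throughput ratio new/old in MB/s.
// Values above 1 mean the new code is faster.
// (Hypothetical helper, named for illustration only.)
func speedup(oldMBs, newMBs float64) float64 {
	return newMBs / oldMBs
}

func main() {
	// BenchmarkEqual4K row from the tables above.
	fmt.Printf("%+.2f%%\n", deltaPercent(7289, 1855)) // -74.55%
	fmt.Printf("%.2fx\n", speedup(561.90, 2207.99))   // 3.93x
}
```

Read this way, the +59.56% on BenchmarkIndex32 (816 ns/op to 1302 ns/op) is a slowdown of roughly 1.6x, which matches the 0.63x entry in the MB/s table.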


khr1

It looks like both the slow Index and Count ones are calling eq with 7-byte ...

12 years, 4 months ago (2013-03-28 04:25:42 UTC) #9

It looks like both the slow Index and Count ones are calling eq with 7-byte
slices.  For some reason that parameter makes it slow.  I'll see what I can
do to fix that; 7-byte eq is probably pretty common.




remyoudompheng

I have added the following benchmarks to runtime/string_test.go: func BenchmarkCompareStringBigUnaligned(b *testing.B) { bytes := make([]byte, ...

12 years, 4 months ago (2013-03-28 07:31:24 UTC) #10

remyoudompheng

https://codereview.appspot.com/8056043/diff/18001/src/pkg/bytes/bytes_test.go File src/pkg/bytes/bytes_test.go (right): https://codereview.appspot.com/8056043/diff/18001/src/pkg/bytes/bytes_test.go#newcode92 src/pkg/bytes/bytes_test.go:92: I would have added a test like this: func ...

12 years, 4 months ago (2013-03-28 07:43:58 UTC) #11

remyoudompheng

On 2013/03/28 07:31:24, remyoudompheng wrote: > I could squeeze a bit more (20%) on ARM ...

12 years, 4 months ago (2013-03-28 07:51:10 UTC) #12

khr1

Dave: I've moved bytes.Equal to the runtime files. Good idea, that puts all that code ...

12 years, 4 months ago (2013-03-29 07:06:48 UTC) #13

Dave:
I've moved bytes.Equal to the runtime files.  Good idea, that puts all that
code in one place.
Try your benchmarks again: I've special-cased <8 byte comparisons, so they
should be only slightly slower than before.

Remy:
I've included your test and your optimization, thanks.
Did you check in your benchmarks, or should I include them in my patch?

Brad:
I've changed the amd64/386 code quite a bit; it seems that REP;CMPSx has a
significant startup cost and we're better off just explicitly looping.
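
A rough Go sketch of the shape described in this message: comparisons shorter than 8 bytes take a plain byte loop, and longer ones use an explicit word loop. The CL itself does this in assembly; the function name equalSketch, the 8-byte word size, and the use of encoding/binary here are assumptions made only for illustration.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// equalSketch is a hypothetical portable stand-in for the assembly
// discussed in this thread. It is not the code from the CL.
func equalSketch(a, b []byte) bool {
	if len(a) != len(b) {
		return false
	}
	// Short inputs: a plain byte loop, so tiny comparisons pay no setup cost.
	if len(a) < 8 {
		for i := range a {
			if a[i] != b[i] {
				return false
			}
		}
		return true
	}
	// Longer inputs: explicit loop comparing 8 bytes at a time.
	for len(a) >= 8 {
		if binary.LittleEndian.Uint64(a) != binary.LittleEndian.Uint64(b) {
			return false
		}
		a, b = a[8:], b[8:]
	}
	// Remaining 0 to 7 bytes.
	for i := range a {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(equalSketch([]byte("abcdefg"), []byte("abcdefg")))     // true
	fmt.Println(equalSketch([]byte("abcdefgh1"), []byte("abcdefgh2"))) // false
}
```

The 7-byte case that Index and Count hit in the earlier benchmarks goes through the first loop only, which is why a short-length special case matters there.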


On Thu, Mar 28, 2013 at 12:51 AM, <remyoudompheng@gmail.com> wrote:

> On 2013/03/28 07:31:24, remyoudompheng wrote:
>> I could squeeze a bit more (20%) on ARM by using the following form for the
>> unaligned case (Dave, do you want to run your benchmarks on this?):
>>
>>         // B is not aligned.  Compare the word at A
>>         // with the value constructed from the
>>         // two words near B.
>>         // Note: this code is little-endian specific.
>>         // Beware that R9, R10 are reserved for m, g.
>>         // R11 can be used by linker.
>>         MOVW    (R4), R7
>> one_aligned:
>>         CMP     $4, R3
>>         BLO     tail
>>         MOVW    R7>>R5, R12
>>         MOVW.W  4(R4), R7
>>         ORR     R7<<R6, R12
>>         MOVW.P  4(R1), R8
>>         ADD     $4, R2
>>         SUB     $4, R3
>>         CMP     R8, R12
>>         BEQ     one_aligned
>>         MOVW    $0, R0
>>         RET
>
> More precisely, I obtained this result compared to the original
> proposal:
>
>
> benchmark                             old ns/op    new ns/op    delta
> BenchmarkCompareStringBigUnaligned      2894508      2447519  -15.44%
> BenchmarkCompareStringBig               1802048      1811376   +0.52%
>
> benchmark                              old MB/s     new MB/s  speedup
> BenchmarkCompareStringBigUnaligned       362.27       428.43    1.18x
> BenchmarkCompareStringBig                581.89       578.89    0.99x
>
> https://codereview.appspot.com/8056043/
>
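
The unaligned-case idea in the quoted ARM assembly (compare each word of A against a value built from the two aligned words around B) can be illustrated in Go. This is only a sketch: the function name equalShifted, the use of a byte-slice index modulo 4 as a stand-in for pointer misalignment, and the fallback to a byte loop for the aligned case and the tail are all assumptions, not part of the CL.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// equalShifted reports whether a equals buf[off:off+len(a)].
// In the word loop, buf is only read at indexes that are multiples of 4,
// imitating aligned loads; off%4 plays the role of B's misalignment.
// Like the quoted assembly, the word construction is little-endian specific.
// (Hypothetical helper written for illustration.)
func equalShifted(a, buf []byte, off int) bool {
	n := len(a)
	if off < 0 || off+n > len(buf) {
		return false
	}
	i := 0
	k := off % 4
	base := off - k // aligned index at or below off
	if k != 0 && n >= 4 && base+8 <= len(buf) {
		shr := uint(8 * k) // shift applied to the lower aligned word
		shl := 32 - shr    // shift applied to the upper aligned word
		lo := binary.LittleEndian.Uint32(buf[base:])
		for n-i >= 4 && base+8 <= len(buf) {
			hi := binary.LittleEndian.Uint32(buf[base+4:])
			// Value constructed from the two aligned words near B.
			w := lo>>shr | hi<<shl
			if w != binary.LittleEndian.Uint32(a[i:]) {
				return false
			}
			lo = hi // the upper word becomes the lower word of the next step
			base += 4
			i += 4
		}
	}
	// Aligned case and leftover bytes: plain byte comparison, for brevity.
	for ; i < n; i++ {
		if a[i] != buf[off+i] {
			return false
		}
	}
	return true
}

func main() {
	buf := []byte("0123456789abcdefghij")
	fmt.Println(equalShifted([]byte("123456789ab"), buf, 1)) // true
	fmt.Println(equalShifted([]byte("123456789aX"), buf, 1)) // false
}
```

Each loop step loads only one new aligned word and reuses the previous one, which is the same reuse the assembly gets by keeping R7 live across iterations.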


bradfitz

LGTM Assembly people should chime in, though. https://codereview.appspot.com/8056043/diff/39001/src/pkg/bytes/bytes_test.go File src/pkg/bytes/bytes_test.go (right): https://codereview.appspot.com/8056043/diff/39001/src/pkg/bytes/bytes_test.go#newcode335 src/pkg/bytes/bytes_test.go:335: buf := ...

12 years, 4 months ago (2013-03-29 07:26:09 UTC) #14

remyoudompheng

On 2013/03/29 07:06:48, khr1 wrote: > Dave: > I've moved bytes.Equal to the runtime files. ...

12 years, 4 months ago (2013-03-29 07:34:57 UTC) #15

khr1

On Fri, Mar 29, 2013 at 12:26 AM, <bradfitz@golang.org> wrote: > LGTM > > Assembly ...

12 years, 4 months ago (2013-03-29 07:38:26 UTC) #16

On Fri, Mar 29, 2013 at 12:26 AM, <bradfitz@golang.org> wrote:

> LGTM
>
> Assembly people should chime in, though.
>
> https://codereview.appspot.com/8056043/diff/39001/src/pkg/bytes/bytes_test.go
> File src/pkg/bytes/bytes_test.go (right):
>
> https://codereview.appspot.com/8056043/diff/39001/src/pkg/bytes/bytes_test.go#newcode335
> src/pkg/bytes/bytes_test.go:335: buf := make([]byte, 4)
> or:
> var buf [4]byte
>
> https://codereview.appspot.com/8056043/diff/39001/src/pkg/runtime/asm_386.s
> File src/pkg/runtime/asm_386.s (right):
>
> https://codereview.appspot.com/8056043/diff/39001/src/pkg/runtime/asm_386.s#newcode996
> src/pkg/runtime/asm_386.s:996: TEXT bytes·Equal(SB),7,$0
> This seems like new precedent. At least leave a comment above "func
> equalPortable" in pkg/bytes/bytes.go directing people where to find
> Equal.
>
> Like:
>
> // func Equal is implemented in assembly in pkg/runtime for
> // amd64, 386, and arm.
>
> Then grepping for "func Equal" finds it.
>
> https://codereview.appspot.com/8056043/diff/39001/src/pkg/runtime/mapspeed_test.go
> File src/pkg/runtime/mapspeed_test.go (right):
>
> https://codereview.appspot.com/8056043/diff/39001/src/pkg/runtime/mapspeed_test.go#newcode124
> src/pkg/runtime/mapspeed_test.go:124: key := strings.Repeat("X", 1<<20)
> move this up a line and just say m[key] = true
>

I want two different strings so the memory comparison has to happen.  With
your change, it could (and does) detect equality by just comparing pointers.
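
A small Go sketch of the point being made: if the key stored in the map and the key used for lookup are the same string value, equality can be decided from the pointer and length alone, so the benchmark needs two separately built strings with identical contents. The helper names below are hypothetical, and the sketch assumes each strings.Repeat call allocates its own buffer.

```go
package main

import (
	"fmt"
	"strings"
)

// distinctKeys returns two strings with identical contents that are built
// by separate calls, so they are assumed not to share a backing array.
// (Hypothetical helper, named for illustration only.)
func distinctKeys(n int) (string, string) {
	return strings.Repeat("X", n), strings.Repeat("X", n)
}

// lookupWithDistinctKey stores under one copy and looks up with the other,
// so a successful lookup has to compare the string contents.
func lookupWithDistinctKey(n int) bool {
	stored, probe := distinctKeys(n)
	m := map[string]bool{stored: true}
	return m[probe]
}

func main() {
	fmt.Println(lookupWithDistinctKey(1 << 20)) // true
}
```

With `m[key] = true` followed by a lookup of the same `key`, the result is the same but the contents may never be read, which would defeat the purpose of the benchmark.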


> https://codereview.appspot.com/8056043/


remyoudompheng

https://codereview.appspot.com/8056043/diff/39001/src/pkg/runtime/asm_386.s File src/pkg/runtime/asm_386.s (right): https://codereview.appspot.com/8056043/diff/39001/src/pkg/runtime/asm_386.s#newcode1027 src/pkg/runtime/asm_386.s:1027: PCMPEQB X1, X0 Do we explicitly require SSE2 on ...

12 years, 4 months ago (2013-03-29 07:38:39 UTC) #17

khr1

The SSE way is about 2x as fast, at least when the strings are intermediate ...

12 years, 4 months ago (2013-03-29 07:53:58 UTC) #18

dave_cheney.net

Thanks Keith, I'll get to benchmarking. https://codereview.appspot.com/8056043/diff/39001/src/pkg/runtime/asm_386.s File src/pkg/runtime/asm_386.s (right): https://codereview.appspot.com/8056043/diff/39001/src/pkg/runtime/asm_386.s#newcode996 src/pkg/runtime/asm_386.s:996: TEXT bytes·Equal(SB),7,$0 On ...

12 years, 4 months ago (2013-03-29 09:03:03 UTC) #19

albert.strasheim

Looks very good compared to glibc 4.7.2 on a Core i7-3720QM: BenchmarkRuntimeMemequal1K 25993.11 MB/s BenchmarkRuntimeMemequal1M ...

12 years, 4 months ago (2013-03-29 14:25:00 UTC) #20

minux1

On Fri, Mar 29, 2013 at 3:53 PM, Keith Randall <khr@google.com> wrote: > The SSE ...

12 years, 4 months ago (2013-03-29 14:44:25 UTC) #21

bradfitz

https://codereview.appspot.com/8056043/diff/39001/src/pkg/runtime/mapspeed_test.go File src/pkg/runtime/mapspeed_test.go (right): https://codereview.appspot.com/8056043/diff/39001/src/pkg/runtime/mapspeed_test.go#newcode124 src/pkg/runtime/mapspeed_test.go:124: key := strings.Repeat("X", 1<<20) On 2013/03/29 07:26:09, bradfitz wrote: ...

12 years, 4 months ago (2013-03-29 15:44:03 UTC) #22

khr1

> This was my suggestion. I believe there are precedents from the time > package. ...

12 years, 4 months ago (2013-03-29 17:16:39 UTC) #23

ality

minux <minux.ma@gmail.com> once said: > On Fri, Mar 29, 2013 at 3:53 PM, Keith Randall ...

12 years, 4 months ago (2013-03-29 17:58:09 UTC) #24

r

https://codereview.appspot.com/8056043/diff/35002/src/pkg/bytes/bytes_test.go File src/pkg/bytes/bytes_test.go (right): https://codereview.appspot.com/8056043/diff/35002/src/pkg/bytes/bytes_test.go#newcode94 src/pkg/bytes/bytes_test.go:94: b1 := make([]byte, 64) const size = 64 https://codereview.appspot.com/8056043/diff/35002/src/pkg/bytes/bytes_test.go#newcode102 ...

12 years, 4 months ago (2013-03-29 18:39:46 UTC) #25

ality

Anthony Martin <ality@pbrane.org> once said: > minux <minux.ma@gmail.com> once said: > > On Fri, Mar ...

12 years, 4 months ago (2013-03-29 19:07:09 UTC) #26

minux1

On Sat, Mar 30, 2013 at 1:58 AM, Anthony Martin <ality@pbrane.org> wrote: > According to ...

12 years, 4 months ago (2013-03-30 16:15:25 UTC) #27

ality

minux <minux.ma@gmail.com> once said: > On Sat, Mar 30, 2013 at 1:58 AM, Anthony Martin ...

12 years, 4 months ago (2013-03-30 16:51:48 UTC) #28

minux1

On Sun, Mar 31, 2013 at 12:51 AM, Anthony Martin <ality@pbrane.org> wrote: > minux <minux.ma@gmail.com> ...

12 years, 4 months ago (2013-03-30 17:08:48 UTC) #29

dave_cheney.net

I benchmarked patchset 10 on my chromebook, the results are mixed. The regressions in the ...

12 years, 4 months ago (2013-03-31 00:04:11 UTC) #30

I benchmarked patchset 10 on my chromebook; the results are mixed. The
regressions in the bytes benchmarks appear to flow through into strings and
fmt.

lucky(~) % cat bytes.txt 
benchmark                           old ns/op    new ns/op    delta
BenchmarkReadString                    112131       127869  +14.04%
BenchmarkBufferNotEmptyWriteRead      3634051      3681321   +1.30%
BenchmarkBufferFullSmallReads          440993       441639   +0.15%
BenchmarkIndexByte32                       76           76   +0.00%
BenchmarkIndexByte4K                     7279         7286   +0.10%
BenchmarkIndexByte4M                  7533005      7564464   +0.42%
BenchmarkIndexByte64M               120355191    120813462   +0.38%
BenchmarkIndexBytePortable32              174          175   +0.57%
BenchmarkIndexBytePortable4K            17302        17342   +0.23%
BenchmarkIndexBytePortable4M         18276630     18349412   +0.40%
BenchmarkIndexBytePortable64M       292254766    293383533   +0.39%
BenchmarkEqual32                           90           60  -33.22%   
BenchmarkEqual4K                         7285         1853  -74.56%
BenchmarkEqual4M                      8829644      3517534  -60.16%
BenchmarkEqual64M                   138950637     59169793  -57.42%
BenchmarkEqualPort32                      223          224   +0.45%
BenchmarkEqualPort4K                    24201        24255   +0.22%
BenchmarkEqualPortable4M             25031372     25131132   +0.40%
BenchmarkEqualPortable64M           402236591    403740442   +0.37%
BenchmarkIndex32                          800         1182  +47.75%
BenchmarkIndex4K                       118306       178704  +51.05%
BenchmarkIndex4M                    121457245    183889629  +51.40%
BenchmarkIndex64M                  1943517375   2939630294  +51.25%
BenchmarkIndexEasy32                      142          142   +0.00%
BenchmarkIndexEasy4K                     7331         7336   +0.07%
BenchmarkIndexEasy4M                  7548436      7543304   -0.07%
BenchmarkIndexEasy64M               120458129    120302727   -0.13%
BenchmarkCount32                          806         1182  +46.65%
BenchmarkCount4K                       118344       178649  +50.96%
BenchmarkCount4M                    121673656    183400741  +50.73%
BenchmarkCount64M                  1944794002   2934607085  +50.90%
BenchmarkCountEasy32                      144          141   -2.08%
BenchmarkCountEasy4K                     7360         7335   -0.34%
BenchmarkCountEasy4M                  7554226      7564146   +0.13%
BenchmarkCountEasy64M               120317272    120715047   +0.33%
BenchmarkFields                     116213231    114222702   -1.71%
BenchmarkFieldsFunc                 114475948    113263035   -1.06%
BenchmarkTrimSpace                        342          336   -1.75%

benchmark                            old MB/s     new MB/s  speedup
BenchmarkReadString                    292.23       256.26    0.88x
BenchmarkIndexByte32                   419.49       419.49    1.00x
BenchmarkIndexByte4K                   562.70       562.14    1.00x
BenchmarkIndexByte4M                   556.79       554.47    1.00x
BenchmarkIndexByte64M                  557.59       555.48    1.00x
BenchmarkIndexBytePortable32           183.22       182.84    1.00x
BenchmarkIndexBytePortable4K           236.72       236.18    1.00x
BenchmarkIndexBytePortable4M           229.49       228.58    1.00x
BenchmarkIndexBytePortable64M          229.62       228.74    1.00x
BenchmarkEqual32                       354.47       530.72    1.50x
BenchmarkEqual4K                       562.22      2210.19    3.93x
BenchmarkEqual4M                       475.03      1192.40    2.51x
BenchmarkEqual64M                      482.97      1134.17    2.35x
BenchmarkEqualPort32                   143.47       142.77    1.00x
BenchmarkEqualPort4K                   169.24       168.87    1.00x
BenchmarkEqualPortable4M               167.56       166.90    1.00x
BenchmarkEqualPortable64M              166.84       166.22    1.00x
BenchmarkIndex32                        39.97        27.07    0.68x
BenchmarkIndex4K                        34.62        22.92    0.66x
BenchmarkIndex4M                        34.53        22.81    0.66x
BenchmarkIndex64M                       34.53        22.83    0.66x
BenchmarkIndexEasy32                   224.99       224.98    1.00x
BenchmarkIndexEasy4K                   558.69       558.33    1.00x
BenchmarkIndexEasy4M                   555.65       556.03    1.00x
BenchmarkIndexEasy64M                  557.11       557.83    1.00x
BenchmarkCount32                        39.69        27.06    0.68x
BenchmarkCount4K                        34.61        22.93    0.66x
BenchmarkCount4M                        34.47        22.87    0.66x
BenchmarkCount64M                       34.51        22.87    0.66x
BenchmarkCountEasy32                   221.56       225.87    1.02x
BenchmarkCountEasy4K                   556.51       558.36    1.00x
BenchmarkCountEasy4M                   555.23       554.50    1.00x
BenchmarkCountEasy64M                  557.77       555.93    1.00x
BenchmarkFields                          9.02         9.18    1.02x
BenchmarkFieldsFunc                      9.16         9.26    1.01x
lucky(~) % cat strings.txt 
benchmark                           old ns/op    new ns/op    delta
BenchmarkGenericNoMatch                  7555         7427   -1.69%
BenchmarkGenericMatch1                  29053        29279   +0.78%
BenchmarkGenericMatch2                 175206       188767   +7.74%
BenchmarkSingleMaxSkipping               5981         5976   -0.08%
BenchmarkSingleLongSuffixFail            6083         6043   -0.66%
BenchmarkSingleMatch                   376140       385756   +2.56%
BenchmarkByteByteNoMatch                 2301         2301   +0.00%
BenchmarkByteByteMatch                   4902         4913   +0.22%
BenchmarkByteStringMatch                 7346         7210   -1.85%
BenchmarkHTMLEscapeNew                   2305         2339   +1.48%
BenchmarkHTMLEscapeOld                   3783         3812   +0.77%
BenchmarkByteByteReplaces               18253        18348   +0.52%
BenchmarkByteByteMap                     9875         9685   -1.92%
BenchmarkIndexRune                        246          254   +3.25%
BenchmarkIndexRuneFastPath                130          116  -10.77%
BenchmarkIndex                            129          129   +0.00%
BenchmarkMapNoChanges                     799          843   +5.51%
BenchmarkIndexHard1                   7637311      7636978   -0.00%
BenchmarkIndexHard2                   7647121      7647579   +0.01%
BenchmarkIndexHard3                   7628415      7637353   +0.12%
BenchmarkCountHard1                   8250395      8260440   +0.12%
BenchmarkCountHard2                   8253451      8260638   +0.09%
BenchmarkCountHard3                   8250371      8263897   +0.16%
BenchmarkIndexTorture                   47275        47389   +0.24%
BenchmarkCountTorture                   49252        49139   -0.23%
BenchmarkCountTortureOverlapping     52097076     29447134  -43.48%
BenchmarkFields                      90120216     89680439   -0.49%
BenchmarkFieldsFunc                  90004377     89325775   -0.75%
BenchmarkSplit1                      76455569     76104520   -0.46%
BenchmarkSplit2                      20392480     19932127   -2.26%
BenchmarkSplit3                      18983460     18571189   -2.17%

benchmark                            old MB/s     new MB/s  speedup
BenchmarkSingleMaxSkipping            1671.80      1673.30    1.00x
BenchmarkSingleLongSuffixFail          164.71       165.78    1.01x
BenchmarkSingleMatch                    39.88        38.88    0.97x
BenchmarkFields                         11.64        11.69    1.00x
BenchmarkFieldsFunc                     11.65        11.74    1.01x
lucky(~) % cat fmt.txt 
benchmark                      old ns/op    new ns/op    delta
BenchmarkSprintfEmpty                424          424   +0.00%
BenchmarkSprintfString              1499         1451   -3.20%
BenchmarkSprintfInt                 1261         1438  +14.04%
BenchmarkSprintfIntInt              2042         2031   -0.54%
BenchmarkSprintfPrefixedInt         2028         2165   +6.76%
BenchmarkSprintfFloat               3579         3686   +2.99%
BenchmarkManyArgs                   5573         6130   +9.99%
BenchmarkScanInts                5413886      5166830   -4.56%
BenchmarkScanRecursiveInt        6447875      6478815   +0.48%


bradfitz

Which arch is your chromebook? On Sat, Mar 30, 2013 at 5:04 PM, <dave@cheney.net> wrote: ...

12 years, 4 months ago (2013-03-31 00:54:53 UTC) #31

Which arch is your chromebook?

On Sat, Mar 30, 2013 at 5:04 PM, <dave@cheney.net> wrote:

> I benchmarked patchset 10 on my chromebook; the results are mixed. The
> regressions in the bytes benchmarks appear to flow through into strings
> and fmt.
>
> lucky(~) % cat bytes.txt
>
> benchmark                           old ns/op    new ns/op    delta
> BenchmarkReadString                    112131       127869  +14.04%
> BenchmarkBufferNotEmptyWriteRead      3634051      3681321   +1.30%
> BenchmarkBufferFullSmallReads          440993       441639   +0.15%
> BenchmarkIndexByte32                       76           76   +0.00%
> BenchmarkIndexByte4K                     7279         7286   +0.10%
> BenchmarkIndexByte4M                  7533005      7564464   +0.42%
> BenchmarkIndexByte64M               120355191    120813462   +0.38%
> BenchmarkIndexBytePortable32              174          175   +0.57%
> BenchmarkIndexBytePortable4K            17302        17342   +0.23%
> BenchmarkIndexBytePortable4M         18276630     18349412   +0.40%
> BenchmarkIndexBytePortable64M       292254766    293383533   +0.39%
> BenchmarkEqual32                           90           60  -33.22%
> BenchmarkEqual4K                         7285         1853  -74.56%
> BenchmarkEqual4M                      8829644      3517534  -60.16%
> BenchmarkEqual64M                   138950637     59169793  -57.42%
> BenchmarkEqualPort32                      223          224   +0.45%
> BenchmarkEqualPort4K                    24201        24255   +0.22%
> BenchmarkEqualPortable4M             25031372     25131132   +0.40%
> BenchmarkEqualPortable64M           402236591    403740442   +0.37%
> BenchmarkIndex32                          800         1182  +47.75%
> BenchmarkIndex4K                       118306       178704  +51.05%
> BenchmarkIndex4M                    121457245    183889629  +51.40%
> BenchmarkIndex64M                  1943517375   2939630294  +51.25%
> BenchmarkIndexEasy32                      142          142   +0.00%
> BenchmarkIndexEasy4K                     7331         7336   +0.07%
> BenchmarkIndexEasy4M                  7548436      7543304   -0.07%
> BenchmarkIndexEasy64M               120458129    120302727   -0.13%
> BenchmarkCount32                          806         1182  +46.65%
> BenchmarkCount4K                       118344       178649  +50.96%
> BenchmarkCount4M                    121673656    183400741  +50.73%
> BenchmarkCount64M                  1944794002   2934607085  +50.90%
> BenchmarkCountEasy32                      144          141   -2.08%
> BenchmarkCountEasy4K                     7360         7335   -0.34%
> BenchmarkCountEasy4M                  7554226      7564146   +0.13%
> BenchmarkCountEasy64M               120317272    120715047   +0.33%
> BenchmarkFields                     116213231    114222702   -1.71%
> BenchmarkFieldsFunc                 114475948    113263035   -1.06%
> BenchmarkTrimSpace                        342          336   -1.75%
>
>
> benchmark                            old MB/s     new MB/s  speedup
> BenchmarkReadString                    292.23       256.26    0.88x
> BenchmarkIndexByte32                   419.49       419.49    1.00x
> BenchmarkIndexByte4K                   562.70       562.14    1.00x
> BenchmarkIndexByte4M                   556.79       554.47    1.00x
> BenchmarkIndexByte64M                  557.59       555.48    1.00x
> BenchmarkIndexBytePortable32           183.22       182.84    1.00x
> BenchmarkIndexBytePortable4K           236.72       236.18    1.00x
> BenchmarkIndexBytePortable4M           229.49       228.58    1.00x
> BenchmarkIndexBytePortable64M          229.62       228.74    1.00x
> BenchmarkEqual32                       354.47       530.72    1.50x
> BenchmarkEqual4K                       562.22      2210.19    3.93x
> BenchmarkEqual4M                       475.03      1192.40    2.51x
> BenchmarkEqual64M                      482.97      1134.17    2.35x
> BenchmarkEqualPort32                   143.47       142.77    1.00x
> BenchmarkEqualPort4K                   169.24       168.87    1.00x
> BenchmarkEqualPortable4M               167.56       166.90    1.00x
> BenchmarkEqualPortable64M              166.84       166.22    1.00x
> BenchmarkIndex32                        39.97        27.07    0.68x
> BenchmarkIndex4K                        34.62        22.92    0.66x
> BenchmarkIndex4M                        34.53        22.81    0.66x
> BenchmarkIndex64M                       34.53        22.83    0.66x
> BenchmarkIndexEasy32                   224.99       224.98    1.00x
> BenchmarkIndexEasy4K                   558.69       558.33    1.00x
> BenchmarkIndexEasy4M                   555.65       556.03    1.00x
> BenchmarkIndexEasy64M                  557.11       557.83    1.00x
> BenchmarkCount32                        39.69        27.06    0.68x
> BenchmarkCount4K                        34.61        22.93    0.66x
> BenchmarkCount4M                        34.47        22.87    0.66x
> BenchmarkCount64M                       34.51        22.87    0.66x
> BenchmarkCountEasy32                   221.56       225.87    1.02x
> BenchmarkCountEasy4K                   556.51       558.36    1.00x
> BenchmarkCountEasy4M                   555.23       554.50    1.00x
> BenchmarkCountEasy64M                  557.77       555.93    1.00x
> BenchmarkFields                          9.02         9.18    1.02x
> BenchmarkFieldsFunc                      9.16         9.26    1.01x
> lucky(~) % cat strings.txt
>
> benchmark                           old ns/op    new ns/op    delta
> BenchmarkGenericNoMatch                  7555         7427   -1.69%
> BenchmarkGenericMatch1                  29053        29279   +0.78%
> BenchmarkGenericMatch2                 175206       188767   +7.74%
> BenchmarkSingleMaxSkipping               5981         5976   -0.08%
> BenchmarkSingleLongSuffixFail            6083         6043   -0.66%
> BenchmarkSingleMatch                   376140       385756   +2.56%
> BenchmarkByteByteNoMatch                 2301         2301   +0.00%
> BenchmarkByteByteMatch                   4902         4913   +0.22%
> BenchmarkByteStringMatch                 7346         7210   -1.85%
> BenchmarkHTMLEscapeNew                   2305         2339   +1.48%
> BenchmarkHTMLEscapeOld                   3783         3812   +0.77%
> BenchmarkByteByteReplaces               18253        18348   +0.52%
> BenchmarkByteByteMap                     9875         9685   -1.92%
> BenchmarkIndexRune                        246          254   +3.25%
> BenchmarkIndexRuneFastPath                130          116  -10.77%
> BenchmarkIndex                            129          129   +0.00%
> BenchmarkMapNoChanges                     799          843   +5.51%
> BenchmarkIndexHard1                   7637311      7636978   -0.00%
> BenchmarkIndexHard2                   7647121      7647579   +0.01%
> BenchmarkIndexHard3                   7628415      7637353   +0.12%
> BenchmarkCountHard1                   8250395      8260440   +0.12%
> BenchmarkCountHard2                   8253451      8260638   +0.09%
> BenchmarkCountHard3                   8250371      8263897   +0.16%
> BenchmarkIndexTorture                   47275        47389   +0.24%
> BenchmarkCountTorture                   49252        49139   -0.23%
> BenchmarkCountTortureOverlapping     52097076     29447134  -43.48%
> BenchmarkFields                      90120216     89680439   -0.49%
> BenchmarkFieldsFunc                  90004377     89325775   -0.75%
> BenchmarkSplit1                      76455569     76104520   -0.46%
> BenchmarkSplit2                      20392480     19932127   -2.26%
> BenchmarkSplit3                      18983460     18571189   -2.17%
>
>
> benchmark                            old MB/s     new MB/s  speedup
> BenchmarkSingleMaxSkipping            1671.80      1673.30    1.00x
> BenchmarkSingleLongSuffixFail          164.71       165.78    1.01x
> BenchmarkSingleMatch                    39.88        38.88    0.97x
> BenchmarkFields                         11.64        11.69    1.00x
> BenchmarkFieldsFunc                     11.65        11.74    1.01x
> lucky(~) % cat fmt.txt
>
> benchmark                      old ns/op    new ns/op    delta
> BenchmarkSprintfEmpty                424          424   +0.00%
> BenchmarkSprintfString              1499         1451   -3.20%
> BenchmarkSprintfInt                 1261         1438  +14.04%
> BenchmarkSprintfIntInt              2042         2031   -0.54%
> BenchmarkSprintfPrefixedInt         2028         2165   +6.76%
> BenchmarkSprintfFloat               3579         3686   +2.99%
> BenchmarkManyArgs                   5573         6130   +9.99%
> BenchmarkScanInts                5413886      5166830   -4.56%
> BenchmarkScanRecursiveInt        6447875      6478815   +0.48%
>
>
>
> https://codereview.appspot.com/8056043/
>

dave_cheney.net

12 years, 4 months ago (2013-03-31 00:57:13 UTC) #32

This is the dual A15 Exynos 5 version. I can do some benchmarks on a
pandaboard or a RPi but I don't expect much change.

On Sun, Mar 31, 2013 at 11:54 AM, Brad Fitzpatrick <bradfitz@golang.org> wrote:
> Which arch is your chromebook?
>
>
> On Sat, Mar 30, 2013 at 5:04 PM, <dave@cheney.net> wrote:
>>
>> I benchmarked patchset 10 on my chromebook, the results are mixed. The
>> regressions in the bytes benchmarks appear to flow through in to strings
>> and fmt.
>>
>> lucky(~) % cat bytes.txt
>>
>> benchmark                           old ns/op    new ns/op    delta
>> BenchmarkReadString                    112131       127869  +14.04%
>> BenchmarkBufferNotEmptyWriteRead      3634051      3681321   +1.30%
>> BenchmarkBufferFullSmallReads          440993       441639   +0.15%
>> BenchmarkIndexByte32                       76           76   +0.00%
>> BenchmarkIndexByte4K                     7279         7286   +0.10%
>> BenchmarkIndexByte4M                  7533005      7564464   +0.42%
>> BenchmarkIndexByte64M               120355191    120813462   +0.38%
>> BenchmarkIndexBytePortable32              174          175   +0.57%
>> BenchmarkIndexBytePortable4K            17302        17342   +0.23%
>> BenchmarkIndexBytePortable4M         18276630     18349412   +0.40%
>> BenchmarkIndexBytePortable64M       292254766    293383533   +0.39%
>> BenchmarkEqual32                           90           60  -33.22%
>> BenchmarkEqual4K                         7285         1853  -74.56%
>> BenchmarkEqual4M                      8829644      3517534  -60.16%
>> BenchmarkEqual64M                   138950637     59169793  -57.42%
>> BenchmarkEqualPort32                      223          224   +0.45%
>> BenchmarkEqualPort4K                    24201        24255   +0.22%
>> BenchmarkEqualPortable4M             25031372     25131132   +0.40%
>> BenchmarkEqualPortable64M           402236591    403740442   +0.37%
>> BenchmarkIndex32                          800         1182  +47.75%
>> BenchmarkIndex4K                       118306       178704  +51.05%
>> BenchmarkIndex4M                    121457245    183889629  +51.40%
>> BenchmarkIndex64M                  1943517375   2939630294  +51.25%
>> BenchmarkIndexEasy32                      142          142   +0.00%
>> BenchmarkIndexEasy4K                     7331         7336   +0.07%
>> BenchmarkIndexEasy4M                  7548436      7543304   -0.07%
>> BenchmarkIndexEasy64M               120458129    120302727   -0.13%
>> BenchmarkCount32                          806         1182  +46.65%
>> BenchmarkCount4K                       118344       178649  +50.96%
>> BenchmarkCount4M                    121673656    183400741  +50.73%
>> BenchmarkCount64M                  1944794002   2934607085  +50.90%
>> BenchmarkCountEasy32                      144          141   -2.08%
>> BenchmarkCountEasy4K                     7360         7335   -0.34%
>> BenchmarkCountEasy4M                  7554226      7564146   +0.13%
>> BenchmarkCountEasy64M               120317272    120715047   +0.33%
>> BenchmarkFields                     116213231    114222702   -1.71%
>> BenchmarkFieldsFunc                 114475948    113263035   -1.06%
>> BenchmarkTrimSpace                        342          336   -1.75%
>>
>>
>> benchmark                            old MB/s     new MB/s  speedup
>> BenchmarkReadString                    292.23       256.26    0.88x
>> BenchmarkIndexByte32                   419.49       419.49    1.00x
>> BenchmarkIndexByte4K                   562.70       562.14    1.00x
>> BenchmarkIndexByte4M                   556.79       554.47    1.00x
>> BenchmarkIndexByte64M                  557.59       555.48    1.00x
>> BenchmarkIndexBytePortable32           183.22       182.84    1.00x
>> BenchmarkIndexBytePortable4K           236.72       236.18    1.00x
>> BenchmarkIndexBytePortable4M           229.49       228.58    1.00x
>> BenchmarkIndexBytePortable64M          229.62       228.74    1.00x
>> BenchmarkEqual32                       354.47       530.72    1.50x
>> BenchmarkEqual4K                       562.22      2210.19    3.93x
>> BenchmarkEqual4M                       475.03      1192.40    2.51x
>> BenchmarkEqual64M                      482.97      1134.17    2.35x
>> BenchmarkEqualPort32                   143.47       142.77    1.00x
>> BenchmarkEqualPort4K                   169.24       168.87    1.00x
>> BenchmarkEqualPortable4M               167.56       166.90    1.00x
>> BenchmarkEqualPortable64M              166.84       166.22    1.00x
>> BenchmarkIndex32                        39.97        27.07    0.68x
>> BenchmarkIndex4K                        34.62        22.92    0.66x
>> BenchmarkIndex4M                        34.53        22.81    0.66x
>> BenchmarkIndex64M                       34.53        22.83    0.66x
>> BenchmarkIndexEasy32                   224.99       224.98    1.00x
>> BenchmarkIndexEasy4K                   558.69       558.33    1.00x
>> BenchmarkIndexEasy4M                   555.65       556.03    1.00x
>> BenchmarkIndexEasy64M                  557.11       557.83    1.00x
>> BenchmarkCount32                        39.69        27.06    0.68x
>> BenchmarkCount4K                        34.61        22.93    0.66x
>> BenchmarkCount4M                        34.47        22.87    0.66x
>> BenchmarkCount64M                       34.51        22.87    0.66x
>> BenchmarkCountEasy32                   221.56       225.87    1.02x
>> BenchmarkCountEasy4K                   556.51       558.36    1.00x
>> BenchmarkCountEasy4M                   555.23       554.50    1.00x
>> BenchmarkCountEasy64M                  557.77       555.93    1.00x
>> BenchmarkFields                          9.02         9.18    1.02x
>> BenchmarkFieldsFunc                      9.16         9.26    1.01x
>> lucky(~) % cat strings.txt
>>
>> benchmark                           old ns/op    new ns/op    delta
>> BenchmarkGenericNoMatch                  7555         7427   -1.69%
>> BenchmarkGenericMatch1                  29053        29279   +0.78%
>> BenchmarkGenericMatch2                 175206       188767   +7.74%
>> BenchmarkSingleMaxSkipping               5981         5976   -0.08%
>> BenchmarkSingleLongSuffixFail            6083         6043   -0.66%
>> BenchmarkSingleMatch                   376140       385756   +2.56%
>> BenchmarkByteByteNoMatch                 2301         2301   +0.00%
>> BenchmarkByteByteMatch                   4902         4913   +0.22%
>> BenchmarkByteStringMatch                 7346         7210   -1.85%
>> BenchmarkHTMLEscapeNew                   2305         2339   +1.48%
>> BenchmarkHTMLEscapeOld                   3783         3812   +0.77%
>> BenchmarkByteByteReplaces               18253        18348   +0.52%
>> BenchmarkByteByteMap                     9875         9685   -1.92%
>> BenchmarkIndexRune                        246          254   +3.25%
>> BenchmarkIndexRuneFastPath                130          116  -10.77%
>> BenchmarkIndex                            129          129   +0.00%
>> BenchmarkMapNoChanges                     799          843   +5.51%
>> BenchmarkIndexHard1                   7637311      7636978   -0.00%
>> BenchmarkIndexHard2                   7647121      7647579   +0.01%
>> BenchmarkIndexHard3                   7628415      7637353   +0.12%
>> BenchmarkCountHard1                   8250395      8260440   +0.12%
>> BenchmarkCountHard2                   8253451      8260638   +0.09%
>> BenchmarkCountHard3                   8250371      8263897   +0.16%
>> BenchmarkIndexTorture                   47275        47389   +0.24%
>> BenchmarkCountTorture                   49252        49139   -0.23%
>> BenchmarkCountTortureOverlapping     52097076     29447134  -43.48%
>> BenchmarkFields                      90120216     89680439   -0.49%
>> BenchmarkFieldsFunc                  90004377     89325775   -0.75%
>> BenchmarkSplit1                      76455569     76104520   -0.46%
>> BenchmarkSplit2                      20392480     19932127   -2.26%
>> BenchmarkSplit3                      18983460     18571189   -2.17%
>>
>>
>> benchmark                            old MB/s     new MB/s  speedup
>> BenchmarkSingleMaxSkipping            1671.80      1673.30    1.00x
>> BenchmarkSingleLongSuffixFail          164.71       165.78    1.01x
>> BenchmarkSingleMatch                    39.88        38.88    0.97x
>> BenchmarkFields                         11.64        11.69    1.00x
>> BenchmarkFieldsFunc                     11.65        11.74    1.01x
>> lucky(~) % cat fmt.txt
>>
>> benchmark                      old ns/op    new ns/op    delta
>> BenchmarkSprintfEmpty                424          424   +0.00%
>> BenchmarkSprintfString              1499         1451   -3.20%
>> BenchmarkSprintfInt                 1261         1438  +14.04%
>> BenchmarkSprintfIntInt              2042         2031   -0.54%
>> BenchmarkSprintfPrefixedInt         2028         2165   +6.76%
>> BenchmarkSprintfFloat               3579         3686   +2.99%
>> BenchmarkManyArgs                   5573         6130   +9.99%
>> BenchmarkScanInts                5413886      5166830   -4.56%
>> BenchmarkScanRecursiveInt        6447875      6478815   +0.48%
>>
>>
>> https://codereview.appspot.com/8056043/
>
>

dave_cheney.net

12 years, 4 months ago (2013-03-31 01:25:18 UTC) #33

Here are some linux/386 results from a 4 core Atom 330.

220887(~) % cat bytes.txt 
benchmark                           old ns/op    new ns/op    delta
BenchmarkReadString                    121057       126571   +4.55%
BenchmarkBufferNotEmptyWriteRead      5503485      5454785   -0.88%
BenchmarkBufferFullSmallReads          891073       889944   -0.13%
BenchmarkIndexByte32                       94           94   -0.11%
BenchmarkIndexByte4K                     7781         7813   +0.41%
BenchmarkIndexByte4M                  8619393      8483188   -1.58%
BenchmarkIndexByte64M               136653719    135896031   -0.55%
BenchmarkIndexBytePortable32              466          465   -0.21%
BenchmarkIndexBytePortable4K            56721        56661   -0.11%
BenchmarkIndexBytePortable4M         58213450     58148782   -0.11%
BenchmarkIndexBytePortable64M       931072226    930535077   -0.06%
BenchmarkEqual32                          119           63  -46.64%
BenchmarkEqual4K                        10348         2443  -76.39%
BenchmarkEqual4M                     11544390      3224185  -72.07%
BenchmarkEqual64M                   185016250     56713834  -69.35%
BenchmarkEqualPort32                      782          785   +0.38%
BenchmarkEqualPort4K                    95370        96032   +0.69%
BenchmarkEqualPortable4M             98030915     98312648   +0.29%
BenchmarkEqualPortable64M          1568407685   1573236860   +0.31%
BenchmarkIndex32                         2319         1923  -17.08%
BenchmarkIndex4K                       354957       295392  -16.78%
BenchmarkIndex4M                    364731912    303230991  -16.86%
BenchmarkIndex64M                  2147483647   2147483647  -16.86%
BenchmarkIndexEasy32                      227          210   -7.49%
BenchmarkIndexEasy4K                     7899         7901   +0.03%
BenchmarkIndexEasy4M                  8486584      8516794   +0.36%
BenchmarkIndexEasy64M               135771448    136296410   +0.39%
BenchmarkCount32                         1838         1460  -20.57%
BenchmarkCount4K                       288436       220826  -23.44%
BenchmarkCount4M                    286733235    227035260  -20.82%
BenchmarkCount64M                  2147483647   2147483647  -21.37%
BenchmarkCountEasy32                      210          193   -8.10%
BenchmarkCountEasy4K                     7888         7875   -0.16%
BenchmarkCountEasy4M                  8483266      8527767   +0.52%
BenchmarkCountEasy64M               135677742    136246780   +0.42%
BenchmarkFields                     226913045    222574539   -1.91%
BenchmarkFieldsFunc                 226840521    222770626   -1.79%
BenchmarkTrimSpace                        477          503   +5.45%

benchmark                            old MB/s     new MB/s  speedup
BenchmarkReadString                    270.68       258.89    0.96x
BenchmarkIndexByte32                   338.57       339.07    1.00x
BenchmarkIndexByte4K                   526.40       524.23    1.00x
BenchmarkIndexByte4M                   486.61       494.43    1.02x
BenchmarkIndexByte64M                  491.09       493.83    1.01x
BenchmarkIndexBytePortable32            68.55        68.69    1.00x
BenchmarkIndexBytePortable4K            72.21        72.29    1.00x
BenchmarkIndexBytePortable4M            72.05        72.13    1.00x
BenchmarkIndexBytePortable64M           72.08        72.12    1.00x
BenchmarkEqual32                       267.72       503.99    1.88x
BenchmarkEqual4K                       395.79      1676.25    4.24x
BenchmarkEqual4M                       363.32      1300.89    3.58x
BenchmarkEqual64M                      362.72      1183.29    3.26x
BenchmarkEqualPort32                    40.88        40.74    1.00x
BenchmarkEqualPort4K                    42.95        42.65    0.99x
BenchmarkEqualPortable4M                42.79        42.66    1.00x
BenchmarkEqualPortable64M               42.79        42.66    1.00x
BenchmarkIndex32                        13.80        16.64    1.21x
BenchmarkIndex4K                        11.54        13.87    1.20x
BenchmarkIndex4M                        11.50        13.83    1.20x
BenchmarkIndex64M                       11.50        13.83    1.20x
BenchmarkIndexEasy32                   140.52       152.16    1.08x
BenchmarkIndexEasy4K                   518.55       518.40    1.00x
BenchmarkIndexEasy4M                   494.23       492.47    1.00x
BenchmarkIndexEasy64M                  494.28       492.37    1.00x
BenchmarkCount32                        17.40        21.91    1.26x
BenchmarkCount4K                        14.20        18.55    1.31x
BenchmarkCount4M                        14.63        18.47    1.26x
BenchmarkCount64M                       14.53        18.48    1.27x
BenchmarkCountEasy32                   152.37       165.42    1.09x
BenchmarkCountEasy4K                   519.27       520.10    1.00x
BenchmarkCountEasy4M                   494.42       491.84    0.99x
BenchmarkCountEasy64M                  494.62       492.55    1.00x
BenchmarkFields                          4.62         4.71    1.02x
BenchmarkFieldsFunc                      4.62         4.71    1.02x
220887(~) % cat strings.txt 
benchmark                           old ns/op    new ns/op    delta
BenchmarkGenericNoMatch                 19988        18825   -5.82%
BenchmarkGenericMatch1                  68559        68689   +0.19%
BenchmarkGenericMatch2                 364551       364833   +0.08%
BenchmarkSingleMaxSkipping              16873        15462   -8.36%
BenchmarkSingleLongSuffixFail           29288        28511   -2.65%
BenchmarkSingleMatch                   826237       743121  -10.06%
BenchmarkByteByteNoMatch                 3571         3340   -6.47%
BenchmarkByteByteMatch                   9512         7961  -16.31%
BenchmarkByteStringMatch                10368        10101   -2.58%
BenchmarkHTMLEscapeNew                   3159         3015   -4.56%
BenchmarkHTMLEscapeOld                   9431         9076   -3.76%
BenchmarkByteByteReplaces               44356        44370   +0.03%
BenchmarkByteByteMap                    22273        21531   -3.33%
BenchmarkIndexRune                        606          603   -0.50%
BenchmarkIndexRuneFastPath                451          454   +0.67%
BenchmarkIndex                            499          480   -3.81%
BenchmarkMapNoChanges                    2166         2145   -0.97%
BenchmarkIndexHard1                  20219289     18614626   -7.94%
BenchmarkIndexHard2                  18818291     18609084   -1.11%
BenchmarkIndexHard3                  18784650     18606772   -0.95%
BenchmarkCountHard1                  18776488     18589004   -1.00%
BenchmarkCountHard2                  18633691     18589560   -0.24%
BenchmarkCountHard3                  19331696     18586440   -3.86%
BenchmarkIndexTorture                  104744       104592   -0.15%
BenchmarkCountTorture                  104791       104574   -0.21%
BenchmarkCountTortureOverlapping     80928224     61623014  -23.85%
BenchmarkFields                     171732962    165121290   -3.85%
BenchmarkFieldsFunc                 165616856    165492483   -0.08%
BenchmarkSplit1                     187759567    168792316  -10.10%
BenchmarkSplit2                      48277196     48175434   -0.21%
BenchmarkSplit3                      40313030     39335365   -2.43%

benchmark                            old MB/s     new MB/s  speedup
BenchmarkSingleMaxSkipping             592.64       646.75    1.09x
BenchmarkSingleLongSuffixFail           34.21        35.14    1.03x
BenchmarkSingleMatch                    18.15        20.19    1.11x
BenchmarkFields                          6.11         6.35    1.04x
BenchmarkFieldsFunc                      6.33         6.34    1.00x
220887(~) % cat fmt.txt 
benchmark                      old ns/op    new ns/op    delta
BenchmarkSprintfEmpty                681          646   -5.14%
BenchmarkSprintfString              2417         2438   +0.87%
BenchmarkSprintfInt                 1904         1947   +2.26%
BenchmarkSprintfIntInt              2835         2810   -0.88%
BenchmarkSprintfPrefixedInt         3444         3487   +1.25%
BenchmarkSprintfFloat               4997         4978   -0.38%
BenchmarkManyArgs                  10427        10465   +0.36%
BenchmarkScanInts                7494022      7444609   -0.66%
BenchmarkScanRecursiveInt        8661682      8842178   +2.08%
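A note on the two 64M rows in the bytes table above that show 2147483647 in both ns/op columns: that value is the largest 32-bit signed integer, so the printed figures look saturated while the delta column still reports a real change. This is an inference from the numbers, not something stated in the thread. Under that assumption, the actual times can be estimated from the MB/s columns. The sketch below does this for BenchmarkIndex64M; the helper names are made up for illustration, and the 64 MiB buffer size is an assumption that agrees with the BenchmarkEqual64M rows (185016250 ns at 362.72 MB/s).

```go
package main

import (
	"fmt"
	"math"
)

// nsPerOpFromThroughput estimates ns/op from the number of bytes processed
// per operation and a throughput in MB/s, where 1 MB = 1e6 bytes (the unit
// the Go testing package uses when it reports MB/s).
func nsPerOpFromThroughput(bytesPerOp int64, mbPerSec float64) float64 {
	return float64(bytesPerOp) / (mbPerSec * 1e6) * 1e9
}

// deltaPercent is the relative change in ns/op, as in the "delta" column.
func deltaPercent(oldNs, newNs float64) float64 {
	return (newNs - oldNs) / oldNs * 100
}

func main() {
	const size = 64 << 20 // assumed buffer size for the 64M benchmarks

	// MB/s figures for BenchmarkIndex64M taken from the linux/386 table.
	oldNs := nsPerOpFromThroughput(size, 11.50)
	newNs := nsPerOpFromThroughput(size, 13.83)

	fmt.Printf("old ~%.2fs  new ~%.2fs  delta ~%.1f%%\n",
		oldNs/1e9, newNs/1e9, deltaPercent(oldNs, newNs))
	fmt.Println("both exceed MaxInt32 ns:",
		oldNs > math.MaxInt32 && newNs > math.MaxInt32)
}
```

The estimates come out at roughly 5.8 s and 4.9 s per operation, both above the 2.147 s that a 32-bit nanosecond count can hold, and the recomputed delta lands close to the -16.86% shown in the table (the small difference comes from rounding in the MB/s figures).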

dave_cheney.net

Here is a bit more data from the linux/arm regression for Count4K before: Total: 118 ...

12 years, 4 months ago (2013-03-31 03:01:21 UTC) #34

khr1

I don't understand why it is so slow. A bit slower, maybe, but not that ...

12 years, 4 months ago (2013-03-31 04:14:03 UTC) #35

dave_cheney.net

12 years, 4 months ago (2013-03-31 11:15:18 UTC) #36

Your new runtime.memeqbody is much faster than my old byte-by-byte attempt.
I'm experimenting with copying it directly into bytes.Equal to see if that
causes a regression in bytes.Count.


On Sun, Mar 31, 2013 at 3:14 PM, Keith Randall <khr@google.com> wrote:

> I don't understand why it is so slow.  A bit slower, maybe, but not that
> much.  The main loop in both the old and new implementations for n<8 is
> now identical.  (The CountX benchmarks are doing n==7.)  The new
> implementation differs just by a few constant moves, a compare and branch,
> and a BL (together with its stack pointer bump).
>
> I'm happy to revert to the old code if we can't fix this.  I did the arm
> version just for completeness; the big wins are in the x86 world.
>
>
> On Sat, Mar 30, 2013 at 8:01 PM, <dave@cheney.net> wrote:
>
>> Here is a bit more data from the linux/arm regression for Count4K
>>
>> before:
>>
>> Total: 118 samples
>>       64  54.2%  54.2%      118 100.0% bytes.Count
>>       54  45.8% 100.0%       54  45.8% bytes.Equal
>>        0   0.0% 100.0%       64  54.2% bytes_test.BenchmarkCount4K
>>
>>
>> http://dave.cheney.net/paste/bytes-before.svg
>>
>> after:
>>
>> Total: 180 samples
>>      133  73.9%  73.9%      139  77.2% bytes.Count
>>       38  21.1%  95.0%       38  21.1% runtime.memeqbody
>>        9   5.0% 100.0%      180 100.0% bytes.Equal
>>        0   0.0% 100.0%      137  76.1% bytes_test.BenchmarkCount4K
>>        0   0.0% 100.0%       45  25.0% bytes_test.bmCount
>>        0   0.0% 100.0%        4   2.2% gosched0
>>        0   0.0% 100.0%        4   2.2% testing.(*B).launch
>>        0   0.0% 100.0%        4   2.2% testing.(*B).runN
>>
>>
>> http://dave.cheney.net/paste/bytes-after.svg
>>
>>
>> https://codereview.appspot.com/8056043/
>>
>
>

dave_cheney.net

12 years, 4 months ago (2013-03-31 11:33:58 UTC) #37

Here is another arm datapoint, from a pandaboard, Cortex-A9. 

alarm(~/go/src/pkg/bytes) % ~/go/misc/benchcmp {old,new}.txt
benchmark                           old ns/op    new ns/op    delta
BenchmarkReadString                    207691       212456   +2.29%
BenchmarkBufferNotEmptyWriteRead     10302655     10189737   -1.10%
BenchmarkBufferFullSmallReads         1472871      1447450   -1.73%
BenchmarkIndexByte32                      164          165   +0.61%
BenchmarkIndexByte4K                    13756        13779   +0.17%
BenchmarkIndexByte4M                 27057480     26680883   -1.39%
BenchmarkIndexByte64M               438272584    428702042   -2.18%
BenchmarkIndexBytePortable32              515          515   +0.00%
BenchmarkIndexBytePortable4K            56701        56541   -0.28%
BenchmarkIndexBytePortable4M         64782309     64326364   -0.70%
BenchmarkIndexBytePortable64M      1031455804   1030296066   -0.11%
BenchmarkEqual32                          188          106  -43.62%
BenchmarkEqual4K                        15506         4636  -70.10%
BenchmarkEqual4M                     45102310     46034946   +2.07%
BenchmarkEqual64M                   743546635   1001547877  +34.70%
BenchmarkEqualPort32                      807          796   -1.36%
BenchmarkEqualPort4K                    92419        90848   -1.70%
BenchmarkEqualPortable4M            106740630    108727363   +1.86%
BenchmarkEqualPortable64M          1774788812   1798989707   +1.36%
BenchmarkIndex32                         2503         2939  +17.42%
BenchmarkIndex4K                       374861       442905  +18.15%
BenchmarkIndex4M                    391445561    458860175  +17.22%
BenchmarkIndex64M                  6260006548   7347176750  +17.37%
BenchmarkIndexEasy32                      330          348   +5.45%
BenchmarkIndexEasy4K                    14001        13923   -0.56%
BenchmarkIndexEasy4M                 26652902     26637547   -0.06%
BenchmarkIndexEasy64M               428365684    428122196   -0.06%
BenchmarkCount32                         2622         2956  +12.74%
BenchmarkCount4K                       384545       444107  +15.49%
BenchmarkCount4M                    398216529    459708583  +15.44%
BenchmarkCount64M                  6369725802   7357919173  +15.51%
BenchmarkCountEasy32                      353          373   +5.67%
BenchmarkCountEasy4K                    13907        13935   +0.20%
BenchmarkCountEasy4M                 26631245     26946392   +1.18%
BenchmarkCountEasy64M               428438891    433169914   +1.10%
BenchmarkFields                     306475738    306250622   -0.07%
BenchmarkFieldsFunc                 305981589    306415421   +0.14%
BenchmarkTrimSpace                        753          733   -2.66%

benchmark                            old MB/s     new MB/s  speedup
BenchmarkReadString                    157.77       154.23    0.98x
BenchmarkIndexByte32                   194.79       193.42    0.99x
BenchmarkIndexByte4K                   297.75       297.24    1.00x
BenchmarkIndexByte4M                   155.01       157.20    1.01x
BenchmarkIndexByte64M                  153.12       156.54    1.02x
BenchmarkIndexBytePortable32            62.12        62.14    1.00x
BenchmarkIndexBytePortable4K            72.24        72.44    1.00x
BenchmarkIndexBytePortable4M            64.74        65.20    1.01x
BenchmarkIndexBytePortable64M           65.06        65.14    1.00x
BenchmarkEqual32                       169.90       300.05    1.77x
BenchmarkEqual4K                       264.16       883.50    3.34x
BenchmarkEqual4M                        93.00        91.11    0.98x
BenchmarkEqual64M                       90.26        67.01    0.74x
BenchmarkEqualPort32                    39.61        40.15    1.01x
BenchmarkEqualPort4K                    44.32        45.09    1.02x
BenchmarkEqualPortable4M                39.29        38.58    0.98x
BenchmarkEqualPortable64M               37.81        37.30    0.99x
BenchmarkIndex32                        12.78        10.89    0.85x
BenchmarkIndex4K                        10.93         9.25    0.85x
BenchmarkIndex4M                        10.71         9.14    0.85x
BenchmarkIndex64M                       10.72         9.13    0.85x
BenchmarkIndexEasy32                    96.80        91.88    0.95x
BenchmarkIndexEasy4K                   292.54       294.18    1.01x
BenchmarkIndexEasy4M                   157.37       157.46    1.00x
BenchmarkIndexEasy64M                  156.66       156.75    1.00x
BenchmarkCount32                        12.20        10.83    0.89x
BenchmarkCount4K                        10.65         9.22    0.87x
BenchmarkCount4M                        10.53         9.12    0.87x
BenchmarkCount64M                       10.54         9.12    0.87x
BenchmarkCountEasy32                    90.47        85.58    0.95x
BenchmarkCountEasy4K                   294.52       293.93    1.00x
BenchmarkCountEasy4M                   157.50       155.65    0.99x
BenchmarkCountEasy64M                  156.64       154.93    0.99x
BenchmarkFields                          3.42         3.42    1.00x
BenchmarkFieldsFunc                      3.43         3.42    1.00x

To my amateur eye, the A9 has a shorter pipeline and is probably taking a
smaller hit from the branch than the A15 in the chromebook.

bradfitz

On Sat, Mar 30, 2013 at 9:14 PM, Keith Randall <khr@google.com> wrote: > I don't ...

12 years, 4 months ago (2013-03-31 22:54:49 UTC) #38

dave_cheney.net

Last night I copied khr's new code back into bytes.Equal, removing all the indirection and ...

12 years, 4 months ago (2013-03-31 23:40:15 UTC) #39

khr1

If the simple implementation really is much faster for short strings, I'm in favor of ...

12 years, 4 months ago (2013-04-01 00:17:50 UTC) #40

dave_cheney.net

SGTM. Let's try that. On 01/04/2013, at 11:17, Keith Randall <khr@google.com> wrote: > If the ...

12 years, 4 months ago (2013-04-01 00:21:03 UTC) #41

khr

*** Submitted as https://code.google.com/p/go/source/detail?r=7f91fab04475 *** runtime: Implement faster equals for strings and bytes. (amd64) benchmark ...

12 years, 4 months ago (2013-04-02 23:26:20 UTC) #42

dave_cheney.net

Thanks khr, I'll try to integrate your faster word-at-a-time version into asm_arm.s. Cheers, Dave
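
For readers unfamiliar with the term, the following is a rough sketch in portable Go of what a word-at-a-time equality check looks like. It is not the assembly from this change. The function name equalWords and the use of encoding/binary to load eight bytes at once are assumptions made purely for illustration.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// equalWords reports whether a and b hold the same bytes. It compares eight
// bytes per iteration and finishes any remainder one byte at a time.
// Hypothetical helper for illustration, not the runtime's implementation.
func equalWords(a, b []byte) bool {
	if len(a) != len(b) {
		return false
	}
	i := 0
	// Main loop: load one 64-bit word from each slice and compare.
	for ; i+8 <= len(a); i += 8 {
		if binary.LittleEndian.Uint64(a[i:]) != binary.LittleEndian.Uint64(b[i:]) {
			return false
		}
	}
	// Tail loop: fewer than eight bytes remain.
	for ; i < len(a); i++ {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(equalWords([]byte("abcdefghij"), []byte("abcdefghij"))) // true
	fmt.Println(equalWords([]byte("abcdefghij"), []byte("abcdefghiX"))) // false
}
```

The byte-order choice does not matter for an equality test, since both sides are loaded the same way. The tail loop is where short inputs spend all their time, which is one reason a simpler loop can win for short strings.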

12 years, 4 months ago (2013-04-03 01:56:26 UTC) #43

albert.strasheim

12 years, 4 months ago (2013-04-03 05:43:38 UTC) #44

Message was sent while issue was closed.

Since this change has gone in, I'm seeing the following crash on linux/amd64:

go test -v -short -cpu 1,2,4,8,16,256 std

=== RUN TestLargeStringWrites-2
unexpected fault address 0xc2003da008
fatal error: fault
[signal 0xb code=0x2 addr=0xc2003da008 pc=0x40e6fb]

goroutine 57 [running]:
[fp=0xc20042a140] runtime.throw(0x605737)
	/build/go.tip/go/src/pkg/runtime/panic.c:473 +0x67
[fp=0xc20042a158] runtime.sigpanic()
	/build/go.tip/go/src/pkg/runtime/os_linux.c:239 +0xe7
[fp=0xc20042a210] sweepspan(0xc20008e480, 0xf8)
	/build/go.tip/go/src/pkg/runtime/mgc0.c:1625 +0x33b
[fp=0xc20042a288] runtime.parfordo(0xc20008e480)
	/build/go.tip/go/src/pkg/runtime/parfor.c:120 +0x125
[fp=0xc20042a3b8] gc(0x7f52d1892c1c)
	/build/go.tip/go/src/pkg/runtime/mgc0.c:1940 +0x2e3
----- stack segment boundary -----
[fp=0x7f52d1892c30] runtime.gc(0xc200000000)
	/build/go.tip/go/src/pkg/runtime/mgc0.c:1857 +0x11b
[fp=0x7f52d1892c88] runtime.mallocgc(0x12000, 0x100000001, 0x7f5200000001)
	/build/go.tip/go/src/pkg/runtime/zmalloc_linux_amd64.c:101 +0x1e4
[fp=0x7f52d1892cc0] makeslice1(0x4e76e0, 0x11170, 0x11170, 0x7f52d1892d08)
	/build/go.tip/go/src/pkg/runtime/slice.c:61 +0x89
[fp=0x7f52d1892cf0] runtime.makeslice(0x4e76e0, 0x11170, 0x11170, 0xc, 0x11170,
...)
	/build/go.tip/go/src/pkg/runtime/slice.c:34 +0x9a
[fp=0x7f52d1892d28] bytes.makeSlice(0x11170, 0x0, 0x0, 0x0)
	/build/go.tip/go/src/pkg/bytes/buffer.go:191 +0x61
[fp=0x7f52d1892de0] bytes.(*Buffer).grow(0xc20045a310, 0x2710, 0x2)
	/build/go.tip/go/src/pkg/bytes/buffer.go:99 +0x20b
[fp=0x7f52d1892e30] bytes.(*Buffer).WriteString(0xc20045a310, 0xc2001c4000,
0x2710, 0xc20045a310, 0x0, ...)
	/build/go.tip/go/src/pkg/bytes/buffer.go:136 +0x49
[fp=0x7f52d1892ef0] bytes_test.fillString(0xc200481120, 0x54e990, 0x13,
0xc20045a310, 0xc2004cf000, ...)
	/build/go.tip/go/src/pkg/bytes/buffer_test.go:55 +0xe8
[fp=0x7f52d1892f78] bytes_test.TestLargeStringWrites(0xc200481120)
	/build/go.tip/go/src/pkg/bytes/buffer_test.go:178 +0xd5
[fp=0x7f52d1892fa8] testing.tRunner(0xc200481120, 0x608bc8)
	/build/go.tip/go/src/pkg/testing/testing.go:349 +0x8a
[fp=0x7f52d1892fb0] runtime.goexit()
	/build/go.tip/go/src/pkg/runtime/proc.c:1214
created by testing.RunTests
	/build/go.tip/go/src/pkg/testing/testing.go:429 +0x86b

goroutine 1 [chan receive]:
runtime.park(0x409e70, 0xc200339d10, 0x60682a)
	/build/go.tip/go/src/pkg/runtime/proc.c:1167 +0x64
runtime.chanrecv(0x4e8260, 0xc200339cc0, 0x7f52d1884ce0, 0x0, 0x0, ...)
	/build/go.tip/go/src/pkg/runtime/chan.c:366 +0x566
runtime.chanrecv1()
	/build/go.tip/go/src/pkg/runtime/chan.c:458 +0x38
testing.RunTests(0x56e1e8, 0x608b80, 0x33, 0x33, 0x1, ...)
	/build/go.tip/go/src/pkg/testing/testing.go:430 +0x88e
testing.Main(0x56e1e8, 0x608b80, 0x33, 0x33, 0x608700, ...)
	/build/go.tip/go/src/pkg/testing/testing.go:361 +0x8a
main.main()
	bytes/_test/_testmain.go:245 +0x9a
runtime.main()
	/build/go.tip/go/src/pkg/runtime/proc.c:182 +0x92
runtime.goexit()
	/build/go.tip/go/src/pkg/runtime/proc.c:1214

goroutine 2 [syscall]:
runtime.entersyscallblock()
	/build/go.tip/go/src/pkg/runtime/proc.c:1324 +0x16e
runtime.MHeap_Scavenger()
	/build/go.tip/go/src/pkg/runtime/mheap.c:435 +0xee
runtime.goexit()
	/build/go.tip/go/src/pkg/runtime/proc.c:1214
created by runtime.main
	/build/go.tip/go/src/pkg/runtime/proc.c:165
FAIL	bytes	0.158s

dave_cheney.net

12 years, 4 months ago (2013-04-03 05:47:04 UTC) #45

Go 1.1 will be fast enough[1] without this change. If it's unstable
then my vote would be to revert this change so we have a chance of
publishing a release candidate. Thoughts?


[1]. https://github.com/davecheney/autobench


albert.strasheim

On 2013/04/03 05:47:04, dfc wrote: > Go 1.1 will be fast enough[1] without this change. ...

12 years, 4 months ago (2013-04-03 05:51:22 UTC) #46

remyoudompheng

On 3 Apr 2013 07:47, "Dave Cheney" <dave@cheney.net> wrote: > > Go 1.1 will be ...

12 years, 4 months ago (2013-04-03 05:53:53 UTC) #47

remyoudompheng

On 2013/4/3 <fullung@gmail.com> wrote: > On 2013/04/03 05:47:04, dfc wrote: >> >> Go 1.1 will ...

12 years, 4 months ago (2013-04-03 06:33:05 UTC) #48

dave_cheney.net

12 years, 4 months ago (2013-04-03 07:35:51 UTC) #49

Wow, great catch!

On Wed, Apr 3, 2013 at 5:33 PM, Rémy Oudompheng
<remyoudompheng@gmail.com> wrote:
> On 2013/4/3  <fullung@gmail.com> wrote:
>> On 2013/04/03 05:47:04, dfc wrote:
>>>
>>> Go 1.1 will be fast enough[1] without this change. If it's unstable
>>> then my vote would be to revert this change so we have a chance of
>>> publishing a release candidate. Thoughts ?
>>
>>
>> Without looking past the stack trace, it seems like it might be a GC
>> issue though, in which case reverting this isn't going to fix the
>> underlying problem...
>
> Please try https://codereview.appspot.com/8300044
> I'm surprised the test could possibly work for anyone.
> [maybe the better fix is to check the error returned by syscall.Mprotect]
>
> Rémy.

Issue 8056043: code review 8056043: runtime: Implement faster equals for strings and bytes. (Closed)

Description

Patch Set 1 #

Patch Set 2 : diff -r ff83e39f1348 https://khr%40golang.org@code.google.com/p/go/ #

Patch Set 3 : diff -r d040d5f08d5d https://khr%40golang.org@code.google.com/p/go/ #

Patch Set 4 : diff -r d040d5f08d5d https://khr%40golang.org@code.google.com/p/go/ #

Patch Set 5 : diff -r d040d5f08d5d https://khr%40golang.org@code.google.com/p/go/ #

Patch Set 6 : diff -r d040d5f08d5d https://khr%40golang.org@code.google.com/p/go/ #

Patch Set 7 : diff -r d040d5f08d5d https://khr%40golang.org@code.google.com/p/go/ #

Patch Set 8 : diff -r 2433c3567273 https://khr%40golang.org@code.google.com/p/go/ #

Patch Set 9 : diff -r 7505bf6d8988 https://khr%40golang.org@code.google.com/p/go/ #

Patch Set 10 : diff -r 7505bf6d8988 https://khr%40golang.org@code.google.com/p/go/ #

Patch Set 11 : diff -r 7505bf6d8988 https://khr%40golang.org@code.google.com/p/go/ #

Patch Set 12 : diff -r 52e3407d249f https://khr%40golang.org@code.google.com/p/go/ #

Patch Set 13 : diff -r 52e3407d249f https://khr%40golang.org@code.google.com/p/go/ #

Patch Set 14 : diff -r 52e3407d249f https://khr%40golang.org@code.google.com/p/go/ #

Messages