Rietveld Code Review Tool
Help | Bug tracker | Discussion group | Source code | Sign in
(1257)

Issue 9038048: code review 9038048: runtime: faster x86 memmove (a.k.a. built-in copy()) (Closed)

Can't Edit
Can't Publish+Mail
Start Review
Created:
10 years, 11 months ago by khr
Modified:
10 years, 11 months ago
Reviewers:
iant
CC:
golang-dev, bradfitz, remyoudompheng, khr1, iant, Dominik Honnef
Visibility:
Public.

Description

runtime: faster x86 memmove (a.k.a. built-in copy()) REP instructions have a high startup cost, so we handle small sizes with some straightline code. The REP MOVSx instructions are really fast for large sizes. The cutover is approximately 1K. We implement up to 128/256 because that is the maximum SSE register load (loading all data into registers before any stores lets us ignore copy direction). (on a Sandy Bridge E5-1650 @ 3.20GHz) benchmark old ns/op new ns/op delta BenchmarkMemmove0 3 3 +0.86% BenchmarkMemmove1 5 5 +5.40% BenchmarkMemmove2 18 8 -56.84% BenchmarkMemmove3 18 7 -58.45% BenchmarkMemmove4 36 7 -78.63% BenchmarkMemmove5 36 8 -77.91% BenchmarkMemmove6 36 8 -77.76% BenchmarkMemmove7 36 8 -77.82% BenchmarkMemmove8 18 8 -56.33% BenchmarkMemmove9 18 7 -58.34% BenchmarkMemmove10 18 7 -58.34% BenchmarkMemmove11 18 7 -58.45% BenchmarkMemmove12 36 7 -78.51% BenchmarkMemmove13 36 7 -78.48% BenchmarkMemmove14 36 7 -78.56% BenchmarkMemmove15 36 7 -78.56% BenchmarkMemmove16 18 7 -58.24% BenchmarkMemmove32 18 8 -54.33% BenchmarkMemmove64 18 8 -53.37% BenchmarkMemmove128 20 9 -55.93% BenchmarkMemmove256 25 11 -55.16% BenchmarkMemmove512 33 33 -1.19% BenchmarkMemmove1024 43 44 +2.06% BenchmarkMemmove2048 61 61 +0.16% BenchmarkMemmove4096 95 95 +0.00%

Patch Set 1 #

Patch Set 2 : diff -r 9f146b985681 https://code.google.com/p/go/ #

Patch Set 3 : diff -r 3623b5f14f72 https://khr%40golang.org@code.google.com/p/go/ #

Patch Set 4 : diff -r 3623b5f14f72 https://khr%40golang.org@code.google.com/p/go/ #

Patch Set 5 : diff -r 3623b5f14f72 https://khr%40golang.org@code.google.com/p/go/ #

Patch Set 6 : diff -r 3623b5f14f72 https://khr%40golang.org@code.google.com/p/go/ #

Patch Set 7 : diff -r c58a49d330d1 https://khr%40golang.org@code.google.com/p/go/ #

Unified diffs Side-by-side diffs Delta from patch set Stats (+338 lines, -23 lines) Patch
M src/pkg/runtime/memmove_386.s View 1 2 3 4 5 3 chunks +97 lines, -12 lines 0 comments Download
M src/pkg/runtime/memmove_amd64.s View 1 2 3 chunks +125 lines, -11 lines 0 comments Download
A src/pkg/runtime/memmove_test.go View 1 2 1 chunk +116 lines, -0 lines 0 comments Download

Messages

Total messages: 10
khr
Hello golang-dev@googlegroups.com, I'd like you to review this change to https://khr%40golang.org@code.google.com/p/go/
10 years, 11 months ago (2013-05-16 20:51:13 UTC) #1
bradfitz
Nice. Update CL to say which CPU this was on? The cutover point will likely ...
10 years, 11 months ago (2013-05-16 21:52:03 UTC) #2
remyoudompheng
This is very interesting, I should try combinations of this with CL9101048. 2013/5/16 Brad Fitzpatrick ...
10 years, 11 months ago (2013-05-16 21:54:28 UTC) #3
khr1
If you're calling memmove directly with a constant size, maybe we should split out the ...
10 years, 11 months ago (2013-05-16 22:06:36 UTC) #4
remyoudompheng
On 2013/05/16 22:06:36, khr1 wrote: > If you're calling memmove directly with a constant size, ...
10 years, 11 months ago (2013-05-16 23:06:01 UTC) #5
remyoudompheng
On 2013/05/16 23:06:01, remyoudompheng wrote: > On 2013/05/16 22:06:36, khr1 wrote: > > If you're ...
10 years, 11 months ago (2013-05-16 23:06:41 UTC) #6
iant
These timings do very an awful lot on different versions of Intel CPUs, so it's ...
10 years, 11 months ago (2013-05-17 13:55:29 UTC) #7
Dominik Honnef
Testing just CL9038048 Intel(R) Core(TM) i7-2600K CPU @ 3.40GHz benchmark old ns/op new ns/op delta ...
10 years, 11 months ago (2013-05-17 14:42:18 UTC) #8
iant
LGTM
10 years, 11 months ago (2013-05-17 17:33:07 UTC) #9
khr
10 years, 11 months ago (2013-05-17 19:53:54 UTC) #10
*** Submitted as https://code.google.com/p/go/source/detail?r=4cb93e2900d0 ***

runtime: faster x86 memmove (a.k.a. built-in copy())

REP instructions have a high startup cost, so we handle small
sizes with some straightline code.  The REP MOVSx instructions
are really fast for large sizes.  The cutover is approximately
1K.  We implement up to 128/256 because that is the maximum
SSE register load (loading all data into registers before any
stores lets us ignore copy direction).

(on a Sandy Bridge E5-1650 @ 3.20GHz)
benchmark               old ns/op    new ns/op    delta
BenchmarkMemmove0               3            3   +0.86%
BenchmarkMemmove1               5            5   +5.40%
BenchmarkMemmove2              18            8  -56.84%
BenchmarkMemmove3              18            7  -58.45%
BenchmarkMemmove4              36            7  -78.63%
BenchmarkMemmove5              36            8  -77.91%
BenchmarkMemmove6              36            8  -77.76%
BenchmarkMemmove7              36            8  -77.82%
BenchmarkMemmove8              18            8  -56.33%
BenchmarkMemmove9              18            7  -58.34%
BenchmarkMemmove10             18            7  -58.34%
BenchmarkMemmove11             18            7  -58.45%
BenchmarkMemmove12             36            7  -78.51%
BenchmarkMemmove13             36            7  -78.48%
BenchmarkMemmove14             36            7  -78.56%
BenchmarkMemmove15             36            7  -78.56%
BenchmarkMemmove16             18            7  -58.24%
BenchmarkMemmove32             18            8  -54.33%
BenchmarkMemmove64             18            8  -53.37%
BenchmarkMemmove128            20            9  -55.93%
BenchmarkMemmove256            25           11  -55.16%
BenchmarkMemmove512            33           33   -1.19%
BenchmarkMemmove1024           43           44   +2.06%
BenchmarkMemmove2048           61           61   +0.16%
BenchmarkMemmove4096           95           95   +0.00%

R=golang-dev, bradfitz, remyoudompheng, khr, iant, dominik.honnef
CC=golang-dev
https://codereview.appspot.com/9038048
Sign in to reply to this message.

Powered by Google App Engine
RSS Feeds Recent Issues | This issue
This is Rietveld f62528b