Rietveld Code Review Tool

Issue 90500043: code review 90500043: runtime: speed up amd64 memmove (Closed)

Created:
11 years, 1 month ago by ruiu
Modified:
10 years, 11 months ago
Reviewers:
khr
CC:
golang-codereviews, gobot, khr
Visibility:
Public.

Description

runtime: speed up amd64 memmove

MOV with SSE registers seems to be faster than REP MOVSQ when the size being copied is less than about 2K. Previously we didn't use MOV if the memory region was larger than 256 bytes. This patch improves the performance of non-overlapping copies of 257 to 2048 bytes by using MOV.

Here are the benchmark results on an Intel Xeon 3.5GHz (Nehalem).

benchmark               old ns/op    new ns/op    delta
BenchmarkMemmove16              4            4   +0.42%
BenchmarkMemmove32              5            5   -0.20%
BenchmarkMemmove64              6            6   -0.81%
BenchmarkMemmove128             7            7   -0.82%
BenchmarkMemmove256            10           10   +1.92%
BenchmarkMemmove512            29           16  -44.90%
BenchmarkMemmove1024           37           25  -31.55%
BenchmarkMemmove2048           55           44  -19.46%
BenchmarkMemmove4096           92           91   -0.76%

benchmark                old MB/s     new MB/s  speedup
BenchmarkMemmove16        3370.61      3356.88    1.00x
BenchmarkMemmove32        6368.68      6386.99    1.00x
BenchmarkMemmove64       10367.37     10462.62    1.01x
BenchmarkMemmove128      17551.16     17713.48    1.01x
BenchmarkMemmove256      24692.81     24142.99    0.98x
BenchmarkMemmove512      17428.70     31687.72    1.82x
BenchmarkMemmove1024     27401.82     40009.45    1.46x
BenchmarkMemmove2048     36884.86     45766.98    1.24x
BenchmarkMemmove4096     44295.91     44627.86    1.01x
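The size-based dispatch described above can be modeled in a few lines of Go. This is a simplified sketch, not the runtime's assembly: the names copyStrategy and chooseStrategy are hypothetical, addresses are plain uintptr values, and the overlap test is an assumption. The description speaks only of non-overlapping copies; the sketch also routes a copy to the SSE path when copying forward cannot overwrite unread source bytes, and it labels the remaining overlapping cases "backward copy" without modeling how the real code handles them.

```go
package main

import "fmt"

// copyStrategy names the path a memmove of n bytes takes in this simplified
// model. These names are hypothetical and are not runtime symbols.
type copyStrategy int

const (
	straightLine copyStrategy = iota // n <= 256: existing MOV-based small-copy code
	sseMoves                         // 257..2048 bytes: MOV with SSE registers (new in this CL)
	repMovsq                         // > 2048 bytes: REP MOVSQ, as before
	backward                         // overlap a forward copy would clobber (assumed path)
)

func (s copyStrategy) String() string {
	switch s {
	case straightLine:
		return "straight-line MOV"
	case sseMoves:
		return "SSE MOV"
	case repMovsq:
		return "REP MOVSQ"
	default:
		return "backward copy"
	}
}

// chooseStrategy models the dispatch for copying n bytes from src to dst.
// The 256- and 2048-byte cutovers come from the CL description; the overlap
// test below is an assumption of this sketch.
func chooseStrategy(dst, src, n uintptr) copyStrategy {
	if n <= 256 {
		// Assumed here to handle overlap on its own, so no check is made.
		return straightLine
	}
	// A forward copy cannot overwrite unread source bytes if the source
	// starts at or after the destination, or ends at or before it.
	forwardSafe := src >= dst || src+n <= dst
	if !forwardSafe {
		return backward
	}
	if n <= 2048 {
		return sseMoves
	}
	return repMovsq
}

func main() {
	const src = 1 << 20            // arbitrary source address for the model
	const farDst = src + (1 << 16) // destination well past any tested size
	for _, n := range []uintptr{16, 256, 512, 2048, 4096} {
		fmt.Printf("%4d bytes: %v\n", n, chooseStrategy(farDst, src, n))
	}
	// Destination 8 bytes after the source: a forward copy would clobber it.
	fmt.Println(chooseStrategy(src+8, src, 1024))
}
```

Run as-is, the sketch reports SSE MOV for the 512- and 2048-byte copies and REP MOVSQ for 4096 bytes, which mirrors the sizes where the benchmarks above improved and where they stayed flat.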

Patch Set 1

Patch Set 2 : diff -r 74a1371ffaf1 https://code.google.com/p/go

Patch Set 3 : diff -r 74a1371ffaf1 https://code.google.com/p/go

Patch Set 4 : diff -r 2af1cf6a6559 https://code.google.com/p/go

Patch Set 5 : diff -r 2af1cf6a6559 https://code.google.com/p/go

Total comments: 2

Patch Set 6 : diff -r 2af1cf6a6559 https://code.google.com/p/go

Patch Set 7 : diff -r ee11f19bc514 https://code.google.com/p/go

Stats: +48 lines, -4 lines
M src/pkg/runtime/memmove_amd64.s: 3 chunks, +48 lines, -4 lines, 0 comments

Messages

Total messages: 7
ruiu
Hello golang-codereviews@googlegroups.com, I'd like you to review this change to https://code.google.com/p/go
11 years ago (2014-06-17 16:12:00 UTC) #1
gobot
R=khr@golang.org (assigned by iant@golang.org)
11 years ago (2014-06-17 17:12:35 UTC) #2
khr
On 2014/06/17 17:12:35, gobot wrote:
> R=khr@golang.org (assigned by iant@golang.org)

LGTM. The tree isn't open ...
11 years ago (2014-06-17 21:30:00 UTC) #3
khr
https://codereview.appspot.com/90500043/diff/70001/src/pkg/runtime/memmove_amd64.s
File src/pkg/runtime/memmove_amd64.s (right):

https://codereview.appspot.com/90500043/diff/70001/src/pkg/runtime/memmove_amd64.s#newcode250
src/pkg/runtime/memmove_amd64.s:250: JGE move_257through2048
Make this a JG so it matches ...
11 years ago (2014-06-17 21:30:22 UTC) #4
ruiu
https://codereview.appspot.com/90500043/diff/70001/src/pkg/runtime/memmove_amd64.s
File src/pkg/runtime/memmove_amd64.s (right):

https://codereview.appspot.com/90500043/diff/70001/src/pkg/runtime/memmove_amd64.s#newcode250
src/pkg/runtime/memmove_amd64.s:250: JGE move_257through2048
Making this a JG makes memmove executing ...
10 years, 12 months ago (2014-06-18 18:19:13 UTC) #5
khr
On 2014/06/18 18:19:13, ruiu wrote:
> https://codereview.appspot.com/90500043/diff/70001/src/pkg/runtime/memmove_amd64.s
> File src/pkg/runtime/memmove_amd64.s (right):
>
> https://codereview.appspot.com/90500043/diff/70001/src/pkg/runtime/memmove_amd64.s#newcode250
> ...
10 years, 11 months ago (2014-06-23 18:16:53 UTC) #6
ruiu
10 years, 11 months ago (2014-06-23 19:06:29 UTC) #7
*** Submitted as https://code.google.com/p/go/source/detail?r=d371eab5a39e ***

LGTM=khr
R=golang-codereviews, gobot, khr
CC=golang-codereviews
https://codereview.appspot.com/90500043