Descriptionruntime: speed up amd64 memmove
MOV with SSE registers seems faster than REP MOVSQ if the
size being copied is less than about 2K. Previously we
didn't use MOV if the memory region is larger than 256
byte. This patch improves the performance of 257 ~ 2048
byte non-overlapping copy by using MOV.
Here is the benchmark result on Intel Xeon 3.5GHz (Nehalem).
benchmark old ns/op new ns/op delta
BenchmarkMemmove16 4 4 +0.42%
BenchmarkMemmove32 5 5 -0.20%
BenchmarkMemmove64 6 6 -0.81%
BenchmarkMemmove128 7 7 -0.82%
BenchmarkMemmove256 10 10 +1.92%
BenchmarkMemmove512 29 16 -44.90%
BenchmarkMemmove1024 37 25 -31.55%
BenchmarkMemmove2048 55 44 -19.46%
BenchmarkMemmove4096 92 91 -0.76%
benchmark old MB/s new MB/s speedup
BenchmarkMemmove16 3370.61 3356.88 1.00x
BenchmarkMemmove32 6368.68 6386.99 1.00x
BenchmarkMemmove64 10367.37 10462.62 1.01x
BenchmarkMemmove128 17551.16 17713.48 1.01x
BenchmarkMemmove256 24692.81 24142.99 0.98x
BenchmarkMemmove512 17428.70 31687.72 1.82x
BenchmarkMemmove1024 27401.82 40009.45 1.46x
BenchmarkMemmove2048 36884.86 45766.98 1.24x
BenchmarkMemmove4096 44295.91 44627.86 1.01x
Patch Set 1 #Patch Set 2 : diff -r 74a1371ffaf1 https://code.google.com/p/go #Patch Set 3 : diff -r 74a1371ffaf1 https://code.google.com/p/go #Patch Set 4 : diff -r 2af1cf6a6559 https://code.google.com/p/go #Patch Set 5 : diff -r 2af1cf6a6559 https://code.google.com/p/go #
Total comments: 2
Patch Set 6 : diff -r 2af1cf6a6559 https://code.google.com/p/go #Patch Set 7 : diff -r ee11f19bc514 https://code.google.com/p/go #MessagesTotal messages: 7
|