Description
runtime: faster x86 memmove (a.k.a. built-in copy())
REP instructions have a high startup cost, so we handle small
sizes with straight-line code. The REP MOVSx instructions
are really fast for large sizes. The cutover is approximately
1K. The straight-line code goes up to 128 bytes (386) or 256
bytes (amd64) because that is the most data that fits in the
SSE registers at once (8 or 16 XMM registers of 16 bytes each).
Loading all data into registers before any stores lets us
ignore copy direction.
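
To illustrate why loading everything before storing anything makes copy direction irrelevant, here is a minimal Go sketch. It is not the assembly in this CL: the helper name moveSmall is hypothetical, it covers only the 8..16 byte size class, and it uses two 64-bit words where the real code uses wider SSE registers.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// moveSmall is a hypothetical helper that copies len(src) bytes
// (8 <= len(src) <= 16) into dst, where dst and src may overlap.
// It assumes len(dst) >= len(src).
func moveSmall(dst, src []byte) {
	n := len(src)
	// Load the first and last 8 bytes. For n < 16 the two words
	// overlap in the middle, which is harmless.
	head := binary.LittleEndian.Uint64(src[:8])
	tail := binary.LittleEndian.Uint64(src[n-8:])
	// Only now store. Every source byte is already held in a local,
	// so overlapping buffers cannot corrupt the result in either direction.
	binary.LittleEndian.PutUint64(dst[:8], head)
	binary.LittleEndian.PutUint64(dst[n-8:n], tail)
}

func main() {
	buf := make([]byte, 20)
	for i := range buf {
		buf[i] = byte(i)
	}
	// Overlapping move: shift the first 12 bytes up by 3 positions.
	moveSmall(buf[3:15], buf[0:12])
	fmt.Println(buf[3:15])
}
```

A naive forward byte-by-byte loop would overwrite source bytes before reading them in the overlapping case shown in main; the load-then-store order avoids that without a direction check.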
(on a Sandy Bridge E5-1650 @ 3.20GHz)
benchmark               old ns/op    new ns/op     delta
BenchmarkMemmove0               3            3    +0.86%
BenchmarkMemmove1               5            5    +5.40%
BenchmarkMemmove2              18            8   -56.84%
BenchmarkMemmove3              18            7   -58.45%
BenchmarkMemmove4              36            7   -78.63%
BenchmarkMemmove5              36            8   -77.91%
BenchmarkMemmove6              36            8   -77.76%
BenchmarkMemmove7              36            8   -77.82%
BenchmarkMemmove8              18            8   -56.33%
BenchmarkMemmove9              18            7   -58.34%
BenchmarkMemmove10             18            7   -58.34%
BenchmarkMemmove11             18            7   -58.45%
BenchmarkMemmove12             36            7   -78.51%
BenchmarkMemmove13             36            7   -78.48%
BenchmarkMemmove14             36            7   -78.56%
BenchmarkMemmove15             36            7   -78.56%
BenchmarkMemmove16             18            7   -58.24%
BenchmarkMemmove32             18            8   -54.33%
BenchmarkMemmove64             18            8   -53.37%
BenchmarkMemmove128            20            9   -55.93%
BenchmarkMemmove256            25           11   -55.16%
BenchmarkMemmove512            33           33    -1.19%
BenchmarkMemmove1024           43           44    +2.06%
BenchmarkMemmove2048           61           61    +0.16%
BenchmarkMemmove4096           95           95    +0.00%
Patch Set 1
Patch Set 2 : diff -r 9f146b985681 https://code.google.com/p/go/
Patch Set 3 : diff -r 3623b5f14f72 https://khr%40golang.org@code.google.com/p/go/
Patch Set 4 : diff -r 3623b5f14f72 https://khr%40golang.org@code.google.com/p/go/
Patch Set 5 : diff -r 3623b5f14f72 https://khr%40golang.org@code.google.com/p/go/
Patch Set 6 : diff -r 3623b5f14f72 https://khr%40golang.org@code.google.com/p/go/
Patch Set 7 : diff -r c58a49d330d1 https://khr%40golang.org@code.google.com/p/go/
Messages
Total messages: 10