Description
runtime: get rid of most uses of REP for copying/zeroing.
REP MOVSQ and REP STOSQ have a high startup overhead, which dominates
for small copies and clears. Use a Duff's device to do the repetition instead.
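To illustrate the idea, here is a Go sketch. C's original Duff's device interleaves a switch with a loop, which Go cannot do, and the runtime's actual routines are hand-written assembly. This sketch only shows the shape of the technique.

The helper `duffCopy` is hypothetical and operates on a `[]uint64` of words. It handles the leftover n%8 words by entering a chain of `fallthrough` cases at the right point. It then copies the rest in fully unrolled blocks of eight, so the work is straight-line moves with no REP startup.

```go
package main

import "fmt"

// duffCopy copies the first n words of src into dst, Duff's-device style.
// Hypothetical helper for illustration only; it is not the runtime's code.
func duffCopy(dst, src []uint64, n int) {
	i := 0
	// Enter the chain at n%8 and fall through the remaining cases, so the
	// leftover words are copied by straight-line code with no loop test.
	switch n % 8 {
	case 7:
		dst[i] = src[i]
		i++
		fallthrough
	case 6:
		dst[i] = src[i]
		i++
		fallthrough
	case 5:
		dst[i] = src[i]
		i++
		fallthrough
	case 4:
		dst[i] = src[i]
		i++
		fallthrough
	case 3:
		dst[i] = src[i]
		i++
		fallthrough
	case 2:
		dst[i] = src[i]
		i++
		fallthrough
	case 1:
		dst[i] = src[i]
		i++
	}
	// What remains is a multiple of 8 words: copy it in unrolled blocks.
	for ; i < n; i += 8 {
		dst[i+0] = src[i+0]
		dst[i+1] = src[i+1]
		dst[i+2] = src[i+2]
		dst[i+3] = src[i+3]
		dst[i+4] = src[i+4]
		dst[i+5] = src[i+5]
		dst[i+6] = src[i+6]
		dst[i+7] = src[i+7]
	}
}

func main() {
	src := make([]uint64, 13)
	for i := range src {
		src[i] = uint64(i + 1)
	}
	dst := make([]uint64, len(src))
	duffCopy(dst, src, len(src))
	fmt.Println(dst) // [1 2 3 4 5 6 7 8 9 10 11 12 13]
}
```

Zeroing has the same shape, with each `dst[i] = src[i]` replaced by `dst[i] = 0`.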
benchmark                old ns/op     new ns/op     delta
BenchmarkClearFat32      7.20          1.60          -77.78%
BenchmarkCopyFat32       6.88          2.38          -65.41%
BenchmarkClearFat64      7.15          3.20          -55.24%
BenchmarkCopyFat64       6.88          3.44          -50.00%
BenchmarkClearFat128     9.53          5.34          -43.97%
BenchmarkCopyFat128      9.27          5.56          -40.02%
BenchmarkClearFat256     13.8          9.53          -30.94%
BenchmarkCopyFat256      13.5          10.3          -23.70%
BenchmarkClearFat512     22.3          18.0          -19.28%
BenchmarkCopyFat512      22.0          19.7          -10.45%
BenchmarkCopyFat1024     36.5          38.4          +5.21%
BenchmarkClearFat1024    35.1          35.0          -0.28%
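The delta column is the relative change in ns/op, computed as (new - old) / old * 100, so negative values mean faster. The short Go program below recomputes two rows from the table as a sanity check. The `row` type and `delta` helper are illustrative names, not part of the CL.

```go
package main

import "fmt"

// row holds one line of the benchmark table.
type row struct {
	name         string
	oldNs, newNs float64
}

// delta returns the percentage change from oldNs to newNs.
func delta(oldNs, newNs float64) float64 {
	return (newNs - oldNs) / oldNs * 100
}

func main() {
	rows := []row{
		{"BenchmarkClearFat32", 7.20, 1.60},
		{"BenchmarkCopyFat1024", 36.5, 38.4},
	}
	for _, r := range rows {
		// Prints -77.78% and +5.21%, matching the table.
		fmt.Printf("%-22s %+.2f%%\n", r.name, delta(r.oldNs, r.newNs))
	}
}
```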
TODO: use for stack frame zeroing
TODO: REP prefixes are still used for "reverse" copying when src/dst
regions overlap. Might be worth fixing.
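Reverse copying is needed because of overlap. A forward copy is only wrong when the destination starts inside the source region, since it would overwrite source words before reading them. In that case the words must be copied back to front.

The Go sketch below shows that direction choice on a single word buffer. `move` is a hypothetical helper, not the runtime's memmove. On amd64 the backward path is the case that, per the TODO, still uses REP.

```go
package main

import "fmt"

// move copies n words within buf from index src to index dst with memmove
// semantics. Hypothetical helper for illustration only.
func move(buf []uint64, dst, src, n int) {
	if dst > src && dst < src+n {
		// dst starts inside [src, src+n): copy backward so each source
		// word is read before it is overwritten.
		for i := n - 1; i >= 0; i-- {
			buf[dst+i] = buf[src+i]
		}
		return
	}
	// No harmful overlap: an ordinary forward copy is safe.
	for i := 0; i < n; i++ {
		buf[dst+i] = buf[src+i]
	}
}

func main() {
	buf := []uint64{1, 2, 3, 4, 5, 6, 7, 8}
	move(buf, 2, 0, 5) // overlapping, dst ahead of src
	fmt.Println(buf)   // [1 2 1 2 3 4 5 8]
}
```

Go's built-in `copy` already handles overlapping slices, so this only makes the direction decision explicit.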
Patch Set 1
Patch Set 2: diff -r a70a32dc121a https://khr%40golang.org@code.google.com/p/go/
Patch Set 3: diff -r a70a32dc121a https://khr%40golang.org@code.google.com/p/go/
Total comments: 2
Patch Set 4: diff -r 1427fb6bcfa3 https://khr%40golang.org@code.google.com/p/go/
Patch Set 5: diff -r 1427fb6bcfa3 https://khr%40golang.org@code.google.com/p/go/
Messages
Total messages: 6