Descriptiondecode: Faster decode by using copy when possible.
Profiling shows that the loop to copy data on decode is
a hot point (duh). Changing it to the copy() builtin would make
it faster because the runtime can do the range checks for the
whole copy once instead of twice per byte.
However, builtin copy of overlapping data doesn't work right
in this context, so we need to be careful to only use it
in the non-overlapping case.
Benchmark results:
benchmark old ns/op new ns/op delta
BenchmarkWordsDecode1e3 2506 2037 -18.72%
BenchmarkWordsDecode1e4 35477 24741 -30.26%
BenchmarkWordsDecode1e5 458040 395472 -13.66%
BenchmarkWordsDecode1e6 4536036 3886377 -14.32%
BenchmarkWordsEncode1e3 10398 10157 -2.32%
BenchmarkWordsEncode1e4 69165 68687 -0.69%
BenchmarkWordsEncode1e5 853485 853145 -0.04%
BenchmarkWordsEncode1e6 8373581 8301719 -0.86%
Benchmark_UFlat0 240950 138130 -42.67%
Benchmark_UFlat1 1977561 1529902 -22.64%
Benchmark_UFlat2 10097 9727 -3.66%
Benchmark_UFlat3 63131 42081 -33.34%
Benchmark_UFlat4 971104 562381 -42.09%
Benchmark_UFlat5 71907 51408 -28.51%
Benchmark_UFlat6 30071 17597 -41.48%
Benchmark_UFlat7 7561 5025 -33.54%
Benchmark_UFlat8 2709115 1969772 -27.29%
Benchmark_UFlat9 709485 634759 -10.53%
Benchmark_UFlat10 593795 544395 -8.32%
Benchmark_UFlat11 2015697 1688307 -16.24%
Benchmark_UFlat12 2469237 2226012 -9.85%
Benchmark_UFlat13 1118511 656056 -41.35%
Benchmark_UFlat14 126248 92398 -26.81%
Benchmark_UFlat15 9495 6588 -30.62%
Benchmark_UFlat16 241503 107066 -55.67%
Benchmark_UFlat17 664135 580412 -12.61%
Benchmark_ZFlat0 386053 399785 +3.56%
Benchmark_ZFlat1 5392099 5442586 +0.94%
Benchmark_ZFlat2 1218046 1220262 +0.18%
Benchmark_ZFlat3 816541 820941 +0.54%
Benchmark_ZFlat4 1496766 1563638 +4.47%
Benchmark_ZFlat5 165676 170218 +2.74%
Benchmark_ZFlat6 56619 57579 +1.70%
Benchmark_ZFlat7 20118 20483 +1.81%
Benchmark_ZFlat8 4250493 4394358 +3.38%
Benchmark_ZFlat9 1300489 1312684 +0.94%
Benchmark_ZFlat10 1135856 1144729 +0.78%
Benchmark_ZFlat11 3511762 3542176 +0.87%
Benchmark_ZFlat12 4529842 4539479 +0.21%
Benchmark_ZFlat13 1744524 1805723 +3.51%
Benchmark_ZFlat14 279633 283244 +1.29%
Benchmark_ZFlat15 24814 25150 +1.35%
Benchmark_ZFlat16 395791 409900 +3.56%
Benchmark_ZFlat17 1059920 1093119 +3.13%
benchmark old MB/s new MB/s speedup
BenchmarkWordsDecode1e3 398.93 490.69 1.23x
BenchmarkWordsDecode1e4 281.87 404.18 1.43x
BenchmarkWordsDecode1e5 218.32 252.86 1.16x
BenchmarkWordsDecode1e6 220.46 257.31 1.17x
BenchmarkWordsEncode1e3 96.17 98.45 1.02x
BenchmarkWordsEncode1e4 144.58 145.59 1.01x
BenchmarkWordsEncode1e5 117.17 117.21 1.00x
BenchmarkWordsEncode1e6 119.42 120.46 1.01x
Benchmark_UFlat0 424.98 741.33 1.74x
Benchmark_UFlat1 355.03 458.91 1.29x
Benchmark_UFlat2 12573.25 13051.74 1.04x
Benchmark_UFlat3 1494.18 2241.62 1.50x
Benchmark_UFlat4 421.79 728.33 1.73x
Benchmark_UFlat5 342.15 478.57 1.40x
Benchmark_UFlat6 370.78 633.61 1.71x
Benchmark_UFlat7 492.10 740.45 1.50x
Benchmark_UFlat8 380.10 522.77 1.38x
Benchmark_UFlat9 214.37 239.60 1.12x
Benchmark_UFlat10 210.81 229.94 1.09x
Benchmark_UFlat11 211.72 252.77 1.19x
Benchmark_UFlat12 195.15 216.47 1.11x
Benchmark_UFlat13 458.84 782.27 1.70x
Benchmark_UFlat14 302.89 413.86 1.37x
Benchmark_UFlat15 445.14 641.59 1.44x
Benchmark_UFlat16 491.04 1107.61 2.26x
Benchmark_UFlat17 277.53 317.57 1.14x
Benchmark_ZFlat0 265.25 256.14 0.97x
Benchmark_ZFlat1 130.21 129.00 0.99x
Benchmark_ZFlat2 104.23 104.04 1.00x
Benchmark_ZFlat3 115.52 114.90 0.99x
Benchmark_ZFlat4 273.66 261.95 0.96x
Benchmark_ZFlat5 148.50 144.54 0.97x
Benchmark_ZFlat6 196.93 193.64 0.98x
Benchmark_ZFlat7 184.95 181.66 0.98x
Benchmark_ZFlat8 242.26 234.33 0.97x
Benchmark_ZFlat9 116.95 115.86 0.99x
Benchmark_ZFlat10 110.21 109.35 0.99x
Benchmark_ZFlat11 121.52 120.48 0.99x
Benchmark_ZFlat12 106.37 106.15 1.00x
Benchmark_ZFlat13 294.19 284.22 0.97x
Benchmark_ZFlat14 136.75 135.01 0.99x
Benchmark_ZFlat15 170.34 168.07 0.99x
Benchmark_ZFlat16 299.62 289.31 0.97x
Benchmark_ZFlat17 173.90 168.62 0.97x
Patch Set 1 #Patch Set 2 : diff -r 12e4b4183793 https://code.google.com/p/snappy-go #Patch Set 3 : diff -r 12e4b4183793 https://code.google.com/p/snappy-go #
Total comments: 1
MessagesTotal messages: 8
|