Rietveld Code Review Tool
Help | Bug tracker | Discussion group | Source code | Sign in
(6031)

Issue 29770043: code review 29770043: decode: Faster decode by using copy when possible. (Closed)

Can't Edit
Can't Publish+Mail
Start Review
Created:
10 years, 5 months ago by jeff.allen
Modified:
10 years, 4 months ago
Reviewers:
r, nigeltao, 0xjnml, golang-dev
Visibility:
Public.

Description

decode: Faster decode by using copy when possible. Profiling shows that the loop to copy data on decode is a hot point (duh). Changing it to the copy() builtin would make it faster because the runtime can do the range checks for the whole copy once instead of twice per byte. However, builtin copy of overlapping data doesn't work right in this context, so we need to be careful to only use it in the non-overlapping case. Benchmark results: benchmark old ns/op new ns/op delta BenchmarkWordsDecode1e3 2506 2037 -18.72% BenchmarkWordsDecode1e4 35477 24741 -30.26% BenchmarkWordsDecode1e5 458040 395472 -13.66% BenchmarkWordsDecode1e6 4536036 3886377 -14.32% BenchmarkWordsEncode1e3 10398 10157 -2.32% BenchmarkWordsEncode1e4 69165 68687 -0.69% BenchmarkWordsEncode1e5 853485 853145 -0.04% BenchmarkWordsEncode1e6 8373581 8301719 -0.86% Benchmark_UFlat0 240950 138130 -42.67% Benchmark_UFlat1 1977561 1529902 -22.64% Benchmark_UFlat2 10097 9727 -3.66% Benchmark_UFlat3 63131 42081 -33.34% Benchmark_UFlat4 971104 562381 -42.09% Benchmark_UFlat5 71907 51408 -28.51% Benchmark_UFlat6 30071 17597 -41.48% Benchmark_UFlat7 7561 5025 -33.54% Benchmark_UFlat8 2709115 1969772 -27.29% Benchmark_UFlat9 709485 634759 -10.53% Benchmark_UFlat10 593795 544395 -8.32% Benchmark_UFlat11 2015697 1688307 -16.24% Benchmark_UFlat12 2469237 2226012 -9.85% Benchmark_UFlat13 1118511 656056 -41.35% Benchmark_UFlat14 126248 92398 -26.81% Benchmark_UFlat15 9495 6588 -30.62% Benchmark_UFlat16 241503 107066 -55.67% Benchmark_UFlat17 664135 580412 -12.61% Benchmark_ZFlat0 386053 399785 +3.56% Benchmark_ZFlat1 5392099 5442586 +0.94% Benchmark_ZFlat2 1218046 1220262 +0.18% Benchmark_ZFlat3 816541 820941 +0.54% Benchmark_ZFlat4 1496766 1563638 +4.47% Benchmark_ZFlat5 165676 170218 +2.74% Benchmark_ZFlat6 56619 57579 +1.70% Benchmark_ZFlat7 20118 20483 +1.81% Benchmark_ZFlat8 4250493 4394358 +3.38% Benchmark_ZFlat9 1300489 1312684 +0.94% Benchmark_ZFlat10 1135856 1144729 +0.78% Benchmark_ZFlat11 3511762 3542176 +0.87% Benchmark_ZFlat12 4529842 4539479 +0.21% Benchmark_ZFlat13 1744524 1805723 +3.51% Benchmark_ZFlat14 279633 283244 +1.29% Benchmark_ZFlat15 24814 25150 +1.35% Benchmark_ZFlat16 395791 409900 +3.56% Benchmark_ZFlat17 1059920 1093119 +3.13% benchmark old MB/s new MB/s speedup BenchmarkWordsDecode1e3 398.93 490.69 1.23x BenchmarkWordsDecode1e4 281.87 404.18 1.43x BenchmarkWordsDecode1e5 218.32 252.86 1.16x BenchmarkWordsDecode1e6 220.46 257.31 1.17x BenchmarkWordsEncode1e3 96.17 98.45 1.02x BenchmarkWordsEncode1e4 144.58 145.59 1.01x BenchmarkWordsEncode1e5 117.17 117.21 1.00x BenchmarkWordsEncode1e6 119.42 120.46 1.01x Benchmark_UFlat0 424.98 741.33 1.74x Benchmark_UFlat1 355.03 458.91 1.29x Benchmark_UFlat2 12573.25 13051.74 1.04x Benchmark_UFlat3 1494.18 2241.62 1.50x Benchmark_UFlat4 421.79 728.33 1.73x Benchmark_UFlat5 342.15 478.57 1.40x Benchmark_UFlat6 370.78 633.61 1.71x Benchmark_UFlat7 492.10 740.45 1.50x Benchmark_UFlat8 380.10 522.77 1.38x Benchmark_UFlat9 214.37 239.60 1.12x Benchmark_UFlat10 210.81 229.94 1.09x Benchmark_UFlat11 211.72 252.77 1.19x Benchmark_UFlat12 195.15 216.47 1.11x Benchmark_UFlat13 458.84 782.27 1.70x Benchmark_UFlat14 302.89 413.86 1.37x Benchmark_UFlat15 445.14 641.59 1.44x Benchmark_UFlat16 491.04 1107.61 2.26x Benchmark_UFlat17 277.53 317.57 1.14x Benchmark_ZFlat0 265.25 256.14 0.97x Benchmark_ZFlat1 130.21 129.00 0.99x Benchmark_ZFlat2 104.23 104.04 1.00x Benchmark_ZFlat3 115.52 114.90 0.99x Benchmark_ZFlat4 273.66 261.95 0.96x Benchmark_ZFlat5 148.50 144.54 0.97x Benchmark_ZFlat6 196.93 193.64 0.98x Benchmark_ZFlat7 184.95 181.66 0.98x Benchmark_ZFlat8 242.26 234.33 0.97x Benchmark_ZFlat9 116.95 115.86 0.99x Benchmark_ZFlat10 110.21 109.35 0.99x Benchmark_ZFlat11 121.52 120.48 0.99x Benchmark_ZFlat12 106.37 106.15 1.00x Benchmark_ZFlat13 294.19 284.22 0.97x Benchmark_ZFlat14 136.75 135.01 0.99x Benchmark_ZFlat15 170.34 168.07 0.99x Benchmark_ZFlat16 299.62 289.31 0.97x Benchmark_ZFlat17 173.90 168.62 0.97x

Patch Set 1 #

Patch Set 2 : diff -r 12e4b4183793 https://code.google.com/p/snappy-go #

Patch Set 3 : diff -r 12e4b4183793 https://code.google.com/p/snappy-go #

Total comments: 1
Unified diffs Side-by-side diffs Delta from patch set Stats (+7 lines, -3 lines) Patch
M snappy/decode.go View 1 1 chunk +7 lines, -3 lines 1 comment Download

Messages

Total messages: 8
jeff.allen
Hello golang-dev@googlegroups.com, I'd like you to review this change to https://code.google.com/p/snappy-go
10 years, 5 months ago (2013-11-20 17:10:56 UTC) #1
r
Please explain "builtin copy of overlapping data doesn't work in this context".
10 years, 5 months ago (2013-11-20 17:15:59 UTC) #2
0xjnml
The snappy copy "instruction" exploits the memcopy semantics, which Go copy() carefully and rightfully avoids, ...
10 years, 5 months ago (2013-11-20 17:22:00 UTC) #3
jeff.allen
Yes, what he said. Except he left out the barfing noises I made when I ...
10 years, 5 months ago (2013-11-20 17:28:44 UTC) #4
jeff.allen
Please take a look.
10 years, 4 months ago (2013-12-19 02:20:19 UTC) #5
nigeltao
Please change the CL description's opening sentence from: "decode: Faster decode etc." to "snappy: faster ...
10 years, 4 months ago (2013-12-19 05:50:22 UTC) #6
nigeltao
As for the code changes, I appreciate the data point, but I'm not sure if ...
10 years, 4 months ago (2013-12-19 06:02:48 UTC) #7
jeff.allen
10 years, 4 months ago (2013-12-20 08:44:43 UTC) #8
I think it's a shame to leave this much performance on the table, but I
understand your reasoning also.

I'm going to abandon this commit and maybe when you decide the right amount of
beauty/performance tradeoff, you can take this up again.
Sign in to reply to this message.

Powered by Google App Engine
RSS Feeds Recent Issues | This issue
This is Rietveld f62528b