Rietveld Code Review Tool
Help | Bug tracker | Discussion group | Source code | Sign in
(1323)

Issue 5515044: Improve performance of S32_{opaque|alpha}_D32_filter_DX on SSSE3 platforms. (Closed)

Can't Edit
Can't Publish+Mail
Start Review
Created:
12 years, 11 months ago by evannier
Modified:
12 years, 10 months ago
Reviewers:
epoger, TomH, reed1
CC:
skia-review_googlegroups.com
Base URL:
http://skia.googlecode.com/svn/trunk/
Visibility:
Public.

Description

Improve performance of S32_{opaque|alpha}_D32_filter_DX on SSSE3 platforms. The assembly code was adapted from code we received from Intel. The optimizations done are: - in 1/16 of the cases, the math that is done becomes much simpler, and we take advantage of this. It is of interest that this optimization is valid regardless of the platform, and could be applied on all versions of that code. Assuming a complete random distribution of samples, this optimization can therefore at most increase speed by 1/16 (6%), if the processing was completely removed. Actual benchmarks show an improvement of 4% of the actual function or 2.5% in the complete benchmark mentioned below. - math is making use of various new instructions that come with SSSE3 which pretty much allows to process twice as many pixels for each pass. - loop unrolling allows to make use of the fancier instructions above but also allows to perform fewer memory accesses. For more details, see the code where more comments detail the optimizations. Benchmarking was done on a variety of platforms: - Z600 64 bit - Z600 32 bit - GoogleTV Atom 32 bit. This was benchmarked using perf/skia bench, and real world scenarios. using skia bench: out/Release/bench -config 8888 -scale -forceFilter 1 -match bitmap -repeat 10 (Example output:) Before: running bench [640 480] bitmap_8888_update 8888: cmsecs = 42.40 running bench [640 480] bitmap_8888_update_volatile 8888: cmsecs = 42.41 running bench [640 480] bitmap_8888 8888: cmsecs = 42.45 running bench [640 480] bitmap_8888_A 8888: cmsecs = 46.86 After: running bench [640 480] bitmap_8888_update 8888: cmsecs = 31.25 running bench [640 480] bitmap_8888_update_volatile 8888: cmsecs = 31.26 running bench [640 480] bitmap_8888 8888: cmsecs = 31.21 running bench [640 480] bitmap_8888_A 8888: cmsecs = 35.48 So, in this bench, performance improvement (only the first of the four benchmarks mentioned was used, since the improvement is the same accross all 4 benchmarks) | before | after | improvement | Z600 64| 42.4 | 31.25 | 1.35 | Z600 32| 46.27 | 39.46 | 1.17 | Atom 32| 271.12 | 193.77| 1.4 | The actual speed up of the function is larger simply because the functions are only 60% of the benchmarks mentioned above. Benchmarks using that function alone show speed ups between 1.4 (z600 64 bit) to over 2x (Atom 32 bit). Credits: This code is the work of many people. Most of the praise should go to Intel's Tom C who wrote and optimized the code loop. Then lcwu and I share the reviewing, testing, clean up effort (templatization, factorization, etc). In real life browser scenarios, this function shows up on a variety of GFX intensive benchmarks. For example, http://ie.microsoft.com/testdrive/Performance/AsteroidBelt/Default.html the function. Before optimization represents 47% of workload, after 38%, so an improvement in the function of 1.44, which gives a real world improvement of roughly 16 % in frame rate.

Patch Set 1 #

Patch Set 2 : '' #

Total comments: 3

Patch Set 3 : major cleanup, factorization as requested #

Patch Set 4 : updated checkin comments #

Unified diffs Side-by-side diffs Delta from patch set Stats (+584 lines, -12 lines) Patch
M gyp/opts.gyp View 1 2 chunks +35 lines, -5 lines 0 comments Download
A src/opts/SkBitmapProcState_opts_SSSE3.h View 1 chunk +15 lines, -0 lines 0 comments Download
A src/opts/SkBitmapProcState_opts_SSSE3.cpp View 1 2 1 chunk +497 lines, -0 lines 0 comments Download
M src/opts/opts_check_SSE2.cpp View 4 chunks +37 lines, -7 lines 0 comments Download

Messages

Total messages: 12
TomH
Elliot, can you shed some light on the gyp organization question? Or is there somebody ...
12 years, 10 months ago (2012-02-07 21:21:59 UTC) #1
evannier
On 2012/02/07 21:21:59, TomH wrote: > Elliot, can you shed some light on the gyp ...
12 years, 10 months ago (2012-02-08 03:17:15 UTC) #2
evannier
http://codereview.appspot.com/5515044/diff/1005/src/opts/SkBitmapProcState_opts_SSSE3.cpp File src/opts/SkBitmapProcState_opts_SSSE3.cpp (right): http://codereview.appspot.com/5515044/diff/1005/src/opts/SkBitmapProcState_opts_SSSE3.cpp#newcode83 src/opts/SkBitmapProcState_opts_SSSE3.cpp:83: if (subY == 0) { On 2012/02/07 21:21:59, TomH ...
12 years, 10 months ago (2012-02-08 03:17:28 UTC) #3
epoger
On 2012/02/07 21:21:59, TomH wrote: > Elliot, can you shed some light on the gyp ...
12 years, 10 months ago (2012-02-08 14:48:07 UTC) #4
epoger
On 2012/02/08 14:48:07, epoger wrote: > On 2012/02/07 21:21:59, TomH wrote: > > Elliot, can ...
12 years, 10 months ago (2012-02-08 19:34:36 UTC) #5
TomH
On 2012/02/08 03:17:28, evannier wrote: > So, I am willing to get rid of it ...
12 years, 10 months ago (2012-02-08 19:49:53 UTC) #6
evannier
On 2012/02/08 19:34:36, epoger wrote: > On 2012/02/08 14:48:07, epoger wrote: > > On 2012/02/07 ...
12 years, 10 months ago (2012-02-09 03:12:25 UTC) #7
evannier
On 2012/02/08 19:49:53, TomH wrote: > On 2012/02/08 03:17:28, evannier wrote: > > So, I ...
12 years, 10 months ago (2012-02-09 03:34:53 UTC) #8
TomH
On 2012/02/09 03:34:53, evannier wrote: > Let me know if you think the 4% are ...
12 years, 10 months ago (2012-02-10 19:54:16 UTC) #9
evannier
On 2012/02/10 19:54:16, TomH wrote: > On 2012/02/09 03:34:53, evannier wrote: > > Let me ...
12 years, 10 months ago (2012-02-13 23:04:14 UTC) #10
TomH
Attempted to commit to Skia in r3193, but breaks both Windows and Mac compiles.
12 years, 10 months ago (2012-02-14 18:36:30 UTC) #11
TomH
12 years, 10 months ago (2012-02-14 19:58:35 UTC) #12
Fixes in r3194 and r3195 look like they should take; this CL is in and the
review can be closed.
Sign in to reply to this message.

Powered by Google App Engine
RSS Feeds Recent Issues | This issue
This is Rietveld f62528b