Rietveld Code Review Tool

Issue 5518045: Improve Blending BlitRow functions on x86 by around 10% by making better use of SSE2 (Closed)

Created:
12 years, 11 months ago by evannier
Modified:
12 years, 9 months ago
Reviewers:
TomH, reed1
CC:
skia-review_googlegroups.com, Le-Chun Wu
Base URL:
http://skia.googlecode.com/svn/trunk/
Visibility:
Public.

Description

Improve Blending BlitRow functions on x86 by around 10% by making better use of SSE2 instructions.

The optimizations done in this patch are:
- Use _mm_mulhi_epi16 instead of _mm_mullo_epi16 for scale multiplication, to avoid the need to do the division (or shift).
- Take into account the constraints of the Atom micro-architecture (execution ports) and interleave the operations to make better (parallel) use of two execution ports.

Using:
bench -config 8888 -forceBlend 1 -match bitmap_8888 -repeat 100
(a benchmark with most cycles going into S32A_Blend_BlitRow32_SSE2 and S32_Blend_BlitRow32_SSE2), we get on a 64 bit Z600:

before:
running bench [640 480] bitmap_8888_update 8888: cmsecs = 5.60
running bench [640 480] bitmap_8888_update_volatile 8888: cmsecs = 5.61
running bench [640 480] bitmap_8888 8888: cmsecs = 5.59
running bench [640 480] bitmap_8888_A 8888: cmsecs = 6.98

after:
running bench [640 480] bitmap_8888_update 8888: cmsecs = 5.15
running bench [640 480] bitmap_8888_update_volatile 8888: cmsecs = 5.05
running bench [640 480] bitmap_8888 8888: cmsecs = 5.03
running bench [640 480] bitmap_8888_A 8888: cmsecs = 6.30

or between 8 and 11% faster on a 64 bit Z600. On a 32 bit Atom, the results are between 4 and 10% faster.

Credits: Tom C at Intel and lcwu
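The first optimization rests on a small arithmetic identity, which can be sketched in scalar C++ for a single 16-bit lane. This is only an illustration, not the code from the patch: the helper names are hypothetical, the scale is assumed to be in the range 0..256, and the high-half multiply is modeled with unsigned semantics (the description names the signed _mm_mulhi_epi16; unsigned is assumed here to sidestep sign handling). The idea is that if the 8-bit channel already sits in the high byte of its 16-bit word, taking the high 16 bits of the 32-bit product yields the scaled value directly, so the separate shift after a low-half multiply is no longer needed.

```cpp
#include <cassert>
#include <cstdint>

// Low 16 bits of a 16x16-bit product: what a "mullo"-style multiply keeps per lane.
static inline uint16_t mullo_u16(uint16_t a, uint16_t b) {
    return static_cast<uint16_t>(static_cast<uint32_t>(a) * b);
}

// High 16 bits of a 16x16-bit product: what an unsigned "mulhi"-style multiply keeps per lane.
static inline uint16_t mulhi_u16(uint16_t a, uint16_t b) {
    return static_cast<uint16_t>((static_cast<uint32_t>(a) * b) >> 16);
}

// Older shape: multiply, then shift right by 8 to divide by 256.
// Assumes scale is in 0..256, so the product fits in 16 bits.
static inline uint16_t scale_with_mullo(uint8_t channel, uint16_t scale) {
    return static_cast<uint16_t>(mullo_u16(channel, scale) >> 8);
}

// Newer shape: the channel is placed in the high byte of the word, and the
// high half of the product is already (channel * scale) >> 8. No extra shift.
static inline uint16_t scale_with_mulhi(uint8_t channel, uint16_t scale) {
    const uint16_t channel_in_high_byte = static_cast<uint16_t>(channel << 8);
    return mulhi_u16(channel_in_high_byte, scale);
}

// Exhaustively checks that both shapes agree for every 8-bit channel value
// and every scale in the assumed 0..256 range.
static void check_identity_exhaustively() {
    for (unsigned scale = 0; scale <= 256; ++scale) {
        for (unsigned channel = 0; channel <= 255; ++channel) {
            assert(scale_with_mullo(static_cast<uint8_t>(channel),
                                    static_cast<uint16_t>(scale)) ==
                   scale_with_mulhi(static_cast<uint8_t>(channel),
                                    static_cast<uint16_t>(scale)));
        }
    }
}
```

In a packed ARGB pixel split into 16-bit lanes, alpha and green already occupy the high byte of each lane, so for those channels the pre-shift in scale_with_mulhi would come for free from a mask. How the patch itself arranges the operands is not shown on this page.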

Patch Set 1

Patch Set 2: Added a few more comments, and fixed some typos in the comments

Stats: +60 lines, -27 lines
M src/opts/SkBlitRow_opts_SSE2.cpp (6 chunks, +60 lines, -27 lines, 0 comments)

Messages

Total messages: 5
TomH
Code changes LGTM, and I can see the expected performance improvement on a Z600. Higher-level ...
12 years, 10 months ago (2012-02-06 20:30:25 UTC) #1
evannier
On 2012/02/06 20:30:25, TomH wrote: > Code changes LGTM, and I can see the expected ...
12 years, 10 months ago (2012-02-06 22:09:31 UTC) #2
TomH
On 2012/02/06 22:09:31, evannier wrote: > Let me know if there is additional tests, additional ...
12 years, 10 months ago (2012-02-07 14:29:59 UTC) #3
evannier
On 2012/02/07 14:29:59, TomH wrote: > On 2012/02/06 22:09:31, evannier wrote: > > Let me ...
12 years, 10 months ago (2012-02-10 21:36:53 UTC) #4
TomH
12 years, 9 months ago (2012-02-28 16:17:19 UTC) #5
This one almost slipped through the cracks - committed as r3273.
Thanks again!
Please close the issue when you have a moment.