Issue 5518045: Improve Blending BlitRow functions on x86 by around 10% by making better use of SSE2 (Closed)

Created:
13 years, 7 months ago by evannier

Modified:
13 years, 5 months ago

Reviewers:
TomH, reed1

CC:
skia-review_googlegroups.com, Le-Chun Wu

Base URL:
http://skia.googlecode.com/svn/trunk/

Visibility:
Public.

Description

Improve Blending BlitRow functions on x86 by around 10% by making better use of SSE2 instructions.

The optimizations done in this patch are:
- Use _mm_mulhi_epi16 instead of _mm_mullo_epi16 for scale multiplication, which avoids the need for a separate division (or shift).
- Take into account the execution-port constraints of the Atom micro-architecture and interleave the operations to make better (parallel) use of the two execution ports.

Using:
    bench -config 8888 -forceBlend 1 -match bitmap_8888 -repeat 100
(a benchmark where most cycles are spent in S32A_Blend_BlitRow32_SSE2 and S32_Blend_BlitRow32_SSE2), we get on a 64-bit Z600:

Before:
    running bench [640 480] bitmap_8888_update 8888: cmsecs = 5.60
    running bench [640 480] bitmap_8888_update_volatile 8888: cmsecs = 5.61
    running bench [640 480] bitmap_8888 8888: cmsecs = 5.59
    running bench [640 480] bitmap_8888_A 8888: cmsecs = 6.98

After:
    running bench [640 480] bitmap_8888_update 8888: cmsecs = 5.15
    running bench [640 480] bitmap_8888_update_volatile 8888: cmsecs = 5.05
    running bench [640 480] bitmap_8888 8888: cmsecs = 5.03
    running bench [640 480] bitmap_8888_A 8888: cmsecs = 6.30

That is an improvement of between 8 and 11% on a 64-bit Z600. On a 32-bit Atom, the results are between 4 and 10% faster.

Credits: Tom C at Intel and lcwu
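The high-multiply idea from the description can be illustrated with a small standalone sketch. This is not the code from the patch: the function names are hypothetical, and the sketch uses the unsigned variant _mm_mulhi_epu16 (an assumption made here so that a pre-shifted scale above 0x7F00 is not interpreted as a negative number). It assumes 8-bit channel values held in 16-bit lanes and a scale in the range 0..255.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <cstdint>

// Scale eight channel values (0..255, one per 16-bit lane) by scale/256
// with a low multiply followed by an explicit right shift.
static inline __m128i scale_with_mullo(__m128i channels, unsigned scale) {
    __m128i scale_wide = _mm_set1_epi16(static_cast<short>(scale));
    __m128i product = _mm_mullo_epi16(channels, scale_wide);  // fits in 16 bits
    return _mm_srli_epi16(product, 8);                        // divide by 256
}

// Same result with a high multiply. The scale is moved into the upper byte
// of each word, so the upper 16 bits of the 32-bit product already equal
// (channel * scale) >> 8 and no separate shift is needed.
static inline __m128i scale_with_mulhi(__m128i channels, unsigned scale) {
    __m128i scale_wide = _mm_set1_epi16(static_cast<short>(scale << 8));
    return _mm_mulhi_epu16(channels, scale_wide);
}

// Hypothetical check: both variants agree with the scalar formula for
// every channel value 0..255 and every scale 0..255.
bool scale_variants_agree() {
    for (unsigned scale = 0; scale < 256; ++scale) {
        for (unsigned base = 0; base < 256; base += 8) {
            uint16_t in[8], lo[8], hi[8];
            for (unsigned i = 0; i < 8; ++i) {
                in[i] = static_cast<uint16_t>(base + i);
            }
            __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i*>(in));
            _mm_storeu_si128(reinterpret_cast<__m128i*>(lo), scale_with_mullo(v, scale));
            _mm_storeu_si128(reinterpret_cast<__m128i*>(hi), scale_with_mulhi(v, scale));
            for (unsigned i = 0; i < 8; ++i) {
                unsigned expected = ((base + i) * scale) >> 8;
                if (lo[i] != expected || hi[i] != expected) {
                    return false;
                }
            }
        }
    }
    return true;
}
```

The mulhi variant replaces a multiply plus a shift with a single multiply per group of eight channels, since the pre-shifted scale can be computed once outside the pixel loop.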

Patch Set 1

Patch Set 2: Added a few more comments, and fixed some typos in the comments

Created: 13 years, 6 months ago

M src/opts/SkBlitRow_opts_SSE2.cpp: 6 chunks, +60 lines, -27 lines, 0 comments

Messages

Total messages: 5

TomH

Code changes LGTM, and I can see the expected performance improvement on a Z600. Higher-level ...

13 years, 6 months ago (2012-02-06 20:30:25 UTC) #1

evannier

On 2012/02/06 20:30:25, TomH wrote: > Code changes LGTM, and I can see the expected ...

13 years, 6 months ago (2012-02-06 22:09:31 UTC) #2

TomH

On 2012/02/06 22:09:31, evannier wrote: > Let me know if there is additional tests, additional ...

13 years, 6 months ago (2012-02-07 14:29:59 UTC) #3

evannier

On 2012/02/07 14:29:59, TomH wrote: > On 2012/02/06 22:09:31, evannier wrote: > > Let me ...

13 years, 6 months ago (2012-02-10 21:36:53 UTC) #4

TomH

13 years, 5 months ago (2012-02-28 16:17:19 UTC) #5

This one almost slipped through the cracks - committed as r3273.
Thanks again!
Please close the issue when you have a moment.