Rietveld Code Review Tool
Help | Bug tracker | Discussion group | Source code | Sign in
(398)

Issue 5685055: Implemented SSE version of ClampX_ClampY_{no}filter_scale (Closed)

Can't Edit
Can't Publish+Mail
Start Review
Created:
12 years, 6 months ago by Jin A.Yang
Modified:
12 years, 6 months ago
Reviewers:
reed1, TomH
Base URL:
http://skia.googlecode.com/svn/trunk/
Visibility:
Public.

Description

Implemented a SSE2 version of ClampX_ClampY_{no}filter_scale. Added two bitmap benchmarks to cover them in the bench. The original version: out\Release\bench -config 8888 -match bitmap_clamp -repeat 200 skia bench: alpha=0xFF antialias=1 filter=0 rotate=0 scale=0 clip=0 dither=defau lt strokeWidth=none scalar=float system=WIN32 running bench [640 480] bitmap_clamp_scale_filtering_8888_update_volatile 8888: cmsecs = 48.28 running bench [640 480] bitmap_clamp_scale_8888_update_volatile 8888: cmsecs = 12.56 The SSE2 version: out\Release\bench -config 8888 -match bitmap_clamp -repeat 200 skia bench: alpha=0xFF antialias=1 filter=0 rotate=0 scale=0 clip=0 dither=defau lt strokeWidth=none scalar=float system=WIN32 running bench [640 480] bitmap_clamp_scale_filtering_8888_update_volatile 8888: cmsecs = 39.94 running bench [640 480] bitmap_clamp_scale_8888_update_volatile 8888: cmsecs = 10.22 We can observe 17.27% gain with filter drawBitmap, and 18.63% gain with no filter drawBitmap. With this patch, my HTML5 benchmark Myalbum Zoom can get 5% performance gain. Run tests: tests gm bench

Patch Set 1 #

Patch Set 2 : patch2 #

Total comments: 6

Patch Set 3 : revised patch #

Total comments: 1
Unified diffs Side-by-side diffs Delta from patch set Stats (+288 lines, -7 lines) Patch
src/core/SkBitmapProcState.h View 1 chunk +4 lines, -0 lines 0 comments Download
src/opts/SkBitmapProcState_opts_SSE2.h View 1 chunk +4 lines, -0 lines 0 comments Download
src/opts/SkBitmapProcState_opts_SSE2.cpp View 1 2 1 chunk +265 lines, -0 lines 1 comment Download
src/opts/opts_check_SSE2.cpp View 1 chunk +15 lines, -7 lines 0 comments Download

Messages

Total messages: 12
Jin A.Yang
12 years, 6 months ago (2012-02-20 13:28:03 UTC) #1
Jin A.Yang
upload patch set 2
12 years, 6 months ago (2012-02-21 05:04:46 UTC) #2
Jin A.Yang
patch2
12 years, 6 months ago (2012-02-21 05:05:27 UTC) #3
Jin A.Yang
patch2
12 years, 6 months ago (2012-02-21 05:18:43 UTC) #4
Jin A.Yang
Hi Tom, Please help review this SSE patch, thanks! Regards, Jin
12 years, 6 months ago (2012-02-21 09:41:27 UTC) #5
TomH
Jin, I'll get into a detailed review later. Can you tell us / document how ...
12 years, 6 months ago (2012-02-21 12:26:57 UTC) #6
Jin A.Yang
On 2012/02/21 12:26:57, TomH wrote: > Jin, > > I'll get into a detailed review ...
12 years, 6 months ago (2012-02-21 13:16:05 UTC) #7
TomH
For the most part LGTM. Lines 243 and 255 are more-or-less nits; let's settle those ...
12 years, 6 months ago (2012-02-21 18:08:45 UTC) #8
Jin A.Yang
http://codereview.appspot.com/5685055/diff/1003/src/opts/SkBitmapProcState_opts_SSE2.cpp File src/opts/SkBitmapProcState_opts_SSE2.cpp (right): http://codereview.appspot.com/5685055/diff/1003/src/opts/SkBitmapProcState_opts_SSE2.cpp#newcode243 src/opts/SkBitmapProcState_opts_SSE2.cpp:243: #define CHECK_FOR_DECAL On 2012/02/21 18:08:45, TomH wrote: > Why ...
12 years, 6 months ago (2012-02-22 02:51:52 UTC) #9
Jin A.Yang
revised patch
12 years, 6 months ago (2012-02-22 02:52:37 UTC) #10
TomH
I went ahead and committed a hybrid between your patch 2 and patch 3 as ...
12 years, 6 months ago (2012-02-22 18:53:33 UTC) #11
Jin A.Yang
12 years, 6 months ago (2012-02-23 01:29:30 UTC) #12
On 2012/02/22 18:53:33, TomH wrote:
> I went ahead and committed a hybrid between your patch 2 and patch 3 as r3227;
> see below for where I want to tweak the style. We can address that in a
followup
> patch if it's important.
> 
> Thanks again, Jin!
> 
> Please close this code review when you get a chance.
> 
>
http://codereview.appspot.com/5685055/diff/4003/src/opts/SkBitmapProcState_op...
> File src/opts/SkBitmapProcState_opts_SSE2.cpp (right):
> 
>
http://codereview.appspot.com/5685055/diff/4003/src/opts/SkBitmapProcState_op...
> src/opts/SkBitmapProcState_opts_SSE2.cpp:281: SkFixed dx4 = dx2 + dx2;
> Could you find a case where this makes a discernable impact on performance? We
> don't (the compiler seems to generate the same assembly code either way?), and
> it seems like a break-even on legibility. It's a pattern that we're wary of
> because 32b builds often suffer from heavy register pressure.
> 
> If you've got a particular reason for doing this, let's not duplicate the
> definition of dx[234] in all the branches of the if(), but move it up around
> line 255-265, and likewise in the nofilter function.

I supposed this can save some multiply instructions. But as you mentioned, this
also give heavy register pressure on 32b machine. From my test, it seems I can't
observe the obvious performance impact. 
Thanks for the review!
Sign in to reply to this message.

Powered by Google App Engine
RSS Feeds Recent Issues | This issue
This is Rietveld f62528b