|
|
Created:
14 years, 6 months ago by Vassilis Modified:
14 years, 5 months ago Base URL:
http://skia.googlecode.com/svn/trunk/ Visibility:
Public. |
DescriptionThe S32A_Opaque_BlitRow32 function was written and unrolled in ARM assembly to improve the rendering performance on ARM cores lacking NEON.
Added in SkBlitRow_opts_arm.cpp
Performance improvement about ~9% on Peacekeeper rendering benchmarks.
Patch Set 1 #
Total comments: 3
Patch Set 2 : Errors in the previous diff - re-submiting #
Total comments: 11
Patch Set 3 : Patch updated with corrections and minor modifications #MessagesTotal messages: 19
Sign in to reply to this message.
http://codereview.appspot.com/1873044/diff/1/2 File src/opts/SkBlitRow_opts_arm.cpp (right): http://codereview.appspot.com/1873044/diff/1/2#newcode559 src/opts/SkBlitRow_opts_arm.cpp:559: #elif defined (__ARM_ARCH__) Is this necessary ? I believe this file is only compiled on arm anyway, so making this section the #else clause seems like the right thing to do. http://codereview.appspot.com/1873044/diff/1/2#newcode569 src/opts/SkBlitRow_opts_arm.cpp:569: "\tCMP %[count], #0 \n" /* comparing count with 0 */ I'll let Stephen comment more deeply on the style used by skia, but it seems to me that moving the \t's to the end of the lines and use lower case mnemonics like other assembly sections would make it more readable.
Sign in to reply to this message.
http://codereview.appspot.com/1873044/diff/1/2 File src/opts/SkBlitRow_opts_arm.cpp (right): http://codereview.appspot.com/1873044/diff/1/2#newcode569 src/opts/SkBlitRow_opts_arm.cpp:569: "\tCMP %[count], #0 \n" /* comparing count with 0 */ Would be nice if this code was formatted the same as the neon code: lower case instead of upper for opcodes (if that's valid), no leading tab, indented the same, etc. Actually I see that the neon code puts leading tabs on the previous line. Weird, but let's make it consistent either way.
Sign in to reply to this message.
On 2010/07/29 19:49:29, Antoine Labour wrote: > http://codereview.appspot.com/1873044/diff/1/2 > File src/opts/SkBlitRow_opts_arm.cpp (right): > > http://codereview.appspot.com/1873044/diff/1/2#newcode559 > src/opts/SkBlitRow_opts_arm.cpp:559: #elif defined (__ARM_ARCH__) > Is this necessary ? I believe this file is only compiled on arm anyway, so > making this section the #else clause seems like the right thing to do. > You're right Antoine, I forgot about this cause I had initially placed the patch in another file where this was required. :) > http://codereview.appspot.com/1873044/diff/1/2#newcode569 > src/opts/SkBlitRow_opts_arm.cpp:569: "\tCMP %[count], #0 \n" /* > comparing count with 0 */ > I'll let Stephen comment more deeply on the style used by skia, but it seems to > me that moving the \t's to the end of the lines and use lower case mnemonics > like other assembly sections would make it more readable. > Sure :)
Sign in to reply to this message.
On 2010/07/29 19:51:17, Stephen White wrote: > http://codereview.appspot.com/1873044/diff/1/2 > File src/opts/SkBlitRow_opts_arm.cpp (right): > > http://codereview.appspot.com/1873044/diff/1/2#newcode569 > src/opts/SkBlitRow_opts_arm.cpp:569: "\tCMP %[count], #0 \n" /* > comparing count with 0 */ > Would be nice if this code was formatted the same as the neon code: lower case > instead of upper for opcodes (if that's valid), no leading tab, indented the > same, etc. > > Actually I see that the neon code puts leading tabs on the previous line. > Weird, but let's make it consistent either way. > The only reason that the \t's are in front of the line is the first line in the objdump output: it is misses the tab :) But you're right, will put them in the end to make the inline ASM more readable. Sorry about the uppercase and tabbing inconsistencies. Will correct them now :)
Sign in to reply to this message.
Previous patch updated according to suggestions. Retested.
Sign in to reply to this message.
Previous patch updated according to suggestions. Retested.
Sign in to reply to this message.
Suggested changes applied. Submitted the updated version
Sign in to reply to this message.
I went over the code, and it looks correct. Just a couple of nits. I don't have commit access to skia, Stephen, do you mind submitting this when it's ready ? http://codereview.appspot.com/1873044/diff/14001/15001 File src/opts/SkBlitRow_opts_arm.cpp (right): http://codereview.appspot.com/1873044/diff/14001/15001#newcode567 src/opts/SkBlitRow_opts_arm.cpp:567: /* Does not support the TEST_SRC_ALPHA case */ Can you make it a #error inside a #ifdef ? http://codereview.appspot.com/1873044/diff/14001/15001#newcode587 src/opts/SkBlitRow_opts_arm.cpp:587: "rsb r4, r4, #256 \n\t" /* subtracting the alpha from 256 -> r4=scale */ quick question here: in the 2 instances of the similar code below, this instruction happens above the load of r10. I don't know anything about scheduling on ARM, and maybe it doesn't matter, but could it be more efficient to move it right above ? http://codereview.appspot.com/1873044/diff/14001/15001#newcode587 src/opts/SkBlitRow_opts_arm.cpp:587: "rsb r4, r4, #256 \n\t" /* subtracting the alpha from 256 -> r4=scale */ also, it looks like you're doing the same shortcut as the NEON code above (a+1 vs a+a>>7), which is reasonable but worth a comment. http://codereview.appspot.com/1873044/diff/14001/15001#newcode629 src/opts/SkBlitRow_opts_arm.cpp:629: "and r9, ip, r7 \n\t" /* r5 = br masked by ip */ r5->r9 in comment http://codereview.appspot.com/1873044/diff/14001/15001#newcode632 src/opts/SkBlitRow_opts_arm.cpp:632: "and r10, ip, r7, lsr #8 \n\t" /* r6 = ag masked by ip */ r6->r10 in comment
Sign in to reply to this message.
Patch updated with corrections and minor modifications
Sign in to reply to this message.
Patch updated with corrections and minor modifications
Sign in to reply to this message.
Patch updated with corrections and minor modifications
Sign in to reply to this message.
Amended the patch according to suggestions, and made minor scheduling changes. http://codereview.appspot.com/1873044/diff/14001/15001 File src/opts/SkBlitRow_opts_arm.cpp (right): http://codereview.appspot.com/1873044/diff/14001/15001#newcode567 src/opts/SkBlitRow_opts_arm.cpp:567: /* Does not support the TEST_SRC_ALPHA case */ On 2010/08/02 20:35:02, Antoine Labour wrote: > Can you make it a #error inside a #ifdef ? Sure! http://codereview.appspot.com/1873044/diff/14001/15001#newcode567 src/opts/SkBlitRow_opts_arm.cpp:567: /* Does not support the TEST_SRC_ALPHA case */ On 2010/08/02 20:35:02, Antoine Labour wrote: > Can you make it a #error inside a #ifdef ? Done. http://codereview.appspot.com/1873044/diff/14001/15001#newcode587 src/opts/SkBlitRow_opts_arm.cpp:587: "rsb r4, r4, #256 \n\t" /* subtracting the alpha from 256 -> r4=scale */ On 2010/08/02 20:35:02, Antoine Labour wrote: > also, it looks like you're doing the same shortcut as the NEON code above (a+1 > vs a+a>>7), which is reasonable but worth a comment. We're actually doing what I saw that that C code is doing. I could add a comment for this if you want me to. http://codereview.appspot.com/1873044/diff/14001/15001#newcode587 src/opts/SkBlitRow_opts_arm.cpp:587: "rsb r4, r4, #256 \n\t" /* subtracting the alpha from 256 -> r4=scale */ On 2010/08/02 20:35:02, Antoine Labour wrote: > quick question here: in the 2 instances of the similar code below, this > instruction happens above the load of r10. I don't know anything about > scheduling on ARM, and maybe it doesn't matter, but could it be more efficient > to move it right above ? We did try different scheduling approaches to find the optimal one. The newest version of the patch has some minor scheduling changes (your suggestion is included), that seem to give a bit more fluid results. http://codereview.appspot.com/1873044/diff/14001/15001#newcode629 src/opts/SkBlitRow_opts_arm.cpp:629: "and r9, ip, r7 \n\t" /* r5 = br masked by ip */ On 2010/08/02 20:35:02, Antoine Labour wrote: > r5->r9 in comment Done. http://codereview.appspot.com/1873044/diff/14001/15001#newcode632 src/opts/SkBlitRow_opts_arm.cpp:632: "and r10, ip, r7, lsr #8 \n\t" /* r6 = ag masked by ip */ On 2010/08/02 20:35:02, Antoine Labour wrote: > r6->r10 in comment Done.
Sign in to reply to this message.
LGTM Stephen or Adam, do you mind checking this in ? Thanks.
Sign in to reply to this message.
On Tue, Aug 17, 2010 at 12:19 PM, <piman@google.com> wrote: > LGTM > Stephen or Adam, do you mind checking this in ? > Thanks. > > http://codereview.appspot.com/1873044/ > Skia r595
Sign in to reply to this message.
On Tue, Aug 17, 2010 at 9:24 AM, Adam Langley <agl@chromium.org> wrote: > On Tue, Aug 17, 2010 at 12:19 PM, <piman@google.com> wrote: > > LGTM > > Stephen or Adam, do you mind checking this in ? > > Thanks. > > > > http://codereview.appspot.com/1873044/ > > > > Skia r595 > Thanks ! Antoine
Sign in to reply to this message.
On 2010/08/17 16:25:50, Antoine Labour wrote: > On Tue, Aug 17, 2010 at 9:24 AM, Adam Langley <mailto:agl@chromium.org> wrote: > > > On Tue, Aug 17, 2010 at 12:19 PM, <mailto:piman@google.com> wrote: > > > LGTM > > > Stephen or Adam, do you mind checking this in ? > > > Thanks. > > > > > > http://codereview.appspot.com/1873044/ > > > > > > > Skia r595 > > > > Thanks ! > Antoine > Thanks a lot guys! I guess we can close this issue now.
Sign in to reply to this message.
|