|
|
Created:
12 years, 4 months ago by DerekS Modified:
11 years, 7 months ago CC:
skia-review_googlegroups.com Base URL:
https://skia.googlecode.com/svn/trunk Visibility:
Public. |
DescriptionUpdate ARM and NEON optimizations for S32A_Opaque_BlitRow32.
These patches replace those written by ARM with ones provided by NVidia.
Committed: https://code.google.com/p/skia/source/detail?r=5364
Patch Set 1 #Patch Set 2 : rebasing #Patch Set 3 : rebasing again #Patch Set 4 : keep both versions but enable testing src alpha #Patch Set 5 : remove bench #
Total comments: 2
Patch Set 6 : fix #define and add incorporate nvidia bench #
MessagesTotal messages: 15
After spending some time trying to generate bench results to verify if this patch outperforms our existing code I still don't have a clear answer. The results from the standard benches are still to noisy (even with setting a repeat count 150) to make a clear determination. I'm seeing variance of up to 46% on separate runs of the same code which is making it impossible to do any meaningful analysis.
Sign in to reply to this message.
I would be really helpful in this case if Nvidia could help us get a better measure. 1. new tests that better isolate the change? 2. better environment for running the existing tests to reduce noise?
Sign in to reply to this message.
I saw this today and created a similar benchmark. Unfortunately, we've done some overlapping work: https://codereview.appspot.com/6488052/ Anyway I learned how to build trunk skia for Android and compiled it for Xoom. I ran it on Nexus 7. ../android/bin/android_make -d xoom SkiaAndroidApp BUILDTYPE=Release I ran benchmark: bench -repeat 50 -match bitmap -match repeatTile -config 8888 This is what bench_compare.py shows for TOT vs patch set 3: bitmap_8888_A_source_striped 8888 c 78.22 80.08 -1.86 -2.4% bitmap_8888_update_volatile 8888 c 16.31 16.50 -0.19 -1.2% bitmap_8888_update 8888 c 16.27 16.38 -0.11 -0.7% bitmap_8888_update_scale_filter 8888 c 275.07 276.78 -1.71 -0.6% bitmap_8888_update_volatile_scale_filter 8888 c 275.42 276.96 -1.54 -0.6% bitmap_8888_scale_filter 8888 c 275.59 276.96 -1.37 -0.5% bitmap_4444_A 8888 c 101.00 101.23 -0.23 -0.2% bitmap_8888_update_volatile_scale_rotate_filter 8888 c 352.64 352.93 -0.29 -0.1% bitmap_8888_scale_rotate_filter 8888 c 352.90 352.76 +0.14 +0.0% bitmap_4444 8888 c 44.44 44.41 +0.03 +0.1% repeatTile_565 8888 c 110.07 109.96 +0.11 +0.1% bitmap_8888 8888 c 16.41 16.38 +0.03 +0.2% bitmap_8888_update_scale_rotate_filter 8888 c 353.22 352.56 +0.66 +0.2% bitmap_index8 8888 c 62.05 61.93 +0.12 +0.2% bitmap_565 8888 c 115.69 115.33 +0.36 +0.3% bitmap_8888_A_scale_rotate_filter 8888 c 414.21 381.80 +32.41 +7.8% bitmap_8888_A_scale_filter 8888 c 344.10 304.98 +39.12 +11.4% repeatTile_4444 8888 c 159.85 110.31 +49.54 +31.0% repeatTile_8888 8888 c 138.20 88.39 +49.81 +36.0% bitmap_index8_A 8888 c 125.74 72.08 +53.66 +42.7% bitmap_8888_A 8888 c 78.33 41.19 +37.14 +47.4% bitmap_8888_A_source_opaque 8888 c 78.41 32.53 +45.88 +58.5% repeatTile_index8 8888 c 138.24 45.78 +92.46 +66.9% bitmap_8888_A_source_transparent 8888 c 78.37 19.68 +58.69 +74.9% Note how my new source_transparent and source_opaque show benefit, but source_striped doesn't regress - well that -2 % is within the noise. The source_striped represent a bad case for the state machine, where it needs to change state for every pixel.
Sign in to reply to this message.
I got my results ready for NEON version ran on Nexus 7. Compile ../android/bin/android_make -d nexus_7 SkiaAndroidApp BUILDTYPE=Release I ran benchmark: bench -repeat 50 -match bitmap -match repeatTile -config 8888 bitmap_8888_A_source_striped 8888 c 62.95 63.57 -0.62 -1.0% bitmap_8888_scale_rotate_filter 8888 c 268.13 269.34 -1.21 -0.5% bitmap_8888_update_scale_rotate_filter 8888 c 267.82 268.37 -0.55 -0.2% bitmap_565 8888 c 113.10 113.20 -0.10 -0.1% bitmap_8888_update_volatile_scale_rotate_filter 8888 c 267.97 268.07 -0.10 -0.0% repeatTile_565 8888 c 109.73 109.65 +0.08 +0.1% bitmap_index8 8888 c 59.91 59.85 +0.06 +0.1% bitmap_8888_update_scale_filter 8888 c 229.10 228.85 +0.25 +0.1% bitmap_4444 8888 c 44.42 44.37 +0.05 +0.1% bitmap_4444_A 8888 c 100.94 100.82 +0.12 +0.1% bitmap_8888_update_volatile_scale_filter 8888 c 229.01 228.66 +0.35 +0.2% bitmap_8888_scale_filter 8888 c 229.19 228.59 +0.60 +0.3% bitmap_8888_update 8888 c 16.26 16.08 +0.18 +1.1% bitmap_8888 8888 c 16.45 16.12 +0.33 +2.0% bitmap_8888_update_volatile 8888 c 16.58 16.07 +0.51 +3.1% bitmap_8888_A_scale_rotate_filter 8888 c 321.91 298.81 +23.10 +7.2% bitmap_8888_A_scale_filter 8888 c 288.51 257.57 +30.94 +10.7% repeatTile_4444 8888 c 142.75 102.73 +40.02 +28.0% repeatTile_8888 8888 c 119.18 78.72 +40.46 +33.9% bitmap_index8_A 8888 c 114.50 70.28 +44.22 +38.6% bitmap_8888_A 8888 c 63.00 36.98 +26.02 +41.3% bitmap_8888_A_source_opaque 8888 c 62.92 35.41 +27.51 +43.7% repeatTile_index8 8888 c 118.16 52.54 +65.62 +55.5% bitmap_8888_A_source_transparent 8888 c 63.28 21.23 +42.05 +66.5% @DerekS, do you see similar numbers?
Sign in to reply to this message.
On 2012/08/29 16:32:02, Antti wrote: > I got my results ready for NEON version ran on Nexus 7. > > Compile > ../android/bin/android_make -d nexus_7 SkiaAndroidApp BUILDTYPE=Release > > I ran benchmark: > bench -repeat 50 -match bitmap -match repeatTile -config 8888 > > bitmap_8888_A_source_striped 8888 c 62.95 63.57 -0.62 -1.0% > > bitmap_8888_scale_rotate_filter 8888 c 268.13 269.34 -1.21 > -0.5% > bitmap_8888_update_scale_rotate_filter 8888 c 267.82 268.37 > -0.55 -0.2% > bitmap_565 8888 c 113.10 113.20 -0.10 -0.1% > > bitmap_8888_update_volatile_scale_rotate_filter 8888 c 267.97 268.07 > -0.10 -0.0% > repeatTile_565 8888 c 109.73 109.65 +0.08 +0.1% > > bitmap_index8 8888 c 59.91 59.85 +0.06 +0.1% > > bitmap_8888_update_scale_filter 8888 c 229.10 228.85 +0.25 > +0.1% > bitmap_4444 8888 c 44.42 44.37 +0.05 +0.1% > > bitmap_4444_A 8888 c 100.94 100.82 +0.12 +0.1% > > bitmap_8888_update_volatile_scale_filter 8888 c 229.01 228.66 > +0.35 +0.2% > bitmap_8888_scale_filter 8888 c 229.19 228.59 +0.60 +0.3% > > bitmap_8888_update 8888 c 16.26 16.08 +0.18 +1.1% > > bitmap_8888 8888 c 16.45 16.12 +0.33 +2.0% > > bitmap_8888_update_volatile 8888 c 16.58 16.07 +0.51 +3.1% > > bitmap_8888_A_scale_rotate_filter 8888 c 321.91 298.81 +23.10 > +7.2% > bitmap_8888_A_scale_filter 8888 c 288.51 257.57 +30.94 +10.7% > > repeatTile_4444 8888 c 142.75 102.73 +40.02 +28.0% > > repeatTile_8888 8888 c 119.18 78.72 +40.46 +33.9% > > bitmap_index8_A 8888 c 114.50 70.28 +44.22 +38.6% > > bitmap_8888_A 8888 c 63.00 36.98 +26.02 +41.3% > > bitmap_8888_A_source_opaque 8888 c 62.92 35.41 +27.51 +43.7% > > repeatTile_index8 8888 c 118.16 52.54 +65.62 +55.5% > > bitmap_8888_A_source_transparent 8888 c 63.28 21.23 +42.05 > +66.5% > > @DerekS, do you see similar numbers? Ye, I am seeing similar numbers with your test and my patch. As I think your bench is in much better shape than mine I'll remove mine in the next patch set for this CL. In the meantime we can get your bench CL ready to submit and hopefully we can wrap all this up. :)
Sign in to reply to this message.
I've removed my bench in favor of https://codereview.appspot.com/6488052/ and unless anyone has any objections to using this CL over https://codereview.appspot.com/6430069/ I'll submit this after the bench goes in.
Sign in to reply to this message.
I think there's a bug. See the inline comment. I also refreshed results for patchset 3 with never version of the benchmark that has new pathological stipes_evil case. NEON takes a bigger hit in it than ARM, but I wouldn't still call -13% bad. NEON: -repeat 50 -match bitmap -match repeatTile -config 8888 bitmap_8888_A_source_stripes_evil 8888 c 74.28 84.19 -9.91 -13.3% bitmap_8888_A_source_stripes 8888 c 72.70 75.50 -2.80 -3.9% bitmap_4444 8888 c 52.44 52.57 -0.13 -0.2% bitmap_index8 8888 c 69.99 70.08 -0.09 -0.1% bitmap_8888_update_scale_filter 8888 c 275.28 275.60 -0.32 -0.1% bitmap_565 8888 c 135.75 135.90 -0.15 -0.1% bitmap_4444_A 8888 c 122.74 122.82 -0.08 -0.1% bitmap_8888_scale_rotate_filter 8888 c 321.54 321.60 -0.06 -0.0% bitmap_8888_update 8888 c 18.08 18.07 +0.01 +0.1% bitmap_8888_update_volatile_scale_rotate_filter 8888 c 321.48 321.30 +0.18 +0.1% bitmap_8888_update_scale_rotate_filter 8888 c 321.53 321.11 +0.42 +0.1% bitmap_8888_update_volatile_scale_filter 8888 c 276.10 275.58 +0.52 +0.2% repeatTile_565 8888 c 131.64 131.34 +0.30 +0.2% bitmap_8888_scale_filter 8888 c 276.45 275.70 +0.75 +0.3% bitmap_8888 8888 c 17.64 17.49 +0.15 +0.9% bitmap_8888_update_volatile 8888 c 18.23 17.93 +0.30 +1.6% bitmap_8888_A_scale_rotate_filter 8888 c 383.13 356.94 +26.19 +6.8% bitmap_8888_A_scale_filter 8888 c 342.79 308.15 +34.64 +10.1% repeatTile_4444 8888 c 162.91 118.85 +44.06 +27.0% repeatTile_8888 8888 c 132.86 89.89 +42.97 +32.3% bitmap_index8_A 8888 c 133.81 83.97 +49.84 +37.2% bitmap_8888_A 8888 c 72.46 43.68 +28.78 +39.7% bitmap_8888_A_source_opaque 8888 c 72.82 40.31 +32.51 +44.6% repeatTile_index8 8888 c 132.91 62.87 +70.04 +52.7% bitmap_8888_A_source_transparent 8888 c 73.11 27.01 +46.10 +63.1% ARM: -repeat 50 -match bitmap -match repeatTile -config 8888 bitmap_8888_A_source_stripes 8888 c 91.48 93.80 -2.32 -2.5% bitmap_8888_A_source_stripes_evil 8888 c 91.55 93.82 -2.27 -2.5% bitmap_8888 8888 c 17.56 17.85 -0.29 -1.7% bitmap_8888_update 8888 c 17.48 17.71 -0.23 -1.3% bitmap_8888_update_volatile_scale_filter 8888 c 331.50 335.74 -4.24 -1.3% bitmap_8888_update_scale_filter 8888 c 331.66 335.67 -4.01 -1.2% bitmap_8888_scale_filter 8888 c 331.57 335.57 -4.00 -1.2% bitmap_8888_update_volatile 8888 c 17.45 17.65 -0.20 -1.1% bitmap_8888_scale_rotate_filter 8888 c 421.94 422.04 -0.10 -0.0% bitmap_8888_update_scale_rotate_filter 8888 c 422.01 422.11 -0.10 -0.0% bitmap_8888_update_volatile_scale_rotate_filter 8888 c 421.97 422.04 -0.07 -0.0% repeatTile_565 8888 c 132.97 132.92 +0.05 +0.0% bitmap_565 8888 c 139.49 139.41 +0.08 +0.1% bitmap_4444_A 8888 c 122.57 122.43 +0.14 +0.1% bitmap_4444 8888 c 52.73 52.50 +0.23 +0.4% bitmap_index8 8888 c 73.45 73.07 +0.38 +0.5% bitmap_8888_A_scale_rotate_filter 8888 c 491.76 453.49 +38.27 +7.8% bitmap_8888_A_scale_filter 8888 c 408.40 364.21 +44.19 +10.8% repeatTile_4444 8888 c 185.54 128.34 +57.20 +30.8% repeatTile_8888 8888 c 159.16 100.35 +58.81 +37.0% bitmap_index8_A 8888 c 148.51 85.14 +63.37 +42.7% bitmap_8888_A 8888 c 91.86 47.76 +44.10 +48.0% bitmap_8888_A_source_opaque 8888 c 91.48 36.47 +55.01 +60.1% repeatTile_index8 8888 c 160.88 56.50 +104.38 +64.9% bitmap_8888_A_source_transparent 8888 c 91.47 25.11 +66.36 +72.5% https://codereview.appspot.com/6465075/diff/16001/src/opts/SkBlitRow_opts_arm... File src/opts/SkBlitRow_opts_arm_neon.cpp (right): https://codereview.appspot.com/6465075/diff/16001/src/opts/SkBlitRow_opts_arm... src/opts/SkBlitRow_opts_arm_neon.cpp:1281: #if 1 I think there is one extra line here that will add one element to the array causing Opaque_neon_src_alpha to be used as Blend function and we don't use the optimization at all.
Sign in to reply to this message.
The latest CL fixes the bug in my code, ensures that the alpha channel is in the expected position, and adds your test case. https://codereview.appspot.com/6465075/diff/16001/src/opts/SkBlitRow_opts_arm... File src/opts/SkBlitRow_opts_arm_neon.cpp (right): https://codereview.appspot.com/6465075/diff/16001/src/opts/SkBlitRow_opts_arm... src/opts/SkBlitRow_opts_arm_neon.cpp:1281: #if 1 On 2012/08/30 12:39:59, Antti wrote: > I think there is one extra line here that will add one element to the array > causing Opaque_neon_src_alpha to be used as Blend function and we don't use the > optimization at all. Thanks for catching that. I was a little sloppy in my copy and paste. :) I did notice that this code depends on an implicit byte ordering so for now I'll update the ifdef to only enable this function if the alpha value occupies bits 24-32. In practice this will cover all current Android devices, but will prevent us from crashing if we ever change up the ordering.
Sign in to reply to this message.
On 2012/08/30 14:55:26, DerekS wrote:
Sign in to reply to this message.
On 2012/08/30 17:25:11, shikhas wrote: > On 2012/08/30 14:55:26, DerekS wrote: Any final comments before I put this patch into the repo?
Sign in to reply to this message.
On 2012/08/30 18:25:30, DerekS wrote: > On 2012/08/30 17:25:11, shikhas wrote: > > On 2012/08/30 14:55:26, DerekS wrote: > > Any final comments before I put this patch into the repo? Am fine with this patch. Thanks!!
Sign in to reply to this message.
> I did notice that this code depends on an implicit byte ordering so for now I'll > update the ifdef to only enable this function if the alpha value occupies bits > 24-32. Yeah, you are right. The original change in our tree was under condition defined(SK_CPU_LENDIAN) for the same reason. Looks good to me.
Sign in to reply to this message.
On 2012/08/31 06:02:39, Antti wrote: > > I did notice that this code depends on an implicit byte ordering so for now > I'll > > update the ifdef to only enable this function if the alpha value occupies bits > > 24-32. > > Yeah, you are right. The original change in our tree was under condition > defined(SK_CPU_LENDIAN) for the same reason. > > Looks good to me. I notice that this change got reverted by https://code.google.com/p/skia/source/detail?spec=svn6205&r=5378 Do we know why it was reverted? Thanks!
Sign in to reply to this message.
Message was sent while issue was closed.
Ping. I just noticed we'd never connected the dots to why this was reverted (by Rob) or if it was followed up with any successful landings of these optimizations. Did we get another patch in with them? Are there other external optimizations that have slipped through the cracks that we could be bringing over?
Sign in to reply to this message.
Message was sent while issue was closed.
This change broke the Android build in the roll attempt to r5377 (https://codereview.chromium.org/10917067/). The bits have rotted but I believe it was due to Thumb instructions. I believe the neon part was re-applied in r8579 (https://codereview.chromium.org/13060004) but the other half is still MIA.
Sign in to reply to this message.
|