|
|
Created:
11 years, 11 months ago by Humper Modified:
11 years, 10 months ago CC:
skia-review_googlegroups.com Base URL:
https://skia.googlecode.com/svn/trunk Visibility:
Public. |
DescriptionFix a few bugs in both fast and slow blur; implementations now match visually. Also provide a way to interpret the incoming radius as sigma/2
Note -- may need to guard things to do the multi-stage webkit rebaseline goodness
BUG=
Committed: https://code.google.com/p/skia/source/detail?r=7793
Patch Set 1 #Patch Set 2 : re-enable old GM tests #
Total comments: 41
Patch Set 3 : Style cleanups and restore old high->low quality trigger behavior #Patch Set 4 : fix syntax error #Patch Set 5 : style cleanups, and add support for rectangles that are thinner than the support of the blur kernel #
Total comments: 11
Patch Set 6 : style nitpicks and function rename. #Patch Set 7 : Add support for blur styles, redo gm to exercise it. I cut out some of the blur radii from the gm … #
Total comments: 1
Patch Set 8 : Merge 8 bit rollback from stephen #Patch Set 9 : fix issues with merge #
Total comments: 8
Patch Set 10 : Pare down the # of tests / GMs #
MessagesTotal messages: 35
https://codereview.appspot.com/7307076/diff/2001/src/effects/SkBlurMask.cpp File src/effects/SkBlurMask.cpp (left): https://codereview.appspot.com/7307076/diff/2001/src/effects/SkBlurMask.cpp#o... src/effects/SkBlurMask.cpp:32: static int boxBlur(const uint8_t* src, int src_y_stride, uint8_t* dst, box blur functions changed to use 16 bit fixed point and read/write to temporary copies of the data for added precision. https://codereview.appspot.com/7307076/diff/2001/src/effects/SkBlurMask.cpp#o... src/effects/SkBlurMask.cpp:916: SkAutoTMalloc<uint8_t> tmpBuffer(dstSize); All blurs done to temp. copies to use 16 bits of fixed point. https://codereview.appspot.com/7307076/diff/2001/src/effects/SkBlurMask.cpp File src/effects/SkBlurMask.cpp (right): https://codereview.appspot.com/7307076/diff/2001/src/effects/SkBlurMask.cpp#n... src/effects/SkBlurMask.cpp:54: *dptr = (sum * scale + (1 << 15)) >> 16; \ effectively add 1/2; this prevents bias and makes sure that we blur an array of 255's to an array of 255's, not 249's. https://codereview.appspot.com/7307076/diff/2001/src/effects/SkBlurMask.cpp#n... src/effects/SkBlurMask.cpp:255: for (; x < new_width - border - 16; x += 16) { Make sure that we touch exactly the right pixels; before this was causing some weird issues in the output size of the blurred mask where for certain border values it would be off by one. Very noticable when I animated the blur radius from 1 to 20; artifacts now gone. https://codereview.appspot.com/7307076/diff/2001/src/effects/SkBlurMask.cpp#n... src/effects/SkBlurMask.cpp:1115: float scaled_x = ( center - x - .5 ) * invr; If I had a numerical lifetime arch nemesis, it would be .5 https://codereview.appspot.com/7307076/diff/2001/src/effects/SkBlurMask.cpp#n... src/effects/SkBlurMask.cpp:1202: unsigned int profile_y = profile_lookup( profile, y, dst_height, h ); refactored https://codereview.appspot.com/7307076/diff/2001/src/effects/SkBlurMask.cpp#n... src/effects/SkBlurMask.cpp:1216: bool SkBlurMask::BlurSimple(SkMask* dst, const SkMask& src, SkScalar provided_radius, Style style, SkIPoint* margin) { Implementation of separable gaussian convolution for ground-truth comparison only.
Sign in to reply to this message.
https://codereview.appspot.com/7307076/diff/2001/src/effects/SkBlurMask.cpp File src/effects/SkBlurMask.cpp (left): https://codereview.appspot.com/7307076/diff/2001/src/effects/SkBlurMask.cpp#o... src/effects/SkBlurMask.cpp:32: static int boxBlur(const uint8_t* src, int src_y_stride, uint8_t* dst, On 2013/02/08 19:02:32, Humper wrote: > box blur functions changed to use 16 bit fixed point and read/write to temporary > copies of the data for added precision. Have you benchmarked this? How does it affect performance? https://codereview.appspot.com/7307076/diff/2001/src/effects/SkBlurMask.cpp File src/effects/SkBlurMask.cpp (right): https://codereview.appspot.com/7307076/diff/2001/src/effects/SkBlurMask.cpp#n... src/effects/SkBlurMask.cpp:54: *dptr = (sum * scale + (1 << 15)) >> 16; \ On 2013/02/08 19:02:32, Humper wrote: > effectively add 1/2; this prevents bias and makes sure that we blur an array of > 255's to an array of 255's, not 249's. This looks good, but could you benchmark all the rounding alone, separately from the 16bit change? https://codereview.appspot.com/7307076/diff/2001/src/effects/SkBlurMask.cpp#n... src/effects/SkBlurMask.cpp:255: for (; x < new_width - border - 16; x += 16) { On 2013/02/08 19:02:32, Humper wrote: > Make sure that we touch exactly the right pixels; before this was causing some > weird issues in the output size of the blurred mask where for certain border > values it would be off by one. Very noticable when I animated the blur radius > from 1 to 20; artifacts now gone. Could you give me an example of the input parameters where this would occur (not saying you're wrong, I just want to understand). https://codereview.appspot.com/7307076/diff/2001/src/effects/SkBlurMask.cpp#n... src/effects/SkBlurMask.cpp:873: SkScalar passRadius = (kHigh_Quality == quality) ? SkScalarMul( radius, kBlurRadiusFudgeFactor): radius; This is definitely going to change behaviour: before we were only forcing low quality when the input radius was < 3; now we're only going to do it when the *adjusted* radius is < 3. (I suspect this might be where the animation artifacts actually went away, although I could be wrong.)
Sign in to reply to this message.
nits https://codereview.appspot.com/7307076/diff/2001/bench/BlurRectBench.cpp File bench/BlurRectBench.cpp (right): https://codereview.appspot.com/7307076/diff/2001/bench/BlurRectBench.cpp#newc... bench/BlurRectBench.cpp:60: should be loopCount. Why not just set loopCount to 1000 and then only change it when > 25? https://codereview.appspot.com/7307076/diff/2001/bench/BlurRectBench.cpp#newc... bench/BlurRectBench.cpp:129: }; should have: private: typedef BlurRectBench INHERITED; https://codereview.appspot.com/7307076/diff/2001/bench/BlurRectBench.cpp#newc... bench/BlurRectBench.cpp:137: } else { SkScalarRound is deprecated - should have a toInt at the end. https://codereview.appspot.com/7307076/diff/2001/bench/BlurRectBench.cpp#newc... bench/BlurRectBench.cpp:152: } should have: private: typedef BlurRectSeparableBench INHERITED; https://codereview.appspot.com/7307076/diff/2001/bench/BlurRectBench.cpp#newc... bench/BlurRectBench.cpp:171: mask.fImage = NULL; this->radius() - elsewhere too https://codereview.appspot.com/7307076/diff/2001/bench/BlurRectBench.cpp#newc... bench/BlurRectBench.cpp:175: } here too https://codereview.appspot.com/7307076/diff/2001/bench/BlurRectBench.cpp#newc... bench/BlurRectBench.cpp:177: What's up with these guys? https://codereview.appspot.com/7307076/diff/2001/gm/blurrect.cpp File gm/blurrect.cpp (right): https://codereview.appspot.com/7307076/diff/2001/gm/blurrect.cpp#newcode133 gm/blurrect.cpp:133: Is this lined up correctly? Should be private. https://codereview.appspot.com/7307076/diff/2001/gm/blurrect.cpp#newcode242 gm/blurrect.cpp:242: public: rectWidth, rectHeight, blurRadius https://codereview.appspot.com/7307076/diff/2001/gm/blurrect.cpp#newcode248 gm/blurrect.cpp:248: protected: override - const? https://codereview.appspot.com/7307076/diff/2001/src/effects/SkBlurMask.cpp File src/effects/SkBlurMask.cpp (right): https://codereview.appspot.com/7307076/diff/2001/src/effects/SkBlurMask.cpp#n... src/effects/SkBlurMask.cpp:54: *dptr = (sum * scale + (1 << 15)) >> 16; \ A comment might be in order https://codereview.appspot.com/7307076/diff/2001/src/effects/SkBlurMask.cpp#n... src/effects/SkBlurMask.cpp:872: int passCount = (kHigh_Quality == quality) ? 3 : 1; overlength https://codereview.appspot.com/7307076/diff/2001/src/effects/SkBlurMask.cpp#n... src/effects/SkBlurMask.cpp:881: overlength https://codereview.appspot.com/7307076/diff/2001/src/effects/SkBlurMask.cpp#n... src/effects/SkBlurMask.cpp:882: // multiply the given radius by sqrt(2)/2 to convert from (2x) standard deviation to needed box width radiusMultiplier https://codereview.appspot.com/7307076/diff/2001/src/effects/SkBlurMask.cpp#n... src/effects/SkBlurMask.cpp:884: SkScalar box_width = SkScalarMul(passRadius, radius_multiplier); SK_ScalarHalf https://codereview.appspot.com/7307076/diff/2001/src/effects/SkBlurMask.cpp#n... src/effects/SkBlurMask.cpp:966: ++i https://codereview.appspot.com/7307076/diff/2001/src/effects/SkBlurMask.cpp#n... src/effects/SkBlurMask.cpp:1215: A explanatory comment?
Sign in to reply to this message.
If it helps, I've attached what the code looks like with the unrolling taken out, and the two cases (width < diameter, width >= diameter) broken into separate clauses. Hopefully that makes it a bit clearer. I think you can see from this version that both cases fill exactly width + diameter = width + leftRadius + rightRadius pixels, which I believe is correct. static int boxBlur(const uint8_t* src, int src_y_stride, uint8_t* dst, int leftRadius, int rightRadius, int width, int height, bool transpose) { int diameter = leftRadius + rightRadius; int kernelSize = diameter + 1; uint32_t scale = (1 << 24) / kernelSize; int new_width = width + SkMax32(leftRadius, rightRadius) * 2; int dst_x_stride = transpose ? height : 1; int dst_y_stride = transpose ? 1 : new_width; if (width < diameter) { for (int y = 0; y < height; ++y) { int sum = 0; uint8_t* dptr = dst + y * dst_y_stride; const uint8_t* right = src + y * src_y_stride; const uint8_t* left = right; int x = 0; for (; x < width; ++x) { sum += *right++; \ *dptr = (sum * scale) >> 24; \ dptr += dst_x_stride; } for (; x < diameter; ++x) { *dptr = (sum * scale) >> 24; \ dptr += dst_x_stride; } x = 0; for (; x < width; ++x) { *dptr = (sum * scale) >> 24; \ sum -= *left++; \ dptr += dst_x_stride; } SkASSERT(sum == 0); } } else { for (int y = 0; y < height; ++y) { int sum = 0; uint8_t* dptr = dst + y * dst_y_stride; const uint8_t* right = src + y * src_y_stride; const uint8_t* left = right; int x = 0; for (; x < diameter; ++x) { sum += *right++; \ *dptr = (sum * scale) >> 24; \ dptr += dst_x_stride; } for (; x < width; ++x) { sum += *right++; \ *dptr = (sum * scale) >> 24; \ sum -= *left++; \ dptr += dst_x_stride; } x = 0; for (; x < diameter; ++x) { *dptr = (sum * scale) >> 24; \ } SkASSERT(sum == 0); } } return new_width; }
Sign in to reply to this message.
https://codereview.appspot.com/7307076/diff/2001/bench/BlurRectBench.cpp File bench/BlurRectBench.cpp (right): https://codereview.appspot.com/7307076/diff/2001/bench/BlurRectBench.cpp#newc... bench/BlurRectBench.cpp:60: On 2013/02/08 19:33:25, robertphillips wrote: > should be loopCount. Why not just set loopCount to 1000 and then only change it > when > 25? Very recent change definitely a bug. I'll clean it up. https://codereview.appspot.com/7307076/diff/2001/bench/BlurRectBench.cpp#newc... bench/BlurRectBench.cpp:177: They got forgotten about :( Will re-enable. On 2013/02/08 19:33:25, robertphillips wrote: > What's up with these guys? https://codereview.appspot.com/7307076/diff/2001/src/effects/SkBlurMask.cpp File src/effects/SkBlurMask.cpp (left): https://codereview.appspot.com/7307076/diff/2001/src/effects/SkBlurMask.cpp#o... src/effects/SkBlurMask.cpp:32: static int boxBlur(const uint8_t* src, int src_y_stride, uint8_t* dst, Not separately. I'll do that now. On 2013/02/08 19:24:18, Stephen White wrote: > On 2013/02/08 19:02:32, Humper wrote: > > box blur functions changed to use 16 bit fixed point and read/write to > temporary > > copies of the data for added precision. > > Have you benchmarked this? How does it affect performance? https://codereview.appspot.com/7307076/diff/2001/src/effects/SkBlurMask.cpp File src/effects/SkBlurMask.cpp (right): https://codereview.appspot.com/7307076/diff/2001/src/effects/SkBlurMask.cpp#n... src/effects/SkBlurMask.cpp:54: *dptr = (sum * scale + (1 << 15)) >> 16; \ Sorry, do you mean benchmark the code with and without the + (1 << 15) On 2013/02/08 19:24:18, Stephen White wrote: > On 2013/02/08 19:02:32, Humper wrote: > > effectively add 1/2; this prevents bias and makes sure that we blur an array > of > > 255's to an array of 255's, not 249's. > > This looks good, but could you benchmark all the rounding alone, separately from > the 16bit change? https://codereview.appspot.com/7307076/diff/2001/src/effects/SkBlurMask.cpp#n... src/effects/SkBlurMask.cpp:255: for (; x < new_width - border - 16; x += 16) { I wish I had written them down :( I'll see if I can reproduce it in the master branch. Basically what I would see is that as I increased the size of the blur radius, the blur profile on the right hand side wasn't growing like you'd expect. There would be two or even sometimes three blur radii where the output mask was exactly the same width, even though the blur was getting wider. I initially thought this had to do with floor'ing something instead of round'ing it, but in the end I realized that if we're going to touch "border" pixels on the left hand side of the output mask, we should definitely touch "border" pixels on the right hand side, so I rewrote the loop this way and it went away. I'll see if I can repro it and send you a specific example. On 2013/02/08 19:24:18, Stephen White wrote: > On 2013/02/08 19:02:32, Humper wrote: > > Make sure that we touch exactly the right pixels; before this was causing some > > weird issues in the output size of the blurred mask where for certain border > > values it would be off by one. Very noticable when I animated the blur radius > > from 1 to 20; artifacts now gone. > > Could you give me an example of the input parameters where this would occur (not > saying you're wrong, I just want to understand). > https://codereview.appspot.com/7307076/diff/2001/src/effects/SkBlurMask.cpp#n... src/effects/SkBlurMask.cpp:873: SkScalar passRadius = (kHigh_Quality == quality) ? SkScalarMul( radius, kBlurRadiusFudgeFactor): radius; Nuts, I didn't mean to move this above the low quality check, nice catch. On 2013/02/08 19:24:18, Stephen White wrote: > This is definitely going to change behaviour: before we were only forcing low > quality when the input radius was < 3; now we're only going to do it when the > *adjusted* radius is < 3. (I suspect this might be where the animation > artifacts actually went away, although I could be wrong.) https://codereview.appspot.com/7307076/diff/2001/src/effects/SkBlurMask.cpp#n... src/effects/SkBlurMask.cpp:884: SkScalar box_width = SkScalarMul(passRadius, radius_multiplier); On 2013/02/08 19:33:25, robertphillips wrote: > SK_ScalarHalf hero!
Sign in to reply to this message.
https://codereview.appspot.com/7307076/diff/2001/src/effects/SkBlurMask.cpp File src/effects/SkBlurMask.cpp (right): https://codereview.appspot.com/7307076/diff/2001/src/effects/SkBlurMask.cpp#n... src/effects/SkBlurMask.cpp:255: for (; x < new_width - border - 16; x += 16) { On 2013/02/08 19:51:21, Humper wrote: > Basically what I would see is that as I increased the size of the blur radius, > the blur profile on the right hand side wasn't growing like you'd expect. There > would be two or even sometimes three blur radii where the output mask was > exactly the same width, even though the blur was getting wider. Part of that may be due to the 1/sqrt(3) fudge factor: the mask size will only grow when the *adjusted* radius grows by a pixel. (Sorry if this is obvious.) > I initially > thought this had to do with floor'ing something instead of round'ing it, but in > the end I realized that if we're going to touch "border" pixels on the left hand > side of the output mask, we should definitely touch "border" pixels on the right > hand side, so I rewrote the loop this way and it went away. That's not what this code does. In the width < diameter case, it fills "width" pixels with the left border then "width - diameter" with a constant, then it fills "width" pixels with the right border. In the width >= diameter case, it fills "diameter" pixels with the left border, then "width - diameter" with the blurred center, then it fills "diameter" pixels with the right border. In both cases, it should fill exactly width + LR + RR pixels. (Note that each border straddles the edge of the primitive, hence why they're "diameter", and not just LR or RR in size. In the first case, that makes them overlap, so they cancel each other out and the interior is a constant. (I probably should hoist the constant out of the loop, but I got lazy and am relying on the compiler to do CSE.)
Sign in to reply to this message.
https://codereview.appspot.com/7307076/diff/2001/bench/BlurRectBench.cpp File bench/BlurRectBench.cpp (right): https://codereview.appspot.com/7307076/diff/2001/bench/BlurRectBench.cpp#newc... bench/BlurRectBench.cpp:60: haha loop_count isn't even used! :D On 2013/02/08 19:51:21, Humper wrote: > On 2013/02/08 19:33:25, robertphillips wrote: > > should be loopCount. Why not just set loopCount to 1000 and then only change > it > > when > 25? > > Very recent change definitely a bug. I'll clean it up.
Sign in to reply to this message.
https://codereview.appspot.com/7307076/diff/2001/src/effects/SkBlurMask.cpp File src/effects/SkBlurMask.cpp (right): https://codereview.appspot.com/7307076/diff/2001/src/effects/SkBlurMask.cpp#n... src/effects/SkBlurMask.cpp:54: *dptr = (sum * scale + (1 << 15)) >> 16; \ On 2013/02/08 19:51:21, Humper wrote: > Sorry, do you mean benchmark the code with and without the + (1 << 15) Yes, I'd like to see how each change (16bit, rounding) separately affects performance. (I actually had a 32bit flavour of this at some point, and it wasn't horribly slower on Intel, but I have no idea how ARM devices might behave.) > On 2013/02/08 19:24:18, Stephen White wrote: > > On 2013/02/08 19:02:32, Humper wrote: > > > effectively add 1/2; this prevents bias and makes sure that we blur an array > > of > > > 255's to an array of 255's, not 249's. > > > > This looks good, but could you benchmark all the rounding alone, separately > from > > the 16bit change? >
Sign in to reply to this message.
Okay I've done all the style cleanups Robert mentioned. I'm going to benchmark the 16 bit fixed point changes now, and revisit the blur border code. On 2013/02/08 20:13:15, Humper wrote: > https://codereview.appspot.com/7307076/diff/2001/bench/BlurRectBench.cpp > File bench/BlurRectBench.cpp (right): > > https://codereview.appspot.com/7307076/diff/2001/bench/BlurRectBench.cpp#newc... > bench/BlurRectBench.cpp:60: > haha loop_count isn't even used! :D > > On 2013/02/08 19:51:21, Humper wrote: > > On 2013/02/08 19:33:25, robertphillips wrote: > > > should be loopCount. Why not just set loopCount to 1000 and then only change > > it > > > when > 25? > > > > Very recent change definitely a bug. I'll clean it up.
Sign in to reply to this message.
https://codereview.appspot.com/7307076/diff/2001/src/effects/SkBlurMask.cpp File src/effects/SkBlurMask.cpp (right): https://codereview.appspot.com/7307076/diff/2001/src/effects/SkBlurMask.cpp#n... src/effects/SkBlurMask.cpp:54: *dptr = (sum * scale + (1 << 15)) >> 16; \ After talking to Mike briefly about this, he agreed that the + (1 << 15) code was needed for correctness, so benching it separately wasn't important for landing this. I'm submitting some bench trybots with the new code so we can compare against ToT performance. On 2013/02/08 20:13:18, Stephen White wrote: > On 2013/02/08 19:51:21, Humper wrote: > > Sorry, do you mean benchmark the code with and without the + (1 << 15) > > Yes, I'd like to see how each change (16bit, rounding) separately affects > performance. (I actually had a 32bit flavour of this at some point, and it > wasn't horribly slower on Intel, but I have no idea how ARM devices might > behave.) > > > On 2013/02/08 19:24:18, Stephen White wrote: > > > On 2013/02/08 19:02:32, Humper wrote: > > > > effectively add 1/2; this prevents bias and makes sure that we blur an > array > > > of > > > > 255's to an array of 255's, not 249's. > > > > > > This looks good, but could you benchmark all the rounding alone, separately > > from > > > the 16bit change? > > >
Sign in to reply to this message.
https://codereview.appspot.com/7307076/diff/2001/src/effects/SkBlurMask.cpp File src/effects/SkBlurMask.cpp (right): https://codereview.appspot.com/7307076/diff/2001/src/effects/SkBlurMask.cpp#n... src/effects/SkBlurMask.cpp:54: *dptr = (sum * scale + (1 << 15)) >> 16; \ On 2013/02/08 21:36:29, Humper wrote: > After talking to Mike briefly about this, he agreed that the + (1 << 15) code > was needed for correctness, so benching it separately wasn't important for > landing this. Yes, I agree that the rounding is good. I'm not sure about the uint16_t change, though. All the other blur implementations in Skia and WebKit use same-sized (8 or 8888) intermediate buffers, and see no visible loss of quality AFAIK. Even the non-separable blur in this file used 8-bit temporary buffers between the 3 SAT passes, and it seemed to be OK. You're also adding a conversion step that wasn't there before, and possibly could be skipped by templating the blur loop to go from 8 -> 16 on the first pass, if 16 bit is absolutely necessary. > I'm submitting some bench trybots with the new code so we can compare against > ToT performance. > > On 2013/02/08 20:13:18, Stephen White wrote: > > On 2013/02/08 19:51:21, Humper wrote: > > > Sorry, do you mean benchmark the code with and without the + (1 << 15) > > > > Yes, I'd like to see how each change (16bit, rounding) separately affects > > performance. (I actually had a 32bit flavour of this at some point, and it > > wasn't horribly slower on Intel, but I have no idea how ARM devices might > > behave.) > > > > > On 2013/02/08 19:24:18, Stephen White wrote: > > > > On 2013/02/08 19:02:32, Humper wrote: > > > > > effectively add 1/2; this prevents bias and makes sure that we blur an > > array > > > > of > > > > > 255's to an array of 255's, not 249's. > > > > > > > > This looks good, but could you benchmark all the rounding alone, > separately > > > from > > > > the 16bit change? > > > > > >
Sign in to reply to this message.
https://codereview.appspot.com/7307076/diff/2001/src/effects/SkBlurMask.cpp File src/effects/SkBlurMask.cpp (right): https://codereview.appspot.com/7307076/diff/2001/src/effects/SkBlurMask.cpp#n... src/effects/SkBlurMask.cpp:54: *dptr = (sum * scale + (1 << 15)) >> 16; \ Yep, we're in agreement -- I'm just waiting for the bench trybots to finish their thing before asking anyone to sign off. Might not happen until Monday at this rate :) On 2013/02/08 21:50:30, Stephen White wrote: > On 2013/02/08 21:36:29, Humper wrote: > > After talking to Mike briefly about this, he agreed that the + (1 << 15) code > > was needed for correctness, so benching it separately wasn't important for > > landing this. > > Yes, I agree that the rounding is good. I'm not sure about the uint16_t change, > though. All the other blur implementations in Skia and WebKit use same-sized (8 > or 8888) intermediate buffers, and see no visible loss of quality AFAIK. Even > the non-separable blur in this file used 8-bit temporary buffers between the 3 > SAT passes, and it seemed to be OK. You're also adding a conversion step that > wasn't there before, and possibly could be skipped by templating the blur loop > to go from 8 -> 16 on the first pass, if 16 bit is absolutely necessary. > > > I'm submitting some bench trybots with the new code so we can compare against > > ToT performance. > > > > On 2013/02/08 20:13:18, Stephen White wrote: > > > On 2013/02/08 19:51:21, Humper wrote: > > > > Sorry, do you mean benchmark the code with and without the + (1 << 15) > > > > > > Yes, I'd like to see how each change (16bit, rounding) separately affects > > > performance. (I actually had a 32bit flavour of this at some point, and it > > > wasn't horribly slower on Intel, but I have no idea how ARM devices might > > > behave.) > > > > > > > On 2013/02/08 19:24:18, Stephen White wrote: > > > > > On 2013/02/08 19:02:32, Humper wrote: > > > > > > effectively add 1/2; this prevents bias and makes sure that we blur an > > > array > > > > > of > > > > > > 255's to an array of 255's, not 249's. > > > > > > > > > > This looks good, but could you benchmark all the rounding alone, > > separately > > > > from > > > > > the 16bit change? > > > > > > > > > >
Sign in to reply to this message.
Bench results here: http://70.32.156.53:10117/builders/Skia_Shuttle_Ubuntu12_ATI5770_Float_Bench_... Greg Humphreys | Software Engineer [skia, chrome] | humper@google.com | 434-260-0543 On Fri, Feb 8, 2013 at 4:51 PM, <humper@google.com> wrote: > > https://codereview.appspot.**com/7307076/diff/2001/src/** > effects/SkBlurMask.cpp<https://codereview.appspot.com/7307076/diff/2001/src/effects/SkBlurMask.cpp> > File src/effects/SkBlurMask.cpp (right): > > https://codereview.appspot.**com/7307076/diff/2001/src/** > effects/SkBlurMask.cpp#**newcode54<https://codereview.appspot.com/7307076/diff/2001/src/effects/SkBlurMask.cpp#newcode54> > src/effects/SkBlurMask.cpp:54: *dptr = (sum * scale + (1 << 15)) >> 16; > \ > Yep, we're in agreement -- I'm just waiting for the bench trybots to > finish their thing before asking anyone to sign off. Might not happen > until Monday at this rate :) > > > On 2013/02/08 21:50:30, Stephen White wrote: > >> On 2013/02/08 21:36:29, Humper wrote: >> > After talking to Mike briefly about this, he agreed that the + (1 << >> > 15) code > >> > was needed for correctness, so benching it separately wasn't >> > important for > >> > landing this. >> > > Yes, I agree that the rounding is good. I'm not sure about the >> > uint16_t change, > >> though. All the other blur implementations in Skia and WebKit use >> > same-sized (8 > >> or 8888) intermediate buffers, and see no visible loss of quality >> > AFAIK. Even > >> the non-separable blur in this file used 8-bit temporary buffers >> > between the 3 > >> SAT passes, and it seemed to be OK. You're also adding a conversion >> > step that > >> wasn't there before, and possibly could be skipped by templating the >> > blur loop > >> to go from 8 -> 16 on the first pass, if 16 bit is absolutely >> > necessary. > > > I'm submitting some bench trybots with the new code so we can >> > compare against > >> > ToT performance. >> > >> > On 2013/02/08 20:13:18, Stephen White wrote: >> > > On 2013/02/08 19:51:21, Humper wrote: >> > > > Sorry, do you mean benchmark the code with and without the + (1 >> > << 15) > >> > > >> > > Yes, I'd like to see how each change (16bit, rounding) separately >> > affects > >> > > performance. (I actually had a 32bit flavour of this at some >> > point, and it > >> > > wasn't horribly slower on Intel, but I have no idea how ARM >> > devices might > >> > > behave.) >> > > >> > > > On 2013/02/08 19:24:18, Stephen White wrote: >> > > > > On 2013/02/08 19:02:32, Humper wrote: >> > > > > > effectively add 1/2; this prevents bias and makes sure that >> > we blur an > >> > > array >> > > > > of >> > > > > > 255's to an array of 255's, not 249's. >> > > > > >> > > > > This looks good, but could you benchmark all the rounding >> > alone, > >> > separately >> > > > from >> > > > > the 16bit change? >> > > > >> > > >> > >> > > > https://codereview.appspot.**com/7307076/<https://codereview.appspot.com/7307... >
Sign in to reply to this message.
I also submitted bench runs for other platforms, but the system didn't send me mail when they were done like I thought it would, so I have to hunt those down. Greg Humphreys | Software Engineer [skia, chrome] | humper@google.com | 434-260-0543 On Sat, Feb 9, 2013 at 9:13 AM, Greg Humphreys <humper@google.com> wrote: > Bench results here: > > > http://70.32.156.53:10117/builders/Skia_Shuttle_Ubuntu12_ATI5770_Float_Bench_... > > Greg Humphreys | Software Engineer [skia, chrome] | humper@google.com | > 434-260-0543 > > > On Fri, Feb 8, 2013 at 4:51 PM, <humper@google.com> wrote: > >> >> https://codereview.appspot.**com/7307076/diff/2001/src/** >> effects/SkBlurMask.cpp<https://codereview.appspot.com/7307076/diff/2001/src/effects/SkBlurMask.cpp> >> File src/effects/SkBlurMask.cpp (right): >> >> https://codereview.appspot.**com/7307076/diff/2001/src/** >> effects/SkBlurMask.cpp#**newcode54<https://codereview.appspot.com/7307076/diff/2001/src/effects/SkBlurMask.cpp#newcode54> >> src/effects/SkBlurMask.cpp:54: *dptr = (sum * scale + (1 << 15)) >> 16; >> \ >> Yep, we're in agreement -- I'm just waiting for the bench trybots to >> finish their thing before asking anyone to sign off. Might not happen >> until Monday at this rate :) >> >> >> On 2013/02/08 21:50:30, Stephen White wrote: >> >>> On 2013/02/08 21:36:29, Humper wrote: >>> > After talking to Mike briefly about this, he agreed that the + (1 << >>> >> 15) code >> >>> > was needed for correctness, so benching it separately wasn't >>> >> important for >> >>> > landing this. >>> >> >> Yes, I agree that the rounding is good. I'm not sure about the >>> >> uint16_t change, >> >>> though. All the other blur implementations in Skia and WebKit use >>> >> same-sized (8 >> >>> or 8888) intermediate buffers, and see no visible loss of quality >>> >> AFAIK. Even >> >>> the non-separable blur in this file used 8-bit temporary buffers >>> >> between the 3 >> >>> SAT passes, and it seemed to be OK. You're also adding a conversion >>> >> step that >> >>> wasn't there before, and possibly could be skipped by templating the >>> >> blur loop >> >>> to go from 8 -> 16 on the first pass, if 16 bit is absolutely >>> >> necessary. >> >> > I'm submitting some bench trybots with the new code so we can >>> >> compare against >> >>> > ToT performance. >>> > >>> > On 2013/02/08 20:13:18, Stephen White wrote: >>> > > On 2013/02/08 19:51:21, Humper wrote: >>> > > > Sorry, do you mean benchmark the code with and without the + (1 >>> >> << 15) >> >>> > > >>> > > Yes, I'd like to see how each change (16bit, rounding) separately >>> >> affects >> >>> > > performance. (I actually had a 32bit flavour of this at some >>> >> point, and it >> >>> > > wasn't horribly slower on Intel, but I have no idea how ARM >>> >> devices might >> >>> > > behave.) >>> > > >>> > > > On 2013/02/08 19:24:18, Stephen White wrote: >>> > > > > On 2013/02/08 19:02:32, Humper wrote: >>> > > > > > effectively add 1/2; this prevents bias and makes sure that >>> >> we blur an >> >>> > > array >>> > > > > of >>> > > > > > 255's to an array of 255's, not 249's. >>> > > > > >>> > > > > This looks good, but could you benchmark all the rounding >>> >> alone, >> >>> > separately >>> > > > from >>> > > > > the 16bit change? >>> > > > >>> > > >>> > >>> >> >> >> https://codereview.appspot.**com/7307076/<https://codereview.appspot.com/7307... >> > >
Sign in to reply to this message.
... what is the summary of the bench-results for your CL? On Sat, Feb 9, 2013 at 9:14 AM, Greg Humphreys <humper@google.com> wrote: > I also submitted bench runs for other platforms, but the system didn't > send me mail when they were done like I thought it would, so I have to hunt > those down. > > Greg Humphreys | Software Engineer [skia, chrome] | humper@google.com | > 434-260-0543 > > > On Sat, Feb 9, 2013 at 9:13 AM, Greg Humphreys <humper@google.com> wrote: > >> Bench results here: >> >> >> http://70.32.156.53:10117/builders/Skia_Shuttle_Ubuntu12_ATI5770_Float_Bench_... >> >> Greg Humphreys | Software Engineer [skia, chrome] | humper@google.com | >> 434-260-0543 >> >> >> On Fri, Feb 8, 2013 at 4:51 PM, <humper@google.com> wrote: >> >>> >>> https://codereview.appspot.**com/7307076/diff/2001/src/** >>> effects/SkBlurMask.cpp<https://codereview.appspot.com/7307076/diff/2001/src/effects/SkBlurMask.cpp> >>> File src/effects/SkBlurMask.cpp (right): >>> >>> https://codereview.appspot.**com/7307076/diff/2001/src/** >>> effects/SkBlurMask.cpp#**newcode54<https://codereview.appspot.com/7307076/diff/2001/src/effects/SkBlurMask.cpp#newcode54> >>> src/effects/SkBlurMask.cpp:54: *dptr = (sum * scale + (1 << 15)) >> 16; >>> \ >>> Yep, we're in agreement -- I'm just waiting for the bench trybots to >>> finish their thing before asking anyone to sign off. Might not happen >>> until Monday at this rate :) >>> >>> >>> On 2013/02/08 21:50:30, Stephen White wrote: >>> >>>> On 2013/02/08 21:36:29, Humper wrote: >>>> > After talking to Mike briefly about this, he agreed that the + (1 << >>>> >>> 15) code >>> >>>> > was needed for correctness, so benching it separately wasn't >>>> >>> important for >>> >>>> > landing this. >>>> >>> >>> Yes, I agree that the rounding is good. I'm not sure about the >>>> >>> uint16_t change, >>> >>>> though. All the other blur implementations in Skia and WebKit use >>>> >>> same-sized (8 >>> >>>> or 8888) intermediate buffers, and see no visible loss of quality >>>> >>> AFAIK. Even >>> >>>> the non-separable blur in this file used 8-bit temporary buffers >>>> >>> between the 3 >>> >>>> SAT passes, and it seemed to be OK. You're also adding a conversion >>>> >>> step that >>> >>>> wasn't there before, and possibly could be skipped by templating the >>>> >>> blur loop >>> >>>> to go from 8 -> 16 on the first pass, if 16 bit is absolutely >>>> >>> necessary. >>> >>> > I'm submitting some bench trybots with the new code so we can >>>> >>> compare against >>> >>>> > ToT performance. >>>> > >>>> > On 2013/02/08 20:13:18, Stephen White wrote: >>>> > > On 2013/02/08 19:51:21, Humper wrote: >>>> > > > Sorry, do you mean benchmark the code with and without the + (1 >>>> >>> << 15) >>> >>>> > > >>>> > > Yes, I'd like to see how each change (16bit, rounding) separately >>>> >>> affects >>> >>>> > > performance. (I actually had a 32bit flavour of this at some >>>> >>> point, and it >>> >>>> > > wasn't horribly slower on Intel, but I have no idea how ARM >>>> >>> devices might >>> >>>> > > behave.) >>>> > > >>>> > > > On 2013/02/08 19:24:18, Stephen White wrote: >>>> > > > > On 2013/02/08 19:02:32, Humper wrote: >>>> > > > > > effectively add 1/2; this prevents bias and makes sure that >>>> >>> we blur an >>> >>>> > > array >>>> > > > > of >>>> > > > > > 255's to an array of 255's, not 249's. >>>> > > > > >>>> > > > > This looks good, but could you benchmark all the rounding >>>> >>> alone, >>> >>>> > separately >>>> > > > from >>>> > > > > the 16bit change? >>>> > > > >>>> > > >>>> > >>>> >>> >>> >>> https://codereview.appspot.**com/7307076/<https://codereview.appspot.com/7307... >>> >> >> >
Sign in to reply to this message.
Ah, the summary is that the bench trybots don't email me when they're done, which limits their usefulness since it can take a pretty long time for the results to finish. Now I have to go hunt down all the results and compare them to ToT. I'll do that now. Greg Humphreys | Software Engineer [skia, chrome] | humper@google.com | 434-260-0543 On Mon, Feb 11, 2013 at 9:36 AM, Mike Reed <reed@google.com> wrote: > ... what is the summary of the bench-results for your CL? > > > On Sat, Feb 9, 2013 at 9:14 AM, Greg Humphreys <humper@google.com> wrote: > >> I also submitted bench runs for other platforms, but the system didn't >> send me mail when they were done like I thought it would, so I have to hunt >> those down. >> >> Greg Humphreys | Software Engineer [skia, chrome] | humper@google.com | >> 434-260-0543 >> >> >> On Sat, Feb 9, 2013 at 9:13 AM, Greg Humphreys <humper@google.com> wrote: >> >>> Bench results here: >>> >>> >>> http://70.32.156.53:10117/builders/Skia_Shuttle_Ubuntu12_ATI5770_Float_Bench_... >>> >>> Greg Humphreys | Software Engineer [skia, chrome] | humper@google.com | >>> 434-260-0543 >>> >>> >>> On Fri, Feb 8, 2013 at 4:51 PM, <humper@google.com> wrote: >>> >>>> >>>> https://codereview.appspot.**com/7307076/diff/2001/src/** >>>> effects/SkBlurMask.cpp<https://codereview.appspot.com/7307076/diff/2001/src/effects/SkBlurMask.cpp> >>>> File src/effects/SkBlurMask.cpp (right): >>>> >>>> https://codereview.appspot.**com/7307076/diff/2001/src/** >>>> effects/SkBlurMask.cpp#**newcode54<https://codereview.appspot.com/7307076/diff/2001/src/effects/SkBlurMask.cpp#newcode54> >>>> src/effects/SkBlurMask.cpp:54: *dptr = (sum * scale + (1 << 15)) >> 16; >>>> \ >>>> Yep, we're in agreement -- I'm just waiting for the bench trybots to >>>> finish their thing before asking anyone to sign off. Might not happen >>>> until Monday at this rate :) >>>> >>>> >>>> On 2013/02/08 21:50:30, Stephen White wrote: >>>> >>>>> On 2013/02/08 21:36:29, Humper wrote: >>>>> > After talking to Mike briefly about this, he agreed that the + (1 << >>>>> >>>> 15) code >>>> >>>>> > was needed for correctness, so benching it separately wasn't >>>>> >>>> important for >>>> >>>>> > landing this. >>>>> >>>> >>>> Yes, I agree that the rounding is good. I'm not sure about the >>>>> >>>> uint16_t change, >>>> >>>>> though. All the other blur implementations in Skia and WebKit use >>>>> >>>> same-sized (8 >>>> >>>>> or 8888) intermediate buffers, and see no visible loss of quality >>>>> >>>> AFAIK. Even >>>> >>>>> the non-separable blur in this file used 8-bit temporary buffers >>>>> >>>> between the 3 >>>> >>>>> SAT passes, and it seemed to be OK. You're also adding a conversion >>>>> >>>> step that >>>> >>>>> wasn't there before, and possibly could be skipped by templating the >>>>> >>>> blur loop >>>> >>>>> to go from 8 -> 16 on the first pass, if 16 bit is absolutely >>>>> >>>> necessary. >>>> >>>> > I'm submitting some bench trybots with the new code so we can >>>>> >>>> compare against >>>> >>>>> > ToT performance. >>>>> > >>>>> > On 2013/02/08 20:13:18, Stephen White wrote: >>>>> > > On 2013/02/08 19:51:21, Humper wrote: >>>>> > > > Sorry, do you mean benchmark the code with and without the + (1 >>>>> >>>> << 15) >>>> >>>>> > > >>>>> > > Yes, I'd like to see how each change (16bit, rounding) separately >>>>> >>>> affects >>>> >>>>> > > performance. (I actually had a 32bit flavour of this at some >>>>> >>>> point, and it >>>> >>>>> > > wasn't horribly slower on Intel, but I have no idea how ARM >>>>> >>>> devices might >>>> >>>>> > > behave.) >>>>> > > >>>>> > > > On 2013/02/08 19:24:18, Stephen White wrote: >>>>> > > > > On 2013/02/08 19:02:32, Humper wrote: >>>>> > > > > > effectively add 1/2; this prevents bias and makes sure that >>>>> >>>> we blur an >>>> >>>>> > > array >>>>> > > > > of >>>>> > > > > > 255's to an array of 255's, not 249's. >>>>> > > > > >>>>> > > > > This looks good, but could you benchmark all the rounding >>>>> >>>> alone, >>>> >>>>> > separately >>>>> > > > from >>>>> > > > > the 16bit change? >>>>> > > > >>>>> > > >>>>> > >>>>> >>>> >>>> >>>> https://codereview.appspot.**com/7307076/<https://codereview.appspot.com/7307... >>>> >>> >>> >> >
Sign in to reply to this message.
Nevermind, the mails were just going to spam. The summary is that it seems some blur benches got a *little* faster, and some got a *little* slower. I'm going to do a before/after image comparison of 16 vs. 8 bit fixed point intermediate calculations and see if we notice any appreciable difference, both between the 16/8 versions themselves, and also between the analytic solutions and the 16/8 versions. Greg Humphreys | Software Engineer [skia, chrome] | humper@google.com | 434-260-0543 On Mon, Feb 11, 2013 at 9:38 AM, Greg Humphreys <humper@google.com> wrote: > Ah, the summary is that the bench trybots don't email me when they're > done, which limits their usefulness since it can take a pretty long time > for the results to finish. Now I have to go hunt down all the results and > compare them to ToT. > > I'll do that now. > > Greg Humphreys | Software Engineer [skia, chrome] | humper@google.com | > 434-260-0543 > > > On Mon, Feb 11, 2013 at 9:36 AM, Mike Reed <reed@google.com> wrote: > >> ... what is the summary of the bench-results for your CL? >> >> >> On Sat, Feb 9, 2013 at 9:14 AM, Greg Humphreys <humper@google.com> wrote: >> >>> I also submitted bench runs for other platforms, but the system didn't >>> send me mail when they were done like I thought it would, so I have to hunt >>> those down. >>> >>> Greg Humphreys | Software Engineer [skia, chrome] | humper@google.com | >>> 434-260-0543 >>> >>> >>> On Sat, Feb 9, 2013 at 9:13 AM, Greg Humphreys <humper@google.com>wrote: >>> >>>> Bench results here: >>>> >>>> >>>> http://70.32.156.53:10117/builders/Skia_Shuttle_Ubuntu12_ATI5770_Float_Bench_... >>>> >>>> Greg Humphreys | Software Engineer [skia, chrome] | humper@google.com | >>>> 434-260-0543 >>>> >>>> >>>> On Fri, Feb 8, 2013 at 4:51 PM, <humper@google.com> wrote: >>>> >>>>> >>>>> https://codereview.appspot.**com/7307076/diff/2001/src/** >>>>> effects/SkBlurMask.cpp<https://codereview.appspot.com/7307076/diff/2001/src/effects/SkBlurMask.cpp> >>>>> File src/effects/SkBlurMask.cpp (right): >>>>> >>>>> https://codereview.appspot.**com/7307076/diff/2001/src/** >>>>> effects/SkBlurMask.cpp#**newcode54<https://codereview.appspot.com/7307076/diff/2001/src/effects/SkBlurMask.cpp#newcode54> >>>>> src/effects/SkBlurMask.cpp:54: *dptr = (sum * scale + (1 << 15)) >> 16; >>>>> \ >>>>> Yep, we're in agreement -- I'm just waiting for the bench trybots to >>>>> finish their thing before asking anyone to sign off. Might not happen >>>>> until Monday at this rate :) >>>>> >>>>> >>>>> On 2013/02/08 21:50:30, Stephen White wrote: >>>>> >>>>>> On 2013/02/08 21:36:29, Humper wrote: >>>>>> > After talking to Mike briefly about this, he agreed that the + (1 << >>>>>> >>>>> 15) code >>>>> >>>>>> > was needed for correctness, so benching it separately wasn't >>>>>> >>>>> important for >>>>> >>>>>> > landing this. >>>>>> >>>>> >>>>> Yes, I agree that the rounding is good. I'm not sure about the >>>>>> >>>>> uint16_t change, >>>>> >>>>>> though. All the other blur implementations in Skia and WebKit use >>>>>> >>>>> same-sized (8 >>>>> >>>>>> or 8888) intermediate buffers, and see no visible loss of quality >>>>>> >>>>> AFAIK. Even >>>>> >>>>>> the non-separable blur in this file used 8-bit temporary buffers >>>>>> >>>>> between the 3 >>>>> >>>>>> SAT passes, and it seemed to be OK. You're also adding a conversion >>>>>> >>>>> step that >>>>> >>>>>> wasn't there before, and possibly could be skipped by templating the >>>>>> >>>>> blur loop >>>>> >>>>>> to go from 8 -> 16 on the first pass, if 16 bit is absolutely >>>>>> >>>>> necessary. >>>>> >>>>> > I'm submitting some bench trybots with the new code so we can >>>>>> >>>>> compare against >>>>> >>>>>> > ToT performance. >>>>>> > >>>>>> > On 2013/02/08 20:13:18, Stephen White wrote: >>>>>> > > On 2013/02/08 19:51:21, Humper wrote: >>>>>> > > > Sorry, do you mean benchmark the code with and without the + (1 >>>>>> >>>>> << 15) >>>>> >>>>>> > > >>>>>> > > Yes, I'd like to see how each change (16bit, rounding) separately >>>>>> >>>>> affects >>>>> >>>>>> > > performance. (I actually had a 32bit flavour of this at some >>>>>> >>>>> point, and it >>>>> >>>>>> > > wasn't horribly slower on Intel, but I have no idea how ARM >>>>>> >>>>> devices might >>>>> >>>>>> > > behave.) >>>>>> > > >>>>>> > > > On 2013/02/08 19:24:18, Stephen White wrote: >>>>>> > > > > On 2013/02/08 19:02:32, Humper wrote: >>>>>> > > > > > effectively add 1/2; this prevents bias and makes sure that >>>>>> >>>>> we blur an >>>>> >>>>>> > > array >>>>>> > > > > of >>>>>> > > > > > 255's to an array of 255's, not 249's. >>>>>> > > > > >>>>>> > > > > This looks good, but could you benchmark all the rounding >>>>>> >>>>> alone, >>>>> >>>>>> > separately >>>>>> > > > from >>>>>> > > > > the 16bit change? >>>>>> > > > >>>>>> > > >>>>>> > >>>>>> >>>>> >>>>> >>>>> https://codereview.appspot.**com/7307076/<https://codereview.appspot.com/7307... >>>>> >>>> >>>> >>> >> >
Sign in to reply to this message.
In these experiments I saw that in 8 bit mode we lack the precision to make sure that averaging a bunch of 255's yields 255. They don't quite dip down as low as 249 like before, but in many cases I did see them go down to 253. tl;dr version: performance impact of making 16 bit fixed point copies for intermediate calculations seem to be basically noise, and the results are measurably higher quality. Greg Humphreys | Software Engineer [skia, chrome] | humper@google.com | 434-260-0543 On Mon, Feb 11, 2013 at 10:29 AM, Greg Humphreys <humper@google.com> wrote: > Nevermind, the mails were just going to spam. > > The summary is that it seems some blur benches got a *little* faster, and > some got a *little* slower. > > I'm going to do a before/after image comparison of 16 vs. 8 bit fixed > point intermediate calculations and see if we notice any appreciable > difference, both between the 16/8 versions themselves, and also between the > analytic solutions and the 16/8 versions. > > Greg Humphreys | Software Engineer [skia, chrome] | humper@google.com | > 434-260-0543 > > > On Mon, Feb 11, 2013 at 9:38 AM, Greg Humphreys <humper@google.com> wrote: > >> Ah, the summary is that the bench trybots don't email me when they're >> done, which limits their usefulness since it can take a pretty long time >> for the results to finish. Now I have to go hunt down all the results and >> compare them to ToT. >> >> I'll do that now. >> >> Greg Humphreys | Software Engineer [skia, chrome] | humper@google.com | >> 434-260-0543 >> >> >> On Mon, Feb 11, 2013 at 9:36 AM, Mike Reed <reed@google.com> wrote: >> >>> ... what is the summary of the bench-results for your CL? >>> >>> >>> On Sat, Feb 9, 2013 at 9:14 AM, Greg Humphreys <humper@google.com>wrote: >>> >>>> I also submitted bench runs for other platforms, but the system didn't >>>> send me mail when they were done like I thought it would, so I have to hunt >>>> those down. >>>> >>>> Greg Humphreys | Software Engineer [skia, chrome] | humper@google.com | >>>> 434-260-0543 >>>> >>>> >>>> On Sat, Feb 9, 2013 at 9:13 AM, Greg Humphreys <humper@google.com>wrote: >>>> >>>>> Bench results here: >>>>> >>>>> >>>>> http://70.32.156.53:10117/builders/Skia_Shuttle_Ubuntu12_ATI5770_Float_Bench_... >>>>> >>>>> Greg Humphreys | Software Engineer [skia, chrome] | humper@google.com >>>>> | 434-260-0543 >>>>> >>>>> >>>>> On Fri, Feb 8, 2013 at 4:51 PM, <humper@google.com> wrote: >>>>> >>>>>> >>>>>> https://codereview.appspot.**com/7307076/diff/2001/src/** >>>>>> effects/SkBlurMask.cpp<https://codereview.appspot.com/7307076/diff/2001/src/effects/SkBlurMask.cpp> >>>>>> File src/effects/SkBlurMask.cpp (right): >>>>>> >>>>>> https://codereview.appspot.**com/7307076/diff/2001/src/** >>>>>> effects/SkBlurMask.cpp#**newcode54<https://codereview.appspot.com/7307076/diff/2001/src/effects/SkBlurMask.cpp#newcode54> >>>>>> src/effects/SkBlurMask.cpp:54: *dptr = (sum * scale + (1 << 15)) >> >>>>>> 16; >>>>>> \ >>>>>> Yep, we're in agreement -- I'm just waiting for the bench trybots to >>>>>> finish their thing before asking anyone to sign off. Might not happen >>>>>> until Monday at this rate :) >>>>>> >>>>>> >>>>>> On 2013/02/08 21:50:30, Stephen White wrote: >>>>>> >>>>>>> On 2013/02/08 21:36:29, Humper wrote: >>>>>>> > After talking to Mike briefly about this, he agreed that the + (1 >>>>>>> << >>>>>>> >>>>>> 15) code >>>>>> >>>>>>> > was needed for correctness, so benching it separately wasn't >>>>>>> >>>>>> important for >>>>>> >>>>>>> > landing this. >>>>>>> >>>>>> >>>>>> Yes, I agree that the rounding is good. I'm not sure about the >>>>>>> >>>>>> uint16_t change, >>>>>> >>>>>>> though. All the other blur implementations in Skia and WebKit use >>>>>>> >>>>>> same-sized (8 >>>>>> >>>>>>> or 8888) intermediate buffers, and see no visible loss of quality >>>>>>> >>>>>> AFAIK. Even >>>>>> >>>>>>> the non-separable blur in this file used 8-bit temporary buffers >>>>>>> >>>>>> between the 3 >>>>>> >>>>>>> SAT passes, and it seemed to be OK. You're also adding a conversion >>>>>>> >>>>>> step that >>>>>> >>>>>>> wasn't there before, and possibly could be skipped by templating the >>>>>>> >>>>>> blur loop >>>>>> >>>>>>> to go from 8 -> 16 on the first pass, if 16 bit is absolutely >>>>>>> >>>>>> necessary. >>>>>> >>>>>> > I'm submitting some bench trybots with the new code so we can >>>>>>> >>>>>> compare against >>>>>> >>>>>>> > ToT performance. >>>>>>> > >>>>>>> > On 2013/02/08 20:13:18, Stephen White wrote: >>>>>>> > > On 2013/02/08 19:51:21, Humper wrote: >>>>>>> > > > Sorry, do you mean benchmark the code with and without the + (1 >>>>>>> >>>>>> << 15) >>>>>> >>>>>>> > > >>>>>>> > > Yes, I'd like to see how each change (16bit, rounding) separately >>>>>>> >>>>>> affects >>>>>> >>>>>>> > > performance. (I actually had a 32bit flavour of this at some >>>>>>> >>>>>> point, and it >>>>>> >>>>>>> > > wasn't horribly slower on Intel, but I have no idea how ARM >>>>>>> >>>>>> devices might >>>>>> >>>>>>> > > behave.) >>>>>>> > > >>>>>>> > > > On 2013/02/08 19:24:18, Stephen White wrote: >>>>>>> > > > > On 2013/02/08 19:02:32, Humper wrote: >>>>>>> > > > > > effectively add 1/2; this prevents bias and makes sure that >>>>>>> >>>>>> we blur an >>>>>> >>>>>>> > > array >>>>>>> > > > > of >>>>>>> > > > > > 255's to an array of 255's, not 249's. >>>>>>> > > > > >>>>>>> > > > > This looks good, but could you benchmark all the rounding >>>>>>> >>>>>> alone, >>>>>> >>>>>>> > separately >>>>>>> > > > from >>>>>>> > > > > the 16bit change? >>>>>>> > > > >>>>>>> > > >>>>>>> > >>>>>>> >>>>>> >>>>>> >>>>>> https://codereview.appspot.**com/7307076/<https://codereview.appspot.com/7307... >>>>>> >>>>> >>>>> >>>> >>> >> >
Sign in to reply to this message.
CL updated with some style cleanups as well as support for rectangles that are thinner than the support of the blur kernel. On 2013/02/11 17:23:56, Humper wrote: > In these experiments I saw that in 8 bit mode we lack the precision to make > sure that averaging a bunch of 255's yields 255. They don't quite dip down > as low as 249 like before, but in many cases I did see them go down to 253. > > tl;dr version: performance impact of making 16 bit fixed point copies for > intermediate calculations seem to be basically noise, and the results are > measurably higher quality. > > Greg Humphreys | Software Engineer [skia, chrome] | mailto:humper@google.com | > 434-260-0543 > > > On Mon, Feb 11, 2013 at 10:29 AM, Greg Humphreys <mailto:humper@google.com> wrote: > > > Nevermind, the mails were just going to spam. > > > > The summary is that it seems some blur benches got a *little* faster, and > > some got a *little* slower. > > > > I'm going to do a before/after image comparison of 16 vs. 8 bit fixed > > point intermediate calculations and see if we notice any appreciable > > difference, both between the 16/8 versions themselves, and also between the > > analytic solutions and the 16/8 versions. > > > > Greg Humphreys | Software Engineer [skia, chrome] | mailto:humper@google.com | > > 434-260-0543 > > > > > > On Mon, Feb 11, 2013 at 9:38 AM, Greg Humphreys <mailto:humper@google.com> wrote: > > > >> Ah, the summary is that the bench trybots don't email me when they're > >> done, which limits their usefulness since it can take a pretty long time > >> for the results to finish. Now I have to go hunt down all the results and > >> compare them to ToT. > >> > >> I'll do that now. > >> > >> Greg Humphreys | Software Engineer [skia, chrome] | mailto:humper@google.com | > >> 434-260-0543 > >> > >> > >> On Mon, Feb 11, 2013 at 9:36 AM, Mike Reed <mailto:reed@google.com> wrote: > >> > >>> ... what is the summary of the bench-results for your CL? > >>> > >>> > >>> On Sat, Feb 9, 2013 at 9:14 AM, Greg Humphreys <humper@google.com>wrote: > >>> > >>>> I also submitted bench runs for other platforms, but the system didn't > >>>> send me mail when they were done like I thought it would, so I have to > hunt > >>>> those down. > >>>> > >>>> Greg Humphreys | Software Engineer [skia, chrome] | mailto:humper@google.com | > >>>> 434-260-0543 > >>>> > >>>> > >>>> On Sat, Feb 9, 2013 at 9:13 AM, Greg Humphreys <humper@google.com>wrote: > >>>> > >>>>> Bench results here: > >>>>> > >>>>> > >>>>> > http://70.32.156.53:10117/builders/Skia_Shuttle_Ubuntu12_ATI5770_Float_Bench_... > >>>>> > >>>>> Greg Humphreys | Software Engineer [skia, chrome] | mailto:humper@google.com > >>>>> | 434-260-0543 > >>>>> > >>>>> > >>>>> On Fri, Feb 8, 2013 at 4:51 PM, <mailto:humper@google.com> wrote: > >>>>> > >>>>>> > >>>>>> https://codereview.appspot.**com/7307076/diff/2001/src/** > >>>>>> > effects/SkBlurMask.cpp<https://codereview.appspot.com/7307076/diff/2001/src/effects/SkBlurMask.cpp> > >>>>>> File src/effects/SkBlurMask.cpp (right): > >>>>>> > >>>>>> https://codereview.appspot.**com/7307076/diff/2001/src/** > >>>>>> > effects/SkBlurMask.cpp#**newcode54<https://codereview.appspot.com/7307076/diff/2001/src/effects/SkBlurMask.cpp#newcode54> > >>>>>> src/effects/SkBlurMask.cpp:54: *dptr = (sum * scale + (1 << 15)) >> > >>>>>> 16; > >>>>>> \ > >>>>>> Yep, we're in agreement -- I'm just waiting for the bench trybots to > >>>>>> finish their thing before asking anyone to sign off. Might not happen > >>>>>> until Monday at this rate :) > >>>>>> > >>>>>> > >>>>>> On 2013/02/08 21:50:30, Stephen White wrote: > >>>>>> > >>>>>>> On 2013/02/08 21:36:29, Humper wrote: > >>>>>>> > After talking to Mike briefly about this, he agreed that the + (1 > >>>>>>> << > >>>>>>> > >>>>>> 15) code > >>>>>> > >>>>>>> > was needed for correctness, so benching it separately wasn't > >>>>>>> > >>>>>> important for > >>>>>> > >>>>>>> > landing this. > >>>>>>> > >>>>>> > >>>>>> Yes, I agree that the rounding is good. I'm not sure about the > >>>>>>> > >>>>>> uint16_t change, > >>>>>> > >>>>>>> though. All the other blur implementations in Skia and WebKit use > >>>>>>> > >>>>>> same-sized (8 > >>>>>> > >>>>>>> or 8888) intermediate buffers, and see no visible loss of quality > >>>>>>> > >>>>>> AFAIK. Even > >>>>>> > >>>>>>> the non-separable blur in this file used 8-bit temporary buffers > >>>>>>> > >>>>>> between the 3 > >>>>>> > >>>>>>> SAT passes, and it seemed to be OK. You're also adding a conversion > >>>>>>> > >>>>>> step that > >>>>>> > >>>>>>> wasn't there before, and possibly could be skipped by templating the > >>>>>>> > >>>>>> blur loop > >>>>>> > >>>>>>> to go from 8 -> 16 on the first pass, if 16 bit is absolutely > >>>>>>> > >>>>>> necessary. > >>>>>> > >>>>>> > I'm submitting some bench trybots with the new code so we can > >>>>>>> > >>>>>> compare against > >>>>>> > >>>>>>> > ToT performance. > >>>>>>> > > >>>>>>> > On 2013/02/08 20:13:18, Stephen White wrote: > >>>>>>> > > On 2013/02/08 19:51:21, Humper wrote: > >>>>>>> > > > Sorry, do you mean benchmark the code with and without the + (1 > >>>>>>> > >>>>>> << 15) > >>>>>> > >>>>>>> > > > >>>>>>> > > Yes, I'd like to see how each change (16bit, rounding) separately > >>>>>>> > >>>>>> affects > >>>>>> > >>>>>>> > > performance. (I actually had a 32bit flavour of this at some > >>>>>>> > >>>>>> point, and it > >>>>>> > >>>>>>> > > wasn't horribly slower on Intel, but I have no idea how ARM > >>>>>>> > >>>>>> devices might > >>>>>> > >>>>>>> > > behave.) > >>>>>>> > > > >>>>>>> > > > On 2013/02/08 19:24:18, Stephen White wrote: > >>>>>>> > > > > On 2013/02/08 19:02:32, Humper wrote: > >>>>>>> > > > > > effectively add 1/2; this prevents bias and makes sure that > >>>>>>> > >>>>>> we blur an > >>>>>> > >>>>>>> > > array > >>>>>>> > > > > of > >>>>>>> > > > > > 255's to an array of 255's, not 249's. > >>>>>>> > > > > > >>>>>>> > > > > This looks good, but could you benchmark all the rounding > >>>>>>> > >>>>>> alone, > >>>>>> > >>>>>>> > separately > >>>>>>> > > > from > >>>>>>> > > > > the 16bit change? > >>>>>>> > > > > >>>>>>> > > > >>>>>>> > > >>>>>>> > >>>>>> > >>>>>> > >>>>>> > https://codereview.appspot.**com/7307076/%3Chttps://codereview.appspot.com/73...> > >>>>>> > >>>>> > >>>>> > >>>> > >>> > >> > >
Sign in to reply to this message.
https://codereview.appspot.com/7307076/diff/13002/src/effects/SkBlurMask.cpp File src/effects/SkBlurMask.cpp (right): https://codereview.appspot.com/7307076/diff/13002/src/effects/SkBlurMask.cpp#... src/effects/SkBlurMask.cpp:198: { I know its not new to this CL, but it is not at all clear what math-space we are dealing with in this function. I see byte >> 7 which is just adding the high-bit for some reason, and byte << 8 / kernel, etc. Can this be described in a short block-comment somewhere? https://codereview.appspot.com/7307076/diff/13002/src/effects/SkBlurMask.cpp#... src/effects/SkBlurMask.cpp:1161: static inline unsigned int profile_lookup( unsigned int *profile, int loc, int blurred_width, int sharp_width ) { is this abs() call getting inlined for sure? If not, we have SkAbs32() which does. https://codereview.appspot.com/7307076/diff/13002/src/effects/SkBlurMask.cpp#... src/effects/SkBlurMask.cpp:1163: int ox = dx >> 1; if (...) { } https://codereview.appspot.com/7307076/diff/13002/src/effects/SkBlurMask.cpp#... src/effects/SkBlurMask.cpp:1238: for (int y = 0 ; y < dstHeight ; ++y) for (...) { https://codereview.appspot.com/7307076/diff/13002/src/effects/SkBlurMask.h File src/effects/SkBlurMask.h (right): https://codereview.appspot.com/7307076/diff/13002/src/effects/SkBlurMask.h#ne... src/effects/SkBlurMask.h:33: SkIPoint *margin = NULL); // What does "Simple" mean here? Why no Quality param? Why is radius renamed to provided_radius? In general, even though this header is "private", I think we should try to make its interface clean, consistent, and somewhat clear/documented. (not just limited to this CL).
Sign in to reply to this message.
https://codereview.appspot.com/7307076/diff/13002/src/effects/SkBlurMask.cpp File src/effects/SkBlurMask.cpp (right): https://codereview.appspot.com/7307076/diff/13002/src/effects/SkBlurMask.cpp#... src/effects/SkBlurMask.cpp:198: { To be honest the details of boxBlurInterp are a little opaque to me, since I didn't write it. Maybe we can open a new issue for Steven to document this code a little? On 2013/02/12 15:14:35, reed1 wrote: > I know its not new to this CL, but it is not at all clear what math-space we are > dealing with in this function. I see byte >> 7 which is just adding the high-bit > for some reason, and byte << 8 / kernel, etc. Can this be described in a short > block-comment somewhere? https://codereview.appspot.com/7307076/diff/13002/src/effects/SkBlurMask.cpp#... src/effects/SkBlurMask.cpp:1161: static inline unsigned int profile_lookup( unsigned int *profile, int loc, int blurred_width, int sharp_width ) { Not sure -- I'll use the Skia one. On 2013/02/12 15:14:35, reed1 wrote: > is this abs() call getting inlined for sure? If not, we have SkAbs32() which > does. https://codereview.appspot.com/7307076/diff/13002/src/effects/SkBlurMask.cpp#... src/effects/SkBlurMask.cpp:1163: int ox = dx >> 1; On 2013/02/12 15:14:35, reed1 wrote: > if (...) { > } Done. https://codereview.appspot.com/7307076/diff/13002/src/effects/SkBlurMask.cpp#... src/effects/SkBlurMask.cpp:1238: for (int y = 0 ; y < dstHeight ; ++y) On 2013/02/12 15:14:35, reed1 wrote: > for (...) { Done. https://codereview.appspot.com/7307076/diff/13002/src/effects/SkBlurMask.h File src/effects/SkBlurMask.h (right): https://codereview.appspot.com/7307076/diff/13002/src/effects/SkBlurMask.h#ne... src/effects/SkBlurMask.h:33: SkIPoint *margin = NULL); In this case it's the ground truth gaussian blur that I was using for debugging. I don't think we should remove it from the interface because it's really useful to have around should things go wonky in the future. I'll rename it to something like BlurGroundTruth or something On 2013/02/12 15:14:35, reed1 wrote: > // What does "Simple" mean here? Why no Quality param? Why is radius renamed to > provided_radius? > > In general, even though this header is "private", I think we should try to make > its interface clean, consistent, and somewhat clear/documented. (not just > limited to this CL).
Sign in to reply to this message.
New code uploaded. Incidentally, the 'quality' and 'style' parameters to BlurRect aren't used anywhere. They should probably be removed, since BlurRect doesn't really support them. On 2013/02/12 20:34:04, Humper wrote: > https://codereview.appspot.com/7307076/diff/13002/src/effects/SkBlurMask.cpp > File src/effects/SkBlurMask.cpp (right): > > https://codereview.appspot.com/7307076/diff/13002/src/effects/SkBlurMask.cpp#... > src/effects/SkBlurMask.cpp:198: { > To be honest the details of boxBlurInterp are a little opaque to me, since I > didn't write it. Maybe we can open a new issue for Steven to document this code > a little? > > On 2013/02/12 15:14:35, reed1 wrote: > > I know its not new to this CL, but it is not at all clear what math-space we > are > > dealing with in this function. I see byte >> 7 which is just adding the > high-bit > > for some reason, and byte << 8 / kernel, etc. Can this be described in a short > > block-comment somewhere? > > https://codereview.appspot.com/7307076/diff/13002/src/effects/SkBlurMask.cpp#... > src/effects/SkBlurMask.cpp:1161: static inline unsigned int profile_lookup( > unsigned int *profile, int loc, int blurred_width, int sharp_width ) { > Not sure -- I'll use the Skia one. > > On 2013/02/12 15:14:35, reed1 wrote: > > is this abs() call getting inlined for sure? If not, we have SkAbs32() which > > does. > > https://codereview.appspot.com/7307076/diff/13002/src/effects/SkBlurMask.cpp#... > src/effects/SkBlurMask.cpp:1163: int ox = dx >> 1; > On 2013/02/12 15:14:35, reed1 wrote: > > if (...) { > > } > > Done. > > https://codereview.appspot.com/7307076/diff/13002/src/effects/SkBlurMask.cpp#... > src/effects/SkBlurMask.cpp:1238: for (int y = 0 ; y < dstHeight ; ++y) > On 2013/02/12 15:14:35, reed1 wrote: > > for (...) { > > Done. > > https://codereview.appspot.com/7307076/diff/13002/src/effects/SkBlurMask.h > File src/effects/SkBlurMask.h (right): > > https://codereview.appspot.com/7307076/diff/13002/src/effects/SkBlurMask.h#ne... > src/effects/SkBlurMask.h:33: SkIPoint *margin = NULL); > In this case it's the ground truth gaussian blur that I was using for debugging. > I don't think we should remove it from the interface because it's really useful > to have around should things go wonky in the future. I'll rename it to > something like BlurGroundTruth or something > > On 2013/02/12 15:14:35, reed1 wrote: > > // What does "Simple" mean here? Why no Quality param? Why is radius renamed > to > > provided_radius? > > > > In general, even though this header is "private", I think we should try to > make > > its interface clean, consistent, and somewhat clear/documented. (not just > > limited to this CL).
Sign in to reply to this message.
Final draft adds support for blur styles to the fast analytic path; they match up with their slow counterparts. On 2013/02/12 20:38:11, Humper wrote: > New code uploaded. > > Incidentally, the 'quality' and 'style' parameters to BlurRect aren't used > anywhere. They should probably be removed, since BlurRect doesn't really > support them. > > On 2013/02/12 20:34:04, Humper wrote: > > https://codereview.appspot.com/7307076/diff/13002/src/effects/SkBlurMask.cpp > > File src/effects/SkBlurMask.cpp (right): > > > > > https://codereview.appspot.com/7307076/diff/13002/src/effects/SkBlurMask.cpp#... > > src/effects/SkBlurMask.cpp:198: { > > To be honest the details of boxBlurInterp are a little opaque to me, since I > > didn't write it. Maybe we can open a new issue for Steven to document this > code > > a little? > > > > On 2013/02/12 15:14:35, reed1 wrote: > > > I know its not new to this CL, but it is not at all clear what math-space we > > are > > > dealing with in this function. I see byte >> 7 which is just adding the > > high-bit > > > for some reason, and byte << 8 / kernel, etc. Can this be described in a > short > > > block-comment somewhere? > > > > > https://codereview.appspot.com/7307076/diff/13002/src/effects/SkBlurMask.cpp#... > > src/effects/SkBlurMask.cpp:1161: static inline unsigned int profile_lookup( > > unsigned int *profile, int loc, int blurred_width, int sharp_width ) { > > Not sure -- I'll use the Skia one. > > > > On 2013/02/12 15:14:35, reed1 wrote: > > > is this abs() call getting inlined for sure? If not, we have SkAbs32() which > > > does. > > > > > https://codereview.appspot.com/7307076/diff/13002/src/effects/SkBlurMask.cpp#... > > src/effects/SkBlurMask.cpp:1163: int ox = dx >> 1; > > On 2013/02/12 15:14:35, reed1 wrote: > > > if (...) { > > > } > > > > Done. > > > > > https://codereview.appspot.com/7307076/diff/13002/src/effects/SkBlurMask.cpp#... > > src/effects/SkBlurMask.cpp:1238: for (int y = 0 ; y < dstHeight ; ++y) > > On 2013/02/12 15:14:35, reed1 wrote: > > > for (...) { > > > > Done. > > > > https://codereview.appspot.com/7307076/diff/13002/src/effects/SkBlurMask.h > > File src/effects/SkBlurMask.h (right): > > > > > https://codereview.appspot.com/7307076/diff/13002/src/effects/SkBlurMask.h#ne... > > src/effects/SkBlurMask.h:33: SkIPoint *margin = NULL); > > In this case it's the ground truth gaussian blur that I was using for > debugging. > > I don't think we should remove it from the interface because it's really > useful > > to have around should things go wonky in the future. I'll rename it to > > something like BlurGroundTruth or something > > > > On 2013/02/12 15:14:35, reed1 wrote: > > > // What does "Simple" mean here? Why no Quality param? Why is radius renamed > > to > > > provided_radius? > > > > > > In general, even though this header is "private", I think we should try to > > make > > > its interface clean, consistent, and somewhat clear/documented. (not just > > > limited to this CL).
Sign in to reply to this message.
You're also going to need to wrap the SkBlurMask changes in an #ifdef, so that we can safely rebaseline WebKit. https://codereview.appspot.com/7307076/diff/13002/src/effects/SkBlurMask.cpp File src/effects/SkBlurMask.cpp (right): https://codereview.appspot.com/7307076/diff/13002/src/effects/SkBlurMask.cpp#... src/effects/SkBlurMask.cpp:198: { On 2013/02/12 20:34:04, Humper wrote: > To be honest the details of boxBlurInterp are a little opaque to me, since I > didn't write it. This is the reason your changes are rather concerning to me: it feels like you've hacked it until it works, rather than providing conclusive evidence that the solutions are correct. > Maybe we can open a new issue for Steven to document this code > a little? > > On 2013/02/12 15:14:35, reed1 wrote: > > I know its not new to this CL, but it is not at all clear what math-space we > are > > dealing with in this function. I see byte >> 7 which is just adding the > high-bit > > for some reason, and byte << 8 / kernel, etc. Can this be described in a short > > block-comment somewhere? > Sure. At a minimum, I could provide a commented-out, non-unrolled version of the code broken out into the two cases (width < diameter, width >= diameter), as I did for the non-interp case, which might make it clearer. https://codereview.appspot.com/7307076/diff/13003/src/effects/SkBlurMask.cpp File src/effects/SkBlurMask.cpp (right): https://codereview.appspot.com/7307076/diff/13003/src/effects/SkBlurMask.cpp#... src/effects/SkBlurMask.cpp:32: static int boxBlur(const uint16_t* src, int src_y_stride, uint16_t* dst, I'm still not convinced that 16 bit is required here. If the box blur is not producing all-white on a given pass, it won't be all-white just by increasing the precision in the intermediate buffers (ie., in 8bit, between passes it's either 255 or it's not.) For N white pixels, the box blur ends up doing (N * 255) / N), which should always be 255 (up to overflow which would show up as a much worse problem!) However, the division is done with an approximation, and as you've pointed out with the rounding fix, may not result in 255. If rounding fixes that part, then it should be fine to use 8bit in the intermediate buffers. We could try substituting an actual integer divide (temporarily), just to verify that the division approximation is to blame.
Sign in to reply to this message.
Sign in to reply to this message.
Stephen, since you understand the current code better, do you have an alternate fix for the 255 issue that keeps the 8bit buffers? If so, perhaps we might land that first, and then greg could build on top of that. That could be nicer, in that we have two separate changes that only do one thing each.
Sign in to reply to this message.
On 2013/02/15 17:01:55, reed1 wrote: > Stephen, since you understand the current code better, do you have an alternate > fix for the 255 issue that keeps the 8bit buffers? If so, perhaps we might land > that first, and then greg could build on top of that. That could be nicer, in > that we have two separate changes that only do one thing each. OK, I'll give it a shot.
Sign in to reply to this message.
There seem to be a lot of changes in SkBlurMask.cpp that are just renamings (e.g. outer_scale -> outerScale). This adds a bit of labor for the reviewer. Are these useful? https://codereview.appspot.com/7307076/diff/21001/bench/BlurRectBench.cpp File bench/BlurRectBench.cpp (right): https://codereview.appspot.com/7307076/diff/21001/bench/BlurRectBench.cpp#new... bench/BlurRectBench.cpp:189: Wow, do we need/want to run 40 extra benches on every CL? Can we perform fewer and still (reasonably) catch future regressions? https://codereview.appspot.com/7307076/diff/21001/gm/blurrect.cpp File gm/blurrect.cpp (right): https://codereview.appspot.com/7307076/diff/21001/gm/blurrect.cpp#newcode311 gm/blurrect.cpp:311: Wow, that's a lot of new GMs. Do we need so many to catch regressions in the future? https://codereview.appspot.com/7307076/diff/21001/src/effects/SkBlurMask.cpp File src/effects/SkBlurMask.cpp (right): https://codereview.appspot.com/7307076/diff/21001/src/effects/SkBlurMask.cpp#... src/effects/SkBlurMask.cpp:533: uint32_t tmp = sum[px+py] + sum[nx+ny] - sum[nx+py] - sum[px+ny]; Do we not need to add half before the shift, as we've just recently added in other (similar?) places?
Sign in to reply to this message.
I'll pare down the tests / benches. As for the style changes, I made those in response to some feedback I got from Robert. I don't mind reverting them; was just trying to proactively converge to a single style, but perhaps those should be done as separate CLs. https://codereview.appspot.com/7307076/diff/21001/bench/BlurRectBench.cpp File bench/BlurRectBench.cpp (right): https://codereview.appspot.com/7307076/diff/21001/bench/BlurRectBench.cpp#new... bench/BlurRectBench.cpp:189: Whoops, I meant to pare these down; these were done for making those graphs. Will fix. On 2013/02/20 14:28:24, reed1 wrote: > Wow, do we need/want to run 40 extra benches on every CL? Can we perform fewer > and still (reasonably) catch future regressions? https://codereview.appspot.com/7307076/diff/21001/gm/blurrect.cpp File gm/blurrect.cpp (right): https://codereview.appspot.com/7307076/diff/21001/gm/blurrect.cpp#newcode311 gm/blurrect.cpp:311: No, we can pare these down. They were done this way for visual comparisons. I'll fix. On 2013/02/20 14:28:24, reed1 wrote: > Wow, that's a lot of new GMs. Do we need so many to catch regressions in the > future? https://codereview.appspot.com/7307076/diff/21001/src/effects/SkBlurMask.cpp File src/effects/SkBlurMask.cpp (right): https://codereview.appspot.com/7307076/diff/21001/src/effects/SkBlurMask.cpp#... src/effects/SkBlurMask.cpp:533: uint32_t tmp = sum[px+py] + sum[nx+ny] - sum[nx+py] - sum[px+ny]; We may very well - I didn't write / use this code :) On 2013/02/20 14:28:24, reed1 wrote: > Do we not need to add half before the shift, as we've just recently added in > other (similar?) places?
Sign in to reply to this message.
https://codereview.appspot.com/7307076/diff/21001/src/effects/SkBlurMask.cpp File src/effects/SkBlurMask.cpp (right): https://codereview.appspot.com/7307076/diff/21001/src/effects/SkBlurMask.cpp#... src/effects/SkBlurMask.cpp:533: uint32_t tmp = sum[px+py] + sum[nx+ny] - sum[nx+py] - sum[px+ny]; On 2013/02/20 14:33:35, Humper wrote: > We may very well - I didn't write / use this code :) > > On 2013/02/20 14:28:24, reed1 wrote: > > Do we not need to add half before the shift, as we've just recently added in > > other (similar?) places? > Ah. Perhaps we just add a // TODO comment asking that question, and leave it alone for this CL.
Sign in to reply to this message.
On 2013/02/20 14:40:56, reed1 wrote: > Ah. Perhaps we just add a // TODO comment asking that question, and leave it > alone for this CL. Should I also revert the style changes?
Sign in to reply to this message.
https://codereview.appspot.com/7307076/diff/21001/src/effects/SkBlurMask.cpp File src/effects/SkBlurMask.cpp (right): https://codereview.appspot.com/7307076/diff/21001/src/effects/SkBlurMask.cpp#... src/effects/SkBlurMask.cpp:533: uint32_t tmp = sum[px+py] + sum[nx+ny] - sum[nx+py] - sum[px+ny]; On 2013/02/20 14:40:56, reed1 wrote: > On 2013/02/20 14:33:35, Humper wrote: > > We may very well - I didn't write / use this code :) > > > > On 2013/02/20 14:28:24, reed1 wrote: > > > Do we not need to add half before the shift, as we've just recently added in > > > other (similar?) places? > > > > Ah. Perhaps we just add a // TODO comment asking that question, and leave it > alone for this CL. We could do the rounding here too, but this is the old (full summed area table, not separable) implementation, which isn't enabled anywhere anymore (skia or Chrome). I wanted to keep it around until the separable impl had shipped in stable and was proven not to be problematic, and then we could remove it.
Sign in to reply to this message.
On 2013/02/20 14:47:57, Humper wrote: > On 2013/02/20 14:40:56, reed1 wrote: > > Ah. Perhaps we just add a // TODO comment asking that question, and leave it > > alone for this CL. > > Should I also revert the style changes? ok, tests and GM's pared down
Sign in to reply to this message.
You have a note in your description about needing a guard for layoutests. Should you replace that with a doc about your IGNORE_CORRECTNESS flag? lgtm
Sign in to reply to this message.
Soon we should consider splitting some of this into a separate impl file (as we finally did for all the different backends for gradients). SkBlurRect.cpp is mighty in stature...
Sign in to reply to this message.
|