Issue 7058077: Adding SSE2 Matrix-Matrix multiply to SetConcat (when in X64 mode).

Created:
12 years, 6 months ago by whunt

Modified:
12 years, 2 months ago

Reviewers:
jamesr, reed1, whunt1, TomH

CC:
skia-review_googlegroups.com, mike3, reed1

Base URL:
http://skia.googlecode.com/svn/trunk/

Visibility:
Public.

Description

Adding SSE2 Matrix-Matrix multiply to SetConcat (when in X64 mode). Branch occurs at compile time rather than run time. Performance goes from 3.25 units to 1.87 units.
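For readers without the diff at hand, here is a minimal sketch of the idea in the description, not the patch itself. It assumes double-precision elements stored row-major in a plain `double[16]`; the names `EXAMPLE_USE_SSE2` and `example_concat` are hypothetical, and the storage layout and build flag in SkMatrix44 may differ. The choice between the SSE2 path and the scalar path is made by the preprocessor, so no branch remains at run time.

```cpp
#include <cassert>

// Hypothetical build flag: on 64-bit x86, SSE2 is part of the baseline
// instruction set, so it can be assumed at compile time.
#if defined(__x86_64__) || defined(_M_X64)
#include <emmintrin.h>
#define EXAMPLE_USE_SSE2 1
#else
#define EXAMPLE_USE_SSE2 0
#endif

// Computes out = a * b for 4x4 matrices stored row-major as double[16].
// `out` must not alias `a` or `b`.
void example_concat(const double a[16], const double b[16], double out[16]) {
    assert(out != a && out != b);
#if EXAMPLE_USE_SSE2
    // Each output row is built as two 2-wide halves (columns 0-1 and 2-3).
    for (int i = 0; i < 4; ++i) {
        __m128d lo = _mm_setzero_pd();
        __m128d hi = _mm_setzero_pd();
        for (int k = 0; k < 4; ++k) {
            // Broadcast a[i][k] and scale row k of b by it.
            const __m128d s = _mm_set1_pd(a[4 * i + k]);
            lo = _mm_add_pd(lo, _mm_mul_pd(s, _mm_loadu_pd(b + 4 * k)));
            hi = _mm_add_pd(hi, _mm_mul_pd(s, _mm_loadu_pd(b + 4 * k + 2)));
        }
        _mm_storeu_pd(out + 4 * i, lo);
        _mm_storeu_pd(out + 4 * i + 2, hi);
    }
#else
    // Portable scalar fallback with the same summation order.
    for (int i = 0; i < 4; ++i) {
        for (int j = 0; j < 4; ++j) {
            double sum = 0.0;
            for (int k = 0; k < 4; ++k) {
                sum += a[4 * i + k] * b[4 * k + j];
            }
            out[4 * i + j] = sum;
        }
    }
#endif
}
```

The sketch uses unaligned loads and stores so that it makes no assumption about how the caller allocated the matrices; aligned variants would require 16-byte-aligned storage.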

Patch Set 6

Created: 12 years, 6 months ago

Stats: +107 lines, -4 lines

M  bench/Matrix44Bench.cpp     2 chunks  +23 lines, -0 lines  0 comments
M  include/utils/SkMatrix44.h  3 chunks  +15 lines, -2 lines  0 comments
M  src/utils/SkMatrix44.cpp    3 chunks  +69 lines, -2 lines  0 comments

Messages

Total messages: 29

whunt

https://codereview.appspot.com/7058077/diff/1/bench/Matrix44Bench.cpp File bench/Matrix44Bench.cpp (right): https://codereview.appspot.com/7058077/diff/1/bench/Matrix44Bench.cpp#newcode133 bench/Matrix44Bench.cpp:133: fM1.set(2, 0, 3.0f); Added these to make the benchmark ...

12 years, 6 months ago (2013-01-10 19:19:29 UTC) #1

reed1

1. do we already have adequate coverage in unittests to feel that we're getting the ...

12 years, 6 months ago (2013-01-10 19:28:03 UTC) #2

reed1

https://codereview.appspot.com/7058077/diff/1/src/utils/SkMatrix44.cpp File src/utils/SkMatrix44.cpp (right): https://codereview.appspot.com/7058077/diff/1/src/utils/SkMatrix44.cpp#newcode438 src/utils/SkMatrix44.cpp:438: p->w_zw = b0 * x_zw + b1 * y_zw ...

12 years, 6 months ago (2013-01-10 19:30:21 UTC) #3

whunt

On 2013/01/10 19:28:03, reed1 wrote: > 1. do we already have adequate coverage in unittests ...

12 years, 6 months ago (2013-01-10 22:11:31 UTC) #4

whunt

On 2013/01/10 19:30:21, reed1 wrote: > https://codereview.appspot.com/7058077/diff/1/src/utils/SkMatrix44.cpp > File src/utils/SkMatrix44.cpp (right): > > https://codereview.appspot.com/7058077/diff/1/src/utils/SkMatrix44.cpp#newcode438 > ...

12 years, 6 months ago (2013-01-10 22:14:41 UTC) #5

whunt

https://codereview.appspot.com/7058077/diff/6001/src/utils/SkMatrix44.cpp File src/utils/SkMatrix44.cpp (right): https://codereview.appspot.com/7058077/diff/6001/src/utils/SkMatrix44.cpp#newcode395 src/utils/SkMatrix44.cpp:395: sk_bzero(result, sizeof(storage)); This change *over doubled* the performance of ...

12 years, 6 months ago (2013-01-10 22:18:03 UTC) #6

reed1

On 2013/01/10 22:14:41, whunt wrote: > On 2013/01/10 19:30:21, reed1 wrote: > > https://codereview.appspot.com/7058077/diff/1/src/utils/SkMatrix44.cpp > ...

12 years, 6 months ago (2013-01-10 22:20:23 UTC) #7

On 2013/01/10 22:14:41, whunt wrote:
> On 2013/01/10 19:30:21, reed1 wrote:
> > https://codereview.appspot.com/7058077/diff/1/src/utils/SkMatrix44.cpp
> > File src/utils/SkMatrix44.cpp (right):
> >
> > https://codereview.appspot.com/7058077/diff/1/src/utils/SkMatrix44.cpp#newcod...
> > src/utils/SkMatrix44.cpp:438: p->w_zw = b0 * x_zw + b1 * y_zw + b2 * z_zw + b3 * w_zw;
> > need { } around this "if"
> >
> > https://codereview.appspot.com/7058077/diff/1/src/utils/SkMatrix44.cpp#newcod...
> > src/utils/SkMatrix44.cpp:440: memcpy(fMat, &storageSSE2, sizeof(storageSSE2));
> > Can we cleanly refactor so we don't have to repeat this dirty+return code?
> >
> > Related, I wonder if its acceptable to jump to this (or the old C version) via a function ptr. If so, ...
> > 1. it would be easier to read
> > 2. it would allow us to access the SSE2 code on 32bit machines where we don't know at compile-time that we have SSE2, but we do detect that at runtime. Our blitter code does a *lot* of this sort of thing.
>
> We could certainly do a jump.  This would however add a performance penalty to all calls to setconcat in all cases.  Personally I find matrix math to be a primitive and therefore should be fast, but SkMatrix44 is neither primitive nor fast so in this case I really don't have an opinion.  It's less work for us if we keep it the way it is and it doesn't impose a penalty.
>
> Note:  I'm doing a study to see what % of machines have SSE2.  (Intel started making SSE2 machines in 2000, so it's been 13 years).  If the chrome base is anything like the steam hardware survey base it'll be over 99.5% and I'll petition to cut the tail and just assume SSE2 always.

This is just a product issue for Chrome on Windows. I'd be very happy if we can
convince them to let us set SSE2 at compile-time, but today that is not the
case.
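
The function-pointer alternative raised in this exchange can be sketched roughly as below. This is an illustration under stated assumptions, not Skia's blitter mechanism: all names (`ConcatProc`, `concat_portable`, `concat_sse2`, `example_cpu_has_sse2`, `example_set_concat`) are hypothetical, matrices are plain row-major `double[16]`, and runtime detection uses the GCC/Clang builtin `__builtin_cpu_supports` on 32-bit x86 only. The indirect call on every multiply is the per-call cost whunt refers to.

```cpp
#include <cassert>

// Signature shared by every implementation of the 4x4 multiply.
using ConcatProc = void (*)(const double a[16], const double b[16], double out[16]);

// Portable scalar version, always available.
static void concat_portable(const double a[16], const double b[16], double out[16]) {
    for (int i = 0; i < 4; ++i) {
        for (int j = 0; j < 4; ++j) {
            double sum = 0.0;
            for (int k = 0; k < 4; ++k) sum += a[4 * i + k] * b[4 * k + j];
            out[4 * i + j] = sum;
        }
    }
}

// The SSE2 version is only compiled when the compiler may emit SSE2 code.
#if defined(__SSE2__) || defined(__x86_64__) || defined(_M_X64)
#include <emmintrin.h>
#define EXAMPLE_SSE2_COMPILED 1
static void concat_sse2(const double a[16], const double b[16], double out[16]) {
    for (int i = 0; i < 4; ++i) {
        __m128d lo = _mm_setzero_pd();
        __m128d hi = _mm_setzero_pd();
        for (int k = 0; k < 4; ++k) {
            const __m128d s = _mm_set1_pd(a[4 * i + k]);
            lo = _mm_add_pd(lo, _mm_mul_pd(s, _mm_loadu_pd(b + 4 * k)));
            hi = _mm_add_pd(hi, _mm_mul_pd(s, _mm_loadu_pd(b + 4 * k + 2)));
        }
        _mm_storeu_pd(out + 4 * i, lo);
        _mm_storeu_pd(out + 4 * i + 2, hi);
    }
}
#else
#define EXAMPLE_SSE2_COMPILED 0
#endif

// Hypothetical runtime check; other toolchains would need their own query.
bool example_cpu_has_sse2() {
#if defined(__x86_64__) || defined(_M_X64)
    return true;  // SSE2 is part of the x86-64 baseline.
#elif defined(__GNUC__) && defined(__i386__)
    return __builtin_cpu_supports("sse2") != 0;
#else
    return false;
#endif
}

// Picks the implementation once, based on what the running CPU supports.
ConcatProc choose_concat_proc() {
#if EXAMPLE_SSE2_COMPILED
    if (example_cpu_has_sse2()) return concat_sse2;
#endif
    return concat_portable;
}

// Every call goes through the pointer chosen on first use.
void example_set_concat(const double a[16], const double b[16], double out[16]) {
    assert(out != a && out != b);
    static const ConcatProc proc = choose_concat_proc();
    proc(a, b, out);
}
```

On a 32-bit build, `concat_sse2` would typically live in its own translation unit compiled with SSE2 enabled, so that the rest of the library still runs on CPUs without it.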

jamesr

On 2013/01/10 22:20:23, reed1 wrote: > On 2013/01/10 22:14:41, whunt wrote: > > On 2013/01/10 ...

12 years, 6 months ago (2013-01-10 22:23:29 UTC) #9

On 2013/01/10 22:20:23, reed1 wrote:
> On 2013/01/10 22:14:41, whunt wrote:
> > On 2013/01/10 19:30:21, reed1 wrote:
> > > https://codereview.appspot.com/7058077/diff/1/src/utils/SkMatrix44.cpp
> > > File src/utils/SkMatrix44.cpp (right):
> > >
> > > https://codereview.appspot.com/7058077/diff/1/src/utils/SkMatrix44.cpp#newcod...
> > > src/utils/SkMatrix44.cpp:438: p->w_zw = b0 * x_zw + b1 * y_zw + b2 * z_zw + b3 * w_zw;
> > > need { } around this "if"
> > >
> > > https://codereview.appspot.com/7058077/diff/1/src/utils/SkMatrix44.cpp#newcod...
> > > src/utils/SkMatrix44.cpp:440: memcpy(fMat, &storageSSE2, sizeof(storageSSE2));
> > > Can we cleanly refactor so we don't have to repeat this dirty+return code?
> > >
> > > Related, I wonder if its acceptable to jump to this (or the old C version) via a function ptr. If so, ...
> > > 1. it would be easier to read
> > > 2. it would allow us to access the SSE2 code on 32bit machines where we don't know at compile-time that we have SSE2, but we do detect that at runtime. Our blitter code does a *lot* of this sort of thing.
> >
> > We could certainly do a jump.  This would however add a performance penalty to all calls to setconcat in all cases.  Personally I find matrix math to be a primitive and therefore should be fast, but SkMatrix44 is neither primitive nor fast so in this case I really don't have an opinion.  It's less work for us if we keep it the way it is and it doesn't impose a penalty.
> >
> > Note:  I'm doing a study to see what % of machines have SSE2.  (Intel started making SSE2 machines in 2000, so it's been 13 years).  If the chrome base is anything like the steam hardware survey base it'll be over 99.5% and I'll petition to cut the tail and just assume SSE2 always.
>
> This is just a product issue for Chrome on Windows. I'd be very happy if we can convince them to let us set SSE2 at compile-time, but today that is not the case.

You can depend on SSE2 at compile time on x64 always for Chrome on anything
(it's tautologically there)

reed1

I presume you meant 16 bytes, not bits. Doesn't that mean it also matters how ...

12 years, 6 months ago (2013-01-10 22:23:42 UTC) #10

reed1

We will need to turn the static_assert into an #ifdef, since we can build SkMatrix44 ...

12 years, 6 months ago (2013-01-10 22:25:51 UTC) #11

whunt

On 2013/01/10 22:23:29, jamesr wrote: > On 2013/01/10 22:20:23, reed1 wrote: > > On 2013/01/10 ...

12 years, 6 months ago (2013-01-10 22:26:04 UTC) #12

On 2013/01/10 22:23:29, jamesr wrote:
> On 2013/01/10 22:20:23, reed1 wrote:
> > On 2013/01/10 22:14:41, whunt wrote:
> > > On 2013/01/10 19:30:21, reed1 wrote:
> > > > https://codereview.appspot.com/7058077/diff/1/src/utils/SkMatrix44.cpp
> > > > File src/utils/SkMatrix44.cpp (right):
> > > >
> > > > https://codereview.appspot.com/7058077/diff/1/src/utils/SkMatrix44.cpp#newcod...
> > > > src/utils/SkMatrix44.cpp:438: p->w_zw = b0 * x_zw + b1 * y_zw + b2 * z_zw + b3 * w_zw;
> > > > need { } around this "if"
> > > >
> > > > https://codereview.appspot.com/7058077/diff/1/src/utils/SkMatrix44.cpp#newcod...
> > > > src/utils/SkMatrix44.cpp:440: memcpy(fMat, &storageSSE2, sizeof(storageSSE2));
> > > > Can we cleanly refactor so we don't have to repeat this dirty+return code?
> > > >
> > > > Related, I wonder if its acceptable to jump to this (or the old C version) via a function ptr. If so, ...
> > > > 1. it would be easier to read
> > > > 2. it would allow us to access the SSE2 code on 32bit machines where we don't know at compile-time that we have SSE2, but we do detect that at runtime. Our blitter code does a *lot* of this sort of thing.
> > >
> > > We could certainly do a jump.  This would however add a performance penalty to all calls to setconcat in all cases.  Personally I find matrix math to be a primitive and therefore should be fast, but SkMatrix44 is neither primitive nor fast so in this case I really don't have an opinion.  It's less work for us if we keep it the way it is and it doesn't impose a penalty.
> > >
> > > Note:  I'm doing a study to see what % of machines have SSE2.  (Intel started making SSE2 machines in 2000, so it's been 13 years).  If the chrome base is anything like the steam hardware survey base it'll be over 99.5% and I'll petition to cut the tail and just assume SSE2 always.
> >
> > This is just a product issue for Chrome on Windows. I'd be very happy if we can convince them to let us set SSE2 at compile-time, but today that is not the case.
>
> You can depend on SSE2 at compile time on x64 always for Chrome on anything (it's tautologically there)

Yep, I understand we can't make that assumption yet.  I'm hoping that we can
sometime in the next year or two though.  First step is to collect data showing
that only a very small fraction of users doesn't have SSE2 so I'm adding an ISA
histogram.  SSE2 is defined in the AMD64 spec so yes it's always available in 64
bit mode on all intel/amd platforms.
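
The point that 64-bit x86 builds can rely on SSE2 while 32-bit builds cannot (unless the compiler is told otherwise) can be expressed as a preprocessor test along these lines. The macro `EXAMPLE_SSE2_AT_COMPILE_TIME` and the helper function are hypothetical names; the predefined macros tested are the ones GCC, Clang, and MSVC are known to provide, and the flag actually used by the patch is not shown in this thread.

```cpp
// 64-bit x86: SSE2 is part of the baseline instruction set (AMD64).
#if defined(__x86_64__) || defined(_M_X64)
#define EXAMPLE_SSE2_AT_COMPILE_TIME 1
// 32-bit x86 built with SSE2 explicitly enabled, e.g. -msse2 on GCC/Clang
// (defines __SSE2__) or /arch:SSE2 on MSVC (sets _M_IX86_FP to 2).
#elif defined(__SSE2__) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#define EXAMPLE_SSE2_AT_COMPILE_TIME 1
// Anything else: SSE2 availability is not known when compiling.
#else
#define EXAMPLE_SSE2_AT_COMPILE_TIME 0
#endif

// Lets ordinary code query the decision without more preprocessor tests.
constexpr bool example_sse2_known_at_compile_time() {
    return EXAMPLE_SSE2_AT_COMPILE_TIME != 0;
}
```

A 32-bit Chrome on Windows build of that era would fall into the last branch, which is the case reed1 describes as needing runtime detection.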

jamesr

On 2013/01/10 22:23:42, reed1 wrote: > I presume you meant 16 bytes, not bits. > ...

12 years, 6 months ago (2013-01-10 22:26:57 UTC) #13

whunt

On 2013/01/10 22:23:42, reed1 wrote: > I presume you meant 16 bytes, not bits. > ...

12 years, 6 months ago (2013-01-10 22:27:58 UTC) #14

whunt

On 2013/01/10 22:25:51, reed1 wrote: > We will need to turn the static_assert into an ...

12 years, 6 months ago (2013-01-10 22:28:28 UTC) #15

whunt

On 2013/01/10 22:25:51, reed1 wrote: > We will need to turn the static_assert into an ...

12 years, 6 months ago (2013-01-10 22:29:45 UTC) #16

TomH

On 2013/01/10 22:26:04, whunt wrote: > On 2013/01/10 22:23:29, jamesr wrote: > > You can ...

12 years, 6 months ago (2013-01-11 10:08:47 UTC) #17

whunt1

All Intel Atom (used in mobile Intel devices) processors support SSE, SSE2, SSE3 and SSSE3 ...

12 years, 6 months ago (2013-01-11 18:14:01 UTC) #19

whunt

On 2013/01/11 18:14:01, whunt_google.com wrote: > All Intel Atom (used in mobile Intel devices) processors ...

12 years, 6 months ago (2013-01-11 18:52:40 UTC) #20

TomH

On 2013/01/11 18:52:40, whunt wrote: > For skia, what's the process to go from LGTM ...

12 years, 6 months ago (2013-01-14 09:57:19 UTC) #21

reed1

https://codereview.appspot.com/7058077/diff/10001/include/utils/SkMatrix44.h File include/utils/SkMatrix44.h (right): https://codereview.appspot.com/7058077/diff/10001/include/utils/SkMatrix44.h#newcode60 include/utils/SkMatrix44.h:60: Let's consider consolidating the build flag tests into a ...

12 years, 6 months ago (2013-01-14 14:53:00 UTC) #22

whunt1

On 2013/01/14 14:53:00, reed1 wrote: > https://codereview.appspot.com/7058077/diff/10001/include/utils/SkMatrix44.h > File include/utils/SkMatrix44.h (right): > > https://codereview.appspot.com/7058077/diff/10001/include/utils/SkMatrix44.h#newcode60 > ...

12 years, 6 months ago (2013-01-14 18:17:01 UTC) #23

reed1

lgtm we have other examples where we name the test more explicitly (e.g. _scaletrans instead ...

12 years, 6 months ago (2013-01-14 18:21:36 UTC) #24

TomH

Unfortunately, when I run out/Debug/tests on Linux with this patch installed, I get: [42/85] Matrix44... ...

12 years, 6 months ago (2013-01-15 11:47:22 UTC) #25

reed1

On 2013/01/15 11:47:22, TomH wrote: > Unfortunately, when I run out/Debug/tests on Linux with this ...

12 years, 6 months ago (2013-01-15 13:17:47 UTC) #26

TomH

Mike reduced the stringency of our size checks, which avoided the assertion failure. Landed as ...

12 years, 6 months ago (2013-01-17 12:18:41 UTC) #27

TomH

Nope, can't close it. Worked for me in Ubuntu Debug, but tests fail on most ...

12 years, 6 months ago (2013-01-17 13:34:16 UTC) #28
