Hi Mike, here is a sketch of my work in progress. I am still experimenting ...
12 years, 5 months ago
(2012-06-22 14:12:18 UTC)
#1
Hi Mike, here is a sketch of my work in progress. I am still experimenting with
benchmarks and looking at profiling data to squeeze out more cycles...
http://codereview.appspot.com/6339046/diff/1/src/core/SkPictureFlat.h
File src/core/SkPictureFlat.h (right):
http://codereview.appspot.com/6339046/diff/1/src/core/SkPictureFlat.h#newcode159
src/core/SkPictureFlat.h:159: if (a->fHash > b->fHash) return 1;
FYI: I tried doing a single memcmp that covers fHash and the data (which are
contiguous in memory), but it was significantly slower. There seem to be a lot
of fixed cost overhead in memcmp, which makes it inefficient for comparing small
chunks (at least on windows). Eventually I think we should have our own fast
alternative to memcmp in skia.
I like the idea (assuming it pays off in your benchmarks. I also want to ...
12 years, 5 months ago
(2012-06-22 15:02:46 UTC)
#2
I like the idea (assuming it pays off in your benchmarks. I also want to try
adding a hash-table infront of this, but I'd like to do that as a separate CL.
SkHash.h
1. Do we need to slow down the hash function by checking for
non-sizeof(HashType) multiples in the size. Seems like we should at least have a
version that asserts that invariant, and doesn't check at runtime (since our
callers at the moment all fall into that category.
2. Do we need the 64bit version of this?
3. Nit: I'd prefer a slightly more descriptive name, like SkComputeHash32, over
just SkHash32.
4. I'm sure your hash function is better than the one I used in FontDescriptor,
but it is also slower. Is the better one needed? If so, if it doesn't speed up
fonts, perhaps you can have ComputeHashFast and ComputeHashGood or some choice
in the API.
Regarding performance: I am geting about 30% speedup on the unique SkPaint test, and about ...
12 years, 5 months ago
(2012-06-22 15:19:51 UTC)
#4
Regarding performance: I am geting about 30% speedup on the unique SkPaint test,
and about 5% speedup for the test that uses 100 recurring SkPaints
On 2012/06/22 15:02:46, reed1 wrote:
> I like the idea (assuming it pays off in your benchmarks. I also want to try
> adding a hash-table infront of this, but I'd like to do that as a separate CL.
>
> SkHash.h
>
> 1. Do we need to slow down the hash function by checking for
> non-sizeof(HashType) multiples in the size. Seems like we should at least have
a
> version that asserts that invariant, and doesn't check at runtime (since our
> callers at the moment all fall into that category.
Will do.
> 2. Do we need the 64bit version of this?
That was for experimenting, I found that the 64-bit version computes slightly
faster. I think int the end I will uses the 64-bit method, but compress the
results into 32 by XORing the high and low halves together.
>
> 4. I'm sure your hash function is better than the one I used in
FontDescriptor,
> but it is also slower. Is the better one needed? If so, if it doesn't speed up
> fonts, perhaps you can have ComputeHashFast and ComputeHashGood or some choice
> in the API.
Actually my hash function is pretty much the same as the one that was in
SkDescriptor, with added handling for non-multiple sizes. For stronger hashing
(if we ever felt we needed it), we could add a DEPS on
http://code.google.com/p/cityhash/
On 2012/06/22 15:07:59, reed1 wrote: > Is Checksum a better name for this functionality than ...
12 years, 5 months ago
(2012-06-22 15:24:20 UTC)
#5
On 2012/06/22 15:07:59, reed1 wrote:
> Is Checksum a better name for this functionality than Hash?
Hash is more general. Checksum implies that it is used for data integrity
checking, and usually exhibit certain properties that allow to detect the type
of data error (e.g. number of erroneous bits).
Ah right. It looked more complicated, but I see it is close to the same ...
12 years, 5 months ago
(2012-06-22 15:26:20 UTC)
#6
Ah right. It looked more complicated, but I see it is close to the same (an
extra ^, needed?)
63bit: cool that its faster. We should assert that the passed-in address is
already 8-byte aligned... (same for the 32bit one)... or better yet (?) -- why
can't the template function take const HashType* instead of const void*?
(assuming cpus require 64bi alignment for 64bit reads...)
http://codereview.appspot.com/6339046/diff/1/include/core/SkHash.h
File include/core/SkHash.h (right):
http://codereview.appspot.com/6339046/diff/1/include/core/SkHash.h#newcode14
include/core/SkHash.h:14: template<typename HashType>
do we need const& for these, since HashType should always be a POD?
this does an extra ^ from the previous. Is that needed?
So I squeezed a lot of cyle out of this one... Performance evolved quite a ...
12 years, 5 months ago
(2012-06-22 20:58:16 UTC)
#8
So I squeezed a lot of cyle out of this one... Performance evolved quite a bit
since comment #4.
With the code in Patch Set 3, I am registering a 6x speedup for
picture_record_unique_paint_dictionary in bench.exe!
About Patch Set 4: I found a way to boost performance yet a bit more ...
12 years, 5 months ago
(2012-06-22 21:23:31 UTC)
#9
About Patch Set 4:
I found a way to boost performance yet a bit more by changing the Checksum to 64
bit in SkFlatData. Even though the benchmarks show that the current
implementation of SkComputeChecksum32 and SkComputeChecksum64 have identical
performance, it turns out that promoting fChecksum to uint64_t in SkPictureFlat
will nudge the alignment of the SkPictureFlat::data() to make it 128-bit
aligned, which ends up greatly boosting the performance of the memcpy that fills
in the data.
Great news on the performance. I'd still like to have the interface/behavior of the 64bit ...
12 years, 5 months ago
(2012-06-25 12:38:05 UTC)
#11
Great news on the performance.
I'd still like to have the interface/behavior of the 64bit checkum guy better
defined. Can we require that a compute routine that takes uint64 require that
its length be a multiple of 8? It seems that we're just asking the caller
(SkFlatData at the moment) to just pad its data to a mult of 8 (1 extra long
half the time).
What machine(s) did you run the bench on? I found pretty different 64bit
performance between linux (skia built for 64bit) and mac (skia built for 32bit).
http://codereview.appspot.com/6339046/diff/11001/include/core/SkChecksum.h
File include/core/SkChecksum.h (right):
http://codereview.appspot.com/6339046/diff/11001/include/core/SkChecksum.h#ne...
include/core/SkChecksum.h:61: // Handle trailing 32-bit chunk
:( Why is SkComputeChecksum64 accepting non-64bit-multiples in its size?
http://codereview.appspot.com/6339046/diff/11001/include/core/SkChecksum.h#ne...
include/core/SkChecksum.h:94: result ^= result >> 32;
why are we only returning the low 16bits?
On Mon, Jun 25, 2012 at 8:38 AM, <reed@google.com> wrote: > Great news on the ...
12 years, 5 months ago
(2012-06-26 13:30:21 UTC)
#12
On Mon, Jun 25, 2012 at 8:38 AM, <reed@google.com> wrote:
> Great news on the performance.
>
> I'd still like to have the interface/behavior of the 64bit checkum guy
> better defined. Can we require that a compute routine that takes uint64
> require that its length be a multiple of 8? It seems that we're just
> asking the caller (SkFlatData at the moment) to just pad its data to a
> mult of 8 (1 extra long half the time).
>
I agree, let's do that
>
> What machine(s) did you run the bench on? I found pretty different 64bit
> performance between linux (skia built for 64bit) and mac (skia built for
> 32bit).
>
I tested win32 builds on a Xeon E5520, with 64-bit windows. I was a little
surprised that the 64-bit version of the checksum calculation was fastest
even with in win32.
>
>
> http://codereview.appspot.com/**6339046/diff/11001/include/**
>
core/SkChecksum.h<http://codereview.appspot.com/6339046/diff/11001/include/core/SkChecksum.h>
> File include/core/SkChecksum.h (right):
>
> http://codereview.appspot.com/**6339046/diff/11001/include/**
>
core/SkChecksum.h#newcode61<http://codereview.appspot.com/6339046/diff/11001/include/core/SkChecksum.h#newcode61>
> include/core/SkChecksum.h:61: // Handle trailing 32-bit chunk
> :( Why is SkComputeChecksum64 accepting non-64bit-multiples in its size?
>
That was to allow it to be a drop-in replacement for a 32-bit checksum.
>
> http://codereview.appspot.com/**6339046/diff/11001/include/**
>
core/SkChecksum.h#newcode94<http://codereview.appspot.com/6339046/diff/11001/include/core/SkChecksum.h#newcode94>
> include/core/SkChecksum.h:94: result ^= result >> 32;
> why are we only returning the low 16bits?
>
Actually, what I was trying to do here is to the faster 64-bit checksum
into a 32-bit checksum API. That line XORs the high and low 32-bit parts of
a 64-bit result in order to produce a 32-bit checksum.
>
>
http://codereview.appspot.com/**6339046/<http://codereview.appspot.com/6339046/>
>
I think I will remove SK_PREFER_32BIT_CHECKSUM from SkChecksum, and just
put it at the call site instead. I think the code will be much clearer that
way, and it will allow me to get rid of some weird stuff.
Oh yeah, that was a mistake, I meant 00000000FFFFFFFFULL. But I will take that out, ...
12 years, 5 months ago
(2012-06-26 13:35:57 UTC)
#14
Oh yeah, that was a mistake, I meant 00000000FFFFFFFFULL. But I will take that
out, it's ugly. I will leave just pure 32-bit and 64-bit implementation.
http://codereview.appspot.com/6339046/diff/4004/include/config/SkUserConfig.h#newcode178 > include/config/SkUserConfig.h:178: */ > Which platforms will enable this? More global build flags are ...
12 years, 5 months ago
(2012-06-27 15:53:38 UTC)
#18
http://codereview.appspot.com/6339046/diff/4004/include/config/SkUserConfig.h...
> include/config/SkUserConfig.h:178: */
> Which platforms will enable this? More global build flags are yucky unless
> really needed.
I was thinking probably android, but I don't know how to test it locally. It
would be useful if we could get the RunBench buildbot step to print out
benchmark times.
>
> Since we use #if instead of #ifdef in the code, shouldn't we (somewhere) do
the
> following?
>
> #ifndef SK_PREFER...
> #define SK_PREFER... 0
> #endif
>
It's there, at the top of SkChecksum.h Is there a better place for it?
I think, since 1. we do the #ifndef check 2. its experiemental at the moment ...
12 years, 5 months ago
(2012-06-27 16:03:08 UTC)
#19
I think, since
1. we do the #ifndef check
2. its experiemental at the moment
we can move the flag completely into SkChecksum.h, and not edit SkUserConfig.h
at all.
On 2012/06/27 16:03:30, reed1 wrote: > and then remove it if we end up not ...
12 years, 5 months ago
(2012-06-27 17:53:15 UTC)
#21
On 2012/06/27 16:03:30, reed1 wrote:
> and then remove it if we end up not needing it on android
Done.
Also, I had to make minor changes to SkChunkAlloc to make it consistently spew
8-byte aligned allocations, in order to respect new alignment checks.
lgtm I wonder if we can make the SkChunkAlloc change cleaner. Seems like (other than ...
12 years, 5 months ago
(2012-06-27 17:59:30 UTC)
#22
lgtm
I wonder if we can make the SkChunkAlloc change cleaner. Seems like (other than
align8(size), we just want to ensure the sizeof(Block) is a multiple of 8. Can
we change one of its fields to be a union w/ uint64_t to achieve that?
On 2012/06/27 17:59:30, reed1 wrote: > lgtm > > I wonder if we can make ...
12 years, 5 months ago
(2012-06-27 18:31:56 UTC)
#24
On 2012/06/27 17:59:30, reed1 wrote:
> lgtm
>
> I wonder if we can make the SkChunkAlloc change cleaner. Seems like (other
than
> align8(size), we just want to ensure the sizeof(Block) is a multiple of 8. Can
> we change one of its fields to be a union w/ uint64_t to achieve that?
It is trickier than that because not all platforms force int64 to be 64-bit
aligned. (e.g. win32).
Here is a portable solution, but I am not sure I like it:
struct SkChunkAlloc::Block {
struct Data {
Block* fNext;
size_t fFreeSize;
char* fFreePtr;
};
union {
Data fData;
int64_t fPadding[(sizeof(Data) + 7) >> 3];
};
Issue 6339046: Adding checksum to SkFlatData to accelerate SkPicture recording
(Closed)
Created 12 years, 5 months ago by junov1
Modified 12 years, 5 months ago
Reviewers: reed1
Base URL: http://skia.googlecode.com/svn/trunk/
Comments: 14