Created: 11 years, 11 months ago by epoger
Modified: 11 years, 11 months ago
CC: skia-review_googlegroups.com, junov, bungeman, edisonn
Base URL: http://skia.googlecode.com/svn/trunk/
Visibility: Public.
Description: gm: use new Expectations struct instead of comparison bitmaps
Committed: https://code.google.com/p/skia/source/detail?r=7198
Patch Set 1
Total comments: 14

Patch Set 2
Total comments: 4

Patch Set 3: sync_to_head
Messages
Total messages: 7
This will allow us to compare against more complex expectations than one bitmap per test (multiple allowed checksums, whether to ignore failures per-test, etc.). For now, though, it still only compares against a single bitmap per test, read from disk. But adding additional implementations of ExpectationsSource will be easy after this CL...

context: https://goto.google.com/ImprovingTheSkiaRebaseliningProcess

https://codereview.appspot.com/7088044/diff/1/gm/gm_expectations.h
File gm/gm_expectations.h (right):

https://codereview.appspot.com/7088044/diff/1/gm/gm_expectations.h#newcode20
gm/gm_expectations.h:20: // The actual type we use to represent a checksum is hidden in here.
(in case we want to switch it to be an MD5 hash string instead of a uint64_t, for example)

https://codereview.appspot.com/7088044/diff/1/gm/gm_expectations.h#newcode26
gm/gm_expectations.h:26: static SkString make_filename(const char path[],
moved from gmmain.cpp

https://codereview.appspot.com/7088044/diff/1/gm/gm_expectations.h#newcode80
gm/gm_expectations.h:80: class IndividualImageExpectationsSource : public ExpectationsSource {
and coming soon: class JsonExpectationsSource : public ExpectationsSource

https://codereview.appspot.com/7088044/diff/1/gm/gmmain.cpp
File gm/gmmain.cpp (left):

https://codereview.appspot.com/7088044/diff/1/gm/gmmain.cpp#oldcode569
gm/gmmain.cpp:569: // both NULL (and thus no images are read from or written to disk).
This is why I split this method into two:

compare_test_results_to_stored_expectations(), similar to handle_test_results() for...
- non-NULL readPath
- blank renderModeDescriptor
- NULL referenceBitmap

compare_test_results_to_reference_bitmap(), similar to handle_test_results() for...
- NULL readPath
- non-blank renderModeDescriptor
- non-NULL referenceBitmap

https://codereview.appspot.com/7088044/diff/1/gm/gmmain.cpp
File gm/gmmain.cpp (right):

https://codereview.appspot.com/7088044/diff/1/gm/gmmain.cpp#newcode467
gm/gmmain.cpp:467: * JSON summary.
We may wish to change that. Setting the default to TRUE would make the JSON summary a whole lot bigger, which would be a mixed blessing. Pro: we would see exactly which drawing methods (RTree, tiled, etc.) ran for each test, and could verify that they all generated the same checksums. (Multiple drawing methods are already verified now, but currently you can't tell which comparisons actually occurred by looking at the JSON summary or stdout.) Con: bigger file, more noise.

https://codereview.appspot.com/7088044/diff/1/gm/gmmain.cpp#newcode477
gm/gmmain.cpp:477: const char* completeName = completeNameString.c_str();
for example:
baseNameString = "8888/dashing2" or "comparison/dashing2"
completeNameString = "comparison/dashing2-pipe" or "comparison/dashing2-replay"

https://codereview.appspot.com/7088044/diff/1/gm/tests/outputs/compared-again...
File gm/tests/outputs/compared-against-empty-dir/output-expected/json-summary.txt (right):

https://codereview.appspot.com/7088044/diff/1/gm/tests/outputs/compared-again...
gm/tests/outputs/compared-against-empty-dir/output-expected/json-summary.txt:15: "ignore-failure" : false
Changed this to be more consistent with the existing stdout, which yields:

reading from gm/tests/inputs/empty-dir
writing to gm/tests/outputs/compared-against-empty-dir/output-actual/images
drawing... dashing2 [640 480]
FAILED to read gm/tests/inputs/empty-dir/8888/dashing2.png
Ran 1 tests: 0 passed, 0 failed, 1 missing reference images

(I figured it made more sense for this to show up in "no-comparison" rather than "failure-ignored".)

In the long run, if we want to keep both the stdout summary and the JSON report, maybe we should make the result types the same in both. Right now they are:

JSON:
- failed
- failure-ignored
- no-comparison
- succeeded

stdout:
- passed
- failed
- missing reference images
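As a rough illustration of the data structures this CL describes, the Expectations value (one or more allowed checksums plus an ignore-failure flag) and the pluggable ExpectationsSource base class might be sketched like this. This is a simplified sketch, not the actual Skia code: `std::vector` stands in for `SkTArray`, and the constructor signatures are assumptions.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// The checksum representation is hidden behind a typedef, so it could
// later become e.g. an MD5 hash string instead of a uint64_t.
typedef uint64_t Checksum;

// Per-test expectations: any one of several checksums may be acceptable,
// and a failure may be marked as ignorable.
struct Expectations {
    // const members: once constructed, callers can read but not modify.
    const std::vector<Checksum> fAllowedChecksums;
    const bool fIgnoreFailure;

    explicit Expectations(std::vector<Checksum> allowed,
                          bool ignoreFailure = false)
        : fAllowedChecksums(std::move(allowed))
        , fIgnoreFailure(ignoreFailure) {}
};

// Abstract source of expectations, so that "one PNG per test on disk"
// (IndividualImageExpectationsSource) and a future JsonExpectationsSource
// can be swapped in freely.
class ExpectationsSource {
public:
    virtual ~ExpectationsSource() {}
    virtual Expectations get(const std::string& testName) const = 0;
};
```

The point of the split is that gmmain only ever sees an Expectations value; where it came from (image file, JSON, anything else) is the source's concern.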
https://codereview.appspot.com/7088044/diff/1/gm/gmmain.cpp
File gm/gmmain.cpp (right):

https://codereview.appspot.com/7088044/diff/1/gm/gmmain.cpp#newcode489
gm/gmmain.cpp:489: }
Is there any reason for the Expectations object not to perform this test? I.e., add:

bool Expectations::matchesExpectation(const Checksum&)

Then expectations.fAllowedChecksums could be private.

https://codereview.appspot.com/7088044/diff/1/gm/tests/outputs/compared-again...
File gm/tests/outputs/compared-against-empty-dir/output-expected/json-summary.txt (right):

https://codereview.appspot.com/7088044/diff/1/gm/tests/outputs/compared-again...
gm/tests/outputs/compared-against-empty-dir/output-expected/json-summary.txt:15: "ignore-failure" : false
On 2013/01/11 07:58:09, epoger wrote:
> Changed this to be more consistent with the existing stdout, which yields:
>
> reading from gm/tests/inputs/empty-dir
> writing to gm/tests/outputs/compared-against-empty-dir/output-actual/images
> drawing... dashing2 [640 480]
> FAILED to read gm/tests/inputs/empty-dir/8888/dashing2.png
> Ran 1 tests: 0 passed, 0 failed, 1 missing reference images
>
> (I figured it made more sense for this to show up in "no-comparison" rather than
> "failure-ignored".)
>
> In the long run, if we want to keep both stdout summary and JSON report, maybe
> we should make the result types the same in both. Right now they are:
>
> JSON:
> - failed
> - failure-ignored
> - no-comparison
> - succeeded
>
> stdout:
> - passed
> - failed
> - missing reference images

What if no-comparison and "missing reference images" became missing-expectation?

Also, how often do we expect to use the failure-ignored case? Would it be okay for GM to consider those as failures, write the images, and then let the CompareGMs step read the expected-results to determine whether it should ignore each failure? Then we could remove failure-ignored altogether.
Just curious: Is there any reason why dashes are used here while underscores seem to be the norm elsewhere?
CCing junov, bungeman, edi
Thanks, Eric...

https://codereview.appspot.com/7088044/diff/1/gm/gmmain.cpp
File gm/gmmain.cpp (right):

https://codereview.appspot.com/7088044/diff/1/gm/gmmain.cpp#newcode489
gm/gmmain.cpp:489: }
On 2013/01/11 15:04:06, EricB wrote:
> Is there any reason for the Expectations object not to perform this test? ie.
> Add:
> bool Expectations::matchesExpectation(const Checksum&)
>
> then expectations.fAllowedChecksums could be private.

We could do that... but we would still need a way to read out the whole set of allowedChecksums for the JSON expectations summary (line 565 below).

To some extent, fAllowedChecksums is reasonably well encapsulated because it is a const SkTArray. Once the Expectations object is created, nobody will be able to modify that list.

If you think it would be better, I could create Expectations::matches(const Checksum &), and call it here, but still walk through the array at line 565.

OR, I could add Expectations::matches(const Checksum &) and Expectations::getAllowedChecksumsAsJsonArray(), and make fAllowedChecksums private...

What do you think would be best? I'm pretty open.

https://codereview.appspot.com/7088044/diff/1/gm/tests/outputs/compared-again...
File gm/tests/outputs/compared-against-empty-dir/output-expected/json-summary.txt (right):

https://codereview.appspot.com/7088044/diff/1/gm/tests/outputs/compared-again...
gm/tests/outputs/compared-against-empty-dir/output-expected/json-summary.txt:15: "ignore-failure" : false
On 2013/01/11 15:04:06, EricB wrote:
> What if no-comparison and "missing reference images" became missing-expectation?

I LIKE IT! If it's OK with you, I'd like to make that change in a future CL (this one is big enough already).

> Also, how often do we expect to use the failure-ignored case? Would it be okay
> for GM to consider those as failures, write the images, and then let the
> CompareGMs step read the expected-results to determine whether it should ignore
> each failure?
> Then we could remove failure-ignored altogether.

My goal is for the CompareGMs step to be combined into the GenerateGMs step: one call to the gm binary that generates images, compares them against expectations, and writes any mismatching ones to disk for human examination.

The most common failure case we would want to ignore is missing-expectation. But there is certainly also the situation of "this test used to work, and we want the buildbots to go green so we notice other regressions while we figure it out or rebaseline it". So I definitely think we want to support a notation like "here is the expected image; if it fails, show me the actual image but don't turn the build red".

Given that we want to support failure-ignored, I think we want to specify it within the JSON expectations alongside the checksums themselves... and given that fact, it seems simplest to implement the should-I-ignore-this-failure logic within gm.

> Just curious: Is there any reason why dashes are used here while underscores
> seem to be the norm elsewhere?

You mean in other JSON files used by our project? Or other JSON files in general? It's fine with me to make it more consistent with other JSON files. Please show me the other JSON files we should become consistent with, and we can do that in a separate CL. Sound OK?
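The "specify ignore-failure within the JSON expectations alongside the checksums themselves" idea might look something like the fragment below. This is a purely hypothetical layout: only the "ignore-failure" key appears in this CL's test outputs, and the "expected-results" and "checksums" names and the checksum value are illustrative assumptions.

```json
{
  "expected-results" : {
    "dashing2" : {
      "checksums" : [ 1234567890 ],
      "ignore-failure" : true
    }
  }
}
```

With a layout like this, gm can decide per-test whether a checksum mismatch should turn the build red, without consulting any separate configuration.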
https://codereview.appspot.com/7088044/diff/1/gm/gmmain.cpp
File gm/gmmain.cpp (right):

https://codereview.appspot.com/7088044/diff/1/gm/gmmain.cpp#newcode489
gm/gmmain.cpp:489: }
On 2013/01/11 17:38:54, epoger wrote:
> On 2013/01/11 15:04:06, EricB wrote:
> > Is there any reason for the Expectations object not to perform this test? ie.
> > Add:
> > bool Expectations::matchesExpectation(const Checksum&)
> >
> > then expectations.fAllowedChecksums could be private.
>
> We could do that... but we would still need a way to read out the whole set of
> allowedChecksums for the JSON expectations summary (line 565 below).
>
> To some extent, fAllowedChecksums is reasonably well encapsulated because it is
> a const SkTArray. Once the Expectations object is created, nobody will be able
> to modify that list.
>
> If you think it would be better, I could create Expectations::matches(const
> Checksum &), and call it here, but still walk through the array at line 565.
>
> OR, I could add Expectations::matches(const Checksum &) and
> Expectations::getAllowedChecksumsAsJsonArray(), and make fAllowedChecksums
> private...
>
> What do you think would be best? I'm pretty open.

I like the idea of keeping the array hidden, but I guess it's a trade-off as to whether you want to introduce the JSON business into that class.

https://codereview.appspot.com/7088044/diff/1/gm/tests/outputs/compared-again...
File gm/tests/outputs/compared-against-empty-dir/output-expected/json-summary.txt (right):

https://codereview.appspot.com/7088044/diff/1/gm/tests/outputs/compared-again...
gm/tests/outputs/compared-against-empty-dir/output-expected/json-summary.txt:15: "ignore-failure" : false
On 2013/01/11 17:38:54, epoger wrote:
> On 2013/01/11 15:04:06, EricB wrote:
> > What if no-comparison and "missing reference images" became
> > missing-expectation?
>
> I LIKE IT!
>
> If it's OK with you, I'd like to make that change in a future CL (this one is
> big enough already)

SGTM

> > Also, how often do we expect to use the failure-ignored case? Would it be
> > okay for GM to consider those as failures, write the images, and then let the
> > CompareGMs step read the expected-results to determine whether it should
> > ignore each failure? Then we could remove failure-ignored altogether.
>
> My goal is for the CompareGMs step to be combined into the GenerateGMs step.
> One call to the gm binary that generates images, compares them against
> expectations, and writes any mismatching ones to disk for human examination.

I'm not a fan of this idea. I think it's nice that we are currently able to see the RunGM step succeed and the CompareGMs step fail, rather than having everything fail together. Sure, you can drill down into the log to figure out what happened, but that's a lot more work for me if, for example, I made a change that might cause GM to crash but wouldn't have affected the pixels of any images; today I can easily see that it was the comparison which failed and it's probably someone else's fault.

For the record, I think it's a bug that CompareAndUploadWebpageGMs is one step, for the same reason. Uploads may fail due to all sorts of flakiness, but the comparison should be pretty stable. Having them lumped together causes confusion on failure.

> The most common failure case we would want to ignore is missing-expectation.
> But there is certainly also the situation of "this test used to work, and we
> want the buildbots to go green so we notice other regressions while we figure it
> out or rebaseline it". So I definitely think we want to support a notation like
> "here is the expected image; if it fails, show me the actual image but don't
> turn the build red".

Right.
I was suggesting that the Comparison step figure that out with this logic:

for each failure:
    if the expectation for this image allows me to ignore it:
        continue
    log failure

So that failure-ignored can disappear from the set of actuals, since it's redundant.

> Given that we want to support failure-ignored, I think we want to specify it
> within the JSON expectations alongside the checksums themselves... and given
> that fact, it seems simplest to implement the should-I-ignore-this-failure logic
> within gm.

> > Just curious: Is there any reason why dashes are used here while underscores
> > seem to be the norm elsewhere?
>
> You mean in other JSON files used by our project? Or other JSON files in
> general? It's fine with me to make it more consistent with other JSON files.
> Please show me the other JSON files we should become consistent with, and we can
> do that in a separate CL. Sound OK?

See the json files under buildbot/slave/skia_slave_scripts/page_sets
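The filtering loop proposed in this message could be written out as a small standalone function along these lines. The `Failure` type and function name are hypothetical; `ignoreFailure` stands in for however the per-test expectation records "okay to ignore".

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical record of one failed comparison, carrying the
// ignore-failure flag pulled from that test's expectation.
struct Failure {
    std::string testName;
    bool ignoreFailure;
};

// Returns only the failures that should actually be logged (and turn the
// build red); ignorable failures are silently skipped.
std::vector<Failure> filterFailures(const std::vector<Failure>& failures) {
    std::vector<Failure> toLog;
    for (const Failure& f : failures) {
        if (f.ignoreFailure) {
            continue;  // the expectation says we may ignore this one
        }
        toLog.push_back(f);
    }
    return toLog;
}
```

Under this scheme the "failure-ignored" category never needs to appear in the actuals at all, since the comparison step recomputes it from the expectations.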
https://codereview.appspot.com/7088044/diff/1/gm/tests/outputs/compared-again...
File gm/tests/outputs/compared-against-empty-dir/output-expected/json-summary.txt (right):

https://codereview.appspot.com/7088044/diff/1/gm/tests/outputs/compared-again...
gm/tests/outputs/compared-against-empty-dir/output-expected/json-summary.txt:15: "ignore-failure" : false
On 2013/01/11 18:39:59, EricB wrote:
> SGTM

Future CL: combine "no-comparison" and "missing reference images" into "missing-expectation".

> > My goal is for the CompareGMs step to be combined into the GenerateGMs step.
> > One call to the gm binary that generates images, compares them against
> > expectations, and writes any mismatching ones to disk for human examination.
>
> I'm not a fan of this idea. I think it's nice that we are currently able to see
> the RunGM step succeed and the CompareGMs step fail, rather than having
> everything fail together.

We discussed this live... we will continue to compare the actual GM results against the expected results here in gm, because that will save us LOTS of time we would otherwise spend writing out actualResult images even when they match our expectations.

Future CL: add a --noFailOnMismatches flag to gm, with which gm will not return nonzero just because an actual test image didn't match the JSON expectations. Instead, we will postpone this failure until a separate comparison step, so that the RunGMs step will only go red for "serious problems" (e.g., assertion failures or differences between RTree and tiled output).

> Right. I was suggesting that the Comparison step figure that out with this
> logic:
>
> for each failure:
>     if the expectation for this image allows me to ignore:
>         continue
>     log failure
>
> So that failure-ignored can disappear from the set of actuals, since it's
> redundant.

We discussed this live... we will keep the "failure-ignored" section, distinct from the "failed" section, so that a human being can quickly see the list of real failures.
> > > Just curious: Is there any reason why dashes are used here while underscores
> > > seem to be the norm elsewhere?

Future CL: change the JSON key strings to use underscores instead of dashes, like https://code.google.com/p/skia/source/browse/buildbot/slave/skia_slave_script...

https://codereview.appspot.com/7088044/diff/9001/gm/gm_expectations.h
File gm/gm_expectations.h (right):

https://codereview.appspot.com/7088044/diff/9001/gm/gm_expectations.h#newcode95
gm/gm_expectations.h:95: Json::Value allowedChecksumsAsJson() const {
I considered making this return the complete Expectations object in JSON format (both allowedChecksums and ignoreFailures), but to do that I would have to move the kJsonKey_ExpectedResults_* definitions into this file.

https://codereview.appspot.com/7088044/diff/9001/gm/gm_expectations.h#newcode107
gm/gm_expectations.h:107: const SkTArray<Checksum> fAllowedChecksums;
Made the fields private, and added the necessary accessors above.

I actually think the code was cleaner before, when the information we pulled out of Expectations was exactly what we put into it (just locked up behind a const declaration so it is read-only).

But I will leave it completely up to you, Eric... the old way or the new way? Or some hybrid?
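For readers without the patch in front of them, an accessor like allowedChecksumsAsJson() can be sketched as below. The real method returns a jsoncpp Json::Value array; to keep this sketch self-contained and dependency-free, it builds the equivalent JSON text by hand instead, and the method name, constructor, and use of `std::vector` in place of `SkTArray` are all stand-ins.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

typedef uint64_t Checksum;

class Expectations {
public:
    explicit Expectations(std::vector<Checksum> allowed)
        : fAllowedChecksums(std::move(allowed)) {}

    // Stand-in for allowedChecksumsAsJson(): serialize the private
    // checksum list as a JSON-style array, e.g. "[1,2,3]".
    std::string allowedChecksumsAsJsonText() const {
        std::ostringstream out;
        out << '[';
        for (std::size_t i = 0; i < fAllowedChecksums.size(); ++i) {
            if (i > 0) {
                out << ',';
            }
            out << fAllowedChecksums[i];
        }
        out << ']';
        return out.str();
    }

private:
    const std::vector<Checksum> fAllowedChecksums;  // hidden behind accessors
};
```

This illustrates the trade-off debated above: hiding the member keeps callers simple, at the cost of teaching the Expectations class about the serialization format.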
LGTM

https://codereview.appspot.com/7088044/diff/9001/gm/gm_expectations.h
File gm/gm_expectations.h (right):

https://codereview.appspot.com/7088044/diff/9001/gm/gm_expectations.h#newcode95
gm/gm_expectations.h:95: Json::Value allowedChecksumsAsJson() const {
On 2013/01/11 20:20:10, epoger wrote:
> I considered making this return the complete Expectations object in JSON format
> (both allowedChecksums and ignoreFailures), but to do that I would have to move
> the kJsonKey_ExpectedResults_* definitions into this file.

I kind of like that idea; it might be nice if gmmain didn't have to know anything about JSON and could just say "match" or "no match". Let's wait until this settles a bit before making that decision, though.

https://codereview.appspot.com/7088044/diff/9001/gm/gm_expectations.h#newcode107
gm/gm_expectations.h:107: const SkTArray<Checksum> fAllowedChecksums;
On 2013/01/11 20:20:10, epoger wrote:
> Made the fields private, and added the necessary accessors above.
>
> I actually think the code was cleaner before, when the information we pulled out
> of Expectations was exactly what we put into it (just locked up behind a const
> declaration so it is read-only).
>
> But I will leave it completely up to you, Eric... the old way or the new way?
> Or some hybrid?

I like this better. To me, cleaner implies simpler code from the caller's standpoint, and this delivers that: no direct access of member variables, and a simple interface.