Created: 11 years, 11 months ago by epoger
Modified: 11 years, 11 months ago
CC: skia-review_googlegroups.com, junov, bungeman, edisonn
Base URL: http://skia.googlecode.com/svn/trunk/
Visibility: Public.
Description: gm: use new Expectations struct instead of comparison bitmaps
Committed: https://code.google.com/p/skia/source/detail?r=7198
Patch Set 1
Total comments: 14

Patch Set 2
Total comments: 4

Patch Set 3: sync_to_head
Messages
Total messages: 7
This will allow us to compare against more complex expectations than one bitmap per test (multiple allowed checksums, whether to ignore failures per-test, etc.). For now, though, it still only compares against a single bitmap per test, read from disk. But adding additional implementations of ExpectationsSource will be easy after this CL...

context: https://goto.google.com/ImprovingTheSkiaRebaseliningProcess

https://codereview.appspot.com/7088044/diff/1/gm/gm_expectations.h
File gm/gm_expectations.h (right):

https://codereview.appspot.com/7088044/diff/1/gm/gm_expectations.h#newcode20
gm/gm_expectations.h:20: // The actual type we use to represent a checksum is hidden in here.
(in case we want to switch it to be an MD5 hash string instead of a uint64_t, for example)

https://codereview.appspot.com/7088044/diff/1/gm/gm_expectations.h#newcode26
gm/gm_expectations.h:26: static SkString make_filename(const char path[],
moved from gmmain.cpp

https://codereview.appspot.com/7088044/diff/1/gm/gm_expectations.h#newcode80
gm/gm_expectations.h:80: class IndividualImageExpectationsSource : public ExpectationsSource {
and coming soon: class JsonExpectationsSource : public ExpectationsSource

https://codereview.appspot.com/7088044/diff/1/gm/gmmain.cpp
File gm/gmmain.cpp (left):

https://codereview.appspot.com/7088044/diff/1/gm/gmmain.cpp#oldcode569
gm/gmmain.cpp:569: // both NULL (and thus no images are read from or written to disk).
This is why I split this method into two:

compare_test_results_to_stored_expectations(), similar to handle_test_results() for...
- non-NULL readPath
- blank renderModeDescriptor
- NULL referenceBitmap

compare_test_results_to_reference_bitmap(), similar to handle_test_results() for...
- NULL readPath
- non-blank renderModeDescriptor
- non-NULL referenceBitmap

https://codereview.appspot.com/7088044/diff/1/gm/gmmain.cpp
File gm/gmmain.cpp (right):

https://codereview.appspot.com/7088044/diff/1/gm/gmmain.cpp#newcode467
gm/gmmain.cpp:467: * JSON summary.
We may wish to change that. Setting the default to TRUE would make the JSON summary a whole lot bigger, which would be a mixed blessing. Pro: we would see exactly which drawing methods (RTree, tiled, etc.) ran for each test, and could verify that they all generated the same checksums. (Multiple drawing methods are already verified now, but currently you can't tell which comparisons actually occurred by looking at the JSON summary or stdout.) Con: bigger file, more noise.

https://codereview.appspot.com/7088044/diff/1/gm/gmmain.cpp#newcode477
gm/gmmain.cpp:477: const char* completeName = completeNameString.c_str();
for example:
baseNameString = "8888/dashing2" or "comparison/dashing2"
completeNameString = "comparison/dashing2-pipe" or "comparison/dashing2-replay"

https://codereview.appspot.com/7088044/diff/1/gm/tests/outputs/compared-again...
File gm/tests/outputs/compared-against-empty-dir/output-expected/json-summary.txt (right):

https://codereview.appspot.com/7088044/diff/1/gm/tests/outputs/compared-again...
gm/tests/outputs/compared-against-empty-dir/output-expected/json-summary.txt:15: "ignore-failure" : false
Changed this to be more consistent with the existing stdout, which yields:

reading from gm/tests/inputs/empty-dir
writing to gm/tests/outputs/compared-against-empty-dir/output-actual/images
drawing... dashing2 [640 480]
FAILED to read gm/tests/inputs/empty-dir/8888/dashing2.png
Ran 1 tests: 0 passed, 0 failed, 1 missing reference images

(I figured it made more sense for this to show up in "no-comparison" rather than "failure-ignored".)

In the long run, if we want to keep both the stdout summary and the JSON report, maybe we should make the result types the same in both. Right now they are:

JSON:
- failed
- failure-ignored
- no-comparison
- succeeded

stdout:
- passed
- failed
- missing reference images
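As a rough illustration of the data structures this CL describes, the Expectations value (one or more allowed checksums plus an ignore-failure flag) and the pluggable ExpectationsSource base class might be sketched like this. This is a simplified sketch, not the actual Skia code: `std::vector` stands in for `SkTArray`, and the constructor signatures are assumptions.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// The checksum representation is hidden behind a typedef, so it could
// later become e.g. an MD5 hash string instead of a uint64_t.
typedef uint64_t Checksum;

// Per-test expectations: any one of several checksums may be acceptable,
// and a failure may be marked as ignorable.
struct Expectations {
    // const members: once constructed, callers can read but not modify.
    const std::vector<Checksum> fAllowedChecksums;
    const bool fIgnoreFailure;

    explicit Expectations(std::vector<Checksum> allowed,
                          bool ignoreFailure = false)
        : fAllowedChecksums(std::move(allowed))
        , fIgnoreFailure(ignoreFailure) {}
};

// Abstract source of expectations, so that "one PNG per test on disk"
// (IndividualImageExpectationsSource) and a future JsonExpectationsSource
// can be swapped in freely.
class ExpectationsSource {
public:
    virtual ~ExpectationsSource() {}
    virtual Expectations get(const std::string& testName) const = 0;
};
```

The point of the split is that gmmain only ever sees an Expectations value; where it came from (image file, JSON, anything else) is the source's concern.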
https://codereview.appspot.com/7088044/diff/1/gm/gmmain.cpp
File gm/gmmain.cpp (right):

https://codereview.appspot.com/7088044/diff/1/gm/gmmain.cpp#newcode489
gm/gmmain.cpp:489: }
Is there any reason for the Expectations object not to perform this test? I.e., add:

bool Expectations::matchesExpectation(const Checksum&)

Then expectations.fAllowedChecksums could be private.

https://codereview.appspot.com/7088044/diff/1/gm/tests/outputs/compared-again...
File gm/tests/outputs/compared-against-empty-dir/output-expected/json-summary.txt (right):

https://codereview.appspot.com/7088044/diff/1/gm/tests/outputs/compared-again...
gm/tests/outputs/compared-against-empty-dir/output-expected/json-summary.txt:15: "ignore-failure" : false
On 2013/01/11 07:58:09, epoger wrote:
> Changed this to be more consistent with the existing stdout, which yields:
>
> reading from gm/tests/inputs/empty-dir
> writing to gm/tests/outputs/compared-against-empty-dir/output-actual/images
> drawing... dashing2 [640 480]
> FAILED to read gm/tests/inputs/empty-dir/8888/dashing2.png
> Ran 1 tests: 0 passed, 0 failed, 1 missing reference images
>
> (I figured it made more sense for this to show up in "no-comparison" rather than
> "failure-ignored".)
>
> In the long run, if we want to keep both stdout summary and JSON report, maybe
> we should make the result types the same in both. Right now they are:
>
> JSON:
> - failed
> - failure-ignored
> - no-comparison
> - succeeded
>
> stdout:
> - passed
> - failed
> - missing reference images

What if no-comparison and "missing reference images" became missing-expectation?

Also, how often do we expect to use the failure-ignored case? Would it be okay for GM to consider those as failures, write the images, and then let the CompareGMs step read the expected-results to determine whether it should ignore each failure? Then we could remove failure-ignored altogether.
Just curious: Is there any reason why dashes are used here while underscores seem to be the norm elsewhere?
CCing junov, bungeman, edi
Thanks, Eric...

https://codereview.appspot.com/7088044/diff/1/gm/gmmain.cpp
File gm/gmmain.cpp (right):

https://codereview.appspot.com/7088044/diff/1/gm/gmmain.cpp#newcode489
gm/gmmain.cpp:489: }
On 2013/01/11 15:04:06, EricB wrote:
> Is there any reason for the Expectations object not to perform this test? ie.
> Add:
> bool Expectations::matchesExpectation(const Checksum&)
>
> then expectations.fAllowedChecksums could be private.

We could do that... but we would still need a way to read out the whole set of allowedChecksums for the JSON expectations summary (line 565 below).

To some extent, fAllowedChecksums is reasonably well encapsulated because it is a const SkTArray. Once the Expectations object is created, nobody will be able to modify that list.

If you think it would be better, I could create Expectations::matches(const Checksum &), and call it here, but still walk through the array at line 565.

OR, I could add Expectations::matches(const Checksum &) and Expectations::getAllowedChecksumsAsJsonArray(), and make fAllowedChecksums private...

What do you think would be best? I'm pretty open.

https://codereview.appspot.com/7088044/diff/1/gm/tests/outputs/compared-again...
File gm/tests/outputs/compared-against-empty-dir/output-expected/json-summary.txt (right):

https://codereview.appspot.com/7088044/diff/1/gm/tests/outputs/compared-again...
gm/tests/outputs/compared-against-empty-dir/output-expected/json-summary.txt:15: "ignore-failure" : false
On 2013/01/11 15:04:06, EricB wrote:
> What if no-comparison and "missing reference images" became missing-expectation?

I LIKE IT! If it's OK with you, I'd like to make that change in a future CL (this one is big enough already).

> Also, how often do we expect to use the failure-ignored case? Would it be okay
> for GM to consider those as failures, write the images, and then let the
> CompareGMs step read the expected-results to determine whether it should ignore
> each failure?
> Then we could remove failure-ignored altogether.

My goal is for the CompareGMs step to be combined into the GenerateGMs step: one call to the gm binary that generates images, compares them against expectations, and writes any mismatching ones to disk for human examination.

The most common failure case we would want to ignore is missing-expectation. But there is certainly also the situation of "this test used to work, and we want the buildbots to go green so we notice other regressions while we figure it out or rebaseline it". So I definitely think we want to support a notation like "here is the expected image; if it fails, show me the actual image but don't turn the build red".

Given that we want to support failure-ignored, I think we want to specify it within the JSON expectations alongside the checksums themselves... and given that fact, it seems simplest to implement the should-I-ignore-this-failure logic within gm.

> Just curious: Is there any reason why dashes are used here while underscores
> seem to be the norm elsewhere?

You mean in other JSON files used by our project? Or other JSON files in general? It's fine with me to make it more consistent with other JSON files. Please show me the other JSON files we should become consistent with, and we can do that in a separate CL. Sound OK?
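The "specify ignore-failure within the JSON expectations alongside the checksums themselves" idea might look something like the fragment below. This is a purely hypothetical layout: only the "ignore-failure" key appears in this CL's test outputs, and the "expected-results" and "checksums" names and the checksum value are illustrative assumptions.

```json
{
  "expected-results" : {
    "dashing2" : {
      "checksums" : [ 1234567890 ],
      "ignore-failure" : true
    }
  }
}
```

With a layout like this, gm can decide per-test whether a checksum mismatch should turn the build red, without consulting any separate configuration.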
https://codereview.appspot.com/7088044/diff/1/gm/gmmain.cpp
File gm/gmmain.cpp (right):

https://codereview.appspot.com/7088044/diff/1/gm/gmmain.cpp#newcode489
gm/gmmain.cpp:489: }
On 2013/01/11 17:38:54, epoger wrote:
> On 2013/01/11 15:04:06, EricB wrote:
> > Is there any reason for the Expectations object not to perform this test? ie.
> > Add:
> > bool Expectations::matchesExpectation(const Checksum&)
> >
> > then expectations.fAllowedChecksums could be private.
>
> We could do that... but we would still need a way to read out the whole set of
> allowedChecksums for the JSON expectations summary (line 565 below).
>
> To some extent, fAllowedChecksums is reasonably well encapsulated because it is
> a const SkTArray. Once the Expectations object is created, nobody will be able
> to modify that list.
>
> If you think it would be better, I could create Expectations::matches(const
> Checksum &), and call it here, but still walk through the array at line 565.
>
> OR, I could add Expectations::matches(const Checksum &) and
> Expectations::getAllowedChecksumsAsJsonArray(), and make fAllowedChecksums
> private...
>
> What do you think would be best? I'm pretty open.

I like the idea of keeping the array hidden, but I guess it's a trade-off as to whether you want to introduce the JSON business into that class.

https://codereview.appspot.com/7088044/diff/1/gm/tests/outputs/compared-again...
File gm/tests/outputs/compared-against-empty-dir/output-expected/json-summary.txt (right):

https://codereview.appspot.com/7088044/diff/1/gm/tests/outputs/compared-again...
gm/tests/outputs/compared-against-empty-dir/output-expected/json-summary.txt:15: "ignore-failure" : false
On 2013/01/11 17:38:54, epoger wrote:
> On 2013/01/11 15:04:06, EricB wrote:
> > What if no-comparison and "missing reference images" became
> > missing-expectation?
>
> I LIKE IT!
>
> If it's OK with you, I'd like to make that change in a future CL (this one is
> big enough already)

SGTM

> > Also, how often do we expect to use the failure-ignored case? Would it be
> > okay for GM to consider those as failures, write the images, and then let the
> > CompareGMs step read the expected-results to determine whether it should
> > ignore each failure? Then we could remove failure-ignored altogether.
>
> My goal is for the CompareGMs step to be combined into the GenerateGMs step.
> One call to the gm binary that generates images, compares them against
> expectations, and writes any mismatching ones to disk for human examination.

I'm not a fan of this idea. I think it's nice that we are currently able to see the RunGM step succeed and the CompareGMs step fail, rather than having everything fail together. Sure, you can drill down into the log to figure out what happened, but that's a lot more work for me if, for example, I made a change that might cause GM to crash but wouldn't have affected the pixels of any images; today I can easily see that it was the comparison which failed and it's probably someone else's fault.

For the record, I think it's a bug that CompareAndUploadWebpageGMs is one step, for the same reason. Uploads may fail due to all sorts of flakiness, but the comparison should be pretty stable. Having them lumped together causes confusion on failure.

> The most common failure case we would want to ignore is missing-expectation.
> But there is certainly also the situation of "this test used to work, and we
> want the buildbots to go green so we notice other regressions while we figure it
> out or rebaseline it". So I definitely think we want to support a notation like
> "here is the expected image; if it fails, show me the actual image but don't
> turn the build red".

Right.
I was suggesting that the Comparison step figure that out with this logic:

for each failure:
    if the expectation for this image allows me to ignore it:
        continue
    log failure

So that failure-ignored can disappear from the set of actuals, since it's redundant.

> Given that we want to support failure-ignored, I think we want to specify it
> within the JSON expectations alongside the checksums themselves... and given
> that fact, it seems simplest to implement the should-I-ignore-this-failure logic
> within gm.

> > Just curious: Is there any reason why dashes are used here while underscores
> > seem to be the norm elsewhere?
>
> You mean in other JSON files used by our project? Or other JSON files in
> general? It's fine with me to make it more consistent with other JSON files.
> Please show me the other JSON files we should become consistent with, and we can
> do that in a separate CL. Sound OK?

See the json files under buildbot/slave/skia_slave_scripts/page_sets
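The filtering loop proposed in this message could be written out as a small standalone function along these lines. The `Failure` type and function name are hypothetical; `ignoreFailure` stands in for however the per-test expectation records "okay to ignore".

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical record of one failed comparison, carrying the
// ignore-failure flag pulled from that test's expectation.
struct Failure {
    std::string testName;
    bool ignoreFailure;
};

// Returns only the failures that should actually be logged (and turn the
// build red); ignorable failures are silently skipped.
std::vector<Failure> filterFailures(const std::vector<Failure>& failures) {
    std::vector<Failure> toLog;
    for (const Failure& f : failures) {
        if (f.ignoreFailure) {
            continue;  // the expectation says we may ignore this one
        }
        toLog.push_back(f);
    }
    return toLog;
}
```

Under this scheme the "failure-ignored" category never needs to appear in the actuals at all, since the comparison step recomputes it from the expectations.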
https://codereview.appspot.com/7088044/diff/1/gm/tests/outputs/compared-again...
File gm/tests/outputs/compared-against-empty-dir/output-expected/json-summary.txt (right):

https://codereview.appspot.com/7088044/diff/1/gm/tests/outputs/compared-again...
gm/tests/outputs/compared-against-empty-dir/output-expected/json-summary.txt:15: "ignore-failure" : false
On 2013/01/11 18:39:59, EricB wrote:
> SGTM

Future CL: combine "no-comparison" and "missing reference images" into "missing-expectation".

> > My goal is for the CompareGMs step to be combined into the GenerateGMs step.
> > One call to the gm binary that generates images, compares them against
> > expectations, and writes any mismatching ones to disk for human examination.
>
> I'm not a fan of this idea. I think it's nice that we are currently able to see
> the RunGM step succeed and the CompareGMs step fail, rather than having
> everything fail together.

We discussed this live... we will continue to compare the actual GM results against the expected results here in gm, because that will save us LOTS of time we would otherwise spend writing out actualResult images even when they match our expectations.

Future CL: add a --noFailOnMismatches flag to gm, with which gm will not return nonzero just because an actual test image didn't match the JSON expectations. Instead, we will postpone this failure until a separate comparison step, so that the RunGMs step will only go red for "serious problems" (e.g., assertion failures or differences between RTree and tiled output).

> Right. I was suggesting that the Comparison step figure that out with this
> logic:
>
> for each failure:
>     if the expectation for this image allows me to ignore:
>         continue
>     log failure
>
> So that failure-ignored can disappear from the set of actuals, since it's
> redundant.

We discussed this live... we will keep the "failure-ignored" section, distinct from the "failed" section, so that a human being can quickly see the list of real failures.
> > > Just curious: Is there any reason why dashes are used here while underscores
> > > seem to be the norm elsewhere?

Future CL: change the JSON key strings to use underscores instead of dashes, like https://code.google.com/p/skia/source/browse/buildbot/slave/skia_slave_script...

https://codereview.appspot.com/7088044/diff/9001/gm/gm_expectations.h
File gm/gm_expectations.h (right):

https://codereview.appspot.com/7088044/diff/9001/gm/gm_expectations.h#newcode95
gm/gm_expectations.h:95: Json::Value allowedChecksumsAsJson() const {
I considered making this return the complete Expectations object in JSON format (both allowedChecksums and ignoreFailures), but to do that I would have to move the kJsonKey_ExpectedResults_* definitions into this file.

https://codereview.appspot.com/7088044/diff/9001/gm/gm_expectations.h#newcode107
gm/gm_expectations.h:107: const SkTArray<Checksum> fAllowedChecksums;
Made the fields private, and added the necessary accessors above.

I actually think the code was cleaner before, when the information we pulled out of Expectations was exactly what we put into it (just locked up behind a const declaration so it is read-only).

But I will leave it completely up to you, Eric... the old way or the new way? Or some hybrid?
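For readers without the patch in front of them, an accessor like allowedChecksumsAsJson() can be sketched as below. The real method returns a jsoncpp Json::Value array; to keep this sketch self-contained and dependency-free, it builds the equivalent JSON text by hand instead, and the method name, constructor, and use of `std::vector` in place of `SkTArray` are all stand-ins.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

typedef uint64_t Checksum;

class Expectations {
public:
    explicit Expectations(std::vector<Checksum> allowed)
        : fAllowedChecksums(std::move(allowed)) {}

    // Stand-in for allowedChecksumsAsJson(): serialize the private
    // checksum list as a JSON-style array, e.g. "[1,2,3]".
    std::string allowedChecksumsAsJsonText() const {
        std::ostringstream out;
        out << '[';
        for (std::size_t i = 0; i < fAllowedChecksums.size(); ++i) {
            if (i > 0) {
                out << ',';
            }
            out << fAllowedChecksums[i];
        }
        out << ']';
        return out.str();
    }

private:
    const std::vector<Checksum> fAllowedChecksums;  // hidden behind accessors
};
```

This illustrates the trade-off debated above: hiding the member keeps callers simple, at the cost of teaching the Expectations class about the serialization format.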
LGTM

https://codereview.appspot.com/7088044/diff/9001/gm/gm_expectations.h
File gm/gm_expectations.h (right):

https://codereview.appspot.com/7088044/diff/9001/gm/gm_expectations.h#newcode95
gm/gm_expectations.h:95: Json::Value allowedChecksumsAsJson() const {
On 2013/01/11 20:20:10, epoger wrote:
> I considered making this return the complete Expectations object in JSON format
> (both allowedChecksums and ignoreFailures), but to do that I would have to move
> the kJsonKey_ExpectedResults_* definitions into this file.

I kind of like that idea; it might be nice if gmmain didn't have to know anything about JSON and could just say "match" or "no match". Let's wait until this settles a bit before making that decision, though.

https://codereview.appspot.com/7088044/diff/9001/gm/gm_expectations.h#newcode107
gm/gm_expectations.h:107: const SkTArray<Checksum> fAllowedChecksums;
On 2013/01/11 20:20:10, epoger wrote:
> Made the fields private, and added the necessary accessors above.
>
> I actually think the code was cleaner before, when the information we pulled out
> of Expectations was exactly what we put into it (just locked up behind a const
> declaration so it is read-only).
>
> But I will leave it completely up to you, Eric... the old way or the new way?
> Or some hybrid?

I like this better. To me, cleaner implies simpler code from the caller's standpoint, and this delivers that: no direct access of member variables, and a simple interface.