Issue 52057: option to run the tests in a random order

Zhanyong Wan

Hi Josh, I apologize for the massive delay. In general I like what I'm seeing. ...

16 years, 8 months ago (2009-06-16 06:35:11 UTC) #1

Zhanyong Wan

One thing I learned from this is that big patches can take a long time ...

16 years, 8 months ago (2009-06-18 21:10:15 UTC) #2

Josh Kelley

16 years, 8 months ago (2009-06-25 02:07:02 UTC) #3

I uploaded a reduced patch set, as you suggested, that only implements parsing,
managing, and displaying the flags.  (I realize that a couple of aspects of
managing and displaying may change depending on your review here.)

I tried to address your other comments, too, but feel free to ignore those until
those patches are ready.

I honestly didn't think that shuffling the list was that bad (although the unit
tests for shuffling were pretty ugly).  To help me understand better, could you
explain your concerns?  (Of course, if the event listener implementation will be
replacing List with something randomly accessible anyway, then it's irrelevant.)

You mentioned that many more tests are needed.  I know that I'd originally
suggested adding a Python script gtest_shuffle_test.py that does the following:
1) Runs --gtest_shuffle --gtest_repeat=3 and verifies non-repeating seeds.
2) Runs --gtest_shuffle --gtest_random_seed=n and verifies that the order does
in fact change.
3) Runs a test suite containing death tests 10 times or so and verifies that
death tests always occur before non-death tests.
What other tests would you like to see?

http://codereview.appspot.com/52057/diff/1/3
File include/gtest/internal/gtest-internal.h (right):

http://codereview.appspot.com/52057/diff/1/3#newcode761
Line 761: explicit Random(UInt32 state = 1) : state_(state) {}
On 2009/06/16 06:35:11, Zhanyong Wan wrote:
> Google's C++ style guide bans default arguments.

I saw that the guide also bans overloading used to simulate default arguments,
so I wasn't certain what to do.

Is
Random() : state_(1) {}
explicit Random(UInt32 state) : state_(state) {}
okay even though it's overloaded to (sort of) simulate default arguments?

http://codereview.appspot.com/52057/diff/1/3#newcode763
Line 763: int Generate(int range);
On 2009/06/16 06:35:11, Zhanyong Wan wrote:
> Why aren't the range and return type UInt32?

I saw that the style guide strongly discourages the use of unsigned, and so it
seemed better to treat the unsignedness and bit width as implementation details
and use the default int for the public interface.

http://codereview.appspot.com/52057/diff/1/5
File src/gtest-internal-inl.h (right):

http://codereview.appspot.com/52057/diff/1/5#newcode1373
Line 1373: internal::Random* random_;
On 2009/06/16 06:35:11, Zhanyong Wan wrote:
> Why not use an object (instead of a pointer)?

I think that was left over from an earlier design I'd tried.  Thanks for
catching this.

http://codereview.appspot.com/52057/diff/1/6
File src/gtest.cc (right):

http://codereview.appspot.com/52057/diff/1/6#newcode257
Line 257: const UInt32 kM = 0x7fffffffu;  // analogous to RAND_MAX
On 2009/06/16 06:35:11, Zhanyong Wan wrote:
> Why isn't this 0xffffffff?

This (as well as kA and kC) came from glibc's implementation of rand().

After rereading the Wikipedia article on LCGs, it looks like other numbers would
give better results, but only if I use a 64-bit temporary.  I assume that's not
worth doing.

http://codereview.appspot.com/52057/diff/1/6#newcode264
Line 264: return int(double(state_) / (double(kM) + 1) * range);
On 2009/06/16 06:35:11, Zhanyong Wan wrote:
> Is this any better than state_ % range?

state_ % range introduces a downward bias, since values for state_ in the range
of kM - kM % range through kM, inclusive, result in numbers in the lower portion
of [0, range).

Converting via double also introduces bias, since the number of values of state_
that map to each value of [0, range) isn't consistent, but the bias is at least
spread throughout.

Converting via double should also help avoid the problem that some LCGs have
where the lower bits have relatively little randomness.
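
To make the bias argument concrete, here is a minimal sketch that counts how many generator states land on each result under the two mappings. It uses a toy generator with only 16 possible states rather than the real kM, and the helper names CountByModulo and CountByScaling are hypothetical; neither is part of the patch.

```cpp
#include <cassert>
#include <vector>

// Counts how many of the state values 0..num_states-1 map to each result in
// [0, range) when the result is computed as state % range.
std::vector<int> CountByModulo(int num_states, int range) {
  assert(num_states > 0 && range > 0);
  std::vector<int> counts(range, 0);
  for (int state = 0; state < num_states; ++state) {
    ++counts[state % range];
  }
  return counts;
}

// Same count, but the result is computed by scaling through double, in the
// spirit of int(double(state_) / (double(kM) + 1) * range).
std::vector<int> CountByScaling(int num_states, int range) {
  assert(num_states > 0 && range > 0);
  std::vector<int> counts(range, 0);
  for (int state = 0; state < num_states; ++state) {
    const int result =
        static_cast<int>(static_cast<double>(state) / num_states * range);
    ++counts[result];
  }
  return counts;
}
```

With 16 states and a range of 6, the modulo mapping gives counts of 3, 3, 3, 3, 2, 2 (the surplus is concentrated in the low results), while the scaling mapping gives 3, 3, 2, 3, 3, 2 (the same surplus, spread out). The total amount of bias is identical in both cases; only its placement differs.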

http://codereview.appspot.com/52057/diff/1/6#newcode2677
Line 2677: "Note: Randomizing tests' orders with a seed of %i\n",
On 2009/06/16 06:35:11, Zhanyong Wan wrote:
> This info is better printed in OnUnitTestStart() as we want it to be easily
> findable.

Printing in OnUnitTestStart() would look nicer, but printing it here reflects
any changes that the user makes in a test environment.

Which do you think is better?

http://codereview.appspot.com/52057/diff/1/6#newcode4377
Line 4377: GTEST_FLAG(random_seed) = static_cast<Int32>(GetTimeInMillis());
On 2009/06/16 06:35:11, Zhanyong Wan wrote:
> It's best to treat the random_seed flag as read-only by gtest.  We don't want
> to lose the information that the user set it to 0, for example.

I don't follow.  If it defaults to 0, then how does treating it as read-only
preserve the information that the user set it to 0?  If 0 means "use a
default/time-based seed," then what's wrong with overwriting 0, whether 0 came
from the user or the default?

If it is tentatively a goal to let the user manipulate the shuffling-related
flags in global and test case setup and teardown, then I thought that reusing or
overwriting random_seed would be the best way to do this.

Zhanyong Wan

16 years, 7 months ago (2009-06-29 21:20:16 UTC) #4

http://codereview.appspot.com/52057/diff/1/3
File include/gtest/internal/gtest-internal.h (right):

http://codereview.appspot.com/52057/diff/1/3#newcode761
Line 761: explicit Random(UInt32 state = 1) : state_(state) {}
On 2009/06/25 02:07:02, Josh Kelley wrote:
> On 2009/06/16 06:35:11, Zhanyong Wan wrote:
> > Google's C++ style guide bans default arguments.
> 
> I saw that the guide also bans overloading used to simulate default arguments,
> so I wasn't certain what to do.
> 
> Is
> Random() : state_(1) {}

Why do we have to have this default ctor?  We can always initialize Random
objects with an explicit initial state.
> explicit Random(UInt32 state) : state_(state) {}
> okay even though it's overloaded to (sort of) simulate default arguments?

http://codereview.appspot.com/52057/diff/1/3#newcode762
Line 762: void Reseed(UInt32 state) { state_ = state; }
"seed" and "state" are different concepts.  The projection from seeds to states
doesn't have to be a bijection.  For example, we limit the valid seeds to the
range [1, 9999], but the range of valid states can be much larger.

Therefore the argument should be renamed seed.  I think we should keep the state
an implementation detail, as there's no need for the user to know about it. 
This means the ctor's argument should be "seed" too.

http://codereview.appspot.com/52057/diff/1/3#newcode763
Line 763: int Generate(int range);
On 2009/06/25 02:07:02, Josh Kelley wrote:
> On 2009/06/16 06:35:11, Zhanyong Wan wrote:
> > Why aren't the range and return type UInt32?
> 
> I saw that the style guide strongly discourages the use of unsigned, and so it
> seemed better to treat the unsignedness and bit width as implementation
> details and use the default int for the public interface.

I think Random is one of the places where you do care about the signedness and
width of the value.  A generic int type isn't good enough.  If you use the
code on an architecture where sizeof(int) is 8, you will get very bad results,
as Random can only generate results in [0, 2^32 - 1].  We should make this fact
explicit in the type.  That means we should use UInt32 as the input and output
type.

http://codereview.appspot.com/52057/diff/1/6
File src/gtest.cc (right):

http://codereview.appspot.com/52057/diff/1/6#newcode257
Line 257: const UInt32 kM = 0x7fffffffu;  // analogous to RAND_MAX
On 2009/06/25 02:07:02, Josh Kelley wrote:
> On 2009/06/16 06:35:11, Zhanyong Wan wrote:
> > Why isn't this 0xffffffff?
> 
> This (as well as kA and kC) came from glibc's implementation of rand().
> 
> After rereading the Wikipedia article on LCGs, it looks like other numbers
> would give better results, but only if I use a 64-bit temporary.  I assume
> that's not worth doing.

Correct, it's not worth doing.

What I'd like to see is a comment on how the constant was picked, so that the
reader knows it's not arbitrary.

http://codereview.appspot.com/52057/diff/1/6#newcode264
Line 264: return int(double(state_) / (double(kM) + 1) * range);
On 2009/06/25 02:07:02, Josh Kelley wrote:
> On 2009/06/16 06:35:11, Zhanyong Wan wrote:
> > Is this any better than state_ % range?
> 
> state_ % range introduces a downward bias, since values for state_ in the
> range of kM - kM % range through kM, inclusive, result in numbers in the
> lower portion of [0, range).
> 
> Converting via double also introduces bias, since the number of values of
> state_ that map to each value of [0, range) isn't consistent, but the bias is
> at least spread throughout.

Yes.  However, I doubt this bias matters for our purpose.

> Converting via double should also help avoid the problem that some LCGs have
> where the lower bits have relatively little randomness.

I think % is good enough for our purpose.  I prefer to keep it simple.

http://codereview.appspot.com/52057/diff/1/6#newcode2677
Line 2677: "Note: Randomizing tests' orders with a seed of %i\n",
On 2009/06/25 02:07:02, Josh Kelley wrote:
> On 2009/06/16 06:35:11, Zhanyong Wan wrote:
> > This info is better printed in OnUnitTestStart() as we want it to be easily
> > findable.
> 
> Printing in OnUnitTestStart() would look nicer, but printing it here reflects
> any changes that the user makes in a test environment.
> 
> Which do you think is better?

To keep it simple, I think gtest should only look at the values of the shuffle
and random_seed flags once at the beginning of the test program.  Modifying the
flags afterwards should have no effect.  Therefore we should print the message
in OnUnitTestStart().

http://codereview.appspot.com/52057/diff/1/6#newcode4377
Line 4377: GTEST_FLAG(random_seed) = static_cast<Int32>(GetTimeInMillis());
On 2009/06/25 02:07:02, Josh Kelley wrote:
> On 2009/06/16 06:35:11, Zhanyong Wan wrote:
> > It's best to treat the random_seed flag as read-only by gtest.  We don't
> > want to lose the information that the user set it to 0, for example.
> 
> I don't follow.  If it defaults to 0, then how does treating it as read-only
> preserve the information that the user set it to 0?  If 0 means "use a
> default/time-based seed," then what's wrong with overwriting 0, whether 0 came
> from the user or the default?

By "treating it as read-only" I mean that gtest should never modify the flags
(except in the flag parser).  We want to have a record of what the user
specified and we don't want to lose that information.

I don't care to distinguish between explicit --gtest_random_seed=0 and no
gtest_random_seed flag.  They both mean that the user wants to use the default,
time-based, random seed.  In other words, the user's intention is the same in
the two cases.

However, I do care about the difference between --gtest_random_seed=1234 and
--gtest_random_seed=0 where the clock happens to tell gtest to pick 1234, as the
user intention is different in the two cases.

What gtest should do is to use a separate actual_random_seed variable somewhere
to hold the actual random seed.
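
A minimal sketch of that separation, under assumptions: the struct SeedState and the function ResolveSeed are hypothetical names for illustration, and the time-based seed is passed in as a parameter rather than read from the clock so the logic stays easy to check.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical pair of values: what the user asked for and what the run uses.
struct SeedState {
  std::int32_t random_seed_flag;    // as parsed; 0 means "pick a seed for me"
  std::int32_t actual_random_seed;  // the seed the shuffle really uses
};

// Resolves the seed for a run without ever writing to the flag value.
// time_based_seed is assumed to already be in the valid range [1, 9999].
SeedState ResolveSeed(std::int32_t random_seed_flag,
                      std::int32_t time_based_seed) {
  assert(1 <= time_based_seed && time_based_seed <= 9999);
  SeedState state;
  state.random_seed_flag = random_seed_flag;  // preserved exactly as given
  state.actual_random_seed =
      (random_seed_flag == 0) ? time_based_seed : random_seed_flag;
  return state;
}
```

With this shape, --gtest_random_seed=1234 and a time-based pick of 1234 end up with the same actual seed but remain distinguishable through the untouched flag value.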

> If it is tentatively a goal to let the user manipulate the shuffling-related
> flags in global and test case setup and teardown, then I thought that reusing
> or overwriting random_seed would be the best way to do this.

No, this is not a goal.

http://codereview.appspot.com/52057/diff/4001/5004#newcode223
Line 223: 
Also say that 0 means to get the seed from the current time.

Also explain what the valid range is.

http://codereview.appspot.com/52057/diff/4001/5004#newcode2754
Line 2754: fflush(stdout);
We should normalize the seed into the range [0, 9999].

http://codereview.appspot.com/52057/diff/4001/5004#newcode4467
Line 4467: 
Function names should normally be verb phrases.  Also we should make it obvious
that the seed is obtained from the current time.  GetRandomSeedFromTime()?

http://codereview.appspot.com/52057/diff/4001/5004#newcode4468
Line 4468: 
Change 10000 to 9999 to match the name "max default random seed".

http://codereview.appspot.com/52057/diff/4001/5004#newcode4469
Line 4469: 
static_cast<Int32>((GetTimeInMillis() % kMaxDefaultRandomSeed) + 1)

- The cast should enclose the entire expression.
- The result range should be [1, 9999] as 0 is invalid (if we pick 0, the user
won't be able to reproduce the failure as he cannot specify 0 as the seed).
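
Putting the last three comments together, a rough sketch could look like the following. The timestamp is taken as a parameter here (an assumption made so the range can be checked without a clock); in the patch it would come from GetTimeInMillis().

```cpp
#include <cassert>
#include <cstdint>

// Largest seed the time-based default may produce.
const std::int32_t kMaxDefaultRandomSeed = 9999;

// Maps a non-negative millisecond timestamp to a seed in
// [1, kMaxDefaultRandomSeed].  The cast encloses the entire expression, and
// the "+ 1" keeps the result away from 0, which is not a reproducible seed.
std::int32_t GetRandomSeedFromTime(std::int64_t time_in_millis) {
  assert(time_in_millis >= 0);
  return static_cast<std::int32_t>(
      (time_in_millis % kMaxDefaultRandomSeed) + 1);
}
```

Every result is a value the user can pass back through the seed flag to rerun the same order.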

Josh Kelley

A general question: I assume that assertions are a good idea, even if existing code ...

16 years, 7 months ago (2009-07-10 02:16:44 UTC) #5

Zhanyong Wan

Thanks for keeping working on this, Josh! http://codereview.appspot.com/52057/diff/9001/9003 File test/gtest_list_tests_unittest.py (right): http://codereview.appspot.com/52057/diff/9001/9003#newcode57 Line 57: BarDeathTest. ...

16 years, 5 months ago (2009-09-22 05:46:24 UTC) #6

Josh Kelley

16 years, 5 months ago (2009-09-23 03:35:19 UTC) #7

http://codereview.appspot.com/52057/diff/9001/9003
File test/gtest_list_tests_unittest.py (right):

http://codereview.appspot.com/52057/diff/9001/9003#newcode57
Line 57: BarDeathTest.
On 2009/09/22 05:46:24, Zhanyong Wan wrote:
> Why this change?

This change (and the one in gtest_list_tests_unittest_.cc) was made so that a
Python test script could call gtest_list_tests_unittest_ and verify that death
tests were shuffled properly.  I reverted this change in my most recent patch
since you said you'd take care of the Python scripts.

http://codereview.appspot.com/52057/diff/11001/12003
File src/gtest-internal-inl.h (right):

http://codereview.appspot.com/52057/diff/11001/12003#newcode405
Line 405: GTEST_CHECK_(0 <= from && from <= size_)
On 2009/09/22 05:46:24, Zhanyong Wan wrote:
> from should be < size.

Since to == size is permitted (to shuffle the entire list), and since from == to
is permitted (shuffle an empty range), then from == size should be permitted. 
If from cannot equal size, then we'd have to add special cases to handle, e.g.,
shuffling a test suite containing only death test cases.

http://codereview.appspot.com/52057/diff/11001/12003#newcode420
Line 420: const int k = random->Generate(n - from) + from;  // from <= k < n
On 2009/09/22 05:46:24, Zhanyong Wan wrote:
> It's slightly more intuitive like this:
> 
> for (int range_width = to - from; range_width >= 2; range_width--) {
>   const int last_in_range = from + range_width - 1;
>   const int selected = from + random->Generate(range_width);
>   Swap(selected, last_in_range);
> }
> 
> - The termination condition is more intuitive: the range must contain >= 2
> elements to be worth shuffling.
> - Give the two elements being swapped meaningful names.
> 

I took the code with almost no modification from Wikipedia, which apparently
took the algorithm (including the variable names n and k) from Durstenfeld's
and Knuth's work.  I see how your approach is clearer for those new to the code,
but I would guess that people familiar with the algorithm or reading the
complete Wikipedia article would find Wikipedia's approach clearer.  Thoughts?

http://codereview.appspot.com/52057/diff/11001/12003#newcode976
Line 976: // Gets the random seed used at the start of the current test run.
On 2009/09/22 05:46:24, Zhanyong Wan wrote:
> Won't it be more useful for this to be the seed used at the beginning of the
> current test *iteration*?

My terminology was poor.  It's for the current iteration.

http://codereview.appspot.com/52057/diff/11001/12004
File src/gtest.cc (right):

http://codereview.appspot.com/52057/diff/11001/12004#newcode3703
Line 3703: random_(1),
On 2009/09/22 05:46:24, Zhanyong Wan wrote:
> Why 1?

A default c'tor for Random didn't really seem appropriate, so I needed some seed
to use until we got a random seed from flags or from the clock, and 0 is a
special case and not a valid seed, so I picked 1.  Although I guess seemingly
magic numbers are bad too.  What would be a better approach?

http://codereview.appspot.com/52057/diff/11001/12004#newcode3921
Line 3921: random()->Reseed(random_seed_);
On 2009/09/22 05:46:24, Zhanyong Wan wrote:
> If I see a failure in the 700-th iteration, how can I repro it without running
> the first 699 iterations?
> 
> I think we should restore the original test order at the end of an iteration,
> and then shuffle with a new seed.  This allows the user to reconstruct the
> test order in any iteration without running all iterations before it.

Shoot.  I failed to consider how each shuffling currently depends on the results
of the previous shuffle.  I'll have to fix that.
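
One possible shape for the fix, sketched with the standard library's std::mt19937 and std::shuffle swapped in for the Random class and list shuffle under review (so this is not the patch's code, and OrderForIteration is a hypothetical name): each iteration rebuilds the original order and shuffles it with that iteration's seed alone.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <random>
#include <vector>

// Returns the order in which tests 0..num_tests-1 would run for one iteration.
// The result depends only on num_tests and seed, never on earlier iterations.
std::vector<int> OrderForIteration(int num_tests, unsigned int seed) {
  assert(num_tests >= 0);
  std::vector<int> order(num_tests);
  std::iota(order.begin(), order.end(), 0);  // restore the original order
  std::mt19937 rng(seed);                    // reseed for this iteration only
  std::shuffle(order.begin(), order.end(), rng);
  return order;
}
```

Because no state carries over, a failure seen in a late iteration can be reproduced by calling this once with that iteration's seed.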

http://codereview.appspot.com/52057/diff/11001/12001
File test/gtest_unittest.cc (right):

http://codereview.appspot.com/52057/diff/11001/12001#newcode790
Line 790: static const int kVectorSize = 20;
On 2009/09/22 05:46:24, Zhanyong Wan wrote:
> Style: data members should be defined at the end of the class.

The style guide says constants go towards the beginning?

http://codereview.appspot.com/52057/diff/11001/12001#newcode840
Line 840: if (i != vector.GetElement(i)) {
On 2009/09/22 05:46:24, Zhanyong Wan wrote:
> This is more strict than "is sequential".  The sequence "2 3 4" and "1 2 3"
> (starting at location 2) are both sequential, but the function only returns
> true for the former.
> 
> Rename to RangeIsUnshuffled?

I was having trouble thinking of a good name.  RangeIsUnshuffled is good. 
Thanks.

http://codereview.appspot.com/52057/diff/11001/12001#newcode890
Line 890: EXPECT_EQ(kVectorSize - 1, vector_.GetElement(kVectorSize - 1));
On 2009/09/22 05:46:24, Zhanyong Wan wrote:
> Why this?

It's sort of a precondition, or a test of the test code, to make sure that the
list is set up as this test expects and that the assertions below won't
accidentally pass.

http://codereview.appspot.com/52057/diff/11001/12001#newcode898
Line 898: // there are no off-by-one problems in our shuffle algorithm.
On 2009/09/22 05:46:24, Zhanyong Wan wrote:
> I don't understand what problem the following two assertions are catching.

I wanted to make sure that the shuffle algorithm changes the first and last
elements of the range; if so, then it seems safe to assume that it shuffles the
entire range.  The Wikipedia article I was using as a reference had a warning
about common implementation errors such as an off-by-one mistake that could
cause the last element to be shuffled incorrectly.  Reading that prompted me to
add these tests, although the comments I wrote aren't very descriptive.  Sorry.

Zhanyong Wan

16 years, 5 months ago (2009-09-23 05:28:08 UTC) #8

http://codereview.appspot.com/52057/diff/11001/12003
File src/gtest-internal-inl.h (right):

http://codereview.appspot.com/52057/diff/11001/12003#newcode405
Line 405: GTEST_CHECK_(0 <= from && from <= size_)
On 2009/09/23 03:35:19, Josh Kelley wrote:
> On 2009/09/22 05:46:24, Zhanyong Wan wrote:
> > from should be < size.
> 
> Since to == size is permitted (to shuffle the entire list), and since from ==
> to is permitted (shuffle an empty range), then from == size should be
> permitted.  If from cannot equal size, then we'd have to add special cases to
> handle, e.g., shuffling a test suite containing only death test cases.

Makes sense!

http://codereview.appspot.com/52057/diff/11001/12003#newcode420
Line 420: const int k = random->Generate(n - from) + from;  // from <= k < n
On 2009/09/23 03:35:19, Josh Kelley wrote:
> On 2009/09/22 05:46:24, Zhanyong Wan wrote:
> > It's slightly more intuitive like this:
> > 
> > for (int range_width = to - from; range_width >= 2; range_width--) {
> >   const int last_in_range = from + range_width - 1;
> >   const int selected = from + random->Generate(range_width);
> >   Swap(selected, last_in_range);
> > }
> > 
> > - The termination condition is more intuitive: the range must contain >= 2
> > elements to be worth shuffling.
> > - Give the two elements being swapped meaningful names.
> > 
> 
> I took the code with almost no modification from Wikipedia, which apparently
> took the algorithm (including the variable names n and k) from Durstenfeld's
> and Knuth's work.  I see how your approach is clearer for those new to the
> code, but I would guess that people familiar with the algorithm or reading the
> complete Wikipedia article would find Wikipedia's approach clearer.  Thoughts?

The code should be as self-contained and self-explaining as possible.  It's a
distraction to ask the reader to open a web page and read it.

This algorithm is simple enough that one doesn't need to read the Wikipedia
entry to understand how it works.  We should optimize for people who don't read
the article.  Also, the code I wrote is implementing the same algorithm - just a
different encoding.  I don't think people who already know about the wiki
article or Knuth's work will be confused, unless they memorize it word by word
without understanding the essence of it - in that case I don't feel sorry for
them. :-)
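
For reference, here is the loop from my earlier comment as a self-contained sketch.  ToyRandom is a hypothetical stand-in for the Random class in the patch (its LCG constants are only illustrative), and ShuffleRange operates on a std::vector<int> instead of the real list type.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for the Random class under review.
class ToyRandom {
 public:
  explicit ToyRandom(std::uint32_t seed) : state_(seed) {}

  // Returns a value in [0, range).
  std::uint32_t Generate(std::uint32_t range) {
    assert(range > 0);
    state_ = (1103515245u * state_ + 12345u) % 0x80000000u;
    return state_ % range;
  }

 private:
  std::uint32_t state_;
};

// Shuffles the elements of *v at positions [from, to).  An empty range is
// allowed, so from may equal to, and both may equal the size of *v.
void ShuffleRange(ToyRandom* random, int from, int to, std::vector<int>* v) {
  const int size = static_cast<int>(v->size());
  assert(0 <= from && from <= size);
  assert(from <= to && to <= size);
  // The range must contain >= 2 elements to be worth shuffling.
  for (int range_width = to - from; range_width >= 2; range_width--) {
    const int last_in_range = from + range_width - 1;
    const int selected =
        from + static_cast<int>(
                   random->Generate(static_cast<std::uint32_t>(range_width)));
    std::swap((*v)[selected], (*v)[last_in_range]);
  }
}
```

Each pass picks one element from the still-unshuffled prefix of the range and moves it to the end of that prefix, which is the same algorithm as the n/k version, just with different names.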

http://codereview.appspot.com/52057/diff/11001/12004
File src/gtest.cc (right):

http://codereview.appspot.com/52057/diff/11001/12004#newcode3703
Line 3703: random_(1),
On 2009/09/23 03:35:19, Josh Kelley wrote:
> On 2009/09/22 05:46:24, Zhanyong Wan wrote:
> > Why 1?
> 
> A default c'tor for Random didn't really seem appropriate, so I needed some
> seed to use until we got a random seed from flags or from the clock, and 0 is a
> special case and not a valid seed, so I picked 1.  Although I guess seemingly
> magic numbers are bad too.  What would be a better approach?

The fact that 0 is not a seed usable by the user makes it a good choice here. 
This tells the reader "we are just initializing random_ with an invalid seed
here, and we'll reseed it later, so don't read too much into the seed value."

I'd suggest initializing it with 0 and adding a comment that we'll reseed it
before first use.

http://codereview.appspot.com/52057/diff/11001/12001
File test/gtest_unittest.cc (right):

http://codereview.appspot.com/52057/diff/11001/12001#newcode790
Line 790: static const int kVectorSize = 20;
On 2009/09/23 03:35:19, Josh Kelley wrote:
> On 2009/09/22 05:46:24, Zhanyong Wan wrote:
> > Style: data members should be defined at the end of the class.
> 
> The style guide says constants go towards the beginning?

You're absolutely right.  Ashamed that I didn't do my homework. :)

http://codereview.appspot.com/52057/diff/11001/12001#newcode890
Line 890: EXPECT_EQ(kVectorSize - 1, vector_.GetElement(kVectorSize - 1));
On 2009/09/23 03:35:19, Josh Kelley wrote:
> On 2009/09/22 05:46:24, Zhanyong Wan wrote:
> > Why this?
> 
> It's sort of a precondition, or a test of the test code, to make sure that the
> list is set up as this test expects and that the assertions below won't
> accidentally pass.

In general, we don't test the test code, as it obscures the intention of the
test.  (It confused me in this case, for example.)  Instead, the test code
itself should be obviously correct.  If that's not the case, we probably have a
bigger problem.

Pretty much all tests in VectorShuffleTest depend on vector_ being {0,1,2,...}.
We don't assert it elsewhere.  I don't think we need to make a special case
here.

Patch Set 1

Patch Set 2: test shuffling - flag parsing and management only

Patch Set 3: option to run the tests in a random order

Patch Set 4: option to run the tests in a random order

Patch Set 5: final patch implementing running tests in a random order
