I've used test.generative[1], data.generators[2], and most recently,
simple-check[3] to perform specification-based testing of various libraries
for some time, and I now consider such tools indispensable.
However, one thing that's never sat well with me about them is the
implicit objective of producing repeatable streams of randomized test data.
For example, the docs for the default java.util.Random instance in
data.generators[4] (which is initialized using a constant seed) read:
"Random instance for use in generators. By consistently using this
instance you can get a repeatable basis for tests."
There's certainly a tradition of defaulting to trying to obtain reliably
repeatable test data; e.g. Haskell's QuickCheck (with which I only have
extremely shallow/recent experience) also makes it easy to use the same
seed for pseudorandom data generation.
My question is, why? Perhaps someone can help explain why this is
considered a quality objective. A number of reasons for doing exactly
the opposite (i.e. always using a new seed for pseudorandom data
generation) occur to me:
* There are a number of circumstances where using the same seed will
nevertheless produce a different set of data for test run 2 than was
generated for test run 1:
- Implementation differences / bugs between JDKs / operating systems; I've
seen issues crop up in applications and libraries that depend upon a
particular pattern of randomized data (cite needed, I know, can't find the
references right now)
- Simply adding, removing, or changing what your tests or code under test
do will cause them to draw different random data in a different order
- Changes outside of the codebase can do the same, e.g. Leiningen changing
the order in which it tests namespaces, or testing a single specification
alone that was previously in the middle of a full run
* If the objective of repeatable test data is to effectively provide a
regression test (to use a unit testing term), it seems that a far more
efficient route would be to simply add previously-failing test data to a
retained file or other datastore, and use it as a prefix or suffix to
broader, fully randomized testing. This is particularly easy to do if you
are using a shrinker (as provided in simple-check), which can help to
significantly minimize the raw size of failing test cases, making them much
easier to handle in general (i.e. you keep a vector of three small strings
that fails, rather than having to cart around the vector of 400 massive
strings that originally triggered the fault).
* Finally, while you have the option of e.g. rebinding or bashing out
`clojure.data.generators/*rnd*` to use a `java.util.Random` with a fresh
seed on each test run, the "spirit" of specification/generative testing
would seem to call for that as the default: casting as wide a net as
possible to find failing cases, rather than retreading the same ground on
every run.
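To make the first point concrete, here is a minimal sketch of how a fixed seed
stops being "repeatable" as soon as anything upstream draws a different number
of values. It uses plain `java.util.Random` rather than any of the libraries
above, and `draw-ints`, `run-suite`, and the `:spec-a` / `:spec-b` names are
hypothetical stand-ins for real specifications sharing one generator:

```clojure
(import 'java.util.Random)

(defn draw-ints
  "Draws n ints in [0, 100) from the given Random."
  [^Random rnd n]
  (vec (repeatedly n #(.nextInt rnd 100))))

(defn run-suite
  "Runs hypothetical specs in order against ONE Random created from `seed`.
   `specs` is a vector of [spec-name number-of-draws] pairs.
   Returns a map of spec-name -> the data that spec saw."
  [seed specs]
  (let [rnd (Random. seed)]
    (reduce (fn [seen [spec-name n]]
              (assoc seen spec-name (draw-ints rnd n)))
            {}
            specs)))

;; Baseline: two specs, constant seed.
(def run-1 (run-suite 42 [[:spec-a 3] [:spec-b 3]]))

;; Same seed, but :spec-a now draws one extra value,
;; so everything :spec-b sees is shifted by one draw.
(def run-2 (run-suite 42 [[:spec-a 4] [:spec-b 3]]))

;; Same seed, but :spec-b is run on its own,
;; so it receives the data :spec-a saw in the baseline.
(def run-3 (run-suite 42 [[:spec-b 3]]))
```

Comparing `(:spec-b run-1)`, `(:spec-b run-2)`, and `(:spec-b run-3)` at the
REPL shows the effect: the seed never changed, yet `:spec-b` is not tested
against the same data twice.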
(tl;dr: repeatable randomized streams of data are fragile side effects of your
codebase and tools, manually-curated sets of regression test data may serve
the same purposes more efficiently, and, "Why not test more different datasets
rather than the same ones over and over?")
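As a rough sketch of the alternative I have in mind (not the API of
simple-check or test.generative; `check`, `gen-int-vector`, and the sample
regression data are all hypothetical), previously-failing inputs are replayed
first and fresh random inputs follow. In practice the `regressions` vector
would be read from a retained file; here it is inlined to keep the example
self-contained. Returning the seed alongside a failure is my own addition, so
that a single bad run could still be replayed if desired:

```clojure
(import 'java.util.Random)

(defn gen-int-vector
  "Hypothetical generator: a vector of 0 to 4 ints in [0, 10)."
  [^Random rnd]
  (vec (repeatedly (.nextInt rnd 5) #(.nextInt rnd 10))))

(defn check
  "Tests `prop` against the saved `regressions` first, then against `n`
   inputs produced by `(gen rnd)` using a fresh seed on every call.
   Returns nil if every input passes, otherwise a map holding the first
   failing input and the seed that was in use."
  [prop gen regressions n]
  (let [seed   (System/nanoTime)
        rnd    (Random. seed)
        inputs (concat regressions (repeatedly n #(gen rnd)))]
    (some (fn [input]
            (when-not (prop input)
              {:seed seed :failing-input input}))
          inputs)))

;; A small, shrunken failure kept from an earlier (imaginary) run.
(def regressions [[3 3]])

;; A property that holds for every vector: sorting ignores input order.
(defn sort-ignores-order? [v]
  (= (sort v) (sort (reverse v))))

(check sort-ignores-order? gen-int-vector regressions 100)
```

Because the regression inputs are always tried first, a reintroduced bug is
caught on every run regardless of the seed, while the remaining 100 inputs
differ from run to run.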
Perhaps this is better off as a blog post, but I'd love to hear some
perspectives on this here.
Thoughts?
Cheers,
- Chas
[1] https://github.com/clojure/test.generative
[2] https://github.com/clojure/data.generators
[3] https://github.com/reiddraper/simple-check
[4] https://github.com/clojure/data.generators/blob/master/src/main/clojure/clojure/data/generators.clj#L18