[not sure if this will make it through to the list]

On 06/09/16 21:35, Jack Moffitt wrote:
I haven't quite settled on my dissertation topic, but my top contender at
the moment involves property-based (i.e. QuickCheck style) generation of
random web pages/stylesheets.

A sort of subtask of this which would be extremely useful is taking a
known rendering problem and producing a minimal reproduction of it.
For example, many issues are discovered in existing pages with perhaps
hundreds of kilobytes of extraneous data. It would be nice to reduce
the failing example to a minimal size. One issue is how to make an
oracle here. It would probably be an improvement to have it be only
semi-automated, where it does some shrinking, then asks a human, and repeats.

There is some prior art here, e.g. [1]. I wrote a similar tool that was specialised to reducing JS code whilst at Opera, but that never seems to have been released. In both cases you either had to write a one-off function to determine whether the testcase was a pass or a fail, or have a human judge it. Obviously the latter is impractically slow if your input is large.
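
To make the shape of that concrete, below is a rough sketch of such a reduction loop (purely illustrative, not Lithium's actual algorithm). The oracle is just a callback, so it can be a one-off predicate or a prompt to a human, which is the semi-automated mode described above:

def reduce_testcase(lines, is_interesting):
    """Greedily drop chunks of `lines` while is_interesting() still holds."""
    chunk = len(lines) // 2 or 1
    while chunk >= 1:
        i = 0
        while i < len(lines):
            candidate = lines[:i] + lines[i + chunk:]
            if candidate and is_interesting(candidate):
                lines = candidate   # still reproduces; keep the smaller version
            else:
                i += chunk          # this chunk was needed, move past it
        chunk //= 2
    return lines

def ask_human(candidate):
    """Semi-automated oracle: show the candidate and ask whether it still fails."""
    print("\n".join(candidate))
    return input("Does it still reproduce? [y/n] ").strip().lower() == "y"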

The oracle would be a cluster of browsers
(multiple vendors/variants) driven by WebDriver/Selenium that would render
the test cases and screenshot them. Significant discrepancies between
renderings would be considered a failing test case and then standard
QuickCheck-style shrinking would be used to reduce the test case HTML/CSS
to a minimal-ish reproducer.

Each browser renders things slightly differently, so pixel by pixel
comparison across browsers is probably not going to work well. For our
own testing of this kind we instead produce the same result using two
different techniques, or in a few cases we make reference images.
However making reference images can't account for all rendering
differences (like text) and so we avoid it if possible. I imagine it
would be quite difficult if the reference image was from another
engine, not our own.

Yes, I imagine font rendering specifically will be a problem, along with antialiasing in general and legitimate per-CSS variation in properties such as outline.

However, I think you might make progress with some sort of consensus-based approach, e.g. take a testcase and render it in Gecko/Blink/WebKit/Edge. If the difference by some metric (e.g. number of differing pixels, although more sophisticated approaches are possible) is within some threshold, then check whether Servo is within the same threshold; if it is, consider that a pass, otherwise a fail.
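
As a very rough sketch of that check (Python with Selenium and Pillow; the differing-pixel count, the threshold value, and all the names here are placeholders, and something more perceptual would likely work better):

import io
from itertools import combinations
from PIL import Image, ImageChops   # pip install pillow

def screenshot(driver, test_url):
    # `driver` is any Selenium/WebDriver session (Firefox, Chrome, servodriver, ...);
    # screenshots are assumed to come back at the same size for every engine.
    driver.get(test_url)
    return Image.open(io.BytesIO(driver.get_screenshot_as_png())).convert("RGB")

def differing_pixels(a, b):
    diff = ImageChops.difference(a, b)
    return sum(1 for px in diff.getdata() if px != (0, 0, 0))

def consensus_fail(reference_shots, servo_shot, threshold=500):
    # Only trust the references if they agree with each other within the threshold.
    if any(differing_pixels(a, b) > threshold
           for a, b in combinations(reference_shots, 2)):
        return False   # no consensus among the reference engines; can't judge Servo
    # There is consensus; Servo fails if it falls outside the same threshold.
    return any(differing_pixels(ref, servo_shot) > threshold
               for ref in reference_shots)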

Is this idea of interest to the Servo team? Would it be useful for Servo
development/testing? Or perhaps redundant with existing testing I'm not
aware of?

The main kind of testing we do is reference testing where the
reference is the same content achieved by different means. This is
pretty robust to things like font rendering changing slightly between
versions. We have some JS-level testing where JS APIs are invoked and
the results verified, but it sounds like you are more focused on the
visual testing aspect. As an aside, I think quickchecking JS APIs is
likely to find a ton of bugs and be useful too, plus it probably
doesn't have the same oracle problem.

But this is also a good idea :)
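
For what it's worth, here's a hypothetical sketch of what quickchecking a JS API could look like, using Python's hypothesis library to generate inputs and WebDriver to evaluate them in the browser; the round-trip property and the API under test are just placeholders:

from hypothesis import given, settings, strategies as st

def check_textcontent_roundtrip(driver):
    # `driver` is a Selenium/WebDriver session pointed at any page.
    @settings(deadline=None, max_examples=200)   # browser round-trips are slow
    @given(st.text())
    def prop(s):
        result = driver.execute_script(
            "var el = document.createElement('div');"
            "el.textContent = arguments[0];"
            "return el.textContent;", s)
        # Property: setting and reading back textContent preserves the string.
        assert result == s, "textContent did not round-trip for %r" % s

    prop()   # hypothesis shrinks any failing input to a minimal example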

[1] http://www.squarefree.com/2007/09/15/introducing-lithium-a-testcase-reduction-tool/
