ML for VRT - Part 1 (was: Machine Learning vs. Screenshot Comparing?)

Wolfram Kriesing - August 4, 2020 - tagged with: #testing #machine learning #automation #visual regression test #tidbit

I broke this site, and thanks to @Holger reporting the error I figured out I should have done more testing instead of just tweeting that I should :).

I can create golden master tests now, that screenshot all the pages of my blog now, I refactor and should end up with the same screenshots.
And then I can do the changes that I want to do, which will be simple after the refactor.
I should!
— @wolframkriesing August 2, 2020

Machine Learning FTW?

So I asked @Kamal, a great colleague from whom I learned a lot about Machine Learning, as I did from other really skilled colleagues at HolidayCheck (sad day yesterday).

I proposed an idea to Kamal: what about discovering bugs in the rendering of my website using some simple machine learning (not sure if such a thing exists)? This was the rendering issue Holger pointed out yesterday:

What is wrong with the screenshot above? Well, I want it:

to not wrap the navigation at the top
to have a margin on the left of the text
to have both columns be equal, or in a portrait mode just be a single column.

His first suggestion was unexpected for me, since I just don't have much experience with ML: use black and white images and let some machine learning do the job. He said it as if this was totally logical.
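To make the black and white idea a bit more concrete, here is a minimal sketch of turning colored pixels into single brightness values. It assumes a screenshot is already available as rows of `(r, g, b)` tuples. The helper name `to_grayscale` and the tiny sample image are made up for illustration, and the weights are the commonly used BT.601 luma weights, not something Kamal prescribed.

```python
# Hypothetical helper: convert an RGB image (rows of (r, g, b) tuples, 0-255)
# into a grayscale image (rows of ints, 0-255).
def to_grayscale(rgb_rows):
    gray_rows = []
    for row in rgb_rows:
        # Weighted sum per pixel, using the common BT.601 luma weights.
        gray_rows.append(
            [round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in row]
        )
    return gray_rows


# A made-up 2x2 "screenshot": white, black, pure red, mid gray.
sample = [
    [(255, 255, 255), (0, 0, 0)],
    [(255, 0, 0), (128, 128, 128)],
]
gray = to_grayscale(sample)
print(gray)
```

One value per pixel instead of three means less input for a model to chew on, which is presumably part of why this is a common first step.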

Why Apply Machine Learning Here?

Ok, let me take a step back.

If all of the above was too fast, let me explain how I came to the conclusion that ML can help here. As I had learned in AI for everyone (which Kamal suggested to me), as a rule of thumb every problem that we humans can spot in one second or less can also be solved by an AI (or here, ML).

I was able to find the issue on my site in less than one second, and since I had seen amazing things being done by machine learning, I thought this might be a good fit and something I could potentially tackle. It is a problem to have fun with and learn ML with. I also had been postponing screenshot testing for my own site all the time, because over the last 10 years I have spent endless hours on screenshot testing, adjusting thresholds, reviewing errors and flaky tests, and so have others.

Finally there was a glimpse of hope to solve this problem with a new approach. I don't want to:

continuously adjust master screenshots
find visual problems that are no real problems
compare images manually to figure out it was just a tiny diff in the font rendering
and there are many other issues with screenshot comparing imho.

Besides the above, I am convinced these kinds of VRTs should not block a deployment pipeline but should run after the deploy, since they are (still) slow. But that is just a side note.
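To illustrate the threshold fiddling that makes classic screenshot comparing so tiring, here is a minimal sketch of a pixel diff. It assumes two grayscale images of the same size, given as rows of ints. The helpers `diff_ratio` and `looks_same` and the sample images are hypothetical, and real tools are more sophisticated than this.

```python
# Hypothetical helper: fraction of pixels that differ by more than
# `pixel_tolerance` between two same-sized grayscale images (rows of ints).
def diff_ratio(master, candidate, pixel_tolerance=0):
    total = 0
    changed = 0
    for master_row, candidate_row in zip(master, candidate):
        for a, b in zip(master_row, candidate_row):
            total += 1
            if abs(a - b) > pixel_tolerance:
                changed += 1
    return changed / total


def looks_same(master, candidate, max_ratio, pixel_tolerance=0):
    return diff_ratio(master, candidate, pixel_tolerance) <= max_ratio


master = [[255, 255, 0, 0], [255, 255, 0, 0]]
# Same layout, one pixel rendered slightly differently (think font rendering).
font_noise = [[255, 250, 0, 0], [255, 255, 0, 0]]
# Real breakage: the dark right half of the layout is gone.
broken = [[255, 255, 255, 255], [255, 255, 255, 255]]

# Too strict: harmless rendering noise fails the test.
print(looks_same(master, font_noise, max_ratio=0.0))  # False
# Too loose: the real breakage passes.
print(looks_same(master, broken, max_ratio=0.5))  # True
```

No single `max_ratio` is right for every page, so the threshold ends up being adjusted again and again.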

Learning if ML fits

Kamal suggested Keras: "it will get you started", he says. He also pointed me to this for understanding: Image classification from scratch. What do all these functions mean? (see Build the model)

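from tensorflow import keras
from tensorflow.keras import layers

# Assumed values for illustration (the MNIST digits setup): adjust to your data.
num_classes = 10
input_shape = (28, 28, 1)  # height, width, channels (1 = grayscale)

model = keras.Sequential(
    [
        keras.Input(shape=input_shape),
        # Slide 32 learnable 3x3 filters over the image to detect local patterns.
        layers.Conv2D(32, kernel_size=(3, 3), activation="relu"),
        # Keep the strongest value in each 2x2 block, halving width and height.
        layers.MaxPooling2D(pool_size=(2, 2)),
        layers.Conv2D(64, kernel_size=(3, 3), activation="relu"),
        layers.MaxPooling2D(pool_size=(2, 2)),
        # Turn the 3D feature maps into one flat vector.
        layers.Flatten(),
        # Randomly zero half of the values during training to reduce overfitting.
        layers.Dropout(0.5),
        # One output per class; softmax turns them into probabilities.
        layers.Dense(num_classes, activation="softmax"),
    ]
)

# Print each layer with its output shape and parameter count.
model.summary()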

I am all lost. I thought it was as simple as throwing some images somewhere and calling run(). I was naive. Kamal says "maybe this helps": Introduction to Keras for Engineers.
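One way to get a feel for what the layers do is to follow the shape of the data through them. The sketch below does that in plain Python, without Keras. It assumes an input of 28x28 pixels with one channel and 10 classes (illustrative values, as in the MNIST digits dataset), and the helper names are made up. It relies on the Keras defaults for these layers: `Conv2D` without padding shrinks each side by `kernel_size - 1`, and `MaxPooling2D` with `pool_size=(2, 2)` halves each side, rounding down. `Dropout` changes neither the shape nor the parameter count, so it is left out.

```python
def conv2d(shape, filters, kernel=3):
    height, width, channels = shape
    out = (height - kernel + 1, width - kernel + 1, filters)
    # One kernel x kernel x channels block of weights per filter, plus one bias each.
    params = kernel * kernel * channels * filters + filters
    return out, params


def max_pool(shape, pool=2):
    height, width, channels = shape
    return (height // pool, width // pool, channels)


def flatten(shape):
    height, width, channels = shape
    return height * width * channels


def dense(inputs, units):
    # Every input is connected to every unit, plus one bias per unit.
    return units, inputs * units + units


input_shape = (28, 28, 1)  # assumed: 28x28 grayscale image
num_classes = 10  # assumed: 10 classes

shape, conv1_params = conv2d(input_shape, 32)  # (26, 26, 32)
shape = max_pool(shape)  # (13, 13, 32)
shape, conv2_params = conv2d(shape, 64)  # (11, 11, 64)
shape = max_pool(shape)  # (5, 5, 64)
features = flatten(shape)  # 1600
outputs, dense_params = dense(features, num_classes)
total_params = conv1_params + conv2_params + dense_params

print(shape, features, total_params)  # (5, 5, 64) 1600 34826
```

For a right/wrong screenshot classifier, `num_classes` would presumably be 2, and the input shape would follow from the screenshot size.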

The Plan to Solve my Problem

Leaving all the ML internals aside, let me try to explain my current plan for using ML to solve this problem.

I now have a plan for how to create the images to train the model with. I would generate them from my site:

images always for different screen sizes, 1024x768, 1400x1900, etc.
images with the latest blog posts
images with older blog posts (1st...10th, 2nd...11th, 3rd...12th, ...) to get different-looking pages
images for the different media-queries (portrait, landscape)
images for different font sizes (since you can configure that in your browser)
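The list above is basically a matrix of variants, which can be sketched like this. The font sizes, the number of posts and the posts per page are assumed example values, and the names `post_windows`, `orientation` and `variants` are made up. The sketch only enumerates what to capture; taking the actual screenshots (e.g. with a headless browser) is not shown.

```python
from itertools import product

screen_sizes = [(1024, 768), (1400, 1900)]  # (width, height), taken from the list
font_sizes_px = [16, 20]  # assumed example values
total_posts = 12  # assumed
posts_per_page = 10  # assumed


def post_windows(total, per_page):
    """Sliding windows of 1-based post positions: 1..10, 2..11, 3..12, ..."""
    return [
        (start, start + per_page - 1)
        for start in range(1, total - per_page + 2)
    ]


def orientation(width, height):
    # Same rule as the CSS orientation media query: portrait if height >= width.
    return "portrait" if height >= width else "landscape"


variants = [
    {
        "width": width,
        "height": height,
        "orientation": orientation(width, height),
        "font_size_px": font_size,
        "first_post": first,
        "last_post": last,
    }
    for (width, height), font_size, (first, last) in product(
        screen_sizes, font_sizes_px, post_windows(total_posts, posts_per_page)
    )
]

print(len(variants))  # 12 = 2 screen sizes x 2 font sizes x 3 post windows
```

Every additional dimension multiplies the number of screenshots, so the set of training images grows quickly.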

Of course I need right and wrong images, and I need to tag them accordingly first, so the model can learn. Should it be 50/50? Ideally it should be, but it doesn't have to be.
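Whether the tagged images are anywhere near 50/50 is easy to check. The sketch below counts the labels and, as one common way to compensate for an imbalance, computes a weight per label that is inversely proportional to its frequency. The label names `ok` and `broken` and both helper names are assumptions for illustration.

```python
from collections import Counter


def label_balance(labels):
    """Return the share of each label, e.g. {"ok": 0.75, "broken": 0.25}."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


def balancing_weights(labels):
    """Weight each label as total / (number_of_labels * count), so that
    rare labels count more and every label has the same total weight."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {
        label: total / (len(counts) * count)
        for label, count in counts.items()
    }


# Made-up example: 6 screenshots tagged as fine, 2 tagged as broken.
labels = ["ok"] * 6 + ["broken"] * 2

print(label_balance(labels))
print(balancing_weights(labels))
```

Keras's `fit` accepts a `class_weight` argument for this kind of weighting, keyed by class index rather than by label name, so a mapping from names to indices would still be needed.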

What's Next?

Read part 2 about what I tackled next and how I started going through the Keras tutorial, hoping to learn how to train my model.

I still have questions like:

Will it really be "easy" and possible to train a model to "understand" my screenshots?
Is this task not too big?
How do I know ML does the task well?
When will I see useful results?

If you are curious, read Part 2 about learning Keras.