In part 1
I started my naive investigation into how to apply machine learning to
make visual regression tests (VRT) better.
I described the problem to solve, explored Keras very superficially, and also touched on the complexity
of doing ML myself, as opposed to having colleagues who are experts and who throw around phrases
like "train a model" and "predict".
Oh boy, did I underestimate this.
Keras - A Deep Learning API
The above paragraph is gibberish? Let's take a step back again.
Since Kamal had pointed me to Keras, I go with the flow: I trust his expertise and start reading up on what it is.
Keras is a deep learning API written in Python, running on top of the machine learning platform TensorFlow. It was developed with a focus on enabling fast experimentation. Being able to go from idea to result as fast as possible is key to doing good research.
- Prepare your data before training a model
- Do data preprocessing
- Build a model that turns your data into useful predictions
- Train your model
- Evaluate your model on a test data
- Speed up training by leveraging multiple GPUs.
- Refine your model
I guess I have to start taking some screenshots to do step 1, "Prepare data".
The next step is preprocessing data, the guide says:
In general, you should seek to do data preprocessing as part of your model as much as possible, not via an external data preprocessing pipeline.
On the other hand, this might mean a lot of data. Imagine every image has a million pixels: won't that be slow as hell? That was not clear to me from the guide, so I asked Kamal again:
Me: How do I preprocess my screenshots?
Kamal: Keras preprocessing will do that for you.
Me: I expect images to have many pixels and also varying sizes, do I have to preprocess those?
Kamal: The library takes care of it.
The answer came later in the guide too:
In Keras, you do in-model data preprocessing via preprocessing layers
The key advantage of using Keras preprocessing layers is that they can be included directly into your model, either during training or after training, which makes your models portable.
Makes sense to me, but it still feels like it will be computation intensive. Let's see. The guide then lists some code that looks readable, but what's under the hood is magic to me. Let me get through the process first; eventually it will reveal its magic, I have learned that. The alternative would be to go deep into the science behind it, but then I would not get done in the next two years ;).
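To put a rough number on the "computation intensive" worry, here is a back-of-envelope sketch. The screenshot sizes and the helper `image_bytes` are my own illustrative assumptions, not anything from the Keras guide; it only counts the memory needed to hold uncompressed pixels.

```python
# Back-of-envelope memory estimate for screenshots fed to a model.
# All sizes below are illustrative assumptions, not measurements.

def image_bytes(width: int, height: int, channels: int = 3, bytes_per_value: int = 1) -> int:
    """Bytes needed to hold one uncompressed image in memory."""
    return width * height * channels * bytes_per_value

# A hypothetical 1000x1000 RGB screenshot: one million pixels.
raw = image_bytes(1000, 1000)                          # stored as uint8
as_float = image_bytes(1000, 1000, bytes_per_value=4)  # converted to float32
batch = 32 * as_float                                  # one batch of 32 such images

# The same screenshot scaled down to 250x250 before it reaches the model.
small_batch = 32 * image_bytes(250, 250, bytes_per_value=4)

print(f"one screenshot (uint8):   {raw / 1e6:.0f} MB")
print(f"one screenshot (float32): {as_float / 1e6:.0f} MB")
print(f"batch of 32 (float32):    {batch / 1e6:.0f} MB")
print(f"batch of 32 at 250x250:   {small_batch / 1e6:.0f} MB")
```

So a million pixels per image is not free, but it is a matter of megabytes per image rather than something obviously hopeless, and shrinking the images cuts the cost quickly.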
This is just step three of the seven steps listed above.
A "layer" is a simple input-output transformation (such as the scaling & center-cropping transformations above).
You can think of a model as a "bigger layer" that encompasses multiple sublayers and that can be trained via exposure to data.
Sounds like Docker, hehe. Next, some code I understand:
    # "To build models with the Functional API, you start by specifying the shape"
    # Let's say we expect our inputs to be RGB images (3) of arbitrary size (None, None)
    inputs = keras.Input(shape=(None, None, 3))
Next are some details about building a model, which will have multiple inputs:
    model = keras.Model(inputs=inputs, outputs=outputs)
I will need to understand this better once I am coding it.
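The "layer" and "bigger layer" idea quoted above can be sketched without Keras at all. This is only my mental model, written with NumPy: the functions `rescale`, `center_crop` and `tiny_model` are hypothetical names, and unlike real Keras layers nothing here can be trained.

```python
import numpy as np

def rescale(image: np.ndarray) -> np.ndarray:
    """A 'layer': map pixel values from 0..255 to 0..1."""
    return image.astype("float32") / 255.0

def center_crop(image: np.ndarray, size: int) -> np.ndarray:
    """A 'layer': keep the central size x size square of an (H, W, C) image."""
    height, width = image.shape[:2]
    top = (height - size) // 2
    left = (width - size) // 2
    return image[top:top + size, left:left + size]

def tiny_model(image: np.ndarray) -> np.ndarray:
    """A 'bigger layer': the two layers above chained together."""
    return rescale(center_crop(image, size=150))

# A fake 300x400 RGB "screenshot" filled with random pixel values.
rng = np.random.default_rng(0)
screenshot = rng.integers(0, 256, size=(300, 400, 3), dtype=np.uint8)
result = tiny_model(screenshot)
print(result.shape, result.dtype)
```

Each function takes an array in and hands an array out, which is all "input-output transformation" seems to mean. What Keras adds on top, as far as I understand, is layers with weights that get adjusted during training.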
The next step is to train your model on your data.
    # "fit" the data to the model
    model.fit(numpy_array_of_samples, numpy_array_of_labels, batch_size=32, epochs=10)
Besides the data, you have to specify two key parameters: the batch_size and the number of epochs (iterations on the data). Here our data will get sliced on batches of 32 samples, and the model will iterate 10 times over the data during training.
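To make the batch and epoch numbers concrete, here is a small bit of arithmetic. The figure of 1000 screenshots is a made-up assumption; `batch_size=32` and `epochs=10` are the values from the guide. It also assumes that the last, smaller batch is still used, which is how I understand `fit` to treat NumPy arrays.

```python
import math

num_samples = 1000   # hypothetical number of screenshots
batch_size = 32      # value used in the guide
epochs = 10          # value used in the guide

# Each epoch walks over all samples once, one batch at a time.
steps_per_epoch = math.ceil(num_samples / batch_size)
# The last batch of an epoch only holds whatever is left over.
last_batch_size = num_samples - (steps_per_epoch - 1) * batch_size
# Total number of batches processed over the whole training run.
total_steps = steps_per_epoch * epochs

print(steps_per_epoch, last_batch_size, total_steps)
```

With these numbers the model would see 32 batches per epoch, the last one holding only 8 screenshots, and 320 batches in total.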
I am assuming that the labels are where I indicate right and wrong, but I am not sure. I mean, I will need to tell the machine which images are good and which ones are bad. Reading on, maybe the guide will reveal it soon.
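If that assumption holds, the two arrays passed to `fit` could look roughly like this. This is a guess at the data layout, not something the guide spells out for screenshots: the tiny 4x4 images, the 0/1 encoding and the variable names are all hypothetical.

```python
import numpy as np

# Hypothetical data set: 6 tiny 4x4 RGB "screenshots", shape (count, height, width, channels).
samples = np.zeros((6, 4, 4, 3), dtype="float32")

# One label per screenshot: 0 = looks fine, 1 = visual regression.
labels = np.array([0, 0, 1, 0, 1, 0], dtype="float32")

# Samples and labels are matched by position, so the counts must agree.
assert len(samples) == len(labels)
num_bad = int(labels.sum())
print(samples.shape, labels.shape, num_bad)
```

The important part is the pairing: the label at position 2 describes the screenshot at position 2, so the "good or bad" knowledge would live entirely in that second array.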
I am getting overwhelmed reading the next parts; I only understand half of what is going on. Eventually I will have to learn the underlying concepts, I feel.
    # I understand this ... though I assume I won't need any of
    # those when I have screenshots of a website.
    x = CenterCrop(...
    x = Rescaling(...
    # Now it gets tricky.
    x = layers.Conv2D(...
    x = layers.MaxPooling2D(...
    x = layers.GlobalAveragePooling2D(...
    outputs = layers.Dense(...
Oh my gosh. There is stuff in there where I have no idea what it means or how I would need to adjust it for my use case.
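For orientation, here is a small, complete model built from the same kinds of layers, with a one-line comment on what each one does as far as I understand it. This is a sketch, not the guide's code: the numbers (16 filters, 3x3 kernels, a single output score) are arbitrary assumptions, `CenterCrop` is left out, and the model is untrained, so its scores mean nothing yet. It assumes a TensorFlow 2.6+ installation.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

inputs = keras.Input(shape=(None, None, 3))         # RGB images of any size
x = layers.Rescaling(1.0 / 255)(inputs)             # pixel values 0..255 -> 0..1
x = layers.Conv2D(16, 3, activation="relu")(x)      # 16 learned 3x3 filters that react to local patterns
x = layers.MaxPooling2D(2)(x)                       # halve width and height, keep the strongest responses
x = layers.GlobalAveragePooling2D()(x)              # average each filter over the whole image -> 16 numbers
outputs = layers.Dense(1, activation="sigmoid")(x)  # combine them into one score between 0 and 1
model = keras.Model(inputs=inputs, outputs=outputs)

# Two fake 64x48 "screenshots" with random pixel values, just to run the model once.
batch = np.random.randint(0, 256, size=(2, 64, 48, 3)).astype("float32")
scores = model.predict(batch, verbose=0)
print(scores.shape)
```

One score per image comes out, which could later be read as "how likely is this screenshot a regression" once the model has been trained on labelled examples. The pooling layers are also what lets images of arbitrary size end up as a fixed-length list of numbers.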
Once you have defined the directed acyclic graph of layers
What did I do?
I found this video, https://www.youtube.com/watch?v=qFJeN9V1ZsI, which explains in three hours all the things I think I need to know. It also seems a better fit for my (low) state of knowledge.