crafting (and) JavaScript

Face Recognition, Part 1: What is it?

I am trying to manage our family pics so family members can find the ones they like, build calendars, show the right ones to others and so on. Face recognition is a part of it: to answer a request like "show me the best pics with A, B and C", I need to know who is on which pic. I have started and tried this many times, but I never got satisfying results. I believe the tooling now exists and the learning material is accessible to non-AI experts like me, so I just want to get it done and learn ML/AI on a topic that I care about.

This article (or series) just describes my learning path. I don't claim all terms and descriptions are fully correct, though I am trying to read and understand what things mean. I am no ML/AI expert, just a developer who wants to understand the tools I use. And I often get it wrong.


Some time ago I tried to dive into machine learning, or call it AI, as you like. I stopped learning and writing after the fourth part of that series of posts, since my focus had shifted to the new job back then. Now I think it was wrong to leave the topic that early. Damn it.

Back to face recognition. As described in the introduction I want to sort our family pics and part of it is knowing who is on which pic. I believe this is what face recognition is about, but the topic is still a bit blurry to me. Blurry in the sense of how it works and how I can use it exactly. Theoretically I get it, and it seems like a simple AI problem, but I want to get practical. So I believe I need to get a grip on the topic itself and understand what "face recognition" is.

Exploring Face Recognition

Over the last months I asked ChatGPT many times for tools, strategies and instructions on how to do face recognition right. I installed a lot of projects, libraries and tools, and they always felt like they were not doing the job well enough.
I think one of the first ones I came across was face.api. I tried to run it on a couple of images. I was still under the impression that I just throw images into a folder and it somehow does the magic by sorting faces together and letting me name them. I had a fuzzy picture of how it all works. For some reason it did not work out as I wanted, and I don't remember why. It might have been the bloat that comes with the project, I am not sure. I like to use slim, focused tools and libraries, and once I need megabytes of code to do one thing I get scared and feel dirty. Maybe that's what made me conclude this is not the way.

FaceDetector API, in the Browser

At the beginning of 2024 I came across the new browser API FaceDetector. It is part of an unofficial draft published by the Web Platform Incubator Community Group, so it's not a W3C standard (yet). Still, if you are interested in reading the specification, it is available as Accelerated Shape Detection in Images. In short, there is hope it's coming to the browser soonish. In some Chromium-based browsers it already works.

I was able to use it right away in Chrome Canary, and it told me where the faces are on a pic. The API is quite concise:

const faceDetector = new FaceDetector();
const faces = await faceDetector.detect($img);
faces.forEach(face => {
  const {top, left, width, height} = face.boundingBox;
  // use/draw the rect for one face
});

Try it out, in case your browser supports the API.
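Since the API is experimental, it may help to check for it before calling it. The following is a minimal sketch, not a definitive implementation: the helper names `supportsFaceDetector` and `detectFaceRects` are made up for illustration, and it assumes `detect()` resolves to objects with a `boundingBox`, as in the snippet above.

```javascript
// True when the (experimental) FaceDetector constructor exists in this environment.
const supportsFaceDetector = () => typeof globalThis.FaceDetector === 'function';

// Resolves to plain {top, left, width, height} rectangles, one per detected face.
// Resolves to an empty array where the API is missing, instead of throwing.
const detectFaceRects = async (img) => {
  if (!supportsFaceDetector()) return [];
  const faces = await new globalThis.FaceDetector().detect(img);
  return faces.map(({boundingBox}) => {
    const {top, left, width, height} = boundingBox;
    return {top, left, width, height};
  });
};
```

In a supporting browser, `detectFaceRects($img)` should then give back rectangles that can be drawn on a canvas or handed to other code.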

Concert crowd, image to find faces on
Image by Nicholas Green, on unsplash

I wrote a small gist, face-detector.js. Feel free to use it in the browser.

Finding the FaceDetector API came as a surprise to me, since I stumbled over it by accident. How can I move on from here? Doing all the heavy lifting in the browser does not really seem to be the solution. I would need to show each picture in the browser, let the browser run its face detection and send the result to a server. It might be possible to automate this using a tool like Puppeteer, but this does not feel like the right way. It's tempting, since the browser is where I am at home. But no, this way is a dead end, so I need to dig into server-side solutions. And besides, I can only detect faces so far; I can't yet match or compare them.

But I learned something important: face detection is part of face recognition.

Server-Side is what I need

So I kept digging, searching and reading, and not much later I posted that I had come across a bit of code that let me see the parts of face recognition that lead me to the goal. After long hours of searching repos, reading code and dismissing seemingly complex solutions, I found Gautam Singh's FaceComparison.py, which made me realize what I need.

What is Face Recognition?

No face recognition tool can know who is on a picture, of course. But it can tell me person1 is on pic1, pic3 and pic17. I can label the people, so person1 gets a name, and I can start searching for people on pictures. Until that moment I had thought I had to train a neural network with the pictures of people to make it learn who is who. I was quite wrong. Or better said, I don't necessarily need to do that.
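To make the labeling idea a bit more concrete, here is a small sketch of how such results could be indexed and searched. Everything in it is hypothetical: the shape of `detections`, the names in `labels` and the helpers `indexPicsByName` and `picsWithAll` are my own assumptions, not the output of any real tool.

```javascript
// Hypothetical result of a recognition step: which anonymous person is on which pic.
const detections = [
  {pic: 'pic1.jpg', personId: 'person1'},
  {pic: 'pic1.jpg', personId: 'person2'},
  {pic: 'pic3.jpg', personId: 'person1'},
  {pic: 'pic17.jpg', personId: 'person1'},
  {pic: 'pic17.jpg', personId: 'person2'},
];

// Labels I assign by hand, the tool itself cannot know the names.
const labels = new Map([['person1', 'Anna'], ['person2', 'Ben']]);

// Builds a Map from a name (or the raw id, if unlabeled) to the set of pics.
const indexPicsByName = (detections, labels) => {
  const index = new Map();
  for (const {pic, personId} of detections) {
    const name = labels.get(personId) ?? personId;
    if (!index.has(name)) index.set(name, new Set());
    index.get(name).add(pic);
  }
  return index;
};

// Returns the pics on which all the given people appear together.
const picsWithAll = (index, names) => {
  const [first, ...rest] = names.map((name) => index.get(name) ?? new Set());
  if (!first) return [];
  return [...first].filter((pic) => rest.every((pics) => pics.has(pic)));
};

const index = indexPicsByName(detections, labels);
console.log(picsWithAll(index, ['Anna', 'Ben'])); // [ 'pic1.jpg', 'pic17.jpg' ]
```

The recognition step only has to produce stable ids like person1. The names come from me, and the search is plain set intersection.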

So I got to play with FaceRecognition.py and I saw the function compare_faces(), which looked like it does what I may need. I modified the code to take two images of mine and spit out something useful. The print(listof) line pretty much at the bottom showed me results like these:

[(True, 0.46), (False, 0.78)]

Because I was running it inside a Docker container (I don't fancy trashing my computer with libraries I install and never use again) I had no visual output, so I got an error for the lines below that wanted to show the image and wait for a key press: cv2.imshow() and cv2.waitKey(). So I started modifying the code and learned by doing so. That's also how I like to approach problems: using code, refactoring, naming things and eventually seeing the pattern that had been hidden before.

In the source code I read for the first time something about "facial landmarks", the variable here was called landmarks. I can imagine what this is, but I prefer to know:

In computer science, landmark detection is the process of finding significant landmarks in an image.

says Wikipedia. It basically means finding the coordinates of the nose, lips, eyes, etc. and giving them back as numbers. Aha, very interesting.

So there is a step of face detection and finding landmarks, before even recognizing a face. That makes sense. Now I also know what the landmarks data that the in-browser FaceDetector API returns are good for 🤯.
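As a sketch of what working with those landmarks could look like: as far as I understand the draft, each detected face may carry a `landmarks` list whose entries have a `type` (such as "eye", "mouth" or "nose") and a list of `locations` with `x` and `y`. The helper names and the sample data below are made up for illustration.

```javascript
// Averages the points of one landmark into a single center point.
const centerOf = (locations) => {
  const sum = locations.reduce(
    (acc, {x, y}) => ({x: acc.x + x, y: acc.y + y}),
    {x: 0, y: 0},
  );
  return {x: sum.x / locations.length, y: sum.y / locations.length};
};

// Maps a detected face to one {type, center} entry per landmark.
// Landmarks without any points are skipped, a face without landmarks gives [].
const landmarkCenters = (face) =>
  (face.landmarks ?? [])
    .filter(({locations}) => locations.length > 0)
    .map(({type, locations}) => ({type, center: centerOf(locations)}));

// Hypothetical sample, shaped like what I assume the API returns.
const sampleFace = {
  landmarks: [
    {type: 'eye', locations: [{x: 10, y: 20}, {x: 14, y: 20}]},
    {type: 'nose', locations: [{x: 20, y: 30}]},
  ],
};

console.log(landmarkCenters(sampleFace));
```

For the sample this gives an eye centered at (12, 20) and a nose at (20, 30), which is the kind of "where is what" data the later steps can build on.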

Face Recognition is ...

Slowly Face Recognition starts making sense. It is split into multiple steps:

  1. Detection: Detecting a face and extracting it from an image.
  2. Identification: Make a face identifiable for example by its facial landmarks.
  3. Recognition: Comparing a face to other faces to find out how similar they are. Depending on that similarity, persons can be recognized.

By performing these three steps, I can recognize a face, thus achieving face recognition.

I don't understand yet how each of the steps works and how much machine learning is needed, but I am getting a feeling for it. I can imagine that step 3, comparing the faces, is a pure math problem: we might get some vectors, matrices or the like and "just" need to compare how alike they are. I can imagine the (True, 0.46) tells me something like that. The True surely means that two faces match, and the 0.46 might be how alike they are. So I need to find out more about this.
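Here is how I imagine such a comparison could work, purely as a guess. The sketch assumes each face has already been turned into a vector of numbers, that the second value of the result is a Euclidean distance (smaller means more alike, which would fit True going with 0.46 and False with 0.78), and that a tolerance of 0.6 separates match from no match. The function names and the tolerance are my assumptions, not taken from the script I used.

```javascript
// Euclidean distance between two equally long number vectors.
const euclideanDistance = (a, b) => {
  if (a.length !== b.length) throw new Error('Vectors must have the same length');
  return Math.hypot(...a.map((value, i) => value - b[i]));
};

// Returns [isMatch, distance], mirroring the shape of the (True, 0.46) tuples.
// The default tolerance of 0.6 is an assumption for illustration.
const compareFaces = (known, candidate, tolerance = 0.6) => {
  const distance = euclideanDistance(known, candidate);
  return [distance <= tolerance, distance];
};

// Tiny made-up vectors, a real face vector would have many more dimensions.
console.log(compareFaces([0, 0], [0.3, 0.4])); // a match, distance of about 0.5
console.log(compareFaces([0, 0], [3, 4]));     // no match, distance 5
```

If that guess is right, "recognizing" a face boils down to finding the known faces whose vectors are close enough.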

👏 Face recognition deciphered. And the term makes sense now. Now I feel I am on a path to enlightenment 🤩.