As Charlie learns to recognize faces and make educated guesses about age and gender, we want to share some things we learned during our implementation. As always, we open-sourced our test project for anyone to reproduce our findings. Feel free to shoot us a message if you have any questions.
It is not surprising that properly aligning and normalizing an input face dramatically improves face recognition accuracy. This has already been demonstrated in academic papers and in practice. For example, Guo et al. noted in their recent paper that face recognition accuracy improves from 95.4% to 97.1% with just five 2D facial landmarks and to 98.5% using 68 3D landmarks. For the sake of efficiency and framerate, we decided to use five facial landmarks to preprocess our face input.
What surprised us, however, is how much proper face alignment also helps with age detection.
Left: No alignment, Right: Aligned using 5 facial landmarks
Through our own testing, we observed that without alignment the age output is usually very inaccurate and fluctuates constantly by about 8 years on either side. The aligned output is much more accurate and stable. Without revealing how old I am, all I can say is that the age output on the right side is much closer to my actual age. The fluctuation is also smaller, at around 4 years on either side.
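For readers who want to try five-landmark alignment themselves, here is a minimal sketch of the idea, not the exact code from our test project. It fits a least-squares similarity transform (rotation, uniform scale, translation) that maps the detected landmarks onto a fixed template. The template coordinates below are an assumption: they are the 112x112 reference points commonly used with ArcFace-style models, and the function names are made up for illustration.

```python
# Assumed 112x112 template (x, y): left eye, right eye, nose,
# left mouth corner, right mouth corner. Not taken from the test project.
REFERENCE_5PTS = [
    (38.2946, 51.6963),
    (73.5318, 51.5014),
    (56.0252, 71.7366),
    (41.5493, 92.3655),
    (70.7299, 92.2041),
]


def estimate_similarity(src_pts, dst_pts=REFERENCE_5PTS):
    """Fit z' = a*z + b (complex a, b) in the least-squares sense.

    Treating each 2D point as a complex number makes a rotation plus
    uniform scale a single complex multiplication. Returns a 2x3 matrix
    (nested lists) in the usual affine-warp layout.
    """
    src = [complex(x, y) for x, y in src_pts]
    dst = [complex(x, y) for x, y in dst_pts]
    if len(src) != len(dst) or len(src) < 2:
        raise ValueError("need at least two matching point pairs")
    src_mean = sum(src) / len(src)
    dst_mean = sum(dst) / len(dst)
    # Closed-form solution on the centered points.
    num = sum((s - src_mean).conjugate() * (d - dst_mean)
              for s, d in zip(src, dst))
    den = sum(abs(s - src_mean) ** 2 for s in src)
    if den == 0:
        raise ValueError("source landmarks are all identical")
    a = num / den
    b = dst_mean - a * src_mean
    return [[a.real, -a.imag, b.real],
            [a.imag, a.real, b.imag]]


def apply_transform(matrix, point):
    """Apply a 2x3 affine matrix to a single (x, y) point."""
    x, y = point
    return (matrix[0][0] * x + matrix[0][1] * y + matrix[0][2],
            matrix[1][0] * x + matrix[1][1] * y + matrix[1][2])
```

The resulting 2x3 matrix has the shape that OpenCV's `cv2.warpAffine` expects, so it can be used to warp the face crop to 112x112 before it is passed to the recognition or age model.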
Because the OAK-D is a stereo camera, we realized that it would be relatively easy to perform liveness detection. We can potentially use the OAK-D's left grayscale image to perform face detection and then find the corresponding depth values for each of the facial landmarks. By comparing these depth values, we can detect whether the input face image is printed on a piece of paper or a "real" face. This is something that we may implement in the future to make Charlie less prone to being fooled.
Additionally, we can also beef up our liveness detection by asking the subject to smile, make a sad face, and so on. Then, we can use an emotion recognition model to observe the changes in emotion. If the change is too abrupt, then the face is probably fake.
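To make the depth idea concrete, here is a rough sketch of one possible check. We have not implemented this yet, so treat it as an illustration only. It assumes we already have a depth reading in millimeters at each of the five landmarks, that a reading of 0 means "no valid depth", and that on a real face the nose tip sits measurably closer to the camera than the eyes and mouth corners, while on a flat printout it does not. The function names, dictionary keys, and the 10 mm threshold are all hypothetical and would need tuning against real depth noise.

```python
def nose_relief_mm(depth_mm):
    """How much closer the nose is than the average of the other landmarks.

    depth_mm: dict with keys 'left_eye', 'right_eye', 'nose',
    'mouth_left', 'mouth_right', each a depth in millimeters.
    Returns None if any reading is invalid (assumed to be <= 0).
    """
    others = [depth_mm[k] for k in
              ("left_eye", "right_eye", "mouth_left", "mouth_right")]
    nose = depth_mm["nose"]
    if nose <= 0 or any(d <= 0 for d in others):
        return None
    return sum(others) / len(others) - nose


def looks_live(depth_mm, min_relief_mm=10.0):
    """True if the nose sticks out by at least min_relief_mm (assumed value)."""
    relief = nose_relief_mm(depth_mm)
    return relief is not None and relief >= min_relief_mm


# A flat sheet held 500 mm away versus a face whose nose is 20 mm closer.
printed = {"left_eye": 500, "right_eye": 500, "nose": 500,
           "mouth_left": 500, "mouth_right": 500}
real = {"left_eye": 505, "right_eye": 505, "nose": 485,
        "mouth_left": 505, "mouth_right": 505}
print(looks_live(printed), looks_live(real))
```

Averaging the four outer landmarks gives some tolerance to a tilted sheet, since the nose sits roughly in the middle of them, but a proper plane fit over more landmarks would be more robust.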
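The challenge idea could look something like the sketch below. Again, this is hypothetical and not something Charlie does today. It assumes an emotion model gives us a per-frame "happy" probability between 0 and 1 after we ask the subject to smile, and it passes only if the score starts low, reaches a target level, and never jumps too far between consecutive frames. The function name and both thresholds are made-up values for illustration.

```python
def passes_smile_challenge(happy_scores, reach=0.6, max_jump=0.5):
    """Check a sequence of per-frame 'happy' probabilities.

    reach:    score the subject must hit at some point (assumed value)
    max_jump: largest allowed frame-to-frame change (assumed value)
    """
    if len(happy_scores) < 2:
        return False
    started_neutral = happy_scores[0] < reach
    reached = max(happy_scores) >= reach
    jumps = [abs(b - a) for a, b in zip(happy_scores, happy_scores[1:])]
    gradual = max(jumps) <= max_jump
    return started_neutral and reached and gradual


print(passes_smile_challenge([0.1, 0.3, 0.5, 0.7, 0.8]))  # smooth smile
print(passes_smile_challenge([0.1, 0.1, 0.9]))            # abrupt swap
```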
Guo J., Deng J., Xue N., and Zafeiriou S., "Stacked Dense U-Nets with Dual Transformers for Robust Face Alignment", arXiv preprint arXiv:1812.01936, 2018.