My journey into the world of IoT began three years ago, when Microsoft awarded me an Intel Galileo for my project idea for the Windows Development Program for IoT. Not long after that I was hooked on IoT development and had many projects planned out, in particular LinGalileo, now known as TASS. The plan for LinGalileo was ambitious: a live stream from an integrated webcam using OpenCV, facial recognition, assistance from TOA (the TechBubble Online Assistant), proximity, sound and heat sensor integration, a 3D-printed casing made on the Micro 3D printer I was a beta tester for, integration with what was then known as oIsCore (Online Intelligent Systems Core), now the TechBubble A.I. Platform, and everything controlled via the TechBubble GUI.
As I was a lone developer at that point, development was slow and steady: work on LinGalileo gradually progressed, TOA advanced, and the TechBubble GUI started to take shape. At the time I was very interested in Artificial Intelligence, thanks to my buddy and mentor in the early days, Shawn E. Carter, creator of Annabot and News By Shawn, a news platform powered by A.I. Looking back at my A.I. project of the time, oIsCPP, it seems almost laughable compared to my current work, but even in those early days I was fully aware of how A.I. would shape the future of oIsCore and TechBubble, and indeed the world.
Fast-forwarding a year, I had slightly changed the direction of LinGalileo and created TASS, the TechBubble Assisted Security System. Many of the early features were now in place: voice recognition and synthesis through TOA, CCTV using OpenCV, and everything managed by the TechBubble GUI.
Jumping forward another year, I had left the project behind due to workload and a focus on the A.I. platform, but a chance arose to revive it. My landlady's house was broken into one night while her family slept. Knowing that I was involved with home automation, she asked me if I could develop a system to make her home more secure, and of course I had just the thing. I began work on the project again, and within a few weeks I had a prototype and application ready.
THE IOT SOLUTIONS WORLD CONGRESS AND INTEL / MICROSOFT HACKATHON:
Not long after that, in October last year, I visited the IoT Solutions World Congress and took part in the Intel / Microsoft Hackathon. There I joined a team led by Amir H. Bakhtiary, a former Google and UAB intern and current doctoral researcher at the Open University of Catalonia, where he studies the qualities that make datasets suitable for deep learning. Our team developed Project H.E.R., which won the Intel Experts Award at the hackathon for building a deep learning neural network on an Intel Joule; we were the first in the world to accomplish this.
After the hackathon the original team gradually went their separate ways, with the exception of Katerina Zalamova, now the CMO of TechBubble Technologies Limited. Two friends and very skilled developers I had met through the HarvardX CS50x course, Andrej Petelin and Irene Naya, finally joined the TechBubble team, and we set about pushing forward the integration of A.I. and IoT under TechBubble Technologies, more specifically TASS.
Our first implementation was fairly close to the original version of TASS, although this time we added local training of an eigenface model, using Haar cascades for face detection, on a Raspberry Pi, triggered by MQTT messages sent through the TechBubble IoT JumpWay. For a week or so we were quite happy with the progress, but the accuracy was just not good enough. Having previously taken the Stanford Machine Learning course on Coursera, I was eager to put what I had learnt to the test, so I began researching machine learning libraries that could make our device more accurate. Computer vision is still very much a research subject, even more so when you are training a neural network on a device not much bigger than a credit card. Our work was cut out for us, but we accepted the challenge and dived in head first.
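Under the hood, the eigenface approach is principal component analysis on flattened face images followed by nearest-neighbour matching in the reduced space. A minimal numpy sketch of the idea, using synthetic data in place of real cropped faces (the shapes and class counts here are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for a training set: 20 grayscale "faces" of 32x32 pixels,
# flattened to row vectors (real eigenface training uses cropped face images).
faces = rng.normal(size=(20, 32 * 32))
labels = np.repeat(np.arange(4), 5)          # 4 people, 5 images each

# 1. Centre the data around the mean face.
mean_face = faces.mean(axis=0)
centred = faces - mean_face

# 2. PCA via SVD: the top right-singular vectors are the "eigenfaces".
_, _, vt = np.linalg.svd(centred, full_matrices=False)
eigenfaces = vt[:10]                         # keep the 10 strongest components

# 3. Project every training image into eigenface space.
train_weights = centred @ eigenfaces.T

def predict(image):
    """Nearest-neighbour match in eigenface space."""
    w = (image - mean_face) @ eigenfaces.T
    distances = np.linalg.norm(train_weights - w, axis=1)
    return labels[np.argmin(distances)]

print(predict(faces[7]), labels[7])
```

OpenCV wraps essentially this pipeline behind its face-recognition API, with the Haar cascade handling the detection and cropping step beforehand.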
After some research I decided to move towards TensorFlow, more specifically the Inception V3 model. With this model you can use what is called transfer learning to retrain the final layer of the network on your own dataset. Google's Inception V3 is a Convolutional Neural Network trained on the ImageNet dataset; to put it in easy to understand terms, it was trained for weeks on some of the world's fastest GPUs, resulting in a very accurate model that can classify images into classes. Through transfer learning, you can train on your own image dataset in a matter of minutes or hours on a CPU and still achieve high classification accuracy.
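Transfer learning is cheap because the pretrained network is frozen and used only as a feature extractor; the only thing trained is a fresh softmax layer on top of its last-layer "bottleneck" features. A toy numpy sketch of that idea, with a random projection standing in for the frozen Inception forward pass and synthetic clustered data standing in for images (all shapes and values here are illustrative, not the real model's):

```python
import numpy as np

rng = np.random.default_rng(1)

n_classes, feat_dim = 3, 64
frozen_weights = rng.normal(size=(100, feat_dim))

def bottleneck_features(images):
    # Stand-in for the frozen pretrained network's forward pass: its
    # weights are fixed and are never updated during retraining.
    return np.tanh(images @ frozen_weights)

# Synthetic "images": 3 classes, each clustered around its own prototype.
prototypes = rng.normal(size=(n_classes, 100))
images = np.vstack([prototypes[c] + 0.1 * rng.normal(size=(30, 100))
                    for c in range(n_classes)])
labels = np.repeat(np.arange(n_classes), 30)

# Train ONLY the new final layer, with gradient descent on cross-entropy.
x = bottleneck_features(images)
w = np.zeros((feat_dim, n_classes))
onehot = np.eye(n_classes)[labels]
for _ in range(200):
    logits = x @ w
    probs = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs /= probs.sum(axis=1, keepdims=True)
    w -= 0.1 * x.T @ (probs - onehot) / len(x)   # frozen layers never change

accuracy = (np.argmax(x @ w, axis=1) == labels).mean()
print(f"training accuracy: {accuracy:.2f}")
```

Because only one small weight matrix is being fitted, this converges in seconds even on a CPU, which is exactly why retraining Inception's final layer is feasible on modest hardware.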
On the 23rd of December, after being out with friends for a meal and in a "merry" state, I decided to take the plunge and went on a 24-hour coding spree, with the intention of not only classifying human faces using the Inception model, but also carrying out the transfer learning directly on a Raspberry Pi. In the early hours of Xmas Eve, I had successfully created a system on a Raspberry Pi 3 that could take a dataset of photos of me and Arnold Schwarzenegger and classify between the two of us with almost 100% accuracy. The final layer of the Inception model uses the softmax activation function, which converts the network's raw scores into a probability for each class; TensorFlow's implementation reports the five highest-scoring classes, and across all classes the probabilities sum to 1, or 100%.
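Softmax itself is a few lines of numpy. The logits below are made-up scores for a hypothetical seven-class model, just to show the probabilities summing to 1 and the top-5 report:

```python
import numpy as np

def softmax(logits):
    # Subtract the max before exponentiating, for numerical stability.
    shifted = np.exp(logits - np.max(logits))
    return shifted / shifted.sum()

# Hypothetical raw scores for seven classes.
logits = np.array([2.0, 1.0, 0.5, 3.2, -1.0, 0.0, 1.5])
probs = softmax(logits)

# The probabilities over ALL classes sum to 1 ...
print(probs.sum())

# ... and the top-5 report is simply the five largest.
top5 = np.argsort(probs)[::-1][:5]
for i in top5:
    print(f"class {i}: {probs[i]:.3f}")
```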
For our solution I had made a dataset of images of myself and Arnold Schwarzenegger, labelled in my case with my TechBubble user ID and in Arnold's case with his name. On correctly identifying me, the device would send an MQTT message through the TechBubble IoT JumpWay, allowing the platform to recognise me and send autonomous commands to a few IoT devices we had developed, such as TAL, the IoT light I had demonstrated TOA controlling earlier in the year. For the first few days I was going off the fact that the system could distinguish me from Arnie, or any person it had been trained on, with amazing accuracy, forgetting to test what the outcome would be if I passed it an image from a class it had not been trained on. When I realised this, tested and saw the outcome, it revealed a rather large issue with our project: the model was still classifying between myself and other trained people with almost 100% accuracy, but it was also classifying a panda, Donald Trump and Theresa May as me.
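The panda problem follows directly from how softmax works: with a fixed, closed set of classes, 100% of the probability mass must be split among those classes, so an image belonging to none of them can still produce a confident-looking prediction. A confidence threshold is one common (and imperfect) mitigation. A small sketch with made-up logits for a two-class model:

```python
import numpy as np

def softmax(logits):
    shifted = np.exp(logits - np.max(logits))
    return shifted / shifted.sum()

# Only two known classes - the logit values below are invented
# purely to illustrate the closed-set problem.
known = ["me", "arnold"]

# A genuine photo of "me": one logit clearly dominates.
me_probs = softmax(np.array([4.0, 0.5]))

# A panda: there is no "panda" class, so the 100% still has to be
# divided between "me" and "arnold" - and it can look confident.
panda_probs = softmax(np.array([2.5, 0.2]))   # ~91% "me"

def classify(probs, threshold=0.95):
    """Reject any prediction whose top probability is below the threshold."""
    best = int(np.argmax(probs))
    return known[best] if probs[best] >= threshold else "unknown"

print(classify(me_probs))
print(classify(panda_probs))
```

Open-set recognition, i.e. reliably saying "none of the above", remains an active research problem, which is why thresholds alone only go so far.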
At this point I had already promoted the progress massively, so I spent the rest of the Xmas and New Year holidays reviewing my training notes and trying several methods to rectify the issue: training with more data, adding thresholds, swapping activation functions, SVMs, and MNIST and CIFAR-10 implementations (which were actually pretty successful on the RPI, but that is a conversation for another day), along with a number of other development boards and languages. According to the internet, we were the first to successfully carry out transfer learning on a Raspberry Pi, and the progress the team had made in the last two weeks under very pressured circumstances was in itself a huge accomplishment, but the setback worried me a lot.
After frantically trying to solve a problem that is still a known boundary in the world of Artificial Intelligence and machine learning, let alone in the world of embedded devices, I remembered some things Amir had said to me during our 24-hour coding spree at the hackathon:
Life is hard!
DUDE, this is still research!
Today I focused on fixing the existing issue, rather than searching for every answer except the one to our main, and very annoying, fault (we still have some other minor issues). I put my attention back on our Inception implementation. During my research over the last couple of days I had read that TensorFlow resizes the images fed to the network to 299 x 299 pixels (this is not apparent on first glance at the source code), but does not preserve the aspect ratio, meaning the images are badly distorted and hard for the network to classify correctly.
My next step was to create a Python module that would loop through my existing dataset, identify faces using OpenCV and Haar cascades, crop the images to 299 x 299 pixels so the face fills the entire frame, and delete any images in which no face was detected. I then retrained the model on the new version of the dataset. Next I reprogrammed the classification module: instead of feeding captures from the live feed directly into the classifier, I applied the same face detection and cropping used on the original dataset, and tested different thresholds. This brought back pretty accurate results, which was a massive relief. The results can be seen in the PuTTY screenshot right/below, and are a massive improvement on the shock-horror stage. I only tested against a small set of images in a loop, with my first choice of threshold; raising the threshold and varying the images will no doubt improve the test results, along with more work on the core system and the datasets.
To give myself an ego boost, I developed a function that tests the newly trained model against images it has not been trained on, to see how well it is doing, a bit like the evaluation stage but more transparent. This part of the program has been running for the duration of me writing this article, and I am fairly happy with the results compared to my previous feeling that the sky was falling down.
SO WHAT HAVE I LEARNT?
Life is hard!
DUDE, this is still research!
Even a human has to look more than twice sometimes ;)