Artificial Intelligence is becoming remarkably popular. Wherever we turn, we hear that the devices around us are becoming intelligent. And it's really true – Artificial Intelligence (AI) and Machine Learning (ML) have made tremendous progress in recent years. It all sounds like science fiction: problems that 10 years ago were considered incredibly difficult can now be tackled (with better or worse results) using remarkably convenient tools. This opens up incredible possibilities for us – programmers. One of the popular topics in the AI domain is teaching machines to see.
Image recognition by machines has a long history. One of the most interesting initiatives in this field is a global challenge built around a huge set of photographs (about 1.4 million images): the ImageNet Large Scale Visual Recognition Challenge (ILSVRC). The goal of the competition is to correctly recognize the subject of a photo and assign it an appropriate description (category) out of about 1,000 possibilities. Each year's results show how dynamically the field is evolving: since 2010, the accuracy of the best models has increased from roughly 72% to roughly 97%. To give an idea of how difficult this is, human performance on the same task was measured some time ago at about 95%. This means that image classification algorithms now match, and sometimes exceed, human perception. Isn't that incredible?
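The percentages quoted for ILSVRC are conventionally "top-5" accuracy: a prediction counts as correct if the true category appears anywhere among the model's five highest-scoring classes. A minimal sketch of how that metric is computed (the scores and labels below are made up for illustration):

```python
def top_k_accuracy(scores, labels, k=5):
    """Fraction of samples whose true label is among the k highest scores.

    scores: list of per-class score lists (one row per image)
    labels: list of true class indices
    """
    hits = 0
    for row, label in zip(scores, labels):
        # indices of the k highest-scoring classes for this image
        top_k = sorted(range(len(row)), key=row.__getitem__, reverse=True)[:k]
        hits += label in top_k
    return hits / len(labels)

# Toy example: 3 images, 6 classes, invented scores.
scores = [
    [0.05, 0.60, 0.10, 0.08, 0.09, 0.08],  # true class 1 ranked 1st  -> hit
    [0.30, 0.25, 0.20, 0.12, 0.08, 0.05],  # true class 5 ranked last -> miss
    [0.15, 0.10, 0.05, 0.40, 0.20, 0.10],  # true class 3 ranked 1st  -> hit
]
labels = [1, 5, 3]
print(top_k_accuracy(scores, labels))  # 2 of 3 hits in the top 5
```

With 1,000 candidate categories, top-5 is a far stricter test than it may look – which is what makes the ~97% figure so impressive.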
Technological Progress – where we are today…
The results of image recognition research are publicly available; ML and AI are hitting the mainstream. Popular frameworks and technologies that developers and analysts can use, such as PyTorch and TensorFlow + Keras, are freely available. There are cloud sandbox environments for training AI models, offering on-demand access to computing power that until recently was reserved for a narrow group of scientists. And finally, there are pre-trained models capable of identifying thousands of objects in images and classifying them with high confidence. So you can just start using them, right? Not exactly – it's not that easy.
Is there one pattern?
Artificial Intelligence can solve a wide range of image recognition problems. And – unfortunately – the standard, publicly available models that handle the most popular ones (e.g. people and cars) very often aren't sufficient. Imagine recognizing disease symptoms in X-ray or ultrasound images, detecting a crack in an aircraft engine blade on an endoscope photo, or working out from a picture what a car driver is doing at a given moment. None of the available models can answer these questions out of the box, although we can partially reuse their results (Transfer Learning is a topic for a separate article). Specific problems have to be addressed individually. And this is where we come in – AI/ML programmers and data analysts. There is no single universal way to get there; the right approach depends on many factors, such as:
- the quality and quantity of the data we are working with
- model requirements (often a compromise between "speed" and "precision")
- the difficulty of the problem (the amount of information that has to be extracted from the image)
Each of the above factors can be addressed individually; nevertheless, selecting the right strategy for building a model requires a lot of work and often many experiments. And let's be clear: for an AI/ML enthusiast this work could go on forever. Such is the nature of Artificial Intelligence – we can always improve what we've already created. That doesn't mean we cannot feel excited and satisfied, sometimes even by the first, unexpected results of our work. Ultimately, even when we feel we could refine our models endlessly, they often give our customers enough value to say "we've got this".
Gfi and Machine Learning
Gfi is working on innovative topics related to Artificial Intelligence in FabLab laboratories spread all over the world. Among the image recognition problems we are working on are:
- Visual identity verification in areas of increased access control – a solution that triggers alarms in case of an unidentified individual or a mismatch between badge data and camera image
- Monitoring the number of people at gatherings, e.g. monitoring queue lengths in shops
- Experiments around identifying microcracks in endoscopic images of aircraft engines
- Experiments around identifying human behaviour, e.g. monitoring the elderly (detecting a "fall" event at home), or identifying what a car driver is doing (e.g. raising an alarm when a phone is used while driving).
Artificial Intelligence in image recognition will keep evolving, and Gfi has the ambition to be an important part of this story. Do you have a lot of data and an interesting image recognition idea? Contact us – together we can build a solution and create value that, not so long ago, nobody had even dreamt of.