Computer vision and machine learning have been overhyped since their inception, but we have come a long way since Torch was released in 2002.
They can do wonders, until you hit a limitation you can’t overcome today.
I have been leading a year-long project between an AI unicorn company based in the US and a food and beverage company.
The idea is to analyse millions of images that are regularly collected from the retail market and calculate the market share of products placed on shelves and racks and in coolers, by leveraging object detection, classification and segmentation techniques in machine learning.
It was pleasantly easy to reach a stage where products were identified with up to 65% accuracy, and then we started to realise the real magnitude of the limitations.
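To make the market share idea concrete, here is a minimal sketch of how a share could be computed once a detector has labelled the products in an image. It is an illustration only, not the project's actual pipeline: the detection format, the brand names, the `share_of_shelf` function and the 0.5 confidence cut-off are all assumptions made for this example.

```python
from collections import Counter


def share_of_shelf(detections, min_confidence=0.5):
    """Return each brand's fraction of the detected products.

    `detections` is a hypothetical list of dicts with "brand" and
    "confidence" keys; a real detector's output will look different.
    """
    # Ignore detections the model is not reasonably sure about.
    kept = [d["brand"] for d in detections if d["confidence"] >= min_confidence]
    counts = Counter(kept)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {brand: n / total for brand, n in counts.items()}


# Made-up detections for a single shelf image.
detections = [
    {"brand": "BrandA", "confidence": 0.91},
    {"brand": "BrandA", "confidence": 0.84},
    {"brand": "BrandB", "confidence": 0.77},
    {"brand": "BrandC", "confidence": 0.32},  # dropped: below the cut-off
    {"brand": "BrandB", "confidence": 0.66},
]

shares = share_of_shelf(detections)
print(shares)
```

Counting products is the simplest possible measure. It says nothing about how much shelf space each product occupies or how full a box is, which is where the limitations start.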
Does it know how empty a box is?
Not very often! The pictures we take with our favourite smartphones are typically flat. It is not simple for computer vision to estimate how deep a shelf is, or how empty a box is as a percentage.
Estimating depth from a flat image is a complex science with not-so-promising results. The good news is that smartphone and camera manufacturers are now embedding sophisticated depth sensors in high-end devices, and the technology is spreading fast.
Does it know if it’s a large water bottle or a small one?
Not always! Imagine a small bottle placed on a table. A human will instantly know it’s a small one because of its size relative to the table.
Computer vision first needs to identify the object as a table, and then it needs to know the dimensions of the table, just to work out whether it’s a small bottle or a large one. Let’s say we train it for the table, but the bottle can be anywhere, right? The possibilities are endless.
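The reasoning above can be written as simple arithmetic. This is a minimal sketch, assuming the bottle and the reference object are at the same distance from the camera and that the reference object's real height is already known. The function names, the 75 cm table height and the 25 cm threshold are hypothetical values chosen for illustration.

```python
def estimate_height_cm(object_px, reference_px, reference_cm):
    """Scale an object's pixel height by a reference object of known size.

    Only valid when both objects are roughly the same distance from the
    camera; otherwise perspective makes the ratio meaningless.
    """
    if reference_px <= 0:
        raise ValueError("reference_px must be positive")
    return object_px * reference_cm / reference_px


def classify_bottle(height_cm, threshold_cm=25.0):
    """Label a bottle as large or small using an assumed height threshold."""
    return "large" if height_cm >= threshold_cm else "small"


# Hypothetical measurements: the bottle spans 120 px and the table 450 px,
# and the table is assumed to be 75 cm tall.
height = estimate_height_cm(object_px=120, reference_px=450, reference_cm=75.0)
label = classify_bottle(height)
print(height, label)
```

The calculation itself is trivial. The hard part is everything it takes for granted: detecting a suitable reference object, knowing its true size, and being sure both objects are equally far from the camera.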
How easy is it to train your model?
It starts with an easy and exciting win when you train your machine learning program for the first time with a few hundred images. Then you realise that you need thousands of images, maybe even hundreds of thousands, to get the result you expect, and the project starts to get really expensive.
Machine learning is as good as it can be today, and its applications are endless even with these limitations.
Anas Wahab MBCS CITP, ITCP.
Director of Product Development at S4 Digital