Deep Learning for Product Recognition on Retail Store Shelves

In retail, there has always been demand to improve the customer shopping experience and to automate business processes. A series of technology breakthroughs, including retail product recognition, has shaped in-store retail into the form we are all used to.

Barcode recognition is probably the most recent technology to be widely adopted in retail. It made product management easier and enabled self-checkout. However, a barcode can sit anywhere on a package, so finding and scanning it may slow down the purchase and worsen the shopping experience. Barcodes also do not solve the problem that supermarkets still need a large workforce for inventory and goods management.

Product Image Recognition and Object Detection in Retail

With recent advances in AI and machine learning, it is now possible to address those problems and improve the retail industry even further. More and more retailers are investing in AI and, specifically, in product recognition AI. According to a study by Juniper Research, global expenditures on AI services will rise from $3.6 billion in 2019 to $12 billion in 2023.

Without a doubt, AI will help revolutionize the retail industry in the near future, but let’s now focus on the problems that can already be solved with the help of AI.

With various electronic devices such as CCTV cameras installed in almost every shop, a tremendous amount of visual data has already been gathered. The current focus in the retail industry is therefore on applying computer vision techniques to process and analyse this data. There is strong interest in modern detection algorithms that locate individual products on shelves, and in recognition algorithms that classify each detected object.

Let’s now talk about the problems that could be solved with those techniques:

Planogram Compliance of Products on the Shelf

For instance, product detection can be used to find missing items and notify store staff. Such a system could also raise an alert when products are misplaced.

Image Recognition
Source: Unsplash

Stock Tracking

Manual auditing of retail shelves has been shown to be very time-consuming and error-prone (with up to a 20% error rate). Product image recognition and detection technologies can help automate stock tracking and give retailers more reliable data on which to base their business processes.

Improved Self-Checkout

Robust product detection and recognition can improve the user experience and cut checkout time. Less waiting means higher customer loyalty and more sales. Self-checkout also creates opportunities for upselling: for example, the system can suggest sunscreen when a customer buys flip-flops and beach towels.

Self checkout in retail
Source: Unsplash

Help Visually-Impaired People

Without image recognition, it can be very difficult for a visually impaired person to identify packaged foods in a supermarket or at home, because many foods share similar packaging and differ only in the text printed on the box and label. Today, visually impaired shoppers often need special assistance, which turns ordinary food shopping into a very challenging task. Image detection and recognition technologies can make this experience much less painful. For instance, an application that reads the label and the text on the box out loud can help them shop independently.

Fake Product Detection

Using machine learning, we can deploy a system that detects counterfeit products with reasonably high accuracy and thus automates this process as well. Image detection and product recognition AI solutions help retail businesses identify counterfeits and prevent fraud without adding staff. This not only decreases the risk of human error but also cuts costs.

Product fake detection
Source: Unsplash

As we can see, there is an impressive list of directions to improve the retail experience. Now let’s briefly sketch major components of such a product recognition pipeline.

Building a Product Recognition System

Building a product recognition system requires two key components: detection and recognition.

Product recognition process

So, the first step in such a system is accurate product detection. When choosing the model, you should keep in mind several issues that you may face:

1. Since products on shelves are usually very densely packed, even state-of-the-art detectors may generate many false detections. To mitigate the problem, several works have proposed post-processing stages that filter out false detections. Here is the result of the post-processing stage proposed in this paper:

False detection filtering

Upper image: Results from original Retina-Net;
Lower image: Results after proposed post-processing step.

Here are the results we obtained with a modified Retina-Net model on test images from the SKU-110K dataset:

Test images

Test image

2. Some applications need to run the detection step in real time, so the deep learning model has to be chosen accordingly. If speed is an issue in your pipeline, it may be worth considering YOLOv5, one of the more recent architectures for real-time object detection.
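To make the idea of filtering false detections from point 1 more concrete, here is a minimal sketch of greedy non-maximum suppression (NMS), the standard baseline that such post-processing stages usually refine. It is not the method from the paper referenced above; the function names, the `[x1, y1, x2, y2]` box format, and the 0.5 threshold are assumptions chosen for illustration.

```python
import numpy as np


def iou(box, boxes):
    """IoU between one box and an array of boxes, all as [x1, y1, x2, y2]."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    # Overlap is zero when the boxes do not intersect.
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)


def nms(boxes, scores, iou_threshold=0.5):
    """Return indices of the boxes kept by greedy NMS, best score first.

    Assumes every box has a positive area.
    """
    boxes = np.asarray(boxes, dtype=float)
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(scores)[::-1]  # indices sorted by descending score
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(int(best))
        rest = order[1:]
        if rest.size == 0:
            break
        # Drop every remaining box that overlaps the kept box too much.
        overlaps = iou(boxes[best], boxes[rest])
        order = rest[overlaps <= iou_threshold]
    return keep


# Two heavily overlapping detections of one product and one separate product.
shelf_boxes = [[0, 0, 10, 10], [1, 1, 11, 11], [20, 0, 30, 10]]
shelf_scores = [0.9, 0.8, 0.7]
kept = nms(shelf_boxes, shelf_scores)
```

On densely packed shelves neighbouring products legitimately sit very close together, so the threshold is a trade-off: set it too low and real products get suppressed, set it too high and duplicates survive.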

Let’s now look at the second step in our pipeline: object recognition. By object recognition we mean predicting the category (class label) of each detected object.

Before deep learning advanced computer vision, this task was solved with methods based on hand-crafted local image features, e.g. the SIFT and SURF algorithms. Although they can give reasonably good results, these methods aren’t robust enough for real-world conditions such as poor lighting and blur. What is more, because the hand-crafted features are local, they cannot capture global information about a product’s appearance.

Another possible direction is to use an ML-based optical character recognition (OCR) system. It extracts the textual information printed on the product package, and the retrieved text is then matched against a product database to perform recognition. The matching procedure should be robust to word recognition failures and to spelling mistakes.

More specifically, n-grams are commonly used for the matching procedure. However, matching is the main bottleneck of this approach: it may produce many false matches among products of a similar type, because their packaging usually shares a lot of common words.
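As a rough illustration of n-gram matching, the sketch below compares OCR output with product names using character trigrams and Jaccard similarity. This is only one simple way to use n-grams; the helper names and the sample product names are hypothetical, and a production system would likely add weighting for rare words.

```python
def char_ngrams(text, n=3):
    """Set of character n-grams of the lower-cased, whitespace-normalized text."""
    normalized = " ".join(text.lower().split())
    padded = f" {normalized} "  # padding lets word boundaries contribute n-grams
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def best_match(ocr_text, product_names, n=3):
    """Return (score, name) for the product name closest to the OCR text.

    Assumes product_names is non-empty.
    """
    query = char_ngrams(ocr_text, n)
    scored = [(jaccard(query, char_ngrams(name, n)), name) for name in product_names]
    return max(scored)


# Hypothetical catalogue and an OCR result with two misread characters.
catalogue = [
    "Choco Crunch Cereal 500g",
    "Choco Crunch Cereal Honey 500g",
    "Tomato Soup 400g",
]
score, name = best_match("Choc0 Crunch Cereol 500g", catalogue)
```

Character n-grams tolerate single misread characters because most trigrams of a word survive. The example also shows the weakness described above: the two cereal variants share most of their trigrams, so their scores end up close together.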

Last but not least, one of the most prominent directions for retail object recognition is visual image search. You may wonder why we propose visual image search instead of the standard classification that every detector already provides. The answer is simple: the number of distinct products in a supermarket is so large that the classification capacity of an object detector is not enough to achieve satisfactory results.

Moreover, the standard classification approach cannot be extended to novel classes. In retail, however, we need a classifier that generalizes easily to novel classes, because a supermarket’s assortment may change quite frequently. For these reasons, CNN-based visual image search algorithms are commonly used. You can find a more detailed discussion of visual search here. In short, visual image search works as follows:


A CNN-based embedder is commonly used to get the fixed-length embedding for each region of interest after the detection stage. The embedding space is constructed in such a way that embeddings for images of the same product lie closer in that space than ones for images of different products. This concept is illustrated in the following image:


The clear advantage of this approach is that it can generalize to novel products without any retraining of the embedder network. Note, however, that this holds only if the constructed embedding space is good enough.
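One common way to obtain an embedding space with this property is to train the embedder with a triplet loss. The article does not say which objective was actually used, so the sketch below is an assumption for illustration only; the function names and the margin value are hypothetical.

```python
import numpy as np


def l2_normalize(x):
    """Scale each embedding (last axis) to unit length."""
    x = np.asarray(x, dtype=float)
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Mean triplet loss over a batch of embeddings of shape (batch, dim).

    anchor and positive show the same product, negative a different one.
    The loss is zero once each negative is at least `margin` farther from
    the anchor than the positive is (in squared Euclidean distance).
    """
    anchor, positive, negative = map(l2_normalize, (anchor, positive, negative))
    d_pos = np.sum((anchor - positive) ** 2, axis=-1)
    d_neg = np.sum((anchor - negative) ** 2, axis=-1)
    return float(np.maximum(d_pos - d_neg + margin, 0.0).mean())


# Toy 2-D embeddings: the positive is near the anchor, the negative is far away.
a = [[1.0, 0.0]]
p = [[0.9, 0.1]]
n = [[0.0, 1.0]]
good = triplet_loss(a, p, n)  # triplet already satisfies the margin
bad = triplet_loss(a, n, p)   # roles swapped, so the loss is positive
```

Because the loss only constrains relative distances, an embedder trained this way can place a product it has never seen near other images of that same product, which is what makes adding new items without retraining plausible.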

Database Search

After we obtain an embedding for some region of interest, we still need to perform a search through a product database to find the best match. The matching algorithm usually draws upon k-nearest neighbors search and returns the k top-ranked products from the existing database.
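A minimal sketch of this search step, assuming the embeddings are already computed, is a brute-force k-nearest neighbors lookup by cosine similarity. The function name, the toy 2-D vectors, and the labels are hypothetical.

```python
import numpy as np


def top_k_products(query, gallery, labels, k=3):
    """Return the k (label, cosine similarity) pairs closest to the query.

    query:   embedding of the detected region, shape (dim,)
    gallery: embeddings of the database products, shape (num_products, dim)
    labels:  product identifier for each gallery row
    """
    gallery = np.asarray(gallery, dtype=float)
    query = np.asarray(query, dtype=float)
    # After L2 normalization the dot product equals cosine similarity.
    gallery = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    query = query / np.linalg.norm(query)
    sims = gallery @ query
    k = min(k, len(labels))
    top = np.argsort(-sims)[:k]  # indices of the k highest similarities
    return [(labels[i], float(sims[i])) for i in top]


# Toy database with three products and one query embedding.
db_embeddings = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
db_labels = ["cola", "water", "juice"]
matches = top_k_products([0.9, 0.1], db_embeddings, db_labels, k=2)
```

Brute-force search scales linearly with the size of the database, so large catalogues typically switch to an approximate nearest neighbor index. Adding a new product only requires appending its embedding and label to the database.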

Final Note

Deep learning and artificial intelligence have huge potential to revolutionize the retail industry. Many applications have already been improved by advances in product recognition technology, and today they mostly rely on modern computer vision algorithms for product detection and recognition.

We briefly described the major components of a product recognition pipeline and shared our thoughts on which models and technologies can be used to solve the product recognition task efficiently.
