This blog has covered the chatbot topic frequently. We have written about building a chatbot using a bot development framework, creating a chatbot user interface with special attention to conversation flow and language, and even handing over chatbots built on a specific chatbot platform.
An actual case of building a chatbot was still to be written. In this post, we’ll show how the Onix team made one for Telegram and Facebook Messenger. Since the demo project included work with an image object detection API, we’ll dedicate the first chapter to image processing and object detection. The second will describe Dogbi, Onix’ object detection app. If you’re interested in building a chatbot in Python using object recognition, we hope you’ll find it useful.
Object detection is related to image processing and computer vision and is used in face detection and recognition, video object co-segmentation, and similar tasks. It deals with detecting instances of objects of a certain class in digital images or videos. All items in a class share particular features, so an input image can be compared against a model of a specific object. For example, shape-based object detection uses the items’ similar shapes to classify them.
Object detection methods are based on either machine learning or deep learning. The ML-based approach first defines the class’ features and then uses a support vector machine or another technique for classification. Deep learning techniques, typically based on convolutional neural networks, can detect an object in an image without the features being defined explicitly.
In the project mentioned above, the Onix team used the TensorFlow Object Detection API, a research library whose object detectors follow the deep learning approach. The API lets developers build, train, and deploy object detection models for various uses.
The creation of an object detection application starts with assembling a dataset: a collection of images with labels. ImageNet is a valuable resource for machine learning and object recognition projects. Moreover, it covers another need: the bounding boxes necessary for specifying the location of the object in each image.
Alternatively, use the open-source LabelImg or another annotation tool. Whatever you choose, it should produce a folder of .jpg images (data) and .xml files (labels). The latter are eventually converted to .csv format.
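That .xml-to-.csv step can be sketched with the standard library alone. LabelImg saves Pascal VOC-style XML, one file per image, and the usual convention is to flatten it into one CSV row per bounding box. The column names and helper names below are our own assumptions, not the project’s actual script:

```python
# Hypothetical sketch: flatten LabelImg's Pascal VOC-style .xml annotations
# into a single labels CSV (one row per bounding box). Column names are
# assumptions, not the actual project's format.
import csv
import xml.etree.ElementTree as ET
from pathlib import Path

COLUMNS = ["filename", "width", "height", "class", "xmin", "ymin", "xmax", "ymax"]

def xml_to_rows(xml_text):
    """Turn one annotation file's contents into CSV rows, one per object."""
    root = ET.fromstring(xml_text)
    filename = root.findtext("filename")
    width = int(root.findtext("size/width"))
    height = int(root.findtext("size/height"))
    rows = []
    for obj in root.iter("object"):
        box = obj.find("bndbox")
        rows.append([
            filename, width, height, obj.findtext("name"),
            int(box.findtext("xmin")), int(box.findtext("ymin")),
            int(box.findtext("xmax")), int(box.findtext("ymax")),
        ])
    return rows

def convert_folder(xml_dir, csv_path):
    """Collect every .xml file in a folder into a single labels CSV."""
    with open(csv_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(COLUMNS)
        for xml_file in sorted(Path(xml_dir).glob("*.xml")):
            writer.writerows(xml_to_rows(xml_file.read_text()))
```

The resulting CSV is what the TensorFlow Object Detection API tutorials typically convert further into TFRecord files for training.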
If you wish to avoid training an object detection model from scratch, which can take weeks even with a high-end graphics processing unit, use one of the available pre-trained models. Choose one, download it, and retrain it on your custom dataset, replacing that model’s classes with yours. When you stop the TensorFlow training, you can export the latest checkpoint file to a graph file and perform live inference with it.
The resulting model can be used in many ways, e.g., for building a real-time iOS object recognition application or, converted to the TensorFlow Lite format, an app for Android. The Onix team used it for a chatbot, described below.
The goal was to create a chatbot app that can identify dog breeds from a photo. The user sends a photo of a dog, and Dogbi analyzes the image and responds with an estimate of similarity. The input image is not necessarily a picture of a dog only, so the system’s task is twofold:
1) detect a dog in a picture;
2) guess the breed of the dog using the available knowledge.
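The two stages can be sketched as a small pipeline. The detector and classifier below are hypothetical stand-ins; in the real app they would be the object detection model and the breed classification model:

```python
# A minimal sketch of the two-stage pipeline described above. detect_dogs and
# classify_breed are hypothetical stand-ins for the real models.
def analyze_photo(image, detect_dogs, classify_breed):
    """Run detection first; classify the breed only if a dog was found."""
    dogs = detect_dogs(image)  # list of detected dog regions
    if not dogs:
        return "Sorry, I can't see a dog in this picture."
    breed, score = classify_breed(dogs[0])
    return f"Looks like a {breed} ({score:.0%} similarity)."
```

Keeping detection and classification separate means the bot can respond sensibly to photos without a dog instead of guessing a breed anyway.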
Dogbi uses a pre-trained model for dog breed recognition. The team took advantage of the TensorFlow model and ready-made instructions. (Here you can find the scripts and data for reproducing the breed classification model training, analysis, and inference.) The bot API is written in Python.
You basically have to download the Inception model (a deep neural network pre-trained by Google) and the Stanford Dogs dataset. The latter contains over 20,000 images of 120 dog breeds, with class labels and bounding boxes from ImageNet, to facilitate fine-grained image classification.
NB: Assuming that the accuracy of a trained object detection model is directly proportional to the number of images, the team initially used 2,000-5,000 images per dog breed. The dataset was downloaded with a script that crawled large image sets on ImageNet; validating, converting, and resizing the files took some ten hours. Surprisingly, a model trained on a smaller dataset (100-200 images per breed) turned out to work much better. The result may depend on the dataset quality and the weights in the model.
On top of the Inception model, you have to build a dog breed classification neural network, train it, and then freeze it. The training time depends on the depth of your model and the number of epochs. A CSV file with predicted vs. actual breeds can be used to analyze precision on the training data. The frozen model then classifies an image that is either available on the file system or downloadable as an HTTP resource.
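The precision analysis on that predicted-vs-actual CSV is straightforward. Here is a hedged sketch with the standard library only; the column names are assumptions:

```python
# Hypothetical sketch of the precision analysis: given a CSV of predicted vs.
# actual breeds (column names are assumptions), compute per-breed precision,
# i.e. correct predictions / all predictions of that breed.
import csv
import io
from collections import Counter

def per_breed_precision(csv_text):
    predicted = Counter()  # how often each breed was predicted
    correct = Counter()    # how often that prediction matched the actual breed
    for row in csv.DictReader(io.StringIO(csv_text)):
        predicted[row["predicted"]] += 1
        if row["predicted"] == row["actual"]:
            correct[row["predicted"]] += 1
    return {breed: correct[breed] / n for breed, n in predicted.items()}
```

Breeds with low precision are the ones the model confuses with others, which is useful for deciding where the training data needs work.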
The frozen model is ready to be used for image classification tasks. The project can be dockerized with a premade docker-compose file and Dockerfile; there’s a Dockerfile in our repository to build the Docker image and run the application.
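For orientation, a Dockerfile for such a Python bot typically looks like the hypothetical sketch below; the file names and entry point are assumptions, not the actual repository contents:

```dockerfile
# Hypothetical sketch; file names and entry point are assumptions.
FROM python:3.9-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
CMD ["python", "bot.py"]
```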
Here’s how you can build a chatbot for Telegram on your own:
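As a rough skeleton, such a bot can be written with nothing but the standard library against the Telegram Bot API’s long-polling endpoints. The `classify_image` argument below is a placeholder for the frozen-model inference:

```python
# A minimal Telegram bot skeleton (stdlib only), assuming the Bot API's
# long-polling endpoints. classify_image is a placeholder for the real
# frozen-model inference.
import json
import urllib.parse
import urllib.request

API = "https://api.telegram.org/bot{token}/{method}"

def call(token, method, **params):
    """Call a Telegram Bot API method and return its JSON result."""
    query = urllib.parse.urlencode(params)
    url = API.format(token=token, method=method) + "?" + query
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)["result"]

def handle_message(message, classify_image):
    """Build the reply text for one incoming message."""
    if "photo" not in message:
        return "Please send me a photo of a dog."
    file_id = message["photo"][-1]["file_id"]  # largest photo size comes last
    breed, score = classify_image(file_id)
    return f"I think it's a {breed} ({score:.0%} match)."

def run(token, classify_image):
    """Long-poll getUpdates and answer every incoming message."""
    offset = 0
    while True:
        for update in call(token, "getUpdates", offset=offset, timeout=30):
            offset = update["update_id"] + 1
            message = update.get("message")
            if message:
                call(token, "sendMessage", chat_id=message["chat"]["id"],
                     text=handle_message(message, classify_image))
```

In production you would more likely use a library such as python-telegram-bot, but the flow — poll for updates, pick the largest photo size, classify, reply — stays the same.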
That's it! Dogbi is running in a Docker container on your server!
The function and the model used in Dogbi are simple, but don’t think the technology can only be used to analyze photos or create chatbots. Object detection in Python can recognize foods or power a system that tells what it sees in real time, to name a few applications.
Object detection has been used for vehicle detection, security systems, and self-driving cars. Its use cases should soon extend to image segmentation and distance estimation. Real-time object recognition on a camera screen (e.g., to help visually impaired individuals) and other, more challenging applications lie ahead.