
Image segmentation approaches and Panoptic image segmentation

Authors
  • Lauren Jarrett

Image segmentation & Detectron 2

As part of an ongoing collaboration with a couple of clients, I recently had to explain what is possible with image segmentation to help speed up their data annotation, which will underpin the data they are using to fine-tune their own generative AI models. Personally, this is an area of data science that I love, so I was happy to write down and explain each of the different segmentation approaches, with examples of where I have used them recently.


Classification

As a quick explainer for anyone who is not familiar with computer vision tasks, image segmentation starts at the most basic level with classification: does this image contain a specific object? In the example below, it would be something like: does this image contain a horse?

Original

Personally, I use this check to fail fast in my algorithms: only the images that contain the object(s) I am looking for get passed along to be analysed in more detail, which saves on GPU resources and processing time.
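As a rough sketch of that fail-fast pattern (the classifier, class id, threshold and segmentation function below are all hypothetical placeholders, not part of any specific library):

```python
import torch

def contains_target(classifier: torch.nn.Module, image: torch.Tensor,
                    target_class: int, threshold: float = 0.5) -> bool:
    """Fail fast: True only if the classifier is confident the object is present."""
    with torch.no_grad():
        probs = torch.softmax(classifier(image.unsqueeze(0)), dim=1)
    return probs[0, target_class].item() >= threshold

# Only hand images to the expensive segmentation stage when the cheap
# classifier says the target object is present (names are hypothetical):
# if contains_target(classifier, img, HORSE_CLASS_ID):
#     run_segmentation(img)  # heavy GPU work happens only when needed
```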

Semantic segmentation

The next level up from classification is semantic segmentation. In semantic segmentation, every pixel in an image is evaluated to detect whether it belongs to the same class as another pixel, and the image is separated into those classes. What does this actually mean? As shown in the picture below, several people were identified, and all of them have been assigned to the same class and highlighted in red.

Obviously this model did not identify the horse, but it did identify the sidewalk, the fence, the pole and the trees (labelled vegetation) in the background, and colour coded each class.

Semantic segmentation grouping all pixels into classes
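As an illustration of the idea (this uses a pretrained DeepLabV3 from torchvision as my stand-in, not the exact model behind the image above), every pixel ends up with exactly one class id:

```python
import torch
from torchvision.models.segmentation import (
    deeplabv3_resnet50, DeepLabV3_ResNet50_Weights)

weights = DeepLabV3_ResNet50_Weights.DEFAULT
model = deeplabv3_resnet50(weights=weights).eval()
preprocess = weights.transforms()

def semantic_mask(image):
    """Return a 2-D tensor holding one class id per pixel."""
    batch = preprocess(image).unsqueeze(0)  # image is a PIL.Image
    with torch.no_grad():
        logits = model(batch)["out"]        # (1, num_classes, H, W)
    return logits.argmax(dim=1).squeeze(0)  # class id per pixel
```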

Object detection

Object detection goes one level higher and identifies all distinct objects within the image; these normally sit in the foreground. Object detection draws a bounding box around each specific object and gives it a label, as shown in the picture below.

Instance & Object detection
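To make that concrete, here is a minimal sketch of running a stock detection model from Detectron 2's model zoo (image.jpg and the 0.5 score threshold are placeholder choices):

```python
import cv2
from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.engine import DefaultPredictor

cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file(
    "COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml"))
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url(
    "COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml")
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.5  # confidence cutoff

predictor = DefaultPredictor(cfg)
outputs = predictor(cv2.imread("image.jpg"))  # BGR image, as Detectron 2 expects
boxes = outputs["instances"].pred_boxes       # one bounding box per detected object
labels = outputs["instances"].pred_classes    # COCO class id per box
```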

Instance segmentation

Detectron 2 goes another step further than object detection, identifying the exact shape of each object by its pixel boundary, and thus tracing its shape rather than just placing a box around the object.

Distinguishing the pixel boundary of each object is referred to as instance segmentation, and it can be seen clearly on the horse in the picture above.
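In Detectron 2, moving from boxes to per-pixel masks is largely a config swap; a minimal sketch with a model-zoo Mask R-CNN (paths are placeholders) might look like:

```python
import cv2
from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.engine import DefaultPredictor

cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file(
    "COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"))
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url(
    "COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml")
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.5

predictor = DefaultPredictor(cfg)
outputs = predictor(cv2.imread("image.jpg"))
# Each detected instance now comes with a boolean (H, W) mask that
# traces its exact shape, in addition to the bounding box and label.
masks = outputs["instances"].pred_masks
```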

Panoptic segmentation

If we compare instance and semantic segmentation images, we can see the differences. Semantic segmentation covers every pixel in the image and even segments the background classes. Instance segmentation focuses on the instances, identifying how many there are and their exact shapes, without covering the background classes or areas without a distinct instance.

If we combine both, we achieve panoptic segmentation. Panoptic segmentation looks to segment images by both classes and instances: it identifies all the distinct objects and classifies the remaining parts of the picture into background classes. Because every pixel is accounted for and assigned to either a class or an instance, panoptic segmentation does not allow instances to overlap; each pixel belongs to exactly one segment, with the instance in the foreground taking precedence. It also gives the most information about the image, as we can pull out background classes as well as distinct objects with their exact boundaries and labels.

Panoptic segmentation

The image above was segmented with Detectron 2 and comes from Microsoft's COCO (Common Objects in Context) dataset.
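While I won't claim this is the exact script used, a minimal panoptic run with Detectron 2's model zoo looks roughly like the sketch below (the image path is a placeholder):

```python
import cv2
from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.data import MetadataCatalog
from detectron2.engine import DefaultPredictor
from detectron2.utils.visualizer import Visualizer

cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file(
    "COCO-PanopticSegmentation/panoptic_fpn_R_101_3x.yaml"))
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url(
    "COCO-PanopticSegmentation/panoptic_fpn_R_101_3x.yaml")

predictor = DefaultPredictor(cfg)
image = cv2.imread("image.jpg")
panoptic_seg, segments_info = predictor(image)["panoptic_seg"]

# Every pixel belongs to exactly one segment: either a distinct
# instance ("thing") or a background class ("stuff").
viz = Visualizer(image[:, :, ::-1], MetadataCatalog.get(cfg.DATASETS.TRAIN[0]))
out = viz.draw_panoptic_seg_predictions(panoptic_seg.to("cpu"), segments_info)
cv2.imwrite("panoptic.jpg", out.get_image()[:, :, ::-1])
```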

Now that we understand and appreciate the differences between each segmentation approach, we can test the panoptic Detectron 2 model to evaluate how good it is on our use cases, or whether we need to fine-tune the model to suit them.

Detectron 2 - panoptic segmentation

The image above was taken from the COCO dataset, which Detectron 2 was trained on, so it is not surprising that it did very well at segmenting the objects and classes as required. For the first test I wanted to take the model through, I looked at the classes within the COCO dataset and found an image that contained those classes but was not itself in the COCO dataset.

Given our time at work exploring brand mockups as part of the work for Seamless Studio, this was the first use case I wanted to test Detectron 2 against.

Brand mockups

The image, taken from Mockup Maison, was chosen as a test case because all the objects it contains are covered by the COCO dataset's classes. The image itself is pretty simple, with the laptop placed on a chair next to a wooden table. The test was to determine if the model would identify:

  • Laptop
  • Chair
  • Table
  • Wall (background class)
Mockup original image
Panoptic segmentation on mockup image

Results:

Looking at the results, the model has done a pretty accurate job of identifying the instances and classes, and even found a few more:

  • Laptop
  • Keyboard
  • Chair
  • Wall
  • Table

It easily found the laptop class and got the boundary correct. It got the chair mostly correct, but added a table object underneath the laptop rather than treating that area as part of the chair. It also got the wall background class correct, but introduced a wall-wood class on the table leg, with only part of the table correct, and it blended the rest of the table into the wall class. All in all, this is pretty good for out-of-the-box results.

Initially, we would require only minimal fine-tuning on the table and chair classes, and the errors are most likely due to the images within the training dataset.

Characters

The next test I wanted to explore was pulling out the well-recognised Mickey Mouse character. I was curious to see how the model would identify the character and whether it would classify him as a person, an animal or something else. Mickey Mouse was not a distinct class of his own, nor, as far as I could tell, did he appear in any input image in the COCO dataset.

Mickey Mouse original image
Panoptic segmentation on Mickey Mouse image

Results:

It was really interesting to see a few things jump off the page. The model got confused by the all-white background and labelled more than 50% of it as sky. The disproportionate shoes were labelled as frisbees, and Mickey himself was not identified at all. Again, this is not a large concern; it simply shows that to correctly detect Mickey, we would need to fine-tune the model and teach it.
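For completeness, here is a hedged sketch of how that fine-tuning might start: register a small custom dataset in COCO format and continue training from the pretrained weights. I sketch it with the instance-segmentation head, which is the simpler path; fine-tuning the panoptic model follows the same pattern but needs panoptic annotations. The dataset name, file paths and class count are all hypothetical.

```python
from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.data.datasets import register_coco_instances
from detectron2.engine import DefaultTrainer

# Hypothetical dataset: COCO-format annotations with a "mickey" class.
register_coco_instances("mickey_train", {},
                        "annotations/train.json", "images/train")

cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file(
    "COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"))
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url(
    "COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml")
cfg.DATASETS.TRAIN = ("mickey_train",)
cfg.DATASETS.TEST = ()
cfg.SOLVER.MAX_ITER = 1000           # short schedule for a small dataset
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 1  # just the new "mickey" class

trainer = DefaultTrainer(cfg)
trainer.resume_or_load(resume=False)
trainer.train()
```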


Thanks for following along!