Where Do I even start?

Ok, now that I have read into this for more than 5 seconds, I understand that YOLO is implemented by many machine learning frameworks, but I’m sticking to the OG Darknet implementation as that sounds really edgy and their logo is a pentagram. Nice.

Setup Darknet
Setup tagging software
Get 2000 Images to train
Label 100 Images
Train YOLO on the 100 images
Use the okayish 100-image model to help label the rest, so I only have to sit there and adjust the bounding boxes
Train YOLO on the full 2000 images
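
As a rough sketch of what the two "Train" steps need on disk: Darknet expects a text file listing the training images, a names file, and a small .data file that points at both, with each image's label file sitting next to it under the same name with a .txt extension. The helper below is not what I actually ran. The function name, the file names (obj.data, obj.names, train.txt, valid.txt) and the 10% validation split are all assumptions picked for illustration.

```python
import random
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def write_darknet_dataset(image_dir, out_dir, class_names, valid_fraction=0.1, seed=0):
    """Write train.txt, valid.txt, obj.names and obj.data (hypothetical file names)."""
    image_dir, out_dir = Path(image_dir), Path(out_dir)
    backup_dir = out_dir / "backup"  # Darknet stores intermediate weights here
    backup_dir.mkdir(parents=True, exist_ok=True)

    # Only images that already have a label file next to them can be trained on.
    images = sorted(
        p for p in image_dir.iterdir()
        if p.suffix.lower() in IMAGE_SUFFIXES and p.with_suffix(".txt").exists()
    )
    random.Random(seed).shuffle(images)

    # Hold back a small validation set (assumed 10%, at least one image).
    n_valid = max(1, int(len(images) * valid_fraction)) if images else 0
    valid, train = images[:n_valid], images[n_valid:]

    train_file = out_dir / "train.txt"
    valid_file = out_dir / "valid.txt"
    names_file = out_dir / "obj.names"

    train_file.write_text("".join(f"{p.resolve()}\n" for p in train))
    valid_file.write_text("".join(f"{p.resolve()}\n" for p in valid))
    names_file.write_text("".join(f"{name}\n" for name in class_names))

    # The .data file ties everything together for the darknet binary.
    (out_dir / "obj.data").write_text(
        f"classes = {len(class_names)}\n"
        f"train = {train_file.resolve()}\n"
        f"valid = {valid_file.resolve()}\n"
        f"names = {names_file.resolve()}\n"
        f"backup = {backup_dir.resolve()}\n"
    )
    return len(train), len(valid)
```

Running the same helper again after labeling more images regenerates the lists, so the 100-image run and the full 2000-image run can share one script.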

Spoiler: If you can, spend money on getting someone else to do the tagging for you. There are multiple services that do that (Amazon Mechanical Turk, for example). Every second I sat there labeling the 100-image batch I thought to myself: Is this what my life has come to? Why am I not doing literally anything else right now?

Setup Darknet

CMake >= 3.12
CUDA >= 10.0
cuDNN >= 7.0

Just 3 dependencies? Easy peasy. Not really, because Nvidia makes it a pain to get a hold of the cuDNN dependencies without signing up. Also, a colleague of mine told me that even though you may think you have the right cuDNN version, most ML frameworks will not indicate whether you are using the right one or not. This results in longer training times while you believe that this is the fastest you can go.
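
One cheap way to at least see which cuDNN version is installed is to read the version macros out of the cuDNN header. The sketch below assumes the header lives in one of two common include directories (that part is a guess and depends on your install) and only reports what is on disk, not what a given framework was actually built against.

```python
import re
from pathlib import Path


def parse_cudnn_version(header_text):
    """Return (major, minor, patch) from cuDNN header text, or None if not found."""
    parts = []
    for name in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        match = re.search(rf"^\s*#define\s+{name}\s+(\d+)", header_text, re.MULTILINE)
        if match is None:
            return None
        parts.append(int(match.group(1)))
    return tuple(parts)


def find_cudnn_version(include_dirs=("/usr/include", "/usr/local/cuda/include")):
    """Look for the version macros in assumed include directories."""
    # cuDNN 8 and later keep the macros in cudnn_version.h, older releases in cudnn.h.
    for directory in include_dirs:
        for file_name in ("cudnn_version.h", "cudnn.h"):
            path = Path(directory) / file_name
            if path.is_file():
                version = parse_cudnn_version(path.read_text(errors="ignore"))
                if version is not None:
                    return version
    return None
```

Calling `find_cudnn_version()` returns a tuple that can be compared against the `cuDNN >= 7.0` requirement, or `None` if no header was found in the assumed locations.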

Get 2000 Images to train

Done. I’m a data hoarder and have around 1300 comic-like books, most of them in English and some in Japanese, so I will simply move 2000 images out of my zip files with a little helper program. I aimed for a mix of 1000 Japanese and 1000 non-Japanese images so the detection gets more varied.
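
The little helper program is long gone, but it would have looked roughly like the sketch below: collect every image inside a set of zip archives, pick a random sample, and write the picks out under flat, numbered file names. The function name, the `prefix` parameter and the list of image extensions are made up for this example.

```python
import random
import zipfile
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def sample_images_from_zips(zip_paths, out_dir, count, prefix="", seed=0):
    """Copy up to `count` randomly chosen images out of the given zip archives."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # Collect (archive, member name) pairs for every image in every archive.
    candidates = []
    for zip_path in zip_paths:
        with zipfile.ZipFile(zip_path) as archive:
            for info in archive.infolist():
                suffix = Path(info.filename).suffix.lower()
                if not info.is_dir() and suffix in IMAGE_SUFFIXES:
                    candidates.append((Path(zip_path), info.filename))

    chosen = random.Random(seed).sample(candidates, min(count, len(candidates)))

    # Write under numbered names so members from different archives cannot collide.
    written = []
    for index, (zip_path, member) in enumerate(chosen):
        target = out_dir / f"{prefix}{index:05d}{Path(member).suffix.lower()}"
        with zipfile.ZipFile(zip_path) as archive:
            target.write_bytes(archive.read(member))
        written.append(target)
    return written
```

For the 1000/1000 mix this would be called twice on the same output folder, once with the Japanese archives and `prefix="jp_"` and once with the rest and a different prefix.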

Sudden Summary

I forgot how the rest went

I worked on the speech bubble detection and wrote half of this blog post in November of 2020. It is now January 2021 and I cannot remember half of what I did to get my speech bubble detection going. Due to the end-of-year crunch I was too stressed out to work on this in my free time. I also got myself a new desktop at home and managed to forget to back up one itsy bitsy tiny set of data from my old PC: all the training temp folders I made. I blindly trusted that everything related to my projects was under version control, but with ML there’s a huge caveat: ML models are too big for any free VC host like GitHub. Well, at least I can speak about the issue of how to host the ML model :v

The result

I remember that even with just the 100-image model the accuracy of the detection wasn’t half bad. The detected regions were only about 10% worse than what I got from Google, which was good enough for me at the time. I resolved to label about 100 images a day and did that for about 4 days, but then the work crunch hit and I lost all motivation. If anybody is interested in the “finished” scuffed YOLO models:

What it can detect: speech bubbles, text boxes, and text. (They overlap most of the time, but not always.)

https://cdn.aris.moe/yolo/v1/yolo-manga.cfg

https://cdn.aris.moe/yolo/v1/yolo-manga.names

https://cdn.aris.moe/yolo/v1/yolo-manga.weights
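
If you download these, a quick sanity check before loading anything is to confirm that the .cfg and .names files agree with each other. A Darknet .cfg is an INI-like file with repeated sections, where `[net]` carries the input `width`/`height` and each `[yolo]` section carries a `classes` count. The sketch below parses that by hand (Python's `configparser` rejects the repeated section names) and compares the count against the number of lines in the .names file. The function names are made up, and I have not verified what the published files contain.

```python
def parse_darknet_cfg(text):
    """Parse Darknet .cfg text into a list of (section_name, options) pairs."""
    sections = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and comment lines (starting with '#' or ';').
        if not line or line[0] in "#;":
            continue
        if line.startswith("[") and line.endswith("]"):
            sections.append((line[1:-1].strip(), {}))
        elif "=" in line and sections:
            key, value = line.split("=", 1)
            sections[-1][1][key.strip()] = value.strip()
    return sections


def check_model_files(cfg_text, names_text):
    """Report the input size, class names, and whether cfg and names agree."""
    sections = parse_darknet_cfg(cfg_text)
    names = [line.strip() for line in names_text.splitlines() if line.strip()]

    net = next((options for name, options in sections if name == "net"), {})
    input_size = None
    if "width" in net and "height" in net:
        input_size = (int(net["width"]), int(net["height"]))

    # Every [yolo] section should declare the same class count as the names file.
    yolo_classes = {
        int(options["classes"])
        for name, options in sections
        if name == "yolo" and "classes" in options
    }
    return {
        "input_size": input_size,
        "names": names,
        "classes_match": yolo_classes == {len(names)},
    }
```

With the two text files read from disk, `check_model_files(cfg_text, names_text)["classes_match"]` should be `True` for a consistent pair.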

The Detection rate isn’t great and I might revisit this in the future now that I have a non shit GPU but yeah.

Labeling tools I tried:

After looking around, I settled on the following requirements for the labeling tool I wanted:

Good Keyboard/Mouse Workflow

When you are labeling 3 classes per image, occurring around 15 times, and are going for 2000 images, you want the workflow to be as efficient as possible. Every unnecessary click you save is one more neuron saved from insanity.

“Reinforced training”/ Auto Labeling

What would be cool is to instantly use the existing 100-image model to pre-select everything in future unlabeled images, so I only have to adjust the classes or bounding boxes.
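
Under the hood, pre-labeling for Darknet mostly comes down to writing the model's detections back out in the YOLO label format: one line per box with the class id, then the box center, width and height, all normalized to the image size. Here is a minimal sketch of that conversion. The shape of the detection dicts (`class_id`, `confidence`, `box` as left/top/width/height in pixels) and the 0.25 confidence cutoff are assumptions for the example, not the output format of any particular tool.

```python
from pathlib import Path


def to_yolo_label(class_id, left, top, width, height, image_width, image_height):
    """Convert a pixel box (left, top, width, height) to one YOLO label line."""
    x_center = (left + width / 2) / image_width
    y_center = (top + height / 2) / image_height
    return (
        f"{class_id} {x_center:.6f} {y_center:.6f} "
        f"{width / image_width:.6f} {height / image_height:.6f}"
    )


def write_prelabels(detections, image_width, image_height, label_path, min_confidence=0.25):
    """Write confident detections as a label file to be corrected by hand."""
    lines = [
        to_yolo_label(d["class_id"], *d["box"], image_width, image_height)
        for d in detections
        if d["confidence"] >= min_confidence
    ]
    Path(label_path).write_text("".join(line + "\n" for line in lines))
    return len(lines)
```

The label file goes next to the image under the same name with a .txt extension, so the labeling tool opens the image with the boxes already in place and only the wrong ones need fixing.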

Not be a pain in the ass to setup or use with darknet

THAT’S IT. There’s nothing more I wanted, and for some reason nothing I could find would fulfill them without either compromises or additional scripting work.

Microsoft Vott

Pro

REALLY great UX/Workflow, hands down best there is
Easy to setup and manage data
Supports auto labeling with trained model

Cons

Doesn’t support Yolo v3 or v4 export. Would need to write own converter
Couldn’t figure out how to setup the auto labeling

DeepLabel

Pro

Good Workflow
Easy to setup
Supports auto labeling with trained model and works with yolo out of the box

Cons

A bit peculiar in how it wants its data organized
Can’t resize bounding boxes

I also looked into the following and dismissed them due to workflow issues:

Alturos.ImageAnnotation

BMW-Labeltool-Lite

Final Choice

DeepLabel

Really the only viable solution out of them all.

Translation Overlay 5: Training YOLO to detect speech bubbles
