Over the past few weekends, I built an app that detects the object you point your phone at and translates its name into another language. For example, if you point your phone at a cup, it shows "taza" when the target language is Spanish. This all happens offline and in real time. You can download the app for free on Android.
Is this all really offline?
Yes. The app uses Google’s ML Kit for Android, and the model I am using (efficientnet_lite_4) is bundled with the app. However, on startup you will be prompted to download the language model you want to use, because I could not find a way to bundle it with the app. After that, all classification and translation is done offline.
How well does it work?
Eh…good, but not great. It’s very accurate at recognizing objects that are geometrically unambiguous. For example, cars, bananas and computer keyboards are all easy. But for other things, it returns results that are either only somewhat related or plainly wrong. The efficientnet_lite_4 model is trained to detect objects from 1,000 different classes. Of course, there are more than 1,000 different things in the world, so this is a limitation in itself.
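One common way to hide the "plainly wrong" guesses is to only show a label when the classifier is reasonably confident. The sketch below is an illustration of that idea, not what the app is confirmed to do. `Label`, `LabelFilter` and the threshold value are hypothetical names and numbers made up for this example; they are stand-ins for whatever result type the classifier returns.

```java
import java.util.List;

// Hypothetical stand-in for one classifier result (not ML Kit's own type).
final class Label {
    final String text;       // English class name, e.g. "cup"
    final float confidence;  // score in the range 0.0 to 1.0

    Label(String text, float confidence) {
        this.text = text;
        this.confidence = confidence;
    }
}

final class LabelFilter {
    // Returns the most confident label at or above minConfidence,
    // or null when no label qualifies (so nothing should be shown).
    static Label best(List<Label> labels, float minConfidence) {
        Label best = null;
        for (Label candidate : labels) {
            if (candidate.confidence < minConfidence) {
                continue; // too uncertain to display
            }
            if (best == null || candidate.confidence > best.confidence) {
                best = candidate;
            }
        }
        return best;
    }
}
```

Calling `LabelFilter.best(labels, 0.5f)` on every frame and skipping the translation when it returns `null` trades fewer results for fewer wrong ones. The right threshold would have to be tuned by hand.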
As for the translation, that’s much better. I just feed it the label (in English) from what was detected, and it outputs the text in another language.
I have only tested this on my Galaxy S10 Plus. The performance on it is pretty good, considering everything that’s happening under the hood. My phone does get a little warm after using the app for a couple of minutes, though.
This was surprisingly easy to build
Speaking of overlays, the bounding box was the hardest part of the app. It took me a while to understand how I had to rotate the dimensions and scale them to fit the rectangular bounds that ML Kit was producing for the object. That alone took me a few days of trial and error to figure out.
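To give a feel for the rotate-and-scale step, here is a minimal sketch of one way to map a detected box onto the screen. It rests on assumptions that may not match the app: the box is given in upright image coordinates, the camera frame arrives rotated by 0, 90, 180 or 270 degrees, and the preview fills the view with center-crop scaling. `BoxMapper` and its parameters are hypothetical names used only for this example.

```java
final class BoxMapper {
    // Maps a box {left, top, right, bottom} from image pixels to view pixels.
    // imageWidth/imageHeight are the raw frame size as delivered by the camera.
    static float[] toView(float[] box, int imageWidth, int imageHeight,
                          int rotationDegrees, int viewWidth, int viewHeight) {
        // A frame rotated by 90 or 270 degrees has its width and height swapped
        // once it is turned upright.
        boolean swap = rotationDegrees == 90 || rotationDegrees == 270;
        float srcWidth = swap ? imageHeight : imageWidth;
        float srcHeight = swap ? imageWidth : imageHeight;

        // Center-crop: scale until the image covers the whole view,
        // then center whatever overflows.
        float scale = Math.max(viewWidth / srcWidth, viewHeight / srcHeight);
        float offsetX = (viewWidth - srcWidth * scale) / 2f;
        float offsetY = (viewHeight - srcHeight * scale) / 2f;

        return new float[] {
            box[0] * scale + offsetX,
            box[1] * scale + offsetY,
            box[2] * scale + offsetX,
            box[3] * scale + offsetY
        };
    }
}
```

For example, a 640x480 frame rotated by 90 degrees is treated as 480x640, so on a 1080x1440 view every coordinate is simply multiplied by 2.25. On a taller view the image overflows horizontally and the x coordinates pick up a negative offset.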
Right now, the app has a lot of limitations. For starters, if you want to change the language you originally selected, you need to go into the app settings and clear the data. This was a little laziness on my part because support for other languages was a last-minute addition. I’m currently learning Spanish, so that was my focus. I also want to improve the classification and detection of objects. This might require me to modify the model, add my own training data and retrain it. I’m not sure yet.
Still, it was a fun thing to build and I was very surprised that it actually works on some level.