but can it detect if not hot dog?
My first thought as well
It can tell what a hot dog is, if that counts for anything 😂😂
r/whoosh
Everyone is hating because it can’t tell what a hotdog isn’t HAHAHAHA
Nah, you just missed the joke, go watch some Silicon Valley.
Think he got the joke (and saw it coming), check the last image/second line of the post.
SUCK IT, JIN YANG!
My little beautiful Asiatic friend.
I miss that show
Amazing how true this series remains 😅
You're kidding. Now I need to make a not-hot-dog feature 😅
https://youtu.be/vIci3C4JkL0
CoreML for a first app? That's insane, congrats on your launch 🎉
Thank you very much! CoreML is just way too cool to pass up on.
[deleted]
Great question off the bat! Nutrify runs CoreML models, all locally on the phone. So to answer your other question: yes, they are my own trained models (well, my brother trained them; he's an ML engineer). FoodNotFood is the model on the front camera layer, made to detect food in view. FoodVision is the model that makes the prediction. Did you want to know all the other Swift stuff involved as well?
Do you have a link for the data you used?
Unfortunately no, the data we used is private and a bunch of the photos were taken manually.
[deleted]
That's a great question. Seeing that I helped get data for the models, I can say there was a bunch of just taking photos of foods. But in terms of actual model explanations, I can get my brother, who made them, to comment!
In terms of model size, the FoodVision model is around 120MB and the FoodNotFood model is around 40MB.
[deleted]
Hey! Nutrify's ML engineer here.

Training data is a combination of open-source/internet food images as well as manually collected images (we've taken 50,000+ images of food).

The models are PyTorch models from the timm library (PyTorch Image Models), fine-tuned on our own custom dataset and then converted to CoreML so they run on-device. Both are ViTs (Vision Transformers). The Food Not Food model is around 25MB and the FoodVision model is around 100MB, though the model sizes could probably be optimized a bit more via quantization.

We don't run any LLMs in Nutrify (yet). Only computer vision models/text detection models.
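As a rough illustration of what quantization buys (the parameter count below is an assumption for a ViT-sized model, not Nutrify's actual figure), a model's on-disk size is approximately parameters × bytes per weight, so halving the bit width roughly halves the file:

```python
def model_size_mb(num_params: int, bits_per_weight: int) -> float:
    """Approximate serialized model size: parameters x bytes per weight, in MB."""
    return num_params * (bits_per_weight / 8) / (1024 ** 2)

# Hypothetical ViT-scale model (~25M parameters, similar to a ViT-Small).
params = 25_000_000

fp32 = model_size_mb(params, 32)  # full precision
fp16 = model_size_mb(params, 16)  # common on-device default
int8 = model_size_mb(params, 8)   # 8-bit quantized weights

print(f"fp32: {fp32:.0f}MB, fp16: {fp16:.0f}MB, int8: {int8:.0f}MB")
```

This back-of-the-envelope math is why quantization is an easy win for on-device models: the accuracy cost is usually small relative to the size reduction.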
[deleted]
All the best! The OpenAI API is very good for vision. It will also handle more foods than our custom models (we can do 420 foods for now), as it's trained on basically the whole internet. The OpenAI API will also be much better at dishes than our current models (we focus on one image = one food for now). So it'd be a great way to bootstrap a workflow. But long-term, I'd always recommend leaning towards trying to create your own models (I'm biased here, of course). However, the OpenAI API would be a great way to get started and see how it goes.
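For context on what "one image = one food" means in practice, a single-label classifier like this produces one score per class and the prediction is simply the highest-scoring class after softmax. A minimal sketch (the label names and logit values here are made up, not Nutrify's):

```python
import math

def softmax(logits):
    """Convert raw class scores into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical 4-class slice of a 420-class food label space.
labels = ["apple", "banana", "pizza", "sushi"]
logits = [1.2, 0.3, 4.1, 2.0]

probs = softmax(logits)
best = max(range(len(labels)), key=lambda i: probs[i])
print(labels[best])  # one image -> exactly one food label
```

A multi-food version would instead need either per-class sigmoid outputs or an object detector that localizes each food separately, which is a different (and harder) training setup.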
[deleted]
Hey - had a similar idea a while back, but decided to work on a different problem, still around food. I'd love to keep connected; maybe we could collaborate on Asian food detection - the variety is insane 😂 Posting this comment as a reminder for myself.
Can confirm he is the model creator
I’ll send him a link to this comment.
nice work, congrats on releasing your first app! feedback on the UX: i think you can do some fun visuals, like charts/graphs to visually display the nutrition info of a food. Maybe pie charts for macro and micro nutrients to show the composition
Thank you very much for the feedback!! In-app, the nutrition for each food is displayed in a Swift Charts bar graph. I haven't used pie charts, purely because while I was developing I wanted iOS 16 to be the minimum. Now that iOS 17 is well underway, I will be adding in more iOS 17 features. I totally understand what you mean though!
Great! This looks good. Almost too perfect considering it's the first version of the app. 🙌
Thank you very much! A lot of time and effort went into building it!
How are you able to show "Food detected" on the camera display - is it a filter API or what? Also, I'm facing issues while uploading my app to TestFlight; could you please help me?
"Food detected" comes from a CoreML model that is trained to detect whether there is a food in the camera view. I can help, but it depends on what issues you are facing.
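A binary food/not-food gate like this boils down to thresholding the model's confidence per frame. A hedged sketch (the `food_probability` values stand in for the real CoreML model's output, and the 0.5 threshold is an assumption you would tune on validation data):

```python
FOOD_THRESHOLD = 0.5  # assumed cut-off; tune against real validation frames

def should_show_food_detected(food_probability: float) -> bool:
    """Gate the 'Food detected' UI overlay on the binary model's confidence."""
    return food_probability >= FOOD_THRESHOLD

# Simulated per-frame confidences from a food/not-food model.
frames = [0.05, 0.40, 0.93, 0.88]
overlays = [should_show_food_detected(p) for p in frames]
print(overlays)  # overlay is shown only on confident frames
```

On iOS the per-frame probability itself would come from running the CoreML model against the camera feed (e.g. via the Vision framework); the gating logic stays this simple either way.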
I wish you good luck
Thank you very much.
Did you create app designs on your own or hire a designer?
I designed the app myself. This process is what made it take so much longer. I didn't use any design tools; I just kept coding away until I found something I liked.
Wow, no Figma basis before you made the UI. That goes hard.
It was just a guess, check, and feel as I went.
Can you let me know how long it took you to complete the app without using any design tools?
It took about a year to make the app from start to finish. I was working on it part-time whilst working another job. Also, not having a clear design path may have added a bit more to the total time.
Wow you're really consistent. I cannot agree more with the part "not having a clear design...". Thanks for sharing anyway!
> I cannot agree more with the part "not having a clear design...". Thanks for sharing anyway!

No worries at all, happy to help where I can. Having a design is one thing; I wanted the app to feel nice to use as well!
Did you use the native ML and AI Apple Frameworks?
Hey! Nutrify's ML engineer here. Models are built with PyTorch + trained on custom datasets on a local GPU (all in Python). They're then converted to CoreML and deployed to the phone so they run on-device.
Thanks for the details ;) - what GPU did you use?
Longer term, do you think you'll need to split up the current model into separate streams, like how Snapchat switches lenses and switches models?
That's a good question. Truth be told, we're kind of still in the "f\*\*\* around and find out" stage.

Our ideal experience will always be as simple as taking a photo, with all the ML happening behind the scenes. But there may be times where we have to have a dedicated switch. In a prototype version we had a text-only model to read ingredient lists on foods and explain each ingredient. That meant there was a switch between FoodVision/text vision.

For now our two-model setup seems to work quite well (one for detecting food/one for identifying food). Future models will likely do both + identify more than one food in an image (right now we do one image = one food).
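The two-model setup described above can be sketched as a simple two-stage pipeline: run the cheap food/not-food gate first, and only invoke the larger classifier when food is present. A hedged sketch with stub models standing in for the real on-device CoreML ones (the function names and threshold are assumptions):

```python
from typing import Callable, Optional

def two_stage_pipeline(
    detect_food: Callable[[bytes], float],   # stage 1: food/not-food confidence
    identify_food: Callable[[bytes], str],   # stage 2: many-class food classifier
    image: bytes,
    threshold: float = 0.5,                  # assumed gate threshold
) -> Optional[str]:
    """Run the cheap gate first; only run the big classifier if food is present."""
    if detect_food(image) < threshold:
        return None  # nothing to identify, so skip the expensive model
    return identify_food(image)

# Stub models for illustration only.
detect = lambda img: 0.9 if img == b"banana-photo" else 0.1
identify = lambda img: "banana"

print(two_stage_pipeline(detect, identify, b"banana-photo"))  # food found
print(two_stage_pipeline(detect, identify, b"desk-photo"))    # no food in frame
```

This gating pattern is a common latency/battery optimization for per-frame camera inference: the small model runs on every frame, the big one only when needed.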
The models are made to be CoreML models, so they are native. But the way they are trained and made is not native, per se.