A lesson from building a recognition model
One nuance we ran into with food recognition specifically is that ingredients rarely appear in a clean, “textbook” state. In the real world, they are sliced, diced, mashed, squeezed, crushed, cooked, or partially combined. That meant we had to train the model to recognise each ingredient across many different forms and states, not just as a whole item. The biggest challenge wasn’t the concept but the volume and variety: a lot of ingredients, in a lot of conditions, across a lot of images. It reinforced how important real-world data is, compared with idealised datasets, when you want recognition to actually work day-to-day. I’m curious how others working with vision models deal with highly variable inputs like this.
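
As a starting point for that discussion, one way to make the volume-and-variety problem visible is to count how many images exist for each ingredient in each state and flag the combinations that are thin or missing. What follows is only a minimal sketch, assuming every image carries an (ingredient, state) label. The function name `coverage_gaps`, the `min_count` threshold, and the sample labels are all hypothetical and are not taken from the project described above.

```python
from collections import Counter

def coverage_gaps(samples, ingredients, states, min_count=1):
    """Return (ingredient, state) pairs that have fewer than min_count images.

    samples: iterable of (ingredient, state) labels, one per image (assumed format).
    """
    counts = Counter(samples)
    return [
        (ingredient, state)
        for ingredient in ingredients
        for state in states
        if counts[(ingredient, state)] < min_count
    ]

# Hypothetical labels, purely for illustration.
samples = [
    ("tomato", "whole"), ("tomato", "sliced"), ("tomato", "sliced"),
    ("garlic", "whole"), ("garlic", "crushed"),
]

gaps = coverage_gaps(
    samples,
    ingredients=["tomato", "garlic"],
    states=["whole", "sliced", "crushed"],
)
print(gaps)
```

Not every ingredient and state pairing is meaningful in practice, so a list like this would likely need manual filtering before it drives any data collection.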