Dining on Details: LLM-Guided Expert Networks for Fine-Grained Food Recognition

In fine-grained food recognition, subset learning-based methods group classes into subsets that guide the training process. We introduce Dining on Details (DoD), an expert learning framework for food classification that uses large language models to construct subsets of classes within the dataset. DoD's effectiveness rests on the robustness of the ImageBind multi-modality embedding space, which can identify meaningful similarities across varied categories. Trained end-to-end with multi-task learning, the method improves performance on fine-grained food recognition, particularly on highly similar classes. DoD is architecture-agnostic and can be applied to any existing classification architecture. Validation on several food datasets and on both convolutional and transformer-based backbones shows competitive results, with performance gains ranging from 0.5% to 1.61%. Notably, it achieves state-of-the-art results on the Food-101 dataset.
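The abstract does not spell out how class subsets are built, so the following is only a minimal sketch of the general idea of grouping classes by similarity in an embedding space. Everything in it is an assumption made for illustration: the `group_classes` helper, the greedy threshold rule, and the toy 3-dimensional vectors, which stand in for real embeddings of class names. It is not the procedure used by DoD.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def group_classes(embeddings, threshold=0.8):
    """Greedily place each class in the first subset whose seed vector
    has cosine similarity >= threshold; otherwise start a new subset.
    (Hypothetical helper, not taken from the paper.)"""
    subsets = []  # list of (seed_vector, member_names)
    for name, vec in embeddings.items():
        for seed, members in subsets:
            if cosine(seed, vec) >= threshold:
                members.append(name)
                break
        else:
            subsets.append((vec, [name]))
    return [members for _, members in subsets]


# Made-up vectors standing in for class-name embeddings.
toy_embeddings = {
    "spaghetti_bolognese": [0.9, 0.1, 0.0],
    "spaghetti_carbonara": [0.8, 0.2, 0.1],
    "chocolate_cake": [0.0, 0.9, 0.3],
    "chocolate_mousse": [0.1, 0.8, 0.4],
    "sushi": [0.1, 0.1, 0.9],
}

subsets = group_classes(toy_embeddings)
for members in subsets:
    print(members)
```

In a setup like the one the abstract describes, each resulting subset could be assigned its own expert head that is trained jointly with the main classifier; how DoD actually does this is not stated here.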

Datasets


Task                              | Dataset  | Model           | Metric Name | Metric Value | Global Rank
Fine-Grained Image Classification | Food-101 | DoD (SwinV2-B)  | Accuracy    | 94.9         | #4
