Improving Detection of Similarly Designed Retail Products With Size Variants
Abstract
Retail environments require accurate and consistent shelf monitoring to support inventory
management and planogram compliance. However, manual inspection remains inefficient and
error-prone, particularly in modern grocery stores that contain thousands of Stock Keeping
Units (SKUs). As a result, automated vision-based monitoring systems have been increasingly
adopted to replace manual shelf inspection and improve scalability. However, a persistent
challenge for these systems arises when product variants from the same brand share nearly
identical packaging designs and differ only in net weight (grammage). In such fine-grained
scenarios, object detection models frequently confuse the variants, producing unreliable
inventory data and missed detections. This research addresses fine-grained retail product
detection under real-world constraints, where data scarcity and class imbalance limit the
effectiveness of deep learning models. To mitigate these challenges, an optimization-driven
detection pipeline is proposed that integrates systematic hyperparameter tuning, dataset-level
context-aware augmentation, and ensemble post-processing. Two complementary object detection
architectures, YOLOv12 and Faster R-CNN, are employed to
capture diverse feature representations. Model training is optimized through hierarchical
hyperparameter tuning with Optuna's Tree-structured Parzen Estimator (TPE) sampler, while data
limitations are addressed through Context-Aware Copy-Paste augmentation. Finally, the
predictions of both models are fused using Weighted Boxes Fusion (WBF) to improve detection
consistency. Experimental evaluation on real-world retail shelf images demonstrates that the
proposed pipeline reduces detection error rates relative to baseline configurations, and
confusion-matrix analysis shows fewer misclassifications among product variants. These
findings indicate that combining architectural diversity, systematic optimization, data-centric
augmentation, and model ensembling offers a practical approach to fine-grained retail product
detection in data-limited environments.
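The abstract names Weighted Boxes Fusion as the step that merges the YOLOv12 and Faster R-CNN outputs. The following is a minimal pure-Python sketch of the general WBF procedure, not the implementation used in this work. It assumes boxes normalized to [x1, y1, x2, y2], one integer class label per product variant, and positive per-model weights. The function names (`iou`, `fuse_cluster`, `weighted_boxes_fusion`), the thresholds (`iou_thr=0.55`, `skip_box_thr=0.001`), and the example predictions are hypothetical and chosen only for illustration.

```python
from typing import List, Optional, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), normalized to [0, 1]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def fuse_cluster(members: List[Tuple[Box, float]]) -> Box:
    """Average the coordinates of matched boxes, weighting each box by its score."""
    total = sum(score for _, score in members)
    x1, y1, x2, y2 = (sum(box[k] * score for box, score in members) / total for k in range(4))
    return (x1, y1, x2, y2)


def weighted_boxes_fusion(
    boxes_list: Sequence[Sequence[Box]],
    scores_list: Sequence[Sequence[float]],
    labels_list: Sequence[Sequence[int]],
    weights: Optional[Sequence[float]] = None,
    iou_thr: float = 0.55,
    skip_box_thr: float = 0.001,
) -> List[Tuple[int, float, Box]]:
    """Fuse per-model detections for one image into (label, score, box) tuples."""
    n_models = len(boxes_list)
    # Model weights are assumed to be positive; default is equal weighting.
    weights = list(weights) if weights is not None else [1.0] * n_models

    # 1. Pool every model's predictions, scaling each score by its model weight.
    pooled = []
    for boxes, scores, labels, w in zip(boxes_list, scores_list, labels_list, weights):
        for box, score, label in zip(boxes, scores, labels):
            if score >= skip_box_thr:
                pooled.append((label, score * w, tuple(box)))
    pooled.sort(key=lambda p: p[1], reverse=True)

    # 2. Highest score first, attach each box to the same-class cluster whose current
    #    fused box overlaps it by more than iou_thr; otherwise start a new cluster.
    clusters = []  # each entry: [label, fused_box, [(box, score), ...]]
    for label, score, box in pooled:
        best, best_iou = None, iou_thr
        for cluster in clusters:
            if cluster[0] != label:
                continue  # never fuse boxes of different product variants
            overlap = iou(cluster[1], box)
            if overlap > best_iou:
                best, best_iou = cluster, overlap
        if best is None:
            clusters.append([label, box, [(box, score)]])
        else:
            best[2].append((box, score))
            best[1] = fuse_cluster(best[2])

    # 3. Fused score = mean member score, rescaled so that boxes predicted by
    #    fewer models than the ensemble contains are down-weighted.
    total_weight = sum(weights)
    results = []
    for label, fused_box, members in clusters:
        mean_score = sum(s for _, s in members) / len(members)
        conf = mean_score * min(len(members), n_models) / total_weight
        results.append((label, conf, fused_box))
    results.sort(key=lambda r: r[1], reverse=True)
    return results


# Hypothetical predictions for one shelf image; class 0 and class 1 stand for
# two grammage variants of the same product.
yolo_boxes = [(0.10, 0.10, 0.30, 0.40), (0.50, 0.10, 0.70, 0.40)]
yolo_scores, yolo_labels = [0.90, 0.60], [0, 1]
frcnn_boxes = [(0.12, 0.11, 0.31, 0.42)]
frcnn_scores, frcnn_labels = [0.80], [0]

fused = weighted_boxes_fusion(
    [yolo_boxes, frcnn_boxes], [yolo_scores, frcnn_scores], [yolo_labels, frcnn_labels]
)
for label, score, box in fused:
    print(label, round(score, 3), tuple(round(c, 3) for c in box))
```

Non-maximum suppression keeps only the highest-scoring box in each overlapping group. WBF instead averages the coordinates of all matched boxes. The final rescaling lowers the confidence of a box that only one of the two architectures predicted. Matching is restricted to the same class label, so boxes assigned to different grammage variants are never merged with each other.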
Collections
- Informatics Engineering [2553]
