| dc.description.abstract | Retail environments require accurate and consistent shelf monitoring to support inventory
management and planogram compliance. However, manual inspection remains inefficient and
error-prone, particularly in modern grocery stores that contain thousands of Stock Keeping
Units (SKUs). As a result, automated vision-based monitoring systems have been increasingly
adopted to replace manual shelf inspection and improve scalability. Moreover, a persistent
challenge in automated retail monitoring systems arises when product variants from the same
brand share nearly identical packaging designs and differ only in weight or grammage. In such
fine-grained scenarios, object detection models frequently confuse product variants, leading to
unreliable inventory data and missed detections. This research addresses the problem of fine-
grained retail product detection under real-world constraints, where data scarcity and class
imbalance limit the effectiveness of deep learning models. To mitigate these challenges, an
optimization-driven detection pipeline is proposed, integrating systematic hyperparameter
tuning, dataset-level context-aware augmentation, and ensemble model post-processing. Two
complementary object detection architectures, YOLOv12 and Faster R-CNN are employed to
capture diverse feature representations. Model training is optimized using hierarchical
hyperparameter tuning with the Optuna Tree-Structured Parzen Estimator (TPE), while data
limitations are addressed through Context-Aware Copy-Paste augmentation. Furthermore,
prediction outputs are fused using Weighted Boxes Fusion (WBF) to improve detection
consistency. Experimental evaluation on real-world retail shelf images demonstrates that the
proposed pipeline reduces error rates compared to baseline detection configurations. Analysis
using confusion matrices shows a reduction in misclassification among product variants. These
findings indicate that combining architecture diversity, systematic optimization, data-centric
augmentation, and ensemble model provides a solution for fine-grained retail product detection
in data-limited retail environments. | en_US |