
The poor performance on real-world images seems to be mainly due to overexposure on the keyboard and background interference (the training data appears to have a very uniform background). In practice, you could first apply white balance to the image to reduce the overexposure. For the background, you could first use Grounding DINO + SAM2 to locate the keyboard region, then run your trained model on that region only.
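A minimal sketch of that preprocessing, assuming the image is an H x W x 3 uint8 NumPy array. The gray-world method is just one common way to do white balance and is my assumption, not something you mentioned using. `crop_to_box` is a hypothetical helper that expects a pixel box `(x1, y1, x2, y2)` which you would get from the Grounding DINO + SAM2 step (not shown here).

```python
import numpy as np


def gray_world_white_balance(image: np.ndarray) -> np.ndarray:
    """Scale each colour channel so its mean matches the overall mean.

    This is the gray-world assumption: the average colour of a scene is gray.
    """
    img = image.astype(np.float64)
    # Mean of each channel over all pixels.
    channel_means = img.reshape(-1, img.shape[-1]).mean(axis=0)
    gray = channel_means.mean()
    # Per-channel gain; guard against division by zero for empty channels.
    gains = gray / np.maximum(channel_means, 1e-6)
    balanced = np.rint(img * gains)
    return np.clip(balanced, 0, 255).astype(np.uint8)


def crop_to_box(image: np.ndarray, box: tuple) -> np.ndarray:
    """Crop the image to a pixel box (x1, y1, x2, y2), clamped to the image bounds."""
    x1, y1, x2, y2 = box
    height, width = image.shape[:2]
    x1, x2 = max(0, x1), min(width, x2)
    y1, y2 = max(0, y1), min(height, y2)
    return image[y1:y2, x1:x2]
```

You would then pass `crop_to_box(gray_world_white_balance(image), box)` to your trained model instead of the full frame. If the highlights are clipped in the original photo, no post-processing will recover that detail, so it is worth checking the result visually.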

As for the model itself, training seems to have gone quite well. Adding real-world data to the training set could improve its robustness. You might also consider fine-tuning a pre-trained model such as YOLO (https://docs.ultralytics.com/zh/tasks/segment/). Good luck with your development!
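For the fine-tuning route, a rough sketch using the Ultralytics Python package could look like the following. The dataset file `keyboard.yaml`, the choice of the `yolov8n-seg.pt` checkpoint, and the epoch and image-size values are all placeholders I picked for illustration; adjust them to your data.

```python
def fine_tune_segmentation(data_yaml: str,
                           weights: str = "yolov8n-seg.pt",
                           epochs: int = 50,
                           imgsz: int = 640):
    """Fine-tune a pre-trained YOLO segmentation model on a custom dataset.

    data_yaml is the path to an Ultralytics-style dataset description
    (image paths and class names), e.g. a hypothetical "keyboard.yaml".
    """
    # Imported here so the module can be loaded without ultralytics installed.
    from ultralytics import YOLO

    model = YOLO(weights)  # load the pre-trained segmentation checkpoint
    return model.train(data=data_yaml, epochs=epochs, imgsz=imgsz)
```

Calling `fine_tune_segmentation("keyboard.yaml")` would start training. Mixing some real-world photos into that dataset is probably what will help most with the background problem.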
