I was able to resolve the issues in my Java implementation of YOLOv5 object detection after some debugging and corrections.
Issues Faced and Solutions:
(1) Incorrect Class IDs:
Initially, every detection returned class ID 0. As pointed out in the helpful answer by Ariya (thank you so much!), I was mistakenly extracting the class ID using int classId = (int) row[5];. This is wrong because row[5] is not a class ID: it is the confidence score of the first class, a value between 0 and 1, so the cast truncates it to 0. YOLOv5 outputs one confidence score per class for each detection:
Row format: [x, y, w, h, objectness, class1_conf, class2_conf, ..., class_n_conf]
The correct approach is to:
Extract the maximum confidence score among the class probabilities.
Identify the index of the highest confidence, which corresponds to the class ID.
Fixed Code:
public static int getPredictedClassID(float[] row) {
    int classID = -1;
    float maxConfidence = -Float.MAX_VALUE;
    for (int i = 5; i < row.length; i++) {
        if (row[i] > maxConfidence) {
            maxConfidence = row[i];
            classID = i - 5; // Adjust index to match class label
        }
    }
    return classID;
}
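Since the final confidence in my case is the objectness multiplied by the class confidence, it can be convenient to compute the class ID and that score in one pass. The following is only a minimal sketch: the DetectionScore class, its Result holder, and the classAndScore method are hypothetical names introduced for illustration, and it assumes the row layout shown above.

```java
// Hypothetical helper: picks the best class and combines its score with objectness.
final class DetectionScore {

    /** Simple holder for the chosen class index and the combined score. */
    static final class Result {
        final int classId;
        final float score;

        Result(int classId, float score) {
            this.classId = classId;
            this.score = score;
        }
    }

    /** Assumes row = [x, y, w, h, objectness, class1_conf, ..., class_n_conf]. */
    static Result classAndScore(float[] row) {
        int classId = -1;
        float bestClassConf = -Float.MAX_VALUE;
        for (int i = 5; i < row.length; i++) {
            if (row[i] > bestClassConf) {
                bestClassConf = row[i];
                classId = i - 5; // class scores start at index 5
            }
        }
        // If the row has no class scores, report classId -1 with a score of 0
        float score = classId >= 0 ? row[4] * bestClassConf : 0f;
        return new Result(classId, score);
    }
}
```

The combined score can then be compared against the confidence threshold before a detection is passed on to NMS.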
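(2) Bounding Box Offset:
The bounding boxes were drawn in the wrong place because of improper scaling. The original Java implementation did not correctly map the YOLO output coordinates, which are expressed in the model's input resolution, back to the original image dimensions. The solution involved:
Rescaling the box center, width, and height from the model input size (inputSize) to the original image width and height.
Converting the center-based box (cx, cy, w, h) to corner coordinates (x1, y1, x2, y2).
Note that the code below assumes the image was resized directly to inputSize x inputSize. If the image was letterboxed (padded to keep its aspect ratio), the padding must be subtracted as well.
Fixed Code: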
float cx = row[0] * origWidth / inputSize;
float cy = row[1] * origHeight / inputSize;
float w = row[2] * origWidth / inputSize;
float h = row[3] * origHeight / inputSize;
float x1 = cx - w / 2;
float y1 = cy - h / 2;
float x2 = cx + w / 2;
float y2 = cy + h / 2;
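If the preprocessing pads the image to keep its aspect ratio (letterboxing) instead of stretching it, the mapping needs one extra step: subtract the padding, then divide by the resize factor. The sketch below is an illustration under that assumption, not the code I actually used; LetterboxMapper and toOriginal are hypothetical names, and it assumes the padding is split evenly on both sides and ignores any rounding the preprocessing may apply.

```java
// Hypothetical helper: maps a box from letterboxed input space to original pixels.
final class LetterboxMapper {

    /** Returns {x1, y1, x2, y2} in original image coordinates. */
    static float[] toOriginal(float cx, float cy, float w, float h,
                              int origWidth, int origHeight, int inputSize) {
        // Single resize factor that fits the whole image inside the square input
        float gain = Math.min((float) inputSize / origWidth, (float) inputSize / origHeight);
        // Padding added on each side, assuming the image is centered
        float padX = (inputSize - origWidth * gain) / 2f;
        float padY = (inputSize - origHeight * gain) / 2f;

        float x1 = (cx - w / 2f - padX) / gain;
        float y1 = (cy - h / 2f - padY) / gain;
        float x2 = (cx + w / 2f - padX) / gain;
        float y2 = (cy + h / 2f - padY) / gain;

        // Clamp to the image bounds so boxes never extend outside the picture
        return new float[] {
            clamp(x1, 0f, origWidth), clamp(y1, 0f, origHeight),
            clamp(x2, 0f, origWidth), clamp(y2, 0f, origHeight)
        };
    }

    private static float clamp(float v, float lo, float hi) {
        return Math.max(lo, Math.min(hi, v));
    }
}
```

For a 1280x720 image and a 640x640 input, the resize factor is 0.5 and the vertical padding is 140 pixels per side, so a box centered at (320, 320) in input space lands around the center of the original image.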
(3) Non-Maximum Suppression (NMS):
To filter out redundant overlapping boxes, I implemented NMS so that only the highest-confidence detection is kept for each object.
NMS Implementation:
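// Requires: import java.util.ArrayList; import java.util.List;
public static List<float[]> nonMaxSuppression(List<float[]> detections, float confThreshold, float iouThreshold) {
    // Work on a copy so the caller's list is not modified
    List<float[]> candidates = new ArrayList<>(detections);
    candidates.removeIf(det -> det[4] < confThreshold); // Drop low-confidence detections
    candidates.sort((a, b) -> Float.compare(b[4], a[4])); // Sort by confidence, highest first
    List<float[]> filteredDetections = new ArrayList<>();
    while (!candidates.isEmpty()) {
        float[] best = candidates.remove(0);
        filteredDetections.add(best);
        // Discard the remaining boxes that overlap the kept box too much
        candidates.removeIf(det -> iou(best, det) > iouThreshold);
    }
    return filteredDetections;
}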
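The NMS code calls an iou helper that is not shown. A minimal sketch of such a helper could look like the following; it assumes each detection array starts with corner coordinates [x1, y1, x2, y2], which may differ from your own layout, and BoxOverlap is a hypothetical class name.

```java
// Hypothetical helper: intersection over union for two axis-aligned boxes.
final class BoxOverlap {

    /** Assumes each array starts with [x1, y1, x2, y2]; extra elements are ignored. */
    static float iou(float[] a, float[] b) {
        // Width and height of the overlapping region (0 if the boxes do not overlap)
        float interW = Math.max(0f, Math.min(a[2], b[2]) - Math.max(a[0], b[0]));
        float interH = Math.max(0f, Math.min(a[3], b[3]) - Math.max(a[1], b[1]));
        float intersection = interW * interH;

        float areaA = (a[2] - a[0]) * (a[3] - a[1]);
        float areaB = (b[2] - b[0]) * (b[3] - b[1]);
        float union = areaA + areaB - intersection;

        // Guard against division by zero for degenerate boxes
        return union > 0f ? intersection / union : 0f;
    }
}
```

An IoU of 1 means the boxes are identical and 0 means they do not overlap; thresholds around 0.45 to 0.5 are common choices for the IoU threshold.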
I hope this helps others facing similar issues!
Note: These code snippets are not fully optimized and can be improved. The final confidence score in my case is objectness * class_i_confidence.
Sources: https://medium.com/@pooja123tambe/yolov5-inferencing-on-onnxruntime-and-opencv-dnn-d20e4c52dc31