In my application, the output of the YOLO detection model differs from the camera input tensor: it contains only confidence scores and bounding boxes. That output is just a "description" of your image, so you cannot visualise it without the original image. This is different from the YOLO segmentation model, whose output can be interpreted per pixel (though in an Android app the mask is reduced to 160x160 px).
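To make that concrete, here is a minimal Kotlin sketch of drawing decoded detections back onto the camera frame. The `Detection` class is hypothetical (your decoded rows may look different), and it assumes the box coordinates have already been scaled from model space to the bitmap's pixel space:

```kotlin
import android.graphics.Bitmap
import android.graphics.Canvas
import android.graphics.Color
import android.graphics.Paint

// Hypothetical record for one decoded detection: box corners in the
// bitmap's pixel coordinates, plus the confidence score and class label.
data class Detection(
    val left: Float,
    val top: Float,
    val right: Float,
    val bottom: Float,
    val score: Float,
    val label: String
)

// Draws detections onto a mutable copy of the original camera frame.
// The model output alone cannot be displayed; it only makes sense
// overlaid on the image it describes.
fun drawDetections(frame: Bitmap, detections: List<Detection>): Bitmap {
    val out = frame.copy(Bitmap.Config.ARGB_8888, true)
    val canvas = Canvas(out)
    val boxPaint = Paint().apply {
        style = Paint.Style.STROKE
        strokeWidth = 4f
        color = Color.RED
    }
    val textPaint = Paint().apply {
        color = Color.RED
        textSize = 32f
    }
    for (d in detections) {
        canvas.drawRect(d.left, d.top, d.right, d.bottom, boxPaint)
        canvas.drawText("%s %.2f".format(d.label, d.score), d.left, d.top - 8f, textPaint)
    }
    return out
}
```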
Parsing the rest of the output is similar to another question I just answered, so you can see the details here: How to interpret output tensor from YOLOv8 web model.