
The response above is generally correct as long as:

You keep separate start values for the first video packet and the first audio packet (i.e. one start_pts/start_dts pair per stream). This matters because audio packets generally have identical PTS and DTS, and the first audio packet does not necessarily start at a DTS of 0.
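A minimal Python sketch of the per-stream idea, assuming each packet is a plain (stream, PTS, DTS) record. The `Packet` class and `rebase_per_stream` function are hypothetical names used only for illustration, not part of any library. The input timestamps are the OBS values discussed in this answer (three video packets, then one audio packet).

```python
from dataclasses import dataclass


@dataclass
class Packet:
    stream: str  # hypothetical stream label, e.g. "video" or "audio"
    pts: int
    dts: int


def rebase_per_stream(packets):
    """Shift timestamps using the first PTS/DTS seen on *each* stream."""
    starts = {}  # stream label -> (start_pts, start_dts)
    out = []
    for p in packets:
        # The first packet of a stream defines that stream's start values;
        # later packets of the same stream reuse them.
        start_pts, start_dts = starts.setdefault(p.stream, (p.pts, p.dts))
        out.append(Packet(p.stream, p.pts - start_pts, p.dts - start_dts))
    return out


packets = [
    Packet("video", 33, 0),
    Packet("video", 100, 17),
    Packet("video", 66, 33),
    Packet("audio", 33, 33),
]

for p in rebase_per_stream(packets):
    print(p.stream, p.pts, p.dts)
```

With per-stream start values the audio packet comes out as PTS 0, DTS 0 instead of PTS 0, DTS 33. Because different values are still subtracted from the video PTS and DTS, the video PTS/DTS gap still shrinks; subtracting one offset from both PTS and DTS would preserve that gap.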

Let me give you a concrete example where this could fail (real example from OBS):

Packet 1 (Video): PTS: 33, DTS: 0 (start_pts=33, start_dts=0) => PTS: 0, DTS: 0 (this is already a problem: subtracting different offsets from PTS and DTS removes the 33-unit gap between decoding and presentation, so the decoding time now overlaps the presentation time)
Packet 2 (Video): PTS: 100, DTS: 17 => PTS: 67, DTS: 17
Packet 3 (Video): PTS: 66, DTS: 33 => PTS: 33, DTS: 33 (another overlap)
Packet 4 (Audio): PTS: 33, DTS: 33 => PTS: 0, DTS: 33 (PTS < DTS: you are asking the decoder to decode the packet in the future and present it now)
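The failure can be reproduced with a short Python sketch of the naive approach, where a single start_pts/start_dts pair is taken from the very first packet and applied to every stream. `rebase_single_start` is a hypothetical helper written only for this illustration.

```python
def rebase_single_start(packets):
    """Naive rebasing: one (start_pts, start_dts) pair from the first packet, for all streams."""
    _, start_pts, start_dts = packets[0]
    return [(stream, pts - start_pts, dts - start_dts) for stream, pts, dts in packets]


# (stream, PTS, DTS) as listed above
packets = [
    ("video", 33, 0),
    ("video", 100, 17),
    ("video", 66, 33),
    ("audio", 33, 33),
]

for stream, pts, dts in rebase_single_start(packets):
    # A packet must not be presented before it is decoded.
    flag = "  <-- PTS < DTS, invalid" if pts < dts else ""
    print(f"{stream}: PTS {pts}, DTS {dts}{flag}")
```

Only the audio packet is flagged: its start values came from the first video packet, so its PTS is shifted by 33 while its DTS is not shifted at all.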

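The error reported in this case is: pts (0) < dts (2970) in stream 1. The DTS in that message is expressed in the stream's time base rather than in the units used above; 2970 is 33 × 90, which is consistent with 33 ms in a 1/90000 time base.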
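A small Python sketch of the time-base arithmetic behind the number in the error message. It assumes the example timestamps are in milliseconds (time base 1/1000) and that the output stream uses a 1/90000 time base; both are assumptions, not something the error message states. `rescale` is a hypothetical helper that is similar in spirit to FFmpeg's `av_rescale_q` (convert a timestamp from one time base to another and round to nearest), though its tie-breaking may differ.

```python
from fractions import Fraction


def rescale(ts, src_tb, dst_tb):
    """Convert timestamp ts from time base src_tb to dst_tb, rounding to nearest.

    Time bases are (numerator, denominator) tuples, e.g. (1, 1000) for milliseconds.
    """
    seconds = ts * Fraction(*src_tb)         # timestamp in seconds, exact
    return round(seconds / Fraction(*dst_tb))  # ticks of the destination time base


print(rescale(33, (1, 1000), (1, 90000)))
```

Using `Fraction` keeps the intermediate value exact, so the only rounding happens in the final step.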
