(I don't have enough reputation to post this as a comment, so here is my message as an "answer".)
How does longwaller's answer solve the problem? How do I identify those "slices" and put them together?
My 4K HEVC video stream contains a lot of fragmented NAL units, i.e. NAL units that are split across multiple RTP payloads. Each such payload starts with a header that identifies it as a fragment and carries two flags, one set on the first fragment and one on the last. As I understand it, I need to collect all fragments and concatenate them into a single valid NAL unit before sending it to the decoder. But where do "slices" come into play?
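For reference, here is a minimal sketch of the reassembly as I understand it from RFC 7798 (fragmentation units, payload type 49). The class name `FuReassembler` and its `push` method are made up for illustration. The sketch assumes that no DONL field is present (i.e. `sprop-max-don-diff` is 0), that packets arrive in order, and that aggregation packets (type 48) do not need unpacking; none of this is taken from longwaller's answer.

```python
FU_TYPE = 49  # HEVC fragmentation unit payload type (RFC 7798)


class FuReassembler:
    """Hypothetical helper: rebuilds NAL units from in-order RTP payloads."""

    def __init__(self):
        self._buf = None  # bytearray while a fragmented NAL unit is in progress

    def push(self, payload: bytes):
        """Feed one RTP payload; return a complete NAL unit or None."""
        if len(payload) < 2:
            return None  # too short to hold a 2-byte NAL unit header
        nal_type = (payload[0] >> 1) & 0x3F
        if nal_type != FU_TYPE:
            # Not a fragment: pass it through unchanged.
            # (Assumption: aggregation packets, type 48, are not unpacked here.)
            self._buf = None
            return bytes(payload)
        if len(payload) < 3:
            return None  # FU header byte is missing

        fu_header = payload[2]
        start = bool(fu_header & 0x80)  # S flag: first fragment
        end = bool(fu_header & 0x40)    # E flag: last fragment
        fu_type = fu_header & 0x3F      # type of the original NAL unit

        if start:
            # Rebuild the original 2-byte NAL header: keep the F bit and the
            # LayerId/TID bits, replace type 49 with the real NAL unit type.
            byte0 = (payload[0] & 0x81) | (fu_type << 1)
            self._buf = bytearray([byte0, payload[1]])
        elif self._buf is None:
            return None  # start fragment was lost; drop until the next start

        self._buf += payload[3:]  # append the fragment data after the FU header
        if end:
            nal = bytes(self._buf)
            self._buf = None
            return nal
        return None
```

Usage would be to call `push()` with every RTP payload and forward each non-`None` result to the decoder. Depending on the decoder, an Annex B start code (`00 00 00 01`) may have to be prepended to each NAL unit first.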