This question was asked forever ago, but for posterity, and in case you still want an answer: PaliGemma's segmentation outputs are special "soft" tokens produced by a dedicated visual encoder, which is described here.
Google has an example here that shows how to parse them into meaningful coordinates.
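In case the link goes stale, here is a minimal sketch of the text-parsing half of that process. It is not Google's code. It assumes the output looks like four `<locNNNN>` tokens (in the order y_min, x_min, y_max, x_max, on a 0-1023 grid), followed by sixteen `<segNNN>` tokens and then a label, with multiple objects separated by `;`. The function name `parse_segmentation` and the returned dictionary keys are made up for illustration.

```python
import re

# Assumed layout per object: 4 <locNNNN> tokens, 16 <segNNN> tokens, then a label.
_OBJECT_PATTERN = re.compile(
    r"((?:<loc\d{4}>){4})\s*((?:<seg\d{3}>){16})\s*([^<;]*)"
)


def parse_segmentation(text, image_width, image_height):
    """Hypothetical helper: pull boxes and mask-token indices out of model text."""
    results = []
    for match in _OBJECT_PATTERN.finditer(text):
        loc_part, seg_part, label = match.groups()

        # Assumed order of the location tokens: y_min, x_min, y_max, x_max.
        y_min, x_min, y_max, x_max = (
            int(v) for v in re.findall(r"<loc(\d{4})>", loc_part)
        )
        seg_indices = [int(v) for v in re.findall(r"<seg(\d{3})>", seg_part)]

        results.append({
            "label": label.strip(),
            # Rescale from the assumed 1024-bin grid to pixel coordinates.
            "box_xyxy": (
                x_min / 1024 * image_width,
                y_min / 1024 * image_height,
                x_max / 1024 * image_width,
                y_max / 1024 * image_height,
            ),
            # Raw indices only; decoding them into a mask is a separate step.
            "seg_indices": seg_indices,
        })
    return results
```

This only gets you the bounding box and the raw segmentation token indices. Turning those indices into an actual pixel mask still needs the mask decoder from Google's example, which this sketch does not cover.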