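Label smoothing fills the same value into every entry of the target distribution except the true label index and the pad index, so the smoothing mass is shared among the remaining entries. The -2 therefore accounts for those two excluded entries: the true label index and the pad index.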
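To make the arithmetic concrete, here is a minimal pure-Python sketch of how such a target distribution could be built. The function name `smoothed_targets` and its parameters (`vocab_size`, `padding_idx`, `smoothing`) are hypothetical names chosen for illustration, not taken from the original code, and the sketch assumes a non-pad target never equals the pad index.

```python
def smoothed_targets(target, vocab_size, padding_idx, smoothing):
    """Build one smoothed target distribution per token (hypothetical helper)."""
    rows = []
    for t in target:
        if t == padding_idx:
            # Padding positions should not contribute, so the whole row is zero.
            rows.append([0.0] * vocab_size)
            continue
        # Share the smoothing mass among all classes except the
        # true label index and the pad index: hence vocab_size - 2.
        fill = smoothing / (vocab_size - 2)
        row = [fill] * vocab_size
        row[t] = 1.0 - smoothing   # the true label keeps the remaining mass
        row[padding_idx] = 0.0     # the pad index never receives probability
        rows.append(row)
    return rows


# Example: 5 classes, pad index 0, smoothing 0.3.
# The first token has true label 2; the second token is padding.
dist = smoothed_targets(target=[2, 0], vocab_size=5, padding_idx=0, smoothing=0.3)
for row in dist:
    print(row)
```

With 5 classes, the smoothing mass of 0.3 is split over 5 - 2 = 3 entries, so each non-pad row still sums to 1. Dividing by 5 or by 4 instead would leave the row summing to less than 1.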