The base values for each sample represent the model's output for that sample when its features did not exist.
I conducted a test. For each sample, you mask the features (words) from this specific sample in the dataset split. Then, you evaluate that specific sample. The resulting posterior probabilities serve as the base values.
I hope that helps :)