
The issue occurs because SHAP's scatter function may mishandle missing data when it is given an xgb.DMatrix: it might convert the sparse matrix to a dense one, which imputes zeros in place of the missing entries. To display missing values correctly (e.g., as rug plot markers), compute the SHAP values from the raw input data (a NumPy array or pandas.DataFrame) rather than from the xgb.DMatrix. The model can still be trained with a DMatrix; passing the original X to the SHAP explainer preserves the NaN values, so the scatter plot shows them accurately.
