79814616

Date: 2025-11-09 05:44:33
Score: 0.5
Natty:
Report link

Dimensionality reduction ended up working well: it stopped my code from crashing and brought the column count down from about 34,000 (after encoding with OneHotEncoder) to 150. I used a Pipeline with a ColumnTransformer (sparse output) followed by TruncatedSVD; code posted below:

from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
from sklearn.decomposition import TruncatedSVD
from sklearn.pipeline import Pipeline
from sklearn.naive_bayes import BernoulliNB

categorical = ["developers", "publishers", "categories", "genres", "tags"]
numeric = ["price", "windows", "mac", "linux"]

ct = ColumnTransformer(
    transformers=[("ohe", OneHotEncoder(handle_unknown='ignore', sparse_output=True), categorical)],
    remainder='passthrough',   # keep the numeric columns as-is
    sparse_threshold=1.0       # stack the output as a sparse matrix (0.0 would force dense output)
)
svd = TruncatedSVD(n_components=150, random_state=42)  # works directly on sparse input
pipeline = Pipeline([("ct", ct), ("svd", svd), ("clf", BernoulliNB())])
X = randomizedDf[categorical + numeric]
y = randomizedDf['recommendation']

This brought the shape of my training data down to (11200, 300).

Reasons:
  • Long answer (-0.5):
  • Has code block (-0.5):
  • Self-answer (0.5):
  • Low reputation (1):
Posted by: Luciano Elish