No, the models will be different. In particular:
The weights (coefficients) will differ, because each model is fit to a different set of columns
The intercept will usually differ
The predictions and the model's explanatory power (e.g. R²) will generally differ as well
This happens because, when you add more features, the model readjusts the contribution of every variable to minimize the overall error, which also changes the optimal value of the intercept.
LinearRegression() is a linear regression model. The function it learns is
y_hat = w0 + w1 * x1 + w2 * x2 + ... + wn * xn
where w0 is the intercept, w1 ~ wn are the weights of the feature columns, and x1 ~ xn are the features (e.g. TV, Radio).
Therefore, when you change the number of columns in X (that is, the number of features fed into the model), the whole fit changes: the intercept and every coefficient are re-estimated together, so none of them can be assumed to stay the same.
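One way to see why the intercept moves too, assuming the model is fit by ordinary least squares with an intercept (which is what LinearRegression() does by default). The index i over the m training rows and the bars for column means are notation introduced here, not part of the original answer:

```latex
\min_{w_0, w_1, \dots, w_n} \; \sum_{i=1}^{m} \Bigl( y_i - w_0 - \sum_{j=1}^{n} w_j x_{ij} \Bigr)^2
\qquad\Longrightarrow\qquad
w_0 = \bar{y} - \sum_{j=1}^{n} w_j \bar{x}_j
```

The intercept is tied to the weights through the column means, so adding a column (a new w_j, and usually new values for the existing ones) generally shifts w0 as well.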
Example:
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score
# Create a sample data set (simulated marketing budgets and sales)
data = pd.DataFrame({
    'TV': [230.1, 44.5, 17.2, 151.5, 180.8, 8.7, 57.5, 120.2, 8.6, 199.8],
    'Radio': [37.8, 39.3, 45.9, 41.3, 10.8, 48.9, 32.8, 19.6, 2.1, 2.6],
    'Newspaper': [69.2, 45.1, 69.3, 58.5, 58.4, 75.0, 23.5, 11.6, 1.0, 21.2],
    'Sales': [22.1, 10.4, 9.3, 18.5, 12.9, 7.2, 11.8, 13.2, 4.8, 10.6]
})
# Prepare X, y separately (single feature vs multiple features)
X1 = data[['TV']]
X3 = data[['TV', 'Radio', 'Newspaper']]
y = data['Sales']
# Split the data (the same random_state gives both feature sets the same train/test rows)
X1_train, X1_test, y_train, y_test = train_test_split(X1, y, test_size=0.3, random_state=42)
X3_train, X3_test, _, _ = train_test_split(X3, y, test_size=0.3, random_state=42)
# Build and train the model
model1 = LinearRegression().fit(X1_train, y_train)
model3 = LinearRegression().fit(X3_train, y_train)
# Predict on the test set
y_pred1 = model1.predict(X1_test)
y_pred3 = model3.predict(X3_test)
# Output comparison
print("Univariate Model:")
print(f" Intercept: {model1.intercept_:.4f}")
print(f" TV Coefficient: {model1.coef_[0]:.4f}")
print(f" R²: {r2_score(y_test, y_pred1):.4f}")
print("\nMultivariate Model:")
print(f" Intercept: {model3.intercept_:.4f}")
print(f" Coefficients (TV, Radio, Newspaper): {model3.coef_}")
print(f" R²: {r2_score(y_test, y_pred3):.4f}")
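As a further check, the following minimal sketch applies the formula y_hat = w0 + w1 * x1 + ... + wn * xn by hand and compares it with predict(). It reuses the first five TV/Radio/Sales rows from the sample data above; the helper manual_predict and the variable names are illustrative assumptions, not part of scikit-learn.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# First five rows of the sample data above; columns are TV, Radio
X = np.array([
    [230.1, 37.8],
    [44.5, 39.3],
    [17.2, 45.9],
    [151.5, 41.3],
    [180.8, 10.8],
])
y = np.array([22.1, 10.4, 9.3, 18.5, 12.9])  # Sales

X_tv = X[:, [0]]  # keep 2-D shape: a single TV column

model_tv = LinearRegression().fit(X_tv, y)   # TV only
model_both = LinearRegression().fit(X, y)    # TV and Radio


def manual_predict(model, features):
    """Hypothetical helper: y_hat = w0 + w1 * x1 + ... + wn * xn."""
    return model.intercept_ + features @ model.coef_


# The hand-written formula should reproduce predict() for both models
matches_tv = np.allclose(manual_predict(model_tv, X_tv), model_tv.predict(X_tv))
matches_both = np.allclose(manual_predict(model_both, X), model_both.predict(X))

# For a least-squares fit with an intercept: w0 = mean(y) - sum(w_j * mean(x_j))
intercept_from_means = y.mean() - X.mean(axis=0) @ model_both.coef_

print("formula matches predict():", matches_tv, matches_both)
print("TV weight, TV only:       ", model_tv.coef_[0])
print("TV weight, TV and Radio:  ", model_both.coef_[0])
print("intercepts:               ", model_tv.intercept_, model_both.intercept_)
print("intercept from means:     ", intercept_from_means)
```

The TV weight printed for the two models differs even though the TV column itself is unchanged, which is the point of the answer: adding Radio changes how much of Sales is attributed to TV.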