It seems I forgot to set the following environment variables:
PYSPARK_PYTHON=path\python
PYSPARK_DRIVER_PYTHON=path\python
After setting the above variables in the environment, everything works fine.
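
For reference, here is a minimal sketch of setting the same two variables from inside the script instead of in the system environment. It assumes the interpreter running the driver (sys.executable) is also the one the workers should use; the helper name set_pyspark_python is hypothetical and not part of PySpark.

```python
import os
import sys


def set_pyspark_python(python_path=sys.executable):
    """Point both the PySpark workers and the driver at one interpreter.

    python_path defaults to the interpreter running this script, which is
    an assumption; pass an explicit path if the workers need another one.
    """
    os.environ["PYSPARK_PYTHON"] = python_path
    os.environ["PYSPARK_DRIVER_PYTHON"] = python_path
    return python_path


# Call this before the SparkSession is created.
chosen = set_pyspark_python()
print("PySpark will use:", chosen)
```

As far as I understand, this has to run before the SparkSession is created, since the variables are read when the session starts.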
However, I still wonder why only the DataFrame created using spark.createDataFrame()
failed, while the one created using spark.read
worked when the above environment variables were missing. Please let me know.