Even with spark.sql.caseSensitive set to false, the schema must match the data structure exactly: as the PySpark documentation notes, this configuration only affects name resolution in Spark SQL queries, not schema matching in createDataFrame. A simple workaround is to normalize the data's keys dynamically so they match the schema, without modifying the schema itself. Here is working code:
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType

spark = SparkSession.builder.getOrCreate()

# Schema with all-lowercase field names
schema = StructType([
    StructField("name", StringType(), True),
    StructField("address", StructType([
        StructField("addressline1", StringType(), True),
        StructField("addressline2", StringType(), True)
    ]), True)
])

# Incoming data uses camelCase keys, which do not match the schema
data = [
    {
        "name": "some name 1",
        "address": {
            "addressLine1": "some address line 1",
            "addressLine2": "some address line 2"
        }
    }
]

# Recursively lowercase every dictionary key so the data matches the schema
def transform_keys(obj):
    if isinstance(obj, dict):
        return {k.lower(): transform_keys(v) for k, v in obj.items()}
    elif isinstance(obj, list):
        return [transform_keys(i) for i in obj]
    return obj

transformed_data = [transform_keys(record) for record in data]

df = spark.createDataFrame(transformed_data, schema)
df.show(truncate=False)
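If it helps, the key-normalization step can be checked on its own without a Spark session. This sketch runs the same transform_keys helper against a small nested sample (the sample keys and values here are just placeholders, not from your data):

```python
# Standalone check of the recursive key-lowercasing helper (no Spark needed).
def transform_keys(obj):
    if isinstance(obj, dict):
        return {k.lower(): transform_keys(v) for k, v in obj.items()}
    elif isinstance(obj, list):
        return [transform_keys(i) for i in obj]
    return obj

# Nested dicts and dicts inside lists are both handled
sample = {
    "Name": "n",
    "Address": {"addressLine1": "a1", "addressLine2": "a2"},
    "Tags": [{"TagName": "t"}]
}
print(transform_keys(sample))
# {'name': 'n', 'address': {'addressline1': 'a1', 'addressline2': 'a2'}, 'tags': [{'tagname': 't'}]}
```

Note that only the keys are lowercased; the values pass through untouched, so the data itself is not altered.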
Please let me know if you need more information.