Date: 2024-11-17 15:31:44

Even with spark.sql.caseSensitive set to false, the schema must match the data structure exactly (as the PySpark documentation notes, this configuration applies only to Spark SQL).

The simple way is to transform the data's keys dynamically to match the schema, without modifying the schema itself. Here is the working code.

from pyspark.sql.types import StructType, StructField, StringType

 
schema = StructType([
    StructField("name", StringType(), True),
    StructField("address", StructType([
        StructField("addressline1", StringType(), True),
        StructField("addressline2", StringType(), True)
    ]), True)
])

 
data = [
    {
        "name": "some name 1",
        "address": {
            "addressLine1": "some address line 1",
            "addressLine2": "some address line 2"
        }
    }
]

def transform_keys(obj):
    # Recursively lowercase every dict key so nested data matches the schema
    if isinstance(obj, dict):
        return {k.lower(): transform_keys(v) for k, v in obj.items()}
    elif isinstance(obj, list):
        return [transform_keys(i) for i in obj]
    return obj

transformed_data = [transform_keys(record) for record in data]

df = spark.createDataFrame(transformed_data, schema)
df.show(truncate=False)
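If you want to sanity-check transform_keys without a Spark session, here is a minimal sketch; the nested record is a made-up example shaped like the data above:

```python
def transform_keys(obj):
    # Recursively lowercase dict keys in nested dicts and lists
    if isinstance(obj, dict):
        return {k.lower(): transform_keys(v) for k, v in obj.items()}
    elif isinstance(obj, list):
        return [transform_keys(i) for i in obj]
    return obj

# Hypothetical record mirroring the camelCase keys in the example data
record = {
    "name": "some name 1",
    "address": {"addressLine1": "line 1", "addressLine2": "line 2"},
}
print(transform_keys(record))
```

After the transform, the "addressLine1"/"addressLine2" keys become "addressline1"/"addressline2", matching the lowercase field names in the schema.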

Please let me know if you need more information.

Posted by: Srinimf