I am setting up a CDC pipeline that uses Kafka and Debezium to capture changes on a source table in MySQL. A PySpark job then cleans the data and inserts it into a destination table. Suppose I have to stop the PySpark script for two days: when I start it again, it no longer captures the data that arrived during those two days. Is there a solution for this? This is how I create the read stream:

```
df = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", kafka_bootstrap_servers)
    .option("subscribe", topic)
    # "latest" only applies when there is no checkpoint yet, and it
    # skips anything produced while the query is stopped
    .option("startingOffsets", "latest")
    # consumer properties must carry the "kafka." prefix to reach the
    # Kafka consumer; a bare "group.id" is not picked up (and Spark
    # tracks offsets itself via checkpoints, not the consumer group)
    .option("kafka.group.id", consumer_group_id)
    .load()
)
```
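
For completeness, here is a rough sketch of the write side of the job. The cleaning step, JDBC URL, table name, and credentials are simplified placeholders, not my actual configuration; note that I am not currently passing any `checkpointLocation`:

```
def write_batch(batch_df, batch_id):
    # Placeholder cleaning step; the real logic parses the Debezium payload
    cleaned = batch_df.selectExpr("CAST(value AS STRING) AS payload")
    (
        cleaned.write
        .format("jdbc")
        .option("url", "jdbc:mysql://dest-host:3306/analytics")  # placeholder
        .option("driver", "com.mysql.cj.jdbc.Driver")
        .option("dbtable", "destination_table")  # placeholder
        .option("user", "etl_user")  # placeholder
        .option("password", "***")
        .mode("append")
        .save()
    )

query = (
    df.writeStream
    .foreachBatch(write_batch)
    .start()  # no checkpointLocation configured at the moment
)
query.awaitTermination()
```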