79355927

Date: 2025-01-14 17:57:47
Score: 0.5
Natty:
Report link

What about spark sql only?

Another way to solve it is with Spark SQL alone, using a window function plus GROUP BY (the gaps-and-islands technique): subtracting each row's row number from its time value yields a constant key within every run of consecutive times.

df.createOrReplaceTempView('dfTable')  # registerTempTable is deprecated since Spark 2.0
spark.sql("""
    with grp_cte as (
        -- time minus row_number() is constant within each run of consecutive times
        select
            id,
            time - row_number() over (partition by id order by time) as grp
        from dfTable
    ),
    final as (
        -- length of each consecutive run per id
        select
            id, count(*) as cnt
        from grp_cte
        group by id, grp
    )
    -- keep the longest run per id
    select
        id, max(cnt) as time
    from final
    group by id
""").show()
Reasons:
  • Long answer (-0.5):
  • Has code block (-0.5):
  • Contains question mark (0.5):
  • Starts with a question (0.5): What
  • Low reputation (0.5):
Posted by: Sr Jefers