@Lakasz: Those MIN_BY/MAX_BY functions appear to be exactly what I am looking for, but they don't seem to be supported in PySpark 2.3 (the version that ships with HDP 2.6.5). That's unfortunate.