However, swapping the order of the tables makes it run in 20-25 sec:
when you do
FROM car_locations
LEFT JOIN app_opening
probably the table car_locations
is much bigger than the app_opening
and Athena could figure it out and do a broadcast join. Check the optimizing join section in this link.
\> Choosing the right join order is critical for better query performance. When you join two tables, specify the larger table on the left side of the join and the smaller table on the right side. For the most common type of joins that use equality conditions, Athena builds a lookup table from the table on the right and distributes it to the worker nodes. It then streams the table on the left, joining rows by looking up matching values in the lookup table. This is called a distributed hash join. Because the lookup table built from the table on the right side is kept in memory, the smaller that table is, the less memory will be used, and the faster the join will run.
This video also has a very good explanation about query plans in Trino, which is the engine that Athena uses underneath. (https://www.youtube.com/watch?v=Yem1eKcj5f8&t=210s)