79621632

Date: 2025-05-14 14:00:16
Score: 0.5
Natty:
Report link

However, swapping the order of the tables makes it run in 20-25 sec:

when you do

FROM car_locations
LEFT JOIN app_opening

probably the table car_locations is much bigger than the app_opening and Athena could figure it out and do a broadcast join. Check the optimizing join section in this link.
\> Choosing the right join order is critical for better query performance. When you join two tables, specify the larger table on the left side of the join and the smaller table on the right side. For the most common type of joins that use equality conditions, Athena builds a lookup table from the table on the right and distributes it to the worker nodes. It then streams the table on the left, joining rows by looking up matching values in the lookup table. This is called a distributed hash join. Because the lookup table built from the table on the right side is kept in memory, the smaller that table is, the less memory will be used, and the faster the join will run.

This video also has a very good explanation about query plans in Trino, which is the engine that Athena uses underneath. (https://www.youtube.com/watch?v=Yem1eKcj5f8&t=210s)

Reasons:
  • Blacklisted phrase (1): youtube.com
  • Blacklisted phrase (1): this link
  • Blacklisted phrase (1): This video
  • Long answer (-1):
  • Has code block (-0.5):
  • High reputation (-1):
Posted by: Felipe