The surprising result you're seeing, where an O(n log n) algorithm runs faster than an O(n) algorithm, comes down to several practical factors:
Constant Factors and Lower-Level Operations: Even though the theoretical time complexity of sorting is O(n log n), the constant factors involved (for example in Timsort, the algorithm behind Python's sort()) can be small enough that a sorting-based solution outperforms an O(n) one, especially when the input is small or when the O(n) solution relies on relatively expensive operations.
Efficient Sorting Algorithms: The Timsort algorithm used in Python is highly optimized for practical use cases. It is particularly fast on real-world data, especially if there are ordered or partially ordered sequences in the input. Even though the sorting step theoretically has higher time complexity, in practice, it can run faster because of optimizations that reduce the constant factors.
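For reference, here is a minimal sketch of what the sort-based approach presumably looks like, assuming the task is finding the longest run of consecutive integers (which is what your snippets suggest); the function name and details here are my own, not your exact code:

def longest_consecutive_sorted(nums):
    # O(n log n): sort once, then scan adjacent pairs in a single pass.
    if not nums:
        return 0
    nums = sorted(set(nums))          # dedupe so equal values don't break a streak
    longest = streak = 1
    for prev, cur in zip(nums, nums[1:]):
        if cur == prev + 1:           # consecutive -> extend the current streak
            streak += 1
            longest = max(longest, streak)
        else:                         # gap -> start a new streak
            streak = 1
    return longest

The only non-constant-time step here is the sort; the scan afterwards touches each element exactly once, which is part of why the constant factors stay low.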
Set Operations Overhead: In your O(n) solution, you're relying heavily on set operations, specifically in (membership tests) and add. While these operations are O(1) on average, they can take more time than expected because of hash collisions, dynamic resizing, and poor cache locality when iterating over the set. In practice they are rarely as cheap as the asymptotic bound suggests, especially when you perform a very large number of lookups or insertions.
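If you want to see those constant factors concretely, a rough timeit sketch along the following lines (purely illustrative, and entirely machine-dependent) compares a full pass of set lookups against a single sort of the same data:

import random
import timeit

nums = random.sample(range(10_000_000), 1_000_000)
s = set(nums)

# One membership test per element, mirroring the lookups the O(n) solution performs.
lookups = timeit.timeit(lambda: sum((x + 1) in s for x in nums), number=3)
# One Timsort pass over the same data.
sorting = timeit.timeit(lambda: sorted(nums), number=3)

print(f"1,000,000 set lookups (x3): {lookups:.2f}s")
print(f"sorted(nums)         (x3): {sorting:.2f}s")

Whatever numbers you get, the point is that the gap between the two is governed by constant factors (hashing, probing, memory traffic), not just by the O(n) versus O(n log n) labels.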
Repeated Operations in the First Algorithm: In your first algorithm, you're doing the following:
while (num + 1) in s:
    num += 1
    current_streak += 1
This inner loop leads to repeated set lookups for consecutive numbers. Since you're iterating over nums and running this scan for each element, the same values get probed again and again: for every starting number you increment num and repeatedly check num + 1, so long runs of consecutive numbers are re-walked from many different starting points. The algorithm is nominally O(n), but all of that redundant work adds up, and that is the bottleneck that produces the TLE.
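Pieced together around the loop you posted, the first approach presumably looks something like the following; the surrounding structure (function name, outer loop, max bookkeeping) is my reconstruction, not your exact code:

def longest_consecutive_v1(nums):
    s = set(nums)
    longest = 0
    for num in nums:
        current_streak = 1
        # The inner loop from the question: every element starts its own scan,
        # so elements in the middle of a long run are re-probed many times.
        while (num + 1) in s:
            num += 1
            current_streak += 1
        longest = max(longest, current_streak)
    return longest

Written this way, an input like range(100_000) makes the inner while loop fire from every position in the run, which is exactly the redundant work described above.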
Efficiency of the Second Algorithm: In the second algorithm, you've made a few optimizations:
next_num = num + 1
while next_num in nums:
    next_num += 1
Here, the check next_num in nums is still O(1) on average, and advancing next_num walks along a run of consecutive numbers directly, without extra redundant lookups along the way. Even though the theoretical time complexity is the same in both cases (O(n)), the second version is faster because it cuts out those unnecessary checks and makes more efficient use of the set lookups.
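A complete version of the second approach might look like the sketch below. Note the num - 1 not in nums guard: since only the inner loop was shown, this guard is my assumption about how your accepted solution is structured, because starting the count only at the beginning of a run is the usual way this pattern avoids re-scanning the middle of a streak:

def longest_consecutive_v2(nums):
    nums = set(nums)
    longest = 0
    for num in nums:
        # Assumed guard: skip numbers that sit in the middle of a run, so each
        # run is walked exactly once, starting from its smallest element.
        if num - 1 in nums:
            continue
        next_num = num + 1            # the snippet from the question
        while next_num in nums:
            next_num += 1
        longest = max(longest, next_num - num)
    return longest

With that structure, every element is visited only a constant number of times overall, which keeps the set-based solution at its promised linear behaviour.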
Impact of Set Operations: In the first solution, some of the inefficiency also comes from how current_streak is maintained and how num is updated during iteration. By modifying num inside the loop, you reuse the same variable for two jobs, which makes the logic harder to follow and leads to repeated lookups for numbers that are already part of the current streak.
The second solution benefits from using next_num as a separate variable, which keeps the logic simpler: the code focuses on walking past consecutive numbers directly instead of mixing the scan with the iteration variable, and avoids redundant checks.
O(n log n) solutions can sometimes perform faster than O(n) solutions in practice due to constant factors, the specific nature of the data, and the efficiency of underlying algorithms like Timsort.
Your first O(n) solution caused TLE due to redundant operations and inefficiencies in how consecutive numbers were processed.
Your second O(n) solution passed because it streamlined the logic, minimized redundant operations, and worked more efficiently with the set data structure.
Optimizing algorithms often involves reducing redundant operations and ensuring that you don't perform the same work multiple times. Even with the same time complexity, how you structure the code and the operations you choose can significantly affect performance.