Here’s what I’d try:

Instead of blocking on city OR postal code, block on city AND the first few digits of the postal code. Each block gets much smaller, so you generate far fewer candidate pairs.
Convert city and postal code columns to categorical types to speed up equality checks and save memory.
Instead of doing fuzzy matching row-by-row with apply(), try RapidFuzz's batch functions (such as process.cdist) to compute street name similarity for many pairs in one call.
Keep your early stopping logic, but evaluate the highest-weight components first so you can bail out sooner.
Increase the number of Dask partitions (something like 500+) and, if possible, run on a distributed cluster for better parallelism.
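
Here's a rough sketch of the blocking and weight-ordered early stopping ideas, just to show the shape of it. It's plain Python so it runs anywhere: difflib's SequenceMatcher stands in for RapidFuzz, and the records, field names, weights and the 0.8 threshold are all made up for illustration, not taken from your code.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical sample records; the field names are assumptions.
records = [
    {"id": 1, "city": "Berlin", "postal": "10115", "street": "Invalidenstrasse"},
    {"id": 2, "city": "Berlin", "postal": "10117", "street": "Invalidenstr."},
    {"id": 3, "city": "Berlin", "postal": "12043", "street": "Karl-Marx-Strasse"},
    {"id": 4, "city": "Hamburg", "postal": "20095", "street": "Moenckebergstrasse"},
]

def block_key(rec, prefix_len=3):
    # Block on city AND the first few postal digits (not city OR postal code).
    return (rec["city"].lower(), rec["postal"][:prefix_len])

def candidate_pairs(recs, prefix_len=3):
    # Only records that share a block key are ever compared.
    blocks = defaultdict(list)
    for rec in recs:
        blocks[block_key(rec, prefix_len)].append(rec)
    for members in blocks.values():
        yield from combinations(members, 2)

def similarity(a, b):
    # Stand-in for a RapidFuzz scorer; returns a value in [0, 1].
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

# (field, weight) pairs, sorted so the heaviest component is scored first.
COMPONENTS = sorted(
    [("postal", 0.2), ("street", 0.5), ("city", 0.3)],
    key=lambda c: c[1],
    reverse=True,
)

def weighted_score(a, b, threshold=0.8):
    total = 0.0
    remaining = sum(weight for _, weight in COMPONENTS)
    for field, weight in COMPONENTS:
        total += weight * similarity(a[field], b[field])
        remaining -= weight
        # Even a perfect score on everything left can't reach the threshold.
        if total + remaining < threshold:
            return None
    return total

pairs = list(candidate_pairs(records))
matches = [
    (a["id"], b["id"])
    for a, b in pairs
    if weighted_score(a, b) is not None
]
```

With these four records, comparing everything against everything would mean 6 pairs, while the blocked version only compares records 1 and 2. Scoring the heaviest component first means a bad street match can rule a pair out before the cheaper fields are even looked at.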