I was thinking of something similar; it would be cool to have one of these! It looks like someone else is attempting one and has their frontend set up, but maybe not the backend: https://mixtape.dj/
At any rate, before hitting the API, I would probably start by breaking the audio master into phrases. As you likely know, almost all mixable tracks are in 4/4 time, and most DJs mix in phrase blocks (commonly 32 beats, i.e. 8 bars), so phrase boundaries would serve as good segmentation points as per (1). It looks like the Mixxx folks have built some functionality for doing this: https://github.com/mixxxdj/mixxx/wiki/ You might have to just mine the sources, though; it looks like it's mostly compiled stuff. A rough sketch of the idea is below.
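Just to make the idea concrete, here's a minimal sketch of phrase segmentation, assuming librosa is available and the mix sits in a local file ("mix.mp3" is a placeholder). A real DJ mix with tempo changes would need a more robust beat grid than one global tempo estimate, but this shows the shape of it:

```python
# Sketch: segment a mix into candidate phrases via beat tracking.
# 32 beats = 8 bars of 4/4, a common DJ phrase length.
import librosa

y, sr = librosa.load("mix.mp3", sr=None, mono=True)  # placeholder path
tempo, beat_frames = librosa.beat.beat_track(y=y, sr=sr)
beat_times = librosa.frames_to_time(beat_frames, sr=sr)

PHRASE_BEATS = 32
phrase_starts = beat_times[::PHRASE_BEATS]
phrases = list(zip(phrase_starts[:-1], phrase_starts[1:]))
for start, end in phrases:
    print(f"phrase: {start:7.2f}s -> {end:7.2f}s")
```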
The main issue I can see Shazam running into is deciphering songs during long blends. If you can figure out some way to identify which phrases are blends and exclude them from the search area, you'll be laughing. Fast cuts should be easy to detect with an FFT or even echoprint (https://github.com/spotify/echoprint-codegen): just look for a sudden change in the spectrum/print, something like the sketch below.
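For the "sudden change in the spectrum" part, a simple spectral-flux measure over the STFT gets you most of the way. This is just a sketch; the threshold here is made up and you'd want to tune it on real mixes:

```python
# Sketch: flag abrupt spectral changes (candidate fast cuts).
import numpy as np
import librosa

y, sr = librosa.load("mix.mp3", sr=None, mono=True)  # placeholder path
S = np.abs(librosa.stft(y, n_fft=2048, hop_length=512))
S = S / (S.max() + 1e-9)  # normalize so the threshold is scale-free

# Spectral flux: summed positive change between consecutive frames.
flux = np.sum(np.maximum(0.0, np.diff(S, axis=1)), axis=0)
times = librosa.frames_to_time(np.arange(len(flux)), sr=sr, hop_length=512)

threshold = flux.mean() + 3 * flux.std()  # crude; tune on real data
cut_times = times[flux > threshold]
print("candidate cuts at:", np.round(cut_times, 2))
```

Long blends won't spike like this, which is exactly why they're the hard case: the spectrum changes gradually over a whole phrase instead of in one frame.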
Once you've done that, you should have boundary points for the start of each new track. Then you can just feed the segments one by one into the Shazam API.
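Worth noting Shazam doesn't have an official public API, so you'd be going through an unofficial wrapper. Here's a sketch using pydub for the slicing and shazamio (an unofficial Python client; the recognize_song() call is from its README, so double-check against the current version). The boundary list is a placeholder standing in for output from the earlier steps:

```python
# Sketch: cut the mix at detected boundaries and identify each segment.
# pydub needs ffmpeg installed for mp3 handling.
import asyncio
from pydub import AudioSegment
from shazamio import Shazam  # unofficial Shazam client

async def identify_segments(mix_path, boundaries):
    mix = AudioSegment.from_file(mix_path)
    shazam = Shazam()
    for i, (start, end) in enumerate(zip(boundaries[:-1], boundaries[1:])):
        clip = mix[int(start * 1000):int(end * 1000)]  # pydub slices in ms
        clip.export(f"segment_{i}.mp3", format="mp3")
        result = await shazam.recognize_song(f"segment_{i}.mp3")
        track = result.get("track", {})
        print(i, track.get("subtitle"), "-", track.get("title"))

# boundaries (seconds) would come from the phrase/cut detection above
asyncio.run(identify_segments("mix.mp3", [0.0, 210.5, 415.0]))
```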
Ultimately, I think the clincher is IDing those long blends. Maybe ML is an approach to that? It should be easy to generate training data by building a script that literally overlays two random songs at once, over and over, in a bajillion different permutations (see the sketch below).
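Something like this, as a sketch; the paths and counts are placeholders, and a real generator would also want to randomize EQ, tempo matching, and crossfade curves so the examples actually resemble DJ blends rather than raw overlays:

```python
# Sketch: generate labeled "blend" training examples by overlaying
# random pairs of tracks from a local library (placeholder paths).
import random
from pathlib import Path
from pydub import AudioSegment

tracks = list(Path("library/").glob("*.mp3"))  # your local music files
Path("blends").mkdir(exist_ok=True)

for i in range(1000):  # however many examples you want
    a, b = random.sample(tracks, 2)
    seg_a = AudioSegment.from_file(a)[:30_000]  # first 30 s of each
    seg_b = AudioSegment.from_file(b)[:30_000]
    # "- n" lowers gain by n dB in pydub, so relative levels vary per example
    blend = seg_a.overlay(seg_b - random.randint(0, 12))
    blend.export(f"blends/blend_{i}.mp3", format="mp3")
```

Non-blend negatives are even easier: just export random 30-second windows of single tracks and label them accordingly.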
Best of luck! I will be curious as to how this works out :)