I implemented something similar (albeit with only two callers) using Twilio Stream resources. With these you create a separate audio stream for each call, and each stream is identified by its Call SID. You can then send the streams to a WebSocket server, which ties them together and lets you process the audio in any way you want.
You can find the docs here: https://www.twilio.com/docs/voice/api/stream-resource
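
As a rough sketch of the WebSocket server side (not a definitive implementation), the snippet below groups incoming audio by Call SID. It assumes the JSON message shape described in Twilio's Media Streams docs: a `start` event carrying `streamSid` and `start.callSid`, `media` events carrying a base64 `media.payload`, and a `stop` event. Check those field names against the current docs. `CallAudioRouter` and `handle_message` are hypothetical names, and the WebSocket transport itself is left out: you would call `handle_message` with each text frame your server receives.

```python
import base64
import json
from collections import defaultdict


class CallAudioRouter:
    """Hypothetical helper: collects stream audio per Call SID."""

    def __init__(self):
        # Media messages only carry the stream SID, so remember which
        # call each stream belongs to when its "start" event arrives.
        self.stream_to_call = {}
        # Raw audio bytes received so far, keyed by Call SID.
        self.audio_by_call = defaultdict(bytearray)

    def handle_message(self, raw):
        """Process one JSON text frame received over the WebSocket."""
        msg = json.loads(raw)
        event = msg.get("event")

        if event == "start":
            # Assumed shape: {"event": "start", "streamSid": ..., "start": {"callSid": ...}}
            self.stream_to_call[msg["streamSid"]] = msg["start"]["callSid"]
        elif event == "media":
            call_sid = self.stream_to_call.get(msg["streamSid"])
            if call_sid is not None:
                # The payload is base64-encoded audio.
                chunk = base64.b64decode(msg["media"]["payload"])
                self.audio_by_call[call_sid] += chunk
        elif event == "stop":
            # Stream ended: forget the mapping but keep the collected audio.
            self.stream_to_call.pop(msg.get("streamSid"), None)
        # Other events (e.g. "connected") are ignored in this sketch.
```

With two callers you would end up with two entries in `audio_by_call`, one per Call SID, which you can then mix, transcribe, or forward as needed.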