The MediaPipe team published a paper on the hand detection feature that describes the general architecture and methods used. You can read it here: https://arxiv.org/abs/2006.10214
As far as I know, the exact weights and full model architecture are still not publicly available.