I faced the same issue, but in a peer-to-peer federated learning architecture. Make sure to run multiple communication rounds: a single round of averaging will not let the model learn all the important features, and it may in fact dilute some of them.
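
To make the dilution effect concrete, here is a minimal toy sketch in plain Python. It is not my actual setup: the three peers, the fully connected `neighbors` list, the one-feature-per-peer data, and the helper names (`local_train`, `average`, `run_rounds`, `max_error`) are all assumptions made up for illustration. Each peer only ever sees one feature, trains locally, then averages its weights with its neighbours.

```python
def local_train(w, data, lr=0.1, steps=5):
    """Run a few SGD steps on squared error for the linear model y = w . x."""
    w = list(w)
    for _ in range(steps):
        for x, y in data:
            err = sum(wi * xi for wi, xi in zip(w, x)) - y
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
    return w


def average(models):
    """Element-wise mean of a list of weight vectors."""
    n = len(models)
    return [sum(ws) / n for ws in zip(*models)]


def run_rounds(peer_data, neighbors, dim, rounds):
    """Each round: every peer trains locally, then averages with its neighbours."""
    models = [[0.0] * dim for _ in peer_data]
    for _ in range(rounds):
        models = [local_train(w, d) for w, d in zip(models, peer_data)]
        models = [
            average([models[j] for j in [i] + neighbors[i]])
            for i in range(len(models))
        ]
    return models


def max_error(models, true_w):
    """Largest absolute weight error across all peers."""
    return max(abs(wi - ti) for w in models for wi, ti in zip(w, true_w))


# Hypothetical non-IID setup: peer i only observes feature i.
true_w = [2.0, -1.0, 0.5]
peer_data = [
    [([1.0, 0.0, 0.0], true_w[0])],
    [([0.0, 1.0, 0.0], true_w[1])],
    [([0.0, 0.0, 1.0], true_w[2])],
]
neighbors = [[1, 2], [0, 2], [0, 1]]  # assumed topology: everyone talks to everyone

one_round = run_rounds(peer_data, neighbors, dim=3, rounds=1)
many_rounds = run_rounds(peer_data, neighbors, dim=3, rounds=50)

print("max error after 1 round:  ", max_error(one_round, true_w))
print("max error after 50 rounds:", max_error(many_rounds, true_w))
```

In this toy setup, after a single round each peer's locally learned weight gets averaged with two peers that never saw that feature, so it shrinks to a third of what was learned. Repeating the train-then-average cycle lets every peer gradually recover all three weights. Real models and data will behave less cleanly, but the same reasoning is why I would not stop after one round.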