Thank you very much! I tried a few things, and indeed, adding the "-O3" optimization flag lowered the calculation time from around 3.4s per matrix to 0.02s per matrix. I've never seen such a big change from that flag, so I never bothered to use it, since I'd heard it makes debugging harder. In the end, with -O3 the code is around twice as fast as the Python code, which is exactly what I was counting on. Thank you!