Apologies, but having read the answers to the original question, this one occurred to me.
Let's talk C instead of JavaScript and reference the Quake fast inverse square root algorithm...
What's the speed like if your multiply function takes two pointers (or an array of pointers) as its input parameters and ...
For each input pointer (or array of pointers)
... or just returns the result as a pointer (or array of pointers) to floats, with the caller reading it through a pointer (or array of pointers) to the original type?
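
To make the question concrete, here is a rough sketch of what I have in mind, not a benchmark and not a claim about which variant is faster. It assumes float is a 32-bit IEEE 754 type. The names fast_rsqrt, multiply_to_floats and float_bits are made up for illustration, and the int element type is only my guess at the "original type". The Quake-style routine uses memcpy rather than the original pointer cast, because reading a float through an integer pointer breaks strict aliasing in standard C.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Quake-style approximation of 1/sqrt(x).
   Assumes float is 32-bit IEEE 754. */
static float fast_rsqrt(float x)
{
    uint32_t bits;
    float y;

    memcpy(&bits, &x, sizeof bits);        /* read the float's bit pattern */
    bits = 0x5f3759dfu - (bits >> 1);      /* magic-constant first guess */
    memcpy(&y, &bits, sizeof y);           /* reinterpret the bits as a float */
    return y * (1.5f - 0.5f * x * y * y);  /* one Newton-Raphson step */
}

/* Hypothetical multiply: reads n pairs of ints through the input
   pointers and writes the products as floats through out. */
static void multiply_to_floats(const int *a, const int *b,
                               float *out, size_t n)
{
    assert(a != NULL && b != NULL && out != NULL);
    for (size_t i = 0; i < n; i++)
        out[i] = (float)a[i] * (float)b[i]; /* value conversion, not a bit trick */
}

/* What a caller sees when it reads a float result as raw 32-bit data
   instead of converting the value. */
static uint32_t float_bits(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}
```

The point of float_bits is that reading the result "as the original type" gives the bit pattern, not the numeric value, so the caller still has to convert somewhere. Modern compilers typically turn these small fixed-size memcpy calls into plain register moves, but that would need measuring.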