FYI, I would convert to float16 (2x smaller, almost no quality loss).
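That part is a one-liner with NumPy; a minimal sketch, assuming your embeddings are already a float32 array (the array and file name here are placeholders):

```python
import numpy as np

# placeholder array standing in for your real float32 embeddings
emb = np.random.rand(100_000, 768).astype(np.float32)

# float16 halves the bytes per value; usually fine for similarity search
np.save("embeddings_fp16.npy", emb.astype(np.float16))
```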
But the best practical answer (you still read rows back as float32 at query time, disk shrinks by ~4x, and random row fetches take milliseconds):
quantize to float8
1 byte per value instead of 4 -> 500 GB becomes 125 GB
benchmarks (Naamán Huerga-Pérez et al.) report a quality loss of under 0.3
Store it all as a single memory-mapped binary file, so you can fetch arbitrary rows without loading the whole thing (a sketch of both steps follows below).
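A minimal sketch of that pipeline, assuming the ml_dtypes package (which registers float8 dtypes with NumPy); the corpus size, chunk size, file name, and the load_float32_chunk() stub are all placeholders for however you currently read your data:

```python
import numpy as np
import ml_dtypes  # registers float8 dtypes (e.g. float8_e4m3fn) with NumPy

N, D = 1_000_000, 768          # placeholder corpus size and dimension
F8 = ml_dtypes.float8_e4m3fn   # 1 byte per value; enough range for normalized embeddings

def load_float32_chunk(start, size):
    # stand-in for however you currently read your float32 embeddings
    return np.random.rand(size, D).astype(np.float32)

# write: quantize chunk by chunk into one flat binary file
out = np.memmap("emb_f8.bin", dtype=F8, mode="w+", shape=(N, D))
for start in range(0, N, 100_000):
    chunk = load_float32_chunk(start, 100_000)
    out[start:start + len(chunk)] = chunk.astype(F8)  # float32 -> float8
out.flush()

# read: mmap the file and pull arbitrary rows, dequantized back to float32
emb = np.memmap("emb_f8.bin", dtype=F8, mode="r", shape=(N, D))
rows = emb[[12, 987_654]].astype(np.float32)
```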
If you later want it even smaller, combine a light PCA drop to 384 dims with float8 -> ~60 GB total. That's the simplest, fastest route to both goals.
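A sketch of that combined route, using scikit-learn's PCA; the random sample is a placeholder for a representative slice of your real embeddings, and the rest of the corpus would stream through the same memmap pattern as above:

```python
import numpy as np
import ml_dtypes
from sklearn.decomposition import PCA

F8 = ml_dtypes.float8_e4m3fn

# placeholder sample standing in for a representative slice of your embeddings
sample = np.random.rand(200_000, 768).astype(np.float32)

# fit a light PCA once, then reuse it for every chunk
pca = PCA(n_components=384).fit(sample)

# project to 384 dims and quantize: 768 * 4 bytes -> 384 * 1 byte per vector (~8x smaller)
reduced = pca.transform(sample).astype(F8)
out = np.memmap("emb_pca384_f8.bin", dtype=F8, mode="w+", shape=reduced.shape)
out[:] = reduced
out.flush()
```

Worth checking retrieval quality on a held-out query set after the PCA step before committing the full 500 GB, since the acceptable dimensionality depends on your data.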