79296745

Date: 2024-12-20 09:32:18
Score: 1.5
Natty:
Report link

Here's an in-Python solution based on @0_0's suggestion (cross-posted from HDF5 file grows in size after overwriting the pandas dataframe):

def cache_repack(cachefile='/tmp/influx2web_store.h5'):
    """
    Clean up the cache. HDF5 does not reclaim freed space automatically,
    so repack the file once per run to do so.
    See
    1. https://stackoverflow.com/questions/33101797/hdf5-file-grows-in-size-after-overwriting-the-pandas-dataframe
    2. https://stackoverflow.com/questions/21090243/release-hdf5-disk-memory-after-table-or-node-removal-with-pytables-or-pandas
    3. https://pandas.pydata.org/docs/user_guide/io.html#delete-from-a-table
    4. http://pandas.pydata.org/pandas-docs/stable/user_guide/io.html#io-hdf5-ptrepack
    """
    import os
    from subprocess import run

    # Repack into a temporary sibling file; -o lets ptrepack overwrite it
    # if a stale copy is left over from a previous run.
    outcachefile = cachefile + '-repacked'
    command = ["ptrepack", "-o", "--chunkshape=auto", "--propindexes",
               "--complevel=9", "--complib=blosc", cachefile, outcachefile]
    run(command, check=True)  # raise if ptrepack fails, rather than silently keeping the bloated file

    # Use os.replace instead of os.rename so the target is clobbered on every
    # platform: https://stackoverflow.com/questions/69363867/difference-between-os-replace-and-os-rename
    os.replace(outcachefile, cachefile)
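The final `os.replace` step can be sketched in isolation with only the standard library; the file names and contents below are hypothetical stand-ins for the cache file and its repacked copy:

```python
import os
import tempfile

# Hypothetical stand-ins for the cache file and its repacked sibling.
workdir = tempfile.mkdtemp()
cachefile = os.path.join(workdir, "store.h5")
outcachefile = cachefile + "-repacked"

with open(cachefile, "w") as f:
    f.write("old, bloated data")
with open(outcachefile, "w") as f:
    f.write("repacked data")

# os.replace clobbers an existing target on both POSIX and Windows;
# os.rename would raise FileExistsError on Windows when the target exists.
os.replace(outcachefile, cachefile)

with open(cachefile) as f:
    print(f.read())                  # prints "repacked data"
print(os.path.exists(outcachefile))  # prints False: the temp file is gone
```

The swap is atomic on POSIX filesystems, so a crash mid-repack leaves either the old cache or the new one, never a half-written file.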
Reasons:
  • Blacklisted phrase (1): stackoverflow
  • Probably link only (1):
  • Long answer (-1):
  • Has code block (-0.5):
  • User mentioned (1): @0_0's
Posted by: Tim