If you're unsure which process to kill (because it doesn't show up in `nvidia-smi`):

1. Use `nvtop` to find the PIDs of the dead processes still hogging VRAM and the index of the device they are on (use `nvtop` because `nvidia-smi` may have filtered them out).
2. Run `fuser -v /dev/nvidia<device index>` to find the user who owns them (replace `<device index>` with the relevant integer).
3. Run `htop -u <user>` and kill the processes that seem to have hung.

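
If you'd rather not step through the interactive tools, the `fuser` step can be scripted. The following is only a rough sketch: `gpu_holders` is a made-up helper name (not a standard command), and it assumes `fuser` (from psmisc) and `ps` are installed. Seeing processes that belong to other users usually requires running it as root.

```shell
#!/usr/bin/env bash
# Sketch only: gpu_holders is a hypothetical helper, not part of any NVIDIA tool.
# Prints PID, owner, elapsed time and command line for every process that
# has /dev/nvidia<index> open.
gpu_holders() {
    local dev="/dev/nvidia${1:-0}"   # device index, defaults to 0
    local pids pid

    if [ ! -e "$dev" ]; then
        echo "no such device: $dev" >&2
        return 1
    fi

    # fuser writes the PIDs to stdout; the file name and access flags go to stderr.
    pids=$(fuser "$dev" 2>/dev/null)
    if [ -z "$pids" ]; then
        echo "no process is holding $dev"
        return 0
    fi

    # Word splitting on $pids is intentional: one ps call per PID.
    for pid in $pids; do
        ps -o pid=,user=,etime=,args= -p "$pid"
    done
}
```

After sourcing the function, `gpu_holders 0` lists whatever is holding device 0. Once you have identified a stuck process, `kill <PID>` (or `kill -9 <PID>` if it ignores the default signal) should release the memory.
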
This is a complement to [Kenan's answer](https://stackoverflow.com/a/46597252/15399131).