The problem is calling .cuda() to move the model to the GPU after loading it with a BitsAndBytesConfig; quantized bitsandbytes models do not support being moved with .cuda() or .to() after loading. I was able to get the error to go away by passing the device_map argument to from_pretrained instead, so that device placement happens at load time, e.g.,
model = AutoModelForCausalLM.from_pretrained(
    "ThetaCursed/Ovis1.6-Gemma2-9B-bnb-4bit",
    trust_remote_code=True,
    device_map='auto',
    **kwargs
)
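For completeness, here is a minimal self-contained sketch of the load, assuming the bitsandbytes and accelerate packages are installed. The BitsAndBytesConfig values below are illustrative only; since this checkpoint ships pre-quantized, an explicit config may not be needed at all.

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Illustrative 4-bit config; the checkpoint is already bnb-4bit,
# so this may be redundant with the config baked into the repo.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "ThetaCursed/Ovis1.6-Gemma2-9B-bnb-4bit",
    trust_remote_code=True,
    device_map='auto',  # accelerate places the weights on the GPU at load time
    quantization_config=bnb_config,
)
# Note: no model.cuda() afterwards, device_map already handled placement.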