The problem is calling .cuda() to move the model to the GPU after loading it with a BitsAndBytesConfig. Quantized bitsandbytes models can't be moved with .cuda() or .to(); device placement has to happen at load time. I was able to get the error to go away by passing the device_map argument when loading the model, e.g.:
from transformers import AutoModelForCausalLM

# device_map='auto' places the quantized weights on the GPU at load
# time, so no .cuda() call is needed afterwards.
model = AutoModelForCausalLM.from_pretrained(
    "ThetaCursed/Ovis1.6-Gemma2-9B-bnb-4bit",
    trust_remote_code=True,
    device_map='auto',
    **kwargs
)
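
For reference, here is a minimal sketch of what the kwargs might contain, assuming (as the question suggests) a BitsAndBytesConfig is being passed in. The specific 4-bit settings below are illustrative assumptions, not taken from the original code:

import torch
from transformers import BitsAndBytesConfig

# Illustrative quantization settings; adjust to your setup.
kwargs = dict(
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    ),
)

With device_map='auto', accelerate handles placing the already-quantized weights on the available GPU(s), so the .cuda() call can simply be removed.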