79716585

Date: 2025-07-27 17:49:08
Score: 6.5

I did it as posted above.

from copy import deepcopy
from transformers import TrainerCallback

# Hugging Face's Trainer does not log the training accuracy by default, so a custom
# callback calls evaluate() on train_dataset at the end of every epoch.
class CustomCallback(TrainerCallback):

    def __init__(self, trainer) -> None:
        super().__init__()
        self._trainer = trainer

    def on_epoch_end(self, args, state, control, **kwargs):
        if control.should_evaluate:
            # Deep-copy control first; otherwise the trainer would not evaluate the evaluation dataset.
            control_copy = deepcopy(control)
            self._trainer.evaluate(eval_dataset=self._trainer.train_dataset, metric_key_prefix="train")
            return control_copy

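import evaluate
import numpy as np

def my_compute_metrics2(eval_pred):
    metrics = ["accuracy", "bleu"]
    # Load each metric by name from the evaluate library.
    metric = {}
    for i in metrics:
        metric[i] = evaluate.load(i)
    preds, labels = eval_pred
    # Turn logits into predicted class ids.
    predictions = np.argmax(preds, axis=1)
    # Note: evaluate's "bleu" is documented for text inputs, so ids may need decoding first.
    metric_results = {}  # Dictionary to store the accuracy and BLEU scores
    for i in metrics:
        metric_results[i] = metric[i].compute(predictions=predictions, references=labels)[i]
    return metric_results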
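
To make the metric step concrete, the argmax-then-compare logic in my_compute_metrics2 can be sketched in plain Python. The helper name argmax_accuracy and the toy logits and labels are hypothetical, chosen only for illustration; the real function relies on np.argmax and the evaluate library instead.

```python
def argmax_accuracy(logits, labels):
    """Pick the highest-scoring class per row, then compare with the labels."""
    # Index of the largest score in each row (pure-Python stand-in for np.argmax(..., axis=1)).
    predictions = [max(range(len(row)), key=row.__getitem__) for row in logits]
    # Fraction of rows where the predicted class equals the reference label.
    correct = sum(p == y for p, y in zip(predictions, labels))
    return predictions, correct / len(labels)


# Hypothetical toy data: four examples, two classes.
logits = [[0.1, 0.9], [2.0, -1.0], [0.3, 0.4], [1.5, 0.2]]
labels = [1, 0, 0, 0]

predictions, accuracy = argmax_accuracy(logits, labels)
print(predictions, accuracy)
```

Only the third example is misclassified here, so three of the four predictions match their labels.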


However, when I ran this on Google Colab, it failed with an out-of-GPU-memory error. Please see below:


OutOfMemoryError: CUDA out of memory. Tried to allocate 4.10 GiB. GPU 0 has a total capacity of 14.74 GiB of which 732.12 MiB is free. Process 444503 has 14.02 GiB memory in use. Of the allocated memory 8.97 GiB is allocated by PyTorch, and 4.92 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. 
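
The error message itself suggests setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True. A minimal sketch of one way to do that from a Colab cell follows, assuming the cell runs before torch is imported so that the CUDA allocator picks the setting up when it starts. This only targets fragmentation of reserved memory, so it may not be enough if the evaluation genuinely needs more memory than the GPU has.

```python
import os

# Allocator setting quoted from the error message. Set it before torch is
# imported (restart the Colab runtime first if torch is already loaded).
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"

# Confirm the value that later imports will see.
print(os.environ["PYTORCH_CUDA_ALLOC_CONF"])
```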



How can I fix the above OutOfMemoryError? I look forward to hearing from you!

Thanks in advance! It is urgent for me to fix it.
Reasons:
  • Blacklisted phrase (0.5): Thanks
  • RegEx Blacklisted phrase (3): Thanks in advance
  • RegEx Blacklisted phrase (1.5): How to fix above OutOfMemoryError ?
  • RegEx Blacklisted phrase (2): urgent
  • Long answer (-1):
  • Has code block (-0.5):
  • Low reputation (1):
Posted by: Dingjun Chen