You can use the Hugging Face CLIP models (open_clip just wraps around huggingface libraries anyway), which accept an `output_hidden_states`
parameter. With it set to `True`, the model returns the per-layer hidden states (the token-level outputs before pooling) in addition to the pooled output.
For an example, see how the Stable Diffusion 3 pipeline in diffusers does it: https://github.com/huggingface/diffusers/blob/2432f80ca37f882af733244df24b46f2d447fbcf/src/diffusers/pipelines/stable_diffusion_3/pipeline_stable_diffusion_3.py#L323
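
A minimal sketch of the idea with the text encoder, in case it helps. The tiny, randomly initialised config and the random token ids are my own assumptions so that the snippet runs without downloading anything; in practice you would load a real checkpoint with `CLIPTextModel.from_pretrained(...)` and tokenize real text. `CLIPVisionModel` takes the same `output_hidden_states` flag if you need the image side.

```python
import torch
from transformers import CLIPTextConfig, CLIPTextModel

# Assumption: a tiny random config purely for illustration (no weights downloaded).
# Replace with CLIPTextModel.from_pretrained(<your checkpoint>) for real use.
config = CLIPTextConfig(
    vocab_size=1000,
    hidden_size=64,
    intermediate_size=128,
    num_hidden_layers=4,
    num_attention_heads=4,
    max_position_embeddings=77,
)
model = CLIPTextModel(config).eval()

# Assumption: random token ids stand in for tokenizer output (batch of 2, length 77).
input_ids = torch.randint(0, config.vocab_size, (2, 77))

with torch.no_grad():
    out = model(input_ids=input_ids, output_hidden_states=True)

# Tuple with the embedding output followed by the output of every encoder layer.
hidden_states = out.hidden_states

# Per-token features from the second-to-last layer, i.e. before pooling.
penultimate = hidden_states[-2]

# The usual pooled output is still available alongside the hidden states.
pooled = out.pooler_output

print(len(hidden_states), tuple(penultimate.shape), tuple(pooled.shape))
```

Note that `hidden_states[-1]` is taken before the final layer norm, so it is not identical to `out.last_hidden_state`; which layer you index depends on what your downstream model expects.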