While the existing answers clearly explain how to update the intrinsic matrix for standard scaling/cropping operations, I want to add a related perspective: how to construct the intrinsics for a rendering projection matrix when cropping, padding, and scaling are applied, so that standard rendering pipelines correctly project 3D objects onto the edited image.
When an image is cropped, padded, or scaled, the camera projection matrix must be adjusted accordingly, or 3D objects will no longer be rendered at the correct position and size in the modified image.
Pixel space vs. Homogeneous space
Pixel space (CV): Uses actual pixel coordinates, e.g., a 1920×1080 image has x ∈ [0,1920] and y ∈ [0,1080].
Homogeneous space (Graphics): Normalized coordinates where x ∈ [-1,1] and y ∈ [-1,1], regardless of image size.
This distinction affects how image augmentations influence projections. For example, adding padding on the right:
In pixel space, left-side pixels do not move.
In homogeneous space, the entire x-axis is compressed because the total width increased.
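A minimal numeric sketch of this asymmetry (the image sizes and the pixel chosen here are illustrative): pad a 1920-wide image by 480 px on the right and map the same pixel to normalized coordinates before and after.

```python
# Hypothetical example: a 1920x1080 image padded by 480 px on the right
# (new width 2400). A fixed pixel keeps its pixel coordinate, but its
# normalized device coordinate (NDC, x in [-1, 1]) changes.

def pixel_to_ndc_x(x_pixel, width):
    """Map a pixel x-coordinate to NDC x in [-1, 1]."""
    return 2.0 * x_pixel / width - 1.0

x = 960.0                          # center of the original 1920-wide image
before = pixel_to_ndc_x(x, 1920)   # 0.0 (dead center)
after = pixel_to_ndc_x(x, 2400)    # -0.2 (shifted left: x-axis compressed)
print(before, after)
```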
A camera intrinsic matrix contains four main parameters:
fx, fy: focal lengths along x and y axes
cx, cy: principal point offsets along x and y axes
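For reference, here is how these four parameters form the standard 3x3 pinhole intrinsic matrix and project a camera-space point to pixels (the numeric values are illustrative, not from the post):

```python
import numpy as np

def make_intrinsics(fx, fy, cx, cy):
    """Standard 3x3 pinhole intrinsic matrix (zero skew)."""
    return np.array([[fx, 0.0, cx],
                     [0.0, fy, cy],
                     [0.0, 0.0, 1.0]])

K = make_intrinsics(fx=1000.0, fy=1000.0, cx=960.0, cy=540.0)

# Project a camera-space point (X, Y, Z) to pixel coordinates:
p = K @ np.array([0.5, -0.25, 2.0])
u, v = p[:2] / p[2]     # u = fx*X/Z + cx, v = fy*Y/Z + cy
print(u, v)             # 1210.0 415.0
```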
Translation (cx, cy)
Only cropping/padding on the left/top edges shifts the principal point; edits on the right/bottom edges leave it unchanged.
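This rule can be sketched as a one-liner (the function name and signed-offset convention are my own, for illustration):

```python
# Sketch: shifting the principal point in pixel space.
# Offsets are signed: positive = pixels padded on that edge,
# negative = pixels cropped from that edge.
def shift_principal_point(cx, cy, pad_left, pad_top):
    # Only left/top edits move (cx, cy); right/bottom edits do not appear here.
    return cx + pad_left, cy + pad_top

# Crop 100 px from the left and pad 50 px on the top:
cx_new, cy_new = shift_principal_point(960.0, 540.0, pad_left=-100, pad_top=50)
print(cx_new, cy_new)   # 860.0 590.0
```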
Scaling (fx, fy)
In CV pixel space: only scaling changes fx/fy.
In homogeneous space: crop and padding also affect fx/fy, because padding changes the image aspect ratio, which changes the mapping to normalized [-1,1] coordinates.
Pixel space rules:
Cropping/padding:
cx, cy decrease by the number of pixels cropped from the left/top (and increase by the number of pixels padded there)
Cropping/padding on the right/bottom has no effect
fx, fy remain unchanged
Scaling:
fx, fy multiplied by scale s
cx, cy multiplied by scale s
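The pixel-space rules above can be sketched as one update to the intrinsic matrix (assuming a left/top crop followed by uniform scaling; the helper name and values are illustrative):

```python
import numpy as np

# Sketch of the pixel-space rules: crop/pad only shifts (cx, cy);
# uniform scaling by s multiplies fx, fy, cx, cy.
def adjust_intrinsics_pixel_space(K, crop_left=0, crop_top=0, scale=1.0):
    K = K.copy()
    K[0, 2] -= crop_left    # cx shifts by the left crop
    K[1, 2] -= crop_top     # cy shifts by the top crop
    K[:2] *= scale          # fx, fy, cx, cy all scale by s
    return K

K = np.array([[1000.0, 0.0, 960.0],
              [0.0, 1000.0, 540.0],
              [0.0, 0.0, 1.0]])
K2 = adjust_intrinsics_pixel_space(K, crop_left=100, scale=0.5)
print(K2[0, 0], K2[0, 2])   # 500.0 430.0
```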
Homogeneous space rules:
Cropping/padding changes image aspect ratio → requires extra scaling compensation
Compute compensation factors:
sx = s * (original_width / padded_width)
sy = s * (original_height / padded_height)
fx_new = fx * sx
fy_new = fy * sy
cx_new = cx * sx
cy_new = cy * sy
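Putting those four formulas into code (the parameter names mirror the formulas above; the numbers are an illustrative 1920x1080 image padded to 2400x1080):

```python
# Sketch of the homogeneous-space compensation above.
def compensate_homogeneous(fx, fy, cx, cy, s,
                           original_width, original_height,
                           padded_width, padded_height):
    sx = s * (original_width / padded_width)
    sy = s * (original_height / padded_height)
    return fx * sx, fy * sy, cx * sx, cy * sy

# 1920x1080 image padded on the right to 2400x1080, rendered at scale s = 1.0:
fx_new, fy_new, cx_new, cy_new = compensate_homogeneous(
    1000.0, 1000.0, 960.0, 540.0, 1.0, 1920, 1080, 2400, 1080)
print(fx_new, cx_new)   # 800.0 768.0
```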
Note: This compensation only adjusts for the normalized coordinate system and does not change physical camera parameters.
To compute FOV consistent with the original image:
fov_x = 2 * arctan((original_width * s) / (2 * fx_new))
fov_y = 2 * arctan((original_height * s) / (2 * fy_new))
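Continuing the padded-image example (fx_new = 800 from s = 1.0 and 1920 → 2400 padding; values illustrative), the FOV formula in code:

```python
import math

# Sketch of the FOV formula above: fov = 2 * arctan((size * s) / (2 * f_new)).
def fov_from_intrinsics(size, s, f_new):
    return 2.0 * math.atan((size * s) / (2.0 * f_new))

fov_x = fov_from_intrinsics(1920, 1.0, 800.0)
print(math.degrees(fov_x))
```

Note that the widened FOV here reflects the padded image covering a larger viewport than the original, which is exactly what the compensation is meant to capture.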
This ensures that rendering with crop, padding, and scaling produces objects at the correct location and scale, without relying on viewport adjustments.