"This is half Solved!!!"
In [ ]:
# I did my best in the so little time :(((( :(
Image Colorization Problem
In this project, you will tackle the challenge of image colorization, a process that
involves adding color to grayscale images. Image colorization has applications in
various fields, such as restoring old movies and photographs, enhancing satellite
imagery, and assisting in medical image analysis.
The goal is to build a deep learning model that can accurately predict the color
channels of an image given its grayscale version. You will use PyTorch, a popular
deep learning library, to construct and train your model. The project will be
structured around several key tasks, each contributing to the development and
evaluation of your colorization model.
U-Net Architecture
For colorization, the neural network needs to take in a grayscale image and return the predicted color image. Note that the prediction is a tensor with the same spatial resolution as the input, so the network takes in and outputs image-shaped tensors. What type of neural network can we use for this?
What is typically used here is very similar to that of an Autoencoder, which you
may remember from typical "intro to deep learning" tutorials. Autoencoders have a
so-called "bottleneck" layer in between the encoder and decoder. The encoder first
encodes an image into a smaller hidden representation called the "bottleneck", and
the decoder then decodes that hidden representation back into an actual image.
This forces the network to only keep the most important information in the
bottleneck layer.
In terms of architecture, the DDPM authors went for a U-Net, introduced by Ronneberger et al. (2015), which at the time achieved state-of-the-art results for medical image segmentation. This network, like any autoencoder, has a bottleneck in the middle that forces the network to learn only the most important information. Importantly, it introduced skip connections between the encoder and decoder, greatly improving gradient flow (inspired by the residual connections of ResNet, He et al., 2015).
Here's a description of the U-Net architecture (a minimal PyTorch sketch with skip connections follows below):
1. Contracting Path (Encoder):
• The input to the UNet is typically a grayscale or multi-channel image.
• The contracting path starts with a series of convolutional layers
followed by max-pooling layers.
• Each convolutional layer is usually followed by a rectified linear unit
(ReLU) activation function.
• The number of filters typically increases with the depth of the network,
capturing increasingly abstract features.
• Max-pooling layers progressively downsample the spatial dimensions of
the feature maps, allowing the network to learn hierarchical
representations.
2. Bottleneck:
• At the bottom of the U-shaped architecture lies the bottleneck or
central layer.
• It represents the point where the network switches from the contracting
path to the expanding path.
• The bottleneck layer typically consists of convolutional layers without
max-pooling, allowing the network to capture contextual information.
3. Expanding Path (Decoder):
• The expanding path involves upsampling the feature maps and
concatenating them with feature maps from the contracting path.
• Each step in the expanding path involves an upsampling operation
(e.g., transposed convolution or upsampling followed by convolution) to
increase the spatial resolution.
• The concatenated feature maps from the corresponding contracting
path stage serve as skip connections.
• Skip connections help preserve spatial information and assist in the
precise localization of segmentation boundaries.
This was adapted from Lukman Aliyu.
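Note that the model actually trained later in this notebook is a plain autoencoder without skip connections. For reference, a minimal U-Net-style model with skip connections might look like the sketch below; the class name TinyUNet and the channel widths are illustrative choices, not part of the original assignment:
In [ ]:
# Illustrative sketch of a small U-Net: two downsampling stages, a bottleneck,
# and two upsampling stages with skip connections (channel concatenation).
import torch
import torch.nn as nn

class TinyUNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc1 = nn.Sequential(nn.Conv2d(1, 16, 3, padding=1), nn.ReLU())
        self.enc2 = nn.Sequential(nn.Conv2d(16, 32, 3, padding=1), nn.ReLU())
        self.pool = nn.MaxPool2d(2)
        self.bottleneck = nn.Sequential(nn.Conv2d(32, 64, 3, padding=1), nn.ReLU())
        self.up2 = nn.ConvTranspose2d(64, 32, 2, stride=2)
        self.dec2 = nn.Sequential(nn.Conv2d(64, 32, 3, padding=1), nn.ReLU())
        self.up1 = nn.ConvTranspose2d(32, 16, 2, stride=2)
        self.dec1 = nn.Sequential(nn.Conv2d(32, 16, 3, padding=1), nn.ReLU())
        self.out = nn.Conv2d(16, 3, 1)

    def forward(self, x):
        s1 = self.enc1(x)                   # 32x32, 16 channels
        s2 = self.enc2(self.pool(s1))       # 16x16, 32 channels
        b = self.bottleneck(self.pool(s2))  # 8x8, 64 channels (bottleneck)
        d2 = self.dec2(torch.cat([self.up2(b), s2], dim=1))   # 16x16, skip from enc2
        d1 = self.dec1(torch.cat([self.up1(d2), s1], dim=1))  # 32x32, skip from enc1
        return torch.sigmoid(self.out(d1))  # 3-channel output in [0, 1]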
Requirements
• Prepare the data
• Build a U-net architecture
• Train the model on the prepared dataset
• Display 5 images from the training set in 3 formats: original color, grayscale, and colorized output
• Run inference on 10 images from the test set
• Display those 10 images in the same 3 formats: original color, grayscale, and colorized output
1. Setup and Imports
In [ ]:
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, Dataset, random_split
from torchvision import datasets, transforms
from torchvision.transforms.functional import to_pil_image, resize
import matplotlib.pyplot as plt
import numpy as np
from PIL import Image
from tqdm import tqdm
import os
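Since random_split and weight initialization draw from the global random number generators, fixing the seeds up front makes the train/val split and training runs reproducible (optional; the seed values are arbitrary):
In [ ]:
# Optional: fix global seeds for reproducibility.
torch.manual_seed(0)
np.random.seed(0)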
2. Load the Dataset
In [ ]:
class ColorizationDataset(Dataset):
    def __init__(self, dataset, transform_input=None, transform_target=None):
        self.dataset = dataset
        self.transform_input = transform_input
        self.transform_target = transform_target

    def __len__(self):
        return len(self.dataset)

    def __getitem__(self, idx):
        # CIFAR-10 yields (image, label); the label is not needed for colorization.
        color_img, _ = self.dataset[idx]
        gray_img = transforms.functional.to_grayscale(color_img, num_output_channels=1)
        if self.transform_input:
            gray_img = self.transform_input(gray_img)
        if self.transform_target:
            color_img = self.transform_target(color_img)
        return gray_img, color_img

transform_input = transforms.Compose([transforms.Resize((32, 32)), transforms.ToTensor()])
transform_target = transforms.Compose([transforms.Resize((32, 32)), transforms.ToTensor()])

base_train_dataset = datasets.CIFAR10(root='./data', train=True, download=True)
base_test_dataset = datasets.CIFAR10(root='./data', train=False, download=True)

train_full = ColorizationDataset(base_train_dataset, transform_input, transform_target)
test_dataset = ColorizationDataset(base_test_dataset, transform_input, transform_target)

train_size = int(0.8 * len(train_full))
val_size = len(train_full) - train_size
train_dataset, val_dataset = random_split(train_full, [train_size, val_size])

batch_size = 16
train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=batch_size, shuffle=False)
test_loader = DataLoader(test_dataset, batch_size=batch_size, shuffle=False)
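Before training, it is worth confirming that the loaders yield tensors of the expected shapes (a quick sanity check on the objects defined above):
In [ ]:
# One batch from the training loader: grayscale inputs and color targets.
gray_batch, color_batch = next(iter(train_loader))
print(gray_batch.shape)   # expected: torch.Size([16, 1, 32, 32])
print(color_batch.shape)  # expected: torch.Size([16, 3, 32, 32])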
In [ ]:
# Just looking at the data and trying to visualize it
random_img_idx = torch.randint(0, 1000, (1,)).item()
# print(train_dataset[0][0])  # raw tensor dump; noisy
test_image = train_dataset[random_img_idx][0]  # index 0 is the grayscale image in the (gray, color) pair
test_image = resize(test_image, (250, 250), antialias=None)  # upscale for better visualization
# print(test_image.shape)
# print('Number of channels in test_image:', test_image.shape[0])
to_pil_image(test_image)  # tensors have no .show(); convert to a PIL image to display it
3. Define the Model Architecture
In [ ]:
class Autoencoder(nn.Module):
    def __init__(self):
        super(Autoencoder, self).__init__()
        # Encoder: 32x32 -> 16x16 -> 8x8 while widening channels 1 -> 128
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU()
        )
        # Decoder: 8x8 -> 16x16 -> 32x32 while narrowing channels 128 -> 3;
        # Sigmoid keeps outputs in [0, 1] to match the ToTensor targets.
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 3, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 3, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, 3, 4, stride=2, padding=1), nn.Sigmoid()
        )

    def forward(self, x):
        x = self.encoder(x)
        x = self.decoder(x)
        return x
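A quick shape trace confirms the strides and paddings above map 32x32 inputs down to 8x8 and back:
In [ ]:
# Pass a dummy grayscale batch through the model and check the output shape.
dummy = torch.randn(1, 1, 32, 32)
print(Autoencoder()(dummy).shape)  # expected: torch.Size([1, 3, 32, 32])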
In [ ]:
class ComprehensiveLoss(nn.Module):
    def __init__(self):
        super(ComprehensiveLoss, self).__init__()

    def forward(self, input, target):
        input = torch.clamp(input, 1e-7, 1 - 1e-7)  # prevent log(0)
        loss = -1 * (target * torch.log(input) + (1 - target) * torch.log(1 - input))
        return loss.mean()
4. Training the Model
In [ ]:
def train_model(model, train_loader, val_loader, criterion, optimizer, num_epochs=10):
    # Relies on the global `device` defined before this function is called.
    model.to(device)
    for epoch in range(num_epochs):
        # Training pass
        model.train()
        running_loss = 0.0
        for gray_imgs, color_imgs in tqdm(train_loader, desc=f"Epoch {epoch+1}/{num_epochs}"):
            gray_imgs = gray_imgs.to(device)
            color_imgs = color_imgs.to(device)
            outputs = model(gray_imgs)
            loss = criterion(outputs, color_imgs)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            running_loss += loss.item()
        avg_loss = running_loss / len(train_loader)
        print(f"Epoch [{epoch+1}/{num_epochs}], Training Loss: {avg_loss:.4f}")

        # Validation pass
        model.eval()
        val_loss = 0.0
        with torch.no_grad():
            for gray_imgs, color_imgs in val_loader:
                gray_imgs = gray_imgs.to(device)
                color_imgs = color_imgs.to(device)
                outputs = model(gray_imgs)
                loss = criterion(outputs, color_imgs)
                val_loss += loss.item()
        val_loss /= len(val_loader)
        print(f"Validation Loss: {val_loss:.4f}")
4.1 Loss function
The loss used throughout is ComprehensiveLoss, defined in Section 3 above: a per-pixel binary cross-entropy between the predicted and target RGB values, averaged over all pixels and channels. Because the targets come from ToTensor and the model ends in a Sigmoid, both sides lie in [0, 1], so treating each channel value as a probability is well defined.
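This hand-rolled loss should match PyTorch's built-in nn.BCELoss (same formula, same mean reduction); a quick numerical check on random data:
In [ ]:
# ComprehensiveLoss and nn.BCELoss should agree up to the 1e-7 clamp.
pred = torch.rand(2, 3, 32, 32)
tgt = torch.rand(2, 3, 32, 32)
print(ComprehensiveLoss()(pred, tgt).item())
print(nn.BCELoss()(pred, tgt).item())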
5. Showing Performance on Training Data
In [ ]:
def visualize_colorization(model, dataset, device='cpu', num_images=5):
    model.eval()
    fig, axs = plt.subplots(num_images, 3, figsize=(10, 4 * num_images))
    with torch.no_grad():
        for i in range(num_images):
            gray, color = dataset[i]
            gray = gray.unsqueeze(0).to(device)
            output = model(gray).squeeze(0).cpu()
            axs[i, 0].imshow(to_pil_image(color))
            axs[i, 0].set_title("Original Color")
            axs[i, 1].imshow(to_pil_image(gray.squeeze(0).cpu()), cmap='gray')
            axs[i, 1].set_title("Grayscale Input")
            axs[i, 2].imshow(to_pil_image(output))
            axs[i, 2].set_title("Colorized Output")
            for j in range(3):
                axs[i, j].axis("off")
    plt.tight_layout()
    plt.show()
6. Making Inferences
In [ ]:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = Autoencoder()
criterion = ComprehensiveLoss()
optimizer = optim.Adam(model.parameters(), lr=1e-4)

# Train the model
train_model(model, train_loader, val_loader, criterion, optimizer, num_epochs=10)

# Visualize on training data
visualize_colorization(model, train_dataset, device=device, num_images=5)

# Run inference and visualize on test data (reusing the Section 5 helper)
visualize_colorization(model, test_dataset, device=device, num_images=10)
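Once training finishes, the weights can be saved so inference can be re-run without retraining; a minimal sketch (the filename colorizer.pth is an arbitrary choice):
In [ ]:
# Persist the trained weights and reload them later for inference only.
torch.save(model.state_dict(), 'colorizer.pth')

model = Autoencoder()
model.load_state_dict(torch.load('colorizer.pth', map_location=device))
model.to(device)
model.eval()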