"This is half Solved!!!"
In [ ]:
# I did my best in the so little time :(((( :(
Image Colorization Problem
In this project, you will tackle the challenge of image colorization, a process that
involves adding color to grayscale images. Image colorization has applications in
various fields, such as restoring old movies and photographs, enhancing satellite
imagery, and assisting in medical image analysis.
The goal is to build a deep learning model that can accurately predict the color
channels of an image given its grayscale version. You will use PyTorch, a popular
deep learning library, to construct and train your model. The project will be
structured around several key tasks, each contributing to the development and
evaluation of your colorization model.
U-Net Architecture
For colorization, the neural network needs to take in a grayscale image and return the predicted color image. Note that the prediction is a tensor with the same spatial resolution as the input, so the network takes in and outputs image-shaped tensors. What type of neural network can we use for this?
What is typically used here is very similar to that of an Autoencoder, which you
may remember from typical "intro to deep learning" tutorials. Autoencoders have a
so-called "bottleneck" layer in between the encoder and decoder. The encoder first
encodes an image into a smaller hidden representation called the "bottleneck", and
the decoder then decodes that hidden representation back into an actual image.
This forces the network to only keep the most important information in the
bottleneck layer.
In terms of architecture, the DDPM authors went for a U-Net, introduced by Ronneberger et al. (2015), which at the time achieved state-of-the-art results for medical image segmentation. This network, like any autoencoder, has a bottleneck in the middle that forces the network to learn only the most important information. Importantly, it introduced skip connections between the encoder and decoder, greatly improving gradient flow (inspired by the residual connections of ResNet, He et al., 2015).
Here's a description of the U-Net architecture (a minimal PyTorch sketch with skip connections follows below):
1. Contracting Path (Encoder):
• The input to the UNet is typically a grayscale or multi-channel image.
• The contracting path starts with a series of convolutional layers
followed by max-pooling layers.
• Each convolutional layer is usually followed by a rectified linear unit
(ReLU) activation function.
• The number of filters typically increases with the depth of the network,
capturing increasingly abstract features.
• Max-pooling layers progressively downsample the spatial dimensions of
the feature maps, allowing the network to learn hierarchical
representations.
2. Bottleneck:
• At the bottom of the U-shaped architecture lies the bottleneck or
central layer.
• It represents the point where the network switches from the contracting
path to the expanding path.
• The bottleneck layer typically consists of convolutional layers without
max-pooling, allowing the network to capture contextual information.
3. Expanding Path (Decoder):
• The expanding path involves upsampling the feature maps and
concatenating them with feature maps from the contracting path.
• Each step in the expanding path involves an upsampling operation
(e.g., transposed convolution or upsampling followed by convolution) to
increase the spatial resolution.
• The concatenated feature maps from the corresponding contracting
path stage serve as skip connections.
• Skip connections help preserve spatial information and assist in the
precise localization of segmentation boundaries.
This was adapted from Lukman Aliyu.
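Note that the model actually trained later in this notebook is a plain autoencoder without skip connections. For reference, a minimal U-Net-style model with skip connections might look like the sketch below; the class name TinyUNet and the channel widths are illustrative choices, not part of the original assignment:
In [ ]:
# Illustrative sketch of a small U-Net: two downsampling stages, a bottleneck,
# and two upsampling stages with skip connections (channel concatenation).
import torch
import torch.nn as nn

class TinyUNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc1 = nn.Sequential(nn.Conv2d(1, 16, 3, padding=1), nn.ReLU())
        self.enc2 = nn.Sequential(nn.Conv2d(16, 32, 3, padding=1), nn.ReLU())
        self.pool = nn.MaxPool2d(2)
        self.bottleneck = nn.Sequential(nn.Conv2d(32, 64, 3, padding=1), nn.ReLU())
        self.up2 = nn.ConvTranspose2d(64, 32, 2, stride=2)
        self.dec2 = nn.Sequential(nn.Conv2d(64, 32, 3, padding=1), nn.ReLU())
        self.up1 = nn.ConvTranspose2d(32, 16, 2, stride=2)
        self.dec1 = nn.Sequential(nn.Conv2d(32, 16, 3, padding=1), nn.ReLU())
        self.out = nn.Conv2d(16, 3, 1)

    def forward(self, x):
        s1 = self.enc1(x)                   # 32x32, 16 channels
        s2 = self.enc2(self.pool(s1))       # 16x16, 32 channels
        b = self.bottleneck(self.pool(s2))  # 8x8, 64 channels (bottleneck)
        d2 = self.dec2(torch.cat([self.up2(b), s2], dim=1))   # 16x16, skip from enc2
        d1 = self.dec1(torch.cat([self.up1(d2), s1], dim=1))  # 32x32, skip from enc1
        return torch.sigmoid(self.out(d1))  # 3-channel output in [0, 1]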
Requirements
• Prepare the data
• Build a U-net architecture
• Train the model on the prepared dataset
• Display 5 images from the training set in 3 formats: original color, grayscale, and colorized output
• Run inference on 10 images from the test set
• Display those 10 images in the same 3 formats: original color, grayscale, and colorized output
1. Setup and Imports
In [ ]:
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, Dataset, random_split
from torchvision import datasets, transforms
from torchvision.transforms.functional import to_pil_image, resize
import matplotlib.pyplot as plt
import numpy as np
from PIL import Image
from tqdm import tqdm
import os
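Since random_split and weight initialization draw from the global random number generators, fixing the seeds up front makes the train/val split and training runs reproducible (optional; the seed values are arbitrary):
In [ ]:
# Optional: fix global seeds for reproducibility.
torch.manual_seed(0)
np.random.seed(0)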
2. Load the Dataset
In [ ]:
class ColorizationDataset(Dataset):
    def __init__(self, dataset, transform_input=None, transform_target=None):
        self.dataset = dataset
        self.transform_input = transform_input
        self.transform_target = transform_target

    def __len__(self):
        return len(self.dataset)

    def __getitem__(self, idx):
        # CIFAR-10 yields (image, label); the label is not needed for colorization.
        color_img, _ = self.dataset[idx]
        gray_img = transforms.functional.to_grayscale(color_img, num_output_channels=1)
        if self.transform_input:
            gray_img = self.transform_input(gray_img)
        if self.transform_target:
            color_img = self.transform_target(color_img)
        return gray_img, color_img

transform_input = transforms.Compose([transforms.Resize((32, 32)), transforms.ToTensor()])
transform_target = transforms.Compose([transforms.Resize((32, 32)), transforms.ToTensor()])

base_train_dataset = datasets.CIFAR10(root='./data', train=True, download=True)
base_test_dataset = datasets.CIFAR10(root='./data', train=False, download=True)

train_full = ColorizationDataset(base_train_dataset, transform_input, transform_target)
test_dataset = ColorizationDataset(base_test_dataset, transform_input, transform_target)

train_size = int(0.8 * len(train_full))
val_size = len(train_full) - train_size
train_dataset, val_dataset = random_split(train_full, [train_size, val_size])

batch_size = 16
train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=batch_size, shuffle=False)
test_loader = DataLoader(test_dataset, batch_size=batch_size, shuffle=False)
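Before training, it is worth confirming that the loaders yield tensors of the expected shapes (a quick sanity check on the objects defined above):
In [ ]:
# One batch from the training loader: grayscale inputs and color targets.
gray_batch, color_batch = next(iter(train_loader))
print(gray_batch.shape)   # expected: torch.Size([16, 1, 32, 32])
print(color_batch.shape)  # expected: torch.Size([16, 3, 32, 32])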
In [ ]:
# Just looking at the data and trying to visualize it
random_img_idx = torch.randint(0, 1000, (1,)).item()
# print(train_dataset[0][0])  # raw tensor dump; noisy
test_image = train_dataset[random_img_idx][0]  # index 0 is the grayscale image in the (gray, color) pair
test_image = resize(test_image, (250, 250), antialias=None)  # upscale for better visualization
# print(test_image.shape)
# print('Number of channels in test_image:', test_image.shape[0])
to_pil_image(test_image)  # tensors have no .show(); convert to a PIL image to display it
3. Define the Model Architecture
In [ ]:
class Autoencoder(nn.Module):
    def __init__(self):
        super(Autoencoder, self).__init__()
        # Encoder: 32x32 -> 16x16 -> 8x8 while widening channels 1 -> 128
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU()
        )
        # Decoder: 8x8 -> 16x16 -> 32x32 while narrowing channels 128 -> 3;
        # Sigmoid keeps outputs in [0, 1] to match the ToTensor targets.
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 3, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 3, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, 3, 4, stride=2, padding=1), nn.Sigmoid()
        )

    def forward(self, x):
        x = self.encoder(x)
        x = self.decoder(x)
        return x
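A quick shape trace confirms the strides and paddings above map 32x32 inputs down to 8x8 and back:
In [ ]:
# Pass a dummy grayscale batch through the model and check the output shape.
dummy = torch.randn(1, 1, 32, 32)
print(Autoencoder()(dummy).shape)  # expected: torch.Size([1, 3, 32, 32])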
In [ ]:
class ComprehensiveLoss(nn.Module):
    def __init__(self):
        super(ComprehensiveLoss, self).__init__()

    def forward(self, input, target):
        input = torch.clamp(input, 1e-7, 1 - 1e-7)  # prevent log(0)
        loss = -1 * (target * torch.log(input) + (1 - target) * torch.log(1 - input))
        return loss.mean()
4. Training the Model
In [ ]:
def train_model(model, train_loader, val_loader, criterion, optimizer, num_epochs=10):
    # Relies on the global `device` defined before this function is called.
    model.to(device)
    for epoch in range(num_epochs):
        # Training pass
        model.train()
        running_loss = 0.0
        for gray_imgs, color_imgs in tqdm(train_loader, desc=f"Epoch {epoch+1}/{num_epochs}"):
            gray_imgs = gray_imgs.to(device)
            color_imgs = color_imgs.to(device)
            outputs = model(gray_imgs)
            loss = criterion(outputs, color_imgs)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            running_loss += loss.item()
        avg_loss = running_loss / len(train_loader)
        print(f"Epoch [{epoch+1}/{num_epochs}], Training Loss: {avg_loss:.4f}")

        # Validation pass
        model.eval()
        val_loss = 0.0
        with torch.no_grad():
            for gray_imgs, color_imgs in val_loader:
                gray_imgs = gray_imgs.to(device)
                color_imgs = color_imgs.to(device)
                outputs = model(gray_imgs)
                loss = criterion(outputs, color_imgs)
                val_loss += loss.item()
        val_loss /= len(val_loader)
        print(f"Validation Loss: {val_loss:.4f}")
4.1 Loss function
The loss used throughout is ComprehensiveLoss, defined in Section 3 above: a per-pixel binary cross-entropy between the predicted and target RGB values, averaged over all pixels and channels. Because the targets come from ToTensor and the model ends in a Sigmoid, both sides lie in [0, 1], so treating each channel value as a probability is well defined.
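This hand-rolled loss should match PyTorch's built-in nn.BCELoss (same formula, same mean reduction); a quick numerical check on random data:
In [ ]:
# ComprehensiveLoss and nn.BCELoss should agree up to the 1e-7 clamp.
pred = torch.rand(2, 3, 32, 32)
tgt = torch.rand(2, 3, 32, 32)
print(ComprehensiveLoss()(pred, tgt).item())
print(nn.BCELoss()(pred, tgt).item())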
5. Showing Performance on Training Data
In [ ]:
def visualize_colorization(model, dataset, device='cpu', num_images=5):
    model.eval()
    fig, axs = plt.subplots(num_images, 3, figsize=(10, 4 * num_images))
    with torch.no_grad():
        for i in range(num_images):
            gray, color = dataset[i]
            gray = gray.unsqueeze(0).to(device)
            output = model(gray).squeeze(0).cpu()
            axs[i, 0].imshow(to_pil_image(color))
            axs[i, 0].set_title("Original Color")
            axs[i, 1].imshow(to_pil_image(gray.squeeze(0).cpu()), cmap='gray')
            axs[i, 1].set_title("Grayscale Input")
            axs[i, 2].imshow(to_pil_image(output))
            axs[i, 2].set_title("Colorized Output")
            for j in range(3):
                axs[i, j].axis("off")
    plt.tight_layout()
    plt.show()
6. Making Inferences
In [ ]:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = Autoencoder()
criterion = ComprehensiveLoss()
optimizer = optim.Adam(model.parameters(), lr=1e-4)

# Train the model
train_model(model, train_loader, val_loader, criterion, optimizer, num_epochs=10)

# Visualize on training data
visualize_colorization(model, train_dataset, device=device, num_images=5)

# Run inference and visualize on test data (reusing the Section 5 helper)
visualize_colorization(model, test_dataset, device=device, num_images=10)
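Once training finishes, the weights can be saved so inference can be re-run without retraining; a minimal sketch (the filename colorizer.pth is an arbitrary choice):
In [ ]:
# Persist the trained weights and reload them later for inference only.
torch.save(model.state_dict(), 'colorizer.pth')

model = Autoencoder()
model.load_state_dict(torch.load('colorizer.pth', map_location=device))
model.to(device)
model.eval()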