When we last trained our network, we built out quite a lot of functionality that allowed us to experiment with many different parameters and values, and we also made the calls need inside our training loop that would get our results into TensorBoard.

All of this work has helped, but our training loop is quite crowded now. In this exercise, we're going to clean up our training loop and set the stage for more experimentation up by using the RunBuilder class that we built last time and by building a new class called RunManager.

I also find this way of Hyperparameter Tuning more intuitive than TensorBoard. Also, as our number of parameters and runs get larger, TensorBoard will start to breakdown as a viable solution for reviewing our results.

However, calls have been made inside our RunManager class to TensorBoard, so it can be used as an added functionality. For reference, on how to use TensorBoard with PyTorch inside Google Collab, plese refer here.

The code also generates results in csv and json format, which can be used gor further analysis.

import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
import torchvision
import torchvision.transforms as transforms

from torch.utils.data import DataLoader
from torch.utils.tensorboard import SummaryWriter
from IPython.display import display, clear_output
import pandas as pd
import time
import json

from itertools import product
from collections import namedtuple
from collections import OrderedDict

Designing the Neural Network

class Network(nn.Module):
  def __init__(self):
    super(Network,self).__init__()
    self.conv1 = nn.Conv2d(in_channels=3, out_channels=6, kernel_size=5)
    self.conv2 = nn.Conv2d(in_channels=6, out_channels=16, kernel_size=5)

    self.fc1 = nn.Linear(in_features=16*5*5, out_features=120)
    self.fc2 = nn.Linear(in_features=120, out_features=84)
    self.out = nn.Linear(in_features=84, out_features=10)

  def forward(self, t):
    #Layer 1
    t = t
    #Layer 2
    t = self.conv1(t)
    t = F.relu(t)
    t = F.max_pool2d(t, kernel_size=2, stride=2)#output shape : (6,14,14)
    #Layer 3
    t = self.conv2(t)
    t = F.relu(t)
    t = F.max_pool2d(t, kernel_size=2, stride=2)#output shape : (16,5,5)
    #Layer 4
    t = t.reshape(-1, 16*5*5)
    t = self.fc1(t)
    t = F.relu(t)#output shape : (1,120)
    #Layer 5
    t = self.fc2(t)
    t = F.relu(t)#output shape : (1, 84)
    #Layer 6/ Output Layer
    t = self.out(t)#output shape : (1,10)

    return t

`RunBuilder` class

class RunBuilder():
    @staticmethod
    def get_runs(params):

        Run = namedtuple('Run', params.keys())

        runs = []
        for v in product(*params.values()):
            runs.append(Run(*v))

        return runs

`RunManager` class

class RunManager():
    def __init__(self):
        
        self.epoch_count = 0
        self.epoch_loss = 0
        self.epoch_num_correct = 0
        self.epoch_start_time = None
        
        self.run_params = None
        self.run_count = 0
        self.run_data = []
        self.run_start_time = None
        
        self.network = None
        self.loader = None
        self.tb = None
        
    def begin_run(self, run, network, loader):
        
        self.run_start_time = time.time()

        self.run_params = run
        self.run_count += 1
        
        self.network = network
        self.loader = loader
        self.tb = SummaryWriter(comment=f'-{run}')
        
        images, labels = next(iter(self.loader))
        grid = torchvision.utils.make_grid(images)

        self.tb.add_image('images', grid)
        self.tb.add_graph(
             self.network
            ,images.to(getattr(run, 'device', 'cpu'))
        )
        
    def end_run(self):
        self.tb.close()
        self.epoch_count = 0   

    def begin_epoch(self):
        self.epoch_start_time = time.time()
        
        self.epoch_count += 1
        self.epoch_loss = 0
        self.epoch_num_correct = 0

    def end_epoch(self):
        
        epoch_duration = time.time() - self.epoch_start_time
        run_duration = time.time() - self.run_start_time
        
        loss = self.epoch_loss / len(self.loader.dataset)
        accuracy = self.epoch_num_correct / len(self.loader.dataset)
                
        self.tb.add_scalar('Loss', loss, self.epoch_count)
        self.tb.add_scalar('Accuracy', accuracy, self.epoch_count)
        
        for name, param in self.network.named_parameters():
            self.tb.add_histogram(name, param, self.epoch_count)
            self.tb.add_histogram(f'{name}.grad', param.grad, self.epoch_count)
        
        results = OrderedDict()
        results["run"] = self.run_count
        results["epoch"] = self.epoch_count
        results['loss'] = loss
        results["accuracy"] = accuracy
        results['epoch duration'] = epoch_duration
        results['run duration'] = run_duration
        for k,v in self.run_params._asdict().items(): results[k] = v
        self.run_data.append(results)
        
        df = pd.DataFrame.from_dict(self.run_data, orient='columns')
        
        clear_output(wait=True)
        display(df)
        
    def track_loss(self, loss, batch):
        self.epoch_loss += loss.item() * batch[0].shape[0]
        
    def track_num_correct(self, preds, labels):
        self.epoch_num_correct += self._get_num_correct(preds, labels)
    
    def _get_num_correct(self, preds, labels):
        return preds.argmax(dim=1).eq(labels).sum().item()
    
    def save(self, fileName):
        
        pd.DataFrame.from_dict(
            self.run_data
            ,orient='columns'
        ).to_csv(f'{fileName}.csv')
        
        with open(f'{fileName}.json', 'w', encoding='utf-8') as f:
            json.dump(self.run_data, f, ensure_ascii=False, indent=4)

Loading the CIFAR-10 data and pre-processing

transform =transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

train_set = torchvision.datasets.CIFAR10(
    root='./data'
    ,train=True
    ,download=True
    ,transform=transform
)

Files already downloaded and verified

Training the Nueral Network

params = OrderedDict(
    lr = [.01]
    ,batch_size = [1000]
    ,shuffle = [True]
    ,num_workers = [0, 1, 2, 4, 8, 16]
)
m = RunManager()
for run in RunBuilder.get_runs(params):

    network = Network()
    loader = DataLoader(train_set, batch_size=run.batch_size, shuffle=run.shuffle, num_workers=run.num_workers)
    optimizer = optim.Adam(network.parameters(), lr=run.lr)
    
    m.begin_run(run, network, loader)
    for epoch in range(1):
        m.begin_epoch()
        for batch in loader:
            
            images, labels = batch
            preds = network(images) # Pass Batch
            loss = F.cross_entropy(preds, labels) # Calculate Loss
            optimizer.zero_grad() # Zero Gradients
            loss.backward() # Calculate Gradients
            optimizer.step() # Update Weights
            
            m.track_loss(loss, batch)
            m.track_num_correct(preds, labels)  
        m.end_epoch()
    m.end_run()
m.save('results')

Experimenting with DataLoader `num_workers` attribute

The num_workers attribute tells the data loader instance how many sub-processes to use for data loading. By default, the num_workers value is set to zero, and a value of zero tells the loader to load the data inside the main process.

This means that the training process will work sequentially inside the main process. After a batch is used during the training process and another one is needed, we read the batch data from disk.

Now, if we have a worker process, we can make use of the facility that our machine has multiple cores. This means that the next batch can already be loaded and ready to go by the time the main process is ready for another batch. This is where the speed up comes from. The batches are loaded using additional worker processes and are queued up in memory.

The main take-away from these results is that having a single worker process in addition to the main process resulted in a speed up of about twenty percent. However, adding additional worker processes after the first one didn't really show any further improvements.

Additionally, we can see with higher number of num_workers results in higher run times. Please go through this link to know more.

Summary

We have introduced a way to experiment with Hyperparameters to extract maximum efficiency for our model. This code can be scaled up or scaled down to change the Hyperparameters we wish to experiment upon.

This may be noted that, accuracy is not that high as we have trained our model for 1 epoch with each set of parameters. This has been purely done for experimentation purpose.

However, we might need to change our network architecture i.e. a deeper network for higher efficiency.

	run	epoch	loss	accuracy	epoch duration	run duration	lr	batch_size	shuffle	num_workers
0	1	1	1.869639	0.30556	18.566388	20.198124	0.01	1000	True	0
1	2	1	1.967650	0.26044	16.305351	18.226968	0.01	1000	True	1
2	3	1	1.930892	0.27954	15.927075	18.112834	0.01	1000	True	2
3	4	1	1.850479	0.30830	16.800490	19.740744	0.01	1000	True	4
4	5	1	1.822490	0.32622	17.848449	21.122425	0.01	1000	True	8
5	6	1	1.865499	0.30920	19.867091	24.908000	0.01	1000	True	16