CNN Training Loop Refactoring (Simultaneous Hyperparameter Testing)
When we last trained our network, we built out quite a lot of functionality that allowed us to experiment with many different parameters and values, and we also made the calls need inside our training loop that would get our results into TensorBoard.
All of this work has helped, but our training loop is quite crowded now. In this exercise, we're going to clean up our training loop and set the stage for more experimentation up by using the RunBuilder
class that we built last time and by building a new class called RunManager
.
I also find this way of Hyperparameter Tuning more intuitive than TensorBoard. Also, as our number of parameters and runs get larger, TensorBoard will start to breakdown as a viable solution for reviewing our results.
However, calls have been made inside our RunManager
class to TensorBoard, so it can be used as an added functionality. For reference, on how to use TensorBoard with PyTorch inside Google Collab, plese refer here.
The code also generates results in csv
and json
format, which can be used gor further analysis.
import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
import torchvision
import torchvision.transforms as transforms
from torch.utils.data import DataLoader
from torch.utils.tensorboard import SummaryWriter
from IPython.display import display, clear_output
import pandas as pd
import time
import json
from itertools import product
from collections import namedtuple
from collections import OrderedDict
class Network(nn.Module):
def __init__(self):
super(Network,self).__init__()
self.conv1 = nn.Conv2d(in_channels=3, out_channels=6, kernel_size=5)
self.conv2 = nn.Conv2d(in_channels=6, out_channels=16, kernel_size=5)
self.fc1 = nn.Linear(in_features=16*5*5, out_features=120)
self.fc2 = nn.Linear(in_features=120, out_features=84)
self.out = nn.Linear(in_features=84, out_features=10)
def forward(self, t):
#Layer 1
t = t
#Layer 2
t = self.conv1(t)
t = F.relu(t)
t = F.max_pool2d(t, kernel_size=2, stride=2)#output shape : (6,14,14)
#Layer 3
t = self.conv2(t)
t = F.relu(t)
t = F.max_pool2d(t, kernel_size=2, stride=2)#output shape : (16,5,5)
#Layer 4
t = t.reshape(-1, 16*5*5)
t = self.fc1(t)
t = F.relu(t)#output shape : (1,120)
#Layer 5
t = self.fc2(t)
t = F.relu(t)#output shape : (1, 84)
#Layer 6/ Output Layer
t = self.out(t)#output shape : (1,10)
return t
class RunBuilder():
@staticmethod
def get_runs(params):
Run = namedtuple('Run', params.keys())
runs = []
for v in product(*params.values()):
runs.append(Run(*v))
return runs
class RunManager():
def __init__(self):
self.epoch_count = 0
self.epoch_loss = 0
self.epoch_num_correct = 0
self.epoch_start_time = None
self.run_params = None
self.run_count = 0
self.run_data = []
self.run_start_time = None
self.network = None
self.loader = None
self.tb = None
def begin_run(self, run, network, loader):
self.run_start_time = time.time()
self.run_params = run
self.run_count += 1
self.network = network
self.loader = loader
self.tb = SummaryWriter(comment=f'-{run}')
images, labels = next(iter(self.loader))
grid = torchvision.utils.make_grid(images)
self.tb.add_image('images', grid)
self.tb.add_graph(
self.network
,images.to(getattr(run, 'device', 'cpu'))
)
def end_run(self):
self.tb.close()
self.epoch_count = 0
def begin_epoch(self):
self.epoch_start_time = time.time()
self.epoch_count += 1
self.epoch_loss = 0
self.epoch_num_correct = 0
def end_epoch(self):
epoch_duration = time.time() - self.epoch_start_time
run_duration = time.time() - self.run_start_time
loss = self.epoch_loss / len(self.loader.dataset)
accuracy = self.epoch_num_correct / len(self.loader.dataset)
self.tb.add_scalar('Loss', loss, self.epoch_count)
self.tb.add_scalar('Accuracy', accuracy, self.epoch_count)
for name, param in self.network.named_parameters():
self.tb.add_histogram(name, param, self.epoch_count)
self.tb.add_histogram(f'{name}.grad', param.grad, self.epoch_count)
results = OrderedDict()
results["run"] = self.run_count
results["epoch"] = self.epoch_count
results['loss'] = loss
results["accuracy"] = accuracy
results['epoch duration'] = epoch_duration
results['run duration'] = run_duration
for k,v in self.run_params._asdict().items(): results[k] = v
self.run_data.append(results)
df = pd.DataFrame.from_dict(self.run_data, orient='columns')
clear_output(wait=True)
display(df)
def track_loss(self, loss, batch):
self.epoch_loss += loss.item() * batch[0].shape[0]
def track_num_correct(self, preds, labels):
self.epoch_num_correct += self._get_num_correct(preds, labels)
def _get_num_correct(self, preds, labels):
return preds.argmax(dim=1).eq(labels).sum().item()
def save(self, fileName):
pd.DataFrame.from_dict(
self.run_data
,orient='columns'
).to_csv(f'{fileName}.csv')
with open(f'{fileName}.json', 'w', encoding='utf-8') as f:
json.dump(self.run_data, f, ensure_ascii=False, indent=4)
transform =transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])
train_set = torchvision.datasets.CIFAR10(
root='./data'
,train=True
,download=True
,transform=transform
)
params = OrderedDict(
lr = [.01]
,batch_size = [1000]
,shuffle = [True]
,num_workers = [0, 1, 2, 4, 8, 16]
)
m = RunManager()
for run in RunBuilder.get_runs(params):
network = Network()
loader = DataLoader(train_set, batch_size=run.batch_size, shuffle=run.shuffle, num_workers=run.num_workers)
optimizer = optim.Adam(network.parameters(), lr=run.lr)
m.begin_run(run, network, loader)
for epoch in range(1):
m.begin_epoch()
for batch in loader:
images, labels = batch
preds = network(images) # Pass Batch
loss = F.cross_entropy(preds, labels) # Calculate Loss
optimizer.zero_grad() # Zero Gradients
loss.backward() # Calculate Gradients
optimizer.step() # Update Weights
m.track_loss(loss, batch)
m.track_num_correct(preds, labels)
m.end_epoch()
m.end_run()
m.save('results')
Experimenting with DataLoader num_workers
attribute
The num_workers
attribute tells the data loader instance how many sub-processes to use for data loading. By default, the num_workers
value is set to zero, and a value of zero tells the loader to load the data inside the main process.
This means that the training process will work sequentially inside the main process. After a batch is used during the training process and another one is needed, we read the batch data from disk.
Now, if we have a worker process, we can make use of the facility that our machine has multiple cores. This means that the next batch can already be loaded and ready to go by the time the main process is ready for another batch. This is where the speed up comes from. The batches are loaded using additional worker processes and are queued up in memory.
The main take-away from these results is that having a single worker process in addition to the main process resulted in a speed up of about twenty percent. However, adding additional worker processes after the first one didn't really show any further improvements.
Additionally, we can see with higher number of num_workers results in higher run times. Please go through this link to know more.
Summary
We have introduced a way to experiment with Hyperparameters to extract maximum efficiency for our model. This code can be scaled up or scaled down to change the Hyperparameters we wish to experiment upon.
This may be noted that, accuracy is not that high as we have trained our model for 1 epoch with each set of parameters. This has been purely done for experimentation purpose.
However, we might need to change our network architecture i.e. a deeper network for higher efficiency.