This book covers many methods in chapters 1-12, from linear networks to CNNs and RNNs. I am only interested in building an autograd framework, which starts in chapter 13. My approach is therefore to build the framework first and then apply it to the rest of the book, instead of avoiding a framework the way chapters 1-12 do.
Chapter 2: how do machines learn?
- Deep Learning, Machine Learning, AI
- Parametric models and nonparametric models
- Supervised vs unsupervised
Chapter 3: forward propagation
- use numpy.dot to compute matmul
Chapter 4 + 5: gradient descent
- MSE Loss
Chapter 6: Introduction to backpropagation
- Full, batch and SGD
- Linear vs NonLinear
- ReLU
Chapter 7: how to picture neural networks
Chapter 8: regularization and batching
- 3 layers network on MNIST
- Overfitting
- Early stopping
- Dropout: randomly turn off neurons (set them to 0) during training; a small sketch follows this list
- Batch gradient descent
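A tiny NumPy sketch of the dropout idea (the array names are mine, not from the book):

```python
import numpy as np

hidden = np.random.rand(8, 16)                           # hypothetical hidden-layer activations
dropout_mask = np.random.randint(2, size=hidden.shape)   # 0/1 mask, roughly half dropped
hidden = hidden * dropout_mask * 2                       # *2 keeps the expected activation scale
```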
Chapter 9: Activations
- Sigmoid, Tanh
- Softmax
Chapter 10: CNN
Chapter 11: Embedding Layer
Chapter 12: RNN
- Char-RNN
Chapter 13: Build a deep learning framework.
Tensor
class Tensor(object):
- Version 1: wrap self.data = np.array(data)
- Version 2:
- add creation_op and creators to init
- add a backward(self, grad) method which will:
- save self.grad
- check self.creation_op to call backward on self.creators
- in the add method, set creators = [self, other] and creation_op = "add"
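A minimal sketch of Version 2, assuming only the add operation is wired up:

```python
import numpy as np

class Tensor(object):
    def __init__(self, data, creators=None, creation_op=None):
        self.data = np.array(data)
        self.creators = creators        # parent tensors that produced this one
        self.creation_op = creation_op  # name of the op that produced this one
        self.grad = None

    def backward(self, grad):
        self.grad = grad                # save the incoming gradient
        if self.creation_op == "add":
            # addition passes the gradient through unchanged to both parents
            self.creators[0].backward(grad)
            self.creators[1].backward(grad)

    def __add__(self, other):
        return Tensor(self.data + other.data,
                      creators=[self, other], creation_op="add")

# usage
x = Tensor([1, 2, 3])
y = Tensor([4, 5, 6])
z = x + y
z.backward(Tensor(np.ones(3)))
print(x.grad.data)   # [1. 1. 1.]
```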
- Version 3: allow a tensor to have multiple child tensors (accumulate gradients)
- init:
- add self.children dictionary
- generate random id for each tensor
- for each creator:
- creator.children[self.id] = creator.children.get(self.id, 0) + 1 # count self as a child
- backward:
- decrease the children count for the child the gradient came from (grad_origin)
- accumulate gradients from several children
- if received gradients from all children:
- backpropagate to the creators based on creation_op
- add:
- only build the graph (creators + creation_op) when self.autograd and other.autograd are both True; otherwise return a plain Tensor
- init:
- add an autograd flag (default False)
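A sketch of Version 3, following the notes above (random ids, per-child counts, gradient accumulation); details such as the id range are assumptions:

```python
import numpy as np

class Tensor(object):
    def __init__(self, data, autograd=False, creators=None, creation_op=None, id=None):
        self.data = np.array(data)
        self.autograd = autograd
        self.creators = creators
        self.creation_op = creation_op
        self.grad = None
        self.children = {}  # child id -> number of gradients still expected from it
        self.id = id if id is not None else np.random.randint(0, 1000000)  # assumed id scheme
        if creators is not None:
            for creator in creators:
                # register self as a child of each creator
                creator.children[self.id] = creator.children.get(self.id, 0) + 1

    def all_children_grads_accounted_for(self):
        # True once every registered child has sent its gradient
        return all(count == 0 for count in self.children.values())

    def backward(self, grad, grad_origin=None):
        if not self.autograd:
            return
        if grad_origin is not None:
            if self.children[grad_origin.id] == 0:
                raise Exception("cannot backprop more than once from the same child")
            self.children[grad_origin.id] -= 1   # one fewer gradient expected from this child
        # accumulate gradients arriving from several children
        self.grad = grad if self.grad is None else Tensor(self.grad.data + grad.data)
        if self.creators is not None and \
           (self.all_children_grads_accounted_for() or grad_origin is None):
            if self.creation_op == "add":
                self.creators[0].backward(self.grad, self)
                self.creators[1].backward(self.grad, self)

    def __add__(self, other):
        if self.autograd and other.autograd:
            # only build the graph when both inputs track gradients
            return Tensor(self.data + other.data, autograd=True,
                          creators=[self, other], creation_op="add")
        return Tensor(self.data + other.data)
```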
- Version 4: add support for negation
- neg:
- check autograd
- return Tensor(self.data * -1) with creators = [self] and creation_op = "neg"
- backward:
- add a branch for creation_op "neg": pass the negated gradient back to the single creator
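The two pieces Version 4 adds, sketched as methods meant to live inside the Tensor class from the Version 3 sketch:

```python
# inside class Tensor (extends the Version 3 sketch)
def __neg__(self):
    if self.autograd:
        return Tensor(self.data * -1, autograd=True,
                      creators=[self], creation_op="neg")
    return Tensor(self.data * -1)

# and inside Tensor.backward, next to the "add" branch:
#     if self.creation_op == "neg":
#         # d(-x)/dx = -1, so the gradient is simply negated
#         self.creators[0].backward(self.grad.__neg__(), self)
```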
- Version 5: add subtraction, multiplication, sum, expand, transpose, matrix multiplication
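A partial sketch of the Version 5 operations (subtraction, multiplication, and matrix multiplication shown; sum, expand, and transpose follow the same pattern), again as methods inside the Tensor class:

```python
# inside class Tensor (sketch)
def __sub__(self, other):
    if self.autograd and other.autograd:
        return Tensor(self.data - other.data, autograd=True,
                      creators=[self, other], creation_op="sub")
    return Tensor(self.data - other.data)

def __mul__(self, other):
    if self.autograd and other.autograd:
        return Tensor(self.data * other.data, autograd=True,
                      creators=[self, other], creation_op="mul")
    return Tensor(self.data * other.data)

def mm(self, x):
    if self.autograd:
        return Tensor(self.data.dot(x.data), autograd=True,
                      creators=[self, x], creation_op="mm")
    return Tensor(self.data.dot(x.data))

# backward branches (sketch):
#   "sub": creators[0] gets grad, creators[1] gets -grad
#   "mul": creators[0] gets grad * creators[1], creators[1] gets grad * creators[0]
#   "mm" : creators[0] gets grad.mm(creators[1].T), creators[1] gets creators[0].T.mm(grad)
#   "sum(dim)" backward expands grad along dim; "expand(dim, copies)" backward sums along dim
```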
Optimizer
class SGD(object):
- init(self, parameters, alpha=0.1):
- zero(self):
- for p in self.parameters:
- p.grad.data *= 0
- step(self, zero=True):
- for p in self.parameters:
- p.data -= p.grad.data * self.alpha
- if(zero):
- p.grad.data *= 0
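Putting the optimizer notes together, a runnable sketch of SGD:

```python
class SGD(object):
    def __init__(self, parameters, alpha=0.1):
        self.parameters = parameters   # list of Tensors with autograd=True
        self.alpha = alpha             # learning rate

    def zero(self):
        # reset all accumulated gradients
        for p in self.parameters:
            p.grad.data *= 0

    def step(self, zero=True):
        # vanilla gradient descent update, optionally zeroing grads afterwards
        for p in self.parameters:
            p.data -= p.grad.data * self.alpha
            if zero:
                p.grad.data *= 0
```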
Layer
class Layer(object):
- init(self): self.parameters = list()
- get_parameters(self): return self.parameters
class Linear(Layer):
- init(self, n_inputs, n_outputs):
- self.weight = Tensor(W, autograd=True), where W is a randomly initialized (n_inputs, n_outputs) matrix
- self.bias = Tensor(np.zeros(n_outputs), autograd=True)
- self.parameters.append(self.weight); self.parameters.append(self.bias)
- forward(self, input):
- return input.mm(self.weight) + self.bias.expand(0, len(input.data))
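A sketch of Layer and Linear as outlined above, building on the Tensor sketches; the weight initialization scale is an assumption:

```python
import numpy as np

class Layer(object):
    def __init__(self):
        self.parameters = list()

    def get_parameters(self):
        return self.parameters

class Linear(Layer):
    def __init__(self, n_inputs, n_outputs):
        super().__init__()
        # scaled random initialization (the exact scale is an assumption)
        W = np.random.randn(n_inputs, n_outputs) * np.sqrt(2.0 / n_inputs)
        self.weight = Tensor(W, autograd=True)
        self.bias = Tensor(np.zeros(n_outputs), autograd=True)
        self.parameters.append(self.weight)
        self.parameters.append(self.bias)

    def forward(self, input):
        # expand the bias over the batch dimension before adding
        return input.mm(self.weight) + self.bias.expand(0, len(input.data))
```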
class Sequential(Layer):
class MSELoss(Layer):
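Sequential and MSELoss can be sketched as below; MSELoss is written purely in Tensor ops (from Version 5), so the autograd system handles its backward pass:

```python
class Sequential(Layer):
    def __init__(self, layers=None):
        super().__init__()
        self.layers = layers if layers is not None else list()

    def add(self, layer):
        self.layers.append(layer)

    def forward(self, input):
        # chain each layer's forward pass
        for layer in self.layers:
            input = layer.forward(input)
        return input

    def get_parameters(self):
        params = list()
        for layer in self.layers:
            params += layer.get_parameters()
        return params

class MSELoss(Layer):
    def __init__(self):
        super().__init__()

    def forward(self, pred, target):
        # squared error, summed over the batch; backward is automatic
        return ((pred - target) * (pred - target)).sum(0)
```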
class Tanh(Layer), Sigmoid(Layer):
- add sigmoid and tanh operations to Tensor
- the Tanh and Sigmoid layers just call input.tanh() or input.sigmoid() in forward
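A sketch of the activation additions (assumes numpy imported as np and the Tensor/Layer sketches above); the backward rules are noted as comments:

```python
# inside class Tensor (sketch)
def sigmoid(self):
    if self.autograd:
        return Tensor(1 / (1 + np.exp(-self.data)), autograd=True,
                      creators=[self], creation_op="sigmoid")
    return Tensor(1 / (1 + np.exp(-self.data)))

def tanh(self):
    if self.autograd:
        return Tensor(np.tanh(self.data), autograd=True,
                      creators=[self], creation_op="tanh")
    return Tensor(np.tanh(self.data))

# backward branches (output = the tensor this op produced):
#   "sigmoid": pass grad * output * (1 - output) to the creator
#   "tanh"   : pass grad * (1 - output * output) to the creator

class Sigmoid(Layer):
    def __init__(self):
        super().__init__()
    def forward(self, input):
        return input.sigmoid()

class Tanh(Layer):
    def __init__(self):
        super().__init__()
    def forward(self, input):
        return input.tanh()
```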
class Embedding(Layer):
- needs an index_select operation on Tensor: forward picks rows of the weight matrix by index, backward scatters the gradient back into those rows
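A sketch of index_select plus the Embedding layer (assumes numpy imported as np and the sketches above); the initialization scheme is an assumption:

```python
# inside class Tensor (sketch)
def index_select(self, indices):
    # forward: pick rows of self.data by integer index
    if self.autograd:
        new = Tensor(self.data[indices.data], autograd=True,
                     creators=[self], creation_op="index_select")
        new.index_select_indices = indices   # cached for backward
        return new
    return Tensor(self.data[indices.data])

# backward branch (sketch): scatter-add the incoming gradient rows back
# into a zero matrix shaped like the original weights, at the cached indices

class Embedding(Layer):
    def __init__(self, vocab_size, dim):
        super().__init__()
        # small random initialization (the exact scale is an assumption)
        self.weight = Tensor((np.random.rand(vocab_size, dim) - 0.5) / dim,
                             autograd=True)
        self.parameters.append(self.weight)

    def forward(self, input):
        # input holds integer word indices
        return self.weight.index_select(input)
```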
class CrossEntropyLoss(Layer):
- add cross_entropy operation to Tensor
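A sketch of a cross_entropy Tensor op (softmax plus negative log likelihood over integer class targets) and the thin layer wrapper; it assumes the sketches above:

```python
# inside class Tensor (sketch) -- assumes self.data holds logits, one row per example
def cross_entropy(self, target_indices):
    temp = np.exp(self.data)
    softmax_output = temp / np.sum(temp, axis=-1, keepdims=True)
    t = target_indices.data.flatten()
    p = softmax_output.reshape(len(t), -1)
    target_dist = np.eye(p.shape[1])[t]              # one-hot targets
    loss = -(np.log(p) * target_dist).sum(1).mean()  # mean negative log likelihood
    if self.autograd:
        out = Tensor(loss, autograd=True, creators=[self],
                     creation_op="cross_entropy")
        out.softmax_output = softmax_output          # cached for backward
        out.target_dist = target_dist
        return out
    return Tensor(loss)

# backward branch (sketch): the combined softmax + NLL gradient collapses to
# (softmax_output - target_dist), passed back to the creator

class CrossEntropyLoss(Layer):
    def __init__(self):
        super().__init__()
    def forward(self, input, target):
        return input.cross_entropy(target)
```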
class RNNCell(Layer):
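A sketch of RNNCell built from three Linear layers and an activation, assuming the Tensor, Layer, Linear, Sigmoid, and Tanh sketches above:

```python
class RNNCell(Layer):
    def __init__(self, n_inputs, n_hidden, n_output, activation='sigmoid'):
        super().__init__()
        self.n_hidden = n_hidden
        self.activation = Sigmoid() if activation == 'sigmoid' else Tanh()
        # input->hidden, hidden->hidden, hidden->output projections
        self.w_ih = Linear(n_inputs, n_hidden)
        self.w_hh = Linear(n_hidden, n_hidden)
        self.w_ho = Linear(n_hidden, n_output)
        self.parameters += self.w_ih.get_parameters()
        self.parameters += self.w_hh.get_parameters()
        self.parameters += self.w_ho.get_parameters()

    def forward(self, input, hidden):
        # combine the current input with the previous hidden state
        combined = self.w_ih.forward(input) + self.w_hh.forward(hidden)
        new_hidden = self.activation.forward(combined)
        output = self.w_ho.forward(new_hidden)
        return output, new_hidden

    def init_hidden(self, batch_size=1):
        return Tensor(np.zeros((batch_size, self.n_hidden)), autograd=True)
```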