This book covers many methods in chapters 1-12, from linear networks to CNNs and RNNs. I am only interested in building an autograd framework, which starts in chapter 13. My approach is therefore to build the framework first and then apply it to the rest of the book, instead of avoiding a framework the way chapters 1-12 do.
Chapter 2: how do machines learn?
- Deep Learning, Machine Learning, AI
- Parametric models and nonparametric models
- Supervised vs unsupervised
Chapter 3: forward propagation
- use numpy.dot to compute matmul
Chapter 4 + 5: gradient descent
- MSE Loss
Chapter 6: Introduction to backpropagation
- Full, batch and SGD
- Linear vs NonLinear
- ReLU
Chapter 7: how to picture neural networks
Chapter 8: regularization and batching
- 3 layers network on MNIST
- Overfitting
- Early stopping
- Dropout: randomly turn off neurons (set them to 0) during training; a small sketch follows this list
- Batch gradient descent
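A tiny NumPy sketch of the dropout idea (the array names are mine, not from the book):

```python
import numpy as np

hidden = np.random.rand(8, 16)                           # hypothetical hidden-layer activations
dropout_mask = np.random.randint(2, size=hidden.shape)   # 0/1 mask, roughly half dropped
hidden = hidden * dropout_mask * 2                       # *2 keeps the expected activation scale
```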
Chapter 9: Activations
- Sigmoid, Tanh
- Softmax
Chapter 10: CNN
Chapter 11: Embedding Layer
Chapter 12: RNN
- Char-RNN
Chapter 13: Build a deep learning framework.
Tensor
class Tensor(object):
- Version 1: wrap self.data = np.array(data)
- Version 2:
- add creation_op and creators to init
- add a backward(self, grad) method which will:
- save self.grad
- check self.creation_op to call backward on self.creators
- in the add method, set creators = [self, other] and creation_op = "add"
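A minimal sketch of Version 2, assuming only the add operation is wired up:

```python
import numpy as np

class Tensor(object):
    def __init__(self, data, creators=None, creation_op=None):
        self.data = np.array(data)
        self.creators = creators        # parent tensors that produced this one
        self.creation_op = creation_op  # name of the op that produced this one
        self.grad = None

    def backward(self, grad):
        self.grad = grad                # save the incoming gradient
        if self.creation_op == "add":
            # addition passes the gradient through unchanged to both parents
            self.creators[0].backward(grad)
            self.creators[1].backward(grad)

    def __add__(self, other):
        return Tensor(self.data + other.data,
                      creators=[self, other], creation_op="add")

# usage
x = Tensor([1, 2, 3])
y = Tensor([4, 5, 6])
z = x + y
z.backward(Tensor(np.ones(3)))
print(x.grad.data)   # [1. 1. 1.]
```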
- Version 3: allow a tensor to have multiple child tensors (accumulate gradients)
- init:
- add self.children dictionary
- generate random id for each tensor
- for each creator:
- creator.children[self.id] = creator.children.get(self.id, 0) + 1 # count self as a child
- backward:
- decrease the children count for the child the gradient came from (grad_origin)
- accumulate gradients from several children
- if received gradients from all children:
- backpropagate to the creators based on creation_op
- add:
- only build the graph (creators + creation_op) when self.autograd and other.autograd are both True; otherwise return a plain Tensor
- init:
- add an autograd flag (default False)
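A sketch of Version 3, following the notes above (random ids, per-child counts, gradient accumulation); details such as the id range are assumptions:

```python
import numpy as np

class Tensor(object):
    def __init__(self, data, autograd=False, creators=None, creation_op=None, id=None):
        self.data = np.array(data)
        self.autograd = autograd
        self.creators = creators
        self.creation_op = creation_op
        self.grad = None
        self.children = {}  # child id -> number of gradients still expected from it
        self.id = id if id is not None else np.random.randint(0, 1000000)  # assumed id scheme
        if creators is not None:
            for creator in creators:
                # register self as a child of each creator
                creator.children[self.id] = creator.children.get(self.id, 0) + 1

    def all_children_grads_accounted_for(self):
        # True once every registered child has sent its gradient
        return all(count == 0 for count in self.children.values())

    def backward(self, grad, grad_origin=None):
        if not self.autograd:
            return
        if grad_origin is not None:
            if self.children[grad_origin.id] == 0:
                raise Exception("cannot backprop more than once from the same child")
            self.children[grad_origin.id] -= 1   # one fewer gradient expected from this child
        # accumulate gradients arriving from several children
        self.grad = grad if self.grad is None else Tensor(self.grad.data + grad.data)
        if self.creators is not None and \
           (self.all_children_grads_accounted_for() or grad_origin is None):
            if self.creation_op == "add":
                self.creators[0].backward(self.grad, self)
                self.creators[1].backward(self.grad, self)

    def __add__(self, other):
        if self.autograd and other.autograd:
            # only build the graph when both inputs track gradients
            return Tensor(self.data + other.data, autograd=True,
                          creators=[self, other], creation_op="add")
        return Tensor(self.data + other.data)
```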
- Version 4: add support for negation
- neg:
- check autograd
- return Tensor(self.data * -1) with creators = [self] and creation_op = "neg"
- backward:
- add a branch for creation_op "neg": pass the negated gradient back to the single creator
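The two pieces Version 4 adds, sketched as methods meant to live inside the Tensor class from the Version 3 sketch:

```python
# inside class Tensor (extends the Version 3 sketch)
def __neg__(self):
    if self.autograd:
        return Tensor(self.data * -1, autograd=True,
                      creators=[self], creation_op="neg")
    return Tensor(self.data * -1)

# and inside Tensor.backward, next to the "add" branch:
#     if self.creation_op == "neg":
#         # d(-x)/dx = -1, so the gradient is simply negated
#         self.creators[0].backward(self.grad.__neg__(), self)
```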
- Version 5: add subtraction, multiplication, sum, expand, transpose, matrix multiplication
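A partial sketch of the Version 5 operations (subtraction, multiplication, and matrix multiplication shown; sum, expand, and transpose follow the same pattern), again as methods inside the Tensor class:

```python
# inside class Tensor (sketch)
def __sub__(self, other):
    if self.autograd and other.autograd:
        return Tensor(self.data - other.data, autograd=True,
                      creators=[self, other], creation_op="sub")
    return Tensor(self.data - other.data)

def __mul__(self, other):
    if self.autograd and other.autograd:
        return Tensor(self.data * other.data, autograd=True,
                      creators=[self, other], creation_op="mul")
    return Tensor(self.data * other.data)

def mm(self, x):
    if self.autograd:
        return Tensor(self.data.dot(x.data), autograd=True,
                      creators=[self, x], creation_op="mm")
    return Tensor(self.data.dot(x.data))

# backward branches (sketch):
#   "sub": creators[0] gets grad, creators[1] gets -grad
#   "mul": creators[0] gets grad * creators[1], creators[1] gets grad * creators[0]
#   "mm" : creators[0] gets grad.mm(creators[1].T), creators[1] gets creators[0].T.mm(grad)
#   "sum(dim)" backward expands grad along dim; "expand(dim, copies)" backward sums along dim
```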
Optimizer
class SGD(object):
- init(self, parameters, alpha=0.1):
- zero(self):
- for p in self.parameters:
- p.grad.data *= 0
- step(self, zero=True):
- for p in self.parameters:
- p.data -= p.grad.data * self.alpha
- if(zero):
- p.grad.data *= 0
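Putting the optimizer notes together, a runnable sketch of SGD:

```python
class SGD(object):
    def __init__(self, parameters, alpha=0.1):
        self.parameters = parameters   # list of Tensors with autograd=True
        self.alpha = alpha             # learning rate

    def zero(self):
        # reset all accumulated gradients
        for p in self.parameters:
            p.grad.data *= 0

    def step(self, zero=True):
        # vanilla gradient descent update, optionally zeroing grads afterwards
        for p in self.parameters:
            p.data -= p.grad.data * self.alpha
            if zero:
                p.grad.data *= 0
```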
Layer
class Layer(object):
- init(self): self.parameters = list()
- get_parameters(self): return self.parameters
class Linear(Layer):
- init(self, n_inputs, n_outputs):
- self.weight = Tensor(W, autograd=True), where W is a randomly initialized (n_inputs, n_outputs) matrix
- self.bias = Tensor(np.zeros(n_outputs), autograd=True)
- self.parameters.append(self.weight); self.parameters.append(self.bias)
- forward(self, input):
- return input.mm(self.weight) + self.bias.expand(0, len(input.data))
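A sketch of Layer and Linear as outlined above, building on the Tensor sketches; the weight initialization scale is an assumption:

```python
import numpy as np

class Layer(object):
    def __init__(self):
        self.parameters = list()

    def get_parameters(self):
        return self.parameters

class Linear(Layer):
    def __init__(self, n_inputs, n_outputs):
        super().__init__()
        # scaled random initialization (the exact scale is an assumption)
        W = np.random.randn(n_inputs, n_outputs) * np.sqrt(2.0 / n_inputs)
        self.weight = Tensor(W, autograd=True)
        self.bias = Tensor(np.zeros(n_outputs), autograd=True)
        self.parameters.append(self.weight)
        self.parameters.append(self.bias)

    def forward(self, input):
        # expand the bias over the batch dimension before adding
        return input.mm(self.weight) + self.bias.expand(0, len(input.data))
```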
class Sequential(Layer):
class MSELoss(Layer):
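Sequential and MSELoss can be sketched as below; MSELoss is written purely in Tensor ops (from Version 5), so the autograd system handles its backward pass:

```python
class Sequential(Layer):
    def __init__(self, layers=None):
        super().__init__()
        self.layers = layers if layers is not None else list()

    def add(self, layer):
        self.layers.append(layer)

    def forward(self, input):
        # chain each layer's forward pass
        for layer in self.layers:
            input = layer.forward(input)
        return input

    def get_parameters(self):
        params = list()
        for layer in self.layers:
            params += layer.get_parameters()
        return params

class MSELoss(Layer):
    def __init__(self):
        super().__init__()

    def forward(self, pred, target):
        # squared error, summed over the batch; backward is automatic
        return ((pred - target) * (pred - target)).sum(0)
```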
class Tanh(Layer), Sigmoid(Layer):
- add sigmoid and tanh operations to Tensor
- the Tanh and Sigmoid layers just call input.tanh() or input.sigmoid() in forward
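A sketch of the activation additions (assumes numpy imported as np and the Tensor/Layer sketches above); the backward rules are noted as comments:

```python
# inside class Tensor (sketch)
def sigmoid(self):
    if self.autograd:
        return Tensor(1 / (1 + np.exp(-self.data)), autograd=True,
                      creators=[self], creation_op="sigmoid")
    return Tensor(1 / (1 + np.exp(-self.data)))

def tanh(self):
    if self.autograd:
        return Tensor(np.tanh(self.data), autograd=True,
                      creators=[self], creation_op="tanh")
    return Tensor(np.tanh(self.data))

# backward branches (output = the tensor this op produced):
#   "sigmoid": pass grad * output * (1 - output) to the creator
#   "tanh"   : pass grad * (1 - output * output) to the creator

class Sigmoid(Layer):
    def __init__(self):
        super().__init__()
    def forward(self, input):
        return input.sigmoid()

class Tanh(Layer):
    def __init__(self):
        super().__init__()
    def forward(self, input):
        return input.tanh()
```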
class Embedding(Layer):
- needs an index_select operation on Tensor: forward picks rows of the weight matrix by index, backward scatters the gradient back into those rows
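A sketch of index_select plus the Embedding layer (assumes numpy imported as np and the sketches above); the initialization scheme is an assumption:

```python
# inside class Tensor (sketch)
def index_select(self, indices):
    # forward: pick rows of self.data by integer index
    if self.autograd:
        new = Tensor(self.data[indices.data], autograd=True,
                     creators=[self], creation_op="index_select")
        new.index_select_indices = indices   # cached for backward
        return new
    return Tensor(self.data[indices.data])

# backward branch (sketch): scatter-add the incoming gradient rows back
# into a zero matrix shaped like the original weights, at the cached indices

class Embedding(Layer):
    def __init__(self, vocab_size, dim):
        super().__init__()
        # small random initialization (the exact scale is an assumption)
        self.weight = Tensor((np.random.rand(vocab_size, dim) - 0.5) / dim,
                             autograd=True)
        self.parameters.append(self.weight)

    def forward(self, input):
        # input holds integer word indices
        return self.weight.index_select(input)
```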
class CrossEntropyLoss(Layer):
- add cross_entropy operation to Tensor
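A sketch of a cross_entropy Tensor op (softmax plus negative log likelihood over integer class targets) and the thin layer wrapper; it assumes the sketches above:

```python
# inside class Tensor (sketch) -- assumes self.data holds logits, one row per example
def cross_entropy(self, target_indices):
    temp = np.exp(self.data)
    softmax_output = temp / np.sum(temp, axis=-1, keepdims=True)
    t = target_indices.data.flatten()
    p = softmax_output.reshape(len(t), -1)
    target_dist = np.eye(p.shape[1])[t]              # one-hot targets
    loss = -(np.log(p) * target_dist).sum(1).mean()  # mean negative log likelihood
    if self.autograd:
        out = Tensor(loss, autograd=True, creators=[self],
                     creation_op="cross_entropy")
        out.softmax_output = softmax_output          # cached for backward
        out.target_dist = target_dist
        return out
    return Tensor(loss)

# backward branch (sketch): the combined softmax + NLL gradient collapses to
# (softmax_output - target_dist), passed back to the creator

class CrossEntropyLoss(Layer):
    def __init__(self):
        super().__init__()
    def forward(self, input, target):
        return input.cross_entropy(target)
```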
class RNNCell(Layer):
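A sketch of RNNCell built from three Linear layers and an activation, assuming the Tensor, Layer, Linear, Sigmoid, and Tanh sketches above:

```python
class RNNCell(Layer):
    def __init__(self, n_inputs, n_hidden, n_output, activation='sigmoid'):
        super().__init__()
        self.n_hidden = n_hidden
        self.activation = Sigmoid() if activation == 'sigmoid' else Tanh()
        # input->hidden, hidden->hidden, hidden->output projections
        self.w_ih = Linear(n_inputs, n_hidden)
        self.w_hh = Linear(n_hidden, n_hidden)
        self.w_ho = Linear(n_hidden, n_output)
        self.parameters += self.w_ih.get_parameters()
        self.parameters += self.w_hh.get_parameters()
        self.parameters += self.w_ho.get_parameters()

    def forward(self, input, hidden):
        # combine the current input with the previous hidden state
        combined = self.w_ih.forward(input) + self.w_hh.forward(hidden)
        new_hidden = self.activation.forward(combined)
        output = self.w_ho.forward(new_hidden)
        return output, new_hidden

    def init_hidden(self, batch_size=1):
        return Tensor(np.zeros((batch_size, self.n_hidden)), autograd=True)
```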