Neural Network

Methodology Overview

Create methods for forward and backward propagation for each layer

Chain the layers together and use the derivatives computed during backward propagation to minimize the loss function with gradient descent

Implement the network as a linear layer, a ReLU layer, a second linear layer, and a loss layer
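To illustrate the gradient-descent step in isolation, here is a minimal plain-Python sketch that trains only a single linear layer (no ReLU, no bias) against a mean squared error loss. The helper names (`matmul`, `transpose`, `mse`, `train_linear`), the learning rate, and the toy data are assumptions for illustration, not taken from the project code. Each iteration runs a forward pass, backpropagates to get the gradient with respect to W, and moves W a small step against that gradient.

```python
import random

def matmul(A, B):
    """Multiply an n x k matrix A by a k x m matrix B (lists of lists)."""
    return [[sum(A[i][t] * B[t][j] for t in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def transpose(A):
    return [list(row) for row in zip(*A)]

def mse(pred, target):
    """Mean squared error over every entry of two equally shaped matrices."""
    n = len(pred) * len(pred[0])
    return sum((p - t) ** 2 for pr, tr in zip(pred, target) for p, t in zip(pr, tr)) / n

def train_linear(X, Y, lr=0.05, steps=200, seed=0):
    """Fit W so that XW approximates Y, using plain gradient descent."""
    rng = random.Random(seed)
    k, m = len(X[0]), len(Y[0])
    W = [[rng.uniform(-1, 1) for _ in range(m)] for _ in range(k)]  # random start
    n = len(Y) * m
    losses = []
    for _ in range(steps):
        pred = matmul(X, W)                                   # forward pass
        losses.append(mse(pred, Y))
        # backward pass: derivative of the MSE with respect to each prediction
        dpred = [[2 * (p - t) / n for p, t in zip(pr, tr)] for pr, tr in zip(pred, Y)]
        dW = matmul(transpose(X), dpred)                      # dL/dW = X^T dL/dY
        # gradient descent: step W against its gradient
        W = [[w - lr * g for w, g in zip(wr, gr)] for wr, gr in zip(W, dW)]
    return W, losses
```

For example, `train_linear([[1, 0], [0, 1], [1, 1]], [[2], [3], [5]], steps=500)` drives W toward [[2], [3]], the exact solution for that toy data, and the recorded losses shrink along the way.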

Code Implementation

Linear Layer

Forward: Multiplies the input matrix X by the weight matrix W (a matrix-matrix multiplication) and sends the result to the next layer. The weight matrix starts out filled with random initial values.

Backward: The backward step takes as input the derivative of the loss function with respect to the linear layer's output and computes two derivatives: the derivative of the loss with respect to X (the layer's input) and the derivative of the loss with respect to W (the layer's weights). Both can be computed with matrix-matrix multiplications.
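A minimal plain-Python sketch of the linear layer, assuming one sample per row so that the forward pass is Y = XW. Under that convention the backward pass gives dL/dW = X^T (dL/dY) and dL/dX = (dL/dY) W^T. The class and helper names, the Gaussian initialization scale, and the absence of a bias term are assumptions, not details from the project.

```python
import random

def matmul(A, B):
    """Multiply an n x k matrix A by a k x m matrix B (lists of lists)."""
    return [[sum(A[i][t] * B[t][j] for t in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def transpose(A):
    return [list(row) for row in zip(*A)]

class Linear:
    def __init__(self, in_features, out_features, seed=0):
        rng = random.Random(seed)
        # the weight matrix starts out full of small random values
        self.W = [[rng.gauss(0.0, 0.1) for _ in range(out_features)]
                  for _ in range(in_features)]
        self.X = None
        self.dW = None

    def forward(self, X):
        self.X = X                      # keep the input for the backward pass
        return matmul(X, self.W)        # Y = XW

    def backward(self, dY):
        """dY is dL/dY; stores dL/dW and returns dL/dX."""
        self.dW = matmul(transpose(self.X), dY)   # dL/dW = X^T dL/dY
        return matmul(dY, transpose(self.W))      # dL/dX = dL/dY W^T
```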

ReLU Layer

Forward: The ReLU forward function takes an input X and outputs a matrix Y of the same size. It iterates through each element of X and sets the corresponding element of Y to zero if the element of X is less than zero, or to the element's value if it is greater than zero. We implemented this by flattening X and running each element through a lambda function that returns max(0, x), the larger of zero and the element, before placing it in Y. We then reshape Y to the shape of the input X, which requires storing the original shape of X. We also store the input X for the backward computation.

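Backward: Our implementation of the backward ReLU computation is based on the fact that the partial derivative of the loss with respect to X, the input of the ReLU function, can be written element-wise as follows:

$$\frac{\partial L}{\partial X_{ij}} = \begin{cases} \frac{\partial L}{\partial Y_{ij}} & \text{if } X_{ij} > 0 \\ 0 & \text{otherwise} \end{cases}$$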

We flatten the input matrix X and run each element through a lambda function built on max(0, x), which sets the value to zero if the element of X is less than zero and keeps its value if the element is greater than zero.
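The ReLU steps described above can be sketched in plain Python, with lists of lists standing in for matrices; the class and method names are hypothetical. The forward pass mirrors the flatten, lambda, reshape approach, while the backward pass applies the element-wise rule directly by letting the upstream gradient through only where the stored input was positive.

```python
class ReLU:
    def forward(self, X):
        self.shape = (len(X), len(X[0]))            # remember the input shape
        self.X = X                                   # keep the input for backward
        flat = [x for row in X for x in row]         # flatten X
        out = list(map(lambda x: max(0.0, x), flat))
        rows, cols = self.shape
        return [out[r * cols:(r + 1) * cols] for r in range(rows)]  # reshape

    def backward(self, dY):
        # dL/dX equals dL/dY where X > 0 and is zero everywhere else
        return [[g if x > 0 else 0.0 for g, x in zip(g_row, x_row)]
                for g_row, x_row in zip(dY, self.X)]
```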

MSE Loss Layer

Forward: To calculate the loss, we first store the difference between the network's predicted output (the output of the final linear layer) and the true values so that we can reuse it in the backward computation. This difference is the result of a vector-vector subtraction between the predicted and true values and is stored as a vector. Next, we calculate and return the mean squared error: we sum the squares of the differences stored in that vector and divide the sum by the number of samples.

Backward: For each predicted value, we compared it with the corresponding true value and used their difference to compute the gradient of the loss with respect to that prediction, which for the mean squared error over N samples is 2(prediction - true value)/N. Each gradient value was saved into the same matrix, which the function then returns.
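A minimal sketch of the MSE loss layer, assuming the predictions and targets are flat lists of equal length (the class name is hypothetical). The forward pass stores the differences, and the backward pass turns them into the derivative of the mean squared error with respect to each prediction.

```python
class MSELoss:
    def forward(self, pred, target):
        # store the differences so the backward pass can reuse them
        self.diff = [p - t for p, t in zip(pred, target)]
        return sum(d * d for d in self.diff) / len(self.diff)

    def backward(self):
        # d/dp of (1/N) * sum((p - t)^2) is 2 * (p - t) / N
        n = len(self.diff)
        return [2.0 * d / n for d in self.diff]
```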


The graph above shows that our network can be trained effectively: the loss, which measures the average squared discrepancy between predicted and expected values, decreases as more of the training data is fed through the network.

Accuracy of the neural network's classifications increasing as training progresses

Training A Hidden Markov Model and Traversing it with a Viterbi Algorithm

This program reads a sentence from a text file and predicts the part of speech of each word in the sentence. The program uses a Hidden Markov Model, which is trained on example text files. From these example files I extracted how often each word appears as a certain part of speech and how often each part of speech is followed by another particular part of speech. From these counts I estimated the probability of a word being a certain part of speech and the probability that any part of speech follows another. The Hidden Markov Model used in this program is structured so that its states are part-of-speech tags, its transition weights are the probabilities of one tag being followed by the next, and its observations are words, weighted by the probability of that word appearing with the state's part-of-speech tag. The image below represents the form of the model.
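The count-and-normalize training step described above can be sketched as follows. This assumes a hypothetical input format (a list of sentences, each a list of (word, tag) pairs) and a hypothetical start state `#`. It stores log-probabilities, a common choice that lets a Viterbi pass add scores instead of multiplying many small numbers; the project may have stored plain probabilities instead.

```python
import math
from collections import defaultdict

START = "#"  # hypothetical start-of-sentence state

def train_hmm(tagged_sentences):
    """tagged_sentences: list of sentences, each a list of (word, tag) pairs.

    Returns (transitions, emissions): transitions[prev][tag] and
    emissions[tag][word] are log-probabilities estimated from counts.
    """
    trans_counts = defaultdict(lambda: defaultdict(int))
    emit_counts = defaultdict(lambda: defaultdict(int))
    for sentence in tagged_sentences:
        prev = START
        for word, tag in sentence:
            trans_counts[prev][tag] += 1          # how often prev is followed by tag
            emit_counts[tag][word.lower()] += 1   # how often word appears as tag
            prev = tag

    def to_log_probs(counts):
        table = {}
        for state, nexts in counts.items():
            total = sum(nexts.values())
            table[state] = {key: math.log(c / total) for key, c in nexts.items()}
        return table

    return to_log_probs(trans_counts), to_log_probs(emit_counts)
```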

In my Viterbi algorithm, I traverse the Hidden Markov Model to find the most probable sequence of part-of-speech tags for the words in a given sentence.

The image above shows the accuracy with which the program labeled the parts of speech of words in a given text after being trained on two different sets of example texts. The unseen punishment is the probability the Viterbi algorithm uses to approximate the likelihood that a word it encounters is being used as a part of speech (POS) that it has never seen that word used as before.
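A minimal Viterbi sketch, assuming the transition and emission tables are nested dictionaries of log-probabilities and that paths begin at a hypothetical start state `#`. The `unseen_penalty` parameter plays the role of the unseen punishment: it is the log score used when a word was never observed with a tag. Its default of -10 and all names here are illustrative assumptions.

```python
START = "#"  # hypothetical start-of-sentence state

def viterbi(words, transitions, emissions, unseen_penalty=-10.0):
    """Return the most probable tag sequence for `words`.

    transitions[prev][tag] and emissions[tag][word] hold log-probabilities;
    unseen_penalty is the log score used when a word was never seen with a tag.
    """
    if not words:
        return []
    scores = {START: 0.0}        # best log score of any path ending in each state
    backpointers = []            # one {state: previous state} dict per word
    for word in words:
        w = word.lower()
        next_scores, back = {}, {}
        for state, score in scores.items():
            for tag, t_logp in transitions.get(state, {}).items():
                e_logp = emissions.get(tag, {}).get(w, unseen_penalty)
                candidate = score + t_logp + e_logp
                if tag not in next_scores or candidate > next_scores[tag]:
                    next_scores[tag] = candidate
                    back[tag] = state
        scores = next_scores
        backpointers.append(back)
    # follow the backpointers from the best-scoring final tag
    tags = [max(scores, key=scores.get)]
    for back in reversed(backpointers[1:]):
        tags.append(back[tags[-1]])
    return list(reversed(tags))
```

For each word, the sketch keeps only the best-scoring path into each tag together with a backpointer, so the work per word grows with the number of tag pairs rather than with the number of possible tag sequences.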

I completed this project with my friend and classmate Tucker Simpson '24.


Program Input

Program Output

Huffman Encoding

I constructed a tree in which each character that appears in a given file is a node, and the path from the root to that character's node gives the character a unique code. The code is created by traversing the tree, where moving to a left child is interpreted as a 0 and moving to a right child as a 1. In the example tree below, the code word for the character "e" is 1101.

As seen in the leaves above, the lowest-frequency characters must be the deepest nodes in the tree, and hence have the longest bit codes, while the highest-frequency characters must be near the top of the tree. This means that long codes appear rarely in the compressed file, so the file takes up less space. To achieve this, I read through the text files and counted the frequency of each character. I placed these counts in a map from characters to their frequency of occurrence. I then placed these map entries into a priority queue with a custom comparator that returns items in order from lowest to highest frequency of occurrence, and built the tree in the order in which the priority queue returned them: whenever two subtrees are joined under a common parent, the parent is assigned the sum of its children's frequencies. The entire tree is thus organized by the frequency at which characters occur.
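A sketch of the frequency-driven construction and the left-is-0, right-is-1 code assignment, using Python's `heapq` as a min-priority queue and assuming the classic Huffman merge: repeatedly remove the two lowest-frequency subtrees and join them under a parent whose frequency is their sum. The `Node` class and function names are illustrative, and the tiebreak counter keeps the heap from ever comparing `Node` objects directly.

```python
import heapq
from collections import Counter
from itertools import count

class Node:
    def __init__(self, char=None, freq=0, left=None, right=None):
        self.char, self.freq = char, freq
        self.left, self.right = left, right

def build_huffman_tree(text):
    """Return the root of a Huffman tree for the characters in text."""
    freqs = Counter(text)                         # map: character -> frequency
    tiebreak = count()                            # avoids comparing Node objects
    heap = [(f, next(tiebreak), Node(c, f)) for c, f in freqs.items()]
    heapq.heapify(heap)                           # lowest frequency comes out first
    while len(heap) > 1:
        f1, _, low = heapq.heappop(heap)
        f2, _, next_low = heapq.heappop(heap)
        parent = Node(None, f1 + f2, low, next_low)   # parent frequency = sum of children
        heapq.heappush(heap, (parent.freq, next(tiebreak), parent))
    return heap[0][2] if heap else None

def build_codes(node, prefix="", codes=None):
    """Traverse the tree: a left step appends '0', a right step appends '1'."""
    if codes is None:
        codes = {}
    if node.left is None and node.right is None:  # leaf: record its path
        codes[node.char] = prefix or "0"          # a one-leaf tree still needs a bit
        return codes
    build_codes(node.left, prefix + "0", codes)
    build_codes(node.right, prefix + "1", codes)
    return codes
```

For the input "aaaabbc", the most frequent character "a" gets the one-bit code 1, while "b" and "c" get the two-bit codes 01 and 00.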

I used this tree to compress and decompress files: reading from text files, writing into bit files, and then reinterpreting the bits back into the original text.
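A sketch of the compress/decompress round trip, assuming a code table has already been built from the tree. It packs the bit string into bytes, zero-padding the last byte (so the true bit count must be kept alongside the data), and decodes by matching accumulated bits against the table, which works because no code is a prefix of another. Walking the tree from the root, going left on 0 and right on 1 until a leaf is reached, is the equivalent tree-based decoder; the function names here are hypothetical.

```python
def encode(text, codes):
    """Turn text into a string of '0'/'1' bits using a prefix-free code table."""
    return "".join(codes[ch] for ch in text)

def pack_bits(bits):
    """Pack a '0'/'1' string into bytes; the final byte is zero-padded."""
    padded = bits + "0" * (-len(bits) % 8)
    return bytes(int(padded[i:i + 8], 2) for i in range(0, len(padded), 8))

def unpack_bits(data, nbits):
    """Expand bytes back into a bit string and drop the padding."""
    bits = "".join(format(byte, "08b") for byte in data)
    return bits[:nbits]

def decode(bits, codes):
    """Read bits left to right; since codes are prefix-free, the first match is correct."""
    lookup = {code: ch for ch, code in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in lookup:
            out.append(lookup[current])
            current = ""
    return "".join(out)
```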

Neural Network

Methodology Overview

Code Implementation

Linear Layer

ReLU Layer

MSE Loss Layer

Poisson Blending

Cut and Paste

Poisson Blending

Collaborative Graphical Editor

System Diagram

Side by Side Editor Views

Training A Hidden Markov Model and Traversing it with a Viterbi Algorithm

Program Input

Program Output

The Famous Bacon Game

Terminal Transcript of User Playing the Bacon Game

Make sure to view the game commands at the top of the file

Huffman Encoding

US Constitution Compression and Decompression

Decompressed File Size: 45,119 bytes

Compressed File Size: 25,337 bytes

Point Quad Tree GUI and Dot Collision Detector

Tree Creation (Insertion)

Blob Collision Detection (Traversal/Search)

Live Cam Paint

Program Output

Using Breadth-First Search to Find the Quickest Route to a Destination on Dartmouth's Campus

Screen Recording of a User Interacting with the Program Output

Sorting World Cities By Population

Program output marking cities in order of population

Solar System Simulation

Program output showing the solar system moving at a speed at which 1 second of real time simulates 3 million seconds in the solar system

Pong

Program output from a game played between my roommate and me

Asteroid Field
