
Neural Network Matrix Visualization

Gandalf Hudlow @GandalfHudlow
2021/03/01

I was browsing reddit the other day and ran across this practical explanation of the code for a handwritten number recognition neural network, and for whatever reason it sent me down a weeks-long learning rabbit hole. So I made a 2.5D visualization of each matrix operation in that exact neural network, reimplemented in C# using the NumSharp library. The visualization taught me several things that were not obvious from other visualizations I had seen.

Video Tips: The visualization of the matrices is quite detailed and best viewed full screen on a desktop, so you can arrow-key forward and back and read the source code shown at the top right for each step. Also, if you don't see a red HD next to the gear icon at the lower right of the YouTube player, click the gear and switch the quality from Auto 720 to HD 1080; anything less than HD is just too blurry to use. Feel free to use youtube-dl to download the video for an improved learning workflow if YouTube is too slow to work with.

For your bandwidth convenience the visualization is rendered at both 10fps and 60fps. In addition, I've created a 10fps and a 60fps table of links so you can easily navigate to different parts of the visualization.

The Neural Network

Now let's go through the code, with links to the relevant points in the visualization. The full source is here. The source code also cross-references the visualization to boost understanding and learning.

 

The code is a console application that starts out in Program.cs, but the meat of the neural network is in HelloWorld.cs. Program.cs takes care of housekeeping: reading in any saved neural net weights and biases, reading in the MNIST dataset using MNIST.IO, and feeding each MNIST.IO TestCase object to the ProcessNeuralNet function in HelloWorld.cs.
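
To make that flow concrete, here is a minimal sketch of the driver loop just described. LoadTestCases is a hypothetical stand-in for the MNIST.IO loading code, and the .npy handling is only indicated in a comment; check the real Program.cs for the actual calls.

    // Hedged sketch of the Program.cs flow described above -- LoadTestCases is
    // a hypothetical placeholder for the MNIST.IO loading code.
    static void Main(string[] args)
    {
        var network = new HelloWorld();           // random weights, zero biases (see constructor below)

        // If the appropriate .npy weight/bias files exist, Program.cs overwrites
        // the freshly randomized values with them here (omitted in this sketch).

        foreach (var testCase in LoadTestCases()) // one labeled MNIST image per TestCase
        {
            network.ProcessNeuralNet(testCase);   // forward pass + backpropagation
        }
    }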

 

When the HelloWorld class is instantiated by Program.cs, the constructor code below runs, initializing the weights to random numbers between -0.5 and 0.5 and the bias values to 0. If Program.cs sees the appropriate .npy files when it starts, it will overwrite these values with whatever is in those files.

    inputToHiddenWeights20x784 = np.random.uniform(-0.5, 0.5, (20, 784));
    hiddenToOutputWeights10x20 = np.random.uniform(-0.5, 0.5, (10, 20));
    hiddenBiases20x1 = np.zeros((20, 1));
    outputBiases10x1 = np.zeros((10, 1));

First we load a NumSharp NDArray object with a 784x1 matrix that represents the image being processed.

    NDArray image784x1 = testCase.AsNDArray();

Then we multiply each of the 20 rows of the 20x784 input-to-hidden weight matrix by the image and add in the 20x1 bias matrix. And just like that, 784 pixels are crunched down to 20 values that will soon become the 20 hidden neuron values. 10fps 60fps

    var hiddenPreSigmoid20x1 = hiddenBiases20x1 + np.matmul(inputToHiddenWeights20x784, image784x1);

Then we apply the sigmoid function to squish the values in that hiddenPreSigmoid20x1 matrix to fall in the range of 0 to 1. Hence the joke: the cab driver says “That'll be $15 for the cab ride there, Sigmoid!” To which Sigmoid replies, “Here's your 0.99999969 cents!“

    currentHiddenNeurons20X1 = np.divide(1, (np.add(1, np.exp(-hiddenPreSigmoid20x1))));    

We now have a set of 20x1 hidden neurons, and in a couple more steps we can turn them into a set of 10 output neurons. We do that by multiplying the 10x20 hidden-to-output weight matrix by the 20x1 hidden neuron matrix, which gives us a 10x1 matrix that is almost the output neurons. 10fps 60fps

    var outputPreSigmoid10x1 = outputBiases10x1 + 
        np.matmul(hiddenToOutputWeights10x20, currentHiddenNeurons20X1);

We apply our friend the Sigmoid again to squish those output neurons down to nice positive numbers between 0 and 1.

    currentOutputNeurons10x1 = 1 / (1 + np.exp(-outputPreSigmoid10x1));

Then we ask the testCase object to give us a 10x1 matrix that represents the output we are expecting: a matrix of zeroes, except that the single neuron that should be turned on will be a 1.0.

    expectedOutput10x1 = testCase.AsLabelNDArray();

Then we set our detected member variable using NumSharp's argmax, which simply finds the index of the brightest neuron; that index is the digit the neural network detected.

    detected = np.argmax(currentOutputNeurons10x1); 

Ok, at this point the neural network has done the really cool part of its job: it has detected a digit! After one round of training on the MNIST training dataset, the 'detected' member variable will hold the right number roughly 90% of the time when processing the separate MNIST test dataset.

 

The rest of the code applies what the neural network got wrong so it can do a better job the next time it sees that image or one like it. This process is called backpropagation.

Backpropagation

First we subtract the expected output 10x1 matrix from the current output 10x1 matrix. This gives us a numeric representation of just how wrong the output of the neural network was. Remember, even if all the multiplication steps result in the correct output neuron lighting up, just how correct was it? Was it 0.75 or 0.9999? Was there another, wrong neuron that was also close to 1.0? The calculation below gives a very practical, quantified answer as to how the neural network almost or completely missed the boat when identifying the handwritten digit. 10fps 60fps

    expectedOutputDelta10X1 = currentOutputNeurons10x1 - expectedOutput10x1;
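
To make that concrete, here is a made-up example; the numbers are purely illustrative, not from an actual run.

    // Illustrative numbers only -- suppose the image is a handwritten 3.
    // expectedOutput10x1:       [0,    0,    0,    1.00, 0,    0,    0,    0,    0,    0   ]
    // currentOutputNeurons10x1: [0.02, 0.01, 0.10, 0.80, 0.03, 0.65, 0.04, 0.02, 0.01, 0.05]
    // expectedOutputDelta10X1:  [0.02, 0.01, 0.10,-0.20, 0.03, 0.65, 0.04, 0.02, 0.01, 0.05]
    //
    // The -0.20 says the "3" neuron was not bright enough, while the 0.65 says
    // the "5" neuron was far too bright for an image that is not a 5.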

Next we multiply the 10x1 difference matrix we calculated in the last step by the transpose of the 20x1 hidden neuron matrix. This gives us a rather largish 10x20 matrix that represents how much each hidden-to-output weight (the 10x20 matrix) needs to be adjusted to give a better result. 10fps 60fps

    hiddenToOutputWeightAdjustment10X20 = 
        np.matmul(expectedOutputDelta10X1, np.transpose(currentHiddenNeurons20X1));

Next we subtract the adjustment from the weights, scaled by a learnRate of 1% (0.01). Intuitively, you might think of the learn rate as dampening the impact any single handwritten number has on the neural network. I mean, I think we can all agree that some people's writing of the number 3 will never be understood by humans, let alone software. So don't let those people with really bad handwriting trash your neural network! 10fps 60fps

    hiddenToOutputWeights10x20 += -learnRate * hiddenToOutputWeightAdjustment10X20;

We do a variation of the same thing for the 10x1 output bias matrix, subtracting the difference times the 1% learnRate.

    outputBiases10x1 += -learnRate * expectedOutputDelta10X1;

Next we calculate the differential of the sigmoid. Sigmoid is 1/(1 + e^-x), and if you recall, the derivative of e^x is another helping of e^x. There are some excellent guides on the calculus and algebraic gymnastics required to get to the simple form of sigmoid'; google it and enjoy the rabbit hole, I know I did. Let's skip to the end though and go with sigmoid' = sigmoid*(1-sigmoid); a short worked derivation follows the code below. Remember when we applied the sigmoid function to the 20x1 hidden neuron matrix? Super cool that we did, because now we can just use those values again! Voila! 10fps 60fps

    hiddenNeuronsSigmoidDifferential20X1 = 
        (currentHiddenNeurons20X1 * (1 - currentHiddenNeurons20X1));
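
For the curious, here is the short worked derivation promised above, writing s(x) for sigmoid(x):

    s(x)  = 1 / (1 + e^-x)
    s'(x) = e^-x / (1 + e^-x)^2                       (chain rule on (1 + e^-x)^-1)
          = [1 / (1 + e^-x)] * [e^-x / (1 + e^-x)]
          = s(x) * (1 - s(x))                         (since e^-x / (1 + e^-x) = 1 - s(x))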

Now we multiply the transpose of the 10x20 hidden-to-output weight matrix by how much the output neurons missed the boat (a 10x1 matrix), giving us a 20x1 matrix for the next step. 10fps 60fps

    hiddenToOutputWeightsXExpectedOutputDelta20x1 = 
        np.matmul(np.transpose(hiddenToOutputWeights10x20), expectedOutputDelta10X1);

Now we get to use that sigmoid differential we calculated: we multiply the 20x1 matrix from the last step element-wise by the differential, giving us a 20x1 matrix that represents just how wrong the hidden neurons were. 10fps 60fps

    expectedHiddenDelta20X1 = 
        hiddenToOutputWeightsXExpectedOutputDelta20x1 * hiddenNeuronsSigmoidDifferential20X1;

So we need to apply that shiny new 20x1 matrix of wrongness to the 20x784 input-to-hidden weights. This is where the original image comes to the rescue: we multiply each of the pixels (where a pixel is a number from 0 to 1) in the 784x1 image matrix by each of the 20 hidden-neuron deltas to come up with 20 rows of 784 weight adjustments to be applied to the 20x784 weight matrix. 10fps 60fps

    inputToHiddenWeightAdjustment20x784 = 
        np.matmul(expectedHiddenDelta20X1, np.transpose(image784x1));

Next we subtract 1% of that monster 20x784 adjustment matrix from the 20x784 weight matrix. 10fps 60fps

    inputToHiddenWeights20x784 += -learnRate * inputToHiddenWeightAdjustment20x784;

Our final step is to subtract 1% of how wrong the hidden neurons were from the hidden bias.

    hiddenBiases20x1 += -learnRate * expectedHiddenDelta20X1;

In short, the whole process can be described as follows (a condensed sketch of the full loop appears after the list):

  1. Calculate the neurons
  2. Calculate how wrong the neurons are
  3. Use the wrongness to adjust the weights and biases so the network can do a better job next time
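
Here is the condensed sketch of one pass through ProcessNeuralNet, assembled from the snippets above; declarations and the visualization plumbing are omitted, so treat it as a recap rather than the complete source.

    // 1. Calculate the neurons (forward pass)
    NDArray image784x1 = testCase.AsNDArray();
    var hiddenPreSigmoid20x1 = hiddenBiases20x1 + np.matmul(inputToHiddenWeights20x784, image784x1);
    currentHiddenNeurons20X1 = np.divide(1, np.add(1, np.exp(-hiddenPreSigmoid20x1)));
    var outputPreSigmoid10x1 = outputBiases10x1 + np.matmul(hiddenToOutputWeights10x20, currentHiddenNeurons20X1);
    currentOutputNeurons10x1 = 1 / (1 + np.exp(-outputPreSigmoid10x1));
    detected = np.argmax(currentOutputNeurons10x1);

    // 2. Calculate how wrong the neurons are
    expectedOutput10x1 = testCase.AsLabelNDArray();
    expectedOutputDelta10X1 = currentOutputNeurons10x1 - expectedOutput10x1;
    hiddenNeuronsSigmoidDifferential20X1 = currentHiddenNeurons20X1 * (1 - currentHiddenNeurons20X1);
    expectedHiddenDelta20X1 = np.matmul(np.transpose(hiddenToOutputWeights10x20), expectedOutputDelta10X1)
        * hiddenNeuronsSigmoidDifferential20X1;

    // 3. Use the wrongness to adjust the weights and biases
    hiddenToOutputWeights10x20 += -learnRate * np.matmul(expectedOutputDelta10X1, np.transpose(currentHiddenNeurons20X1));
    outputBiases10x1 += -learnRate * expectedOutputDelta10X1;
    inputToHiddenWeights20x784 += -learnRate * np.matmul(expectedHiddenDelta20X1, np.transpose(image784x1));
    hiddenBiases20x1 += -learnRate * expectedHiddenDelta20X1;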

Considerations for software leaders

Since I've spent a number of years leading software teams, I couldn't help but think of some things a leader can do to help a team entering the neural network problem space. This is a software leadership website, after all, so I made a list:

Imagine that, out of the blue, your executive team shows up wanting to leverage AI™. You are going to need a labeled dataset, similar to the MNIST dataset, and it is highly unlikely that what passes for data in your company's workflows is going to cut the mustard. You will likely need to introduce new data markers and collect additional data. If nothing else, you need to be able to tell whether one row of data represents a good outcome and another does not, which I claim just isn't in your dataset already. If it is, you are lucky! Nobody is that lucky.

 

Think of it this way: the neural net we explored in this article leveraged the really nice MNIST dataset of handwritten numbers. Your executive team is much more likely to bust in the door with the equivalent of a wheelbarrow full of pages of scribbled numbers and want an AI solution that can read those pages and do something useful for their business. In short, you will have some expectations to manage, and you will need to bargain for time so your team can build or acquire a dataset to train your new magical AI+BlockChain+InsertTrendHere™ powered solution!

Neural nets will happily process data that is just plain wrong and learn Something™. As an example, imagine if I accidentally used i*j to map the 28x28 image into the 784x1 input matrix, instead of i*28 + j. I actually ran this experiment inadvertently, so you don't have to. The neural net still comes to conclusions about the data, and it even sort of looks like it identifies things somewhat correctly, but it severely degrades the accuracy of the model because pixel i*j is exactly the same as pixel j*i, so only half of the image is being used. Applying good old 1990s-era unit tests to your dataset-to-matrix mapping code will help here. In the case of the image, make a unit test that takes the input matrix, turns it back into an image, and checks that it matches the original image's pixel values (with some tolerance in the comparison if needed to account for rounding); a sketch of such a test follows. Just like anything else in software, all change in neural networks is a mix of Value, Filler and Chaos, and mismapping your inputs is a common source of Chaos that is really hard to detect by just looking at the behavior of your neural network.
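
Here is what such a round-trip test might look like. FlattenImage and UnflattenImage are hypothetical names standing in for your own mapping code, and the xUnit attributes are just one way to wire it up; adapt it to whatever test framework you actually use.

    // Hedged sketch of a round-trip mapping test. FlattenImage/UnflattenImage are
    // hypothetical stand-ins for your own dataset-to-matrix mapping code.
    using System;
    using Xunit;

    public class ImageMappingTests
    {
        [Fact]
        public void FlattenThenUnflattenReproducesEveryPixel()
        {
            // Give every pixel a unique value so a mapping bug like i*j
            // (which collides with j*i) cannot slip through unnoticed.
            var original = new double[28, 28];
            for (int i = 0; i < 28; i++)
                for (int j = 0; j < 28; j++)
                    original[i, j] = (i * 28 + j) / 784.0;

            double[] flat = FlattenImage(original);        // 784 values, hypothetical helper
            double[,] roundTripped = UnflattenImage(flat); // back to 28x28, hypothetical helper

            for (int i = 0; i < 28; i++)
                for (int j = 0; j < 28; j++)
                    Assert.True(Math.Abs(original[i, j] - roundTripped[i, j]) < 1e-9,
                        $"Pixel ({i},{j}) changed during the flatten/unflatten round trip");
        }
    }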

 

In conclusion, I hope this set of visualizations and learning was useful for you. If you are a software leader who is helping a team navigate machine learning, I would love to hear tips for leaders that I can add to the list above.


 

Are you a technical leader? Consider investing in your career growth by becoming a member at iism.org/register.

 

Do you want to publish our articles on your site? Our nonprofit mission is to develop the careers of software leaders by sharing as much software management theory, research and knowledge as humanly possible. To that end, I'd like to take a moment to introduce iiSM.ORG's very liberal media policy. In short, all of our material and articles can be replicated in part or in whole on blogs and media outlets with a simple attribution back to the source. We especially appreciate those of you that translate iiSM.ORG material and articles to other languages!
