Dissection of a Tensor
Introduction: What Is an Array?
Before going into what arrays are and why they are so widely used in computer science and machine learning, we need a quick excursus into how variables are stored in memory.
When we declare a variable, it is stored in the RAM (random access memory) of our computer. From that moment on, the variable, let's say $x$, lives at an address that identifies its memory location; a variable that stores the address of another one is called a pointer.
x = 4
print(hex(id(x)))
Here we have assigned the value $4$ to the variable x. What we get is:
- Id: it is the unique identity of that variable
- Hexadecimal address: where that variable is located
Addresses are written in hexadecimal, using the digits 0-9 and A-F, and each byte of memory has its own address. Each hex digit encodes 4 bits, so a two-digit hex number tops out at FF, which is 255 in decimal: the largest value a single byte (8 bits) can hold.
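To make the hex-to-byte relationship concrete, a quick check in Python:

```python
# Each hex digit encodes 4 bits, so two hex digits encode one byte (8 bits).
print(int("FF", 16))       # 255, the largest value a single byte can hold
print(0xFF == 0b11111111)  # True: FF in hex equals 11111111 in binary
print(hex(255))            # '0xff'
```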
We can see that if we copy a variable into another one, the new one is basically a pointer to the same address. Perhaps surprisingly, a different variable initialized with the same small value also shares that address: CPython caches small integers (roughly $-5$ to $256$), so every occurrence of $4$ refers to the same object. Only for values outside that range do equal values generally live at different addresses.
y = x
print(id(y)==id(x))
print(hex(id(y)) == hex(id(x)))
z=4
print(hex(id(x)) == hex(id(z)))
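The small-integer cache only covers a limited range, so the same experiment with larger values typically behaves differently. This is a CPython implementation detail, not a language guarantee, but we can sketch it:

```python
# The small-integer cache covers roughly -5 to 256; larger values are
# created fresh, so equal values can live at different addresses.
n = 1000
a = n * n  # computed at runtime: a new int object
b = n * n  # computed again: another new int object
print(a == b)  # True: same value
print(a is b)  # False in CPython: different objects, different addresses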
What about Arrays?
Arrays are objects composed of elements of the same type, stored in contiguous memory cells (homogeneous memory). Computationally they are more efficient than lists: the address of any element can be computed from a single base pointer as base + index × item size, since the first element's location determines that of all the others, whereas a list keeps a separate pointer for each element.
import numpy as np
import sys
# Create a Python list and a NumPy array with the same content
py_list = [1, 2, 3, 4, 5]
np_array = np.array([1, 2, 3, 4, 5])
print("Python list memory addresses:")
for i, item in enumerate(py_list):
    print(f" Element {i}: value={item}, address={id(item)}")
print("\nNumPy array memory layout:")
print(f" Base address of array: {np_array.__array_interface__['data'][0]}")
print(" Item size in bytes:", np_array.itemsize)
for i in range(len(np_array)):
    address = np_array.__array_interface__['data'][0] + i * np_array.itemsize
    print(f" Element {i}: value={np_array[i]}, address={address}")
print("\nTotal memory used by list elements:", sum(sys.getsizeof(x) for x in py_list))
print("Total memory used by NumPy array:", np_array.nbytes)
Here we have printed the number of bytes needed to store the list and the array, and we can see the difference: the list's elements alone take more memory than the entire array.
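The memory layout also translates into speed. As a rough illustration (exact numbers depend on your machine; this timing snippet is my own addition, not from the original text), we can time a simple reduction on both structures:

```python
import timeit
import numpy as np

n = 100_000
py_list = list(range(n))
np_array = np.arange(n)

# Sum the same values 100 times with each data structure.
list_time = timeit.timeit(lambda: sum(py_list), number=100)
array_time = timeit.timeit(lambda: np_array.sum(), number=100)
print(f"list sum:  {list_time:.4f} s")
print(f"array sum: {array_time:.4f} s")
```

On most machines the contiguous array wins comfortably, because the sum runs as a tight C loop over raw numbers instead of dereferencing one pointer per element.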
An Important Property of Arrays: Strided View
Now that we have seen how arrays are stored and why they are more efficient than other data structures, it is worth discussing a neat consequence of their contiguous memory: the strided representation.
Basically, each array of shape $(I, J)$ is stored in memory as a contiguous sequence of numbers, its elements, in row-major order; the shape just tells NumPy how to view that sequence.
On top of that sequence, a view of the array is defined by two pieces of information: a final shape, and the number of bytes to skip to move to the next row or the next column, called strides. Strides are expressed in bytes (8 bytes per element for a 64-bit integer, for example).
x = np.arange(6).reshape((2, 3))
print(x)
print(x.strides)
Here we have defined a two-dimensional array whose view moves 24 bytes forward to change row and 8 bytes to change column (each element is an 8-byte integer). With this information we can change the view just by changing the shape and the strides:
new_shape = (3, 2)
new_stride = (16, 8)
new_x = np.lib.stride_tricks.as_strided(x, new_shape, new_stride)
print(new_x)
We are telling NumPy to move just two numbers (16 bytes) before going to the next row, and the new shape has one more row and one fewer column.
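Transposition shows how far this idea goes: NumPy implements x.T purely by swapping the strides, without copying a single byte of data:

```python
import numpy as np

x = np.arange(6).reshape((2, 3))
print(x.strides)    # (24, 8) with 8-byte integers
print(x.T.strides)  # (8, 24): same buffer, strides swapped
# Both views read from the same underlying memory.
print(np.shares_memory(x, x.T))  # True
```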
Another important technique that arrays allow is generating a bigger array by repeating a smaller one, with the tile operation.
x = np.eye(9)
num_rep = (3, 3)
x_tiled = np.tile(x, num_rep)
print(x_tiled.shape)
Here we have repeated the identity matrix 3 times along the rows and 3 times along the columns, producing a 27×27 array.
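As a small extra illustration (my own, not from the original text), the same tiling idea builds periodic patterns, such as a checkerboard, from a tiny 2×2 block:

```python
import numpy as np

block = np.array([[0, 1],
                  [1, 0]])
board = np.tile(block, (4, 4))  # repeat the block 4 times along each axis
print(board.shape)              # (8, 8)
print(board)
```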
All these operations are at the core of many machine learning architectures such as convolutional neural networks (CNNs), which we will implement in the last section of this post.
What about Tensors?
Tensors are the main datatype of the most widely used ML library, PyTorch. Mathematically, a tensor generalizes scalars, vectors, and matrices to an arbitrary number of dimensions, so we can say they are essentially the same thing as arrays (the computer-science name for them). The main difference between NumPy's arrays and Torch's tensors is the implementation.
Specific to ML, tensors are implemented in a way that supports autograd, and through it backpropagation, the key concept for training an ML system. To read more about the implementation of backprop and autograd, I suggest the first lecture of Karpathy's YouTube course here (add link).
If you don't have time for the course, very briefly: every time you perform an operation between two tensors, torch records it in a computational graph that can later be used to calculate the gradient with respect to each tensor.
import torch
a = torch.tensor(2.0, requires_grad=True)
y = a**2
y.backward()
print(a.grad.item())
In this way torch computes the first derivative automatically whenever we ask for the gradient (here $dy/da = 2a = 4$). The gradient is then used to update the weights of the network depending on how they influence the loss.
Concretely, this is achieved by defining, for each operation between tensors, a method that saves the local derivative of that operation; when you call backward(), the chain rule is applied backwards through the graph to accumulate the gradient of each tensor.
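A heavily simplified sketch of that idea, in the spirit of Karpathy's micrograd rather than PyTorch's actual implementation, might look like this:

```python
class Value:
    """A scalar that records how it was produced, for reverse-mode autodiff."""
    def __init__(self, data, parents=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents
        self._backward_fn = None  # accumulates grads into parents

    def __mul__(self, other):
        out = Value(self.data * other.data, parents=(self, other))
        def backward_fn():
            # Local derivatives of a*b: d/da = b, d/db = a (chain rule).
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward_fn = backward_fn
        return out

    def backward(self):
        # Topologically order the graph, then apply the chain rule backwards.
        order, visited = [], set()
        def visit(v):
            if v not in visited:
                visited.add(v)
                for p in v._parents:
                    visit(p)
                order.append(v)
        visit(self)
        self.grad = 1.0
        for v in reversed(order):
            if v._backward_fn is not None:
                v._backward_fn()

a = Value(2.0)
y = a * a      # y = a**2
y.backward()
print(a.grad)  # 4.0, matching torch's result above
```

Each operation stores a closure that knows its local derivative; backward() walks the graph in reverse and multiplies these local derivatives together, which is exactly the chain rule.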
Concrete Example: Building a Convolutional Kernel from Scratch
With these concepts in mind, we can finally see how they are used under the hood of the PyTorch library to train a convolutional neural network from scratch.
Important concepts: defining the forward pass for a simple 2D conv layer, doing it without for loops, and using it in practice.
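As a teaser for that section, here is a sketch (my own illustration, not PyTorch's internals) of a loop-free 2D convolution forward pass built on the strided-view machinery discussed above, using NumPy's sliding_window_view:

```python
import numpy as np

def conv2d_forward(image, kernel):
    """Valid (no padding, stride 1) 2D cross-correlation, no Python loops."""
    # Strided view of shape (H-kh+1, W-kw+1, kh, kw): every kernel-sized patch.
    patches = np.lib.stride_tricks.sliding_window_view(image, kernel.shape)
    # Multiply each patch by the kernel and sum over the patch axes.
    return np.einsum("ijkl,kl->ij", patches, kernel)

image = np.arange(16, dtype=float).reshape(4, 4)
kernel = np.ones((2, 2))  # simple box filter
out = conv2d_forward(image, kernel)
print(out.shape)          # (3, 3)
print(out)
```

The patch view costs no extra memory, because like the earlier as_strided example it only reinterprets the strides of the original buffer; the einsum then does all the multiply-accumulate work in compiled code.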