Step 1: Import NumPy and Prepare Data
In any machine learning or data science project, data preparation is a critical step. It ensures that the data you’re working with is in the right format, and that you have all the necessary tools at your disposal to manipulate it. In this step, we’ll import the popular NumPy library and create a small dataset.
import numpy as np
# Our simple "language" consists of two sentences.
data = "hello world how are you I am fine"
# Create a vocabulary and lookup tables
chars = list(set(data))
char_to_ix = {ch: i for i, ch in enumerate(chars)}
ix_to_char = {i: ch for i, ch in enumerate(chars)}
Here’s what’s happening in the code above:
- Importing NumPy: We first import the NumPy library, which offers support for large, multi-dimensional arrays and matrices, along with mathematical functions to operate on these arrays.
- Creating a Dataset: A string is initialized to represent our small dataset. It’s a simple sentence that we’ll use to create a vocabulary.
- Building Vocabulary: We create two lookup tables, char_to_ix and ix_to_char. These tables allow us to convert characters to indices and vice versa, a common practice in natural language processing.
This basic preparation sets the stage for any further analysis or modeling that you might perform on this data. Whether you’re building a simple character-level model or something more complex, having your data properly prepared and understood is the foundation of any successful project.
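To see the tables in action, here is a quick, optional sanity check (not part of the pipeline itself) that round-trips a character through both lookups:
# Optional sanity check of the lookup tables (illustrative only)
print(chars)                          # the vocabulary; order varies because set() is unordered
print(char_to_ix['h'])                # the integer index assigned to 'h'
print(ix_to_char[char_to_ix['h']])    # 'h' again - converting to an index and back is lossless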
Step 2: Create Training Data
In this step, we will use a sliding window approach to create pairs of input sequences and next characters. This method helps in preparing the data for training models like RNNs, where understanding sequences of data is crucial.
# Hyperparameters
HIDDEN_SIZE = 10
SEQ_LENGTH = 3
# Training data
X = []
y = []
for i in range(0, len(data) - SEQ_LENGTH):
    X.append([char_to_ix[ch] for ch in data[i:i+SEQ_LENGTH]])
    y.append(char_to_ix[data[i+SEQ_LENGTH]])
X = np.array(X)
y = np.array(y)
Here’s a breakdown of the code above and what each part accomplishes:
- Defining Hyperparameters: We define two hyperparameters:
  - HIDDEN_SIZE: The size of the hidden layer in the neural network, set to 10.
  - SEQ_LENGTH: The length of the sequence for the sliding window, set to 3.
- Creating Training Data:
  - We initialize two empty lists, X and y, to store the input sequences and corresponding next characters.
  - We then use a for loop to iterate through the data using a sliding window of length SEQ_LENGTH.
  - For each iteration, we append the corresponding character indices from char_to_ix to X, and the next character index to y.
  - Finally, we convert X and y into NumPy arrays for easier manipulation and feeding into a model.
The sliding window approach is essential in problems where sequence or order matters. By preparing the data in this way, we ensure that the model can learn from the sequential dependencies in the text, making it possible to predict or generate subsequent characters based on the input.
This preparation sets the foundation for the next step, where we might feed this data into a neural network or other machine learning model for training. By understanding the relationships between sequences of characters, our model can learn to make predictions or generate new sequences that are coherent and contextually relevant.
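To make the sliding window concrete, here is a small optional check (not needed for training) that decodes the first input/target pair back into characters:
# Decode the first window and its target back to characters (illustrative only)
first_window = ''.join(ix_to_char[ix] for ix in X[0])
first_target = ix_to_char[y[0]]
print(repr(first_window), '->', repr(first_target))  # 'hel' -> 'l' for our sentence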
Step 3: Initialize Weights
In this stage of the process, we define and initialize the weights for our network. Initializing the weights is a crucial step in training a neural network, as it sets the starting point for the optimization process.
# Model parameters
Wxh = np.random.randn(HIDDEN_SIZE, len(chars)) * 0.01
Whh = np.random.randn(HIDDEN_SIZE, HIDDEN_SIZE) * 0.01
Why = np.random.randn(len(chars), HIDDEN_SIZE) * 0.01
bh = np.zeros((HIDDEN_SIZE, 1))
by = np.zeros((len(chars), 1))
Here’s what’s happening in the code above:
- Wxh (Input to Hidden weights): This matrix transforms the one-hot encoded input character into the hidden layer. It is initialized with random values drawn from a normal distribution, then multiplied by 0.01 to keep the initial weights small.
- Whh (Hidden to Hidden weights): This matrix holds the recurrent weights that carry the hidden state from one time step to the next. Like Wxh, it is initialized with small random values.
- Why (Hidden to Output weights): This matrix transforms the hidden layer's output into the final output layer, with one row per character in the vocabulary. Again, it is initialized with small random values.
- bh (Hidden Bias): This vector is the bias for the hidden layer. It is initialized to zeros, as starting with small random biases would not have a significant effect on training.
- by (Output Bias): Similarly, this vector is the bias for the output layer and is initialized to zeros.
The initial weights are small random values, which breaks the symmetry between neurons so that each one can learn different features. Multiplying by 0.01 keeps the weights close to zero, which keeps the tanh activations out of their saturated regions and helps avoid unstable gradients early in training.
The biases are initialized to zeros since they can adapt quickly during training, and starting them off with non-zero values doesn’t typically provide much benefit.
By correctly initializing these parameters, we set the stage for the training process, allowing our neural network to start learning from the data. It provides a good foundation for gradient-based optimization techniques, paving the way for the efficient training of the model in the subsequent steps.
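As a quick, optional check on how the dimensions fit together, you can print the shape of each parameter (purely illustrative):
# Inspect parameter shapes (illustrative only)
for name, param in [('Wxh', Wxh), ('Whh', Whh), ('Why', Why), ('bh', bh), ('by', by)]:
    print(name, param.shape)
# With HIDDEN_SIZE = 10 and a vocabulary of V = len(chars) characters, this prints
# Wxh (10, V), Whh (10, 10), Why (V, 10), bh (10, 1), by (V, 1).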
Step 4: Training the Model
Now that we’ve prepared our data and initialized our network’s weights, we’ll dive into training our model. Training is the iterative process by which a neural network learns from the data. We’ll implement a forward pass through the recurrent network and use gradient descent to update the output-layer parameters, a deliberately simplified version of full training.
LEARNING_RATE = 0.1
EPOCHS = 1000
for epoch in range(EPOCHS):
    for i in range(X.shape[0]):
        # Input and target
        inputs = X[i]
        target = y[i]

        # Forward pass
        hs = np.zeros((SEQ_LENGTH + 1, HIDDEN_SIZE, 1))
        for t in range(SEQ_LENGTH):
            xs = np.zeros((len(chars), 1))   # one-hot encode the t-th input character
            xs[inputs[t]] = 1
            hs[t+1] = np.tanh(np.dot(Wxh, xs) + np.dot(Whh, hs[t]) + bh)
        ys = np.dot(Why, hs[SEQ_LENGTH]) + by
        probs = np.exp(ys) / np.sum(np.exp(ys))   # softmax over the vocabulary

        # Loss: cross-entropy for the true next character
        loss = -np.log(probs[target, 0])

        # Backward pass (simplified, not handling recurrent connections)
        target_one_hot = np.zeros((len(chars), 1))
        target_one_hot[target] = 1
        dy = probs - target_one_hot           # gradient of the loss w.r.t. ys
        dWhy = np.dot(dy, hs[SEQ_LENGTH].T)
        dby = dy

        # Update weights
        Why -= LEARNING_RATE * dWhy
        by -= LEARNING_RATE * dby

    if epoch % 100 == 0:
        print(f'Epoch: {epoch} Loss: {loss}')
Here’s a breakdown of this code:
- Setting Hyperparameters: We define the learning rate and the number of epochs, which control how big each update is and how many passes we make over the training data.
- Training Loop: The outer loop runs over the epochs, and the inner loop iterates through each sequence in our training data (X).
- Forward Pass:
  - We initialize the hidden states (hs) and build a one-hot input vector (xs) for each character in the sequence.
  - We then loop through the sequence, performing a forward pass through the network at each time step.
  - The tanh activation function introduces non-linearity, and we compute the final output (ys) followed by the softmax activation to get probabilities.
- Calculating Loss: We calculate the loss using cross-entropy, which measures how well our model's predicted distribution matches the true next character.
- Backward Pass: In this simplified example, we first convert the target index into a one-hot vector, then compute gradients only for the hidden-to-output weights (dWhy) and the output biases (dby).
- Updating Weights: Using the calculated gradients, we update Why and by with gradient descent, scaled by the learning rate.
- Printing Progress: We print the loss every 100 epochs to keep track of how our model is learning.
By repeatedly performing forward and backward passes and updating the weights, we enable our model to learn the relationships within the data. Over time, this will allow our model to make accurate predictions or generate new, coherent sequences, demonstrating the incredible capability of neural networks in recognizing complex patterns in data.
Note that this is a simplified version of training a recurrent neural network (RNN), and in practice, more complex algorithms like backpropagation through time (BPTT) might be used, especially for handling recurrent connections. But this code provides a foundational understanding of how training a neural network operates.
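For readers who want to see what a fuller backward pass could look like, below is a minimal sketch of backpropagation through time for this specific network, so that Wxh, Whh, and bh also receive gradients. It assumes the same globals defined above (Wxh, Whh, Why, bh, by, chars, HIDDEN_SIZE, SEQ_LENGTH) and is meant as an illustration, not a drop-in replacement for the loop above:
# Minimal BPTT sketch (illustrative): gradients for all parameters of one training pair
def bptt_step(inputs, target):
    # Forward pass, keeping the one-hot inputs and hidden states for every time step
    xs_list = []
    hs = np.zeros((SEQ_LENGTH + 1, HIDDEN_SIZE, 1))
    for t in range(SEQ_LENGTH):
        xs = np.zeros((len(chars), 1))
        xs[inputs[t]] = 1
        xs_list.append(xs)
        hs[t+1] = np.tanh(np.dot(Wxh, xs) + np.dot(Whh, hs[t]) + bh)
    ys = np.dot(Why, hs[SEQ_LENGTH]) + by
    probs = np.exp(ys) / np.sum(np.exp(ys))

    # Output-layer gradients, as in the simplified loop
    target_one_hot = np.zeros((len(chars), 1))
    target_one_hot[target] = 1
    dy = probs - target_one_hot
    dWhy = np.dot(dy, hs[SEQ_LENGTH].T)
    dby = dy

    # Backpropagate through time into the recurrent parameters
    dWxh = np.zeros_like(Wxh)
    dWhh = np.zeros_like(Whh)
    dbh = np.zeros_like(bh)
    dh = np.dot(Why.T, dy)                  # gradient flowing into the last hidden state
    for t in reversed(range(SEQ_LENGTH)):
        dh_raw = (1 - hs[t+1] ** 2) * dh    # backprop through tanh
        dWxh += np.dot(dh_raw, xs_list[t].T)
        dWhh += np.dot(dh_raw, hs[t].T)
        dbh += dh_raw
        dh = np.dot(Whh.T, dh_raw)          # pass the gradient back to the previous time step
    return dWxh, dWhh, dWhy, dbh, dby
The returned gradients could then be applied to all five parameter arrays with the same learning-rate update used above.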
Step 5: Making Predictions
After training the model, the next fascinating step is to use it for making predictions. This is where we’ll witness the trained model in action as it takes an input sequence and predicts the next character.
Below is the Python code to make the prediction:
def predict(input_sequence):
    # Run the trained network over the input sequence and return the most likely next character index
    hs = np.zeros((HIDDEN_SIZE, 1))
    for t in range(SEQ_LENGTH):
        xs = np.zeros((len(chars), 1))   # one-hot encode the t-th input character
        xs[input_sequence[t]] = 1
        hs = np.tanh(np.dot(Wxh, xs) + np.dot(Whh, hs) + bh)
    ys = np.dot(Why, hs) + by
    probs = np.exp(ys) / np.sum(np.exp(ys))   # softmax over the vocabulary
    return np.argmax(probs)
input_sequence = [char_to_ix[ch] for ch in "how"]
predicted_index = predict(input_sequence)
print(f"Next character prediction: {ix_to_char[predicted_index]}")
In this code snippet, the predict function takes an input sequence and runs it through the recurrent hidden layer using the trained weights and biases. It then calculates the probabilities for each character in the vocabulary and returns the index of the character with the highest probability.
The line input_sequence = [char_to_ix[ch] for ch in "how"] converts the input string “how” into a sequence of character indices, and then we call the predict function with this sequence.
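Building on this, you could let the model generate a short continuation by repeatedly feeding its own prediction back in. This is a small illustrative extension, not part of the original tutorial code, and the quality of the output depends on how well the tiny model has learned the sentence:
# Generate a few characters by feeding predictions back into the model (illustrative)
def generate(seed_text, num_chars=5):
    sequence = [char_to_ix[ch] for ch in seed_text[-SEQ_LENGTH:]]
    generated = seed_text
    for _ in range(num_chars):
        next_index = predict(sequence)
        generated += ix_to_char[next_index]
        sequence = sequence[1:] + [next_index]   # slide the window forward by one character
    return generated

print(generate("how", num_chars=5))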
Conclusion
Embarking on the journey of building a character-level prediction model with simple neural networks, we have touched upon the core concepts of deep learning. Through five clear steps, we have trained our model and observed it making predictions, an exciting proof of concept that demonstrates the power and flexibility of neural networks.
While our model is elementary, it’s a gateway to more complex and sophisticated models capable of understanding and generating human-like text. As technology advances, so does our ability to harness these models in various applications, from chatbots to writing assistants. The field of deep learning continues to be an exciting frontier, full of opportunities and innovation, and it all starts with understanding the basics, like those explored in this tutorial. Happy coding!