
Build Neural Networks in Python & Scratch: A Fun Intro!

📖 13 min read · 2,582 words · Updated Mar 26, 2026

Unlocking AI: Creating Neural Networks in Python and Scratch

Hi, I’m Alex Petrov, an ML engineer. Today, we’re going to demystify neural networks. You might think they’re complex, reserved for advanced programmers. But I’ll show you how to start creating neural networks in Python, and surprisingly, even visualize core concepts using Scratch. This article provides practical, actionable steps for beginners to understand and build their first AI models.

Our journey will cover the foundational theory, a hands-on Python implementation, and then a creative way to grasp the mechanics through Scratch. The goal is to make creating neural networks in Python and Scratch accessible and understandable for everyone.

What is a Neural Network? The Core Idea

Imagine your brain. It has billions of neurons, interconnected, processing information. A neural network is a simplified model of this biological process. It’s a series of algorithms that tries to identify underlying relationships in a set of data through a process that mimics the way the human brain operates.

At its heart, a neural network takes input data, passes it through layers of interconnected “neurons” (or nodes), and produces an output. Each connection has a “weight,” and each neuron has a “bias.” These weights and biases are adjusted during training to make the network’s predictions more accurate.

Inputs, Hidden Layers, and Outputs

Think of it like this:

  • Input Layer: This is where your data enters. For example, if you’re predicting house prices, inputs might be square footage, number of bedrooms, and location.
  • Hidden Layers: These are the “thinking” parts. They perform calculations on the input data, transforming it. A network can have one or many hidden layers. More complex problems often require more hidden layers.
  • Output Layer: This is the final result. For house prices, the output would be the predicted price. For classifying images, it might be the label “cat” or “dog.”

Weights and Biases: The Network’s Learnable Parameters

Every connection between neurons has a weight. This weight determines the strength and importance of that connection. A higher weight means that input has a stronger influence on the next neuron’s activation. A bias is an additional parameter in each neuron that helps shift the activation function. Together, weights and biases are what the neural network “learns” during training.
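The weighted-sum-plus-bias computation can be sketched in a few lines of NumPy. The specific weights and bias below are made up for illustration, not learned values:

```python
import numpy as np

# A single neuron's pre-activation value: weighted sum of inputs plus a bias.
inputs = np.array([1.0, 0.5])    # two feature values
weights = np.array([0.8, -0.4])  # one weight per connection
bias = 0.1                       # shifts the activation threshold

weighted_sum = np.dot(inputs, weights) + bias
print(round(weighted_sum, 6))  # 1.0*0.8 + 0.5*(-0.4) + 0.1 = 0.7
```

During training, it is exactly these weight and bias values that get nudged up or down to reduce the network's error.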

Activation Functions: Introducing Non-Linearity

After a neuron receives inputs, multiplies them by their weights, and adds a bias, it passes the result through an activation function. This function introduces non-linearity into the network. Without activation functions, a neural network would simply be a linear model, unable to learn complex patterns. Common activation functions include ReLU (Rectified Linear Unit), Sigmoid, and Tanh.
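As a quick sketch, here is how sigmoid, ReLU, and tanh behave on a few sample values in NumPy:

```python
import numpy as np

def sigmoid(x):
    # Squashes any real input into the range (0, 1)
    return 1 / (1 + np.exp(-x))

def relu(x):
    # Passes positive values through, zeroes out negatives
    return np.maximum(0, x)

x = np.array([-2.0, 0.0, 2.0])
print(sigmoid(x))  # roughly [0.119, 0.5, 0.881]
print(relu(x))     # [0. 0. 2.]
print(np.tanh(x))  # roughly [-0.964, 0., 0.964]
```

Notice that sigmoid and tanh saturate for large inputs, while ReLU stays linear on the positive side; this difference is one reason ReLU is popular in deep networks.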

Building a Simple Neural Network in Python

Now, let’s get practical. We’ll use Python to build a basic neural network, relying on NumPy for numerical operations; NumPy is a fundamental library for scientific computing in Python. This will be our first step toward creating a neural network in Python.

Setting Up Your Environment

First, make sure you have Python installed. Then, install NumPy:

pip install numpy

The Problem: XOR Gate

We’ll train our network to solve the XOR (exclusive OR) problem. The XOR gate is a classic example in neural networks because it’s not linearly separable. This means a single straight line can’t separate the true outputs from the false outputs. It requires a hidden layer.

XOR truth table:

  • Input (0, 0) -> Output (0)
  • Input (0, 1) -> Output (1)
  • Input (1, 0) -> Output (1)
  • Input (1, 1) -> Output (0)
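You can generate this truth table directly with NumPy, which also produces the same `X` and `y` arrays the training code uses:

```python
import numpy as np

# All four input combinations for a two-input gate
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
# XOR is 1 exactly when the two inputs differ
y = np.logical_xor(X[:, 0], X[:, 1]).astype(int)
print(y)  # [0 1 1 0]
```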

Python Implementation: A Two-Layer Neural Network

Here’s the Python code for a simple feedforward neural network with one hidden layer. We’ll use the sigmoid activation function and implement the backpropagation algorithm for training.


import numpy as np

# Sigmoid activation function and its derivative
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(x):
    # Assumes x is already a sigmoid output: s'(z) = s(z) * (1 - s(z))
    return x * (1 - x)

# Input dataset
X = np.array([[0, 0],
              [0, 1],
              [1, 0],
              [1, 1]])

# Output dataset
y = np.array([[0],
              [1],
              [1],
              [0]])

# Seed for reproducibility
np.random.seed(1)

# Initialize weights
# Input to hidden layer (2 inputs, 4 hidden neurons)
weights_input_hidden = np.random.uniform(size=(2, 4))
# Hidden to output layer (4 hidden neurons, 1 output)
weights_hidden_output = np.random.uniform(size=(4, 1))

# For simplicity, explicit biases are omitted in this basic example,
# letting the weights handle the shift for now.
# For more advanced networks, biases are crucial.

learning_rate = 0.1
epochs = 10000

print("Initial weights (input to hidden):\n", weights_input_hidden)
print("Initial weights (hidden to output):\n", weights_hidden_output)

for epoch in range(epochs):
    # Forward propagation
    # Calculate hidden layer output
    hidden_layer_input = np.dot(X, weights_input_hidden)
    hidden_layer_output = sigmoid(hidden_layer_input)

    # Calculate output layer output
    output_layer_input = np.dot(hidden_layer_output, weights_hidden_output)
    predicted_output = sigmoid(output_layer_input)

    # Backpropagation
    # Calculate error
    error = y - predicted_output

    # Calculate delta for output layer
    d_predicted_output = error * sigmoid_derivative(predicted_output)

    # Calculate error and delta for hidden layer
    error_hidden_layer = d_predicted_output.dot(weights_hidden_output.T)
    d_hidden_layer = error_hidden_layer * sigmoid_derivative(hidden_layer_output)

    # Update weights
    weights_hidden_output += hidden_layer_output.T.dot(d_predicted_output) * learning_rate
    weights_input_hidden += X.T.dot(d_hidden_layer) * learning_rate

    if epoch % 1000 == 0:
        loss = np.mean(np.abs(error))
        print(f"Epoch {epoch}, Loss: {loss:.4f}")

print("\nTraining complete.")
print("Final weights (input to hidden):\n", weights_input_hidden)
print("Final weights (hidden to output):\n", weights_hidden_output)
print("\nPredicted output after training:\n", predicted_output.round())

Understanding the Python Code

Let’s break down the Python code:

  1. `sigmoid` and `sigmoid_derivative` functions: These implement the sigmoid activation function and its derivative, essential for backpropagation.
  2. `X` and `y` datasets: These represent our input and output for the XOR problem.
  3. Weight Initialization: `weights_input_hidden` and `weights_hidden_output` are initialized with random values. This randomness is crucial to avoid all neurons learning the same thing.
  4. `learning_rate`: This controls how much the weights are adjusted during each training step. A smaller learning rate means slower but potentially more stable learning.
  5. `epochs`: The number of times the network will go through the entire training dataset.
  6. Forward Propagation:
    • Inputs are multiplied by `weights_input_hidden` to get the `hidden_layer_input`.
    • `hidden_layer_input` is passed through the `sigmoid` function to get `hidden_layer_output`.
    • `hidden_layer_output` is multiplied by `weights_hidden_output` to get `output_layer_input`.
    • `output_layer_input` is passed through `sigmoid` to get the `predicted_output`.
  7. Backpropagation: This is the “learning” part.
    • Error Calculation: We find the difference between the `y` (actual output) and `predicted_output`.
    • Output Layer Delta: This is the error multiplied by the derivative of the sigmoid of the predicted output. It tells us how much to adjust the output layer weights.
    • Hidden Layer Error: We propagate the error back from the output layer to the hidden layer.
    • Hidden Layer Delta: Similar to the output layer, this tells us how much to adjust the hidden layer weights.
    • Weight Updates: Finally, the weights are adjusted based on their respective deltas and the `learning_rate`. This is the core of how the network learns.

After running this code, you’ll see the network’s loss decrease over epochs, and the `predicted_output` should closely match `y` (the XOR truth table), demonstrating successful learning. This is a foundational example of creating a neural network in Python.

Visualizing Neural Network Concepts with Scratch

Understanding the abstract math can be tough. That’s where Scratch comes in! While you can’t build a complex neural network directly in Scratch, you can create interactive simulations that demonstrate core concepts like neurons, weights, and activation. This provides a visual analogy that cements your understanding of how neural networks work.

Why Scratch for Neural Networks?

  • Visual Feedback: See how inputs affect outputs in real-time.
  • Interactive Learning: Manipulate “weights” and “biases” directly.
  • Simplifies Complexity: Focus on one concept at a time.
  • Engaging: Makes learning fun and accessible.

Creating a Single “Neuron” in Scratch

Let’s simulate a single neuron that takes two inputs, multiplies them by weights, sums them, and applies a simple threshold (like a step activation function). This is a great way to visualize a perceptron, the simplest form of a neural network.

Scratch Project Idea: A “Decision Maker” Neuron

Imagine a neuron that decides if you should “Go Outside” based on two inputs: “Is it Sunny?” (1 for Yes, 0 for No) and “Is it Warm?” (1 for Yes, 0 for No).

Steps in Scratch:

  1. Create Variables:
    • `Input_Sunny` (slider, range 0-1)
    • `Input_Warm` (slider, range 0-1)
    • `Weight_Sunny` (slider, range -2 to 2)
    • `Weight_Warm` (slider, range -2 to 2)
    • `Bias` (slider, range -2 to 2)
    • `Weighted_Sum`
    • `Output_Decision` (will be 0 or 1)
  2. Create Sprites:
    • A “Neuron” sprite (a circle)
    • An “Output” sprite (e.g., a happy face for “Go Outside”, a sad face for “Stay Inside”)
  3. Neuron Sprite Script:
    
     when green flag clicked
     forever
         set Weighted_Sum to (Input_Sunny * Weight_Sunny) + (Input_Warm * Weight_Warm) + Bias
         if Weighted_Sum > 0 then
             set Output_Decision to 1  // Go Outside
         else
             set Output_Decision to 0  // Stay Inside
         end
     end
     
  4. Output Sprite Script:
    
     when green flag clicked
     forever
         if Output_Decision = 1 then
             switch costume to [Happy Face]
             say "Let's go outside!" for 2 seconds
         else
             switch costume to [Sad Face]
             say "Staying in today." for 2 seconds
         end
     end
     

Experimenting in Scratch

Once you’ve built this in Scratch, play around with the sliders:

  • Change `Weight_Sunny` and `Weight_Warm`: How does increasing one weight make that input more important for the “Go Outside” decision?
  • Adjust `Bias`: How does the bias affect the threshold? Can you make the neuron always decide to go outside, even if it’s not sunny or warm? Or always stay inside?
  • Observe `Weighted_Sum`: See how it changes with different inputs and weights.

This simple Scratch project vividly illustrates the core mechanics: inputs, weights, summing, and an activation (threshold) function. It’s an excellent visual aid for building the conceptual understanding that underpins every neural network, whether in Scratch or Python.
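If you want to compare the Scratch simulation with code, the same threshold neuron can be mirrored in a few lines of Python. The weights and bias below are one example setting, not the only valid one:

```python
def decision_neuron(is_sunny, is_warm,
                    weight_sunny=1.0, weight_warm=1.0, bias=-1.5):
    # Weighted sum plus bias, then a step (threshold) activation --
    # this is a perceptron, the simplest neural unit.
    weighted_sum = is_sunny * weight_sunny + is_warm * weight_warm + bias
    return 1 if weighted_sum > 0 else 0  # 1 = "Go Outside", 0 = "Stay Inside"

# With these weights the neuron only says "go outside" when it is
# both sunny AND warm -- effectively an AND gate:
for sunny in (0, 1):
    for warm in (0, 1):
        print(sunny, warm, "->", decision_neuron(sunny, warm))
```

Try changing the bias to -0.5: the neuron then fires if either input is 1, turning the AND gate into an OR gate, just like moving the `Bias` slider in Scratch.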

Expanding Your Knowledge: Next Steps in Python

While our simple Python example is a great start, real-world neural networks use more advanced libraries and techniques. Here are some directions to explore after mastering the basics:

TensorFlow and Keras

These are powerful, widely-used libraries for building and training neural networks. Keras, in particular, provides a high-level API that makes building complex models much easier than with raw NumPy. You define layers, activation functions, and compile your model with just a few lines of code.


# Example using Keras (simplified)
from tensorflow import keras
from tensorflow.keras import layers

# Define the model
model = keras.Sequential([
    layers.Dense(4, activation='relu', input_shape=(2,)),  # Hidden layer: 4 neurons, ReLU
    layers.Dense(1, activation='sigmoid')                  # Output layer: 1 neuron, Sigmoid
])

# Compile the model
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# Train the model (using X and y from our XOR example)
model.fit(X, y, epochs=1000, verbose=0)

# Make predictions
predictions = model.predict(X)
print("Keras predictions:\n", predictions.round())

Different Activation Functions

Experiment with ReLU, Leaky ReLU, Tanh, and other activation functions. Each has its strengths and weaknesses depending on the problem.

More Layers and Neurons

Build deeper networks (more hidden layers) and wider networks (more neurons per layer) to tackle more complex problems. Be aware of overfitting, where the network learns the training data too well and performs poorly on new, unseen data.

Loss Functions and Optimizers

Explore different loss functions (e.g., Mean Squared Error, Categorical Crossentropy) and optimizers (e.g., Adam, SGD with momentum). These significantly impact how well and how fast your network learns.
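To build intuition for what a loss function measures, here are minimal NumPy sketches of Mean Squared Error and binary cross-entropy, applied to some example predictions:

```python
import numpy as np

def mse(y_true, y_pred):
    # Mean Squared Error: penalizes large errors quadratically
    return np.mean((y_true - y_pred) ** 2)

def binary_crossentropy(y_true, y_pred, eps=1e-12):
    # Clip predictions away from 0 and 1 to avoid log(0)
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

y_true = np.array([0, 1, 1, 0])
y_pred = np.array([0.1, 0.9, 0.8, 0.2])
print(round(mse(y_true, y_pred), 4))                  # 0.025
print(round(binary_crossentropy(y_true, y_pred), 4))  # 0.1643
```

Cross-entropy punishes confident wrong answers far more harshly than MSE does, which is why it is the standard choice for classification tasks like XOR.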

Real-World Datasets

Move beyond XOR. Work with datasets from scikit-learn (e.g., Iris, Wine) or publicly available datasets like MNIST for image classification. This is where the power of creating neural networks in Python truly shines.

Conclusion

You’ve taken a significant first step into the world of AI by understanding and implementing a neural network. From the foundational concepts of neurons, weights, and activation functions, to a practical Python implementation for the XOR problem, and even a visual simulation in Scratch, you now have a solid grasp. Creating neural networks in Python and Scratch is not just about coding; it’s about building intuition and understanding how these intelligent systems learn.

Remember, AI is an iterative process. Keep experimenting, keep learning, and don’t be afraid to break things and rebuild them. The skills you’ve gained here are transferable and will serve as a strong foundation for more advanced machine learning projects. Happy coding!

FAQ

Q1: Why is a hidden layer necessary for problems like XOR?

A1: The XOR problem is “non-linearly separable.” This means you cannot draw a single straight line to separate the inputs that result in an output of 0 from those that result in an output of 1. A single-layer perceptron (without a hidden layer) can only learn linearly separable patterns. A hidden layer allows the neural network to learn more complex, non-linear relationships by transforming the input data into a new representation that can then be linearly separated by the output layer.
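A brute-force check (a toy sketch, not a proof) makes the non-separability concrete: searching a grid of weights and biases for a single threshold neuron that reproduces XOR finds nothing.

```python
import itertools

xor = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}
grid = [n / 10 for n in range(-20, 21)]  # weights and bias in [-2, 2]

# Try every (w1, w2, b) combination on the grid with a step activation
found = any(
    all((1 if w1 * a + w2 * c + b > 0 else 0) == out
        for (a, c), out in xor.items())
    for w1, w2, b in itertools.product(grid, repeat=3)
)
print("Linear solution found:", found)  # False -- XOR needs a hidden layer
```

Run the same search for AND or OR (which are linearly separable) and it succeeds immediately.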

Q2: What is the main difference between using NumPy and Keras for creating neural networks?

A2: NumPy provides the fundamental tools for numerical computation in Python, allowing you to implement neural networks from scratch by manually handling matrix multiplications, activation functions, and backpropagation. This gives you deep insight into the underlying mechanics. Keras (built on TensorFlow) is a high-level API that abstracts away much of this complexity. It provides pre-built layers, optimizers, and loss functions, making it much faster and easier to build, train, and experiment with complex neural network architectures, especially for larger datasets and more sophisticated models. While NumPy is great for learning the basics, Keras is preferred for practical, real-world applications.

Q3: Can I build a full, complex neural network directly in Scratch?

A3: No, Scratch is not designed for building full, complex neural networks. It lacks the computational efficiency, mathematical libraries (like NumPy), and advanced features required for training large models with many layers and parameters. However, Scratch is an excellent tool for visualizing and understanding the fundamental concepts of neural networks, such as how individual neurons work, how inputs are weighted, and how activation functions make decisions. It’s a fantastic educational tool for beginners to grasp the intuition behind neural networks before exploring the code.

Q4: How important is the learning rate, and what happens if it’s too high or too low?

A4: The learning rate is crucial! It determines the step size at which the neural network’s weights are updated during training.

  • Too High: If the learning rate is too high, the network might “overshoot” the optimal weights, causing the loss to oscillate wildly or even diverge (increase instead of decrease). The network might never converge to a good solution.
  • Too Low: If the learning rate is too low, the network will learn very slowly. Training will take a very long time, and it might get stuck in a “local minimum” – a suboptimal solution – before reaching the global optimum.

Finding an appropriate learning rate is often a process of trial and error, or using adaptive optimizers like Adam, which adjust the learning rate automatically during training.
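You can see both failure modes with plain gradient descent on the one-dimensional function f(x) = x², whose gradient is 2x:

```python
def descend(learning_rate, start=5.0, steps=20):
    # Plain gradient descent on f(x) = x^2 (gradient: 2x)
    x = start
    for _ in range(steps):
        x -= learning_rate * 2 * x
    return x

print(abs(descend(0.1)))    # small: converges toward the minimum at 0
print(abs(descend(0.001)))  # too low: barely moves in 20 steps
print(abs(descend(1.1)))    # too high: overshoots and diverges
```

With a learning rate of 1.1, each update multiplies x by -1.2, so the iterates oscillate with growing magnitude, exactly the "overshoot and diverge" behavior described above.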

🕒 Originally published: March 15, 2026

Written by Jake Chen

Deep tech researcher specializing in LLM architectures, agent reasoning, and autonomous systems. MS in Computer Science.
