👋 Introduction
Ever Wondered What’s Behind ChatGPT, DALL-E, or Other Generative AI and Machine Learning Models Used Today?
All of these cutting-edge technologies are built on neural networks: deep learning rests on them, and it has significantly changed the way machine learning works, introducing approaches that traditional algorithms can't handle. In this blog, we will explore neural networks: what they are, what inspired them, how they operate, their types, and their applications in daily life. So let’s get the neurons firing!
🤔 What is a Neural Network?
At its core, a neural network is a crude imitation of how our brains function. While it does not resemble the brain very closely, it attempts to mimic selected aspects of how we learn and process information. When your brain recognizes something as a cat rather than a dog, it typically contrasts features such as body shape, ear size, and fur texture, often without your being aware of it. In much the same way, a neural network takes input data, breaks it down into smaller pieces, and makes a decision (or produces an output).
🤖 How do Neural Networks Actually Work?
Before understanding how neural networks work, let’s first look at their basic structure. Neural networks are built using multiple layers, each consisting of numerous nodes. Layers are categorized into three types:
Input Layer: This layer receives the raw input data.
Hidden Layers: These are the processing layers where all the mathematics occurs.
Output Layer: This layer produces the prediction or final output.
Each node in one layer is connected to every node in the next layer, and each of these connections carries a weight that indicates its strength. Think of it this way: just as you become a faster typist with practice, the neural connections for typing get stronger over time. The number of neurons in each layer and the number of layers can be adjusted according to the use case.
🧠 Single Neuron Functionality
In a simple neuron, let’s say x1, x2, and x3 are the input data, and the connections between the nodes carry the weights w1, w2, and w3. The intermediate node value H is the sum of the products of the inputs and the weights, plus an additional bias B, which introduces an adjustable offset:

H = w1·x1 + w2·x2 + w3·x3 + B

The output Y is a function of H, written Y = f(H), where f is known as an activation function.
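To make this concrete, here is a minimal sketch of a single neuron in Python using NumPy. The input values, weights, and bias below are arbitrary illustrative numbers, and the sigmoid used as f is just one possible choice of activation function (more on those shortly).

```python
import numpy as np

def sigmoid(h):
    """One possible activation function f: squashes any value into (0, 1)."""
    return 1 / (1 + np.exp(-h))

# Arbitrary example values: three inputs, three weights, and a bias
x = np.array([0.5, -1.2, 3.0])   # inputs x1, x2, x3
w = np.array([0.4, 0.7, -0.2])   # weights w1, w2, w3
b = 0.1                          # bias B

h = np.dot(w, x) + b             # H = w1*x1 + w2*x2 + w3*x3 + B
y = sigmoid(h)                   # Y = f(H)
print(h, y)
```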
🧠 Single Neurons to Multiple Neurons
Now that we understand how a single neuron works, let’s see how information flows through an entire network, layer by layer. The input layer passes the data to the first hidden layer, where weights, biases, and activation functions are applied; the result is passed on to the next layer, and the process repeats until the final output is produced. That final output can be a single neuron or a whole layer of neurons, depending on the network’s design.
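To sketch how this layer-by-layer flow might look in code, here is a tiny network with one hidden layer, written in NumPy. The layer sizes, the random weights, and the sigmoid activation are all illustrative assumptions rather than anything prescribed by a particular architecture.

```python
import numpy as np

def sigmoid(h):
    return 1 / (1 + np.exp(-h))

rng = np.random.default_rng(0)

# A tiny network: 3 inputs -> 4 hidden neurons -> 1 output neuron
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)   # hidden-layer weights and biases
W2, b2 = rng.normal(size=(1, 4)), np.zeros(1)   # output-layer weights and biases

x = np.array([0.5, -1.2, 3.0])                  # input layer

hidden = sigmoid(W1 @ x + b1)                   # weighted sums plus bias, then activation
output = sigmoid(W2 @ hidden + b2)              # the output layer repeats the same step
print(output)
```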
So we calculated H = w1·x1 + w2·x2 + w3·x3 + B and passed it through an activation function. This raises the question: what is an activation function, and why do we need one? Can’t we just use a sine, cosine, or a plain linear function?
⚙️ Activation Functions
Activation functions introduce non-linearity to neural networks, enabling them to learn complex patterns. Functions like ReLU and Sigmoid are preferred for their efficient gradients and suitability for specific tasks, such as binary classification. Although sine and cosine are nonlinear, they are rarely useful here: their periodic behavior means very different inputs can map to the same output, which hinders training. Linear functions fail for a different reason: stacking layers of purely linear functions collapses into a single linear transformation, so the network can never learn anything more complex than a linear relationship.
There are many known activation functions, such as Sigmoid, tanh, ReLU, Leaky ReLU, Maxout, and ELU. Their mathematical definitions vary, but the most commonly used are the Sigmoid, ReLU, and hyperbolic tangent (tanh) functions.
ReLU: ReLU stands for Rectified Linear Unit. It behaves as the identity function (y = x) when x ≥ 0 and outputs 0 when x < 0. This is a widely used activation function because it is nonlinear yet straightforward to compute.
Sigmoid : The Sigmoid function is bounded between 0 and 1. It approaches 0 for very negative values and 1 for very positive values, effectively “squishing” extreme values into this range. This is useful in neural networks when we want to ensure values aren’t excessively high or low, especially in the last layer when we need binary outputs (0 or 1).
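As a quick illustration of the two functions just described, here is how ReLU and Sigmoid might be written in NumPy; the sample inputs are arbitrary.

```python
import numpy as np

def relu(x):
    """Identity for x >= 0, zero for x < 0."""
    return np.maximum(0, x)

def sigmoid(x):
    """Squashes any real value into the range (0, 1)."""
    return 1 / (1 + np.exp(-x))

x = np.array([-3.0, -0.5, 0.0, 2.0])
print(relu(x))      # [0. 0. 0. 2.]
print(sigmoid(x))   # values strictly between 0 and 1
```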
🔍 Types of Neural Networks
Now that we’ve learned about neural networks and how they work, do we use the same neural network for all tasks? For instance, if I wanted to build my own large language model like ChatGPT and an image classifier that identifies dog breeds, would they use the same neural networks?
The answer is no; you can’t use a single type of neural network for all tasks.
The type of neural network architecture you choose depends on several factors, including:
Task specialization
Data characteristics
Model complexity
Optimized performance
Computational efficiency
There are various types of neural networks, such as:
1) Feedforward Neural Networks: This is the simplest form of artificial neural network, composed of an input layer, hidden layers, and an output layer. These networks are useful for binary classification or regression problems, such as identifying whether a person is male or female based on height and weight, or predicting house prices.
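For instance, a small feedforward classifier for the height-and-weight example could be sketched as follows. The use of PyTorch, the hidden-layer size, and the two input features are assumptions made purely for illustration; the untrained model's output is meaningless until it is trained.

```python
import torch
import torch.nn as nn

# A minimal feedforward network: 2 inputs (height, weight) -> 8 hidden units -> 1 output
model = nn.Sequential(
    nn.Linear(2, 8),
    nn.ReLU(),
    nn.Linear(8, 1),
    nn.Sigmoid(),   # squashes the output to (0, 1) for binary classification
)

sample = torch.tensor([[170.0, 65.0]])  # one example: height (cm) and weight (kg)
print(model(sample))                    # untrained, so the prediction is essentially random
```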
2) Convolutional Neural Networks (CNNs): This type of neural network decomposes data into smaller parts. A CNN slides small filters over an image, processing one patch at a time; the patterns detected in these patches are combined across layers, allowing the network to extract features and recognize larger structures. This is what makes CNNs effective at image classification: deciding whether a photo shows a fiercely barking German Shepherd or a cute little Chihuahua.
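As a rough sketch of this idea, again assuming PyTorch, a tiny image classifier might look like the following; the layer sizes, the 64x64 input resolution, and the two classes are arbitrary choices for illustration.

```python
import torch
import torch.nn as nn

# A tiny CNN for 64x64 RGB images and 2 classes (say, German Shepherd vs. Chihuahua)
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),  # slide 3x3 filters over the image
    nn.ReLU(),
    nn.MaxPool2d(2),                             # downsample: 64x64 -> 32x32
    nn.Conv2d(16, 32, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2),                             # 32x32 -> 16x16
    nn.Flatten(),
    nn.Linear(32 * 16 * 16, 2),                  # scores for the two classes
)

image = torch.randn(1, 3, 64, 64)  # a random stand-in for a real image
print(model(image).shape)          # torch.Size([1, 2])
```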
3) Recurrent Neural Networks (RNNs): RNNs are a special class of neural networks designed to handle sequential data, such as time series or text. Unlike traditional feedforward neural networks, RNNs have connections that loop back on themselves, allowing them to maintain a hidden state that can capture information about previous inputs in the sequence. This feedback loop enables the model to remember previous inputs and their context, making RNNs particularly suited for tasks where context matters. RNNs are commonly applied in text-to-text translation, sentiment analysis, text generation, and chatbot systems.
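A hedged sketch of the same idea in PyTorch (the dimensions and the random dummy sequence are made up) shows how the hidden state carries context from one time step to the next:

```python
import torch
import torch.nn as nn

# An RNN that reads a sequence of 10-dimensional vectors and keeps a 20-dimensional hidden state
rnn = nn.RNN(input_size=10, hidden_size=20, batch_first=True)

sequence = torch.randn(1, 5, 10)        # a batch of 1 sequence with 5 time steps
outputs, final_hidden = rnn(sequence)   # the hidden state is updated at every step

print(outputs.shape)       # torch.Size([1, 5, 20]): one output per time step
print(final_hidden.shape)  # torch.Size([1, 1, 20]): the last hidden state
```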
4) Transformers: Transformer models mark a great leap forward in machine learning. Introduced in the 2017 paper "Attention Is All You Need", they have reshaped the natural language processing (NLP) landscape and are now making inroads into computer vision as well. Succinctly put, unlike their recurrent neural network (RNN) counterparts, transformers do not process data in a fixed sequential order; they consider all elements of a sequence simultaneously through attention mechanisms, which makes training more efficient and improves performance on large datasets. This advance set off the evolution of large language models like ChatGPT, Llama, and BERT.
We’ve explained Transformers in our Instagram post: View Post
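To give a feel for the attention mechanism mentioned above, here is a minimal sketch of scaled dot-product attention in NumPy. The query, key, and value matrices are random placeholders; real transformers wrap this operation in multi-head attention with learned projections.

```python
import numpy as np

def softmax(scores):
    exp = np.exp(scores - scores.max(axis=-1, keepdims=True))
    return exp / exp.sum(axis=-1, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Every position attends to every other position at once (no sequential loop)."""
    d_k = Q.shape[-1]
    weights = softmax(Q @ K.T / np.sqrt(d_k))  # how strongly each position attends to the others
    return weights @ V

rng = np.random.default_rng(0)
seq_len, d_k = 4, 8                  # a 4-token sequence of 8-dimensional vectors
Q, K, V = (rng.normal(size=(seq_len, d_k)) for _ in range(3))
print(scaled_dot_product_attention(Q, K, V).shape)  # (4, 8)
```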
⚠️ Limitations of Neural Networks
Throughout this blog, we’ve explored how great neural networks are, but are there any associated problems? Although neural networks can process vast amounts of data and predict outcomes beyond human comprehension, certain limitations still exist:
Black Box: Neural networks are often referred to as a "black box," meaning we don’t always know why a neural network produces a certain output. With many layers and nodes, it becomes challenging to pinpoint the exact reason behind a prediction. Since we can’t see inside the black box, this lack of transparency can sometimes lead to trust issues. There’s also the risk of bias and unfairness; biased outputs may go unnoticed due to our inability to examine the inner workings of the model.
Shwetangshu Biswas’ blog on the black box problem: Click Here

Overfitting: Overfitting refers to the situation where a neural network is trained too closely on its training data and thus performs poorly on unseen data. It becomes overly familiar with the training set and fails to generalize to real-world scenarios, so its predictions are no longer reliable. For example, if a housing-price prediction model is trained only on data from a specific neighborhood, it may learn patterns unique to that area and consequently struggle to predict prices in other neighborhoods, a classic sign of overfitting.
Large Data Requirements: Because neural networks work by discovering patterns in data, they need large quantities of high-quality data for training. The data must be clean and consistent; if the model is fed noisy, low-quality data, it cannot discover meaningful patterns and will produce poor results. For example, if the dataset for training a facial recognition system consists of blurry or poorly lit images, the neural network may misidentify even clear images of individuals, showing how low-quality data leads to poor predictions.
Computational Expense: Training neural networks has a rather high computational cost compared to that of traditional machine learning models, due to the complex architecture of networks with multiple layers and many neurons in each layer. The types of features and the volume of data used for training also contribute to the computational cost. All this could lead to increased training times, energy consumption, and overall cost of operation.
🎯 Conclusion
Neural networks mark a major leap forward in machine learning, allowing models to grasp complex patterns and tackle various tasks. Ranging from simple structures like feedforward networks to advanced designs like transformers, these models are changing how we approach generative AI, natural language processing, and image recognition. Despite their power, neural networks face challenges such as overfitting, high computational demands, and limited interpretability. Recognizing these strengths and weaknesses is essential for effectively using neural networks in practical applications. As technology progresses, neural networks will continue to lead the way, fostering innovation and transforming various industries. Let’s keep delving into the intriguing realm of AI and discover what the future holds!