How is AI Created?

Creating an AI does not mean “programming all the answers”; it means designing a system capable of learning from data.
Most current AI systems work through something called an artificial neural network.
What is a neural network?
A neural network is a mathematical model inspired, in a very simplified way, by the human brain.
It is made up of artificial neurons (small computational units) organized in layers and connected to each other. Each connection has an associated number called a weight, which indicates the importance of that signal.
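As an illustrative sketch (the function name and the numbers are invented for this example), a single artificial neuron can be written as a weighted sum of its inputs passed through an activation function:

```python
import math

def neuron(inputs, weights, bias):
    # Weighted sum: each input signal is scaled by its weight,
    # which encodes the importance of that signal
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    # Sigmoid activation squashes the result into the range (0, 1)
    return 1 / (1 + math.exp(-z))

# Two inputs, two weights, one bias
print(neuron([1.0, 0.5], [0.8, -0.2], 0.1))  # about 0.69
```

Changing the weights changes the output; those weights are the only thing learning ever modifies.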
Learning consists precisely in adjusting those weights.
When a neural network is created, the weights are initialized with small random values. The network therefore begins “knowing nothing” and learns by automatically adjusting those values through a mathematical algorithm called backpropagation, which propagates the error backwards through the network.
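That random starting point can be sketched like this (the number of weights and the value range are illustrative):

```python
import random

random.seed(0)  # fixed seed so the example is reproducible

# Four weights initialized with small random values:
# at this point the network "knows nothing"
weights = [random.uniform(-0.1, 0.1) for _ in range(4)]
print(weights)
```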
How does a neural network learn?
The basic process is the following:
- The network receives an input (for example, an image).
- It produces an output (for example: “dog”).
- This output is compared with the correct answer (for example: “cat”).
- The error is calculated.
- That error is propagated backwards through the network.
- The weights are slightly adjusted to reduce that error.
This process, repeated millions of times, is called training.
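The steps above can be sketched as a minimal training loop, reduced here to a single weight and squared error (all numbers are invented for the illustration):

```python
w = 0.0                 # the weight starts "knowing nothing"
lr = 0.1                # learning rate: how much to adjust each step
x, target = 2.0, 6.0    # we want the network to learn w * 2 = 6, i.e. w = 3

for step in range(100):
    output = w * x               # forward pass: produce an output
    error = output - target      # compare with the correct answer
    grad = error * x             # propagate the error back to the weight
    w -= lr * grad               # slightly adjust the weight to reduce the error

print(round(w, 3))  # converges to 3.0
```

Real networks do exactly this, but with billions of weights adjusted at once instead of one.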
Types of learning
There are different types of learning:
Supervised learning: The model learns from previously labeled examples. It proposes an answer and compares it with the correct one; if it fails, it adjusts its parameters.
Reinforcement learning: There is no direct correct label. The model learns through trial and error: it performs an action, receives a reward or penalty, and adjusts its strategy according to the result.
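Reinforcement learning can be sketched with a toy “two-armed bandit”: the agent tries actions, receives rewards, and gradually favors the action that pays off more often (the reward probabilities here are invented for the example):

```python
import random

random.seed(1)  # reproducible trial and error

values = [0.0, 0.0]        # the agent's estimate of each action's value
counts = [0, 0]
reward_prob = [0.2, 0.8]   # hidden reward probabilities (assumption)

for _ in range(1000):
    # Mostly exploit the best current estimate, sometimes explore at random
    if random.random() < 0.1:
        action = random.randrange(2)
    else:
        action = values.index(max(values))
    # Reward or penalty comes from the (hidden) environment
    reward = 1.0 if random.random() < reward_prob[action] else 0.0
    counts[action] += 1
    # Adjust the strategy: move the estimate toward the observed reward
    values[action] += (reward - values[action]) / counts[action]

print(values)  # the estimate for action 1 ends up clearly higher
```

Note that no one ever tells the agent which action is “correct”; it only ever sees rewards.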
How are language models trained?
In models such as those developed by OpenAI, training usually has several phases:
- First, they learn to predict the next word in millions (or billions) of texts.
- Then they are adjusted using supervised learning.
- Finally, they are refined using reinforcement learning based on human evaluations, with the goal of improving the quality, usefulness, and safety of the responses.
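The first phase, predicting the next word, can be imitated with a toy bigram counter (real models use neural networks rather than frequency tables, and the corpus here is invented):

```python
from collections import Counter, defaultdict

corpus = "the cat sat on the mat the cat ate".split()

# Count which word follows which in the training text
next_word = defaultdict(Counter)
for prev, cur in zip(corpus, corpus[1:]):
    next_word[prev][cur] += 1

# "Prediction" = the word most often seen after the prompt word
print(next_word["the"].most_common(1)[0][0])  # "cat"
```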
Why do they need so much computing power?
Modern networks can have millions or even billions of parameters (weights). Each training step involves an enormous number of multiplications and additions, performed on GPUs (graphics processing units) housed in large data centers.
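Some back-of-the-envelope arithmetic (the model size is an illustrative assumption) shows the scale involved:

```python
params = 7_000_000_000   # a 7-billion-parameter model (assumption)
bytes_per_param = 4      # one 32-bit float per weight

# Memory just to *store* the weights, before any training computation
print(params * bytes_per_param / 1e9, "GB")  # 28.0 GB
```

And training must read and update every one of those weights many times over.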
How do we know it has learned well?
The model is tested with new data that it did not see during training. If it performs well on that data, we say it has generalized; otherwise it has merely memorized the training examples.
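A sketch of that test, using a “model” that is nothing but a lookup table of memorized training examples (the data and the helper function are invented for the illustration):

```python
# Toy labeled data: is the number odd? (label = x % 2)
data = [(x, x % 2) for x in range(100)]
train, test = data[:80], data[80:]   # the test set is never shown in training

# A "model" that purely memorized the training set
memory = dict(train)

def predict(x):
    return memory.get(x, 0)  # on unseen inputs it can only guess

accuracy = sum(predict(x) == y for x, y in test) / len(test)
print(accuracy)  # only 0.5: chance level, because it memorized rather than generalized
```

A model that had actually learned the underlying rule would score well on the held-out set too.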
