Are we on the same page?
Published November 03, 2025

Generated with Adobe Firefly AI
Introduction
The question of meaning and its connection to language has been with us for much longer than large language models have.
One of the most influential models in this regard comes from the Swiss linguist Ferdinand de Saussure (1857-1913). His work laid the foundation for modern linguistic theory and structuralism. Instead of focusing on the diachronic (historical) study of language, as was customary at the time, he studied language synchronically, as a self-contained system at a given point in time. He illustrates this approach with the metaphor of a chess game: what matters is the position on the board itself, not the moves that led to it. The metaphor also illustrates the distinction between langue (language) and parole (speech).
Langue refers to the rules, grammar, vocabulary, and syntax: the parts of language that are shared by all speakers and form the basis of communication.
A single move on the chessboard would correspond to parole, i.e., the individual and transient use of language in writing or speaking. According to de Saussure, language can thus be divided into a superordinate system (langue) and its concrete application (parole).

The linguistic sign
One of the most important concepts for understanding language is the separation of the linguistic sign into the signifier and the signified. These are two inseparable psychological components. De Saussure compares the sign to a sheet of paper: the two sides cannot be separated without destroying the sheet, and one side cannot be cut without cutting the other.
The signifier refers to the internal sound image of a word, while the signified refers to the associated idea. This can be clearly illustrated using the Spanish word “árbol” (tree). When you hear it for the first time, the sounds initially seem like mere noises. However, repetition creates a psychological trace in the memory—a sound image that remains accessible independently of the current speech act. This sound image is the signifier. Over time, it becomes linked to the general idea of “tree,” which does not stand for a specific tree, but for the abstract concept. This concept represents the signified.
Similar to signifiers, large language models use tokens: the units into which an input text is broken down. These units are not necessarily words; they can also be parts of words such as prefixes or suffixes, punctuation marks, or numbers. Each unit is mapped to a unique, fixed numerical ID. Just as the connection between words and meanings is arbitrary, the numerical IDs of tokens are not directly linked to semantic meaning. These IDs alone are not sufficient for the language model to function. It also requires so-called embeddings, which are the analogue of the signified.
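To make the step from text to token IDs concrete, here is a minimal sketch. It assumes a tiny hand-picked vocabulary and a greedy longest-match split; both are illustrative assumptions, and production tokenizers learn their vocabulary from data and use more elaborate algorithms.

```python
# Toy vocabulary: hypothetical and hand-picked for illustration only.
VOCAB = ["un", "break", "able", "tree", "s", ".", " "]

# Each token gets a fixed ID. The number itself carries no meaning,
# it is just the token's position in the vocabulary list.
TOKEN_TO_ID = {token: index for index, token in enumerate(VOCAB)}


def tokenize(text):
    """Split text into vocabulary units, always taking the longest match."""
    tokens = []
    pos = 0
    while pos < len(text):
        # Try the longest remaining substring first, then shorter ones.
        for end in range(len(text), pos, -1):
            piece = text[pos:end]
            if piece in TOKEN_TO_ID:
                tokens.append(piece)
                pos = end
                break
        else:
            raise ValueError(f"no token covers {text[pos]!r}")
    return tokens


def encode(text):
    """Map text to the list of token IDs."""
    return [TOKEN_TO_ID[token] for token in tokenize(text)]


tokens = tokenize("unbreakable trees.")
ids = encode("unbreakable trees.")
print(tokens)  # ['un', 'break', 'able', ' ', 'tree', 's', '.']
print(ids)     # [0, 1, 2, 6, 3, 4, 5]
```

Reordering VOCAB would change every ID without changing what any token stands for, which is the sense in which the IDs are arbitrary.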
Embeddings are no longer mere numerical identifiers, but vectors in a high-dimensional space. These vectors are typically initialized randomly before training begins. During training, terms that appear in similar contexts are moved closer together. As a result, in a trained network, semantically related terms are represented by vectors that lie close to each other.
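The following sketch caricatures this process under strong simplifying assumptions: four words, eight dimensions, and a "training" loop that simply pulls words from a hypothetical list of co-occurring pairs toward each other. Real models adjust embeddings through gradient-based optimization of a prediction objective, so this only illustrates the direction of the effect, not the actual method.

```python
import math
import random

random.seed(0)  # fixed seed so the sketch is reproducible

DIM = 8
WORDS = ["tree", "leaf", "car", "road"]
# Hypothetical pairs standing in for "appears in the same context".
CONTEXT_PAIRS = [("tree", "leaf"), ("car", "road")]


def cosine(u, v):
    """Cosine similarity: 1.0 means the vectors point in the same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Step 1: random initialization, no semantic structure yet.
emb = {w: [random.uniform(-1.0, 1.0) for _ in range(DIM)] for w in WORDS}
sim_before = cosine(emb["tree"], emb["leaf"])

# Step 2: toy "training" that moves co-occurring words toward each other.
LEARNING_RATE = 0.1
for _ in range(50):
    for a, b in CONTEXT_PAIRS:
        va, vb = emb[a], emb[b]
        emb[a] = [x + LEARNING_RATE * (y - x) for x, y in zip(va, vb)]
        emb[b] = [y + LEARNING_RATE * (x - y) for x, y in zip(va, vb)]

sim_after = cosine(emb["tree"], emb["leaf"])
sim_unrelated = cosine(emb["tree"], emb["car"])

print(f"tree/leaf before training: {sim_before:.3f}")
print(f"tree/leaf after training:  {sim_after:.3f}")
print(f"tree/car after training:   {sim_unrelated:.3f}")
```

After the loop, "tree" and "leaf" end up almost identical in direction, while "tree" and "car" keep whatever relation their random starting points gave them.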
Meaning through relation
This network of vectors establishes a clear connection back to de Saussure. To grasp the meaning of a word, signification (the connection between signifier and signified) is not sufficient on its own. Terms must be related to each other in order to have meaning. Accordingly, the valeur (value) of a linguistic sign results from its position in the overall system.
This value can be described in two ways:
1. Conceptually: The value of a sign is determined by its opposition to all other signs in the system. The meaning of “red” comes not only from the idea of the color red, but also from the difference between it and the sum of what it is not: red is not blue, not green, not yellow... This difference gives “red” its unique value in the system.
2. Materially: This value is also determined by the sound image, the material part of the linguistic sign. The sound image “D-o-g” only has value because it differs from the sound image “C-a-t”.
Each individual sound is meaningless on its own. Only when sounds are linked into a unique chain that differs from the other sound images in the system can the sound image carry a conceptual value.
The value of a linguistic sign is therefore always relational; it results from its position in a system of differences.
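The same relational idea can be illustrated with embedding vectors. The sketch below uses three hand-picked two-dimensional vectors (hypothetical values, not taken from any model) and rotates the whole space. Every individual vector changes, yet all pairwise similarities stay the same, which suggests that what carries the value is the pattern of relations rather than any absolute position.

```python
import math
from itertools import combinations

# Hypothetical 2-D embeddings, chosen by hand for illustration.
EMB = {"red": (1.0, 0.2), "blue": (-0.3, 1.0), "green": (0.4, 0.9)}


def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rotate(v, angle):
    """Rotate a 2-D vector counterclockwise by `angle` radians."""
    x, y = v
    c, s = math.cos(angle), math.sin(angle)
    return (x * c - y * s, x * s + y * c)


def pairwise(emb):
    """Similarity of every pair of words: the 'system of differences'."""
    return {
        (a, b): cosine(emb[a], emb[b])
        for a, b in combinations(sorted(emb), 2)
    }


# Rotate the entire space by one radian.
ROTATED = {word: rotate(vec, 1.0) for word, vec in EMB.items()}

relations_before = pairwise(EMB)
relations_after = pairwise(ROTATED)

for pair in relations_before:
    print(pair, round(relations_before[pair], 6), round(relations_after[pair], 6))
```

The printed pairs show matching similarities before and after the rotation, up to floating-point rounding, even though no vector kept its original coordinates.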
So, are we on the same page?
In summary, a language model processes linguistic signs in a manner that resembles human processing in important respects. De Saussure's concept of meaning through difference is clearly evident in how language models work.
The link between signified and signifier is also reflected in language models: a specific token triggers a specific activation pattern in the neural network of the language model, just as a specific sound pattern triggers a specific psychological response in humans.
However, one important detail must not be overlooked here. A language model is trained on huge amounts of actual parole and learns statistical correlations from it. For humans, by contrast, language is not characterized only by the relationships between linguistic signs: every linguistic sign is also connected to a concept grounded in experience.
For humans, the term “tree” is not only dependent on the connection between the terms “tree,” “branch,” “leaf,” “climb,” etc., but is also something that can be experienced.
In conclusion, humans and language models understand the term through similar mechanisms, but not in the same way.