The Abstract Meaning Machine: From Philosophy to Language Models

A thought experiment

Imagine a scientist named Johnathan who is said to be a magician with words. Johnathan’s secret is that he doesn’t actually process words as strings of letters; instead, he takes every word in a sentence and, using a vast mental table, matches that word to its “meaning entity.” A meaning entity is an abstract object that purely represents the semantic weight of the word.

When Johnathan reads a sentence, he first converts every discrete word into its corresponding meaning entity. Then, he uses a specific set of methods to combine, weigh, and mix those entities together to create a unified “sentence meaning entity.”

Johnathan also uses these meaning entities as a tool for comparison. He compares various utterances by looking at their final entities to figure out how functionally similar two sentences are. Because he possesses the full landscape of these entities, he can find words that perfectly suit a given meaning entity, allowing him to build machines that take an entity and use it to generate highly probable continuations to any sentence.

LLMs!

Johnathan isn’t doing magic; he is running a simplified version of the architecture behind a modern Large Language Model. Each step of the thought experiment has a counterpart in the mechanics of a transformer:

Mapping to Entities: This is the Embedding Layer. Words (tokens) are strictly mapped to continuous, high-dimensional vectors (the meaning entities).
Combining and Mixing: This is Multi-Head Self-Attention. The model compares every entity to every other entity in the sequence to update their values based on context. “Apple” next to “Mac” becomes a different entity than “Apple” next to “Tree”.
Sentence Meaning Entity: This is the contextualized vector representation produced by the final layer of the network.
Generating Continuations: This is the Unembedding / Language Modeling Head. The final mixed entity is projected back into the vocabulary space to output a probability distribution over all possible next tokens.
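
The four steps above can be sketched in a few lines of NumPy. This is a toy illustration rather than a real model: the vocabulary, the dimension `d_model`, and all weight matrices are made-up assumptions with random values, and it uses a single attention head with no causal mask, layer normalization, feed-forward blocks, or positional encoding.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy vocabulary and model width (assumptions for illustration).
vocab = ["the", "apple", "mac", "tree", "fell"]
d_model = 8

# Random, untrained weights; a real model learns these.
W_embed = rng.normal(size=(len(vocab), d_model))    # embedding table
W_q = rng.normal(size=(d_model, d_model))           # query projection
W_k = rng.normal(size=(d_model, d_model))           # key projection
W_v = rng.normal(size=(d_model, d_model))           # value projection
W_unembed = rng.normal(size=(d_model, len(vocab)))  # language modeling head


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def embed(tokens):
    """Mapping to entities: look up one vector per token."""
    ids = [vocab.index(t) for t in tokens]
    return W_embed[ids]


def self_attention(E):
    """Combining and mixing: every entity attends to every entity."""
    Q, K, V = E @ W_q, E @ W_k, E @ W_v
    weights = softmax(Q @ K.T / np.sqrt(d_model))  # shape (N, N)
    return E + weights @ V                         # residual update


def contextualize(tokens):
    """Contextualized entities after one mixing step."""
    return self_attention(embed(tokens))


def next_token_distribution(tokens):
    """Generating continuations: project the last entity onto the vocabulary."""
    h_last = contextualize(tokens)[-1]
    return softmax(h_last @ W_unembed)


# "apple" becomes a different entity depending on its neighbor.
apple_near_mac = contextualize(["apple", "mac"])[0]
apple_near_tree = contextualize(["apple", "tree"])[0]
print(np.allclose(apple_near_mac, apple_near_tree))

probs = next_token_distribution(["the", "apple", "fell"])
print(dict(zip(vocab, probs.round(3))))
```

Because the weights are random, the printed probabilities carry no meaning. The point is only the shape of the computation: lookup, pairwise mixing, projection back to the vocabulary.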

The Philosophy

This idea—mapping discrete words to abstract “entities”—is not an invention of deep learning. It is one of the longest-standing approaches to linguistics and epistemology.

John Locke advanced the ideational theory of meaning, arguing that words are strictly outward marks of internal, mental entities (ideas). To Locke, communication was the act of translating one’s mental entities into words, so the listener could translate those words back into corresponding mental entities in their own mind.

Later, Bertrand Russell and the logical positivists stripped away the psychology in favor of pure logic. In Russell’s logical atomism, words correspond to logical entities or “atoms” of the real world. A sentence is a logical construct of these entities.

This line of thought leads to truth-conditional semantics (for example, Montague grammar). In this framework, the “meaning” of a sentence is the precise condition under which it is true. Words are mapped to entities (often mathematical functions), and the grammar of the sentence is the algorithm that composes these functions together to yield a final truth value relative to a model of the world. Loosely speaking, modern LLMs replace “functions that yield truth values” with “vectors that yield probability distributions.”
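
To make the idea of composing functions concrete, here is a minimal sketch in the spirit of Montague grammar. The tiny world model (`DOMAIN`, `HUMANS`, `MORTALS`) and the three-word lexicon are invented for illustration; real Montague semantics uses a typed lambda calculus with possible worlds, which this toy leaves out.

```python
from typing import Callable

Entity = str
Predicate = Callable[[Entity], bool]          # type <e, t>
Quantifier = Callable[[Predicate], bool]      # type <<e, t>, t>

# A hypothetical miniature model of the world.
DOMAIN = frozenset({"socrates", "plato", "fido"})
HUMANS = frozenset({"socrates", "plato"})
MORTALS = frozenset({"socrates", "plato", "fido"})


# Lexicon: each word denotes a function, not a string.
def human(x: Entity) -> bool:
    return x in HUMANS


def mortal(x: Entity) -> bool:
    return x in MORTALS


def every(restrictor: Predicate) -> Quantifier:
    """'every N' takes a predicate and checks it on all N in the domain."""
    return lambda scope: all(scope(x) for x in DOMAIN if restrictor(x))


def some(restrictor: Predicate) -> Quantifier:
    """'some N' takes a predicate and checks it on at least one N."""
    return lambda scope: any(scope(x) for x in DOMAIN if restrictor(x))


# Grammar as function application: [[every human] [is mortal]]
print(every(human)(mortal))   # True: both humans are in MORTALS
print(every(mortal)(human))   # False: fido is mortal but not human
print(some(mortal)(human))    # True: socrates is mortal and human
```

The sentence never exists as a string inside the computation. Its meaning is the result of applying one word's function to another's, and the final value is a truth value in the given model.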

Abstract Meaning Machine

We can formalize Johnathan’s process, and the philosophical theories behind it, into an Abstract Meaning Machine—a generalized framework for processing semantics:

1) Entity Mapping Algorithm: Every discrete token $x_i$ in a sequence is mapped to a structural entity $E_i$.

2) Mixing Algorithm: The values of the entities are modified with respect to their positions and the other entities in the sequence. In a simple implementation, this is an $O(N^2)$ loop where every entity interacts with every other entity to update its own state:

$$E_i' = f(E_i, \{E_1, E_2, \dots, E_N\})$$

3) Generation Algorithm: The updated entities are collapsed (via pooling or selecting the final state) into a global sequence entity $E_{seq}$. This object can be used to compare distances between concepts, or to infer a probability distribution $P(x_{t+1} \mid E_{seq})$ over the next token.
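
One way to write the three steps down is as a small generic class whose behavior is supplied entirely by the caller. The class name, its field names, and the toy two-dimensional `VECTORS` table below are all hypothetical choices made for this sketch; nothing about the framework forces this particular interface.

```python
import math
from dataclasses import dataclass
from typing import Callable, Generic, List, Sequence, TypeVar

Token = TypeVar("Token")
E = TypeVar("E")      # the entity type: vectors, functions, sets, ...
Out = TypeVar("Out")  # whatever the generation step returns


@dataclass
class AbstractMeaningMachine(Generic[Token, E, Out]):
    map_entity: Callable[[Token], E]     # step 1: x_i -> E_i
    mix: Callable[[E, Sequence[E]], E]   # step 2: E_i' = f(E_i, {E_1..E_N})
    pool: Callable[[Sequence[E]], E]     # step 3a: collapse to E_seq
    generate: Callable[[E], Out]         # step 3b: E_seq -> output

    def encode(self, tokens: Sequence[Token]) -> E:
        entities: List[E] = [self.map_entity(t) for t in tokens]
        # N calls to mix, each free to read all N entities: O(N^2) overall.
        mixed = [self.mix(e, entities) for e in entities]
        return self.pool(mixed)

    def run(self, tokens: Sequence[Token]) -> Out:
        return self.generate(self.encode(tokens))


# --- A toy instantiation with 2-d float vectors (made-up values) ---
VECTORS = {"cat": [1.0, 0.0], "dog": [0.9, 0.1], "car": [0.0, 1.0]}


def mix_mean(e, context):
    """Blend each entity half-and-half with the mean of the sequence."""
    n = len(context)
    mean = [sum(c[k] for c in context) / n for k in range(len(e))]
    return [0.5 * a + 0.5 * b for a, b in zip(e, mean)]


def next_token_probs(e_seq):
    """Softmax over dot products with every vocabulary vector."""
    scores = {w: sum(a * b for a, b in zip(e_seq, v)) for w, v in VECTORS.items()}
    z = sum(math.exp(s) for s in scores.values())
    return {w: math.exp(s) / z for w, s in scores.items()}


machine = AbstractMeaningMachine(
    map_entity=VECTORS.__getitem__,
    mix=mix_mean,
    pool=lambda entities: entities[-1],  # select the final state
    generate=next_token_probs,
)
probs = machine.run(["cat", "dog"])
print(max(probs, key=probs.get))  # most probable next token
```

Swapping the four callables changes the kind of entity without touching `encode` or `run`, which is the sense in which the machine is abstract.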

Alternative Entities

In modern AI, the “entities” are dense vectors of continuous real numbers. But the Abstract Meaning Machine doesn’t strictly require arrays of floats. What if we swapped the underlying data structure of the entities?

Truth-Conditional Objects: Instead of vectors, entities could be formal logical functions. Mixing them wouldn’t mean taking dot products; it would mean executing functional composition to determine the precise truth-conditions of the generated claim.
Lattices: Entities could be represented as nodes in a mathematical lattice (as seen in Formal Concept Analysis). Mixing entities would involve finding the “supremum” (least upper bound) or “infimum” (greatest lower bound) of concepts, creating hierarchical structures of meaning that can be read off directly rather than opaque geometric spaces.
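
As a rough sketch of the lattice option, entities can be modeled as sets of attributes, ordered so that a concept with fewer attributes is more general. This is a simplification of Formal Concept Analysis (it uses the plain powerset lattice of attributes and skips the closure step that FCA applies), and the attribute table is invented for illustration.

```python
from typing import FrozenSet

Concept = FrozenSet[str]  # a concept is identified by its attribute set

# Hypothetical lexicon: each word maps to the attributes it carries.
LEXICON = {
    "dog": frozenset({"animate", "mammal", "barks"}),
    "cat": frozenset({"animate", "mammal", "meows"}),
    "sparrow": frozenset({"animate", "bird", "flies"}),
}


def is_more_general(a: Concept, b: Concept) -> bool:
    """a >= b in the lattice: a demands no attribute that b lacks."""
    return a <= b  # subset test on attribute sets


def supremum(a: Concept, b: Concept) -> Concept:
    """Least upper bound: the most specific concept covering both."""
    return a & b


def infimum(a: Concept, b: Concept) -> Concept:
    """Greatest lower bound: the most general concept below both."""
    return a | b


dog, cat, sparrow = LEXICON["dog"], LEXICON["cat"], LEXICON["sparrow"]

print(sorted(supremum(dog, cat)))       # ['animate', 'mammal']
print(sorted(supremum(dog, sparrow)))   # ['animate']
print(is_more_general(supremum(dog, cat), dog))  # True
```

Unlike a dense vector, the result of mixing "dog" and "cat" here is a readable object: the set of attributes the two share.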

End

The gap between a 17th-century philosopher wondering how ideas become words and a 21st-century engineer optimizing a transformer is surprisingly narrow. Whether the entities are Locke’s mental ideas, Russell’s logical atoms, or OpenAI’s high-dimensional vectors, the framework remains the same. We map the discrete, we mix the abstract, and we generate the future.