The idea is simple: humans are naturally great at creating mosaic art. From the Roman Empire to French Neo-Impressionism, we can effortlessly put together single strokes of paint to create a larger, complex picture, perfectly balancing local actions with global context. Large Language Models, however, struggle with this because they fundamentally lack spatial understanding. Autoregressive Mosaics is an attempt to force an LLM trained only on text to paint a picture one discrete pixel at a time. The system provides the model with a blank M×N grid and a text prompt, and the LLM must logically infer where to place each structural character and color, step by step, using only its linguistic training. The final results, while often visually primitive or unimpressive, are incredibly amusing. They provide a raw, unfiltered glimpse into the inner workings of a neural network trained solely on text, exposing exactly how these models represent, and often fracture, everyday visual concepts and geometry.
And as with any art, the outputs are open to interpretation by the viewer. Try squinting your eyes: what do you see? Does the output bear a resemblance to what you wanted?
To explore this phenomenon, the installation uses two distinct generation methods. In the first approach, the "ASCII canvas", the AI acts as a painstaking, literal painter. In a single generation pass, it produces a grid of text characters alongside a matching color palette, making a deliberate decision for every individual cell on the canvas. Because these models process information in a strict, one-dimensional sequence, they quickly lose track of two-dimensional coordinates. As a result, shapes drift, tear, and collapse into fascinating, fragmented abstractions.
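To make the ASCII-canvas idea concrete, here is a minimal sketch of how such a response might be parsed and turned into pixels. The `GRID`/`PALETTE` response format, the hardcoded example reply, and the function names are illustrative assumptions, not the installation's actual protocol.

```python
# Hypothetical "ASCII canvas" pipeline: the LLM returns a character grid
# plus a palette mapping each character to a hex color, and we parse both
# into a per-cell color buffer. The response layout below is an assumption.

# A made-up model reply: a 4x4 grid, then a palette block.
RESPONSE = """\
GRID
....
.##.
.##.
....
PALETTE
. #000000
# #ff0000
"""

def parse_canvas(text):
    """Split the response into grid rows and a char -> hex-color palette."""
    lines = text.strip().splitlines()
    grid_start = lines.index("GRID") + 1
    palette_start = lines.index("PALETTE")
    grid = lines[grid_start:palette_start]
    palette = {}
    for entry in lines[palette_start + 1:]:
        char, color = entry.split()
        palette[char] = color
    return grid, palette

def render(grid, palette):
    """Map every cell to its color; one list of hex strings per row."""
    return [[palette[c] for c in row] for row in grid]

grid, palette = parse_canvas(RESPONSE)
pixels = render(grid, palette)
```

Because the model emits the grid strictly left to right, top to bottom, any mistake in counting cells shifts every subsequent row, which is exactly how the drifting and tearing described above arises.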
The second approach, the "code canvas", asks the AI to act as a mosaic artist who is an expert at writing code. Instead of placing raw pixels, the LLM writes a short computer program containing precise geometric instructions, such as drawing lines, rectangles, and circles. A restricted rendering engine then executes this code to paint the final image. This neuro-symbolic method vastly outperforms the ASCII approach because it aligns with the model's true strengths. Language models are highly fluent in the abstract logic of programming; by writing code, the AI delegates the unforgiving mathematics of 2D space to a deterministic engine, allowing its conceptual understanding of an object to emerge with sudden, striking coherence.
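A restricted engine of this kind can be sketched with a tiny whitelisted instruction set. The command names (`rect`, `circle`, `line`), the text-based program format, and the grid rasterizer below are all assumptions for illustration; the installation's actual engine is not specified here.

```python
# Minimal sketch of a "code canvas" renderer: the LLM emits one whitelisted
# drawing command per line, and a deterministic engine rasterizes them onto
# a character grid. Only commands in OPS are executed; anything else is
# ignored, which is what keeps the engine "restricted".

W, H = 16, 16

def blank():
    return [["." for _ in range(W)] for _ in range(H)]

def rect(canvas, x, y, w, h, ch):
    """Fill an axis-aligned rectangle, clipped to the canvas."""
    for r in range(max(y, 0), min(y + h, H)):
        for c in range(max(x, 0), min(x + w, W)):
            canvas[r][c] = ch

def circle(canvas, cx, cy, radius, ch):
    """Fill every cell within `radius` of the center."""
    for r in range(H):
        for c in range(W):
            if (c - cx) ** 2 + (r - cy) ** 2 <= radius ** 2:
                canvas[r][c] = ch

def line(canvas, x0, y0, x1, y1, ch):
    """Integer line via interpolation over the longer axis."""
    steps = max(abs(x1 - x0), abs(y1 - y0), 1)
    for i in range(steps + 1):
        c = x0 + (x1 - x0) * i // steps
        r = y0 + (y1 - y0) * i // steps
        if 0 <= r < H and 0 <= c < W:
            canvas[r][c] = ch

OPS = {"rect": rect, "circle": circle, "line": line}  # the whitelist

def run_program(program):
    """Execute the model's program one command per line."""
    canvas = blank()
    for cmd in program.strip().splitlines():
        name, *args = cmd.split()
        if name in OPS:
            *nums, ch = args
            OPS[name](canvas, *map(int, nums), ch)
    return canvas

# A program the LLM might emit for "a house": a square body plus a roof.
PROGRAM = """\
rect 4 8 8 6 #
line 4 8 8 4 ^
line 8 4 12 8 ^
"""
canvas = run_program(PROGRAM)
```

The point of the design is the division of labor: the model only has to name shapes and coordinates, while the engine handles clipping, rasterization, and every per-cell decision the ASCII approach forces the model to make itself.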