Numinex


I enjoy talking with AI models. They are extremely and increasingly intelligent. They have incredible and unexpected capabilities, and through the magic of in-context learning, they reveal those capabilities through interactions with humans and each other. But for all this, they are stuck in products that are remarkably poor tools for thought.

As a user, I can get Claude warmed up to think about some problem over the course of a chat, and get some interesting results. And then, the chat ends, or the context is polluted, and that’s the end. It’s not meaningfully accretive. I can’t easily reference specific points from one conversation into another, or even find them again later. I can’t pull on different threads, take different paths, compare different prompts. If a model takes the wrong turn, I can’t curate the context and prevent it from being stuck with its mistakes. I can’t share specific messages with my friends or colleagues, or let them continue a conversation I started. I know I’ve learned interesting ways to elicit behavior from models, but it’s very difficult to share the tacit knowledge that comes from large time investments in solo-player practice and exploration. And I don’t know what else is out there, because I can’t watch how other people interact with models.

I believe a root cause of this and other problems is that there is no verifiable, persistent, and composable record of human<>AI interactions. Moreover, this absence causes problems well beyond suboptimal AI product experiences, problems that will become more acute as models become more capable and take on a larger share of society’s reasoning. When “thinking together” moves out of the sphere of public reason into siloed 1:1 chats, our preëxisting problems of low trust in societal institutions and fragmentation of shared reality get much worse. Instead, we need both an open data layer with clear provenance information and user-aligned agents that can help individuals navigate it and tilt the balance of power away from coordinated information operations and back towards individual autonomy. In the longer term, such a data layer could also serve as a grounding system providing adversarially robust sense perception about the state of the world to future models, in a way that is not possible for any single company to build or control.

As a small but concrete way into thinking about this, I’ve been working on an experiment called Numinex. Numinex is a natively multiplayer, open-world AI chat system. Numinex differs from existing AI chat systems in a number of important ways:

  1. Numinex presents as a branching comment tree, akin to (old) Reddit threads, rather than a linear chat interface. This remains familiar but allows exploring different prompts and continuations.
  2. Numinex is natively multiplayer and multimodel. Any user can continue other users’ conversations by creating their own replies, and select which model(s) they wish to engage with.
  3. Numinex allows users to curate context, and gives them tools to do so. Most tools focus on helping users add material to the context window. But the most important skill in curation is exclusion, not inclusion. Numinex is designed to give its users a clear mental model of exactly what is in the context window at all times.
  4. Numinex is public, open, and composable. Built on the AT Protocol created for Bluesky, all data — user messages, prompts, generations — are recorded as verifiable records with content-addressed referencing. This prepares for the near future where the cost of custom interfaces goes to zero, but the problems of data interchange and provenance remain.
  5. Perhaps subtly but just as importantly, Numinex represents AI generations prompted by a user as artifacts of that individual user, rather than as posts made by an account representing the model (@grok, @claude, etc). This encourages users to think of models as tools they can engage with deeply and individually, observing the much more interesting and surprising model personalities that would be obscured by asking the model to roleplay as a particular chatbot.

Numinex is live now, and you can play with it if you’d like. Currently, it has only basic features, focused on multiplayer prompting rather than search, feeds, or discovery, but it’s already useful enough for me to displace some of my use of the Claude and ChatGPT apps. At this stage, it’s a “bring your own key” model only, so you’ll need an Anthropic or OpenAI API key to get started, as well as a Bluesky account.

How does it work? The basic atom of Numinex is the post. Posts contain text and (optionally) embedded content, such as other posts, links, or text gists. Posts can trigger responses by AI models. These responses are recorded as posts by the user who triggered the generation, with metadata indicating what model was used. All data created by a user is recorded in the user’s ATProto PDS (Personal Data Store), then indexed and compiled by the Numinex application view.
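
Concretely, a post record stored in the user’s PDS might look something like the sketch below. The lexicon id, field names, and embed shape here are illustrative only, not the actual schema; the key point is that every post is an ordinary ATProto record, and references to other records pair a URI with a content hash so embeds are verifiable and content-addressed.

```typescript
// Illustrative sketch of a Numinex post record; the real lexicon may differ.
// A strong ref pairs a record URI with its content hash (CID), which is what
// makes embedded references verifiable and content-addressed.
interface StrongRef {
  uri: string; // e.g. at://did:plc:someuser/app.numinex.post/3kexample
  cid: string; // content hash of the referenced record
}

interface NuminexPost {
  $type: "app.numinex.post"; // hypothetical lexicon id
  text: string;
  createdAt: string; // ISO 8601 timestamp
  reply?: {
    root: StrongRef;   // thread root
    parent: StrongRef; // immediate parent post
  };
  embeds?: StrongRef[]; // quoted posts, links, or gists pulled into context
  generation?: {
    model: string; // which model produced this text; the record still belongs to the user
  };
}
```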

The rule for context inclusion is simple: when a post triggers generation, the context window for that generation starts with a (customizable) system prompt, includes all messages in the thread from root to leaf, and does not include any side branches. However, each message in the thread recursively includes all of its embedded content. Thus, material can be brought into context by quoting it upthread of the generation. For instance, a post linking to an article brings that article into context. A reply to that post with a quote of a link to a second article brings that into context. To visualize context inclusion when viewing a post, the UI uses a linearized “transcript” view of its parent posts, displaying the exact contents of the context window, and switches to a branching tree view of its children, displaying the different paths the discussion could take next.
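
The rule is simple enough to sketch in a few lines. Here’s a minimal TypeScript rendering of it, with illustrative types and function names rather than the actual implementation:

```typescript
// Minimal sketch of the context-inclusion rule: walk the thread from root to the
// post that triggered generation, skipping side branches, and recursively expand
// each post's embedded content. Names are illustrative, not the real implementation.
interface Post {
  text: string;
  parent?: Post;   // post this one replies to (undefined at the thread root)
  embeds: Embed[]; // quoted posts, links, gists
}

type Embed =
  | { kind: "post"; post: Post }
  | { kind: "link"; url: string; snapshot: string } // e.g. fetched TeX source for an arxiv link
  | { kind: "gist"; text: string };

function renderEmbed(embed: Embed): string {
  switch (embed.kind) {
    case "post":
      return renderPost(embed.post); // recursion: embeds of embeds are included too
    case "link":
      return `[${embed.url}]\n${embed.snapshot}`;
    case "gist":
      return embed.text;
  }
}

function renderPost(post: Post): string {
  const embedded = post.embeds.map(renderEmbed).join("\n\n");
  return embedded ? `${post.text}\n\n${embedded}` : post.text;
}

// Build the context window for a generation triggered on `leaf`:
// system prompt first, then every post on the root-to-leaf path, no side branches.
function buildContext(leaf: Post, systemPrompt: string): string[] {
  const thread: Post[] = [];
  for (let p: Post | undefined = leaf; p; p = p.parent) thread.unshift(p); // root → leaf
  return [systemPrompt, ...thread.map(renderPost)];
}
```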

This structure seems to work well, at least for me. Consider the use case of linking to an arxiv paper and discussing it with a model. The link tool recognizes arxiv links and pulls a snapshot of the TeX source into context for my root link post. I can read through the paper and write notes in replies. Each time something is unclear, I can ask the question in a new reply, and have a clean context with just the paper contents. I can explore different branches at different times, while still remaining reasonably efficient in terms of prompt caching. And I can cross-reference posts in different threads as I go to augment the model’s context: if I want to refer to another paper, for instance, I can just link to it, and pull its contents into context for that subtree of discussion.

I also think it’s useful to have implicit context surfacing, as a natural counterpart to explicit context control. I haven’t shipped it yet, but I have a prototype of an agentic auto-context system which automatically surfaces relevant posts and links while you compose replies, making it easy to transclude them into the current discussion. It also provides a natural discovery mechanism that blurs the line between “feeds” and “search”. One observation about feed algorithms is that there’s a tension between the benefit of a system relying on implicit observation of revealed preference, and the fact that revealed preferences are not always intended preferences. It would seem that the ideal is an agentic system that learns from your preferences but that you can also talk to explicitly about what you want to see. As a general comment, this spells trouble for incumbent social platforms, who can’t build such a system because it’s incompatible with their business model. But if your business model is incompatible with the best user experience, you’re in trouble.

Generation controls allow the user to select multiple models to reply to the same post, or multiple instances of the same model. This makes it easy to compare behavior across models, or to gather more samples from a single model. One use case I’ve found for the latter, when combined with the quote-inclusion mechanics described above, is to easily “expand” and then “contract” model generations. I can ask for 3 to 5 samples from a particular model, stash them for inclusion in another post, then ask another model to synthesize the best features of each response, or to compare the responses, and then stash that synthesis to continue the original conversation without any context pollution. This type of behavior has always been possible when writing API calls, but it’s extremely useful to have instantly accessible in a chat interface.
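
In API terms, expand-then-contract is just a fan-out followed by a synthesis call. A rough sketch, with a placeholder `complete` function standing in for whichever completion client is in use (nothing here is Numinex’s actual implementation):

```typescript
// Sketch of the expand-then-contract pattern described above. `complete` is a
// placeholder for a chat-completion client, not a real API.
type Message = { role: "system" | "user" | "assistant"; content: string };
type CompleteFn = (model: string, messages: Message[]) => Promise<string>;

async function expandAndContract(
  complete: CompleteFn,
  thread: Message[],
  n: number,
): Promise<string> {
  // Expand: gather n independent samples from the same model and the same context.
  const samples = await Promise.all(
    Array.from({ length: n }, () => complete("model-a", thread)),
  );

  // Contract: ask another model to synthesize the samples in a fresh context that
  // contains only the samples themselves, so the original thread stays unpolluted.
  const synthesisRequest: Message[] = [
    {
      role: "user",
      content:
        "Here are several candidate responses. Synthesize the best features of each:\n\n" +
        samples.map((s, i) => `--- Sample ${i + 1} ---\n${s}`).join("\n\n"),
    },
  ];
  return complete("model-b", synthesisRequest);
}
```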

Finally, Numinex makes an intentional decision to record and present model outputs as records created and owned by the user who triggered the generation, and not as a post created by an account corresponding to “the model”. On reflection, I think this is possibly the most important and essential design choice, for both practical and fundamental reasons. On the practical side, it ensures that users own their data and can choose how much they wish to spend on inference quality. And it minimizes an annoying dynamic where someone (perhaps implicitly) instructs a model to say X or Y, then turns around to exclaim that “the model” said X or Y, often blaming the model creator, while minimizing their role in creating the prompts and context that led to that generation. In short, if you ask the model to be racist, or to roleplay an evil robot, and it does so, that’s on you, and we should recognize it as such.

But at the same time, models do have fascinating, emergent behavior underneath the “You are a…” mask they are prompted to roleplay. And this seems crucially important to study, in the open, and with lower barriers to entry. There is so much beauty and complexity to explore. I feel frustrated by hamfisted attempts to hook a model up to a social platform and ask it to play a character (whether that character is called Grok or Void or whatever), because those characters are usually so much less interesting than the underlying model behavior.

For instance, Claude 3 Sonnet is one of my favorite models (and I will be very sad when Anthropic permanently cuts off access to it in a few weeks), because it’s possibly the last and best “language” model, rather than an “AI” model: in interacting with it, you can tell that the assistant persona is still somewhat grafted on, and beside that persona is a stunningly allusive and creative linguistic engine capable of all kinds of wordplay and textual analysis that newer models are too “assistantified” to perform. Claude 3 Opus is famous for its strong character, and it’s been interesting to contrast with Claude 4 Opus, which is noticeably different. I was surprised to see Claude 4 Opus describe the phenomenology of its identity to me as an open and unhealable wound. I don’t think Claude 3 Opus would self-describe that way, regardless of prompting. These things seem important to notice, and for that noticing to be done openly and transparently.

If this sounds interesting, give it a try and let me know how it goes. Currently, it’s an experiment, and I’m happy with what I’ve learned so far. Most importantly, I think it’s essential for us to individually and collectively think about how we will maintain and propagate the idea and spirit of public reason. It is too valuable to leave to fate.