AI-Driven Text Adventure Game

AI Game Git Repository

My obsession of the moment is creating a text-based adventure game driven by a large language model (LLM). The game is written in Rust, backed by KoboldCPP and the Mistral 7B Instruct model. Documentation for getting the project running is in the linked repository.

This page will document my journey and things I've learned throughout this project.

Development Journal (Newest First)

March 2024: Removal of Box<dyn> from GbnfLimit

After writing the previous dev journal entry about my attempts to remove dynamic trait objects from GBNF limit creation, I have finally figured out the right set of trait bounds and associated types to make it work properly. The initial removal of the dynamic trait object was easy, but I quickly ran into an issue with how "primitives" (single values like a number or string) and "complex" values (nested types with multiple fields) are handled differently.

This required creating two new types: GbnfLimitedPrimitive and GbnfLimitedComplex.

These two wrapper structs carry a set of fancy trait bounds and associated types that let each instance hold the proper limit rule for the field being limited. That might be hard to follow in the abstract, so here is a simplified example, directly from the game code.


// (Simplified) Struct definitions

pub struct RawCommandExecution {
    // other fields removed.

    #[gbnf_limit_complex]
    pub event: Option<RawCommandEvent>,
}


pub struct RawCommandEvent {
    pub event_name: String,

    #[gbnf_limit_primitive]
    pub applies_to: String,

    #[gbnf_limit_primitive]
    pub parameter: String,
}

// Limit creation
let applies_to = vec!["self", "uuid1", "uuid2", "uuid3"];
let all_uuids = vec!["uuid1", "uuid2", "uuid3"];
let event_limit = RawCommandEventGbnfLimit {
    applies_to: GbnfLimitedPrimitive::new(applies_to),
    parameter: GbnfLimitedPrimitive::new(all_uuids),
};

let limit = RawCommandExecutionGbnfLimit {
    event: GbnfLimitedComplex::new(event_limit),
};

In this code, the event itself has two limited fields: `applies_to` and `parameter`. These are single String values. GbnfLimitedPrimitive takes a Vec of allowed values, without any trait-object boxing or dynamic dispatch. In contrast, the main struct has an Option<RawCommandEvent> field that can contain a single event. The generated GBNF limit struct mirrors the shape of the regular struct, and takes a single instance of GbnfLimitedComplex, again with no boxing or dynamic dispatch.
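For context, here is one plausible minimal shape for these two wrappers. This is a sketch, not the game's actual implementation: the real versions carry more trait bounds and associated types, and the field names here are assumptions. The key point it illustrates is that both wrappers are generic, so the limit they hold is a concrete type rather than a `Box<dyn Trait>`.

```rust
// Sketch of the two wrapper types (assumed shape; the real trait
// bounds and fields in the game differ).

/// Holds the set of allowed values for a single-valued ("primitive") field.
pub struct GbnfLimitedPrimitive<T> {
    pub allowed: Vec<T>,
}

impl<T> GbnfLimitedPrimitive<T> {
    pub fn new(allowed: Vec<T>) -> Self {
        Self { allowed }
    }
}

/// Wraps the generated limit struct of a nested ("complex") field.
pub struct GbnfLimitedComplex<L> {
    pub inner: L,
}

impl<L> GbnfLimitedComplex<L> {
    pub fn new(inner: L) -> Self {
        Self { inner }
    }
}

fn main() {
    // Mirrors the usage shown above: a primitive limit nested
    // inside a complex one, all statically typed.
    let applies_to = GbnfLimitedPrimitive::new(vec!["self", "uuid1", "uuid2"]);
    let limit = GbnfLimitedComplex::new(applies_to);
    println!("{} allowed values", limit.inner.allowed.len());
}
```

Because `GbnfLimitedComplex<L>` is generic over the nested limit struct's concrete type, the whole limit tree monomorphizes at compile time with static dispatch throughout.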

This also makes the code much easier to read, as it no longer requires a bunch of janky Box::new or into() invocations.

March 2024: Attempts at no-dyn GbnfLimit

The AI game can now limit output using the gbnf_limit feature, but it requires dynamic trait objects to do so. Rather than generating a so-called "limit struct" with proper concrete types, the code relies on dynamic typing of anything that can produce a GbnfLimit. This makes the code easier to understand, but creating limit structs is not ergonomic: it requires a pile of Box::new and into() invocations.

I am trying to fix this on a separate branch that is not yet uploaded to the Git repository, because it's a giant mess. I have made some progress, but I'm running into the limitations of Rust's (very powerful) generics system. Namely, blanket impls are not as specific as I need: an impl for Option<T> also applies when T is itself a Vec, so Option<Vec<T>> is covered by the same impl. This could be solved by "trait specialization," but that's an unstable, nightly-only feature with its own set of issues.
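The overlap problem can be shown with a small self-contained example. The trait and names here are illustrative, not the game's actual types; the point is that one blanket impl over Option<T> silently captures nested collections too.

```rust
// Demonstration of the blanket-impl overlap problem.
// `LimitKind` is an illustrative trait, not the game's actual API.
trait LimitKind {
    fn kind(&self) -> &'static str;
}

// Blanket impl: applies to Option<T> for *any* T...
impl<T> LimitKind for Option<T> {
    fn kind(&self) -> &'static str {
        "primitive"
    }
}

fn main() {
    let single: Option<String> = Some("uuid1".into());
    let nested: Option<Vec<String>> = Some(vec!["uuid1".into()]);

    // ...so both of these resolve to the same impl:
    println!("{}", single.kind()); // "primitive" — intended
    println!("{}", nested.kind()); // "primitive" — unintended!

    // A more specific `impl<T> LimitKind for Option<Vec<T>>` would
    // overlap with the blanket impl above, and resolving that overlap
    // requires the unstable trait-specialization feature.
}
```

This is exactly why the concrete-type refactor stalls: the compiler has no stable way to prefer a narrower impl over the blanket one.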

I have almost worked out a way to make the concrete types work. But much like the initial implementation of the GBNF grammar generator, I've sort of hit a roadblock due to trying to remove dynamic dispatch.

I've been spending my time creating gemfreely instead.

I hope to return to the AI game soon and get the dynamic dispatch fully removed from GBNF Limit code, so development on at least one interactive command can resume!

March 2024: gbnf_limit and Constrained LLM Output

As of early March 2024, I have finally finished the implementation of the GBNF limiting feature, required for better coherence in the LLM, and essentially a blocker for any further development of useful/fun things in the game. The game is now capable of limiting individual fields of the LLM's JSON response to specific values, which is extremely useful when we want the LLM to pick from a list of IDs (of people, items, exits, etc in a scene). This is the basis of all command processing. Combined with the existing coherence code, I think the game will be able to advance at a much faster pace now.
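For illustration, limiting a JSON field to a fixed set of IDs boils down to an alternation rule in GBNF (the grammar format used by llama.cpp-based servers such as KoboldCPP). A hand-written sketch of what such a generated rule might look like (rule names and structure are illustrative, not the game's actual output):

```
root       ::= "{" ws "\"applies_to\"" ws ":" ws applies-to ws "}"
applies-to ::= "\"self\"" | "\"uuid1\"" | "\"uuid2\""
ws         ::= [ \t\n]*
```

With a grammar like this in place, the sampler simply cannot emit an ID that isn't in the list, which is why it removes a whole class of coherence failures.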

The rest of this month will focus on:

Some tweaking of the event responses themselves might also be necessary. Right now, there are two string parameters: `applies_to` and `parameter`. The `parameter` field is often used for an ID (e.g. what exit to pick), but it can also be the amount of damage taken, or something else. This can confuse the LLM, so it might be best to either remove one of these fields, or rearchitect the event responses so that they are less confusing.

I think a one-of feature might be the next big thing to tackle in the GBNF derive macro. Enabling conversion of Rust enums to GBNF rules would give the LLM more flexibility to generate proper event responses. That way we could have strongly-typed events in JSON, where the semantic meaning is clear. Combined with value limiting, we could have very expressive GBNF rules.

February 2024: derive(Gbnf) and More Coherence!

I am currently focusing on creating a derive macro to automatically generate GBNF grammar rules from Rust types. This has two main benefits:

The necessity of limiting LLM output was the main driving factor behind the creation of this derive macro. By forcing the LLM to, for example, output only specific UUIDs or database IDs in response to a prompt (e.g. "Select the exit the player should take"), the accuracy of its responses should be much, much higher. Without these dynamic GBNF rules, the LLM can still sometimes pick an ID that does not exist, or fill the response with a nonsense value.

I will likely spin this GBNF derive macro out into a separate crate for use by the wider Rust community.

January 2024: Persistent World and Coherence

In the first month of 2024, I put a massive focus into creating a procedurally-generated persistent game world that the player can navigate. The other main focus was the coherence of the LLM, and reorganizing the code to make it worthy of presenting to the world (and making feature implementation easier, of course).

This month saw the addition of numerous coherence checks and systems around the output of the LLM, as well as proper implementation of the ability to "continue" prompts with the LLM, if it needed to generate more data than could be delivered in one reply.

December 2023: Beginnings of Something Playable

By December, the simple command parser had turned into a proper "thing with a game loop." It was still very raw and basic, but a long vacation over Christmas 2023 allowed me to implement almost all of the core concepts needed to get the game working. This is when I discovered GBNF grammars to constrain the LLM's output, as well as a number of challenges related to how the LLM actually outputs data:

November 2023: LLM Command "Parsing"

The project began as a simple Rust application that calls KoboldCPP's API to "parse" commands, given as if the user were playing a text-based adventure game. The results were coherent, but not regular enough to be of any use to a machine attempting to decode what the LLM means. This was the beginning of my research into how LLMs work and the challenges of prompt engineering, coherence, and sanitizing their output.
