
🤖 AI Snack 🍿: Guided LLM Generation: Shaping Language Models to Your Needs

Guided generation optimizes large language models for structured tasks like information extraction, ensuring reliable and non-random outputs.

Image created by Bing Generator

Do you ever wonder what other use cases exist for LLMs beyond chat interfaces? Well, there are a lot! From information extraction to agents that can take actions, code generation, classification tasks, and more. And what do all these have in common? They all need structured outputs...

We need to integrate these LLMs with existing systems, and to do so we need their outputs to be reliable, not random. Guided generation is a technique that solves this problem, and here you can learn how.

What is Guided LLM Generation?

Language models have revolutionized the AI landscape in recent years. These powerful models, often referred to as Large Language Models (LLMs), have demonstrated astonishing abilities in understanding and generating coherent and useful text. We have seen them mostly in chat interfaces like ChatGPT or Google Bard; however, chat is just one use case for these models, and the possibilities are much larger. One way to explore other use cases is through guided generation.

Guided generation involves constraining an LLM to produce outputs matching predefined formats or schemas. With this steering, the model can extract entities, produce particular data formats like JSON, or generate code.

One very useful use case is information extraction. Imagine that you want to extract entities from a medical report, like diagnoses, symptoms, and allergies, and you want the result in a JSON schema. How do you do it? Do you ask the LLM kindly to output JSON and pray to the AI gods that the brackets and entities end up in the right place? Or you can force it to generate this structured output by only sampling characters that match a set of defined expressions and ignoring the rest. Let’s walk through the medical information extraction example.

1. We define the schema that we want to enforce.

    
        from datetime import datetime
        from typing import List

        from pydantic import BaseModel

        # Pydantic models describing the structure we want to enforce.
        class Diagnosis(BaseModel):
            condition: str
            icd_code: str

        class MedicalReport(BaseModel):
            patient_id: int
            date: datetime
            diagnoses: List[Diagnosis]
    

2. We pass the medical report to our magical guided generation tool and receive output matching the defined schema.

    
        {
            "patient_id": 123456,
            "date": "2023-11-21T00:00:00",
            "diagnoses": [
                {
                    "condition": "Hypertension",
                    "icd_code": "I10"
                },
                {
                    "condition": "Diabetes Type 2",
                    "icd_code": "E11"
                }
            ]
        }
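
In practice you don't have to build this machinery yourself. As a minimal sketch, here is how step 2 could look with the open-source Outlines library, reusing the MedicalReport class from step 1. The API shown is from Outlines' 0.x releases (check the current docs), and the model name and the medical_report_text variable are placeholder assumptions:

        import outlines

        # Load any transformers-compatible model (this name is a placeholder).
        model = outlines.models.transformers("mistralai/Mistral-7B-Instruct-v0.2")

        # Build a generator that can only emit JSON matching MedicalReport.
        generator = outlines.generate.json(model, MedicalReport)

        # The constrained output is parsed directly into the Pydantic model.
        report = generator(
            "Extract patient_id, date and diagnoses from this report:\n"
            + medical_report_text
        )
        print(report.diagnoses[0].icd_code)  # e.g. "I10"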
    

Under the hood, the generator is constrained to produce only certain characters or patterns at each position:

  1. When generating the patient_id field, the system only considers the probabilities of digit characters, ignoring everything else.
  2. After the patient ID, a comma is inserted into the output directly, since the schema requires it.
  3. The system then inserts the literal "date": key before generating the date value.
  4. Digits are generated incrementally until a 4-digit year is reached.
  5. The timestamp is completed by inserting hyphens and colons at predefined points.
  6. When outputting the ICD code, we can check that the generated code actually exists and is not a hallucination.

This proceeds sequentially, constrained at each step to match the target schema. Only certain characters or patterns are valid choices to add to the output. In effect, guided generation operates like a finite state machine navigating between predefined states. The states and allowed transitions between them are defined by the schema specification, which acts like a regular expression.
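
To make the state-machine picture concrete, here is a toy character-level sketch of the masking step. Production libraries operate on the model's token vocabulary and compile the whole schema into an FSM index; the three-state pattern and the uniform stub model below are simplifying assumptions for illustration only:

        import string

        # Toy FSM for the pattern [A-Z][0-9][0-9] (an ICD-10-style code
        # such as "I10"). Each state maps to the characters the schema
        # allows next; everything else is masked out.
        ALLOWED = {
            0: set(string.ascii_uppercase),  # state 0: expect a letter
            1: set(string.digits),           # state 1: expect a digit
            2: set(string.digits),           # state 2: expect a digit
        }

        def model_probs(prefix: str) -> dict:
            """Stand-in for the LLM's next-character distribution; a real
            system would run the model on `prefix` and return its softmax."""
            vocab = string.ascii_uppercase + string.digits + string.punctuation
            return {ch: 1.0 / len(vocab) for ch in vocab}

        def guided_decode() -> str:
            output = ""
            for state in range(len(ALLOWED)):  # walk the FSM states in order
                probs = model_probs(output)
                # Mask: keep only the characters this state allows.
                masked = {ch: p for ch, p in probs.items() if ch in ALLOWED[state]}
                # Greedy step: emit the highest-probability surviving character.
                output += max(masked, key=masked.get)
            return output  # guaranteed to match [A-Z][0-9][0-9]

        print(guided_decode())  # with the uniform stub this prints "A00"

Because forbidden characters get zero probability at every step, the output cannot violate the schema, no matter what the underlying model would otherwise prefer.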

In ChatFAQ we use this technique everywhere! We are building a system, not just a chatbot, so the LLMs must be compatible with the rest of the components, giving a seamlessly integrated, efficient, and user-friendly experience that meets the high standards of modern AI applications.