A Keyword-Only Introduction to Context Engineering.
Still iterating on this idea. Distillation creates clarity. Writing and editing in public.
Context Engineering: A practical LLM alignment framework for using LLMs in products. Designed to reduce hallucinations, reduce API costs via token-count reduction, and make LLMs more reliable overall. The three main areas of consideration in context engineering are context, hallucinations and LLM outputs.
The Context Constraint Problem: There will always be a limit on the number of tokens you can pass in one context window.
Intent to Input Problem: The gap between a user's internal intent behind an LLM query and their actual input.
Desired Output vs Actual Output Problem: The gap between the output generated using the current context window and the output the user wants but does not know how to describe.
Context Window: the maximum number of tokens available in a single LLM API call.
Minimum Necessary Context: the smallest number of tokens required to enable the LLM to generate a suitable response to a prompt.
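Token budgets can be checked before a call is made. A minimal sketch, assuming the tiktoken tokenizer library and an illustrative 8,192-token window limit:

```python
# Count how many tokens a prompt plus its context consumes,
# so it can be compared against the model's context window limit.
import tiktoken

CONTEXT_WINDOW_LIMIT = 8192  # illustrative limit; varies by model

enc = tiktoken.encoding_for_model("gpt-4")

context_block = "Here is how you add numbers together: 1+1 = 2"
prompt = "What is 2+2?"

tokens_used = len(enc.encode(context_block + "\n" + prompt))
print(f"{tokens_used} of {CONTEXT_WINDOW_LIMIT} tokens used")
```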
Model Settings: The Variable Settings for OpenAI's Chat LLMs:
Temperature: Controls randomness. Lowering results in less random completions. As the temperature approaches zero, the model will become deterministic and repetitive.
Max length: The max number of tokens to generate.
Top P: Controls diversity via nucleus sampling. 0.5 means half of the likelihood-weighted options are considered.
Frequency Penalty: How much to penalize new tokens based on their existing frequency in the text so far. Decreases the model's likelihood of repeating the same line verbatim.
Presence Penalty: How much to penalize new tokens based on whether they appear in the text so far. Increases the model's likelihood of talking about new topics.
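These settings map directly onto parameters of the chat completions endpoint. A minimal sketch using the OpenAI Python SDK; the prompt and values are illustrative, not recommendations:

```python
# Pass the model settings above as parameters on a chat completion call.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Write a poem in the style of Shakespeare."}],
    temperature=0.7,        # randomness: lower is more deterministic
    max_tokens=256,         # max length of the generated completion
    top_p=1.0,              # nucleus sampling cutoff
    frequency_penalty=0.0,  # penalize tokens already frequent in the text so far
    presence_penalty=0.0,   # penalize tokens that have appeared at all
)
print(response.choices[0].message.content)
```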
Outputs:
LLM Output: Any information generated by an LLM based on a context window prompt.
Potential Output(s): The range of possible outputs given the foundational model, the context window and the configuration of the model settings (temperature, max length, top P, frequency penalty and presence penalty).
Suitable Output: An LLM output that:
is relevant and free from weak falsifiability hallucinations
addresses the user's intent
matches the user's desired detail level
Desired Output: The output a user wants. When a user asks a question or otherwise uses the LLM for knowledge work, the desired output can only be confirmed as the desired output once the user sees it.
The Output Epistemology Problem: An LLM user can only know the desired output when they see it.
It is not possible to know information before knowing it; that is why we ask questions. When a user prompts an LLM, they do not actually know what they need. They just know they are asking questions to get novel information, and when they see new information, they can then tell whether it is the information they were looking for.
Actual Output: The output of the LLM. It can be the desired output, a directionally correct output or a noisy output.
Noisy Output: An output that contains a hallucination or is otherwise unhelpful, irrelevant or mismatched with the user's desired detail level.
Directionally Correct Output: An output that was closer to the user's desired output than the previous output.
Output Vector: The line created by iterative LLM outputs from the same immediate context window (think of iterative chat-model prompts).
Context:
Context Sensitivity: A measure of how the addition or lack of novel background information impacts the quality of an LLM output. The inputs consist of the base model being used and the scenario being solved for.
Model : Scenario
The model to represent context sensitivity:
(Model : Scenario) * Context() → Output
Low Context Sensitivity: A Potential Output whose quality will not be meaningfully impacted by the inclusion of novel context in a context window. An example of low context sensitivity:
Ask GPT-4 "what is 2+2?"
(GPT-4 : Arithmetic[what is 2+2?]) * Context(None)
Will get you the same answer as:
(GPT-4 : Arithmetic[what is 2+2?]) * Context(“Here is how you add numbers together: 1+1 = 2”)
With fewer tokens.
Low Context Sensitivity Prompt: A prompt that is:
Bounded: it has a clearly defined problem and desired outcome, makes few assumptions, and no layered assumptions.
An example would be:
"Write a poem in the style of Shakespeare."
High Context Sensitivity: A Potential LLM output whose quality will be meaningfully changed by the inclusion of novel context in the context window. An example of high context sensitivity:
Ask GPT-4 to "write me a chatbot script in Python using Langchain."
(GPT-4 : Code[Write a chatbot script in Py using Langchain]) * Context(None)
This is going to get you a worthless output. GPT-4 will object saying it doesn’t know what that package is - because it doesn’t have the context!
But if you told it what Langchain is, and gave it some code snippets and docs in the context, the quality of the output would increase dramatically.
(GPT-4 : Code[Write a chatbot script in Py using Langchain]) * Context(Langchain is…here are code snippet examples…) = a somewhat useful answer
High Context Sensitivity Prompt: A prompt that is:
Unbounded: there is no well-defined stopping point for reaching a desired outcome, and the prompt forces the LLM to make assumptions to create one.
An example would be:
"How can I make $100k a month?"
Hallucinations:
LLM Hallucination: outputs that do not align with reality or verifiable facts. When the model "makes shit up."
Falsifiability: the capacity for some proposition, statement, theory or hypothesis to be proven wrong
Strong Falsifiability: When a statement can be practically and easily disproven. An example of a strong falsifiability statement would be:
"2+2 = 5"
Strong Falsifiability Hallucination: An LLM output that:
is easy to disprove and is bounded
is based on a bounded problem statement and makes few assumptions
An example would be:
"Abraham Lincoln was Japanese"
Weak Falsifiability: When a statement is hard, impractical or logically impossible to disprove. An example of a weak falsifiability statement would be:
"You and I are thinking the exact same thing right now."
Weak Falsifiability Hallucination: An LLM output that:
is hard, impractical or logically impossible to disprove
is based on an unbounded problem statement and derived from several, sometimes stacked, assumptions
An example of a Weak Falsifiability Hallucination:
"To make 100k a month you should quit your job, start a cult and get your cult members to give 100k a month"
Context Strategies:
Context Block: a specifically designed piece of text created to contain the minimum necessary context on a specific topic, such as an API, package, event or other topic an LLM may lack knowledge of.
Window Context Stuffing: Toploading a context window with lots of context blocks and other background information before presenting the immediate scenario to solve. The opposite of Iterative Context Insertion.
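A minimal sketch of window context stuffing with the OpenAI Python SDK; the context blocks and prompt are placeholder assumptions:

```python
# Topload the window with context blocks, then present the scenario to solve.
from openai import OpenAI

client = OpenAI()

CONTEXT_BLOCKS = [
    "LangChain is a Python package for building LLM applications. ...",
    "Example LangChain snippet and docs excerpt: ...",
]

messages = [
    {"role": "system", "content": "Use the background context provided below."},
    # Topload every context block first...
    *[{"role": "user", "content": block} for block in CONTEXT_BLOCKS],
    # ...then present the immediate scenario to solve.
    {"role": "user", "content": "Write me a chatbot script in Python using LangChain."},
]

response = client.chat.completions.create(model="gpt-4", messages=messages)
print(response.choices[0].message.content)
```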
Iterative Context Insertion: The opposite of Window Context Stuffing: present the scenario at the top of the window, then insert novel context as the model hallucinates, as a corrective measure.
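A minimal sketch of iterative context insertion under the same assumptions; the corrective context block is a placeholder:

```python
# Present the scenario first; insert novel context only after the model hallucinates.
from openai import OpenAI

client = OpenAI()

messages = [
    {"role": "user", "content": "Write me a chatbot script in Python using LangChain."},
]
first = client.chat.completions.create(model="gpt-4", messages=messages)
messages.append({"role": "assistant", "content": first.choices[0].message.content})

# If the first answer hallucinates (e.g. invents the package's API),
# insert a corrective context block and re-prompt.
messages.append({"role": "user", "content": "LangChain is ... here are real code snippets: ..."})
second = client.chat.completions.create(model="gpt-4", messages=messages)
print(second.choices[0].message.content)
```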
Context Swapping: An advanced output-quality-boosting technique where specific context is inserted into the window, a response is generated, and the previously inserted context is then removed to save token count. A subset of this is end-of-window context swapping.
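A minimal sketch of context swapping; the context block and prompts are placeholders:

```python
# Insert specific context, generate a response, then remove that context
# so it stops consuming tokens in later turns.
from openai import OpenAI

client = OpenAI()

messages = [
    {"role": "user", "content": "Draft a README for my LangChain chatbot."},
]

context_block = {"role": "user", "content": "LangChain docs excerpt: ..."}
messages.insert(0, context_block)  # swap the context in
response = client.chat.completions.create(model="gpt-4", messages=messages)
messages.remove(context_block)     # swap it back out to save token count
messages.append({"role": "assistant", "content": response.choices[0].message.content})
```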
End-of-Window Context Swapping: Used when you have reached the end of the window: you are working with a context-rich window, but you are unable to generate any further responses due to token limits. The technique is simply to generate a response, store that response elsewhere, then delete it from the window and insert a new message before re-prompting the model. This allows you to effectively extend the context window indefinitely, at the cost of no further iteration on earlier responses.
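A minimal sketch of end-of-window context swapping; the archive is just a Python list here:

```python
# The window is already context rich and near its token limit, so each new
# response is stored outside the window and deleted from it before the next prompt.
from openai import OpenAI

client = OpenAI()
archived_responses = []  # responses stored outside the window

def prompt_at_window_edge(messages, new_message):
    """Generate against a full window, archive the reply, then swap it out."""
    messages.append({"role": "user", "content": new_message})
    response = client.chat.completions.create(model="gpt-4", messages=messages)
    reply = response.choices[0].message.content
    archived_responses.append(reply)  # store the response elsewhere
    messages.pop()                    # delete the new message so the window stays fixed
    return reply
```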
System Architecture Considerations:
Context-on-the-fly: Automated methods to retrieve novel context as needed. Examples are retrieval tools that can query vector DBs, run searches, or otherwise interact with their environment to get outside context and inject it into the immediate context window.
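A minimal sketch of context-on-the-fly retrieval; vector_db.search is a hypothetical retrieval client, not a real library API:

```python
# Automatically retrieve relevant context blocks and inject them into the window.
from openai import OpenAI

client = OpenAI()

def answer_with_retrieved_context(question, vector_db, k=3):
    # Retrieve the most relevant context blocks for this question (hypothetical API).
    retrieved_blocks = vector_db.search(question, top_k=k)
    context = "\n\n".join(retrieved_blocks)
    messages = [
        {"role": "user", "content": f"Background context:\n{context}"},
        {"role": "user", "content": question},
    ]
    response = client.chat.completions.create(model="gpt-4", messages=messages)
    return response.choices[0].message.content
```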