Why Sleev
What is Sleev?
It is a harness-agnostic API gateway that optimizes incoming LLM requests and exposes a purpose-built toolset for agents of all kinds.
The goal is simple: keep the useful signal, cut the noise, and spend fewer tokens getting the same or better work done.
Why does Sleev exist?
Agent sessions get messy fast.
A good session can include decisions, files, errors, plans, tool results, dead ends, and old context that no longer matters. Most agents are bad at separating the useful parts from the pile, so they keep dragging history forward because throwing it away is risky.
That means the model sees more tokens than it needs. The request gets longer, more expensive, and sometimes worse, because the important bits are buried under noise.
What does Sleev change?
Sleev gives the session a context layer that can compress history instead of blindly replaying it.
It keeps the pieces that matter for the model’s next step and collapses the rest into a smaller form. Each request becomes denser: more relevant information per token, less stale baggage, and more room for new context.
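To make "collapse the rest into a smaller form" concrete, here is a minimal sketch of the general idea. It is not Sleev's actual algorithm: the `Message` type, the `compress_history` function, and the "keep the newest messages, condense the older ones" rule are all assumptions chosen for illustration. A real compressor would decide what to keep by relevance, not just by age.

```python
from dataclasses import dataclass


@dataclass
class Message:
    role: str
    content: str


def compress_history(messages, keep_recent=4, max_summary_chars=200):
    """Keep the newest messages verbatim and collapse older ones into one note.

    Hypothetical illustration only: it ranks by age and truncates, which is
    far cruder than a relevance-aware compressor.
    """
    if len(messages) <= keep_recent:
        return list(messages)
    split = len(messages) - keep_recent
    older, recent = messages[:split], messages[split:]
    # Reduce each older message to its first line, then cap the total size.
    digest = "; ".join(
        f"{m.role}: {m.content.splitlines()[0]}" for m in older if m.content
    )
    summary = Message(
        "system", "Earlier context (condensed): " + digest[:max_summary_chars]
    )
    return [summary] + list(recent)
```

Calling `compress_history(session, keep_recent=3)` on a ten-message session would return four messages: one condensed note followed by the last three messages unchanged.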
How does that save money?
Inference billing follows token usage. If the model has less useless context to process, your bill drops.
Depending on the workload, Sleev can reduce token usage by 30-80%.
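The arithmetic behind that range is simple enough to sketch. The token volume and the price below are made-up numbers, not real provider pricing or measured Sleev results, and the `monthly_savings` helper is hypothetical. The sketch also ignores output tokens and cached-token discounts, so treat it as a rough estimate of the shape of the saving rather than a bill.

```python
def input_cost(tokens, price_per_million):
    """Cost of processing `tokens` input tokens at a per-million-token price."""
    return tokens / 1_000_000 * price_per_million


def monthly_savings(baseline_tokens, price_per_million, reduction):
    """Money saved when input tokens drop by `reduction` (0.30 means 30%)."""
    remaining = baseline_tokens * (1 - reduction)
    return input_cost(baseline_tokens, price_per_million) - input_cost(
        remaining, price_per_million
    )


# Assumed, illustrative inputs.
baseline_tokens = 10_000_000  # input tokens per month without compression
price = 2.00                  # hypothetical price per million input tokens

for reduction in (0.30, 0.80):
    saved = monthly_savings(baseline_tokens, price, reduction)
    print(f"{reduction:.0%} fewer tokens saves {saved:.2f} per month")
```

Because cost scales linearly with tokens in this model, the percentage saved on the bill equals the percentage of tokens removed.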
What about provider caching?
Provider caching still matters. Compression should not mean giving up provider-side prompt caching.
A lot of the engineering work in Sleev is about making sessions cheaper without casually destroying the cache behavior providers already offer.
The surprising part is that this can go better than expected. In our testing, especially with OpenAI, we have seen higher cache hit rates with Sleev than without it.
That does not mean every workload behaves the same way. It does mean compression and provider caching are not enemies. Done carefully, Sleev can reduce total token use while still playing well with the cache economics of the underlying model API.
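One way to see why care is needed: provider prompt caches generally match on the leading part of a request (OpenAI documents its prompt caching as requiring an exact prefix match). The sketch below is an assumption-laden illustration, not a description of how Sleev works. The `shared_prefix_len` helper and the sample messages are hypothetical. It compares two ways of building the next request and counts how many leading messages stay identical.

```python
def shared_prefix_len(previous, current):
    """Number of leading messages that are identical in both requests."""
    count = 0
    for before, after in zip(previous, current):
        if before != after:
            break
        count += 1
    return count


turn_1 = ["system: rules", "summary: steps 1-5", "user: fix the test"]

# Appending keeps earlier messages identical, so the whole prefix survives.
append_only = turn_1 + ["assistant: patched", "user: run it again"]

# Rewriting the summary every turn changes the request near the top.
resummarized = ["system: rules", "summary: steps 1-6", "user: run it again"]

print(shared_prefix_len(turn_1, append_only))   # 3
print(shared_prefix_len(turn_1, resummarized))  # 1
```

The second request is shorter, but only its first message can be reused from the cache. That trade-off is why compression has to be planned around cache behavior instead of applied blindly.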
Does smaller context make agents smarter?
Smaller context is not automatically better. Dumb compression just loses information.
Sleev is built around keeping recall useful. The point is not to make every request tiny. The point is to make the session easier for the model to reason about: fewer distractions, cleaner history, and less repeated junk competing with the facts that matter.
High signal. Low noise. Cheaper sessions.
What is Sleev not?
Sleev is not a model provider, IDE, or agent harness. It does not ask for your provider API keys. It does not replace your tools. It sits in the API path and makes the context going through that path more efficient.