Bridging the Transparency Gap
How streamll Brings Observability to DSPy Applications
In my work on AI systems in government, I kept running into the same problem: our applications could perform complex reasoning, but we had no visibility into how they arrived at their conclusions. Users would get an answer from the AI but no insight into the thought process behind it. For our internal assurance teams, this was a showstopper. They needed to understand the AI's decision-making, not just the final output.
We were using DSPy, which is great for building sophisticated AI pipelines with retrieval, reasoning, and generation steps. But DSPy modules are opaque by default. When a module is busy retrieving documents or working through a chain of thought, all that valuable context is hidden from the rest of the system.
I realised this was a fundamental architectural gap. In regulated environments like government, you can't have a single monolithic AI system that both processes sensitive data and talks directly to users. You need clear boundaries between components, with the AI running in a secure, isolated environment. But users still need to understand what the AI is doing!
That's why I built streamll. It bridges this gap between DSPy and the rest of your infrastructure. With streamll, you can surface the AI's intermediate reasoning steps and progress updates to your existing systems, like Redis and RabbitMQ, in real time.
How it works
@streamll.instrument(stream_fields=["analysis"])
class DocumentAnalysis(dspy.Module):
    def __init__(self):
        super().__init__()
        self.retrieve = dspy.Retrieve(k=10)
        self.analyse = dspy.ChainOfThought("documents, query -> analysis")

    def forward(self, query):
        with streamll.trace("document_retrieval") as ctx:
            docs = self.retrieve(query)
            ctx.emit("records_found", data={
                "count": len(docs),
                "relevance_scores": [d.score for d in docs[:3]]
            })
        # This streams tokens as they're generated
        return self.analyse(documents=docs, query=query)
In this example, the DSPy module is running in a secured environment, but it's using streamll to emit events about what it's doing, including real-time token-by-token updates as it generates the final analysis. The web application, which might be running in a completely different place, can receive these events over a message queue or WebSocket. So instead of just seeing a loading spinner, the user gets meaningful updates like "AI found 12 relevant documents" followed by the analysis output appearing word-by-word.
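On the receiving side, the web app only has to translate incoming events into user-facing status messages. Here's a minimal consumer-side sketch; the wire format (a JSON object with "event" and "data" fields) is my assumption for illustration, not streamll's documented format:

```python
import json

def render_status(raw_event: str) -> str:
    """Turn a serialised streamll-style event into a user-facing status line.

    The event shape here ("event" name plus "data" payload) is assumed
    for illustration only.
    """
    event = json.loads(raw_event)
    name = event.get("event")
    data = event.get("data", {})
    if name == "records_found":
        return f"AI found {data.get('count', 0)} relevant documents"
    if name == "token":
        # Token events would be appended to the visible analysis text
        return data.get("text", "")
    return f"Working: {name}"

# An event like the one emitted by ctx.emit() in the example above
print(render_status('{"event": "records_found", "data": {"count": 12}}'))
```

The same function works whether the events arrive over a Redis stream, a RabbitMQ queue, or a WebSocket; the transport is independent of the rendering.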
Under the hood
streamll handles several complex challenges:
- It's deeply integrated with DSPy, so it can tap into both manual tracing (the streamll.trace context manager) and automatic callbacks from DSPy itself (lm.start, module.forward.end, etc.).
- It uses a multi-layered context propagation system to correlate events, even across complex nested operations. This is how it keeps track of which events belong to which high-level operation.
- It can emit events directly to production-grade systems like Redis streams and RabbitMQ queues—the same infrastructure you're already using for inter-service communication.
- It handles the async/sync impedance mismatch, so you can use async event sinks even in synchronous DSPy code.
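That last bridge is worth a closer look. A common way to drive async sinks from synchronous code is to keep a dedicated event loop alive on a background thread and submit coroutines to it. This is a minimal sketch of that general pattern, not streamll's actual implementation:

```python
import asyncio
import threading

class AsyncSinkBridge:
    """Run async event sinks from synchronous code via a background loop."""

    def __init__(self):
        self.loop = asyncio.new_event_loop()
        # A daemon thread keeps the loop running without blocking shutdown
        self._thread = threading.Thread(target=self.loop.run_forever, daemon=True)
        self._thread.start()

    def emit(self, coro, timeout=5.0):
        """Schedule an async sink call from sync code and wait for the result."""
        future = asyncio.run_coroutine_threadsafe(coro, self.loop)
        return future.result(timeout=timeout)

# Hypothetical async sink: in production this would write to Redis or RabbitMQ
received = []

async def sink(event):
    received.append(event)

bridge = AsyncSinkBridge()
bridge.emit(sink({"event": "records_found", "count": 12}))  # callable from sync code
```

The key is run_coroutine_threadsafe, which hands the coroutine to the background loop and returns a concurrent future the synchronous caller can wait on.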
The hardest part was building all of this in a production-hardened way. Integrating with DSPy's evolving execution model, handling streaming edge cases, and bridging between async and sync code added up to over 2000 lines of complex infrastructure code. But the end result is an @streamll.instrument decorator that "just works" and lets you focus on your actual application logic.
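To give a feel for the core idea behind that decorator, here is a deliberately tiny, hypothetical sketch: it wraps a class's forward method so start and end events are recorded around every call. The real @streamll.instrument layers context propagation, streamed fields, and async sinks on top of this basic shape:

```python
import functools

def instrument(cls):
    """Toy instrumentation decorator: wrap forward() with start/end events.

    A simplified sketch only; it records events in a list instead of
    emitting them to an external sink.
    """
    events = []
    original_forward = cls.forward

    @functools.wraps(original_forward)
    def forward(self, *args, **kwargs):
        events.append(("module.forward.start", cls.__name__))
        try:
            return original_forward(self, *args, **kwargs)
        finally:
            # Emitted even if forward() raises, so traces are never left open
            events.append(("module.forward.end", cls.__name__))

    cls.forward = forward
    cls._events = events  # exposed only so the example is inspectable
    return cls

@instrument
class Echo:
    def forward(self, x):
        return x

Echo().forward("hi")
print(Echo._events)  # [('module.forward.start', 'Echo'), ('module.forward.end', 'Echo')]
```

Because the wrapping happens once at class decoration time, the instrumented module keeps its normal call signature and the application code never has to change.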
Why this matters
I'm open-sourcing streamll because I think this is a common problem. As AI systems get more sophisticated, we need better tools for making them observable and explainable. Regulators are (rightly) demanding more transparency, and users deserve to know what's happening behind the scenes.
streamll is my attempt to make this kind of observability accessible to everyone building AI applications with DSPy. It's not a perfect solution, but I hope it's a step in the right direction. And I'm excited to see what the community does with it!