origin · duckdb-wasm · log-analysis · privacy · wasm

How DuckViz started — debugging logs without the infra

From a single-file log parser to a privacy-first AI data platform. The story of how DuckViz came together, one constraint at a time.

Vikas Awaghade · 6 min read

I've been chasing log files for fifteen years. Frontend work, security tooling, SIEM workflows, pipeline triage — somewhere in the middle of it I noticed the same scene repeating: you have a 200MB log on your laptop, a customer waiting for an answer, and the only "real" path forward is to spin up Logstash, ship the file somewhere, build a pipeline, fight grok patterns, and finally — finally — open Kibana to look at the data.

DuckViz started as a refusal to keep doing that.

The first cut: parse it locally, query it locally

The first version wasn't a product. It was a question: how much of the log-analysis stack actually has to be a stack?

A grep gives you lines. A pipeline gives you fields. Between those is a parser — and parsers don't need infra. They need a regex, a schema, and somewhere fast to run. So I started there.

I wrote a Rust log parser, compiled it to WebAssembly, and ran it in a Web Worker on the same page where the file was opened. No upload. No backend. The file's bytes never left the laptop. Fields came out as JSON rows on the other end, ready to be queried — by someone, somewhere.
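A minimal sketch of that wiring, assuming a wasm-bindgen build of the parser with a hypothetical parse_chunk export and module path (./pkg/log_parser); the real layout differs, but the shape is the same:

    // parser.worker.ts: runs off the main thread, so parsing never blocks the UI
    // and the file's bytes never leave the page. Assumes a wasm-bindgen build
    // exposing a hypothetical parse_chunk(bytes: Uint8Array): string export.
    import init, { parse_chunk } from "./pkg/log_parser";

    const ready = init(); // instantiate the WASM module once per worker

    self.onmessage = async (e: MessageEvent<ArrayBuffer>) => {
      await ready;
      const rows = JSON.parse(parse_chunk(new Uint8Array(e.data))); // JSON rows out
      self.postMessage(rows);
    };

On the page that opened the file, the hand-off is a transfer rather than a copy (handleRows and file stand in for the host page's own handlers):

    // main thread: hand the picked file's bytes to the worker; nothing is uploaded
    const worker = new Worker(new URL("./parser.worker.ts", import.meta.url), { type: "module" });
    const buffer = await file.arrayBuffer();       // file comes from an <input type="file">
    worker.postMessage(buffer, [buffer]);          // transfer ownership, no copy
    worker.onmessage = (e) => handleRows(e.data);  // rows, ready for the query layer

Transferring the ArrayBuffer instead of cloning it is what keeps a 200MB file from doubling in memory before parsing even starts.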

That "someone" was the next problem.

DuckDB-WASM: the missing query layer

Parsed JSON without a query engine is a half-finished tool. You can .filter() arrays in JavaScript, but the moment your log has half a million rows you're back to loading data into a "real" database — which is exactly the round trip I was trying to delete.

DuckDB-WASM was the second answer. A columnar engine that runs in the browser and supports the SQL you'd actually write in production. The same WHERE, the same GROUP BY, the same window functions — except the data is sitting in a Web Worker on the same page as the parser. Round trip to a remote engine: zero milliseconds, because there is no remote engine.
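Wiring the parser's output into DuckDB-WASM is only a few calls. A sketch, assuming rows holds the parser's JSON output and the logs happen to have a status field; worker setup is simplified from the @duckdb/duckdb-wasm docs, which wrap the worker URL in a Blob for cross-origin bundles:

    import * as duckdb from "@duckdb/duckdb-wasm";

    // Start DuckDB in its own Web Worker, next to the parser's worker.
    const bundle = await duckdb.selectBundle(duckdb.getJsDelivrBundles());
    const worker = new Worker(bundle.mainWorker!);
    const db = new duckdb.AsyncDuckDB(new duckdb.ConsoleLogger(), worker);
    await db.instantiate(bundle.mainModule, bundle.pthreadWorker);

    // Register the parsed rows as a virtual file and load them as a table.
    const conn = await db.connect();
    await db.registerFileText("rows.json", JSON.stringify(rows));
    await conn.insertJSONFromPath("rows.json", { name: "logs" });

    // The SQL you'd write in production, with no remote engine in the loop.
    const result = await conn.query(`
      SELECT status, count(*) AS hits
      FROM logs
      GROUP BY status
      ORDER BY hits DESC
    `);
    console.table(result.toArray());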

Stitching those two together was the moment DuckViz stopped being a parser and started being a tool. Open a file. Get a queryable table. Write SQL. See results. No login required for the first thirty seconds.

After the query, the question

Engineers don't open logs to admire them. They open logs because something happened — an outage, a customer report, a security alert — and someone is going to ask "what changed and why?" within the hour. The query gives you the rows. The post-mortem still has to be written.

So we built that next.

The report builder takes a queried dataset and an LLM and produces a structured RCA: incident summary, observed timeline, supporting charts, hypothesis, recommendation. Not a chatbot reply — a document, in the shape your team already writes them in. The crucial constraint we held onto: the LLM never sees raw rows. It sees schemas, samples, and aggregations — the things a good analyst would summarize before showing them to anyone outside the room. Row values stay on the device. That rule shaped every API we shipped after it.
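In code, that rule is just a boundary around what gets serialized for the model. A hypothetical sketch (buildReportContext is not the actual API, and samples are elided), assuming the summaries are computed by the in-browser DuckDB and the parsed table has ts and level columns:

    import type { AsyncDuckDBConnection } from "@duckdb/duckdb-wasm";

    // The only object the LLM ever receives: structure and summaries, never row values.
    interface ReportContext {
      schema: { column: string; type: string }[];  // column names and types
      rowCount: number;
      aggregates: Record<string, unknown>[];       // e.g. events per minute by level
    }

    // Hypothetical helper; every query here runs against the in-browser DuckDB,
    // so the raw rows stay on the device and only this summary leaves it.
    async function buildReportContext(conn: AsyncDuckDBConnection): Promise<ReportContext> {
      const schema = (await conn.query("DESCRIBE logs")).toArray()
        .map((r: any) => ({ column: r.column_name, type: r.column_type }));
      const rowCount = Number(
        (await conn.query("SELECT count(*) AS n FROM logs")).toArray()[0].n
      );
      const aggregates = (await conn.query(`
        SELECT date_trunc('minute', ts) AS minute, level, count(*) AS n
        FROM logs GROUP BY 1, 2 ORDER BY 1
      `)).toArray().map((r: any) => r.toJSON());
      return { schema, rowCount, aggregates };
    }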

Reports turned out to be a wedge. The same generation engine that wrote an RCA could write an executive summary, a weekly metrics digest, a board-deck draft. The difference between those outputs is template, not pipeline.

The expansion: dashboards, decks, embedding

Once a single document could be generated from a dataset, the natural question was: what else? A dashboard is just a report whose sections are charts. A slide deck is a report with stricter geometry. An embeddable widget is a chart that happens to live inside someone else's app.

So we kept pulling threads.

  • Dashboards got a draggable grid with 80+ D3 chart types and an AI flow that picks visualizations from the schema before generating SQL.
  • Decks got a slide presenter that exports to PPTX, so you can hand a PM the file they were going to ask for anyway.
  • Embedding got the React packages — @duckviz/dashboard, @duckviz/explorer, @duckviz/report — so the same in-browser pipeline can ship inside another product without us hosting anything for them (a sketch follows below).
  • The CLI closed the loop for engineers who don't live in a browser tab: npx duckviz ./file.log opens the app with the file already loaded.

None of that was on a roadmap. Each piece was the obvious next thing once the piece before it shipped.
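For the embedding path, the shape is a plain React component. The import and prop names below are illustrative only, not the packages' documented API; the point is the data flow, where the host app passes a file and everything else stays in the user's tab:

    import { Dashboard } from "@duckviz/dashboard"; // real package name; export and props shown
                                                    // here are placeholders, not the documented API

    // Illustrative host-app component: the host hands over a File object and the
    // parsing, SQL, and charts all run inside the user's tab, with nothing
    // uploaded to DuckViz or to the host's own backend.
    export function OpsDashboard({ logFile }: { logFile: File }) {
      return <Dashboard source={logFile} />;
    }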

Real cases that shaped what stayed in

A few moments mattered more than they should have.

A security engineer pasted in 80MB of Sysmon XML, expected it to choke, and watched the WASM parser chew through it in seconds. That's why we kept the per-file log table model and why XML attribute parsing handles single-quoted attributes — Sysmon writes them that way, and most parsers fall over.

A founder generated a board update from a CSV the night before a meeting and exported the deck to PPTX. That's why programmatic PDF and PPTX export is in the package, not behind a server.

An analyst asked, "can I just embed this in our internal tool?" That's why the SDK exists and why the dashboard ships as a React component, not an iframe.

Every one of those interactions tightened a constraint we'd already chosen — nothing leaves the device, the same engine works hosted or embedded, the AI sees structure not values — and pushed the surface area outward.

Why the constraint matters

The privacy posture isn't a marketing line. It's the load-bearing wall. The moment you accept that raw rows can be uploaded "just for analysis," you've shipped a different product: one that needs SOC2 reviews, data-processing addenda, regional storage, and a different conversation with every prospect's security team. We kept the wall up because it makes everything else easier to ship and easier to trust.

DuckViz today is what you get when you keep that wall up and let the rest of the stack reorganize around it: a privacy-first AI data platform where the database, the query, the chart, and the document all run on the same machine that holds the file.

Where we go from here

The pieces that exist now — Explorer, Dashboard, Report, Deck, SDK, CLI — are the kit. The next stretch is depth: more chart types, sharper RCAs, deeper SDK integrations, more file formats handled out of the box.

But the first principle hasn't changed since the very first parser commit: the file shouldn't have to leave your machine to be useful. Everything else is a consequence.

If you've ever waited for a pipeline to be set up so you could answer one question, this is the product I built for you. Open a file at app.duckviz.com and see how far it gets you in five minutes.

— Vikas