Instruments that help people leverage AI
Context engineering. Retrieval workflows. Reproducible benchmarking. Spec management for the humans and agents doing the work.
One sentence of tool selection guidance eliminated a 13-point accuracy penalty from over-tooling.
Pre-rendered model views scored 0.893 vs 0.558 for agent-assembled context. d=1.01, N=10. 4x cheaper.
Exploratory study, single corpus, N=3-10 replications. Full methodology and threats →
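For readers unfamiliar with the d figure above: it is an effect size (Cohen's d), the difference between two group means expressed in units of their spread. The sketch below assumes the common pooled-standard-deviation form for two independent groups; the study may use a different variant (for example a paired one), and the function name and sample scores are hypothetical, not data from the benchmark.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Cohen's d for two independent samples, using the pooled sample variance.

    Assumption: pooled-SD variant. This is an illustration, not the
    study's actual analysis code.
    """
    n_a, n_b = len(group_a), len(group_b)
    # Weight each group's sample variance by its degrees of freedom.
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


# Hypothetical scores, chosen only to make the arithmetic easy to follow:
# the means are 4 and 3, both sample variances are 4, so the pooled SD is 2.
print(cohens_d([2, 4, 6], [1, 3, 5]))
```

By the usual rule of thumb, a d of about 0.8 or more is considered a large effect, which is why a value near 1 is worth reporting even at small N.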
What we build
Tracks work through orient-plan-agree-execute-reflect-report. Task DAGs, propagation, stakeholder dispositions. Rust.
Syncs reusable AI agent instructions from git-based registries. Bidirectional, multi-registry, drift detection. Rust.
Manages developer toolchains from git registries. Checksum and signature verification. Generates mise config. Rust.
Tmux sessions organized into verticals and remotes. Save, restore, server isolation. One keybind to switch. Rust.
Structural retrieval, graph traversal, and completeness checking for SysML v2 models. Rust. 14 commands, 10 MCP tools.
Reproducible evaluation of tool-augmented LLMs on structured engineering tasks. Python.
Four primitives for LLM-correct codebases. Derived obligations, prescriptive failure, bundled enforcement.
Converts OMG KeBNF specs to ANTLR4 and tree-sitter. Bridges OMG specifications and working parsers. Rust.
Tree-sitter grammar for SysML v2. 6 language bindings. The parsing foundation for the MBSE toolchain.
How this started
The lab started with a narrow question: how does AI interact with structured engineering artifacts? We built tools, ran benchmarks, wrote papers, and the same shape kept showing up across domains. Along the way we found alignment with GitLab's Knowledge Graph team, who are solving related context-engineering and retrieval problems at production scale on the SDLC side. We've been contributing findings on prescriptive failure patterns and tool description effectiveness into their eval methodology.
Everything is MIT-licensed and on GitLab. If you have a domain with structured artifacts and want to know where AI leverage actually lives inside it, we'd like to talk.