Results
What we've measured so far. All data is open and all experiments are designed for independent replication.
AI on SysML v2 Model Comprehension
Representation dominates retrieval.
Pre-rendered views scored 0.893 versus 0.558 for agent-assembled context (Cohen's d = 1.01). Tool guidance eliminated a 13-point penalty. Vector search and graph traversal produced null results.
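As a back-of-envelope check, assume d is Cohen's d computed with a pooled standard deviation s_p (the page reports only the statistic, not group sizes or variances). Then the two reported means imply a pooled standard deviation of roughly 0.33:

```latex
d = \frac{\bar{x}_{\text{pre-rendered}} - \bar{x}_{\text{agent}}}{s_p},
\qquad
s_p = \sqrt{\frac{(n_1 - 1)\,s_1^2 + (n_2 - 1)\,s_2^2}{n_1 + n_2 - 2}}

s_p = \frac{0.893 - 0.558}{1.01} = \frac{0.335}{1.01} \approx 0.33
```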
GitLab Knowledge Graph for SDLC Queries
GKG improves accuracy over the baseline by 77% relative (+21 percentage points).
Tested with Claude Sonnet 4, n=20, across 31 test fixtures. Multi-hop queries show the largest effect. Worked examples in tool descriptions are essential: without them, accuracy was 0%. Experiments ran against a DuckDB simulation, not the production GKG.
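The sketch below shows what "a worked example in the tool description" can look like for a multi-hop graph query. It uses the name / description / input_schema shape of tool definitions in the Anthropic Messages API; the tool name `query_graph`, the edge names `closes_issue` and `assigned_to`, and the input format are hypothetical illustrations, not GKG's actual interface.

```python
import json

# Hypothetical worked example: a natural-language question paired with the
# exact tool input that answers it. Edge names are illustrative, not GKG's.
WORKED_EXAMPLE = {
    "question": "Who is assigned to the issues closed by MR !42?",
    "input": {
        "entity": "MergeRequest",
        "id": 42,
        "traverse": ["closes_issue", "assigned_to"],  # two hops
    },
}


def build_tool(include_example: bool = True) -> dict:
    """Return a hypothetical tool definition, optionally with a worked example."""
    description = (
        "Query a knowledge graph of SDLC entities. Start at one entity and "
        "follow edges in order; each edge is one hop."
    )
    if include_example:
        # Append the question/input pair so the model sees a concrete call.
        description += (
            "\n\nWorked example:\n"
            f"Question: {WORKED_EXAMPLE['question']}\n"
            f"Input: {json.dumps(WORKED_EXAMPLE['input'])}"
        )
    return {
        "name": "query_graph",
        "description": description,
        "input_schema": {
            "type": "object",
            "properties": {
                "entity": {"type": "string", "description": "Start node type"},
                "id": {"type": "integer", "description": "Start node ID"},
                "traverse": {
                    "type": "array",
                    "items": {"type": "string"},
                    "description": "Edge names to follow, in order",
                },
            },
            "required": ["entity", "id", "traverse"],
        },
    }


tool = build_tool()
print(tool["description"])
```

Calling `build_tool(include_example=False)` produces the same schema without the example, which mirrors the with/without comparison described above.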
Feedback Signal Properties for LLM Code Repair
Precision matters more than brevity.
Naming what failed and what was expected keeps accuracy high. Brief feedback uses 47% fewer tokens at the same accuracy. Current-generation models hit a ceiling on these tasks. More runs are needed.
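To make "precision" concrete, here is a minimal sketch contrasting two equally short feedback messages for a code-repair loop. The `Failure` record, both formatter functions, and the test name are hypothetical; only the idea of naming what failed and what was expected comes from the results above.

```python
from dataclasses import dataclass


@dataclass
class Failure:
    """Hypothetical record of one failing test."""
    test: str
    expected: str
    actual: str


def vague_feedback(failures: list[Failure]) -> str:
    # Brief, but names neither the failing test nor the expected value.
    return f"{len(failures)} test(s) failed. Please fix the code."


def precise_feedback(failures: list[Failure]) -> str:
    # Equally brief, one line per failure: what failed, what was expected,
    # and what was actually observed.
    return "\n".join(
        f"FAIL {f.test}: expected {f.expected}, got {f.actual}" for f in failures
    )


failures = [Failure("test_add_negative", "-3", "3")]
print(vague_feedback(failures))    # 1 test(s) failed. Please fix the code.
print(precise_feedback(failures))  # FAIL test_add_negative: expected -3, got 3
```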