Pick a client and Detto pulls their briefing from your vault — key contacts, current priorities, and upcoming dates — and lays it in front of you before the call starts. While you talk, it records both sides, separates the speakers, and writes that same context straight into the transcript header. The output file is renamed to include the client, so every capture lands in your vault already labeled, linked, and easy to find later. Capture without context is just noise — Detto makes sure every recording knows who it's about and why it matters.
Every capture is a plain .md file with YAML frontmatter — type, date, duration, attendees, tags. Drop it in Obsidian, open it in VS Code, or pipe it to an agent. No proprietary format, no export step, no lock-in. Your vault stays organized without manual effort.
Parakeet-TDT v3 runs ASR inference entirely on your Mac's Neural Engine. No audio is sent anywhere — capture happens in memory.
Llama 3.2 3B (4-bit) cleans up filler words, punctuation, and grammar locally via MLX. Two-phase: a transcript appears instantly, then refines in the background.
After a one-time ~3 GB model download, everything runs offline. The only other network call is an optional update check to a static file on GitHub.