The goal was to free up attention during lectures while still leaving with ready-to-study material. I combined local scripts, cloud storage, and large language models to handle capture, transcription, summarisation, and spaced-repetition cards without manual intervention.
Baseline Workflow
- Record audio on my phone and drop the file into a monitored local folder.
- A watcher script triggers Whisper V2, producing transcripts with timestamps (a minimal sketch follows this list).
- The transcript syncs to OneDrive, where a Power Automate flow dispatches it to Claude 2 along with prompt templates.
- Claude returns a lecture summary plus flashcards, saved as markdown for Obsidian/RemNote.
 
Stability Challenges
Power Automate originally drove the Claude integration through UI scripting. Any UI tweak, latency spike, or mis-click could stall the run. Missing transcripts or truncated outputs made it clear that deterministic automation needs APIs, retries, and logging.
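The fix is nothing exotic; a sketch of what a retry-and-log wrapper around an external call can look like (function names and intervals are illustrative):

```python
import logging
import time

logging.basicConfig(level=logging.INFO,
                    format="%(asctime)s %(levelname)s %(message)s")
log = logging.getLogger("pipeline")

def call_with_retries(fn, *args, attempts=3, backoff_seconds=10, **kwargs):
    """Run an API call, logging every attempt so a failed run leaves a trail."""
    for attempt in range(1, attempts + 1):
        try:
            result = fn(*args, **kwargs)
            log.info("%s succeeded on attempt %d", fn.__name__, attempt)
            return result
        except Exception as exc:
            log.warning("%s attempt %d/%d failed: %s", fn.__name__, attempt, attempts, exc)
            if attempt == attempts:
                raise
            time.sleep(backoff_seconds * attempt)  # linear backoff is enough at this scale
```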
Migration to Gemini
In 2024 I rerouted the pipeline to Gemini, whose larger context window let a single prompt handle transcription cleanup, summarisation, and Q&A generation. The move also reduced the number of services involved and replaced UI scripting with authenticated API calls.
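The core call is a single authenticated request; a rough sketch, assuming the google-generativeai Python SDK, with the model name, prompt wording, and key handling as illustrative placeholders:

```python
import os

import google.generativeai as genai

genai.configure(api_key=os.environ["GEMINI_API_KEY"])
model = genai.GenerativeModel("gemini-1.5-pro")  # a long-context model; name illustrative

PROMPT_TEMPLATE = """You will receive a raw lecture transcript with timestamps.
1. Correct obvious transcription errors without changing the meaning.
2. Write a structured summary with short headings.
3. Produce 10 question/answer flashcards in markdown.

Transcript:
{transcript}
"""

def summarise(transcript: str) -> str:
    # One prompt covers cleanup, summary, and flashcards thanks to the large context window
    response = model.generate_content(PROMPT_TEMPLATE.format(transcript=transcript))
    return response.text
```

Wrapping this call in a retry-and-log helper like the one above covers the failure modes that used to stall the Power Automate flow.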
What Stuck
- Observability: logging each step (recording, transcription, prompt, response) makes debugging trivial.
- Prompt discipline: templates versioned alongside code yield predictable outputs even as models evolve.
- Fallbacks: when LLM latency spikes, the pipeline stores partial outputs and retries rather than failing silently (sketched after this list).
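A minimal sketch of that fallback idea, assuming partial results are staged as JSON files next to the transcripts; paths, stage names, and timings are illustrative:

```python
import json
import time
from pathlib import Path

PARTIAL_DIR = Path("~/Lectures/partial").expanduser()   # staging area (illustrative)
PARTIAL_DIR.mkdir(parents=True, exist_ok=True)

def save_partial(lecture_id: str, stage: str, payload: dict) -> Path:
    """Persist whatever a stage produced before failing, so a retry can resume
    from the last good artifact instead of re-running the whole pipeline."""
    out = PARTIAL_DIR / f"{lecture_id}.{stage}.json"
    out.write_text(json.dumps(payload, ensure_ascii=False, indent=2), encoding="utf-8")
    return out

def run_stage(lecture_id: str, stage: str, fn, payload: dict, attempts: int = 3):
    for attempt in range(attempts):
        try:
            return fn(payload)
        except TimeoutError:
            save_partial(lecture_id, stage, payload)  # never fail silently
            time.sleep(60 * (attempt + 1))
    raise RuntimeError(f"{stage} failed for {lecture_id}; partial output saved")
```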
 
The system now turns raw lecture audio into clean notes with minimal intervention. The same architecture is reusable for meeting capture or research interviews, and it keeps evolving as new high-context models appear.