Large language models and generative AI captured my attention the moment GPT-3 became public. The first prototypes I built were rough and often failed, but each attempt revealed where AI could genuinely improve student workflows. Over time those experiments converged into three strands: learning assistants, creative tooling, and automated note taking.
Learning Assistant: LeanGPT
LeanGPT started as a tool to transform lecture recordings and course material into flashcards and summaries. Built with Python and Flet, it provided a simple desktop interface where students could paste text and receive structured outputs. Prompt engineering was an iterative process—small changes in wording often produced large swings in quality—but the assistant proved the concept that AI can prepare revision material in minutes.
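One recurring piece of that iteration was getting structured output back out of free-form model text. The sketch below shows the general parsing approach with a hypothetical `Q:`/`A:` convention — it is not the actual LeanGPT code, just an illustration of turning a model response into flashcard objects:

```python
import re
from dataclasses import dataclass


@dataclass
class Flashcard:
    question: str
    answer: str


def parse_flashcards(model_output: str) -> list[Flashcard]:
    """Parse 'Q: ... / A: ...' pairs from a model response.

    The Q:/A: format is an assumed convention for this sketch; the
    real prompt asked for a similar machine-readable structure.
    """
    cards = []
    # Each Q: block is followed by its A: block; the lookahead stops
    # the answer at the next question or at end of string.
    pattern = re.compile(r"Q:\s*(.+?)\s*A:\s*(.+?)(?=\nQ:|\Z)", re.DOTALL)
    for q, a in pattern.findall(model_output):
        cards.append(Flashcard(question=q.strip(), answer=a.strip()))
    return cards


sample = """Q: What is Flet?
A: A Python framework for building desktop and web UIs.
Q: Why iterate on prompts?
A: Small wording changes can cause large quality swings."""

cards = parse_flashcards(sample)
```

Asking the model for a rigid output format and parsing it defensively was far more reliable than hoping for consistent free text.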
Automated Note Taking
Because I prefer to focus entirely on lectures, I assembled a pipeline to take notes for me. The workflow:
- Record lecture audio and drop the file into a monitored folder.
- Use Whisper V2 to transcribe the recording locally.
- Store the transcript in OneDrive, where a Power Automate script pushes the content into a Claude 2 chat for summarisation.
- Return the cleaned notes to an Obsidian vault for further study.
 
The system was fragile—Power Automate occasionally clicked the wrong UI element—but it demonstrated how off-the-shelf tools can be orchestrated into an end-to-end solution even without full API access.
Practical notes from that phase: Whisper V2 delivered fairly accurate transcriptions, but the fragile glue was the UI automation. Over time I learned that replacing screen-driven automation with API-accessible steps dramatically increases reliability. By 2024 I had reworked parts of the pipeline to use Gemini where possible, reducing the brittle pieces to a single, more robust flow.
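Part of what makes API-driven steps more robust is that failures are explicit and recoverable: a failed call can simply be retried, whereas a mis-clicked UI element silently derails the run. A generic retry-with-backoff sketch of that pattern (illustrative, not lifted from the pipeline):

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(fn: Callable[[], T],
                      attempts: int = 3,
                      base_delay: float = 1.0) -> T:
    """Call an API function, retrying transient failures with
    exponential backoff. Re-raises after the final attempt."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)
    raise RuntimeError("unreachable")


# Demo: a flaky stand-in for an API call that fails twice, then succeeds.
calls = {"n": 0}


def flaky_summarise() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient failure")
    return "summary text"


result = call_with_retries(flaky_summarise, attempts=5, base_delay=0.0)
```

There is no equivalent of this wrapper for screen-driven automation: when the click lands on the wrong element, nothing raises.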
Stable Diffusion: Consistent Digital Characters
In parallel, I explored Stable Diffusion for visual storytelling. Around 2023 I focused on creating highly convincing, photorealistic humans — what I called "fake humans" — without fine-tuning. It required many iterations, prompt engineering, careful seeding and blending techniques (LoRAs and prompt chaining at the time). The reward was a surprisingly consistent character set that looked realistic and coherent across many images — a much harder problem then than it is today.
That work taught me how to control latent-space artifacts, the value of seed and prompt templates for reproducibility, and how to combine generative images with creative projects such as book covers or character illustrations.
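The reproducibility habit amounted to recording every parameter that influenced a generation alongside the image. A minimal sketch of such a recipe record — field names are illustrative; the real templates also tracked LoRA weights and sampler settings:

```python
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class GenerationRecipe:
    """Everything needed to reproduce one Stable Diffusion image."""
    prompt: str
    negative_prompt: str
    seed: int
    steps: int
    cfg_scale: float

    def to_json(self) -> str:
        # sort_keys gives a stable on-disk representation for diffing.
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, data: str) -> "GenerationRecipe":
        return cls(**json.loads(data))


recipe = GenerationRecipe(
    prompt="portrait photo of a woman, 35mm, natural light",
    negative_prompt="blurry, deformed",
    seed=1234,
    steps=30,
    cfg_scale=7.0,
)
restored = GenerationRecipe.from_json(recipe.to_json())
```

With the seed and full prompt pinned down like this, regenerating a near-identical image months later becomes a lookup rather than an archaeology project.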
What These Projects Revealed
- API limitations (rate limits, model capacity, cost) often dictate architecture as much as functionality.
- Prompt design is closer to UX research than coding—observe behaviour, iterate, and document successful patterns.
- Partial automation still adds value; human review remains essential when accuracy matters.
 
Next Steps
I rebuilt LeanGPT as a web application using Next.js, with Gemini Pro as the language model. The migration was driven by Gemini's large context window and improved token handling, which collapsed the multi-stage pipeline (transcribe → summarise → flashcards) into fewer, more reliable prompts. Gemini also reduced the amount of brittle glue code I needed to orchestrate multiple models.
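The structural change is easy to see side by side. This sketch contrasts the old three-call shape with the single large-context prompt; the prompt wording and the stub model are illustrative, not the production code:

```python
from typing import Callable

ModelFn = Callable[[str], str]


def staged_pipeline(transcript: str, model: ModelFn) -> str:
    """Old shape: three separate prompts with glue code between stages."""
    summary = model(f"Summarise this lecture transcript:\n{transcript}")
    cleaned = model(f"Rewrite these notes for clarity:\n{summary}")
    return model(f"Turn these notes into flashcards:\n{cleaned}")


def single_prompt_pipeline(transcript: str, model: ModelFn) -> str:
    """New shape: one prompt into a large-context model, asking for
    all stages at once."""
    return model(
        "Summarise the lecture transcript below, clean up the notes, "
        f"and produce flashcards, in that order:\n{transcript}"
    )


# Stub model that records calls, to show the structural difference.
calls: list[str] = []


def stub_model(prompt: str) -> str:
    calls.append(prompt)
    return "output"


staged_pipeline("lecture text", stub_model)
staged_calls = len(calls)

calls.clear()
single_prompt_pipeline("lecture text", stub_model)
single_calls = len(calls)
```

Fewer calls means fewer failure points and fewer places where one stage's output format can break the next stage's input parsing.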
As an aside, many of my early AI experiments date back to 2020–2022 when tools were less accessible. The landscape changed rapidly — what needed multi-step hacks in 2022 can often be handled with a single model prompt today. The overall lesson: pick the right abstraction layer (API-driven workflows where possible) and treat prompt design like product UX — it defines behaviour more than the surrounding code in many LLM projects.