
AI Integration Projects

A collection of experiments with GPT models, Stable Diffusion, and Whisper that balance ambition, practicality, and student budgets.

Large language models and generative AI captured my attention the moment GPT-3 became public. The first prototypes I built were rough and often failed, but each attempt revealed where AI could genuinely improve student workflows. Over time those experiments converged into three strands: learning assistants, creative tooling, and automated note taking.

Key outcome: Delivered working prototypes for summarising lectures, generating study aids, and producing consistent synthetic characters while mapping the limits of current AI APIs.

Learning Assistant: LeanGPT

LeanGPT started as a tool to transform lecture recordings and course material into flashcards and summaries. Built with Python and Flet, it provided a simple desktop interface where students could paste text and receive structured outputs. Prompt engineering was an iterative process; small changes in wording often produced large swings in quality. Even so, the assistant proved the concept: AI can prepare revision material in minutes.
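
As a sketch of the shape of that interface (the generate_study_aids helper stands in for the actual GPT call and prompt, and the control layout is illustrative, not the real LeanGPT code), a minimal Flet app looks like this:

```python
import flet as ft


def generate_study_aids(text: str) -> str:
    # Placeholder: in LeanGPT this is where the GPT API call and the
    # carefully iterated prompt turn raw lecture text into flashcards
    # and summaries.
    return f"Summary of {len(text.split())} words of lecture material..."


def main(page: ft.Page):
    page.title = "LeanGPT sketch"
    source = ft.TextField(label="Paste lecture text", multiline=True, min_lines=8)
    output = ft.Text()

    def on_generate(e):
        output.value = generate_study_aids(source.value or "")
        page.update()

    page.add(source, ft.ElevatedButton("Generate study aids", on_click=on_generate), output)


ft.app(target=main)
```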

Automated Note Taking

Because I prefer to focus entirely on lectures, I assembled a pipeline to take notes for me. The workflow:

1. Record the lecture audio.
2. Transcribe the recording with Whisper V2.
3. Drive a web chat UI with Power Automate (no API access at the time) to summarise the transcript.
4. Save the generated notes.
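
A minimal sketch of the transcription step (step 2 above) using the open-source openai-whisper package; the file names are illustrative:

```python
import whisper

# Load the large-v2 checkpoint ("Whisper V2") used in the pipeline.
model = whisper.load_model("large-v2")

# Transcribe a recorded lecture; result["text"] holds the full transcript.
result = model.transcribe("lecture_recording.mp3")

with open("lecture_transcript.txt", "w", encoding="utf-8") as f:
    f.write(result["text"])
```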

The system was fragile—Power Automate occasionally clicked the wrong UI element—but it demonstrated how off-the-shelf tools can be orchestrated into an end-to-end solution even without full API access.

Practical notes from that phase: I relied on Whisper V2 for fairly accurate transcriptions, but the fragile glue was the UI automation. Over time I learned that replacing screen-driven automation with API-accessible steps dramatically increases reliability. By 2024 I reworked parts of the pipeline to use Gemini where possible and reduced the number of brittle pieces to a single, more robust flow.
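
For illustration, the step that once required Power Automate clicking through a web UI reduces to a single call once the model is reachable programmatically. A minimal sketch with the google-generativeai package (the prompt wording and file name are my assumptions, not the exact production flow):

```python
import google.generativeai as genai

# Assumed: an API key supplied via configuration.
genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-pro")

# The transcript file name is illustrative (see the Whisper sketch above).
with open("lecture_transcript.txt", encoding="utf-8") as f:
    transcript = f.read()

response = model.generate_content(
    "Summarise this lecture transcript into concise revision notes:\n\n" + transcript
)
print(response.text)
```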

Stable Diffusion: Consistent Digital Characters

In parallel, I explored Stable Diffusion for visual storytelling. Around 2023 I focused on creating convincing, photorealistic humans ("fake humans", as I called them) without any fine-tuning. Getting there took many iterations of prompt engineering, careful seeding, and blending techniques (LoRAs and prompt chaining at the time). The reward was a surprisingly consistent character set that stayed realistic and coherent across many images, a much harder problem then than it is today.

That work taught me how to control latent-space artifacts, the value of seed and prompt templates for reproducibility, and how to combine generative images with creative projects such as book covers or character illustrations.
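
To make the reproducibility point concrete (the checkpoint, character description, and seeds here are placeholders, not my original settings), a fixed seed plus a prompt template in the diffusers library pins a character down across scenes:

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# A reusable prompt template keeps the character description constant
# while the scene varies.
CHARACTER = "portrait photo of a woman with short auburn hair, freckles"
TEMPLATE = "{character}, {scene}, 85mm, natural light, photorealistic"

for i, scene in enumerate(["in a cafe", "walking in a park"]):
    # Seeding the generator explicitly makes each image reproducible.
    generator = torch.Generator("cuda").manual_seed(1234 + i)
    image = pipe(TEMPLATE.format(character=CHARACTER, scene=scene),
                 generator=generator).images[0]
    image.save(f"character_{i}.png")
```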

What These Projects Revealed

Three lessons stood out. Prompt wording drives output quality more than the surrounding code, so prompt design deserves the same care as product UX. Screen-driven automation is brittle, and moving to API-accessible steps pays off immediately in reliability. Finally, reproducibility in generative image work comes from disciplined use of seeds and prompt templates.

Next Steps

I rebuilt LeanGPT as a web application using Next.js with Gemini Pro as the language model. The migration was driven by Gemini's large context window and improved token handling, which collapsed multi-stage pipelines (transcribe → summarise → flashcards) into fewer, more reliable prompts. Gemini also reduced the amount of brittle glue code I needed to orchestrate multiple models.
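
The rebuild itself is a Next.js app, but the shape of the change is easiest to sketch in Python with the google-generativeai package (the prompt wording, file name, and JSON shape are my assumptions):

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # assumed configuration
model = genai.GenerativeModel("gemini-pro")

with open("lecture_transcript.txt", encoding="utf-8") as f:
    transcript = f.read()

# One long-context prompt replaces the old summarise -> flashcards chain;
# transcription still happens upstream.
prompt = (
    "From the lecture transcript below, produce:\n"
    "1. A concise summary (at most 200 words).\n"
    '2. Ten flashcards as JSON: [{"q": "...", "a": "..."}]\n\n'
    + transcript
)
print(model.generate_content(prompt).text)
```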

As an aside, many of my early AI experiments date back to 2020–2022, when tools were less accessible. The landscape changed rapidly; what needed multi-step hacks in 2022 can often be handled with a single model prompt today. The overall lesson: pick the right abstraction layer (API-driven workflows where possible) and treat prompt design like product UX, because in many LLM projects it defines behaviour more than the surrounding code.
