Customising LLMs: a case of cognitive seduction?

Over a year ago now, in an article for the AI in Education course, Trevor Baxter and I explored what was then still an emerging shift in the way generative AI was being used in HE: from general-purpose large language models towards greater visibility and awareness of tools that could be customised, grounded and shaped by users. The shift is still slow and ongoing, and there is, I think, a disparity in both awareness and use between students and their teachers (teachers being less aware and doing it less). At the time, much of the public conversation around AI focused on generic chatbot interactions. Early adopters and compulsive fiddlers like me, and a number of people I have got to know these last few years from a range of institutions, had been experimenting with Custom GPTs in ChatGPT, Projects in Claude and, on its UK release towards the end of 2024 I think, Google’s NotebookLM, which changed the user interface and pushed retrieval-augmented generation (RAG) to the forefront of capability awareness.

Domain-specific AI assistants grounded in selected documents, tailored instructions and contextual memory addressed, to an extent, hallucination and trust issues. This meant that, if you were able to push the many, many other issues to one side (!) (and increasingly it seems that students are definitely finding a way to do this), it completely changed the previously typical ‘short prompt in; long piece of superficial but supremely self-assured content out’ interaction. Put even more simply: generic chats are often poor quality, but using (no-code, non-specialist) techniques to ‘ground’ the AI can lead to much improved outcomes. In that article, we discussed how these developments might address some of the persistent frustrations surrounding AI systems beyond hallucinations and generic outputs, arguing that this might be the trigger towards more effective use in educational workflows.
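For anyone who hasn’t peered under the bonnet, the ‘grounding’ idea can be illustrated with a toy sketch. To be clear, this is not how NotebookLM or Custom GPTs are actually implemented (real systems use embedding-based retrieval and an actual model call); the keyword scoring and the function names here are mine, purely for illustration of the principle: retrieve relevant passages from the user’s own documents first, then ask the model to answer only from them.

```python
# A toy sketch of 'grounding': instead of sending a bare question to a
# model, first retrieve the most relevant passages from user-supplied
# documents and prepend them, with an instruction to stick to sources.
# The keyword-overlap scoring is naive and purely illustrative; real RAG
# systems use embedding similarity and then call an actual LLM.

def retrieve(question: str, documents: list[str], k: int = 2) -> list[str]:
    """Rank documents by shared words with the question; keep the top k."""
    q_words = set(question.lower().split())
    scored = sorted(
        documents,
        key=lambda d: len(q_words & set(d.lower().split())),
        reverse=True,
    )
    return scored[:k]

def grounded_prompt(question: str, documents: list[str]) -> str:
    """Build a prompt instructing the model to answer only from sources."""
    context = "\n".join(f"- {d}" for d in retrieve(question, documents))
    return (
        "Answer using ONLY the sources below; say 'not in sources' otherwise.\n"
        f"Sources:\n{context}\n\nQuestion: {question}"
    )

docs = [
    "The module handbook says formative feedback is returned within 10 days.",
    "Campus parking permits are issued by the estates office.",
    "Marking criteria for essays emphasise argument structure and evidence.",
]
print(grounded_prompt("When is formative feedback returned?", docs))
```

Even this crude version shows why grounded outputs feel different: the model is steered back to the user’s material rather than free-associating from its training data.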

At the end of the article, we asked participants three broad questions:

  • What are your thoughts about the customisation potential of these tools?
  • Have you produced or used customised versions of generative AI models?
  • If not, what do you think the primary gains might be, and what might we lose if we use them too much or unthinkingly?

The response from the learning community was, and continues to be, really interesting. Across more than 400 comments over the last year or so, participants rarely settled into simplistic ‘AI is amazing’ or ‘AI is terrible’ positions, which is good given the narratives of AI that persist in both regular and social media. Instead, the discussion revealed something consistently more nuanced, which I’d probably characterise as cautious (granted, also sometimes exuberant) optimism mixed with genuine (granted, sometimes staring-eyed) concern. One of the clearest themes was that participants saw grounding and customisation as a major improvement over generic AI systems, and my own view is that this means we need to ensure that every stakeholder has at least a broad understanding of the implications. Many course participants described a sense of relief at the possibility of more focused, context-aware outputs:

“It feels less like asking a random stranger on the internet and more like consulting an assistant who has actually read the material.”

Another participant nailed the thing that worries me most, namely how far our students are prepared to go along with AI outputs uncritically:

“The accuracy is exciting, but also dangerous because people may trust it too much precisely because it sounds more grounded.”

No aspect of the discussion generated a more visceral reaction than NotebookLM’s podcast feature. While Google has introduced more tools since, which I think are genuinely valuable in terms of reformatting information, it is still the tool most likely to drop the jaws of those unfamiliar with it (even though the default American accent and tone are often cited as reasons not to use it outside the US context). Participants used words like “astounding”, “uncanny”, “disturbing”, “impressive”, “surreal” and “frightening”. The realism of the generated dialogue appeared to unsettle people as much as it impressed them. It made me think that we are now firmly in an era where the uncanny valley effect has seeped into audio media.

Screenshot from Notebook LM interactive ‘call in’ podcast feature.

One participant said:

“I forgot after thirty seconds that the voices were artificial.”

But another noted the commonly expressed feeling:

“It sounds polished and convincing, but also oddly empty, like listening to people perform understanding.”

I think that last point is really critical going forward. Many participants were not simply evaluating technical quality; either explicitly or implicitly, they were addressing issues of trust and deeper connections with content. It’s useful, I think, to acknowledge, as I did in my previous post reflecting on another forum discussion, that all those contributing had made the decision to follow and actively participate in these discussions. This suggests a strong predisposition to think, though I imagine that, in open forums, they are also influenced by others in those same threads. Trust was explicitly raised in terms of the implications for the relationships between students and their teachers, and in terms of incorrect information and misinformation. If synthetic conversations become indistinguishable from human ones, what happens to assumptions around expertise, testimony and authenticity?

Increasingly, the discussion revealed educators who had been experimenting with customised AI systems in interesting and quite sophisticated ways: building assessment-support GPTs, lesson-planning assistants, revision tools and discipline-specific tutoring systems. Others discussed using NotebookLM or Claude Projects to synthesise literature, organise ideas or support research coding. One participant shared:

“I uploaded policy documents, module guides and marking criteria into a custom GPT and immediately got more useful outputs than from standard ChatGPT.”

Another offered a more cautious perspective:

“It’s brilliant at producing structure and starting points, but weak at nuance and disciplinary judgement.”

Perhaps my tendency to be optimistic is clouding my view here, but participants did not, by and large, seem to treat AI as a replacement for expertise, though would they admit it if they were? (Unlikely, I know.) Many explicitly framed it as a collaborative or augmentative tool that could remove repetitive labour, leaving them, the (presumably!) humans, responsible for interpretation, evaluation and decision-making.

Other common gains included:

  • faster synthesis of complex material
  • support for revision and study
  • a change in the way interactions worked, e.g. clear indication of the source of summarised output
  • increased accessibility
  • multilingual support
  • reduced administrative workload
  • support for neurodivergent learners
  • improved contextualisation of AI outputs

“For the first time, I felt AI was pointing me back towards sources rather than away from them.”

“If AI can help with the labour of organisation and sorting, perhaps humans can spend more time actually thinking.”

At the same time, concerns about cognitive offloading cropped up a lot, and increasingly so. I maintain that cognitive offloading is not necessarily a bad thing, though the term currently carries negative connotations. Writing can be a form of cognitive offloading, after all. The most important offloading I do at least twice a day (now, frequently, via the dictation function in an LLM, I have to admit) is the production of to-do lists. Nevertheless, inappropriate cognitive offloading is something we need to spend more time discussing, I think. Participants repeatedly questioned what might happen if learners increasingly bypassed difficult but important cognitive processes such as reading, synthesis, uncertainty and reflection.

“The danger is not that students will cheat. It is that they may stop wrestling with ideas.”

“Convenience is seductive. The risk is that we slowly outsource the struggle that learning depends on.”

And it is that seduction that I think could be a useful lens through which to reflect on our behaviours. One thing I noted was that a lot of folk were feeling (as I have for a long time) that there is another cognitive label we have to confront: cognitive dissonance. Many were excited and uneasy at the same time. They argued that there is genuine potential for accessibility, efficiency (I am still a BIG sceptic about this, noting how I am busier, but likely more productive, than ever) and learning support, while also insisting that critical thinking, disciplinary expertise and human judgement must remain central – even though they see patterns of loss of that centrality as the treacherous AI terrain is navigated awkwardly and unevenly, often without the expert stewardship (for either teacher or student) that is critical. In the same way, though, that I think we need a nuanced perspective on ‘offloading’, I likewise think we need an increasingly nuanced one on the seductive power of customisation. Seduction, to me, connotes being led astray, but people are and will continue to be seduced by things, and a lot of pleasure will undoubtedly follow too. Seduction need not always be negatively connoted: if we are drawn to the Siren call of customisation, we need to go into experimentation with our eyes and ears open (so avoid Odysseus’ beeswax solution), be aware of the promise of potential new ways of seeing, doing and learning, while understanding that there is a real danger of falling too far under a spell.

Blank canvases

Inspired by something I saw in a meeting yesterday morning, I returned today to Gemini Canvas and Claude’s equivalent (still not sure what it is called). Both tools are designed to enable you to “go from a blank slate to dynamic previews to share-worthy creations, in minutes.”

The resource I used was The Renaissance of the Essay? (LSE Impact Blog) and the accompanying Manifesto which Claire Gordon (LSE) and I led on with input from colleagues from LSE and here at King’s. I wondered how easily I could make the manifesto a little more dynamic and interactive. In the first instance I was thinking about activating engagement beyond the scroll and secondly thinking about text inputs and reflections.

The basic version in Gemini was a fourth-iteration output. After an initial, very basic prompt:

“turn this into an interactive web-based and shareable resource”

…I tweaked (using natural language) the underpinning code so that the boxes were formatted better for readability and to minimise scrolling, and the reflection component went from purely additional text to a clickable pop-up. I need to test it with a screen reader to see how that works, of course.

I then experimented with adding reflection boxes and an export-notes function. It took three or four tweaks (largely due to copy-text function limits in the browser), but this is the latest version. Obviously, with work, this could be made to look nicer, but I’m impressed with the initial output and the ability to iterate and add functionality in a very short time (about 15 minutes total).

For the Claude one, I thought I’d try having all those features, including in-text input interaction, from the start. Perhaps that was a mistake, because although the initial output looked great, the text input was buggy. Thirteen iterations later, I got the input fixed. However, the export function that I’d added around version 3 had by then stopped working, so I needed to do a lot more back and forth. In the end, I ran out of time (about 40 minutes in and at version 19) and settled on this version with the inadequate copy/paste function.

It’s all still relatively new, and what’s weird about the whole thing is the continual release of beta tools, experimental spaces and things that in any other context would not be released to the world. Nevertheless, there is already visible utility here and no doubt the tools will continue to improve. I sometimes think that my biggest barrier to finding utility is my own limited imagination; I definitely vibe off seeing what others have done. This further underlines for me a significant problem we have going forward. ‘Here’s a thing,’ they say. ‘What’s it for?’ we ask. ‘I dunno,’ they shrug, ‘efficiency?’

My prompt for this was:
‘tech bros shrugging’

The Manus from U.N.C.L.E.

‘Deploying AI agents’ sounds so hi-tech and futuristic to (non-Comp-Sci) me, while weirdly also being resonant of the classic 60s and 70s TV shows I loved as a kid. I have been fiddling for a while with the blurred boundaries between LLMs and agents, notably with Claude, but what appealed when I first saw Manus was the execution of outputs seemingly beyond what Claude can manage. Funnily enough, it looks quite a bit like Claude, but it seems it is actually a multi-tool agent. I pretty much concur with the conclusion of the MIT Tech Review:

While it occasionally lacks understanding of what it’s being asked to do, makes incorrect assumptions, or cuts corners to expedite tasks, it explains its reasoning clearly, is remarkably adaptable, and can improve substantially when provided with detailed instructions or feedback. Ultimately, it’s promising but not perfect.

Caiwei Chen

Anyway, I finally got in, having been on the Manus waitlist for a while. Developed by Chinese startup Monica, it is an autonomous AI agent capable of executing complex online tasks without ongoing human input, and it created something of a buzz. TL;DR: this is the initial output, from first prompt to web-based execution. The selection and categorisation need honing, but this is, in my view, an impressive output. The second version came after the addition of a follow-up prompt.

Longer version:

I wanted to see what I could get from a single prompt, so I decided to see whether it could build a shareable, searchable web page curating short how-to videos (under five minutes) by higher education educators demonstrating uses of generative AI. I began by asking Manus to collect and cluster videos showing how AI is applied in teaching, assessment, feedback and research (a natural language prompt). Manus responded immediately by creating a structured project directory and initiating web searches to identify relevant video content, starting with collections from institutions like Notre Dame and Harvard (beyond which it didn’t get in the first iteration).

Once videos were sourced, Manus automatically filtered them to ensure they were under five minutes in length (but failed to note that one wasn’t a video and that others linked to the same page, where they were already curated!) and produced by educators in the higher education sector (this it did get right). It then categorised them by thematic area (Teaching, Assessment & Feedback, AI Literacy and Research, though the categorisations were not always spot on) while also tagging institutional affiliations (mostly OK). The tagging and filtering work pretty well, but the actual sourcing was very limited in that first iteration. A metadata database was created to support an interactive interface, allowing users to search videos by presenter, title or description and filter by category or institution. Manus built this interface and deployed the collection to a live web page: https://cjubrvkx.manus.space. It took about 15 minutes. Iteration is by natural language prompting for fine-tuning.
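For the curious, the filter-and-categorise stage Manus performed can be sketched roughly like this. The field names, keyword rules and fallback category here are my assumptions for illustration, not anything Manus actually exposes; the real agent also scrapes the web, builds the metadata database and deploys the interface.

```python
# A hedged sketch of a filter-and-categorise pipeline of the kind Manus
# appears to run: drop videos over five minutes, attach a thematic tag,
# keep the institutional affiliation for filtering. The keyword-to-theme
# mapping and the Video fields are illustrative assumptions.

from dataclasses import dataclass

CATEGORIES = {  # assumed keyword -> theme mapping, first match wins
    "feedback": "Assessment & Feedback",
    "assessment": "Assessment & Feedback",
    "teaching": "Teaching",
    "research": "Research",
}

@dataclass
class Video:
    title: str
    institution: str
    duration_secs: int

def curate(videos: list[Video], max_secs: int = 300) -> list[dict]:
    """Keep videos under the duration cap and attach a thematic tag."""
    curated = []
    for v in videos:
        if v.duration_secs >= max_secs:
            continue  # the under-five-minutes rule from the prompt
        theme = next(
            (t for kw, t in CATEGORIES.items() if kw in v.title.lower()),
            "AI Literacy",  # fallback bucket
        )
        curated.append(
            {"title": v.title, "institution": v.institution, "theme": theme}
        )
    return curated

videos = [
    Video("Using GenAI for feedback", "Notre Dame", 240),
    Video("AI in research workflows", "Harvard", 180),
    Video("A 20-minute deep dive", "LSE", 1200),  # filtered out
]
print(curate(videos))
```

Naive keyword rules like these would also explain why the categorisations were “not always spot on”: anything without an obvious keyword lands in the fallback bucket.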

The initial prompt from me and the start of the execution of the processes

After launch, I expanded the scope by asking Manus to include contributions from additional individuals and institutions. I threw in my old buddy Dr Gerhard Kristandl (Dr GeeKay), who I know has a YouTube channel, and I added myself, Martin Compton (without any additional info), and it found relevant content from each of us. The other three potential sources I threw in reaped no reward, though. This is the reasoning:

I searched for content from Tadhg Blommerde, University of Sydney, and Monash University as well, but most of their videos exceeded our 5-minute duration criteria. I’ve documented all findings in the data directory for reference.

Manus resumed its asynchronous task execution, searching for and integrating further video content into the existing framework, enabling me to publish a second version. So the output works; I would need much more thoughtful prompting to get it how I want it and, tbh, there are better ways of curating YouTube content, but the principle is what interested me, and the output is remarkable given the stages and processes it went through.

You can watch a replay of the agent in action here. Not as much fun as watching Napoleon Solo and Illya Kuryakin combat THRUSH (I know, I know).