I’ll believe it when I see it

I finally got round to trying out ‘Nano Banana’, the Google AI Studio image editor. It’s incredible that the ‘naming of AI things’ is as sensible as the naming of cables for Apple products and seemingly conducted by a team of 7 year olds. Anyway, long story short, this is pretty remarkable and pretty depressing all in one go.

Here are my first two edits. Each has two iterations. Prompts are followed by the new version in each case.

Unaltered image 1 (Woolwich Ferry, London, taken by me)

Insert a star trek like space ship realistically in the sky

it’s a bit too big, make it smaller, more distant

Boldly going 400 yards. Unless there’s a light breeze.

Unaltered image 2

I look bald in this. I need to be wearing a hat suitable for a spy

My daughter behind me needs to be my arch nemesis. make her stealthy and holding a water pistol

It looks nothing like my daughter and it’s amazing the AI found a hat that actually fits.

The little AI symbol in the corner is sure to fox anyone using such tools for nefariousness.

‘I can tell when it’s been written by AI’

Warning: the first two paragraphs might feel a bit like wading through treacle but I think what follows is useful and is probably necessary context to the activity linked at the end!

LLMs generate text using sophisticated prediction/probability models, and whilst I am no expert (so if you want proper, technical accounts please do go to an actual expert!) I think it useful to home in on three concepts that help explain how their outputs feel and read: temperature, perplexity and burstiness. Temperature sets how adventurous the word-by-word choices are: low values produce steady, highly predictable prose; high values (many interfaces present this as a 0-1 scale, though some APIs accept values up to 2) invite surprise and variation (supposedly more ‘creativity’ and certainly more hallucination). Perplexity measures how hard it is to predict the next word overall, and burstiness captures how unevenly those surprises cluster, like the mix of long and short sentences in some human writing, and maybe even a smattering of stretched metaphor and whimsy. Most early (I say early making it sound like mediaeval times but we’re talking 2-3 years ago!) AI writing felt ‘flat’ or ‘bland’ and therefore more detectable to human readers because default temperatures were conservative and burstiness was low.
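For anyone who finds code easier to follow than treacle, here is a minimal sketch of the three ideas in Python. It is an illustration under stated assumptions, not how any particular model or detector is implemented: the function names are hypothetical, the numbers fed in would be toy values, and ‘burstiness’ in particular is approximated here as variation in sentence length, whereas real detectors may define it differently (for example, as variation in perplexity from sentence to sentence).

```python
import math
import statistics


def softmax_with_temperature(logits, temperature):
    """Turn raw next-word scores into probabilities.

    Lower temperatures sharpen the distribution (the top choice dominates);
    higher temperatures flatten it (unlikely words get more of a look-in).
    Temperature must be greater than zero in this simple version.
    """
    scaled = [score / temperature for score in logits]
    peak = max(scaled)  # subtracting the max keeps exp() numerically stable
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def perplexity(token_probs):
    """Exponential of the average negative log-probability of the words seen.

    token_probs holds the probability a model gave to each word that
    actually appeared. Higher perplexity means the text was harder to predict.
    """
    avg_neg_log = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_neg_log)


def burstiness(sentence_lengths):
    """Crude proxy (an assumption): spread of sentence lengths relative to the mean."""
    return statistics.pstdev(sentence_lengths) / statistics.mean(sentence_lengths)
```

Calling `softmax_with_temperature` on the same scores with a low and then a high temperature shows the top word's share of probability shrinking as the temperature rises, which is the ‘adventurousness’ described above. A text made of uniformly sized sentences scores zero on this burstiness proxy; a mix of very short and very long ones scores higher.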

I imagine most ChatGPT (other tools are available) users do not think much about such things given these are not visible choices in the main user interface. Funnily enough, I do recall these were options in the tools that were publicly available and pre-dated GPT 3.5 (the BIG release in November ’22). As with a lot of things, skilled use can have an impact (so a user might specify a style or tone in the prompt). Also, with money comes better options so, for example, Pro account custom GPTs can have precise built-in customisations. I also note that few seem to use the personalisation options that override some of the things that many folk find irritating in LLM outputs (mine states, for example, that it should use British English as default, never use em dashes and use ‘no mark up’ as default). I should also note that some tools still allow for temperature manipulation in the main user interface (Google Gemini AI Studio for example) or when using the API (ChatGPT). Google AI Studio also has a ‘top P’ setting allowing users to specify the extent to which word choices are predictable or not. These things can drive you to distraction so it’s probably no wonder that most right-thinking, time-poor people have no time for experimental tweaking of this nature. But as models have evolved, developers have embedded dynamic temperature controls and other tuning methods that automatically vary these qualities. The result is that the claim ‘I can tell when it’s AI’ may be true of inexpert, unmodified outputs from free tools but is much harder to sustain with more sophisticated use and paid-for tools. Interestingly, the same appears true for AI detectors. The early detectors’ reliance on low-temperature signatures now needs revisiting too, for those not already convinced of their vincibility.
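The ‘top P’ idea mentioned above can also be sketched in a few lines. This is a toy version of what is usually called nucleus sampling, written on the assumption that we already have a probability for each candidate next word; the function names and the example words are hypothetical and nothing here reflects how any specific product implements the setting.

```python
import random


def top_p_filter(probs, top_p):
    """Keep the smallest set of most likely words whose combined probability
    reaches top_p, then rescale so the kept probabilities sum to 1."""
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    kept = {}
    cumulative = 0.0
    for word, p in ranked:
        kept[word] = p
        cumulative += p
        if cumulative >= top_p:
            break  # everything less likely than this is discarded
    total = sum(kept.values())
    return {word: p / total for word, p in kept.items()}


def sample_word(probs, rng):
    """Pick one word at random, weighted by its probability."""
    words = list(probs)
    weights = [probs[w] for w in words]
    return rng.choices(words, weights=weights, k=1)[0]


# Made-up probabilities for the word after "I crossed the river on a ..."
candidates = {"ferry": 0.5, "boat": 0.3, "raft": 0.15, "spaceship": 0.05}
predictable = top_p_filter(candidates, 0.75)  # only the likeliest words survive
word = sample_word(predictable, random.Random(0))
```

With a lower top P the long tail of unlikely words is cut off before sampling, so outputs become more predictable; at 1.0 nothing is cut and even ‘spaceship’ stays in the running.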

Evolutionary and embedded changes therefore have a humanising effect on LLM outputs. Modern systems can weave in natural fluctuations of rhythm and unexpected word choices, erasing much of the familiar ChatGPT blandness. Skilled (some would say ‘cynical’) users, whether through careful prompting or passing text through paraphrasers and ‘humanisers’, can amplify this further. Early popular detectors such as GPTZero (at my work we are clear colleagues should NEVER be uploading student work to such platforms btw) leaned heavily on perplexity and burstiness patterns to spot machine-generated work, but this is increasingly a losing battle. Detector developers are responding with more complex model-based classifiers and watermarking ideas, yet the arms race remains uneven: every generation of LLMs makes it easier to sidestep statistical fingerprints and harder to prove authorship with certainty.

For fun I ran this article through GPTZero… Phew!

It is also worth reflecting on what kinds of writing we value. My own style, for instance, happily mixes a smorgasbord of metaphors in a dizzying (or maybe it’s nauseating) cocktail of overlong sentences, excessive comma use and dated cultural references (ooh, and sprinkles in frequent parentheses too). Others might genuinely prefer the neat, low-temperature clarity an AI can produce. And some humans write with such regularity that a detector might wrongly flag them as synthetic. I understand that these traits may often reflect the writing of neurodivergent or multilingual students.

To explore this phenomenon and your own thinking further, please try this short activity. I used my own text as a starting point and generated (in Perplexity) five AI variants of varying temperatures. The activity was built in Claude. The idea is it reveals your own preferred ‘perplexity and burstiness combo’ and might prompt a fresh look at your writing preferences and the blurred boundaries between human and machine style. The temperature degree is revealed when you make your selection. Please try it out and let me know how I might improve it (or whether I should chuck it out the window, i.e. DefenestrAIt it).

Obviously, as my job is to encourage thinking and reflection about what this means for those teaching, those studying and broadly the institution they work or study in, I’ll finish with a few questions to stimulate reflection or discussion:

In teaching: Do you think you can detect AI writing? How might you respond when you suspect AI use but cannot prove it with certainty? What happens to the teacher-student relationship when detection becomes guesswork rather than evidence?

For assignment design: Could you shift towards process-focused assessment or tasks requiring personal experience, local knowledge or novel data? What kinds of writing assignments become more meaningful when AI can handle the routine ones? Has that actually changed in your discipline or not?

For your students: How can understanding these technical concepts help students use AI tools more thoughtfully rather than simply trying to avoid detection? What might students learn about their own writing voice through activities that reveal their personal perplexity and burstiness patterns? What is it about AI outputs that students who use them value and what is it that so many teachers disdain?

For your institution: Should institutions invest in detection tools given this technological arms race, or focus resources elsewhere? How might academic integrity policies need updating as reliable detection becomes less feasible?

For equity: Are students with access to sophisticated prompting techniques or ‘humanising’ tools gaining unfair advantages? How do we ensure that AI developments don’t widen existing educational inequalities? Who might we be inadvertently discriminating against with blanket bans or no use policies?

For the bigger picture: What kinds of human writing and thinking do we most want to cultivate in an age when machines can produce increasingly convincing text? How do we help students develop authentic voice and critical thinking skills that remain distinctly valuable?

When you know the answer to the last question, let me know.

Essays & AI: collective reflections on the manifesto one year on

It’s roughly a year since we (Claire Gordon and I plus a collective of academics from King’s & LSE) published the Manifesto for the Essay in the Age of AI. Despite improvements in the tech AND often pretty compelling evidence and arguments for the reduction of take-home, long form writing in summative assessments, I STILL maintain, as I did this time last year, that the essay has a role. On one of the pages of the AI in Education short course, authored by colleagues at King’s from the Institute of Psychiatry, Psychology & Neuroscience (Brenda Williams) and the Faculty of Dentistry, Oral & Craniofacial Sciences (Pinsuda Srisontisuk and Isabel Miletich), they detail patterns of student AI usage. They end with a suggestion that participants take a structured approach to analysing the Manifesto, and the outcome is around 150 responses (to date) offering a broad range of thoughts and ideas from educators working across disciplines and educational levels across the world. This was the forum prompt:

Is the essay dead?

The manifesto above argues that this is not the case, but many believe that long form writing is no longer a reliable way to assess students. What do you think?

Although contributors come from diverse contexts, some shared patterns and tensions really stand out, and I share these below. I finish with a wee bit of my own flag waving (seems to be a popular pastime recently).

Sentiment balance

The overwhelming sentiment is one of broad agreement with a reformist bent.

  • Most participants explicitly reject the idea that “the essay is dead”. They value essays for nurturing critical thinking, argumentation, independence and the ability to sustain a coherent structure.
  • A minority voice expresses stronger doubts, usually linked to practical issues (e.g. heavy marking loads, students’ shrinking reading stamina, or the ease of AI-generated text) and calls for greater diversification of assessment.
  • There is also a strand of cautious pragmatism: many see the need for significant redesign of both teaching and assessment to remain relevant and credible.

In short, the mood is hopeful and constructive rather than nostalgic or doom ‘n’ gloom. The essay is not to be discarded but has to be re-imagined.

Here are a couple of sample responses:

Not quite dead, no. I think of essays as a ‘thinking tool’ – it’s a difficult cognitive task, but a worthwhile one. I think, as mentioned in the study, an evolution towards ‘process orientated’ assessment could be the saviour of the essay. Perhaps a movement away from the product (an essay itself) being the sole provider of a summative grade is what’s needed. Thinking of coursework, planning, supervisor meetings and a reflective journal on how their understanding developed over the process of researching, synthesising, planning, writing and redrafting could be included. (JF)

In their current form, many take-home essay assessments no long reliably measure a students’ learning, nor mirror the skills students need for the workplace (as has arguably always been the case for many subjects). I wonder if students may increasingly struggle to see the value of writing essays too. However, I do value the thought processes that go into crafting long form writing. I think if essays are thoughtfully redesigned and include an element of choice for the learner, perhaps with the need to draw on some in-house case study or locally significant issue, then essays are not necessarily dead.(AM)

The neat dodge to this question is to suggest the essay will be like the ship of Theseus. It will remain but every component in it will be made of different materials 🙂 (EP)

Key themes emerging from the comments

1. Process over product
A strikingly common thread is the shift from valuing the final script to valuing the journey of thought and writing. Contributors repeatedly advocate staged submissions, reflective journals, prompts disclosure, oral defences or supervised drafting. This aligns directly with the manifesto’s calls to redefine essay purposes and embed critical reflection (points 3 and 4).

2. Productive integration of AI
Few respondents argue for banning AI (obviously the responses are skewed towards those willing to undertake an AI in Education short course in the first place!). Instead, many echo the manifesto’s seventh and eighth points on integration and equity. Suggestions include:

  • require students to document prompts and edits,
  • use AI to generate counter-arguments or critique drafts,
  • support second-language writers or neurodivergent students with AI grammar or audio aids,
  • design tasks tied to personal data, lab results or workplace contexts that AI cannot easily fabricate.

A persistent caution is that without clear guidance, AI may encourage superficial engagement or plagiarism. Transparent ground rules and explicit teaching of critical AI literacy are seen as essential.

3. Expanding forms and contexts
Many contributors support the manifesto’s second point on diverse forms of written work. They propose hybrid assessments such as essays combined with oral presentations, podcasts, infographics or portfolios. Others emphasise discipline-specific needs: scientific reporting, medical case notes, or creative writing, each with distinct conventions and AI implications.

4. Equity, access and institutional support
There is strong agreement that AI’s benefits and risks are unevenly distributed. Participants highlight the need for:

  • institutional investment in staff development and student training,
  • clarity on acceptable AI use across programmes,
  • assessment designs that do not disadvantage those with limited technological access.

5. Rethinking academic integrity
Several comments resonate with the manifesto’s call to revisit definitions of cheating and originality. Rather than policing AI, some suggest designing assessments that render unauthorised use unhelpful or irrelevant, while foregrounding honesty and reflection.

What this means for the manifesto

The forum feedback affirms the manifesto’s central claim that the essay remains a vital, adaptable form, but it also pushes its agenda in useful directions.

  • Greater emphasis on process-based assessment. While the manifesto highlights process and reflection, practitioners want even stronger endorsement of multi-stage, scaffolded approaches and/ or dialogic or presentational components as the cornerstone of future essay design.
  • Operational guidance for AI use. Educators call for more than principles: they need models of prompt documentation, supervised writing practices and examples of AI-resistant or AI-enhanced tasks.
  • Disciplinary specificity. The manifesto could further acknowledge the wide variance in how essays function, from lab reports to creative pieces, and provide pathways for each. Of course we, like everyone, are subject to a major impediment…
  • Workload and resourcing. Several voices stress that meaningful change requires institutional support and realistic marking expectations; without these, even the best principles risk remaining aspirational. This for me is likely the biggest impediment, not least because of the ongoing, multi layered crises HE is confronted with just now.

Overall, the conversation demonstrates an appetite for renewal rather than retreat to sole reliance on in-person exams, though this still remains a common call. I stand with the consensus view that the essay (and other long form writing) is not in terminal decline but in the midst of a necessary transformation. What we need to see is this: educators alert to the affordances and limitations of AI, conversations happening between students and those that support them in their discipline and with academic skills, and students writing assessments that are AI-literate. As we find our way to the other side of this transitional space we are in, deluged by inappropriate use and assessments too slow in changing, eventually the writing will (again) be genuinely engaging, students will see value in finding their own voices and we’ll move closer to consensus on accepting some new ways of producing writing as legitimate. When I read posts on social media advocating a wholesale shift to exams (irrespective of other competing damages this may connote and in apparent ignorance of the many ways cheating happens in invigilated in-person exams) or ‘writing is pointless’ pieces I am struck by the usually implicit but sometimes overt assumption that writing is ONLY valuable as evidence of learning. Too rarely are formative/developmental aspects rolled into the arguments, and there is often a failure to connect to persuasive rationales (in this and wider for-learning arguments) for reconsidering the impact of grades on how students approach writing. And, finally, even if 80% of students did want the easiest route to a polished essay, I’m not abandoning the 20% that appreciate the skills development, the desirable difficulties and the will to DO and BE as well as show what they KNOW. Too many of the current narratives not only advocate throwing the baby out with the bathwater but then refuse to feed the baby because, you know, the bathwater was dirty. Unpick THAT strangled metaphor if you can.

Plus ça change; plus c’est a scroll of death

Hang on it was summer a minute ago
I looked at my blog just now and saw my last post was in July. How did the summer go so fast? There’s a wind howling outside, I am wearing a jumper and both actual long dark wintry nights and the long dark metaphorical ones of our political climate seem to loom. To warm myself up a little I have been looking through some tools that offer AI integrations into learning management systems (LMS aka VLEs)* rather than doing ‘actual’ work. That exploration reminded me of the first ever article I had published back in 2004. The piece has long since disappeared from wherever I saved the printed version and is no longer online (not everything digital lasts forever, thank goodness) but I dug the text out of an old online storage account and reading it through has made me realise how much things have changed broadly while, in other ways, it is still the same show rumbling along in the background, like Coronation Street (but no-one really remembers when it went from black and white to colour).

What I wrote back then
In that 2004 article I described the excitement of experimenting with synchronous and asynchronous digital discussion tools in WebCT (for those not ancient like me, Web Course Tools, or WebCT, was an early VLE developed at the University of British Columbia which was eventually subsumed into Blackboard). I was teaching GCSE English and was programme leader for an ‘Access to Primary Teaching’ course and many of my students were part time so only on campus for 6 hours per week across two evenings. I’d earlier taught myself HTML so I could build a website for my history students. It had lots of text! It had hyperlinks! It had a scrolling marquee! Images would have been nice but I knew my limits. When I saw WebCT, I was fired up by the possibilities of discussion forums and live chat. When I set it up and trialled it I saw peer support, increased engagement with tough topics and participation from ‘quiet’ students, amongst other benefits. I was so persuaded by the added value potential I even ran workshops with colleagues to share that excitement.

See this great intro to WebCT from someone in the CS dept at British Columbia from 1998:

That is still me of course. My job has changed and so has the context, but the impulse to share enthusiasm for digital tools that foster dialogue and interaction remains why I do what I do. It was nice to read that and I felt a fleeting affection for that much younger teacher, blissfully unaware of the challenges ahead! Even so, and forming a rattling cognitive dissonance that is still there, I was frustrated by the clunky design and awkward user interface that made persuading colleagues to use it really challenging. Log in issues took up a lot of time and balancing ‘learning’ use with what I then called ‘horseplay’ (what was I, 75?!) took a while to calibrate. Nevertheless, I thought these worth working through but, even though some evidence of uptake across the college I was at was apparent, there was a wider scepticism and reluctance. Why wouldn’t they? ‘It’s too complex’; ‘I am too busy’; ‘the way I do it now works just fine, thank you’. Pretty much every digital innovation has been accompanied by similar responses; even the good ones! I speculated about whether we needed a blank sheet of paper to rethink what an LMS could be, but concluded that institutions were more likely to tinker and add features than to start again.

2004? Feels like yesterday; feels like centuries ago
It was only 2003–4 (he says, painfully aware that I have colleagues who were born then), yet experimenting with an LMS felt novel and that comes over really clearly in my article. If you’d asked me this morning when I started using an LMS I might have said 1998 or 99. 2003 feels so recent in the context of my whole teaching career. What the heck was I doing before all that? Thinking back I realise that in my first full time job there was only one computer in our office and John S. got to use that as he was a trained typist (so he said). And older than me. In the article I was carefully explaining what chat and forums were and how they were different from one another, so the need for that dates the phenomenon too I suppose. Later, after moving to a Moodle institution, I became e-learning lead and engaged with JISC working groups. A JISC colleague who oversaw the VLE working group jokingly called me Mr Anti-Moodle because I was vocal in my critiques. It wasn’t quite accurate. I was critical for sure but then, as now, I liked the concept but disliked the way it worked. Persuading people to adopt an LMS was hard as I said, and, while I have seen some brilliant use of Moodle and the like, my impression is that the majority (argue with me on this though) of LMS courses are functional repositories, with interactive and creative applications the exception rather than the norm. The scroll of death was a thing in 2005 and it is as much of a thing now. It also made me think of current ‘Marmitey’ positions folk are taking re: AI. Basically, AI (big and ill defined as it usually is) has to come with nuance and understanding so binary, entrenched, one size fits all positions are unhelpful and, in my view, hard to rationalise and sustain.

The familiar LMS problem
Back to the LMS, from WebCT to Moodle and other common current systems, the underlying functionality has barely shifted (I mean from the perspective of your average teacher/lecturer or student). Many still say Moodle feels very 1990s (probably they mean early 2000s but I suspect they, like me, find it hard to accept that any year starting with a 2 could be a long time ago). Ultimately I think none of these systems offered a genuinely encouraging combination of interface and user experience and that is an issue that persists to this day. The legacy of those early design decisions lingers, and we are still working around them. People have been predicting the death of the VLE for years (including me) but it has not happened. When I first saw Microsoft Teams just before Covid, I thought here’s the nail in the coffin. I was wrong again. Maybe being wrong about the end of the LMS is another running theme.

Will AI change the LMS story?
So what about AI powered integrations? Will they revolutionise how the LMS works? Will they be part of the reason for a shift away from them? Unlikely in either sense is my best guess. Everything I see now is about embellishments and shortcuts that feed into the existing structure. My old dream of a blank-sheet LMS revolution has faded. Thirty years of teaching and more than twenty years using LMSs suggest that this is one component of digital education that will not fade away. The tools will keep evolving, but the slow, steady thrum of the LMS endures in the background. I realise that I have finally predicted non-change so don’t bet on that as I have been wrong quite a bit in the past. What I do know is that digital discussions using tools to support dialogic pedagogies have persisted, as have the issues related to them. ‘Only 10-20% of my students use the forums!’ I hear that still. But what I realised in 2004 and maintain to this day is that 10-20% is a significant embellishment for some and alternative for others so I stick with what I said back then in that sense at least. Oh, and lurking is a legit and fine thing for yet others!

One of the most wonderful things about the AI in Education course (so close to 15,000 participants!) is the forums. They add layers of interest that cannot be planned or produced. I estimate only 10-15% of participants post but what a contribution they are making and it’s an enhancement that keeps me there and, I am convinced, adds real value to those not posting too.

*I’ll stick with LMS as this seems to be pretty ubiquitous these days, though I am aware of the distinctions and, when I wrote the piece about ‘WebCT’, the term VLE was very much the go-to.