‘Good’ English & the AI thing

I was once told by a fellow English teacher (back in the day) that it was funny I was teaching English. I asked why. ‘Well, you’re all ‘wiv’ and ‘froo’,’ she said, highlighting my non-standard accent. After that I meekly tried to speak ‘better’ for ages, especially in the company of the other English teachers. It knocked me sideways, but over time I fought back in small ways. Not enough though. Conversations and a couple of articles I read on WonkHE this week have given me a chance to think about how AI is offering leverage for change and a critical lens on what we value in so-called academic English.

I have just come from a meeting with Kelly Webb-Davies, whose thinking (nicely exemplified here: ‘I wrote this – or did I?’ and here: ‘On the toxicity of assuming writing is thinking’) and ongoing work at Oxford I have long found a perfect prod to the hornets’ nest that is my brain. What Kelly consistently does so well, and what came through again very strongly in our discussion, is a refusal to accept binary, nuance-less framings of AI as either an existential threat or a panacea for drudgery. Instead, Kelly’s focus in our conversation was on much harder and more important questions: what we think writing is for (anywhere, but especially in academia, where student writing is the means by which students are evaluated rather than the thing itself being judged), why we value it so highly in assessment, and where our assumptions about writing, cognition and academic legitimacy might be misplaced.

Kelly’s work is particularly powerful in contrasting privileged and/or colonial thinking about ‘correct’ and ‘acceptable’ academic writing with translanguaging, our idiolects, cultural and linguistic histories and neurodiversity. Kelly problematises assumptions about writing as a technology that suggest writing is a natural proxy for thinking or even THE way of doing thinking (can you do thinking?). In addition, writing has always evolved alongside other technologies, from inscription to print to word processors, and yet higher education continues to treat a narrow, highly codified form of written English as if it were a timeless measure of intellectual engagement. Thinking manifests in multiple ways, many of which are poorly served by conventional academic writing. AI is able to bridge idiolect to ‘accepted’ language in standard forms (see how I keep using inverted commas?), translating between ways of thinking and the forms the academy currently recognises. This then raises questions about how far we might use AI to compromise in this way, or whether we might leverage this truly disruptive phenomenon to challenge yet more of our fundamental beliefs about what is and is not acceptable, as well as how we validate learning. Received wisdom may not be that wise at all; a point I tried to make in my previous post. I’m really looking forward to reading and hearing more about Kelly’s ideas for Voice First Written Assessment (VFWA), a really practical and thoughtful approach that values the way people think, speak and write, putting that at the heart of assessments (while enabling valid assessment; everyone’s a winner!).

My meeting coincided with publication of a piece from Jim Dickinson earlier this week. Dickinson does not deny the risks of AI, nor does he minimise the evidence that poorly designed uses of AI can undermine learning. The folk (me included) writing or talking about AI often feel obligated to front-load caveats and boundaries (I understand this; I don’t want to talk about that… while thinking: ‘please don’t derail this session by insisting we talk about y’), though Dickinson does a pretty good job of weaving a LOT of things in! He very helpfully brings together ‘famous’, perhaps even notorious, studies from a growing body of research showing, when taken collectively, how uncritical, convenience-driven AI use can hollow out motivation, attention and the owning of ideas and learning itself, but also how AI can be a boon to ‘productive struggle’ and a valuable scaffold (if used advisedly). What emerged is that framing AI as a broad, abstract threat is not helpful. The ongoing research is critical, but that shouldn’t legitimise stalling interventions, exploration and confrontation of the harsh realities of the extensive, actual, often unsupported use of AI in research and writing (as production) workflows. The way learning and assessment continue to be designed around immature understandings, constrained by tradition and conservatism (my reading, not his words), gives a sense that what we thought we were doing was frail even before ChatGPT. The same tool that produces passivity in one design can deepen judgement and persistence in another. Potential and frailty coexist, and the difference is pedagogical intent, design and scaffolding.

In another piece this week on WonkHE, Rex McKenzie makes an observation that will, I am sure, cause debate and consternation, but it’s a fundamental one. McKenzie’s comparison between university expectations and journal publishing practices is pretty stark. He shows that while students in many (most? all?) institutions are policed for using AI to shape language, structure and expression, the professional academic world (as manifested in journal policies on AI use) has largely accepted AI-assisted writing, provided accountability and disclosure remain with the human author. This contrast exposes how much of what we currently assess is not intellectual substance but adherence to a particular linguistic performance. Across academic publishing, trust appears to be growing on the assumption that if writing is honed it is OK, because the research and ideas are owned by the authors. This then raises a really uncomfortable question about permitted AI use (not least in traffic light systems) where ideation is often seen as fine but the use of AI to support writing is taboo. Have we got it arse about face? (I used that phrasing as a less than subtle way to signal that no AI is choosing my metaphors.)

Rex McKenzie’s thinking converges strongly with Kelly’s argument. The insistence that “writing is thinking” is not an innocent pedagogical claim; it is historically and culturally situated. It privileges those already fluent in dominant academic registers and marginalises others, whether through class, language background, neurotype or other trait or characteristic. Treating one form of English as the sole legitimate evidence of cognitive engagement risks mistaking conformity for rigour. AI does not create this problem, but (finally!) it makes it harder to ignore. Talking with Kelly, reading these articles and thinking about the ‘problem’ of AI, I arrive at what will be for many a really uncomfortable conclusion. It’s been said before but it’s worth saying again through a lens of challenge to the hegemony of standard forms of expression and writing: the AI threat is not primarily about cheating, efficiency or technological disruption. It’s a threat to convention; to conservatism; to tradition; to imagined halcyon times. Let us re-articulate and argue about what we value, what we recognise and what we are willing to redesign. If we continue to treat writing as both the locus and the pre-eminent proof of thinking, we will remain trapped in defensive and incoherent policy positions. If, instead, we take seriously questions of cognitive engagement, judgement and inclusion, then AI catalyses (and can even enable) long-overdue honesty about assessment, pedagogy and the realities of how our systems continue to marginalise.

Kelly will be speaking at a compassionate assessment event on 5th March – you can also see my contribution to the compassionate assessment resources via QAA pages here.

Comet limits restless legs

I’m one of those people whose knee is constantly jiggling, especially when I am sat in ‘receive’ mode in a meeting or something. To reduce the jiggling I fiddle with things, and the thing I have been fiddling with will be familiar to anyone who likes to see what all the fuss is about with new tech. I’m always asking myself: novelty or utility? (I had my fingers burnt with interactive whiteboards and have been cautious ever since.) You may be interested in the output of Perplexity’s ‘Comet’, the browser-based AI agent whose outcomes are littering LinkedIn right now, or the video below, which is a conversation between me and one of my AI avatars… if neither of these appeals, I’d stop reading now tbh.

The image below links to what I produced using a simple prompt: “display this video in a window with some explanatory text about what it is and then have a self-marking multi choice quiz below it.” [youtube link]

It is a small web application that displays a YouTube video, provides some explanatory text, and then offers a self-marking multiple choice quiz beneath it.

Click on the image to see the artefact and try the quiz

The process was straightforward but illuminating. The agent prepared an interactive webpage with three generated files (index.html, style.css and app.js) and then assembled them into a functioning app. It embedded the YouTube video (though it needed an additional prompt when the video did not initially display), added explanatory text about the focus of the video (AI in education at King’s College London), and then generated an eight-question multiple choice quiz based on the transcript.

The quiz has self-marking functionality, with immediate feedback, score tracking and final results. The design is clean and the layout works, in my view. The questions cover key points from the transcript: principles, the presenter’s role, policy considerations and recommendations for upskilling. The potential applications are pretty obvious, I think. Next steps would be to look at likely accessibility issues (a quick check highlights a number of heading and formatting issues), to find a better solution for hosting, and then to see how easily the questions can be fine-tuned for level. But given I only needed to tweak one for this example, even that basic functionality suggests this will be of use.
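For the technically curious, the heart of a self-marking quiz like the one in the generated app.js can be sketched in a few lines. To be clear, this is my own illustrative sketch, not Comet’s actual code; the question text, type and function names are all mine:

```typescript
// Minimal sketch of self-marking quiz logic of the kind an agent might
// generate in app.js. Entirely illustrative; not Comet's actual output.
type Question = {
  prompt: string;
  options: string[];
  answerIndex: number; // index of the correct option
};

const quiz: Question[] = [
  {
    prompt: "What is the main focus of the video?",
    options: ["AI in education", "Campus catering", "Sports science"],
    answerIndex: 0,
  },
  // ...further questions generated from the transcript
];

// Mark the user's selected option indices; return a score and
// per-question feedback for immediate display.
function markQuiz(selections: number[]): { score: number; feedback: string[] } {
  let score = 0;
  const feedback = quiz.map((q, i) => {
    if (selections[i] === q.answerIndex) {
      score += 1;
      return `Q${i + 1}: correct`;
    }
    return `Q${i + 1}: incorrect (answer: ${q.options[q.answerIndex]})`;
  });
  return { score, feedback };
}

console.log(markQuiz([0])); // immediate feedback plus a running score
```

The interesting part, of course, is not this boilerplate but that the agent wrote the questions from the transcript and wired the whole thing together unprompted.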

The real novelty here is the browser, but also the execution. I have tried a few side-by-side experiments with Claude and in each case less fine-tuning was needed here for a satisfactory output. The one failed experiment so far is converting all my saved links to a searchable/filterable dashboard. The dashboard looks good but I think there were too many links and it kept failing to make all the links active. Where tools like NotebookLM offer a counter-UX to the ‘text in; reams out’ LLMs of the ChatGPT variety, this offers a closer-to-seamless agent experience, and it is both ease of use and actual utility that will drive use, I think.

‘I can tell when it’s been written by AI’

Warning: the first two paragraphs might feel a bit like wading through treacle but I think what follows is useful and is probably necessary context to the activity linked at the end!

LLMs generate text using sophisticated prediction/probability models, and whilst I am no expert (so if you want proper, technical accounts please do go to an actual expert!) I think it useful to home in on three concepts that help explain how their outputs feel and read: temperature, perplexity and burstiness. Temperature sets how adventurous the word-by-word choices are: low values produce steady, highly predictable prose; high values (often on a 0-1 scale, though some APIs allow values up to 2) invite surprise and variation (supposedly more ‘creativity’ and certainly more hallucination). Perplexity measures how hard it is to predict the next word overall, and burstiness captures how unevenly those surprises cluster, like the mix of long and short sentences in some human writing, and maybe even a smattering of stretched metaphor and whimsy. Most early (I say early making it sound like mediaeval times but we’re talking 2-3 years ago!) AI writing felt ‘flat’ or ‘bland’ and therefore more detectable to human readers because default temperatures were conservative and burstiness was low.
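Burstiness in particular is easy to get an intuitive feel for. A very crude proxy, sketched below, is simply the variation in sentence length across a passage; real detectors use model-based measures, so treat this purely as an illustration:

```typescript
// Very crude burstiness proxy: coefficient of variation of sentence
// lengths (in words). Illustrative only; real detectors use
// model-based measures rather than anything this simple.
function burstiness(text: string): number {
  const lengths = text
    .split(/[.!?]+/)
    .map((s) => s.trim())
    .filter((s) => s.length > 0)
    .map((s) => s.split(/\s+/).length);
  if (lengths.length === 0) return 0;
  const mean = lengths.reduce((a, b) => a + b, 0) / lengths.length;
  const variance =
    lengths.reduce((a, b) => a + (b - mean) ** 2, 0) / lengths.length;
  return Math.sqrt(variance) / mean; // higher = more uneven, 'burstier' prose
}

console.log(
  burstiness("Short one. Then a much, much longer, comma-strewn sentence follows it.")
);
```

Flat, low-temperature prose tends to score low on a measure like this; my own comma-strewn ramblings would, I suspect, score rather higher.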

I imagine most ChatGPT (other tools are available) users do not think much about such things, given these are not visible choices in the main user interface. Funnily enough, I do recall these were options in the tools that were publicly available and pre-dated GPT-3.5 (the BIG release in November ’22). As with a lot of things, skilled use can make a difference (so a user might specify a style or tone in the prompt). Also, with money comes better options so, for example, Pro account custom GPTs can have precise built-in customisations. I also note that few seem to use the personalisation options that override some of the things that many folk find irritating in LLM outputs (mine states, for example, that it should use British English as default, never use em dashes and use ‘no mark up’ as default). I should also note that some tools still allow for temperature manipulation in the main user interface (Google Gemini AI Studio, for example) or when using the API (ChatGPT). Google AI Studio also has a ‘top P’ setting allowing users to specify the extent to which word choices are predictable or not. These things can drive you to distraction so it’s probably no wonder that most right-thinking, time-poor people have no time for experimental tweaking of this nature. But as models have evolved, developers have embedded dynamic temperature controls and other tuning methods that automatically vary these qualities. The result is that the claim ‘I can tell when it’s AI’ may be true of inexpert, unmodified outputs from free tools but is so much harder to sustain for more sophisticated use and paid-for tools. Interestingly, the same appears true for AI detectors. The early detectors’ reliance on low-temperature signatures now needs revisiting too, for those not already convinced of their vincibility.
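To make the API point concrete, here is a minimal sketch of what setting these values looks like when calling the ChatGPT API directly. Temperature and top_p are genuine request parameters on the chat completions endpoint; the model name, prompt and function name here are just for illustration:

```typescript
// Sketch of explicit temperature and top_p control via the OpenAI chat
// completions API. Both are real request parameters; everything else
// here (model choice, prompt, function name) is illustrative.
async function generate(prompt: string, temperature: number): Promise<string> {
  const res = await fetch("https://api.openai.com/v1/chat/completions", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model: "gpt-4o-mini",
      messages: [{ role: "user", content: prompt }],
      temperature, // low = steady, predictable prose; high = more adventurous
      top_p: 1, // nucleus sampling: how much of the probability mass to sample from
    }),
  });
  const data = await res.json();
  return data.choices[0].message.content;
}

// e.g. compare generate("Describe a library", 0.2)
// with the same prompt at a temperature of 1.2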

Evolutionary and embedded changes therefore have a humanising effect on LLM outputs. Modern systems can weave in natural fluctuations of rhythm and unexpected word choices, erasing much of the familiar ChatGPT blandness. Skilled (some would say ‘cynical’) users, whether through careful prompting or passing text through paraphrasers and ‘humanisers’, can amplify this further. Early popular detectors such as GPTZero (at my work we are clear colleagues should NEVER be uploading student work to such platforms, btw) leaned heavily on perplexity and burstiness patterns to spot machine-generated work, but this is increasingly a losing battle. Detector developers are responding with more complex model-based classifiers and watermarking ideas, yet the arms race remains uneven: every generation of LLMs makes it easier to sidestep statistical fingerprints and harder to prove authorship with certainty.

For fun I ran this article through GPTZero… Phew!

It is also worth reflecting on what kinds of writing we value. My own style, for instance, happily mixes a smorgasbord of metaphors in a dizzying (or maybe it’s nauseating) cocktail of overlong sentences, excessive comma use and dated cultural references (ooh, and sprinkles in frequent parentheses too). Others might genuinely prefer the neat, low-temperature clarity an AI can produce. And some humans write with such regularity that a detector might wrongly flag them as synthetic. I understand that these traits may often reflect the writing of neurodivergent or multilingual students.

To explore this phenomenon and your own thinking further, please try this short activity. I used my own text as a starting point and generated (in Perplexity) five AI variants at varying temperatures. The activity was built in Claude. The idea is that it reveals your own preferred ‘perplexity and burstiness combo’ and might prompt a fresh look at your writing preferences and the blurred boundaries between human and machine style. The temperature setting is revealed when you make your selection. Please try it out and let me know how I might improve it (or whether I should chuck it out the window, i.e. DefenestrAIt it).

Obviously, as my job is to encourage thinking and reflection about what this means for those teaching, those studying and broadly the institution they work or study in, I’ll finish with a few questions to stimulate reflection or discussion:

In teaching: Do you think you can detect AI writing? How might you respond when you suspect AI use but cannot prove it with certainty? What happens to the teacher-student relationship when detection becomes guesswork rather than evidence?

For assignment design: Could you shift towards process-focused assessment or tasks requiring personal experience, local knowledge or novel data? What kinds of writing assignments become more meaningful when AI can handle the routine ones? Has that actually changed in your discipline or not?

For your students: How can understanding these technical concepts help students use AI tools more thoughtfully rather than simply trying to avoid detection? What might students learn about their own writing voice through activities that reveal their personal perplexity and burstiness patterns? What is it about AI outputs that students who use them value and what is it that so many teachers disdain?

For your institution: Should institutions invest in detection tools given this technological arms race, or focus resources elsewhere? How might academic integrity policies need updating as reliable detection becomes less feasible?

For equity: Are students with access to sophisticated prompting techniques or ‘humanising’ tools gaining unfair advantages? How do we ensure that AI developments don’t widen existing educational inequalities? Who might we be inadvertently discriminating against with blanket bans or no use policies?

For the bigger picture: What kinds of human writing and thinking do we most want to cultivate in an age when machines can produce increasingly convincing text? How do we help students develop authentic voice and critical thinking skills that remain distinctly valuable?

When you know the answer to the last question, let me know.

Essays & AI: collective reflections on the manifesto one year on

It’s roughly a year since we (Claire Gordon and I, plus a collective of academics from King’s & LSE) published the Manifesto for the Essay in the Age of AI. Despite improvements in the tech AND often pretty compelling evidence and arguments for the reduction of take-home, long-form writing in summative assessments, I STILL maintain the essay has a role, as I did this time last year. On one of the pages of the AI in Education short course authored by colleagues at King’s from the Institute of Psychiatry, Psychology & Neuroscience (Brenda Williams) and the Faculty of Dentistry, Oral & Craniofacial Sciences (Pinsuda Srisontisuk and Isabel Miletich), they detail patterns of student AI usage. They end with a suggestion that participants take a structured approach to analysing the Manifesto, and the outcome is around 150 responses (to date) offering a broad range of thoughts and ideas from educators working across disciplines and educational levels across the world. This was the forum prompt:

Is the essay dead?

The manifesto above argues that this is not the case, but many believe that long form writing is no longer a reliable way to assess students. What do you think?

Although contributors come from diverse contexts, some shared patterns and tensions really stand out, which I share below. I finish with a wee bit of my own flag-waving (it seems to be a popular pastime recently).

Sentiment balance

The overwhelming sentiment is broadly supportive and reformist.

  • Most participants explicitly reject the idea that “the essay is dead”. They value essays for nurturing critical thinking, argumentation, independence and the ability to sustain a coherent structure.
  • A minority voice expresses stronger doubts, usually linked to practical issues (e.g. heavy marking loads, students’ shrinking reading stamina, or the ease of AI-generated text), and calls for greater diversification of assessment.
  • There is also a strand of cautious pragmatism: many see the need for significant redesign of both teaching and assessment to remain relevant and credible.

In short, the mood is hopeful and constructive rather than nostalgic or doom ‘n’ gloom. The essay is not to be discarded but has to be re-imagined.

Here are a few sample responses:

Not quite dead, no. I think of essays as a ‘thinking tool’ – it’s a difficult cognitive task, but a worthwhile one. I think, as mentioned in the study, an evolution towards ‘process orientated’ assessment could be the saviour of the essay. Perhaps a movement away from the product (an essay itself) being the sole provider of a summative grade is what’s needed. Thinking of coursework, planning, supervisor meetings and a reflective journal on how their understanding developed over the process of researching, synthesising, planning, writing and redrafting could be included. (JF)

In their current form, many take-home essay assessments no longer reliably measure a student’s learning, nor mirror the skills students need for the workplace (as has arguably always been the case for many subjects). I wonder if students may increasingly struggle to see the value of writing essays too. However, I do value the thought processes that go into crafting long form writing. I think if essays are thoughtfully redesigned and include an element of choice for the learner, perhaps with the need to draw on some in-house case study or locally significant issue, then essays are not necessarily dead. (AM)

The neat dodge to this question is to suggest the essay will be like the ship of Theseus. It will remain but every component in it will be made of different materials 🙂 (EP)

Key themes emerging from the comments

1. Process over product
A strikingly common thread is the shift from valuing the final script to valuing the journey of thought and writing. Contributors repeatedly advocate staged submissions, reflective journals, prompt disclosure, oral defences or supervised drafting. This aligns directly with the manifesto’s calls to redefine essay purposes and embed critical reflection (points 3 and 4).

2. Productive integration of AI
Few respondents argue for banning AI (obviously the responses are skewed towards those willing to undertake an AI in Education short course in the first place!). Instead, many echo the manifesto’s seventh and eighth points on integration and equity. Suggestions include:

  • require students to document prompts and edits,
  • use AI to generate counter-arguments or critique drafts,
  • support second-language writers or neurodivergent students with AI grammar or audio aids,
  • design tasks tied to personal data, lab results or workplace contexts that AI cannot easily fabricate.

A persistent caution is that without clear guidance, AI may encourage superficial engagement or plagiarism. Transparent ground rules and explicit teaching of critical AI literacy are seen as essential.

3. Expanding forms and contexts
Many contributors support the manifesto’s second point on diverse forms of written work. They propose hybrid assessments such as essays combined with oral presentations, podcasts, infographics or portfolios. Others emphasise discipline-specific needs: scientific reporting, medical case notes, or creative writing, each with distinct conventions and AI implications.

4. Equity, access and institutional support
There is strong agreement that AI’s benefits and risks are unevenly distributed. Participants highlight the need for:

  • institutional investment in staff development and student training,
  • clarity on acceptable AI use across programmes,
  • assessment designs that do not disadvantage those with limited technological access.

5. Rethinking academic integrity
Several comments resonate with the manifesto’s call to revisit definitions of cheating and originality. Rather than policing AI, some suggest designing assessments that render unauthorised use unhelpful or irrelevant, while foregrounding honesty and reflection.

What this means for the manifesto

The forum feedback affirms the manifesto’s central claim that the essay remains a vital, adaptable form, but it also pushes its agenda in useful directions.

  • Greater emphasis on process-based assessment. While the manifesto highlights process and reflection, practitioners want even stronger endorsement of multi-stage, scaffolded approaches and/or dialogic or presentational components as the cornerstone of future essay design.
  • Operational guidance for AI use. Educators call for more than principles: they need models of prompt documentation, supervised writing practices and examples of AI-resistant or AI-enhanced tasks.
  • Disciplinary specificity. The manifesto could further acknowledge the wide variance in how essays function, from lab reports to creative pieces, and provide pathways for each. Of course we, like everyone, are subject to a major impediment…
  • Workload and resourcing. Several voices stress that meaningful change requires institutional support and realistic marking expectations; without these, even the best principles risk remaining aspirational. This for me is likely the biggest impediment, not least because of the ongoing, multi-layered crises HE is confronted with just now.

Overall, the conversation demonstrates an appetite for renewal rather than retreat to sole reliance on in-person exams, though this remains a common call. I stand with the consensus view that the essay (and other long-form writing) is not in terminal decline but in the midst of a necessary transformation. What we need to see is this: educators alert to the affordances and limitations of AI; conversations happening between students and those that support them, in discipline and with academic skills; and students writing assessments in AI-literate ways. As we find our way to the other side of this transitional space we are in, deluged by inappropriate use and assessments too slow in changing, eventually the writing will (again) be genuinely engaging, students will see value in finding their own voices, and we’ll move closer to consensus on recognising some new ways of producing writing as legitimate.

When I read posts on social media advocating a wholesale shift to exams (irrespective of the other competing damages this may connote, and in apparent ignorance of the many ways cheating happens in invigilated in-person exams) or ‘writing is pointless’ pieces, I am struck by the usually implicit but sometimes overt assumption that writing is ONLY valuable as evidence of learning. Too rarely are formative/developmental aspects rolled into the arguments, alongside a failure to connect to persuasive rationales (in this and wider arguments for learning) for reconsidering the impact of grades on how students approach writing. And, finally, even if 80% of students did want the easiest route to a polished essay, I’m not abandoning the 20% that appreciate the skills development, the desirable difficulties and the will to DO and BE as well as show what they KNOW. Too many of the current narratives advocate not only throwing the baby out with the bathwater but then refuse to feed the baby because, you know, the bathwater was dirty. Unpick THAT strangled metaphor if you can.

Transparent AI workflows

I just posted this on innovation and I thought the way I got there, from an idea that popped into my head while listening to a podcast when walking to the station, might be interesting for a few reasons. 1. Because we talk a lot about transparency in AI use… well, the link below and then the final post reveal the stages I went through (other than the comprehensive final edit I did in MS Word). 2. I think it shows a lot about the increasing complexity in the nature of authorship. 3. It shows, where AI augments writing, how challenging it is to capture the nature of use and actually ‘be’ transparent, because writing has suddenly become something I can do on the move. And 4. it will challenge many to consider the legitimacy and quality of writing produced in this way. I should also note that this post I did in the regular way, typing directly into the WordPress author window without (not sure why) the built-in spellchecker.

Here is the full transcript of the audio conversation, followed by (at the draft stage) the additional text-based prompts I did while strap-hanging on the tube. The final edit I did this afternoon on my laptop.

Image: https://www.pexels.com/@chris-f-38966/

The Manus from U.N.C.L.E.

‘Deploying AI agents’ sounds so hi-tech and futuristic to (non-Comp-Sci) me whilst weirdly also being resonant of classic 60s and 70s TV shows I loved as a kid. I have been fiddling for a while with the blurred boundaries between LLMs and agents, notably with Claude, but what appealed when I first saw Manus was the execution of outputs seemingly beyond what Claude can manage. Funnily enough it looks quite a bit like Claude, but it seems it is actually a multi-tool agent. I pretty much concur with the conclusion from the MIT Tech Review:

While it occasionally lacks understanding of what it’s being asked to do, makes incorrect assumptions, or cuts corners to expedite tasks, it explains its reasoning clearly, is remarkably adaptable, and can improve substantially when provided with detailed instructions or feedback. Ultimately, it’s promising but not perfect.

Caiwei Chen

Anyway, I finally got in, having been on the Manus waitlist for a while. Developed by Chinese startup Monica, it is an autonomous AI agent capable of executing complex online tasks without ongoing human input, and it created something of a buzz. TL;DR: This is the initial output, from first prompt to web-based execution. The selection and categorisation need honing but this, in my view, is an impressive output. The second version came after the addition of a follow-up prompt.

Longer version:

I wanted to see what I could get from a single prompt, so I decided to see if it could build a shareable, searchable web page that curates short how-to videos (under five minutes) by higher education educators demonstrating uses of generative AI. I began by asking Manus to collect and cluster videos showing how AI is applied in teaching, assessment, feedback and research (a natural language prompt). Manus responded immediately by creating a structured project directory and initiating web searches to identify relevant video content, starting with collections from institutions like Notre Dame and Harvard (which it didn’t get beyond in the first iteration).

Once videos were sourced, Manus automatically filtered them to ensure they were under five minutes in length (but failed to note that one wasn’t a video and that others linked to the same page, where they were already curated!) and produced by educators in the higher education sector (this it did get right). It then categorised them by thematic area (Teaching, Assessment & Feedback, AI Literacy, and Research, though the categorisations were not always spot on) while also tagging institutional affiliations (mostly OK). The tagging and filtering work pretty well but the actual sourcing is very limited in that first iteration. A metadata database was created to support an interactive interface, allowing users to search videos by presenter, title, or description and filter by category or institution. Manus built this interface and deployed the collection to a live web page: https://cjubrvkx.manus.space. It took about 15 minutes. Iteration is by natural language prompting for fine-tuning.
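I have no visibility of Manus’s internals, but the filter-and-categorise step it performed amounts to something like the sketch below. The record fields and category names are my guesses at a plausible schema, not Manus’s actual one:

```typescript
// Hypothetical reconstruction of the filter-and-categorise step; the
// record fields and category names are guesses, not Manus's schema.
type VideoRecord = {
  title: string;
  presenter: string;
  institution: string;
  durationSeconds: number;
  category: "Teaching" | "Assessment & Feedback" | "AI Literacy" | "Research";
};

// Keep only videos under five minutes, then group by category to feed
// a searchable, filterable interface.
function curate(videos: VideoRecord[]): Map<string, VideoRecord[]> {
  const grouped = new Map<string, VideoRecord[]>();
  for (const v of videos.filter((v) => v.durationSeconds < 300)) {
    const bucket = grouped.get(v.category) ?? [];
    bucket.push(v);
    grouped.set(v.category, bucket);
  }
  return grouped;
}
```

The point, of course, is that Manus wrote, ran and deployed this kind of plumbing itself; the hard (and error-prone) part was the sourcing and classification around it.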

The initial prompt from me and the start of the execution of the processes

After launch, I expanded the scope by asking Manus to include contributions from additional individuals and institutions. I threw in my old buddy Dr. Gerhard Kristandl (Dr GeeKay), who I know has a YouTube channel, and I added me, Martin Compton (without any additional info), and it found relevant content from each of us. The other three potential sources I threw in reaped no reward though. This is the reasoning:

I searched for content from Tadhg Blommerde, University of Sydney, and Monash University as well, but most of their videos exceeded our 5-minute duration criteria. I’ve documented all findings in the data directory for reference.

Manus resumed its asynchronous task execution, searching for and integrating further video content into the existing framework, enabling me to publish a second version. So the output works, though I would need much more thoughtful prompting to get it how I want it and, tbh, there are better ways of curating YouTube content. But the principle is what interested me, and the output is remarkable given the stages and processes it went through.

You can watch a replay of the agent in action here. Not as much fun as watching Napoleon Solo and Illya Kuryakin combat THRUSH (I know, I know).

But how? And why even? Practical examples of ways assessments have been modified

Modifying or changing assessment ‘because of AI’ always feels like it feeds ‘us and them’ narratives of a forthcoming apocalypse (already predicted) and couches the change as necessary only because of this insidious, awful thing that no-one wants except men in leather chairs who stroke white cats.

It is of course MUCH more complex than that and much of the desired change has been promoted by folk with a progressive, reform, equity, inclusion eye who do (or immerse themselves in) scholarship of HE pedagogy and assessment practices.

Anyway, a colleague suggested that we should have a collection of ideas about practical ways assessments could be modified to either make them more AI ‘robust’ or at least ‘AI aware’ or ‘AI inclusive’ (I’m hesitant to say ‘resistant’ of course). Whilst colleagues across King’s have been sharing and experimenting, it is probably true to say that there is not a single point of reference. We in King’s Academy are working on remedying this as part of the wider push to support TASK (transforming assessment for students at King’s) and growing AI literacy, but first I wanted to curate a few examples from elsewhere to offer a point of reference for me and to share with colleagues in the very near future. I’ve gone for diversity from things I have previously bookmarked. Other than that, they are here only to offer points of discussion, inspiration, provocation or comparison!

Before I start I should remind King’s colleagues of our own guidance and the assessment principles therein, and note that with colleagues at LSE, UCL and Southampton I am working on some guidance on the use of AI to assist with marking (forthcoming and controversial). Some of the College Teaching Fund projects looked at assessment, and this AI Assessment Scale from Perkins et al. (2024) has a lot of traction in the sector too and is not so dissimilar from the King’s 4 levels of use approach. It’s amazing how 2023 can feel a bit dated in terms of resources these days, but this document from the QAA is still relevant and applicable and sets out broader, sector-level principles. In summary:

  • Institutions should review and reimagine assessment strategies, reducing assessment volume to create space for activities like developing AI literacy, a critical future graduate attribute.
  • Promote authentic and synoptic assessments, enabling students to apply integrated knowledge practically, often in workplace-related settings, potentially incorporating generative AI.
  • Move away from traditional, handwritten, invigilated exams towards innovative approaches like digital exams, observed discipline-specific assessments or oral examinations.
  • Design coursework explicitly integrating generative AI, encouraging ethical use, reflection, and hybrid submissions clearly acknowledging AI-generated content.
  • Follow guiding principles ensuring assessments are sustainable, inclusive, aligned to learning outcomes, and effectively demonstrate relevant competencies, including appropriate AI usage.

I’m also increasingly referring to the two-lane approach being adopted by Sydney, which leans heavily into similar principles. The context is different to the UK of course, but I have a feeling we will find ourselves moving much closer to the broad approach here. It feels radical, but perhaps no more radical than what many, if not most, unis did in Covid.

Finally, the examples

Example 1 – UCL Medical Sciences BSc

  • Evaluation of coursework assessments to determine susceptibility to generative AI and potential integration of AI tools.
  • Redesign of assessments to explicitly incorporate evaluation of ChatGPT-generated outputs, enhancing critical evaluation skills and understanding of AI limitations.
  • Integration of generative AI within module curricula and teaching practices, providing formative feedback opportunities.
  • Collection of student perspectives and experiences through questionnaires and focus groups on AI usage in learning and assessments.
  • Shift towards rethinking traditional assessment formats (MCQs, SAQs, essays) due to AI’s impact, encouraging ongoing pedagogical innovation discussions.

Example 2 – Cardiff University Immunology Wars

  • Gamification: Complex immunology concepts taught through a Star Wars-inspired, game-based approach.
  • AI-driven game design: ChatGPT 4.0 used to structure game scenarios, resources, and dynamic challenges.
  • Visual resources with AI: DALL-E 3 employed to create engaging imagery for learning materials.
  • Iterative AI prompting: An innovative method using progressive ChatGPT interactions to refine complex game elements.
  • Practical, collaborative learning: Students collaboratively trade resources to combat diseases, supported by iterative testing and refinement of the game.

Example 3 – Traffic lights, University of Wisconsin-Green Bay

The traffic light system they are implementing is reflected in these three sample assessments:

  1. Red light – prohibited
  2. Yellow light – limited use
  3. Green Light – AI embedded into the task

Example 4 – Imperial Business School MBA group work

  • Integration of AI: The original essay task was redesigned to explicitly require students to use an LLM, typically ChatGPT.
  • The change: Individual component of wider collaborative task. Students submit both the AI-generated output (250 words) and a critical evaluation of that output (250 words) on what is unique about a business proposal.
  • Critical Engagement Emphasis: The new task explicitly focuses on students’ critical analysis of AI capabilities and limitations concerning their business idea.
  • Reflective Skill Development: Students prompted to reflect on, critique, and consider improvements or extensions of AI-generated content, enhancing their evaluative and adaptive skills.

3 for 1! Example 5 – Harvard

Create a fictional character and interview them

World building for creative writing

Historical journey

More to follow…

Also note:

Manifesto for the essay

Related article (Compton & Gordon, 2024)
 
Also see: Syska (2025), We tried to kill the essay

Bots with character

This is a swift intro to Character AI *(note 1), a tool that is currently available to use for free (on a freemium model). My daughter showed it to me some months ago. It appears as a novelty app but is used (as I understand it) beyond entertainment for creative activity, gaming, role-playing and even emotional support. For me it is the potential to test ideas that many have about bot potential for learning that is most interesting. By shifting focus away from ‘generating essays’ it is possible to see the appeal of natural language exchanges to augment learning in a novel medium. While I can think of dozens of use cases based on the way I currently (for example) use YouTube to help me learn how to unblock a washing machine, I imagine that is a continuum that goes all the way up to teacher replacement.*(note 2) Character AI is built on a large language model, employs ‘reinforcement’ (learning as conversations continue) and provides an easy-to-learn interface (basically typing stuff in boxes) that allows you to ground the bot with ease in a WYSIWYG interface.

As I see it, it offers three significant modifications in the default interface compared to standard (free) LLMs. 1. You can create characters and define their knowledge and ‘personality’ traits, with space to ground the bot’s behaviour through customisation. 2. You can have voice exchanges by ‘calling’ the character. 3. Most importantly, it shifts the technology back to interaction and away from lengthy generation (though they can still go on a bit if you don’t bake succinctness in!). What interests me most is the potential to use tools like this to augment learning, add some novelty and provide reinforcement opportunities through text- or voice-based exchanges. I have experimented with creating some academic archetypes for my students to converse with. This one is a compassionate pedagogue, this one is keen on AI for teaching and learning, this one a real AI sceptic, this one deeply worried about academic integrity. They each have a back story, a defined university role and expertise. I tried to get people to test arguments and counter-arguments and to work through difficult academic encounters. It’s had mixed reviews so far: some love it; some REALLY do not like it at all!
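Character AI does this grounding through a form-filling interface, but the same effect can be approximated with any chat-style API by packing the persona into a system message. A minimal sketch follows; the persona text is entirely illustrative and this is emphatically not how Character AI itself is implemented:

```typescript
// Approximating persona grounding with a chat-style API: pack the back
// story, role and behavioural rules into a system message. The persona
// below is illustrative; this is NOT Character AI's implementation.
const scepticPersona = {
  role: "system" as const,
  content: [
    "You are a university academic and a firm AI sceptic.",
    "Back story: 30 years of teaching; deeply worried about academic integrity.",
    "Keep replies short and conversational; never write essays for the user.",
  ].join("\n"),
};

// Each exchange sends the persona plus the running conversation, so the
// character stays 'in role' across turns:
const messages = [
  scepticPersona,
  { role: "user" as const, content: "Why shouldn't we just ban AI in assessment?" },
];
console.log(messages);
```

The ‘never write essays’ line is doing the work I describe above: it nudges the exchange back towards interaction and away from lengthy generation.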

How do/ could you use a tool like this?

Note 1. This video in no way connotes promotion or recommendation (by me or by my employer) of this software. Never upload data you are not comfortable sharing and never upload your own or others’ personal data.

Note 2: I am not a proponent of this! There may be people who think this is a panacea for chronic educational underfunding though, so beware.

Conversing with AI: Natural language exchanges with and among the bots

In the fast-evolving landscape of AI tools, two recent releases have really caught my attention: Google’s NotebookLM and the advanced conversational features in ChatGPT. Both offer intriguing possibilities for how we might interact with AI in more natural, fluid ways.

NotebookLM, still in its experimental stage and free to use, is well worth exploring; as one of my King’s colleagues pointed out recently, it’s about time Google did something impressive in this space! Its standout feature is the ability to generate surprisingly natural-sounding ‘auto podcasts’. I’ve been particularly struck by how the AI voice avatars exchange and overlap in their speech patterns, mimicking the cadence of real conversation. This authenticity is both impressive and slightly unsettling, and at least two colleagues thought they were listening to human exchanges.

I tested this feature with three distinct topics:

Language learning in the age of AI (based on three online articles):

A rather flattering exchange about my blog posts (created in fact by my former colleague Gerhard Kristandl – I’m not that egotistical):

A summary of King’s generative AI guidance:

The results were remarkably coherent and engaging. Beyond this, NotebookLM offers other useful features such as the ability to upload multiple file formats, synthesise high-level summaries, and generate questions to help interrogate the material. Perhaps most usefully, it visually represents the sources of information cited in response to your queries, making the retrieval-augmented generation process transparent.

[Image: a screenshot of the NotebookLM (experimental) interface showing a note titled “Briefing Document: Language Learning in the Age of AI”. It summarises the main themes and insights from three sources on the relationship between AI and language learning:

  1. ‘Language Learning in the Age of AI’ by Richard Campbell: discusses AI applications in language learning, highlighting both benefits and challenges.
  2. ‘The Future of Language Learning in an Age of AI’ by Gerhard Ohrband: emphasises that human interaction remains crucial despite AI tools in language acquisition.
  3. ‘The Timeless Value of Language Learning in the Age of AI’ by Sungho Park: focuses on the cultural and personal value of language learning in an AI-driven world.

The note then expands on important ideas, specifically the transformative potential of AI in language learning, such as personalised learning and 24/7 accessibility through AI-driven platforms.]

Meanwhile, the advanced voice feature in ChatGPT’s latest update (not available in the EU, by the way) has addressed previous latency issues, resulting in a much more realistic exchange. To test this, I engaged in a brief conversation, asking it to switch accents mid-dialogue. The fluidity of the interaction was notable, feeling much closer to a natural conversation than previous iterations. Watch here:

What struck me during this exchange was how easily I slipped into treating the AI as a sentient being. At one point, I found myself saying “thank you”, while at another I felt a bit bad when I abruptly interrupted. This tendency to anthropomorphise these tools is deeply ingrained and hard to avoid, especially as the interactions become more natural. It raises interesting questions about how we relate to AI and whether this human-like interaction is beneficial or potentially problematic.

These developments challenge our conventions around writing and authorship. As these tools become more sophisticated, the line between human and AI-generated content blurs further. What constitutes a ‘valid’ tool for authorship in this new landscape? How do we navigate the ethical implications of using AI in this way?

What are your thoughts on these developments? How might you see yourself using tools like NotebookLM or the advanced ChatGPT in your work?

Sources used for the Language ‘podcast’:

  1. ‘Language Learning in the Age of AI’ by Richard Campbell
  2. ‘The Future of Language Learning in an Age of AI’ by Gerhard Ohrband
  3. ‘The Timeless Value of Language Learning in the Age of AI’ by Sungho Park

AI3*: Crossing the streams of artificial intelligence, academic integrity and assessment innovation

*That’s supposed to read AI3 but the title font refuses to allow superscript!

Yesterday I was delighted to keynote at the Universities at Medway annual teaching and learning conference. It’s a really interesting collaboration of three universities: the University of Greenwich, the University of Kent and Canterbury Christ Church University. Based at the Chatham campus in Medway, you can’t help but notice the history the moment you enter the campus. Given that I’d worked at Greenwich for five years I was familiar with the campus but, as was always the case when I went there during my time at Greenwich, I experienced a moment of awe when seeing the campus buildings again. It’s actually part of the Chatham Dockyard World Heritage site and features the remarkable Drill Hall library. The reason I’m banging on about history is because such an environment really underscores for me some of those things that are emblematic of higher education in the United Kingdom (especially for those that don’t work or study in it!)

It has echoes of the cultural shorthands and memes of university life that remain popular in representations of campus life and study. It’s definitely a bit out of date (and overtly UK-centric), like a lot of my cultural references, but it made me think of all the murders in the Oxford-set crime drama ‘Morse’. The campus locations fossilised for a generation the idea of ornate buildings, musty libraries and deranged academics. Most universities of course don’t look like that and by and large academics tend not to be too deranged. Nevertheless, we do spend a lot of time talking about the need for change and transformation whilst merrily doing things the way we’ve done them for decades if not hundreds of years. Some might call that deranged behaviour. And that, in essence, was the core argument of my keynote: for too long we have twiddled around the edges, but there will be no better opportunity than now, with machine-assisted leverage, to do the things that give the lie to the idea that universities are seats of innovation and dynamism. Despite decades of research that have helped define broad principles for effective teaching, learning, assessment and feedback, we default to lecture-seminar and essay-report-exam across large swathes of programmes. We privilege writing as the principal mechanism of evidencing learning. We think we know what learning looks like, what good writing is, what plagiarism and cheating are, but a couple of quick scenarios put to a room full of academics invariably reveal a lack of consensus and a mass of tacit, hidden and sometimes very privileged understandings of those concepts.

Employing an undoubtedly questionable metaphor and the unashamedly dated (1984) concept of ‘crossing the streams’ from the original Ghostbusters film, I argued that there are several parallels to the situation the citizens of New York first found themselves in way back when, not least the academics (initially mocked and defunded) who confront the paranormal manifestations in their Ghostbusters guises. First are the appearances of a trickle of ghosts and demons followed by a veritable deluge. Witness ChatGPT’s release, the unprecedented sign-ups and the ensuing 18 months wherein everything now has AI (even my toothbrush). There’s an AI for That has logged 12,982 AIs to date, to give an indication of that scale (I need to watch the film again to get an estimate of the number of ghosts). Anyway, early in the film we learn that a ghost-catching device called a ‘Proton Pack’ emits energy streams but:


“The important thing to remember is that you must never under any circumstances, cross the streams.” (Dr Egon Spengler)

Inevitably, of course, the resolution to the escalating crisis is the necessity of crossing the streams to defeat and banish the ghosts and demons. I don’t think that generative AI is something that could or should be defeated and I definitely do not think that an arms race of detection and policing is the way forward either. But I do think we need to cross the streams of the three AIs: Artificial Intelligence; Academic Integrity and Assessment Innovation to help realise the long-needed changes.

Artificial Intelligence represents the catalyst not the reason for needing dramatic change.

Academic Integrity as a goal is fine but too often connotes protected knowledge, archaic practices, inflexible standards and a resistance to evolution.

Assessment innovation is the place where we can, through common language and understanding, address the concerns of perhaps more traditional or conservative voices about perceived robustness of assessments in a world where generative AI exists and is increasingly integrated into familiar tools along with what might be seen as more progressive voices who, well before ChatGPT, were arguing for more authentic, dialogic, process-focussed and, dare I say it, de-anonymised and humanly connected assessments.

Here is our opportunity. Crossing the streams may be the only way we mitigate a drift to obsolescence! My concluding slide showed a (definitely NOT called Casper) friendly ghost which, I hope, connoted the idea that what we fear is the unknown, but as we come to know it we find ways to shift from (sometimes aggressive) engagement to understanding and perhaps even an ‘embrace’, as many who talk of AI encourage us to do.

Incidentally, I asked the Captain (in my custom bot ‘Teaching Trek: Captain’s Counsel’) a question about change and he came up with a similar metaphor:

Blow Up the Enterprise: Sometimes, radical changes are necessary. I had to destroy the Enterprise to save my crew in “Star Trek III: The Search for Spock.” Academics should learn when to abandon a failing strategy and embrace new approaches, even if it means starting over.

In a way I think I’d have had an easier time if I’d stuck with Star Trek metaphors. I was gratified to note that ‘The Search for Spock’ was also released in 1984. An auspicious year for dated cultural references from humans and bots alike.

—————–

Thanks:

The conference itself was great and I am grateful to Chloe, Emma, Julie and the team for organising it and inviting me.

Earlier in the day I was inspired by presentations by colleagues from the three universities: Emma, Jimmy, Nicole, Stuart and Laura. The student panel was great too; it started strongly with a rejection of the characterisation of students as idle and disinterested and carried on forcefully from there! And special thanks too to David Bedford (who I first worked with something like 10 years ago), who uses an analytical framework of his own devising called ‘BREAD’ as an aid to informing critical information literacy. His session adapted the framework for AI interactions and it prompted a question which led, over lunch, to me producing a (rough and ready) custom GPT based on it.

I should also acknowledge the works I referred to: 1. Sarah Eaton, whose work on the six tenets of post-plagiarism I heartily recommend, and 2. Cath Ellis and Kane Murdoch* for their ‘enforcement pyramid’, which also works well as one of the vehicles that will help us navigate our way from the old to the new.

*Recommendation of this text does not in any way connote acceptance of Kane’s poor choice when it comes to football team preference.