Limitations of generative AI tools
In the previous step, you considered prompt engineering and ways to get the most from interactions with generative AI tools. In this step, you will have the opportunity to consider real and assumed limitations of generative AI tools.
AI-generated image based on prompt: “photorealistic computer keyboard with hands poised to type”, August 2024, via Midjourney.
Generative AI limitations
There has been an explosion of discussion about what generative AI can or will be able to do. But where are the limits? In education, this understanding becomes increasingly crucial as we design curricula and assessments. Whilst initial worries across education were focussed on writing and text production by large language models, the landscape has dramatically shifted. We’re now grappling with multimodal models that can seamlessly handle text, images, video, and even code simultaneously.
In further and higher education, conversations have become much more complex than concerns about academic integrity breaches as a consequence of students handing in AI-generated text (though these persist, of course). As these tools become deeply embedded in daily academic life and the boundaries between human-created and AI-assisted work continue to blur, we’re seeing a fundamental shift in how we think about assessment and learning. The notion of creating ‘AI-proof’ assignments has largely been abandoned as naive, replaced by more nuanced discussions about appropriate use and integration of these technologies. It’s a debate that is divisive and complex. In order to position ourselves, we need to consider what we believe the purposes of learning in any context to be and how we might evidence that learning. It may be that use of AI tools to augment learning is entirely appropriate. It may be that it is not. It may vary from situation to situation! One lens through which to consider what is appropriate use is to think about limitations alongside potentials.
We’ll examine the current state of AI text and image generators to exemplify the ongoing issues, tendencies, and debates, with particular attention to how these have evolved since the previous iteration of this course in 2023.
Limitations lifted?
The limitations that were focal to discussions as recently as 2023 seem almost quaint now. We’ve moved far beyond concerns about training data cutoff dates, web connectivity, or the inability to process images. The latest models can analyse complex visuals, engage with real-time information, and generate remarkably accurate citations when using grounded data or drawing from academic databases. The issue of hallucination, while still present because of the way generative tools are designed and function, has been significantly reduced in premium tools, though it remains a concern in many free alternatives.
Whilst retrieval-augmented generation (RAG) systems (that is, systems that allow document or data uploads to augment prompts) offer promise in mitigating AI hallucinations, they are not yet a comprehensive solution. RAG enhances large language model (LLM) reliability by integrating the user’s own pre-defined and ‘approved’ data, yet it can still propagate outdated data or fill apparent gaps with ‘unapproved’ data. Newer tools are also equipped with longer ‘context windows’: a greater ability to ‘remember’ more of a conversation and thereby maintain coherence and continuity. To address RAG’s shortcomings, complementary approaches are crucial. These include AI guardrails (usually pre-built in), which vet LLM outputs before they reach the user; prompt engineering (essentially a skill you develop over time, with practice); and fine-tuning, which trains LLMs on specialised datasets. User feedback mechanisms can also refine models and enhance accuracy. Notably, increasing input length doesn’t guarantee better performance: ‘prompt overloading’ can occur, and memory limits can create the impression of ‘forgetting’ within a chat. The optimal approach combines RAG with these strategies, acknowledging that hallucination remains an inherent challenge in generative AI.
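For the curious, here is a minimal sketch of the RAG pattern just described, written in Python. It is purely illustrative and not the implementation of any particular product: the tiny document store, the keyword-overlap retrieval, and the prompt template are toy stand-ins for the vector-embedding search and hosted LLMs that real systems use. The point is simply to show the pattern: retrieve ‘approved’ context first, then augment the prompt with it.

```python
# A minimal, illustrative sketch of the RAG pattern described above.
# NOTE: everything here is a toy stand-in, not a real product's design.
# Real systems retrieve with vector embeddings and send the augmented
# prompt to a hosted LLM; here we simply print the assembled prompt.

# The user's own pre-defined, 'approved' data.
DOCUMENTS = [
    "Assessment policy 2024: any AI assistance must be declared on the cover sheet.",
    "Module handbook: essays are marked on argument and evidence, not on grammar.",
    "Library guide: reference generative AI outputs as personal communications.",
]

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Rank documents by crude keyword overlap with the query; return the top k."""
    query_words = set(query.lower().split())
    def overlap(doc: str) -> int:
        return len(query_words & set(doc.lower().split()))
    return sorted(docs, key=overlap, reverse=True)[:k]

def build_augmented_prompt(query: str, docs: list[str]) -> str:
    """Stuff retrieved context into the prompt so the model answers from
    'approved' data rather than from its training memory alone."""
    context = "\n".join(f"- {doc}" for doc in retrieve(query, docs))
    return (
        "Answer using ONLY the context below. "
        "If the context does not contain the answer, say you do not know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )

if __name__ == "__main__":
    print(build_augmented_prompt("Do I need to declare AI use in my essay?", DOCUMENTS))
```

Even in this toy form, the sketch shows both why RAG helps (the model is steered towards your own data) and why it is not a cure-all: if the retrieved documents are outdated, or the retrieval step misses the relevant passage, the model falls back on its own ‘unapproved’ knowledge to fill the gap.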
What’s particularly interesting is how the conversation has shifted from ‘What can’t AI do?’ to:
‘How well does AI do these tasks compared to human experts?’
We are constantly reminded of this question in the release notes of new AI versions, where companies highlight how their latest tools match or outperform human performance in bar exams, SAT tests, and the like. This is especially relevant in educational contexts, where we’re seeing increasing use of AI tools for scaffolding learning rather than replacing human effort entirely.
Limitations linger
Despite rapid advances, certain fundamental limitations persist, and new ones have emerged. The ‘prompt engineering’ challenge remains: getting exactly what you want still requires considerable skill and understanding. This has led to an interesting phenomenon whereby the ability to interact effectively with generative technologies is often cited as a critical AI literacy skill.
The issue of ‘AI blandness’ and homogenisation of language has also evolved in interesting ways. While premium models can now generate more stylistically diverse content, there’s still an ineffable quality to human-created work that AI struggles to replicate. This is particularly evident in creative writing and personal reflection pieces, where authentic voice and genuine emotional resonance remain distinctively human characteristics.
A new limitation has emerged around what we might call ‘deep contextual understanding’. While AI can process and respond to complex queries, it still struggles with truly understanding disciplinary nuances and methodological traditions. This becomes particularly evident in graduate-level work where deep subject knowledge and methodological sophistication are essential.
Limitations in focus
The landscape of image generation has transformed dramatically since 2023, yet certain core limitations persist while new ones have emerged.
Bias
While some progress has been made in addressing demographic biases, new forms of bias have emerged. Models now show subtle biases in composition and style that reflect their training data’s Western-centric nature. The challenge has shifted from obvious demographic biases to more nuanced cultural and contextual biases. We will deal with several other critical issues later in the course, including the thorny issue of copyright in terms of both outputs and the use of images in the training of these models.
Technical limitations
While the ‘hands problem’ (distortions in certain components of AI-generated images, such as hands, toes, or keyboards) has largely been solved by leading image generators, new technical challenges have emerged around consistent lighting, physics-accurate reflections, and maintaining coherent perspective in complex scenes. As text-to-video generators become more common, we are seeing similar iterations and issues in output quality, as well as rapid advances and improvements, emblematic of comparable improvements across the board for generative tools. These limitations become particularly evident in educational contexts requiring precise technical or scientific visualisations.
Integration challenges
The integration of text and images has improved significantly, but new limitations have emerged around maintaining consistency across multiple generated images or creating cohesive visual narratives. This poses particular challenges for educational materials requiring sequential or related visuals.
Although some tools have shown significant improvements where text is generated as part of images, even the most recent, such as DALL-E 3 (used here via a ChatGPT-4o subscription), still struggle with such requests. They offer, with supreme confidence, ‘accurate’ images such as the one below, produced when I asked for “An accurately labelled human body suitable for a primary school classroom”. As you can see, this confidence is often misplaced.
An inaccurate anatomical illustration for a primary school audience, November 2024, via DALL-E 3.
Limitations due to limited finances
The most significant emerging limitation is perhaps the growing divide between premium and free tools. While top-tier AI models have overcome many previous limitations, access to these capabilities often requires significant financial investment, creating potential equity issues in educational contexts.
The rapid pace of development means that any discussion of limitations must be viewed as a snapshot in time. However, certain fundamental challenges persist: the potential for hallucination, the reflection of societal biases, and the challenge of generating truly original rather than derivative work. For educators, the focus should perhaps be less on identifying unchanging limitations and more on developing frameworks for critical engagement with these tools as they continue to evolve.
What this means for educators and students
But what does all this mean for researchers, lecturers, teachers, and students who just want reliability without having to dig deep into these issues and processes?
In the first months after the release of ChatGPT, you might often have heard talk of AI-proofing assessments based on perceived limitations. Whilst limitations linger, it is not productive to try to build teaching or assessment activities that assume AI deficits given the rapidity of change. What is apparent, though, is that both teachers and students need to be alert to both potentials and limitations if they are to understand how best to use (or to decide not to use) these technologies.
Information literacy that incorporates AI is essential. If we want to ensure that our use of these technologies aligns with our educational goals, we have an obligation to learn what the potentials and constraints are. The major problem with this, of course, is that teachers, irrespective of context, are often time-poor, and the rapidity of change means the support mechanisms for developing that understanding are ill-developed. Most of us do not have time to learn exactly how our cars work in order to drive them, but surely we all have a responsibility to lobby governments, manufacturers, and car sales outlets to ensure minimum levels of safety. The same applies to generative AI. As a first step (beyond following this course!), I encourage colleagues to share experiences (good and bad), to be open and honest with students about the issues, to open dialogue about these technologies wherever possible and, above all, to experiment whenever they get the opportunity.
Please note: I sought AI assistance with the subheadings and also took on several recommendations from Claude 3 when re-writing this piece for this course. It has now been through multiple drafts, and I cannot say with certainty or 100% accuracy which bits are AI-augmented and which are my own. I would argue that in this context, this is a perfectly valid use of tools. Can the same be said of student assessments where quality of writing is NOT an assessed goal?
Now that you have completed this step, you have seen that the limitations landscape changes frequently. In the next step, you will hear about some of the ways students are already using and critiquing these tools.
Additional resources:
Ardito CG. Generative AI detection in higher education assessments. New Dir Teach Learn [Internet]. 2024 [cited 2025 Mar 5];[volume number unknown]:1-18. Available from: https://onlinelibrary.wiley.com/doi/10.1002/tl.20624
Try it out
In this step, you may try one or both of the following tasks:
- Using one of the image-generating tools below, ask for “photo realism” in the prompt and see whether it can render accurate text within the image. Where might you use such outputs in your role?
a. Deep AI (free and accessible without registration).
b. DALL-E 3 in Copilot (free if you have an institutional login to Microsoft).
c. Runway (free version available, registration required).
d. Adobe Firefly (free version available, registration required).
- Using a language model of your choice, ask questions about something you know a LOT about. Preferably, make it as niche as possible. How long is it before you identify something that is inaccurate?
Join the conversation
Which of the tasks did you try out? Share your thoughts on each in the comments.
- If you tried task 1, where might you use such outputs in your role? Are there any risks you anticipate with the use of image generation?
- If you tried task 2, how long did it take until you identified something inaccurate? How does this make you feel about the use of LLMs for learning?