AI (artificial intelligence) is quickly making an impact on our lives. Though AI has been around for many years, easy access and potential for mass use have caused ripples throughout many industries, and education is no exception. While AI is getting better at producing human-like written language, it's not yet fully able to do so without being detected. And as AI models refine their algorithms, AI detectors will need to adjust as well to keep accurately predicting whether a piece of text was written by AI or by a person.
A recent example of AI detection gone wrong? As announced last week, a professor failed half his class based on an AI detector's analysis of his students' work, which turned out to be inaccurate. That brings me to two questions. How does AI generate information on a topic or prompt that you give it? And how reliable are AI detectors? Let's take a look at some of the current tools.
How is AI trained to generate text?
The type of AI that I’ll be focusing on here is typically referred to as Natural Language Generation (NLG). This AI uses a variety of techniques for generating written text that could pass as written by a person. An AI model first needs to be trained on text data. For example, Medical AI may be fed massive amounts of information on medical studies, research, etc. Education AI could be fed huge amounts of lesson plans, curricular standards, educational research, and more. The more text data that can be fed into the model, the better the model can become.
Once the model has been fed massive amounts of information, it's time to start giving the model 'prompts' (a sequence of words). Based on your prompt, the AI model will start writing text one word at a time, with each word predicted from the words that came before it, based on patterns learned during training. Sometimes this can result in incoherent sentences and inconsistent writing. Given enough data to train on, though, it may seem as though the model is conscious and genuinely communicating with you. This predictability is one of the key traits that AI detectors look for when calculating how likely it is that the text was generated by an AI model or by a person.
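The "one word at a time" idea can be sketched with a toy bigram model: count which word tends to follow which in a training corpus, then generate by repeatedly picking a likely next word. This is a minimal illustration of the mechanism, not how modern systems work; real NLG models are neural networks that condition on much longer context than just the previous word, and the tiny corpus here is made up for the example.

```python
import random
from collections import defaultdict

# Toy training corpus -- a real model trains on billions of words.
corpus = (
    "representative government grew during the colonial period "
    "representative government grew from colonial assemblies "
    "colonial assemblies shaped representative institutions"
).split()

# Record which words follow which (a bigram model).
following = defaultdict(list)
for current_word, next_word in zip(corpus, corpus[1:]):
    following[current_word].append(next_word)

def generate(prompt_word, length=8, seed=0):
    """Generate text one word at a time, each word predicted
    from the word before it, using the learned bigram counts."""
    random.seed(seed)
    words = [prompt_word]
    for _ in range(length):
        candidates = following.get(words[-1])
        if not candidates:  # no known continuation -- stop early
            break
        words.append(random.choice(candidates))
    return " ".join(words)

print(generate("representative"))
```

Because the corpus is so small, the output loops through a handful of phrases, which mirrors the "incoherent sentences and inconsistent writing" you see from under-trained models.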
Testing AI Detectors with AI-Generated Text
To test out some AI detectors, I used the same prompt for ChatGPT-3.5, ChatGPT-4, Google Bard, and Microsoft Bing. For the prompt, I asked each model for information that would address an 8th grade history standard:
Write a 100 word essay on the 8th grade Texas history standard: The student understands the foundations of representative government in the United States. The student is expected to explain the reasons for the growth of representative government and institutions during the colonial period.
I did not ask the models to regenerate the response; I took the initial response that was produced by each. You can view the actual responses that I used to test the AI detectors and compare the similarities and differences between them.
I looked at free AI detectors, since these are what most teachers may rely on unless their district or school provides them with a subscription to a different detector model. For some of these tools, I did have to create an account, while others were publicly accessible. Here are the results for each tool when fed the four different AI-generated texts. It is important to note that for some (many?) detectors, just a sampling of the text is reviewed to give an overall score.
| Detector | ChatGPT-3.5 Text | ChatGPT-4 Text | Google Bard Text | Microsoft Bing Text |
| --- | --- | --- | --- | --- |
| Copyleaks AI Content Detector | Human text | Human text | 96.4% Probability for AI | Human text |
| Hugging Face | 81.9% Probability for Human | 99.5% Probability for Human | 99.9% Probability for AI | 58.6% Probability for AI |
| Content at Scale AI Detector | 82% Probability for Human | 82% Probability for Human | 68% Probability for AI | 78% Probability for AI |
| OpenAI Text Classifier | Likely AI-generated | Likely AI-generated | Likely AI-generated | Unclear if it is AI-generated |
| GPTZero | Likely to be written entirely by a human | Likely to be written entirely by a human | Likely to be written entirely by AI | Likely to be written entirely by a human |
| Writer | 100% Human-generated content | 0% Human-generated content | 100% Human-generated content | 100% Human-generated content |
| Sapling | Fake: 100% | Fake: 0% | Fake: 99.8% | Fake: 99.6% |
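To see why these tools can disagree so wildly, it helps to look at the kind of signal they measure. Many detectors score how *predictable* a text is under a language model: text where each word is the model's most likely continuation looks more AI-like. The sketch below is a toy version of that idea using bigram counts; real detectors use perplexity and related statistics from large neural models, and the reference corpus here is invented for illustration.

```python
from collections import Counter, defaultdict

# Toy reference corpus standing in for a detector's language model.
corpus = (
    "the student understands the foundations of representative "
    "government in the united states the student is expected to "
    "explain the growth of representative government"
).lower().split()

# Bigram counts: how often each word follows each other word.
counts = defaultdict(Counter)
for word, nxt in zip(corpus, corpus[1:]):
    counts[word][nxt] += 1

def predictability(text):
    """Fraction of word transitions that match the model's single most
    likely next word. Higher scores mean more predictable, AI-like text
    under this toy model; real detectors compute perplexity instead."""
    words = text.lower().split()
    hits = total = 0
    for word, nxt in zip(words, words[1:]):
        if counts[word]:  # only score words the model has seen
            total += 1
            most_likely, _ = counts[word].most_common(1)[0]
            hits += (nxt == most_likely)
    return hits / total if total else 0.0

print(predictability("the student understands the foundations"))
```

Because each tool trains its reference model on different data and samples the submitted text differently, the same essay can score "human" on one detector and "AI" on another, which is exactly what the table above shows.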
Can we rely on AI detectors?
AI detection seems to be a cat-and-mouse game: the models keep getting more human-like while AI detectors struggle to maintain any level of accuracy and reliability. Based on results like those above, you might be better off flipping a coin to determine whether a text is AI-generated. The best strategy is to know your students. Does what they turned in fit with the whole of their work? Does it align with the skills they demonstrate in class?
Just as we teach students how to use search engines effectively, students now need to know how to use AI tools ethically as part of their productivity process. Consider how you and your students can use AI tools in the classroom to enhance the learning experience. Have conversations about how and when using AI tools is appropriate for your classroom. Discuss what "cheating" looks like in your classroom and encourage students to have integrity. Whether we like it or not, AI is here to stay. It is now part of the landscape, which includes the classroom but also includes students' future careers.
How do you incorporate AI in the classroom?
So, how do you incorporate AI in your classroom? Are there assignments where you encourage the use of AI while other assignments have AI-free expectations? Let us know how you have struck that fine balance of using it effectively but not letting it undermine your students’ growth.