ChatGPT from OpenAI is taking the world by storm these days.
The language models powering the service are so good that it can be difficult to tell whether a piece of text (a social media post or a blog article, for example) was created with ChatGPT or by a human.
There have also been concerns about whether Google can detect ChatGPT and discussions around whether AI content, in general, is against Google guidelines (it's not...).
But can text coming from ChatGPT be detected?
The short answer is yes.
To test this, each of the detectors below was given the same AI-generated text.
The AI-generated text was:
“There are several ways to detect if a language model like ChatGPT generated a text. One approach would be to use machine learning techniques to train a model to recognize the specific writing style and patterns of the language model. This could include analyzing the structure, grammar, and vocabulary used in the text. Another approach would be to compare the text to a large dataset of known ChatGPT outputs and look for similarities. Additionally, one could incorporate information such as the presence of specific tokens or patterns in the text known to be generated by the language model.”
We have explored a number of different tools and how they work below.
UPDATE: We built our own AI Detector that can check for both general AI content and ChatGPT content. Try it out, it's free!
10 ChatGPT detection tools
There are more and more tools popping up.
We have gathered 10 examples of tools and taken them for a spin.
They are all free to use, so you can test them on your own text if you want.
The AI detector developed by SEO.ai employs a sophisticated fusion of four cutting-edge detection models, each specifically trained to evaluate text based on prediction, entropy, correlation, and perplexity.
The detector has a high success rate in identifying content generated by GPT algorithms, including long-form articles produced with ChatGPT, GPT-3.5, and GPT-4.
Copyleaks AI detector can detect content generated by some AI text bots, including ChatGPT.
The detector from Copyleaks did not pass the test. It states that the given text was written by a human, which it was not.
GPTZero was created by a 22-year-old college student, Edward Tian. It determines whether or not a text was created by an AI using two factors: "perplexity" and "burstiness". Perplexity measures the complexity of the text: if GPTZero is perplexed by the text, the text has high complexity and is more likely to be human-written. Burstiness looks at how much that perplexity varies from sentence to sentence, and human writing tends to vary more.
The detector from GPTZero did not pass the test. It states that the text is likely to be human-written.
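To make perplexity and burstiness a bit more concrete, here is a minimal sketch in Python. It is not GPTZero's actual code. It assumes a toy word-frequency (unigram) model in place of a real language model, and it assumes burstiness can be approximated as the spread of per-sentence perplexity. The function names `train_unigram`, `perplexity` and `burstiness` are made up for this illustration.

```python
import math
from collections import Counter


def train_unigram(corpus_tokens):
    """Build a toy word-probability function with add-one smoothing.

    Assumption: a real detector would use a large language model here.
    """
    counts = Counter(corpus_tokens)
    total = len(corpus_tokens)
    vocab = len(counts) + 1  # one extra slot for unseen words

    def prob(token):
        return (counts.get(token, 0) + 1) / (total + vocab)

    return prob


def perplexity(tokens, prob):
    """Exponential of the average negative log-probability per token."""
    if not tokens:
        raise ValueError("need at least one token")
    log_sum = sum(math.log(prob(t)) for t in tokens)
    return math.exp(-log_sum / len(tokens))


def burstiness(sentences, prob):
    """Standard deviation of per-sentence perplexity (an assumed definition)."""
    scores = [perplexity(s.lower().split(), prob) for s in sentences if s.split()]
    if not scores:
        return 0.0
    mean = sum(scores) / len(scores)
    variance = sum((x - mean) ** 2 for x in scores) / len(scores)
    return math.sqrt(variance)


if __name__ == "__main__":
    reference = "the cat sat on the mat and the dog sat on the rug".split()
    prob = train_unigram(reference)
    print(perplexity("the cat sat".split(), prob))
    print(perplexity("quantum zebras yodel".split(), prob))
    print(burstiness(["the cat sat", "quantum zebras yodel"], prob))
```

In this toy setup, the sentence made of words the model has never seen gets the higher perplexity. A real detector works on the same principle, but with a much stronger model behind it.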
4. GPT-2 Output Detector
GPT-2 Output Detector rates, on a scale from 0 to 100, how likely it is that the text was written by an AI.
GPT-2 Output Detector passed the test. It states that the text was likely to be AI-generated.
The PoemOfQuotes ChatGPT content detector helps you identify whether content was written by a human or by ChatGPT. Its free version, based on their algorithms, only states whether the text was written by a human or not.
The PoemOfQuotes detector did not pass the test, as it thinks the text was written by a human.
Corrector is a fast and free online tool with a maximum of 300 words per run.
The Corrector detector passed the test, as it thinks the text is AI-generated.
7. Content at Scale
Content at Scale AI Content Detection is a tool that allows users to check the authenticity of their written content. It provides a score out of 100 to indicate the human-like quality of the content and the likelihood that it will be detected as artificial by Google.
The Content at Scale detector passed the test, as it thinks the text is more likely to be AI-written than human-written.
8. Roberta OpenAI Detector - Huggingface
Roberta OpenAI Detector is a classifier that can be used to detect text generated by GPT-2 models. It might give inaccurate results in the case of ChatGPT-generated input.
Roberta OpenAI Detector passed the test too, as it thinks the text is AI-generated.
9. ChatGPT Detector - Huggingface
ChatGPT Detector can detect whether a piece of text is ChatGPT-generated, using either linguistic features or classifiers based on pre-trained language models (PLMs).
ChatGPT Detector passed the test too, as it thinks the text is AI-written.
GLTR is an abbreviation of “Giant Language model Test Room”. It uses a language model trained to predict the next word given an input context, and checks how predictable each word in a text is.
The detector from GLTR states that the text is more likely to be AI-written than human-written.
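The idea behind this kind of check can be sketched in a few lines of Python. This is only an illustration and not GLTR's code: it assumes a tiny bigram model (word pairs counted from a reference text) in place of a large language model, and the names `train_bigram`, `token_ranks` and `share_in_top_k` are hypothetical. For each word, it looks up how high that word ranks among the model's predictions for the next word.

```python
from collections import Counter, defaultdict


def train_bigram(tokens):
    """Count which words follow which (a toy stand-in for a language model)."""
    following = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        following[prev][nxt] += 1
    return following


def token_ranks(tokens, following):
    """Rank of each actual next word among the model's predictions.

    Rank 1 means the word was the model's top guess.
    None means the model never saw that word in this context.
    """
    ranks = []
    for prev, actual in zip(tokens, tokens[1:]):
        counter = following.get(prev, Counter())
        candidates = [word for word, _ in counter.most_common()]
        ranks.append(candidates.index(actual) + 1 if actual in candidates else None)
    return ranks


def share_in_top_k(ranks, k):
    """Fraction of words that were among the model's top-k predictions."""
    if not ranks:
        return 0.0
    hits = sum(1 for r in ranks if r is not None and r <= k)
    return hits / len(ranks)


if __name__ == "__main__":
    model = train_bigram("the cat sat on the mat the cat ran".split())
    ranks = token_ranks("the cat sat on the dog".split(), model)
    print(ranks)
    print(share_in_top_k(ranks, 1))
```

A text where almost every word is one of the model's top guesses is very predictable, which points towards machine-generated text. Human writers tend to pick less expected words more often.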
Writer AI Content Detector is a free tool developed by Writer.com. The tool is designed to help users detect content generated by an AI writing tool, such as GPT-3 or GPT-2. The web-based tool provides a detection score, indicating the likelihood that the content was created by an AI.
The detector from Writer did not pass the test, as it thinks the text is human-generated content.
We covered this topic a bit earlier regarding whether Google could detect text coming from ChatGPT.
Who would need a ChatGPT detector?
So who would be interested in finding out whether an LLM or a person wrote a text?
You might initially think of schools, which would like to know whether a student has used their own skills to answer an assignment.
And yes, bans on ChatGPT and moves back from computers to pen and paper have already started.
But several groups of people and organizations could benefit from being able to detect whether a language model like ChatGPT generated a text:
- In the field of information security, organizations and individuals could use this detection to identify and counter attempts at deception or impersonation using AI-generated text.
- In online communication, platforms could use this detection to prevent the spread of misinformation or spam generated by AI.
- Within journalism and media, fact-checkers and editors could use this detection to identify and label AI-generated content and ensure that readers are aware of its source.
- In education, schools and universities could use this detection to ensure that students are using their skills and knowledge to complete assignments and exams, rather than relying on AI-generated content.
ChatGPT's own answer on how ChatGPT might be detected
So I asked ChatGPT in what different ways a text could be spotted as coming from ChatGPT:
“There are several ways to detect if a language model like ChatGPT generated a text.
One approach would be to use machine learning techniques to train a model to recognize the specific writing style and patterns of the language model. This could include analyzing the structure, grammar, and vocabulary used in the text.
Another approach would be to compare the text to a large dataset of known ChatGPT outputs and look for similarities.
Additionally, one could incorporate information such as the presence of specific tokens or patterns in the text known to be generated by the language model.”
In other words, a machine learning model can look at a text to see if it recognizes a pattern, or a statistical analysis can measure how likely the words used are.
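The second approach from ChatGPT's answer, comparing a text to known ChatGPT outputs, could look roughly like the sketch below. It is only one possible take on it: it assumes similarity is measured as the overlap of three-word sequences (Jaccard similarity), and the list of known outputs passed to the hypothetical `best_match` function is something you would have to collect yourself.

```python
def shingles(text, n=3):
    """Set of overlapping n-word sequences in the text, lowercased."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a, b):
    """Shared items divided by all items; 0.0 when both sets are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def best_match(text, known_outputs, n=3):
    """Highest similarity between the text and any known AI output."""
    target = shingles(text, n)
    return max(
        (jaccard(target, shingles(known, n)) for known in known_outputs),
        default=0.0,
    )


if __name__ == "__main__":
    # Hypothetical collection of earlier ChatGPT answers.
    known = [
        "There are several ways to detect if a language model generated a text",
        "One approach would be to use machine learning techniques",
    ]
    print(best_match("There are several ways to detect a generated text", known))
```

A score close to 1 means the text is nearly identical to something in the collection, and a score close to 0 means there is almost no overlap. This only catches text that closely matches known outputs, so it would typically be combined with the pattern-based approaches.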
We dived a bit deeper in our article about Google and ChatGPT detection.
How to make ChatGPT text undetectable
As technology advances, so does the ability to detect AI-generated content.
However, it is important to note that creative people will always find ways to get around detection tools.
For example, AI-generated content can be modified to appear more human-like, making it harder to detect.
You can even ask ChatGPT to write in a more human way, or to make its text less likely to be caught by a ChatGPT detector.
So let's try an example of this and see if it's possible.
I took the text highlighted above, where ChatGPT answers how it could be detected.
This text scores about 96% "fake" in one of the detectors, so it is clearly flagged as not human-made.
I then tried the following prompts in ChatGPT:
- Rewrite the following text so that it's harder to detect as coming from ChatGPT by a ChatGPT detector
- Rewrite the following text. Write like a human, more spoken language, more line breaks
- Paraphrase the following, use random words and rare words. Write more like a human
And although some of the text became more creative, the detector could still identify it as AI-generated.
So I gave it a last try with the prompt "Rewrite following as a 24 year old human would;"
It returned the following:
So, one idea is to use fancy computer stuff to teach a computer to understand how the language model writes, like the way it uses words and grammar. Another way is to check the text against a bunch of stuff ChatGPT has written before and see if it matches. And, we could also look for certain words or patterns that only ChatGPT uses.
And now the score flipped to indicating that the text was made by a human (61%).
So it is clear that even ChatGPT can output text that is more difficult to spot. But it seems hard to replicate and to do consistently.
But when AI-generated content is combined with human-written content, it becomes more difficult to detect.
And this is what we recommend you do most of the time.
Not to avoid detection, but to ensure that the content will be high-quality with some human input and direction.
Nevertheless, organizations and individuals must stay vigilant and continue to develop new and improved detection methods to identify AI-generated content.
Is Google interested in detecting whether ChatGPT writes a text?
Google is always looking for ways to present the best search results to its users.
To do this, they work on keeping spammy content out of search results: duplicate content, plagiarism and, earlier, what they called automatically-generated content.
This is because automatically-generated content could earlier be perceived as low quality and was sometimes used to manipulate search rankings rather than to help users.
Google has been trying to detect and remove this type of content from search engine results pages (SERPs) to maintain the integrity of its search results.
They want to focus on high-quality, human-generated content to provide a better experience for users and maintain the search engine's credibility.
In their guidelines, they describe it in this way:
Spammy automatically generated (or "auto-generated") content is content that's been generated programmatically without producing anything original or adding sufficient value; instead, it's been generated for the primary purpose of manipulating search rankings and not helping users.
However, with the rise of new AI technologies, it's becoming harder to differentiate between human-generated and AI content.
This is why the question of whether Google wants to differentiate between the two is becoming less clear.
But Google has clearly stated that they are not against AI content, as long as it is helpful for users. They did this as recently as January 12th, 2023, on Twitter.
Read more about why the misconception that Google is against AI content is still thriving.