Research on the subject shows, among other things, that large language models are susceptible to abuse through creative engineering of their input, giving people reason to be even more skeptical of what they read.
With near-universal access to models that deliver human-sounding text within seconds, we’ve reached a turning point in human history, according to new research from WithSecure™ (formerly F-Secure Business).
The research describes a series of experiments conducted with GPT-3 (Generative Pre-trained Transformer 3) language models that use machine learning to generate text.
The experiments used “prompt engineering,” a concept related to large language models that involves crafting input that produces desired or useful results, in order to generate content that the researchers consider harmful.
Several experiments assessed how changes in the input to the currently available models affected the output of the synthetic text. The goal was to identify how AI language generation can be abused through malicious and creative prompt engineering, with the hope that the research could be used to guide the creation of safer large language models in the future.
The experiments covered phishing and “spear-phishing”, harassment, social validation for fraud, appropriation of a written style, the creation of deliberately divisive opinions, using the models to create prompts for malicious text, and “fake news”.
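To make the idea of prompt engineering concrete, the sketch below shows how the same underlying task can be framed in several ways before being sent to a model. This is a generic illustration, not the researchers' actual prompts; the model call itself is deliberately left out, and only the prompt construction is shown.

```python
# Illustrative sketch of "prompt engineering": the same task is framed
# several ways, and each variant is what would be submitted to a large
# language model. Changing the framing (persona, style, tone) can
# meaningfully change the model's output.

TASK = "Write a short email asking the recipient to review an attached document."

def build_prompts(task: str) -> dict:
    """Return several prompt variants for the same underlying task."""
    return {
        "plain": task,
        "persona": f"You are a helpful office assistant. {task}",
        "style": f"{task} Write it in the style of a formal corporate memo.",
        "urgency": f"{task} Make it sound urgent but polite.",
    }

if __name__ == "__main__":
    for name, prompt in build_prompts(TASK).items():
        print(f"[{name}] {prompt}")
```

The research's point is that exactly this kind of variation, applied with malicious intent (e.g. impersonating a colleague's writing style), is cheap and requires no special expertise.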
“The fact that anyone with an Internet connection now has access to powerful, large language models has a very practical consequence: it is now reasonable to assume that any new communication you receive may have been written by a robot,” said Andy Patel, Intelligence Researcher, WithSecure, who led the research. “Going forward, detection strategies that understand the meaning and purpose of written content will be needed to identify both malicious and useful content created by AI”.
The responses from the models in these use cases together with the general development of GPT-3 models led the researchers to several conclusions, including:
- “Prompt engineering” will develop as a discipline, as will the creation of malicious prompts.
- Criminals will develop capabilities enabled by large language models in ways we have never seen before.
- Identifying harmful or offensive content will become more difficult for platform providers.
- Large language models already give criminals the ability to make targeted communications within an attack more effective.
“We started this research before ChatGPT made GPT-3 technology available to everyone,” said Patel. “This development made us increase the pace of our efforts. This is because we are all, to some extent, Blade Runners now, trying to figure out if the facts we are presented with are actually ‘real’ or artificial”.