AI #1: Deus ex Machina: The Antidote for AI-Powered Data Disasters


The topic of AI is making headlines and capturing imaginations. It’s sparking warnings and calls to pause development from AI experts and industry leaders. It’s also perpetuating mass confusion about what AI is—and isn’t. Should companies be embracing it or running from it? Maybe. If that answer makes no sense to you, you’re not alone. Here’s our take on where AI stands now and where it’s headed, along with our perspective on its impact on data usage.

When people talk about “AI,” they tend to think of “artificial general intelligence” (AGI)—software that can do any intellectual task a human can do. But that’s not what’s behind the “AI” in most technology products and marketing language. Part of the confusion is that there’s no agreed-upon definition or set of standards by which something can be judged to be intelligent. For a great discussion of the difficulty of defining intelligence in the AGI sense, check out the Harvard Business Review.

What’s passing for AI is actually a subset of capabilities known as machine learning (ML). In most cases, ML is a general-purpose tool designed to deliver actionable predictions that improve business efficiency. Also called predictive analytics, ML drives operational decisions in dozens of use cases, from determining the likelihood of a fraudulent transaction to monitoring environmental conditions in perishable food shipping containers. ML is already widely used today.
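
To make “actionable predictions” concrete, here’s a minimal sketch of a fraud-likelihood model in Python using scikit-learn. The features and toy training data are our own illustrative assumptions, not a production fraud system:

```python
# Minimal sketch: estimating the likelihood that a transaction is fraudulent.
# The features and training data are toy values chosen for illustration only.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Each row: [amount_usd, is_foreign_merchant, seconds_since_last_txn]
X_train = np.array([
    [12.50,  0, 86400],
    [9800.0, 1,    45],   # large, foreign, rapid-fire: labeled fraud
    [45.00,  0,  3600],
    [7200.0, 1,    30],   # labeled fraud
])
y_train = np.array([0, 1, 0, 1])  # 1 = fraud, 0 = legitimate

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# The model returns a probability, an actionable prediction an operations
# team can threshold on (e.g., hold the transaction for review above 80%).
new_txn = np.array([[5400.0, 1, 60]])
print(f"Estimated fraud likelihood: {model.predict_proba(new_txn)[0, 1]:.2%}")
```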

What’s raising controversy is generative AI. This flavor of AI is based on algorithms that can be used to create “new” content, including audio, code, imagery, text, simulations, and video. These algorithms are built on large language models (LLMs) trained on massive datasets of existing content. For example, the recently announced Google PaLM 2 LLM is trained on 3.6 trillion tokens (Source: https://www.computerworld.com/article/3697649/what-are-large-language-models-and-how-are-they-used-in-generative-ai.html). In the case of ChatGPT, its training dataset is content scraped directly off the internet.
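
For a sense of what those training-set sizes mean, here’s a minimal sketch using OpenAI’s open-source tiktoken tokenizer (our choice for illustration; a token is roughly a word or word fragment):

```python
# Minimal sketch: how text is split into tokens, the unit that LLM
# training datasets are measured in. Requires: pip install tiktoken
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # encoding used by recent OpenAI models

text = "Large language models are trained on trillions of tokens."
tokens = enc.encode(text)

print(len(tokens), "tokens")  # each token is an integer ID for a subword piece
print(tokens[:5])
# A 3.6-trillion-token corpus is this one sentence repeated a few hundred
# billion times over.
```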

Uh, does anyone see a problem here? First, how can you call content “new” when it’s gathered (stolen) from existing content created and owned by real humans? It’s piracy and copyright infringement, at best.

Worse, the real-world consequences of using content without consent are mind-numbing. Consider the case of a company’s software developer who asks ChatGPT to write some code. He feeds in his requirements and ChatGPT returns code based on the request. The developer copies and pastes it into his code. So…where did that code come from? Who owns it and the resulting combined code? Is it licensed? If it is, do you now have to open source all of your code? Can you be sued for infringement? How secure can that code be if ChatGPT simply found it on the internet? What if the code came from an exploit website? Why wouldn’t bad actors build libraries of exploitable code, feed them to ChatGPT and then follow the poisoned breadcrumbs? Of course they will—it’s a massive, free distribution channel.
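
The licensing question, at least, is partially checkable. Below is a minimal sketch of a pre-merge scan that flags pasted snippets carrying common license markers. The marker list and function name are our own illustrative assumptions; real compliance tooling (SPDX scanners, snippet matchers) goes much further:

```python
# Minimal sketch: flag pasted code that carries license or copyright markers
# before it lands in your repository. Illustrative only.
import re

LICENSE_MARKERS = [
    r"SPDX-License-Identifier",
    r"GNU General Public License",
    r"\bGPL\b",
    r"Copyright \(c\)",
    r"All rights reserved",
]

def flag_license_markers(snippet: str) -> list[str]:
    """Return the license/copyright markers found in a code snippet."""
    return [m for m in LICENSE_MARKERS if re.search(m, snippet, re.IGNORECASE)]

pasted = '''
# Copyright (c) 2019 Example Corp
# SPDX-License-Identifier: GPL-3.0-only
def quicksort(xs): ...
'''

hits = flag_license_markers(pasted)
if hits:
    print("Review before merging; markers found:", hits)
```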

Worse x10, the request fed into ChatGPT revealed the developer’s need and project details. The request itself is now content fodder for ChatGPT: anyone can have access to it and learn what that company is developing. AI programming will do what you tell it to do, and it will learn. What’s to stop a competitor from crafting prompts designed to tear down rival brands or suppress their search rankings?
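
One partial mitigation is to strip identifying project details from prompts before they leave your network. A minimal sketch, assuming a simple keyword blocklist (the terms and function name here are hypothetical):

```python
# Minimal sketch: redact sensitive project terms from a prompt before it
# is sent to an external AI service. Keyword lists are a blunt instrument;
# this is illustration, not a data-loss-prevention product.
import re

SENSITIVE_TERMS = ["Project Falcon", "acme-payments-v2", "internal-api.acme.com"]

def redact_prompt(prompt: str) -> str:
    """Replace known-sensitive terms with a placeholder before egress."""
    for term in SENSITIVE_TERMS:
        prompt = re.sub(re.escape(term), "[REDACTED]", prompt, flags=re.IGNORECASE)
    return prompt

raw = "Write a payment retry loop for acme-payments-v2 used by Project Falcon."
print(redact_prompt(raw))
# -> "Write a payment retry loop for [REDACTED] used by [REDACTED]."
```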

AI is not yet a problem for most companies, but many are already raising concerns. What is certain is that AI will revolutionize attack vectors, making your task of protecting confidential data and IP exponentially more difficult. If you don’t know the data that runs your company, where it’s moving, how it’s classified, how people are using it, and who’s responsible for it—you’re already behind.

It’s time for a deus ex machina—an “unexpected device or event introduced suddenly to resolve a situation or untangle a plot.” In this case, it’s Flying Cloud CrowsNest data surveillance. CrowsNest uses preventive AI to protect data at the binary level from programmatic attacks—the methodologies used by malware, phishing, business email compromise (BEC), command-and-control (C2) channels, and other threats. Using the same preventive AI technology, we’re already helping companies block requests to and content from ChatGPT. CrowsNest identifies requests based on destination address and blocks them from leaving the organization. It also identifies data on your network that was generated elsewhere and doesn’t fit your normal data content and traffic patterns. As you face demands for AI-driven tools, you’re going to need a deus ex machina to identify legitimate requests, legitimate users, data provenance, and safe data usage patterns. Let us help you proactively defend against the data ownership and usage threats created by emerging AI technologies.
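
As a generic illustration of destination-based blocking (our own sketch, not CrowsNest’s actual implementation), an egress filter can refuse connections to known generative-AI endpoints:

```python
# Minimal sketch: destination-based egress filtering for known generative-AI
# endpoints. Hostnames below are examples; a real deployment would enforce
# this at a proxy or firewall, not in application code.
BLOCKED_AI_HOSTS = {
    "chat.openai.com",
    "api.openai.com",
}

def is_egress_allowed(destination_host: str) -> bool:
    """Return False for destinations on the generative-AI blocklist."""
    host = destination_host.lower().rstrip(".")
    return not any(host == b or host.endswith("." + b) for b in BLOCKED_AI_HOSTS)

for dest in ["api.openai.com", "updates.example-vendor.com"]:
    print(dest, "->", "allow" if is_egress_allowed(dest) else "block")
```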