

AI #6: “The Data” is Not THE Data

In Show Me What You’ve Got, we gave you seven questions to ask when assessing the quality of your data for use in an AI model. Questions are great. Answers are better. Crucial to assessing the quality of your data is understanding the difference between “the data” as most people talk about it and data as the binary driver behind every organizational process. Only then can you decide how to operate successfully within an increasingly AI-driven world.

When the industry, media, analysts, and other people talk about “the data,” they’re almost always talking about data as the river. They aren’t thinking about the DNA of everything that’s in the water. For example, data security solutions like DLP, NAC, and Zero Trust focus on preventing unauthorized access to the systems, networks, and devices that house data, on the assumption that “the data” is thereby protected. None of these were designed to look at data and its behavior at the binary level.

For the purposes of AI and an AI strategy, you need a way to gain visibility into not just the data itself, but also the unique ways that it’s used. Data must be seen and accounted for within the specific context of the organization—and as the foundation of its AI strategies.

That’s why organizations need data surveillance. With it, they can know the exact state of their data, backed by positive proof and data chain-of-custody accounting. In seconds, they’ll know what every piece of data is doing, where it goes, how it proliferates, who’s accessing it, and how it’s used. CrowsNest data surveillance delivers visibility into any and all data, both structured and unstructured. This includes data like diagnostic imaging, video, email, PowerPoint presentations, collaboration threads, spreadsheets, audio streams, asset inventories, application code, device configurations, and internet searches. CrowsNest AI technology can even identify screenshots or pictures taken with a phone if that binary data comes across the network.

Data surveillance begins by interfacing with any data repository through a simple API. Next, CrowsNest fingerprints the data, cataloging all identified data without touching or modifying the data in any way. Working at the binary level, CrowsNest identifies where the data originates, as well as its purpose, level of sensitivity, structure, movement, and relationship to other data and users.
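
To make the idea of read-only fingerprinting concrete, here is a minimal sketch. It is not how CrowsNest fingerprints data, which this article does not describe. It assumes a plain SHA-256 content hash, and the names `CatalogEntry`, `fingerprint_bytes`, and `catalog` are hypothetical. The point it illustrates is that data can be identified and cataloged from its bytes alone, without being touched or modified.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogEntry:
    fingerprint: str  # hex SHA-256 digest of the raw bytes (assumed scheme)
    size: int         # length of the payload in bytes
    source: str       # hypothetical label for the originating repository


def fingerprint_bytes(payload: bytes) -> str:
    """Return a content digest. The payload is only read, never modified."""
    return hashlib.sha256(payload).hexdigest()


def catalog(items: dict[str, bytes], source: str) -> dict[str, CatalogEntry]:
    """Build a catalog keyed by item name from in-memory byte payloads."""
    return {
        name: CatalogEntry(fingerprint_bytes(data), len(data), source)
        for name, data in items.items()
    }


entries = catalog({"report.docx": b"hello", "copy-of-report.docx": b"hello"}, "file-share")
# Identical bytes produce identical fingerprints, regardless of file name.
same = entries["report.docx"].fingerprint == entries["copy-of-report.docx"].fingerprint
```

A plain cryptographic hash only matches exact copies. Recognizing altered or partial copies would need a different technique, which is outside the scope of this sketch.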

Track the Data

Once data is fingerprinted, CrowsNest follows the data everywhere it goes on the network. Patented machine learning and automation quickly establish a baseline of normal and acceptable data patterns.

When fingerprinted data behaves out of character with the rolling baseline, CrowsNest alerts you to a security event.
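
As a rough illustration of what a rolling baseline means, the sketch below flags a value that strays too far from the recent history of a single metric, such as daily access counts for one fingerprinted file. It assumes a simple mean and standard deviation test. The CrowsNest machine learning is patented and not described here, so the class `RollingBaseline` and its parameters are hypothetical.

```python
from collections import deque
from statistics import mean, pstdev


class RollingBaseline:
    """Flag values that deviate sharply from a sliding window of recent values."""

    def __init__(self, window: int = 20, threshold: float = 3.0, min_samples: int = 5):
        self.history = deque(maxlen=window)  # oldest values fall off automatically
        self.threshold = threshold           # allowed deviation, in standard deviations
        self.min_samples = min_samples       # observations needed before judging

    def observe(self, value: float) -> bool:
        """Return True if value is anomalous against the window, then record it."""
        anomalous = False
        if len(self.history) >= self.min_samples:
            mu = mean(self.history)
            sigma = pstdev(self.history)
            if sigma == 0:
                anomalous = value != mu
            else:
                anomalous = abs(value - mu) / sigma > self.threshold
        self.history.append(value)  # the baseline keeps rolling either way
        return anomalous


baseline = RollingBaseline(window=10)
normal_flags = [baseline.observe(v) for v in [10, 11, 9, 10, 12, 10, 11]]
spike_flag = baseline.observe(500)
```

In this sketch the spike is also added to the window afterward. A production system might choose to exclude flagged values so that one incident does not distort the baseline.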

CrowsNest can automatically classify data, eliminating manual methods of tagging data or relying on users to make decisions about where to place documents. Create your own categories—file type, keyword, devices, users, time sensitivity, or others—and determine where you want content to reside. You can “data fence” content, restricting its movement with granular specificity based on the content, IP address, or other parameters. This means you can create policy for data that restricts which content can go where—into AI models or down to physical spaces within buildings.
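
A data fence can be pictured as a rule that ties a content category to the network ranges where that content may go. The sketch below is an assumption-laden illustration, not the CrowsNest policy format: `FenceRule` and `is_move_allowed` are hypothetical names, and the rule matches only on category and destination IP address.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class FenceRule:
    category: str                      # classification label, e.g. "source-code"
    allowed_networks: tuple[str, ...]  # CIDR blocks the content may move to


def is_move_allowed(rules: list[FenceRule], category: str, destination_ip: str) -> bool:
    """Check a proposed data movement against the fence for its category."""
    dest = ipaddress.ip_address(destination_ip)
    for rule in rules:
        if rule.category == category:
            return any(
                dest in ipaddress.ip_network(net) for net in rule.allowed_networks
            )
    # Categories with no fence are unrestricted in this sketch.
    return True


rules = [FenceRule("source-code", ("10.1.0.0/16",))]
inside = is_move_allowed(rules, "source-code", "10.1.4.7")
outside = is_move_allowed(rules, "source-code", "203.0.113.9")
```

Allowing unfenced categories by default is a design choice made here for brevity. A stricter policy would deny any movement that no rule explicitly permits.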

CrowsNest also recognizes non-fingerprinted data on the network that fits your policy or classification requirements. This means you can be alerted to sensitive data that is moving or being used in violation of security or compliance requirements, and stop it before it becomes a potential breach.

Defend the Data

CrowsNest defends your data by identifying anomalous data behavior in real time. Data policies in CrowsNest can include tunable data exfiltration parameters. Any attempt to exfiltrate data—whether on the network to an external location or any movement of data attempting to leave a specific area—triggers an alert.
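
One way to think about a tunable exfiltration parameter is as a volume limit on data leaving a defined area. The following sketch assumes a single internal network range and a byte budget for everything sent outside it. The names `Transfer` and `exfiltration_alerts`, and both parameters, are hypothetical and do not reflect actual CrowsNest settings.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Transfer:
    destination_ip: str
    num_bytes: int


def exfiltration_alerts(
    transfers: list[Transfer],
    internal_cidr: str = "10.0.0.0/8",
    max_external_bytes: int = 1_000_000,
) -> list[Transfer]:
    """Return the transfers that push cumulative external volume past the limit."""
    internal = ipaddress.ip_network(internal_cidr)
    external_total = 0
    alerts = []
    for transfer in transfers:
        if ipaddress.ip_address(transfer.destination_ip) in internal:
            continue  # movement inside the defined area is not counted
        external_total += transfer.num_bytes
        if external_total > max_external_bytes:
            alerts.append(transfer)
    return alerts


transfers = [
    Transfer("10.0.0.5", 5_000_000),     # large, but stays internal
    Transfer("198.51.100.7", 600_000),   # external, still under the limit
    Transfer("198.51.100.7", 600_000),   # external, crosses the limit
]
alerts = exfiltration_alerts(transfers)
```

Tuning here means adjusting `internal_cidr` and `max_external_bytes`. A real policy would likely also consider time windows, users, and content sensitivity.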

CrowsNest also identifies and isolates cyber threat activity occurring in data. It automatically detects data behaviors that are characteristic of ransomware, botnets, malware, Bitcoin, back doors, and command-and-control software. It will alert your team, as well as trigger action by other security solutions, if desired.

Your team receives contextual analysis, including reconstructed events, extracted payloads, and play-by-play analysis of the activity. Teams will know exactly what happened, where, and by whom—gaining a data chain of custody to support response and remediation. You can also have CrowsNest deliver full digital forensics data to a SIEM.

There’s No Time Like Now

There’s a lot at stake with AI, and it’s critical for organizations to proceed with caution. Even many original AI champions are tempering their earlier enthusiasm as real-world wrinkles emerge. What do you do today, and what should you be doing for the near-term future?

  1. You can block generative AI content from coming into your organization and prevent user queries into AIs like ChatGPT for now. Flying Cloud CrowsNest is already performing this function for several large enterprises. Eventually, however, AI will be baked natively into business tools and development IDEs. Blocking it won’t be an option.
  2. Formulate a data strategy. Your strategy should include assessing your data as it is today before moving forward with any AI, data security, or data governance initiatives. Data surveillance enables you to automatically gain visibility into all of your data and create a rolling baseline of normal usage.
  3. Determine the policies you’ll need for AI usage. Policies should define which data can be used for AI purposes; where data sets can come from; data quality thresholds, based on data surveillance findings; what data users are allowed to feed into other organizations’ AI applications, knowing that the queries become part of the other organization’s AI model; and how to ensure that sensitive data or IP is not used.
  4. Set human policies around AI. These might include articulating rules or best practices for developing AI projects; revisiting user privilege and password policies; and ensuring that humans are part of AI-based processes with the authority and ability to monitor, intervene if necessary, and question results.
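
The AI usage policies in step 3 can be expressed as checks that a data set must pass before it is used for AI. The sketch below is one hypothetical way to encode three of them: approved sources, a quality threshold, and exclusion of sensitive data. The classes `DataSet` and `AIUsagePolicy`, the 0.0 to 1.0 quality score, and the example values are all assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataSet:
    name: str
    source: str               # where the data set came from
    quality_score: float      # hypothetical 0.0 to 1.0 score from data assessment
    contains_sensitive: bool  # True if sensitive data or IP was found


@dataclass(frozen=True)
class AIUsagePolicy:
    approved_sources: frozenset[str]
    min_quality: float

    def violations(self, data_set: DataSet) -> list[str]:
        """List every policy rule the data set breaks. Empty means it may be used."""
        problems = []
        if data_set.source not in self.approved_sources:
            problems.append("unapproved source")
        if data_set.quality_score < self.min_quality:
            problems.append("quality below threshold")
        if data_set.contains_sensitive:
            problems.append("contains sensitive data or IP")
        return problems


policy = AIUsagePolicy(approved_sources=frozenset({"crm"}), min_quality=0.8)
clean = policy.violations(DataSet("accounts", "crm", 0.9, False))
flagged = policy.violations(DataSet("scraped-notes", "web-scrape", 0.5, True))
```

Returning every violation, rather than stopping at the first, gives reviewers the full picture of why a data set was rejected.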

The Care and Feeding of Your AI

The bottom line with AI will always be the data fed into it. Data surveillance is a foundational capability for any organization moving into an increasingly AI-driven world, whether you are developing your own AI or using others’ AI tools. With more than two decades of security expertise and nine data surveillance patents, Flying Cloud is enabling companies to look at their data analytically and forensically, with the ability to completely control where it moves. For the first time, you can easily see, track, characterize, and defend your data, building a solid foundation for your AI future.