Inaccuracies of Machine Learning

Automation has found its way into every major technical industry.

And it’s no wonder why.

Streamlining operations with machines increases productivity and efficiency, especially in fields that handle large volumes of information.

When it comes to data loss prevention (DLP), however, the wrong type of machine-learning tool can cost a network as much as it benefits it. Administrators often find themselves in an approach-avoidance conflict: drawn to automation, yet wary of what it can cost them.

The Problems

Many DLP programs with machine-learning functions rely on pre-set algorithms and regular-expression patterns. These algorithms define what counts as “sensitive data” and decide which controls and safety measures are activated in any given scenario. This rigidity leads to serious problems with identification accuracy, which is the key to effective DLP coverage.

First and foremost, an overly broad, generalized identification approach produces false positives. Markers that are meant to be specific turn out to be highly generic when applied across terabytes of data, and the false positives pile up.

Conversely, false negatives let important files and data streams slip through the cracks. Pre-set programs cannot detect subtleties in content that fall outside their own algorithmic structure. Factors such as context, timing, and the users involved in a particular data transfer are often left out of the program’s assessment.
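To make the accuracy problem concrete, here is a minimal Python sketch of a pre-set pattern rule of the kind described above. The 16-digit card-number pattern, the `is_flagged` helper, and the sample strings are hypothetical illustrations, not taken from any particular DLP product; real rule sets are more elaborate, but they share the same basic limitation of matching form rather than meaning.

```python
import re

# Hypothetical pre-set rule: flag any standalone run of exactly 16 digits
# as a payment card number. Deliberately naive, for illustration only.
CARD_PATTERN = re.compile(r"\b\d{16}\b")


def is_flagged(text: str) -> bool:
    """Return True if the pre-set pattern matches anywhere in the text."""
    return CARD_PATTERN.search(text) is not None


# Each sample maps the text to whether it is actually sensitive.
samples = {
    # A standard test card number: correctly flagged.
    "Card on file: 4111111111111111": True,
    # A 16-digit shipment ID: flagged although it is not sensitive.
    "Tracking ID 1234567890123456 shipped today": False,
    # The same card number typed with spaces: the pattern misses it.
    "Card on file: 4111 1111 1111 1111": True,
}

for text, actually_sensitive in samples.items():
    flagged = is_flagged(text)
    if flagged and not actually_sensitive:
        label = "false positive"
    elif not flagged and actually_sensitive:
        label = "false negative"
    else:
        label = "correct"
    print(f"{label:15} | {text}")
```

The same rule over-reports (the shipment ID) and under-reports (the spaced card number), which is the false positive and false negative pair in miniature.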

“In-the-Box” Thinking

The “in-the-box” thinking of many machine-learning programs often leaves loopholes that can be used to circumvent protocols. Files that have already been flagged as sensitive can slip past the program’s security protocols simply by being altered: format conversion, copying, extracting, embedding, re-typing, compression, or a changed file extension.
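The loophole is easy to reproduce with the simplest form of “already flagged” tracking: storing a cryptographic hash of each sensitive file and blocking exact matches. The Python sketch below assumes that hash-based approach as a hypothetical stand-in (it does not describe any specific product), and the sample content, the `fingerprint` helper, and the variant names are illustrative only.

```python
import hashlib
import zlib


def fingerprint(data: bytes) -> str:
    """Exact-match fingerprint: any change to the bytes changes the digest."""
    return hashlib.sha256(data).hexdigest()


# Hypothetical sensitive content that has already been flagged.
original = b"CONFIDENTIAL: internal pricing model for next quarter, do not distribute outside finance."
flagged = {fingerprint(original)}

# Simple alterations of the same content (illustrative, not exhaustive).
variants = {
    "exact copy": bytes(original),
    "re-typed with double spaces": original.replace(b" ", b"  "),
    "case changed": original.upper(),
    "compressed with zlib": zlib.compress(original),
}

for name, data in variants.items():
    caught = fingerprint(data) in flagged
    print(f"{name:28} caught={caught}")
```

Only the byte-identical copy is caught. Once the content is re-typed, re-cased, or compressed, an exact-match detector has nothing left to match against, even though the sensitive information itself is unchanged.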

One major problem that cybersecurity researchers point out is the constant data “switch-up” in today’s business world. Companies frequently introduce new confidential or proprietary data into their systems, and many DLP platforms are not equipped to deal with this constant flow of diverse information.

The Solution: Math & Science

GTB’s data protection detection engines use patented scientific and mathematical models to manage sensitive data intelligently. Rather than rely on set models, GTB programs regularly analyze data with intelligent algorithms. This approach virtually eliminates false positives by homing in on relevant data and only real exfiltration threats. The same methods also prevent false negatives: based on indicators learned from files and data streams that have already been identified, GTB can track sensitive content even when elements of a file or data stream are changed.
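GTB’s engines are patented and their internals are not described here, so the following is not GTB’s method. As a generic illustration of tracking content after it has been altered, this Python sketch swaps in a well-known technique, word shingling with Jaccard similarity. The `shingles` and `similarity` helpers, the shingle size `k = 3`, the 0.5 threshold, and the sample texts are all assumptions chosen for the example.

```python
import re

THRESHOLD = 0.5  # hypothetical cut-off for "same content"


def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """Return the set of k-word sequences, after normalizing case and spacing."""
    # Lowercase and keep only word tokens, so whitespace, case, and
    # punctuation changes do not affect the result.
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def similarity(a: str, b: str, k: int = 3) -> float:
    """Jaccard similarity of the two texts' shingle sets (0.0 to 1.0)."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


# Hypothetical content registered as sensitive.
registered = "CONFIDENTIAL: internal pricing model for next quarter, do not distribute outside finance."

candidates = {
    "re-typed with double spaces": registered.replace(" ", "  "),
    "case changed": registered.upper(),
    "excerpt embedded in an email": "Hi team, fyi: internal pricing model for next quarter, "
                                    "do not distribute outside finance. Thanks",
    "unrelated text": "The cafeteria menu for next week includes soup and salad.",
}

for name, text in candidates.items():
    score = similarity(registered, text)
    print(f"{name:30} score={score:.2f} flagged={score >= THRESHOLD}")
```

Because the comparison works on normalized word sequences rather than raw bytes, the re-typed and upper-cased copies score 1.0, the embedded excerpt still shares most of its shingles with the original (9 of 14, about 0.64), and the unrelated text scores 0. Handling binary formats or compressed archives would require extracting the text first, which this sketch does not attempt.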

GTB’s Data Detection that Works gives administrators the best of both worlds: high security assurance along with the benefits of an automated platform.
