Anomalies In Big Data Are Not Guilty - That's Stereotyping But Mathematicians Don't Get It
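When an individual shows up as an anomaly in big data, many assume that individual must be guilty of something. The individual could be a person, a business, or something else, but this erroneous belief is starting to challenge our system.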
Have you ever heard the saying, "If you work 10% harder than everyone else, you will stand out 50% above the crowd"? That saying is a truism, and it has become part of our common sense, because very few people want to put in any more effort than they have to.
Those who do should be congratulated and rewarded for their efforts, not chastised or called into question simply because they show up as anomalies within the data.
As we scour big data, we seem eager to build algorithms that help us spot things that are unusual or that shouldn't be there.
This can be useful for catching fraudsters, money launderers, or other criminals, for instance.
However, just because an anomaly shows up in big data doesn't mean the person behind it is guilty of anything; they might simply be one of the people working 10% harder than everyone else.
Do you see the point yet? In many regards we tell our citizens not to stereotype and not to assume guilt; after all, this is the United States of America, and everyone is innocent until proven guilty. Yet that is not how we design our data analysis tools, is it? I find this rather ironic, because the mathematicians designing these tools to find anomalies in the data are, by virtue of being mathematicians, anomalies in our society themselves.
That is to say, not everyone is good enough at math to write the algorithms, or to understand the math and the data as well as they do.
Of course, being anomalies themselves, they know what to look for, and they know when something is out of place, or perhaps not out of place but simply outside the norm of the rest of the data.
That's all well and good, but sometimes the stereotypes are incorrect, and the anomalies we find actually reflect something good. Unfortunately, we label them as suspicious, creating false positives when we search for corruption, fraud, or criminal activity.
This is perhaps one of the biggest problems I see with the use of big data by law enforcement agencies, governments, and regulatory groups.
Indeed, I hope you will consider all this on a philosophical basis, because we might be shooting at the wrong people: the very people we need to take us to the next step.