Security and Network Analysis (or, There’s No Excuse for Sloppy Thinking)

The original story in The Guardian, by Ryan Gallagher, reported that multinational security firm Raytheon has developed a scrape-and-dump program called RIOT (Rapid Information Overlay Technology), which gathers huge amounts of information about people from social media and uses it to predict their movements. There are all sorts of problems with this.

In a separate piece in The Guardian, James Ball points out that even the most innocuous information can be damaging in the wrong hands:

It’s easy to believe those with nothing to hide have nothing to fear – and most of us are essentially decent people, with frankly boring social network profiles. But, of course, to (say) a petty official with a grudge, almost anything is enough: a skive from work, using the wrong bins, anything. Everyone’s got something someone could use against them, even if only for a series of annoyances.

And it’s all too easy to forget that it can be taken out of context, a point also made by Jay Stanley at the ACLU:

When we post something online, it’s all too natural to feel as though our audience is just our friends—even when we know intellectually that it’s really the whole world. Various institutions are gleefully exploiting that gap between our felt and actual audiences (a gap that is all too often worsened by online companies that don’t make it clear enough to their users who the full audience for their information is).

Furthermore, Ball reminds us that one’s online privacy depends a great deal on other people’s technological ability and awareness:

It’s also tempting to believe that with good privacy settings and tech savvy, we can protect ourselves. Other people might be caught, but we’re far too self-aware for that. But stop and think. Do you trust every friend you have to lock their privacy settings down? Your mum? Your grandad? Do they know to strip location data from photos? Not to tag you in public posts? Our privacy relies on the weakest point of each of our networks – and that won’t hold.

But for me the heart of the matter is the misuse of social network analysis. Gallagher writes: “Using Riot it is possible to gain an entire snapshot of a person’s life – their friends, the places they visit charted on a map – in little more than a few clicks of a button.”

This vastly overstates the case: you cannot get a snapshot of a person’s life, only of their social media trail. The software also builds a “network” from these scraped connections, treating every link as equally meaningful. This creates two related problems: 1) it’s sold as a complete package, and its end users believe the hype; 2) it is used to create profiles of “suspects” who have no real relationship to the original subject of an investigation. Forbes writer Michael Peck hit the nail on the head:

There is no mention of violence in the video. Yet it’s worth noting that software that assembles a profile of someone’s movements would also be useful for government agencies who arrange for appointments between suspected terrorists and drone-launched Hellfire missiles.    

Context matters in network analysis. I follow the National Intelligence Council (@ODNI_NIC) and @BronxZoosCobra as well as @OccupyWallSt on Twitter. I see no evidence that this program can distinguish among my relationships to any of these accounts at all, let alone better than a human analyst could. Am I a closet Slytherin, perhaps, plotting to take over the revolution (and thence the world)? Then how does the fact that I also follow @BettyMWhite figure in?

An example I often use to show how meaningless “closeness” in a network can be is this: my thesis adviser could introduce me to the Secretary-General of the U.N., who could introduce me to the President of the U.S. So I’m three links away from the president. What does that mean for my input on policy? Absolutely nothing. I’m “close” (whatever that means), but the closeness is meaningless, because I have no impact at all.
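The “three links from the president” example can be sketched in a few lines of Python. This is a toy illustration, not a model of how RIOT works: the node names are hypothetical labels for the people and accounts mentioned above, and following someone on Twitter (which is really one-directional) is simplified to an undirected link. Breadth-first search counts hops in an unweighted graph, which is exactly the kind of “distance” that carries no information about what a link means.

```python
from collections import deque


def hop_distance(graph, start, goal):
    """Return the number of links on the shortest path from start to goal
    in an unweighted, undirected graph (breadth-first search), or None
    if goal cannot be reached."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour == goal:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


# Hypothetical toy graph: a link only means "is connected to".
# Nothing records what kind of relationship it is or how much it matters.
links = [
    ("me", "adviser"),
    ("adviser", "un_secretary_general"),
    ("un_secretary_general", "president"),
    ("me", "@ODNI_NIC"),
    ("me", "@BronxZoosCobra"),
    ("me", "@OccupyWallSt"),
    ("me", "@BettyMWhite"),
]

# Build an adjacency map, treating every link as symmetric.
graph = {}
for a, b in links:
    graph.setdefault(a, set()).add(b)
    graph.setdefault(b, set()).add(a)

print(hop_distance(graph, "me", "president"))  # 3
for account in ["@ODNI_NIC", "@BronxZoosCobra", "@OccupyWallSt", "@BettyMWhite"]:
    print(account, hop_distance(graph, "me", account))  # each prints 1
```

The numbers come out exactly as the hop count predicts: I am three links from the president, and every account I follow sits at distance 1, indistinguishable from the others. The graph has no way of knowing that one is an intelligence body and another is a zoo’s joke account.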

People are not one-dimensional, and incredible amounts of data in one dimension do not (and cannot) predict behavior or thoughts in other dimensions. The vast amounts of data now becoming available need more theoretical underpinning, more thought and judgement applied, and more empirical hypothesis testing. Otherwise, simply gathering data and dumping it in a blender will turn up more spurious correlations than ever. Given how many people are already “collateral damage” because they were in the wrong place at the wrong time, it behooves us to be more careful about positing meaningful relationships, not less.

In the meantime, as Peck notes, programs that scrape and dump can be countered by two simple tactics: either stay off social media altogether, or spoof it. Spoofing it could be a lot more fun – after all, on the Internet, nobody knows you’re a dog.

February 19th, 2013 6:07pm
