Tuesday, June 11, 2013

Blowing apart some of the myths of the PRISM reports

Ever since the adoption of the internet by the great unwashed, it seems there has been an endless luddite presence determined to find a "big evil" in the system. From predatory stalkers, evil trolls and crazy personality impersonators in chatrooms to outright fraudsters, it appears there is no end of bad people sitting there waiting to steal your identity or poke into your wallet. The biggest bogeyman of all is those who quietly gather information about your life.

Let's get one thing straight here. Every time you log on to the internet, you're entering somebody else's service, built up of many moving parts. Most of these parts log, some of them store, and some even gather information about you, surreptitiously or otherwise, every time you log on to practically anything. This is the price you pay for connectivity: somebody has to pay for the cost of purchase, maintenance, migration, whatever, and the way that appears to work is a combination of a charge for basic service access at ISP level, free services in return for vast information gathering rights, and finally, premium services paid for by the user(s). This is the model that has evolved since the heady days of the collapse of not only the dot.coms, but large infrastructure builders such as Nortel and Worldcom. What people forget is that these companies went bankrupt in the race to provide basic infrastructure services to the world. In Nortel's case, hardware; in Worldcom's case, the UUNET backbone.

To this day, the very existence of the internet itself as we know it depends on the continued existence of "core" backbone services that translate addresses from human languages into 0s and 1s. These companies, services and the organisations that make them up have come to depend on commercial funding for their existence. The days when the internet could survive on the back of academic and limited corporate services are long gone. Big backbone providers need funding in order to continue to exist, and the one and only model which appears to have worked is the advertising model. Think about email, for example. How many people actually pay for an email service nowadays? Or if you do (as I do myself), it's almost certainly bundled with web addressing, reverse DNS and webspace. This doesn't cover the full cost.

To give you an idea of the cost of very basic services, let me allow you into the world of infrastructure. First you need a data centre location with tightened security, backup power resources, "clean power" and cooling. Then you need to install one or more network infrastructures locally and connect to an external provider (a cost of 1 million euros for this alone isn't unusual in Ireland; the UK/US costs for a major city would be about 1/10 of this due to the tacit monopolisation of the Irish telecoms infrastructure by €ircon). Then there is racking, stacking, putting in place network cabling at underfloor and ceiling level, fire prevention, cooling, resilient power and network systems. This is probably a few more million in the build phase, plus a degree of maintenance.

And we haven't even put in a single server yet.

Basically, if you are buying hardware, you're buying one of 4 things:
· appliances that just do a specific thing, like firewalls, proxies, load balancers, packet shapers, VPN boxes, etc
· general network devices - typically routers and switches
· servers - usually in the form of rack mount individual servers or enclosures that contain "blades" - basically these share certain resources that make management easier
· storage - this ranges from simple, rackmounted devices that are basically a large server with many disks to huge, multi rack devices containing disks, optical storage, tape libraries, etc

You also find numerous other kinds of devices - monitor/keyboard devices, console switches to enable you to connect to a large number of servers from a single keyboard/monitor, tape libraries and robots. All kinds of stuff. It depends on the company, the services they're providing, to whom and at what scale. Most moderate sized multinationals will have 4-5 rows of racks with maybe 500-600 physical servers in a "small" DC. This is where I earn a lot of my bread. Virtualisation, of course, allows you to connect storage, networks and servers in a fashion that lets you turn this figure into potentially hundreds more servers. Typically, I see each physical host running 10-20 additional "virtual" servers. These usually require software licenses - which we haven't even started on. It used to be the case that hardware cost considerably more than software, but that has reversed. A typical enterprise grade server that I would recommend comes in at around 10k (USD or euro) plus tax. I'd never be "buying" them in ones - I normally buy "clusters" of identical servers in batches of 4 or more. Sometimes a LOT more. Depends on the project. So a typical order is usually anything from 50-300k.

Storage is pricier. The last quote I saw for enterprise grade storage was 400,000 euros. This was for a "small" company!! This is where current "big budgets" go. Again, software adds to the cost. Generally a lot of companies sell you the hardware with some services or capacity disabled; you pay a little more (well, "little" can mean hundreds of thousands) and they either dial in remotely and make it instantly available or provide you with a license key.

A typical company using a lot of databases will have a huge bill. Oracle charge annually, Microsoft are still once off. Again, I saw Oracle do a "deal" with a "small" 300 man company for 1.5 million for just one year. Microsoft - 80k is the starting point for a single one cpu SQL license, but you usually cluster in pairs if it's hardware, so you usually double up. With big "olap cubes" - basically 3D representations of complex data sets - multiple cpus are the norm, so this can mean hundreds of thousands in license fees.

Operating systems are usually a thousand or two per machine, but support is pricey. Basic services start at around 5k a year but most companies pay 10-100k. You kinda need some level of support, so it's worth it.

So you can see now why we need security. A typical small DC with 100 physical boxes can host millions of dollars of equipment - that's before we count software and the value of data. That's where we start. Then there's staff: operations start at about 35k per year in Ireland, running up to contract database staff who earn around 100k a year. Most sys admins earn 50-60k in Dublin and between 35-45k outside. You can hire offshore, but realistically, I found even a ratio of 5:1 for offshore to onshore still falls short on the skills side. So generally you hire offshore for very junior roles - remote monitoring, operations, basic administration - and get your senior staff in Europe or the US.

This is why we love cloud computing. Amazon can sell you a basic server running 24x7 for as little as 240 euro a year plus VAT. It still adds up and you need your own staff, but you could replicate the above for about 24k per year plus software/network costs. You'd need to trust them though, and this is where we get back to trust and confidence.
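For anyone who wants to check that 24k figure, here is a back-of-envelope sketch in Python. It assumes the 100-box small DC mentioned above and the 240 euro per server-year price; the variable and function names are made up for illustration, and VAT, software, network and staff costs are deliberately left out.

```python
# Back-of-envelope: renting the equivalent of a small DC in the cloud.
# Assumptions (from the text): 100 servers, 240 euro per server per year.
# Not included: VAT, software licences, network costs, staff.
SERVERS = 100
EUR_PER_SERVER_YEAR = 240


def annual_cloud_cost(servers: int, eur_per_server_year: int) -> int:
    """Yearly rental cost in euro for a fleet of identical basic servers."""
    return servers * eur_per_server_year


cloud_cost = annual_cloud_cost(SERVERS, EUR_PER_SERVER_YEAR)
print(f"{SERVERS} basic cloud servers: about {cloud_cost:,} euro a year")
```

Change SERVERS to 500 or 600 to get a feel for the "small" multinational DC described earlier.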

Bear in mind these pricings though, because we'll come back to them later when considering the feasibility of what we've been told about PRISM by the press.

The cost of "free" services

Services like Google, Facebook etc are, on face value, "free" to the end user. Except they're not. Your data, usually subject to a complex web of legal jargon, is basically handed over for marketing purposes. The theory goes, they pick up all they can about you, and then try to target their advertising clients' ads at the right people. We know Google do this effectively - basically they look at your browsing habits and customise around them. Google do this quite cleverly. Other companies are somewhere close to dire. Despite many articles about Facebook information picking up on personality traits like your sexuality, in practice, they make an arse of this. For example, the things I don't openly disclose "on paper" are sexual orientation (partly because I have abandoned the concept of fixed identity there), marital status and relationship status. To be honest, it's a big hole for offence to be built in. I found in the past that partners generally got upset about what I said or didn't say online. So I stopped saying anything. I stick to that now, and see a lot of friends doing broadly the same.

It's extraordinary how coy people are though. I noticed a conversation with one FB contact last year who mentioned in passing that a relationship had broken up without specifying the gender of her ex. It was a very curious conversation as lots of friends sympathised, but clearly a few were not "in" on the detail. Eventually it got to a point where somebody said "who or what broke your heart?" It was funny in the sense that the poor bewildered chap hadn't picked up the extremely subtle subtext that the relationship was a same-sex relationship but the poster didn't want (or feel a need) to say this upfront! So I'm sure she's also getting the same adverts from Facebook for bridal wear, wedding flowers, etc, that I am . . .

There are two truths to all of this:
1. Once you put your information into somebody else's "system" (using the word in the systemic context of "ecosystem", or a collection of things which, put together, do a purposeful act, like a computer service), you essentially lose control over where that data may end up. Ultimately, its use depends on the ethics, consideration, internal security policy and skills of the provider.
2. Data mining is in its early phases and only as good as the social understanding of that data. If, like FB, data mining capability is weak on the marketing niche side, the data in use may simply be weak demographic material if misunderstood. Sure, you can figure out my sexual orientation from my "friends" list, but if you don't look for it, and assume I'm straight, the value you offer your advertisers is being eroded. In other words, if you don't know what you're looking for, you won't find it.

Data theft is not uncommon and rarely audited for. Auditing of files in businesses is actually mostly limited to "tagged" documents, where the tag usually indicates a confidentiality level. Simple systems are add-ons to Microsoft Office that ask users to flag their document as being for internal or external use, confidential or personal. The add-on doesn't look at the content, so you decide yourself. But even well trained users fail to flag documents correctly, which means plenty of documents are incorrectly monitored.

"Monitoring" in IT

I started my enterprise career in monitoring and systems management so I've copious experience and knowledge about this.

There are really two schools of monitoring/management. The first is scouring logs for specific events that indicate an exceptional condition, which is then reported. This is generally done by automated tools. The problem is this: it's HUGELY costly to build and implement, because you need to decide in advance what needle you are looking for in the haystack. This is often unknown or poorly designed. While makers of such systems (one standout is BMC) are good at building generic profiles to identify problem scenarios, these are almost entirely "systems failure" scenarios, not user monitoring. Also, lots of people don't know what to look for, so in the end some systems basically only monitor generic services and processes, entirely missing larger patterns of behaviour.
Security monitoring fills the gap, but again the tendency is to look for exceptions, because very often a considerable amount is blocked off or otherwise secured. So companies use systems to find patterns of behaviour. Typical examples are denial of service (DOS) attacks, continuous failed logins, and traffic coming from a particular source (China is usually a red flag). Sometimes we do properly review VPN logs (VPN being basically access into the corporate network from outside), but mostly to impose policy or locate inactive users.

This last kind, reviewing logs by hand, really belongs to the second school of monitoring: active monitoring. This requires real people and has been a dying art since the mid 2000s. It involved highly skilled teams who used their combined expertise to identify patterns and manage incidents. They were "intelligent" agents, able to tie together information from multiple sources and make good decisions. Large scale patterns of behaviour missed by "blind" computer monitoring were obvious to them.
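For contrast, this is roughly all the "blind" automated kind amounts to. It is a minimal sketch in Python, not how any particular product works: the log format, the field names and the threshold of 3 are all invented for illustration. Notice that the needle (repeated failed logins from one source) has to be decided before the haystack is searched.

```python
import re
from collections import Counter

# Hypothetical log lines, invented for this sketch; real log formats differ.
SAMPLE_LOG = """\
2013-06-10 09:00:01 LOGIN_FAILED user=alice src=10.0.0.5
2013-06-10 09:00:03 LOGIN_FAILED user=alice src=10.0.0.5
2013-06-10 09:00:04 LOGIN_OK user=bob src=10.0.0.9
2013-06-10 09:00:06 LOGIN_FAILED user=alice src=10.0.0.5
2013-06-10 09:01:10 LOGIN_FAILED user=carol src=10.0.0.7
"""

# The "needle": decided in advance, which is exactly the limitation.
FAILED = re.compile(r"LOGIN_FAILED user=(\S+) src=(\S+)")


def repeated_failures(log_text: str, threshold: int = 3) -> dict:
    """Return {(user, source): count} for pairs with at least `threshold` failures."""
    counts = Counter()
    for line in log_text.splitlines():
        match = FAILED.search(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return {key: n for key, n in counts.items() if n >= threshold}


alerts = repeated_failures(SAMPLE_LOG)
print(alerts)
```

Anything that doesn't match the pattern (a user quietly copying files, say) sails straight past, which is the gap the human teams used to fill.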

Thing is, this way of doing business is dead. The last team like that I worked on was disbanded and laid off in 2006, to be replaced by a 3-4 year attempt towards "total automation", building an über system to automatically do everything our brains did. It failed. Not just in that engineering giant, but everywhere else I worked.

So what happened and what exists now?

Most of the big companies left their inadequate half-automated systems in place and rehired to replace the old systems management teams. Except that the teams are very different: welcome to idiotville. Staff are now basically trained to watch a red light turn green, check a web page and ring somebody who actually possesses a brain. One of my colleagues at big blue had a hilarious act that imitated one of these workers, ferociously typing on his keyboard with his eyes crossed and tongue hanging out. In other words, an archetype of idiocracy. So yes, there is "monitoring", but in practice a lot of SMEs have abandoned the exercise entirely. Some 3rd parties run expensive services monitoring key critical systems, where they basically supply their own idiocracy instead of you hiring yours, but in reality, the budget, skill levels and headcount needed for all-singing, all-dancing manual or automated systems aren't there. And this is my problem with most of the reporting and interpretation of PRISM. If the corporates themselves struggle to deliver perfect zero-day monitoring systems, how is government going to be able to come in and place reliable surveillance on the same systems at user level?

In practice, the actual level of work needed to monitor an individual user is resource-intensive. It's not as automated as you might think. In addition, users can be "anywhere" in the infrastructure. For example, if it's an employee in a typical corporate, you're talking logons (check your DCs and NIS boxes), his/her computer (take a copy), shared storage (hard - as it is rarely properly recorded), user data on shared storage, data on optical media, proxy info for web browsing, and then the HUGE world of applications. The latter is the worst because sometimes these are bespoke - and they certainly would be for Google/Facebook. So you'd need specialist training. They'd need up to date accounts that don't get repudiated, they'd need VPN pipes in. In fact, ideally, what they'd actually need is full time staff sitting on the provider's site, with open access to staff.

I've done 3rd party audits and data recovery for data commissioner requests. I've done restores for "legal reasons." I've done this kinda work and it isn't a "typical" casual small piece of work. They call senior managers, it filters through the various levels, until the call to the sys admin happens. You're the end of a long chain of communications. Generally it's very specific: an Exchange mailbox to be extracted, a copy taken of a user's SAN based home directory. Maybe a Citrix profile. AD login trails. Most of these are easy enough to find but need a domain-level systems administrator with experience. And sometimes a dogsbody to go find backup tapes, drive them to the right site and load up the tapes. The last one I did, I reckon, cost the equivalent of 1 day of desktop contractor day rate plus my day rate (the customer in fact was paying 1k a day for me). It's not cheap. Now multiply that by a couple of thousand. I can see why Twitter refused: almost all Twitter data is ostensibly public by Twitter's very nature - so why should they have to go to any further levels? But for the others, it's not something you could put in place without a big programme, a programme which would be large and costly.

In fact, going on my knowledge of one old client, who had a metabase of customer information for 1 million customers of a particular service, held in an Oracle database set which ran to 4-5 800GB backup tapes, I know for a fact that a complete extract was 4 TB. We had to go out and buy an external hard drive to take a copy for a particular extract. And it took myself and a database administrator nearly a day to complete. That was literally "direct access": we hooked up the external drive in the DC and ran a database extract. We needed it for application work that was taking place. Painful. Now multiply by 4000 (assuming we mean "American million"). That's Facebook alone. 16,000 TB of data, that is.
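The scaling above is easy to reproduce. This is a naive linear sketch in Python using only the figures in the previous paragraph (4 TB per extract, a multiplier of 4000, two people for roughly a day); the person-days line is my own straight-line extrapolation and assumes nothing gets faster or slower at scale, which is a big assumption.

```python
# Naive linear scaling of one known extract (figures from the text above).
TB_PER_EXTRACT = 4        # complete extract for 1 million customers
MULTIPLIER = 4000         # the multiplier used in the text
PEOPLE_PER_EXTRACT = 2    # sys admin plus database administrator
DAYS_PER_EXTRACT = 1      # "nearly a day", rounded up
TB_PER_PB = 1000          # decimal petabyte, as storage vendors quote it

total_tb = TB_PER_EXTRACT * MULTIPLIER
total_pb = total_tb / TB_PER_PB
# Assumption: effort scales linearly with data volume.
person_days = PEOPLE_PER_EXTRACT * DAYS_PER_EXTRACT * MULTIPLIER

print(f"Data volume: {total_tb:,} TB (about {total_pb:g} PB)")
print(f"Hands-on effort at the same rate: {person_days:,} person-days")
```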

Let's look at my storage options for storing 16,000 TB of pure logs. 16k TB is basically 16 petabytes. Believe it or not, there are storage appliances designed just for this (of course!) - but that kind of capacity is not widely needed. As a result it's hard to know the real world pricing without a quotation, so let's look at some typical options. What I do know, though, is that the installation price for that scale of system is around half a million (that's fairly standard), plus annual support of around 1 million dollars. We're assuming also, by the way, this is US pricing - go abroad and you will pay considerably more, despite the fact that the products are manufactured, typically, in the far east. Even in the far east you generally pay a premium.

Let's take a really good example - NetApp's FAS6290 - this gives a raw capacity of 5760 TB. That's a LOT of storage. But then you have to RAID the disks, so typically, let's say you lose 1/3 of the raw capacity - so let's say we've real capacity of around 3800TB. Ok, now we need 5 of these beasts, and a medium sized DC with huge network capacity, because collection of this scale of data takes a massive pipe - assuming the line about active/passive realtime monitoring is true and not a hysterical paranoid myth.

So what's the rough cost? Very difficult to work out, as the price list is itself REALLY difficult to get hold of (though in the industry you can, eventually, dig it out of a reseller). The base unit is just over 87k, disks are about 350,00 for a nice 18.75TB tray - seems ridiculously expensive by the way? So let's price on a lower end device that is cheaper, and just consider RAW disk capacity. Everything else - base units, memory, fibre switches, cabling, network switches, management servers - well, let's say these clock in at around 2 million, based on installation of 500k, 5 base units for a FAS6290 class device and all the additional software/hardware. 48TB of storage is currently around 30k in dollars (discounted). 16,000TB will require about 1/3 extra for RAID overheads, so let's estimate the raw disk space need at 20,000TB. At this price level the disks are 416 trays @ 30k = 12.5 million. Add the other hardware plus installation and that's just under 15 million. Plus 416 trays will fill more than 16 racks, before the base units and your other servers and switches. By the way, you've already gone past my current data centre's capacity. And this is just Facebook.
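The tray arithmetic can be checked in a few lines of Python. This sketch uses only the rounded figures from the paragraph above (20,000TB raw, 48TB trays at 30k dollars, 2 million for everything else); the variable names are mine, and rounding the tray count down simply follows the text.

```python
# Storage estimate for one provider, using the rounded figures in the text.
RAW_TB_ESTIMATE = 20_000            # 16,000 TB usable plus RAID overhead, rounded
TRAY_TB = 48                        # capacity of one lower-end disk tray
TRAY_PRICE_USD = 30_000             # discounted price per tray
OTHER_HARDWARE_AND_INSTALL_USD = 2_000_000  # base units, switches, install, etc.

trays = RAW_TB_ESTIMATE // TRAY_TB  # whole trays, rounded down as in the text
disk_cost = trays * TRAY_PRICE_USD
total_cost = disk_cost + OTHER_HARDWARE_AND_INSTALL_USD

print(f"Trays needed: {trays}")
print(f"Disk cost: {disk_cost / 1e6:.2f} million USD")
print(f"Total for one provider: {total_cost / 1e6:.2f} million USD")
```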

Now let's multiply this by, say, 4, to consider Google, Microsoft, Yahoo, etc. Now we're up to 60 million dollars. Just for STORAGE hardware. Servers are around 10-12k a pop. We'll need a hundred or so - assuming we'll host virtual machines. Plus appliances for secure access to networks. Plus networks. My DC is now bursting.

The cost is now up to 70-75 million. We're told the PRISM budget is 20 million pa.

Oh and seemingly the CIA mole was paid 200k a year. You got it right, 200k. You actually wouldn't need a huge number of him. Maybe about 40 operatives, including managers/service coordinators. So wages for 5 years are now up to 40 million.

So now the 5 year lifecycle for PRISM is 110-115 million on the hardware and wages above alone. But wait, we're told the budget is 20 million.

Something isn't adding up. For a start, US-based pricing practically ALWAYS includes headcount as a matter of course, as it is usually the key operational cost (the hardware above is once-off and will last 4-5 years). So if we believe the figures given, with a budget of 20m and wages of 200k per head, a staffing figure of just 40 would consume 8 million per year in wages, leaving 12 million a year for everything else. It doesn't add up.
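To lay the sums side by side, here is a small Python sketch. It adds up only the items actually priced above (the 70-75 million hardware figure and 40 staff at 200k) and compares them with the reported 20 million a year; support contracts and the data centre build itself are not included and would come on top. Treating all the hardware as a year-one purchase is my simplifying assumption.

```python
# Itemised costs from the text versus the reported annual budget (USD).
HARDWARE_RANGE = (70_000_000, 75_000_000)  # storage, servers, appliances, networks
STAFF = 40
SALARY_PER_HEAD = 200_000
YEARS = 5
REPORTED_BUDGET_PER_YEAR = 20_000_000

wages_per_year = STAFF * SALARY_PER_HEAD
wages_total = wages_per_year * YEARS

# Assumption: all hardware is bought up front in year one.
first_year = tuple(hw + wages_per_year for hw in HARDWARE_RANGE)
lifecycle = tuple(hw + wages_total for hw in HARDWARE_RANGE)
left_after_wages = REPORTED_BUDGET_PER_YEAR - wages_per_year

print(f"Wages: {wages_per_year / 1e6:.0f}m a year, {wages_total / 1e6:.0f}m over {YEARS} years")
print(f"First year: {first_year[0] / 1e6:.0f}-{first_year[1] / 1e6:.0f}m")
print(f"{YEARS} year lifecycle: {lifecycle[0] / 1e6:.0f}-{lifecycle[1] / 1e6:.0f}m")
print(f"Left from a 20m budget after wages: {left_after_wages / 1e6:.0f}m a year")
```

Even on these partial figures, year one alone comes to roughly four times the reported annual budget.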


Let's get one thing clear though. Nothing that the reporting on PRISM has said is technically impossible. You CAN passively log everything that goes on a network - and I really mean everything. Many corporates do, routinely, to spot leaks and keep out viruses, DOS attacks and other security threats. They very actively monitor employee network activity. Even on a basic system, you can mine huge amounts of personal information without much effort. But it's hugely labour consuming, time consuming and costly, and full scale surveillance of even targeted systems and users is resource intensive. 20 million isn't enough for the "back door" theory. It's possible, but not at that cost.

Secondly, governments are notorious for failed and overspent IT projects. TAURUS, one of the most notorious ever, cost the UK taxpayer 75 million without actually delivering anything. The NHS-IT project, sadly a very promising project that hasn't delivered much, has cost 2.7 billion pounds so far. Another 4 billion is earmarked. And on that note let's conclude a few things:

· high level, persistent surveillance isn't a fantasy - it's feasible, and considerable internal surveillance is already done at corporate level
· extending this to "external" end users, however, requires projects of enormous scale and cost
· considerable staffing levels and levels of cooperation between business & government would be required, much of it at low levels - where leaks are rampant
· the 20 million purported cost considerably underestimates the startup costs (95 million for the first year)
· we also ignore that most systems are bespoke and in house and require extensive ongoing training and maintenance

So in conclusion, I don't think the "active key logging" element of the Guardian report is realistic. The cost of the activity is too high, its scale involves too many loose lips, and it needs too much corporate intrusion - beyond the level of what is currently actively done. However, there remains real potential for data protection breaches where data is held outside of the US or concerns non domiciled individuals, and there is a question about this. But the idea that big brother in the CIA is watching your every keystroke and listening to your calls - this isn't happening, rest assured. At least, not yet.
