Saturday, 19 April 2008

The Fallacy of the 'Residual Category'

I've finally succumbed to pressure from the 'Angellists' (if you don't know who they are, it's best not to ask) to start a blog. So it's all their fault.

I’ll start by sounding off on a favourite theme of mine: the only property that systems have in common is that THEY ALL FAIL … eventually. But it’s true. I’m always on the lookout for perverse systems failures, so please let me know if you come across any novel examples.

It’s all to do with the complexity in the interaction between computer systems and human activity systems. The evidence is there for all to see. Consider the problems with biometric databases. Baroness Anelay of St Johns, with a group of British parliamentarians, was once given a demonstration of a facial recognition system. It failed; indeed the system subsequently crashed, twice. The reason? A 'gentleman' at the biometrics company told the baroness that her face was “too bland.”

In 2000, Raymond Easton, a 49-year-old man living in Swindon, was charged with a burglary in Bolton, 200 miles away. His DNA matched some found at the crime scene. The problem was that Easton was in the advanced stages of Parkinson's disease, and could barely dress himself. Only after a more advanced DNA test was the initial match shown to be a ‘false-positive’: this is when innocents are identified as guilty, for whatever reason – ‘false-negatives’ are when the guilty slip through the net.
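
For anyone who likes to see the two kinds of error spelled out, here is a minimal sketch. The function names and the numbers in the usage lines are my own illustrative assumptions, not figures from the Easton case or from any real DNA database; the point is only that trawling a large enough database will throw up chance matches with innocent people.

```python
def classify(is_guilty: bool, is_matched: bool) -> str:
    """Label a single database hit (or miss) using the terms from the text."""
    if is_matched and not is_guilty:
        return "false-positive"   # an innocent identified as guilty
    if is_guilty and not is_matched:
        return "false-negative"   # the guilty slipping through the net
    return "true-positive" if is_matched else "true-negative"


def expected_chance_matches(database_size: int, random_match_probability: float) -> float:
    """Expected number of innocent profiles matching a sample purely by chance.

    Both arguments are hypothetical inputs chosen by the caller.
    """
    return database_size * random_match_probability


# Illustrative (made-up) numbers: a million profiles, a one-in-a-million match chance.
chance_matches = expected_chance_matches(1_000_000, 1e-6)
print(classify(is_guilty=False, is_matched=True))
print(chance_matches)
```

With those made-up inputs the expected number of coincidental matches is about one, which is one too many if you happen to be that one.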

Television programmes like CSI (Crime Scene Investigation) trumpet the myth of forensic investigators vacuuming up biological material from the scene of crime, and comparing DNA samples with a computerised database, until finally out pops the criminal’s name: end of story! Nothing is that simple. Official figures admit to a 4% error in the database. Felons will vacuum up DNA from football crowds and collect cigarette-ends. Low-paid hospital staff will be compromised to supply hospital detritus: samples of blood, skin, saliva, and other biological material. Aspiring criminals, while perpetrating a crime, will randomly scatter an arbitrary collection of DNA material all over the crime scene, and the whole system will be compromised.

Fingerprints are problematic. In 1997 Shirley McKie was a police detective in Kilmarnock, Scotland. During the investigation of the murder of Marion Ross, it was claimed that she had accidentally left her thumb print inside the house where the murder took place. McKie was adamant that she had not entered the property, and this was affecting the credibility of the police case. She refused to back down, and was arrested in a dawn raid the following year and charged with perjury. The only evidence was the thumb print allegedly found at the murder scene. Two American experts testified on her behalf at her trial in May 1999 and she was found not guilty. The Scottish Criminal Record Office would not admit any error, but Scottish first minister Jack McConnell later said there had been an "honest mistake". On February 7, 2006, McKie was awarded £750,000 in compensation.

The Chaos Computer Club (CCC), the long-standing German hackers’ club, has shown how to capture fingerprints, transfer them onto a foil, and then wear it to beat biometric readers across Germany. To add insult to injury, CCC has published a fingerprint of the German Interior Minister, Wolfgang Schäuble, a vocal supporter of biometrics.

All of these cases are examples of problems resulting from self-referential tunnel vision, all caused by the fallacy of the ‘residual category’. In creating a computerised system, designers first identify and then categorise certain entities (and their properties) as being of interest – as data. Focussing on these data categories, everything else is dumped into one big residual category, and ignored. However, the representation of each ‘interesting’ element can only ever be a pale shadow of the totality of the thing itself. All other aspects of that thing are deemed unnecessary, and they too are discarded in the residual category. The categorical representation of ‘each thing as data’ is not identical to the thing-in-itself: because ‘the map {the overall data structure} is not the terrain.’
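
To make the idea concrete, here is a rough sketch of what ‘categorising a thing as data’ does. Everything in it is hypothetical: the record type, the field names and the sample values are invented for illustration, and no real biometric system is being described. The designer picks two properties as data; every other property of the thing falls into the residual dictionary, which the system then ignores.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

# Hypothetical: the only properties this imaginary system treats as data.
DATA_FIELDS = ("person_id", "ridge_pattern")


@dataclass(frozen=True)
class FingerprintRecord:
    person_id: str
    ridge_pattern: str


def categorise(thing: Dict[str, str]) -> Tuple[FingerprintRecord, Dict[str, str]]:
    """Split a thing into its data representation and its residual category."""
    record = FingerprintRecord(thing["person_id"], thing["ridge_pattern"])
    residual = {k: v for k, v in thing.items() if k not in DATA_FIELDS}
    return record, residual


# Two different things in the world (invented values).
real_finger = {"person_id": "minister", "ridge_pattern": "whorl-17",
               "material": "skin", "worn_by": "the minister"}
foil_copy = {"person_id": "minister", "ridge_pattern": "whorl-17",
             "material": "foil", "worn_by": "someone else"}

record_a, residual_a = categorise(real_finger)
record_b, residual_b = categorise(foil_copy)

print(record_a == record_b)      # the system cannot tell the two apart
print(residual_a == residual_b)  # the difference lives only in what was discarded
```

The two records compare equal even though the things they represent do not. Whatever distinguishes the finger from the foil sits entirely in the residual category, which is exactly where the system has chosen not to look.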

Each shadow element will remain ‘structurally coupled’ to the ‘rest of the world,’ but in creating the computerised system all these couplings are cut and discarded. Therefore, treating the ‘remainder’ as a separate ‘residual category’ implies that these couplings have simply disappeared, which means the two parts (the data, and everything else in the residual category) no longer comprise the original ‘whole.’

Hence, the system, by its very nature, introduces an asymmetry: the couplings are made to disappear from any representation … but they are still there in the world. The two artificially separated parts continue to operate (and perhaps interact) as the unobservable whole. Because of this asymmetry (between the world as it is, and as it is represented), all data is conditional, but those conditions are necessarily unobservable, unappreciable. However, they can be appreciated by others who take a different perspective, and derive different categorisations outside this self-referential loop.

System designers (and users) always have tunnel vision, assuming that everything in the residual category will mind its own business and not interfere. However, the analysis implicit in the system’s design is not the only one. A different perspective will beget different observations, different interpretations, different categorisations … a different analysis and a different system that will compete with the original. Natural selection, and not mathematical sophistication, will decide which system is the most appropriate.

This non-referential aspect of every data entity in the system means that all such data is necessarily a misrepresentation. Another observation is required to clarify the situation; however, that too will introduce new distinctions, bringing with it new partially unobserved interferences, new (mis)representations.

This problem is apparent in all attempts at categorisation. A choice of categories may solve preconceived problems; however, bewildering situations will inevitably arise that finesse, even reverse, the best intentions of analysts.

Truncated and trailing structural couplings, so casually discarded by the system, stay on to haunt and interfere with the user, and they can reassert themselves in the most inconvenient ways. One particularly good example of how opportunists can take advantage of the asymmetry is the so-called ‘click fraud’ in on-line advertising.

Analysts from Google, Yahoo and others have developed the highly profitable pay-per-click system. Anyone can display ‘Google ads’ on their Web sites, and any visitor who clicks on the ‘ad.’ is transferred to the advertiser’s site. Every click is charged to the advertiser, and the income is shared down the food chain, some eventually ending up with the site displaying the advertisement. Apparently advertisers get better value for money than with old media, because they only pay for ‘live ones,’ those interested parties who bothered to click on the ‘ad.’ The value of this business in the US alone is well in excess of 10 billion dollars annually.

Those not interested in the products supposedly don’t click the advertisements. Oh no! People in this residual category of ‘disinterest in the products for sale’ may take a very different perspective, and show a lively interest in the ‘free money’ on offer – hence the ‘click fraud.’ I should add that I’m not sure it is a fraud. If a company announces ‘get someone to click on my site and I’ll pay you,’ then they shouldn’t be surprised if some of the visitors come from the residual category of non-customers.

The opportunists who categorise the world differently set up dummy sites filled solely with ‘Google ads.’ They then hire people to click on the ‘ads’, with no intention of buying anything of course, and in this way sites can make quite a few ‘bucks per click’. Such moneymaking antics have even been automated. It has been estimated that this click fraud costs business around half a billion dollars a year. Google and Yahoo are intercepting the more obvious frauds, but it still goes on.
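
A toy sketch shows why the billing side cannot see the problem. The class, the function and the price per click are all invented for illustration, and this is not how Google or Yahoo actually bill; the only figures taken from above are the two annual estimates at the end. The buyer's intent is a property of the clicker, not of the click, so it never reaches the ledger.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Click:
    ad_id: str
    intends_to_buy: bool  # residual category: the billing function never reads this


def bill_advertiser(clicks: List[Click], price_per_click_cents: int) -> int:
    """Charge for every recorded click; intent is not part of the data."""
    return len(clicks) * price_per_click_cents


# Hypothetical traffic: 100 interested visitors versus 100 hired clickers.
genuine = [Click("ad-1", True) for _ in range(100)]
hired = [Click("ad-1", False) for _ in range(100)]

genuine_bill = bill_advertiser(genuine, price_per_click_cents=50)
hired_bill = bill_advertiser(hired, price_per_click_cents=50)
print(genuine_bill == hired_bill)  # the advertiser pays the same either way

# Rough share implied by the estimates quoted above (US dollars per year).
MARKET_USD = 10_000_000_000   # "well in excess of 10 billion", so a lower bound
FRAUD_USD = 500_000_000       # "around half a billion"
fraud_share = FRAUD_USD / MARKET_USD
print(fraud_share)
```

Since the market figure is a lower bound, the estimates quoted above put click fraud at no more than about 5% of the business, assuming both figures refer to the same market: a tolerable leak for the ad networks, perhaps, but a tidy income for the residual category.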

Aren’t residual categories wonderful? I’d love to hear of the particular experiences of anyone (anonymously of course!) who has set up such a site.


Govind said...

Great to see you on the blogging sphere at last. Welcome !

Conrad said...

Facebook seems to have the same problem nowadays, now that it is dealing with more ads and more applications being added. Each day my Home is flooded with application requests, and I remember that once, after 30 days, there were almost hundreds of applications, which took me almost an hour to get rid of. The same now happens with wall messages, where people spam your wall with horrible data... is Facebook going to fail now? Is the life cycle coming to an end for Facebook?

Tim Hannigan said...

hear, hear - This is going to be a fantastic blog!

Recall the funny examples I sent you about Google's adWords mismatching?

Inspired by your various calls to find IS failures in the "real-world", I've started a mobile blog (posting as I see it):

Exousia said...

At last the Angell of Do(o)m has his own blog!!!

mrmainelli said...

Delighted to see you taking residual categories to the masses. Ahh, but the masses versus the elite are also categories introducing further asymmetries. You taught us too well! Best wishes,