Conviction of Italian experts who failed to predict earthquake rattles scientists – The Globe and Mail


This is rich! Speaking of information and predictability! It speaks volumes about the gross misunderstanding that “statistically challenged” people have about the concept of probability. I hope the poor scientists in this story are acquitted on appeal.

BBC News – Planet with four suns discovered by volunteers


Four suns! It reminds us of the scene from Star Wars in which Luke Skywalker gazes at a horizon where two suns are setting at once!

What is interesting about this story, though, is that the star system was missed by machine-based detectors and was actually discovered by manual checking: a dedicated researcher poring over the reams of data spotted the anomaly that was the signature of the four-star system. Of course, in hindsight, if you train the machine-based system to spot those particular types of patterns, it will do an exceedingly good job, and perhaps find many more such systems that humans would also miss. But it is fundamental to note that the machine cannot look for something it was not trained to look for. How could the human do it? Does this suggest that the human mind works in a fundamentally different way than machine learning does?
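The point can be made concrete with a toy sketch. This is not how the actual Kepler or Planet Hunters pipelines work; the data, the dip shapes, and the detector are all fabricated here purely to illustrate the principle: a detector trained on one transit signature scores exactly what it was trained for, while a generic "is anything odd here?" check (the kind of question a curious human asks) still sees the novel pattern.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Synthetic "light curves": a flat noisy baseline, one curve carrying the dip
# the detector was trained for, and one carrying a novel double-dip pattern
# the detector was never trained to look for.
baseline = rng.normal(1.0, 0.01, n)

single_dip = baseline.copy()
single_dip[90:110] -= 0.05            # the known transit signature

double_dip = baseline.copy()
double_dip[40:50] -= 0.05             # a novel signature, at positions
double_dip[120:130] -= 0.05           # the detector never saw in training

trained_window = slice(90, 110)       # the only place the detector "knows" to look

def dip_score(curve, window):
    """Trained detector: how much flux is missing in the expected window."""
    return curve.mean() - curve[window].mean()

def oddness(curve, threshold=0.98):
    """Generic check a curious human might make: count conspicuously
    dim points anywhere in the curve."""
    return int(np.sum(curve < threshold))

print(dip_score(single_dip, trained_window))  # clearly positive: detected
print(dip_score(double_dip, trained_window))  # near zero: missed entirely
print(oddness(double_dip))                    # yet the anomaly is plainly there
```

The trained detector misses the double-dip curve not because the signal is weak but because it never looks outside its trained window, which is exactly the sense in which a machine "cannot look for something it was not trained to look for."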

Actually, my humble opinion is NO. Human minds are in fact very similar to complex machine learning systems. The difference is that the range and type of training the human mind receives from the day of birth (and in fact even in the womb) is colossal and extremely varied. Additionally, there is the sheer complexity of the human brain. In the future (perhaps the far future?) the complexity of learning machines will equal and even surpass that of humans. The machines will also be exposed to equally varied and enormous amounts of data; in fact, machines can even look at data that is potentially inaccessible or incomprehensible to humans, such as machine logs or fast global sensing networks. When these conditions are fulfilled, machines will probably outperform humans even at very HUMAN tasks like detecting anomalies or looking for “interesting” patterns (where “interesting” is undefined).

There is a debate in the space exploration community about whether it is worthwhile to send humans into space when machines (rovers) can safely go to all types of dangerous places and gather data. The argument for human presence is usually that when a human sees something interesting he gets instant feedback and will dig around more out of curiosity, while a machine lacks that capability (at least today). Of course, rovers controlled from Earth by humans don’t truly qualify as autonomous machines; by machine exploration I mean autonomous machines that observe the environment and make instant, on-the-spot decisions about what to do next by themselves. I think at the current level of technology a human presence is still justified, but in the future humans may actually get in the way rather than being useful for deep space exploration!

Such is the nature of creation: Once you create something substantial and worthwhile, that creation will often surpass you the creator, and assume a “life” of its own!

Computing Now | Facial Analytics: From Big Data to Law Enforcement


This is an interesting article on big data in facial analytics. However, the reason I am pressing this is not so much this particular application of analytics (which is already well known; think Picasa face recognition), but rather because this statement caught my eye:

“Facial analytics is an emerging soft-biometric technology that examiners can use to contextualize images of people without encroaching on their privacy. A facial analysis system explicitly divorces the recognition component from attribute generation: it doesn’t attempt to identify individuals or confirm their identity but instead generates descriptive metadata about them based on their face. This metadata includes elements like facial expressions, face pose or position, face shape, face age and sex, and other nonuniquely identifiable information.”

I am afraid this statement is very misleading and symptomatic of a serious misunderstanding of what “anonymization” means and what its limits are.

The article basically suggests that if you collect and correlate a lot of detailed information about a person (how he looks, what he wears, his hairstyle, scars, what he buys, what he likes to eat, etc.) but do not put a NAME label on the record, you have somehow kept his identity secret. But in an information-theoretic sense, very little additional information is needed to connect a data point to its label. So suppose a company inadvertently releases this “anonymized” data to the public: it will take experienced data scientists very little effort to correlate the dataset against other open external databases and fill in the missing labels. In this sense the company would have done little worse by releasing the fully labeled dataset in the first place!
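A minimal sketch of such a linkage attack is shown below. All records, names, and fields here are fabricated for illustration: an “anonymized” release keeps so-called quasi-identifiers (age, ZIP code, sex), and a join against any public, labeled dataset that shares those fields fills the name back in wherever the combination is unique.

```python
# Toy linkage attack: re-identify an "anonymized" release (names removed)
# by joining its quasi-identifiers against a public, labeled dataset.
# All records below are fabricated for illustration.

anonymized = [
    {"age": 34, "zip": "02139", "sex": "F", "purchase": "insulin"},
    {"age": 51, "zip": "94110", "sex": "M", "purchase": "statins"},
]

public_register = [
    {"name": "Alice Smith", "age": 34, "zip": "02139", "sex": "F"},
    {"name": "Bob Jones",   "age": 51, "zip": "94110", "sex": "M"},
    {"name": "Carol Wu",    "age": 29, "zip": "60614", "sex": "F"},
]

def reidentify(anon_rows, public_rows, keys=("age", "zip", "sex")):
    """Attach a name to each anonymous row whose quasi-identifiers
    match exactly one public record."""
    out = []
    for row in anon_rows:
        matches = [p for p in public_rows
                   if all(p[k] == row[k] for k in keys)]
        if len(matches) == 1:        # unique match => identity recovered
            out.append({**row, "name": matches[0]["name"]})
    return out

for r in reidentify(anonymized, public_register):
    print(r["name"], "->", r["purchase"])
```

The attacker never needed the name column at all; a handful of innocuous attributes was enough to make each record unique, which is precisely why dropping the label provides so little protection.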

If you have a hard time believing this, recall the de-anonymization of the Netflix Prize dataset: researchers were able to identify specific people from their movie-rating histories in the dataset by correlating them with public IMDb reviews.

The point I am making is that the notions of anonymity and privacy need to be drastically reevaluated in the digital age. Unless you live in a cave and eat raw meat off the bone, chances are there is a stupendous amount of information about you floating around on the Internet. Much of it you may have put out there yourself (Twitter, FB), while a vast amount may also have been collected from you under a form of duress (“fill in this form or we won’t give you this service”). So it is basically futile to hope for anonymity.

Privacy, on the other hand, is not the same as anonymity (though anonymity can ensure privacy). Privacy is a social contract under which companies refrain from prying into people’s lives even when they can in a technological sense. Of course, prying into people’s lives is often very profitable, and profit is the raison d’être of businesses, so they are not going to willingly abide by this contract. Hence what is needed is a strong set of laws ensuring that companies that break the contract are punished severely enough to disrupt their profit-and-loss equation.

Thus privacy is a legislative question, not a technological one. In the old days before the Internet, privacy through anonymity was feasible, but that is anachronistic in today’s world. What we need is some creative and fundamentally new thinking on privacy, with the consumer placed at the center of the arena as the owner of his information and as the person who gets the lion’s share of the benefit from it.