Sounds scary? Or just nuts?
But I am pretty sure you can’t imagine exactly how much algorithms are changing the world. Whenever I say algorithms, people around me start nodding and talking about Google’s search results, or how Facebook decides which ads to show us, or how Zynga prods us to buy extra ammo and fertilizer. I am not saying that how Facebook interprets your photo with shorts and a silly grin is not important (well…) but this post is about how algorithms may replace your average general physician, lawyer, accountant, and maybe even your favorite musician!! It is about how algorithms are changing the world in a far more fundamental way.
Before I write any further: all of the examples below are from three fantastic books – Automate This: How Algorithms Took Over Our Markets, Our Jobs and the World by Christopher Steiner; Big Data: A Revolution That Will Transform How We Live, Work, and Think by Viktor Mayer-Schonberger and Kenneth Cukier; and The Signal and the Noise: Why So Many Predictions Fail — but Some Don’t by Nate Silver. All three are really good books and I strongly recommend that you buy (and then read) them.
Another quick note: whenever I say algorithms, I mean algorithms combined with the power of Big Data. The ability to write excellent algorithms is not new, but the availability of cheap processing power, cloud computing, powerful databases and some excellent open source technologies is critical in leveraging the power of algorithms.
The power of algorithms first caught my eye when I stumbled upon a paper about Google Flu Trends – it was really boring, with lots of charts, and so, obviously, I ignored it. But then Google Flu Trends is how Viktor Mayer-Schonberger and Kenneth Cukier open their book, and that’s when I was blown away.
In 2009, H1N1, a new (as in no cure) flu virus, was discovered. And if you don’t have a cure, it’s a really good idea to contain the outbreak, right? But to contain a new virus, you first need to know where the outbreak is happening. The Centers for Disease Control and Prevention (CDC) in the US asked doctors to report new cases of this virus – this way, if doctors in one city, region, or state report cases in large numbers, you know where the outbreak is located. Unfortunately, this information was 1-2 weeks out of date by the time it reached the CDC – people went to the doctor late, doctors reported late, and the CDC itself tabulated the data only once a week (bureaucrats are the same everywhere). As you can imagine, information that is 1-2 weeks old is a lot less useful for containing a virus!
Enter Google (as you can judge from the number of my posts – a company I find fascinating!) – Google took the 50 million most common search terms that Americans typed and compared them with historical CDC data on the spread of flu (the regular kind) between 2003 and 2008. Then they ran 450 million (yes, million) mathematical models to look for statistically significant correlations between search terms and the spread of flu. Finally, their model discovered 45 search terms that indicated the spread of the virus – and remember, this indication was available in near real time – invaluable!!
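For the curious, here is roughly what "finding search terms that correlate with flu" could look like in code. This is only a minimal sketch and not Google's actual method: the weekly numbers are invented for illustration, and `pearson` and `rank_terms` are hypothetical helper names. It scores each search term by how closely its weekly search volume tracks the weekly flu case count, then keeps the best-scoring terms.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat series carries no signal
    return cov / sqrt(var_x * var_y)

def rank_terms(term_counts, flu_cases, top_k=2):
    """Return the top_k search terms whose volume best tracks flu cases."""
    scored = [(pearson(counts, flu_cases), term)
              for term, counts in term_counts.items()]
    scored.sort(reverse=True)  # highest correlation first
    return [term for _, term in scored[:top_k]]

# Invented weekly data (six weeks), purely for illustration.
flu_cases = [10, 20, 40, 80, 60, 30]
term_counts = {
    "flu symptoms":    [12, 25, 41, 90, 55, 28],
    "cough medicine":  [30, 28, 45, 50, 60, 55],
    "football scores": [50, 48, 52, 49, 51, 50],
}

top_terms = rank_terms(term_counts, flu_cases)
print(top_terms)
```

With 50 million terms instead of three, the same idea needs a lot more processing power, which is exactly where the Big Data part comes in.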
This one is even better:
So there is this company called Music X-Ray, founded by Mike McCready. As a musician, you can upload your music to this website, Music X-Ray’s algorithms work their magic – and voila, you now know if your music is likely to be a hit!! Sounds strange?
Well, it kind of is, but what this algorithm does is use something called Advanced Spectral Deconvolution (don’t ask). The algorithm creates a visual 3-D structure out of a tune’s patterns of melody, beat, tempo, rhythm, pitch, chord progression, fullness of sound, cadence and so on. The 3-D structure is then compared to similar structures of a number of hit songs from the past. These hit songs create a sort of cluster – and the closer your music is to this cluster, the higher the chances are of it being a hit.
I know what you are thinking – and yes, this will identify songs which are similar to some hit song from the past. But then, how many really original musicians are there – 10?? And of course, this is not a foolproof method, but it does work. Mike McCready has been instrumental in ‘discovering’ some new artists. For instance, Christopher Steiner talks about Lynne Ferguson, a grandmother of six who was signed up by a label for an album with 11 songs! Music companies now browse Music X-Ray for new artists, where they can sort by probability of a hit – so if you have a song you recorded, go ahead and upload it to Music X-Ray.
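The "closer to the cluster" idea can be sketched in a few lines. To be clear, this is not Music X-Ray's algorithm: I am assuming each song has already been boiled down to three made-up numeric features (say tempo, melody and rhythm scores between 0 and 1), and `centroid` and `distance_to_hits` are hypothetical helper names. The sketch simply measures how far a new song sits from the centre of a cluster of past hits.

```python
from math import dist

def centroid(points):
    """Average position of a list of equal-length feature vectors."""
    n = len(points)
    dims = len(points[0])
    return [sum(p[i] for p in points) / n for i in range(dims)]

def distance_to_hits(song, hits):
    """Euclidean distance from a song to the centre of the hit cluster."""
    return dist(song, centroid(hits))

# Invented feature vectors, purely for illustration.
hit_songs = [
    [0.8, 0.6, 0.7],
    [0.9, 0.5, 0.6],
    [0.7, 0.7, 0.8],
]
candidate_a = [0.8, 0.6, 0.7]  # sits right in the middle of the hits
candidate_b = [0.1, 0.9, 0.2]  # sounds nothing like them

for name, song in [("A", candidate_a), ("B", candidate_b)]:
    print(name, round(distance_to_hits(song, hit_songs), 3))
```

A smaller distance means the song looks more like past hits, so in this toy setup candidate A would rank above candidate B.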
To be really big, I am sure you will need a ‘snake around the neck’ video or equivalent but it’s a start!
Steiner also talks about algorithms which can compose operas (and good ones – not that I can tell the difference), write haiku (a very short Japanese form of poetry), power matchmaking sites, decide how to handle an irate customer in a call center, match astronaut personalities before they travel to space and so on.
But here is a very interesting one – Doctors!! Apparently, there have been studies where cytotechnologists reading pap tests with the help of an algorithm spotted 86% of cancer instances, as opposed to 79% spotted without the algorithm. This may not sound like a large improvement, but remember this is cancer – any improvement will lead to an earlier diagnosis and so better chances of treatment. Similarly, using algorithms produced a 16% improvement in spotting lung cancer nodules. Again, maybe not a huge improvement, but 16% in the context of cancer is pretty significant.
How many of us have had experiences where our doctor diagnosed our illness incorrectly? I don’t necessarily blame them – in a country like India, doctors probably see dozens, maybe hundreds, of patients a day – there is no way they can remember all your history, past symptoms, biological reactions and so on. So their diagnosis is often based on the history they scribbled down when you first met them. And where there is such a manual process, there are bound to be errors or omissions. But imagine if there was an algorithm that remembered every disease, every reaction to a medicine, every injury you ever had since you were born! What if that algorithm could then be combined with the doctor’s diagnosis? I am pretty sure this would improve the diagnosis!
Ok on to another one:
Let’s look at a recent example – Zest Finance. In July 2013, it was funded with $20 million by Peter Thiel, among others. Zest Finance identifies loan eligibility of sub-prime mortgage applicants. A Pando Daily article noted that Zest Finance’s algorithm tracks 10,000 data points in the mortgage application to arrive at 70,000 signals in less than 5 seconds!! These signals point to the likelihood of the loan getting repaid. I can’t, for the life of me, imagine 10,000 data points in a mortgage application, but apparently it includes things like whether the applicant uses lowercase, uppercase or correct case, the slant of their handwriting and so on. I imagine their 60+ data scientists have built a mathematical model which has identified correlations between these signals and the likelihood of repayment, based on historical data.
We all know how well the earlier sub-prime method worked, so this is definitely worth a try.
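To make "signals pointing to likelihood of repayment" a little more concrete, here is a toy sketch. It is emphatically not Zest Finance's model: the three signal names, the weights and the bias are all invented, and a logistic function is just one common way (my assumption) of turning a weighted sum of signals into a probability. In a real system the weights would be fitted to historical repayment data rather than typed in by hand.

```python
from math import exp

# Hypothetical weights, invented for illustration only.
WEIGHTS = {
    "months_at_current_address": 0.03,
    "missed_payments_last_year": -0.8,
    "typed_name_in_correct_case": 0.5,
}
BIAS = -0.5

def repayment_probability(signals):
    """Squash a weighted sum of signals into a probability between 0 and 1."""
    score = BIAS + sum(WEIGHTS[name] * value for name, value in signals.items())
    return 1.0 / (1.0 + exp(-score))

# Two made-up applicants.
careful = {
    "months_at_current_address": 48,
    "missed_payments_last_year": 0,
    "typed_name_in_correct_case": 1,
}
risky = {
    "months_at_current_address": 3,
    "missed_payments_last_year": 4,
    "typed_name_in_correct_case": 0,
}

print(round(repayment_probability(careful), 2))
print(round(repayment_probability(risky), 2))
```

Swap three signals for 70,000 and you can see why they need 60+ data scientists.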
And finally, read one of my earlier posts about how Target was able to find out about a girl’s pregnancy before she told her parents.
Apart from the above, there is the obvious field of financial markets. I am not getting into the math here – too complex for me anyway. But on some days, 60% of trading on US bourses is by algorithms – no human intervention, no manual oversight, just machines trading stocks, bonds, commodities, futures, options, and so on. That is scary!!! Algorithms are believed (nobody really knows) to be responsible for the May 6, 2010 Flash Crash, when the Dow Jones lost about 1,000 points and then climbed back by almost that much, all in a matter of minutes!! A trader could have literally stepped out for a coffee and missed everything.
As the flash crash showed, there are inherent risks in trusting algorithms completely. Since algorithms, by their nature, depend on historical data, it also follows that nothing truly new can ever be created, identified, or discovered by them. If only algorithms were identifying music talent, we might never have had the Beatles or Guns N’ Roses. Or if only algorithms were diagnosing patients, maybe no new diseases would be discovered.
But we also cannot deny that as long as we have enough data points, all decision variables – even the softer qualitative ones – can be modelled. And if they can be modelled, correlations can be measured, and if correlations can be measured, algorithms can take those decisions and, because of our limitations of memory, fatigue, etc., they can take them better than us.
So if you are considering a career shift – try being a data scientist. There is so much demand that you can probably auction your skills to the highest bidder.
I will leave you with this update, published seconds after the third quarter of a Wisconsin – UNLV football game (note the byline):
“Wisconsin appears to be in the driver’s seat en route to a win, as it leads 51-10 after the third quarter. Wisconsin added to its lead when Russell Wilson found Jacob Pedersen for an eight yard touchdown to make the score 44-3”
–written by an algorithm.