I’ve invented a pie machine. It makes the best pies you ever tasted, and you can put anything you like in it. If you put apples in the machine, you’ll get the best apple pie you ever tried out the other end. If you put beef in, you’ll get an amazing beef pie. You can feed it anything: put mud in and you’ll get the best mud pie in the world. It’s still a mud pie though, so I wouldn’t recommend trying it.
|Garbage in, Garbage out|
It’s as true today as it’s ever been. Put bad information in and you’ll get bad information out. It’s worse now, though. Before, we just had the issue of putting bad data into the machine and getting bad data out. All we had to do was clean up the data and we were winning. Now we run the risk of creating what I affectionately think of as robot yes-men. We’ve all seen the manager who surrounds themselves with people who agree with everything they say. Now we have a much faster way to reach the same bad answers, but without all those people… and you might not even realise you’re doing it. The problem here is that we’re putting good data into a machine learning algorithm, but good data is not necessarily the right data to train an algorithm with.
So, the secret to success? First we need to define success. Sometimes it’s obvious, sometimes it isn’t. The “classic” ML example is house pricing. The tutorials will tell you this is blindingly easy: all you need to do is put in some data (house price, square footage) and the machine will make a prediction for future houses. The machine can find you patterns, so more data is better, right? Add in bathrooms and bedrooms and you’ll get a better outcome. Add in colour and it might notice that brick houses are generally more expensive. Add in bin colour and it might notice that houses with yellow bins are generally twice as expensive as those with grey bins. That is neither helpful nor relevant. This is not what success looks like.
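To make the bin colour trap concrete, here is a minimal sketch using a handful of made-up listings. The field names (`area`, `bin`, `sqm`, `price`) and every number are assumptions invented for illustration, not real housing data. The idea is simply that bin colour is set by the council area, so it looks predictive while telling us nothing the area didn’t already.

```python
from collections import defaultdict
from statistics import mean

# Made-up listings for illustration only. In this toy data the council
# area decides both the bin colour and the general price level.
listings = [
    {"area": "north", "bin": "yellow", "sqm": 80, "price": 400_000},
    {"area": "north", "bin": "yellow", "sqm": 100, "price": 500_000},
    {"area": "north", "bin": "yellow", "sqm": 120, "price": 600_000},
    {"area": "south", "bin": "grey", "sqm": 80, "price": 200_000},
    {"area": "south", "bin": "grey", "sqm": 100, "price": 250_000},
    {"area": "south", "bin": "grey", "sqm": 120, "price": 300_000},
]


def mean_price_by(rows, key):
    """Average price for each distinct value of the given field."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row["price"])
    return {value: mean(prices) for value, prices in groups.items()}


by_bin = mean_price_by(listings, "bin")
by_area = mean_price_by(listings, "area")

print("By bin colour:", by_bin)
print("By area:", by_area)
```

In this toy data the yellow-bin average is exactly twice the grey-bin average, and the split by area gives the same two numbers. The bin is just a label for the area, so feeding it to the machine adds a pattern without adding any understanding.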
But all of that feels obvious, doesn’t it? Let’s change the scenario to something a little different. Let’s look at “Le Tour”, the biggest cycle race in the world. Let’s feed in all of the stats we have and ask the computer to identify what kind of rider crashes a lot. They’re all men, aren’t they! If we choose female riders, therefore, we’ll no longer have crashes in the Tour de France! What about winners, though? They too are all men. We may get a race without crashes, but there wouldn’t be a single winner amongst them if they were all ladies…
A more serious example, though, is CV checking. We have a requirement for new staff at management level and a thousand CVs to sift through. To remove any bias we’re going to get a machine to choose candidates, because machines are not sexist or racist. Using data on who is currently successful in the organisation at management level, we can train the machine to remove those CVs which don’t look like people who will succeed. Hopefully you’ve spotted the problem already: the organisation with mostly white, male managers wanted to remove bias but has reinforced it by training the machine on its existing data. All of the successful managers were white males, and so the machine may very well reject other races or women to meet that success criterion. To make this work, we need to remove that data from the training.
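As a rough sketch of what “removing that data from the training” could look like, the snippet below strips protected fields from each record before it goes anywhere near a model. The field names and the two example CVs are hypothetical, made up purely for illustration. Note that this only removes the obvious columns: other fields can still act as a stand-in for them, in the same way bin colour stood in for something else in the house example.

```python
# Assumed, hypothetical field names; a real CV dataset will differ.
PROTECTED_FIELDS = {"gender", "ethnicity"}


def strip_protected(record, protected=PROTECTED_FIELDS):
    """Return a copy of the record without the protected fields."""
    return {key: value for key, value in record.items() if key not in protected}


# Made-up example records, for illustration only.
cvs = [
    {"years_experience": 12, "team_size_led": 8, "gender": "F", "ethnicity": "A"},
    {"years_experience": 9, "team_size_led": 5, "gender": "M", "ethnicity": "B"},
]

# The model only ever sees the stripped rows. The original records are
# left untouched so the protected fields remain available for checking
# the outcome afterwards.
training_rows = [strip_protected(cv) for cv in cvs]
print(training_rows[0])
```

The function returns copies rather than editing the records in place, which is a deliberate choice: the statistics are kept, just not shown to the machine.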
Worse still is crime prediction: whatever profile is currently “statistically likely” to carry out a crime will be flagged as a potential criminal.
Should we stop collecting those statistics altogether? No, of course not. We must use them to measure our success and correct any bias, whether that be in the machine or in the wider environment. If the machine is unbiased and we still get no women in management, or more black criminals, then there are potential issues in society, education, etc. that may need to be looked at. The machine shouldn’t be trained with that data in every scenario, though.
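One simple way to use those statistics for measuring rather than training is to compare how often the machine says “yes” for each group. The sketch below is a minimal example under made-up assumptions: the `gender` field, the eight candidates and the decisions are all invented, and a real check would need far more data and care than this.

```python
from collections import Counter


def selection_rates(candidates, decisions, group_key):
    """Share of candidates selected within each group.

    `decisions` holds one True/False per candidate, in the same order.
    """
    totals = Counter()
    selected = Counter()
    for candidate, decision in zip(candidates, decisions):
        group = candidate[group_key]
        totals[group] += 1
        if decision:
            selected[group] += 1
    return {group: selected[group] / totals[group] for group in totals}


# Made-up candidates and machine decisions, for illustration only.
candidates = [{"gender": "M"}] * 4 + [{"gender": "F"}] * 4
decisions = [True, True, True, False, True, False, False, False]

rates = selection_rates(candidates, decisions, "gender")
print(rates)
```

Here three of the four men are selected against one of the four women. A gap like that doesn’t tell you where the bias lives, only that it’s time to start asking.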
Machine learning is a great way to speed up processing and time to success in your business. It can be a great way to accelerate failure too, so treat it with caution and always question your input data and what you consider to be success.