Where do you work? At my place of work the cool thing to do is to walk around and say let's get oodles of data on stuff and then analyse it. This methodology is apparently foolproof and will give us the answers to the known universe.
Of course, a little while ago I would sit in the techie meetings and say things like 'Does it scale?' whenever there was a period of silence. Many of the techies now see me as some sort of wise counsel, in the same way I still see them as a naïve group very able at doing things but truly unable to understand why!
So Big Data is the new thing, the power of computers is unmatched and we will soon all learn it is about the data. Of course, in the City, there are a huge number of quantitative analysts who are very happy with any developments that improve their job prospects.
But data to me has two huge issues: accuracy and interpretation. More bad data is still no better than less bad data, perhaps worse even. Interpretation is key - are sales falling because of a declining market or an inefficient sales force - can data tell us this? What if the data says one thing but it turns out later it was the other?
In the Great Financial Crash, data was telling the CEOs of banks their risk of failure was tiny - then lots of banks went bust, quickly. The data had been interpreted spectacularly wrongly.
Anyway, the link is to an article where the Bank of England is going to try to use a wider data set to make decisions. This of course is a waste of time, I can quite happily predict what the Bank is going to do with very limited data - it is going to set rates nice and low to keep the Zombie economy going and allow the Government to create too much debt. It has been doing this for over a decade with no sign of change - how is a bigger data set going to make this a different decision?
Of course it won't and this is some sham PR exercise, as per usual. 
So here's some questions
State has created too much debt. Currently at 13 trillion for the UK, civil service pensions etc included as well as the borrowing.
1. When does it go tits up?
2. How do you defend yourself against it going tits up?
3. How do you profit from it going tits up?
Gold.
Big data - not different to anything else in computing - GIGO.
The data problem you hint at is interesting; data will be skewed by what is captured, which is also a function of pre-existing assumptions about how the world works.
Lots of overblown bullshit about AI too - many of the advances are nothing but brute force calculation on massive data - very little intelligence. Of course in both cases you cannot expect people not to big themselves up - who would not talk their own book - still no reason to believe the fantasy claims though.
Totally agree there will be no rate rises. I cannot see any way this would be politically acceptable, there will be no change without a truly existential crisis, there will however be a lot of theatre about normalisation - it's not going to happen.
Lord Blagger - my answer for 10 -
(1) we are 10 years in, this is a bit like the house price crash that was mooted from at least 2000 - perhaps another 10 or 20 years? (no basis for this figure but as good as any other estimate.)
(2) & (3) How to defend and profit - surely depends on how serious you think the crisis gets..
Big data relies upon statistical concepts; but means and standard deviations are artifacts of the data. The chance of a large number of datapoints being bang average is basically nil, unless you have a highly standardised/commoditised process generating the data in the first place. You can also increase the chances of finding average datapoints by massively increasing the dataset, or improving the precision of your measurements. It seems highly likely that big data systems will trend towards being accurate for a smaller and smaller portion of the data; they'll have a tendency to fail, quite probably catastrophically, for an increasing amount of the set.
An awful lot of the calls for "more data" seem to flow from not understanding the process generating the data in the first place.
Whatever problems the BoE has, I'm reminded that Haldane, I think, went on a tour of the regions within the last two years, and discovered that regional issues were not showing up in national data. Yup, scale. So some of the BoE's actions, derived from national aggregates, might well have been positively harmful at regional level.
Still, more data eh?
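To make the scale point concrete, here is a minimal sketch of how a national average can sit perfectly flat while the regions underneath it pull apart. The region names and growth figures are invented purely for illustration; they are not BoE data, and the unweighted mean is an assumption made for simplicity.

```python
from statistics import mean

# Hypothetical year-on-year growth figures (%), invented for illustration only.
regional_growth = {
    "Region A": [2.0, 3.0, 4.0],  # steadily improving
    "Region B": [2.0, 1.0, 0.0],  # steadily deteriorating
}

def national_average(regions):
    """Unweighted mean across regions for each period (a simplifying assumption)."""
    return [mean(period) for period in zip(*regions.values())]

national = national_average(regional_growth)
print(national)  # [2.0, 2.0, 2.0]
```

The aggregate shows no change at all over the three periods, even though one region has stalled completely. A policy set on the flat national figure would suit neither region.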
"regional issues were not showing up in national data"
This is why proper scientists hate averages.
Good grief, I'm a proper scientist!
It always amuses me that 0.1% or less is considered to be meaningful in terms of measures of inflation/wage growth/productivity/you name it......one month's data on its own doesn't tell us much especially when much of the data isn't captured anyway eg self-employed in employment stats etc
I've no idea how TPTB set about collecting and analysing the data they do have but I'm sure as hell it can't be as accurate as they make out.
Here in the council, I can confirm we're just as interested as ever in collecting and analysing your data. And because we're doing this for the purposes of protecting you all from yourselves we don't even need to ask your permission or tell you we're doing it.
As always, Depends
Ex-GF was a contract PM on a db migration project for a large ins co in Swindon. They had a spreadsheet of ~1000 IFAs that was looked up daily, but not constantly (and not backed up) on a PC.
GF advised Access on a terminal server.
Some guru asked if it scaled.
$large_it_outsourcer that did the IT said 3 dedicated servers (live/test/dr) = £108K + 15K pa for scalable solution.
On big data, I got ~10 years' worth of FTSE data and was able to fit a function that predicted the next day's movement with ~78% accuracy.
Except it did not work on future data that well.
In large multidimensional data sets it is soooo easy to pick / find / derive 'significant' results that aren't really, or are but then you have incorrectly identified the causes.
As others have mentioned, you also need to be very careful about the questions being asked in the first place.
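The FTSE story above is easy to reproduce with pure noise. The following is a rough sketch, not the commenter's actual method: the target is a coin flip each day, so nothing can genuinely predict it, and the candidate "indicators" are coin flips too. The function names, the number of indicators and the window sizes are all assumptions chosen for illustration.

```python
import random

def accuracy(signal, target):
    """Fraction of days on which the signal matches the target direction."""
    hits = sum(s == t for s, t in zip(signal, target))
    return hits / len(target)

def dredge(n_days=500, n_signals=1000, n_train=50, seed=0):
    rng = random.Random(seed)
    # Target: coin-flip "up/down" days, so there is nothing real to find.
    target = [rng.choice((0, 1)) for _ in range(n_days)]
    # Candidate indicators: also pure noise.
    signals = [[rng.choice((0, 1)) for _ in range(n_days)]
               for _ in range(n_signals)]
    # Keep whichever indicator looks best on the training window only.
    best = max(signals, key=lambda s: accuracy(s[:n_train], target[:n_train]))
    in_sample = accuracy(best[:n_train], target[:n_train])
    out_of_sample = accuracy(best[n_train:], target[n_train:])
    return in_sample, out_of_sample

in_sample, out_of_sample = dredge()
print(f"in-sample: {in_sample:.0%}, out-of-sample: {out_of_sample:.0%}")
```

Search enough candidates and one of them will look impressive on the data it was chosen on, then fall back to roughly a coin toss on days it has never seen. The more dimensions in the data set, the more candidates there are to search.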
Andrew - of course, hence the Hitchhiker's Guide to the Galaxy with the best example of all time - 42 being the answer, but what is the question?
Robert McNamara, Harvard, Ford, USAAF, practically invented data analysis. He trained WW2 Army Air Force officers in business skills of efficiency and organisation and statistical interpretation.
When he was made secretary of defence for Kennedy, he attempted to run the entire military on business lines.
And data was the key.
But during Vietnam there was so much data collected (rice caches found, people with leftish leanings in former French administrative roles, number of VC rubber-soled shoes discovered per 100 acres of jungle, etc.) that it was mostly useless.
Tons and tons of “hard data” poured into MACV. And little of value came out.
The pacification figures said village region X was 32% pacified. But no one knew what that meant. Was that a win? And Mac said, sure it was, because before it was only 22% pacified.
But 32% only meant that the enclosed villages, the regional headquarters, and the land the GIs were brushing through that day were pacified.
Everyone cited figures throughout the entire war. This much captured. That much taken. This many killed.
Most of it was wrong data going in. And worse, it was wrongly interpreted coming out.
It’s why the USA was 100% unaware of the Tet offensive when it came. And why, up until then, they thought they were winning quite handsomely.
McNamara was president of the World Bank after being the longest serving secretary of defence in America’s history.
He did a better job there.
Erm, where is the link??
Yeah, what did kick this off? Was it the Haldane speech of 30 April? Which is here https://www.bankofengland.co.uk/-/media/boe/files/speech/2018/will-big-data-keep-its-promise-speech-by-andy-haldane.pdf
Thanks cowshed, did forget the link and that is the appropriate one!
Data gives a figleaf of rationality to what are basically gut feel matters of judgement. Bad managers love lots of data. Something to blame when you find you have got it wrong.
One of my early mentors told me if you are right 51% of the time you will have a great career. If you are right 49% of the time you won't.
wtf happened there?? Posted before I finished. Anyhoo, the best thing governments can do is emulate J.J. Cowperthwaite and not collect any data. If you don't collect it you can't attempt to incompetently manage it - and the free market and individual liberty will take care of everything.
Shame it never caught on.
Analysis paralysis is the aim.
Big data is just one of the latest buzzwords engineered to separate customers from their cash as they come to the end of big projects. Same as Cloud and Cybersecurity.
OT, but anyone heard from the Skripals lately? You'd think HMG would want to keep the Russia drum a-beating.
(more big transports than usual flying into Fairford US airbase, and RIAT isn't for a couple of months. Is something planned?)