The Infinite Monkey Cage - Big Data
Episode Date: July 16, 2018
Brian Cox and Robin Ince are joined on stage by Danny Wallace, mathematician Hannah Fry and science writer Timandra Harkness. They'll be going big on Big Data, and asking just how big is it? They'll be looking at where Big Data comes from, should we be worried about it, and what mysteries are hidden within the seemingly endless amounts of information that is collected about us as we go about our daily lives.
Transcript
This is the BBC.
Hello, I'm Robin Ince.
And I'm Brian Cox. Today's show is all about big data and how the information you unwittingly
offer up every minute of your life is being used to shape the environment around you and
even to decide what you need.
You will know this if you've ever done internet shopping, that suddenly that
algorithm... I looked up Brian Cox once, once, and ever since then I'm inundated with adverts for
How to Be a One-Hit Wonder, 1990s Club Classics Volume 17, and Astronomer's Finger Wax, which I
don't even really know what that is.
It just says, with Astronomer's Finger Wax,
you can point to anywhere in the universe with confidence.
May not work at Zenith.
You're right, you said that Zenith bit would be quite a low laugh.
Your prediction was good.
A niche joke for astronomers.
How many astronomers in the audience?
Noms.
To be honest, we got more noms than we deserved.
Well, I put your name into a search engine anyway,
and all I get now is adverts for cardigans.
You would.
So, 40 years ago, you would know if someone was collecting data from you
because you would be stopped in the street by a man in a Mac
who would be saying something like,
excuse me, have you got five minutes for me to ask you about your thoughts
on home pride cooking sauce and the education policy of Barbara Castle?
To which I would normally reply, you'd have to be quick, Rumbelows closes at five and I've just blown a valve.
We really debated about the Rumbelows reference.
But we felt that to say Rumbelows would somehow find some of our core demographic
and take them to a place of delight.
Every time you use your loyalty card or your Oyster card,
search online or even switch on your phone,
information about your behaviour is collected.
But where does it go and what is it used for?
To answer those questions,
we are joined by at least 64 petabytes of pure information,
and they are...
My name is Dr Hannah Fry.
I'm a mathematician from University College London
and the author of a new book on algorithms and data
called Hello World. And the most
interesting thing that I have discovered
on the internet is that France was
still using the guillotine when the first
Star Wars film came out.
Hi, I'm Timandra
Harkness. I'm
a lapsed comedian. I'm a mediocre
mathematician, but I am the author
of Big Data: Does
Size Matter?
That's the correct response. And the most interesting thing I found on the internet is that the very first ever Registrar
General of Births, Marriages and Deaths for England and Wales, who was a man called Thomas Henry Lister,
also wrote a romantic novel called Granby, in which the eponymous hero's love for Miss Jermyn
overcomes her
parents' opposition when he is revealed to be the secret heir to Lord Malton.
Hello, my name's Danny Wallace. I'm a writer and a presenter. And what I find interesting
about the internet is when big companies get their sort of web addresses wrong, their websites,
you know, they've got to come up with something good
and they've got to come up with their URL,
and they just think, that'll do.
Which is why the custom pen manufacturer, Pen Island,
when it's all typed out...
LAUGHTER
..you are way ahead of me...
..becomes Penisland.net.
Or the company Internet Protocol Anywhere,
which, until they noticed, you could find at ipanywhere.com.
So it's just a way of bringing in new audiences, I think, for their products.
And this is our panel.
Timandra, we'll start off with a definition, because this is, I mean, big data is quite new to everyone on this panel.
When we were born, this was nothing that would be referenced.
It is a new idea.
So what is big data?
Okay, well, obviously it's big.
It's actually quite hard to specify how big because the amount of data in the world proliferates so quickly that if you give it a figure one year, it's out of date.
To give you an idea, I interviewed a brain scientist called Professor John D. Van Horn, and he said that when he got his first post-doctoral research job, the lab sent him out to buy the biggest hard drive
they could afford, because they were doing brain scans, which is a lot of data. And he brought it
back, and people from the other labs in California came to look at this hard drive because they'd never seen one so big. It was four gigabytes.
And I'm talking to him on the phone, thinking, your photo on the internet makes you look
quite young, but how long ago was this? Because my phone has eight gigabytes.
So it is big, there's lots and lots of data, but is there more to it than that?
Yes. And I actually came up with a backronym for big data, which, as you all know, is an acronym where you've
reverse engineered it to get the word you wanted.
That's the only thing anyone's going to remember
about what I say tonight, a backronym.
So, DATA. Big data is big, obviously. D, it's got lots of
dimensions, so you've got lots of different data sets. So perhaps, you know, you don't just...
Well, another brain scientist said, yeah, OK, brain scans are great, but I'm much more excited if you
get the brain scans and the medical records and the postcodes where the patients have lived
and the weather reports for those postcodes when they lived there. This is
Professor Paul Matthews. And I put them all together, and then I can study the effects of sunshine
on the progression of multiple sclerosis. And that, he said, that's big data. If I just have brain scans,
that's just large data.
So it's got different... Yeah, that was what he said. So it's got different
dimensions. It's collected automatically, that's the first A. It's collected pretty much in real time.
It's the T, which means you can also then project it forwards in time
and use it to make predictions.
And the second A is for AI,
because you basically use artificial intelligence-type programs to process it.
So that's big D-A-T-A.
So, Hannah, we've spoken there about, I suppose, collecting data sets in a way that is, well, understandable in a way, so it's a weather data or brain data or brain scans.
But can you give us an overview of the totality of the amount of data that's being collected?
Yeah. Well, in a way, actually, I don't know if I totally agree with everything that Timandra said, because I think that actually we have had big data in the past. I think, you know, the census, for example, is this
connected data set. It tells us all about, you know, different things about one person and lots of
things about everyone, and, you know, each one of us has contributed to it across the entire country. I
mean, that really is big data. But I think what has changed is the types of data that
we're now collecting. I mean, you know,
you don't need me to tell you how much data your phone can collect just by you walking around.
You know, you can have a heart rate monitor on your wrist, you've got, you know, your lights
coming on and off, that's all being recorded. Everything. I mean, basically every single thought
I ever have I practically type into Google. You know, there's just a catalogue... The range of different stuff that we know about people now, that is different.
And you said it's being recorded almost in passing, but are we now to take for granted the fact
that all those things we do with the phone, everywhere that we move, every internet search
that we perform is recorded or archived somewhere? Do we have control over that?
No. No, not at the moment. But
I think people have in the last couple of years started to wake up to the fact that
actually being able to infer this much about individual people isn't necessarily the kind
of society they all want to live in. Because, of course, there are things like, you know, what gender you are,
you know, your sexual orientation, but other things as well,
very, very personal things, you know, whether you've had an abortion, perhaps,
you know, whether you're having problems conceiving.
All of these things can be inferred from your searches that you're doing online.
And I think that people are starting to wake up to the fact that, actually,
it doesn't feel particularly comfortable to have someone be able to know that about you.
There was a story, actually, a very big story in America a few years ago.
A company called Target.
It's kind of like, sort of like Woolworths, really.
So it sells everything that you can imagine.
You can get some, you know, grocery stuff in there, but also things for your house.
And they have like a club card type system
where they can track what an individual is buying.
And they were doing something called basket analysis,
where you look at one individual
and the things that they're buying over time.
And they brought on a new statistician to look at all of this data.
And this statistician realised that there were some clever tricks
that you could use to work out
whether or not someone was pregnant based on what they were buying. So not the obvious stuff, not when
they're buying nappies and cotton wool, but when they're buying unscented body lotion when they're
in the second trimester. And often that would be preceded by someone buying vitamins and stopping
buying alcohol. And you could even kind of predict the exact moment that they were going to give birth
by the things they would go on to buy.
So what the company decided to do
is they set up this pregnancy predictor, right?
So if you went past a certain threshold,
they would assume you were pregnant
and they would send out a series of coupons
to you in the mail.
Just to, you know, capture your customer early
and lock you in so you're, you know, a target customer.
And that was all fine.
Don't necessarily have a problem with that.
Except that in 2012, a father of a teenage daughter
walks into a store in Minneapolis and was outraged
that his daughter had been sent this pack of coupons.
And he was like, you're normalising teenage pregnancy.
This is outrageous.
So the store apologised and then called his home later to follow up on that apology.
And by the time they managed to call him, he said,
you know what, actually I've had a chance to have a chat with my daughter
and it turns out that she was pregnant.
Found out through coupons that are mailed through the post.
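The threshold idea described in that story can be sketched in a few lines of Python. This is a minimal illustration only, not the retailer's actual model: the weights, the threshold value and the function names `pregnancy_score` and `should_send_coupons` are all invented for the example, and only the product names come from the story as told here.

```python
# Hypothetical point values for "signal" purchases. These numbers are
# assumptions made up for illustration, not figures from any real system.
SIGNAL_POINTS = {
    "unscented body lotion": 4,
    "vitamins": 3,
    "cotton wool": 2,
}

# Assumed cut-off: at or above this score, the shopper is flagged.
THRESHOLD = 6


def pregnancy_score(basket_history):
    """Add up the points for each distinct signal product seen over time.

    basket_history is a list of baskets, each basket a list of product names.
    """
    seen = {item for basket in basket_history for item in basket}
    return sum(points for item, points in SIGNAL_POINTS.items() if item in seen)


def should_send_coupons(basket_history):
    """Apply the threshold rule: flag the shopper once the score is high enough."""
    return pregnancy_score(basket_history) >= THRESHOLD
```

With these made-up numbers, a shopper who buys vitamins one week and unscented body lotion a few weeks later scores 7 and is flagged, while a shopper who only ever buys cotton wool scores 2 and is not. The rule never asks the shopper anything, which is exactly what makes the story uncomfortable.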
So I think this is something that people are really kind of waking up to,
that actually does
make us feel quite uncomfortable, to have people know that much about us.
Can I ask you a question on behalf of my mum?
Yes.
My mum, she doesn't know how this is happening or why, but the universe
seems to be telling her that she needs a new mattress. And, like, wherever she goes, whatever
she does, the same mattress is targeting my mother,
and she feels victimised by this mattress company,
and she says she hasn't been Googling mattresses,
and I believe her.
What's going on, and should she buy one?
Don't know if I can help you on the latter.
Right.
But, you know, there are things like maybe she's Googling insomnia
if she's having trouble sleeping or back pain.
I mean, there are all of these things that are just loosely associated
that don't directly say...
I mean, not everyone who Googles back pain needs a new mattress.
Maybe your mum doesn't either.
But there's just enough...
If you do it to enough people,
the chances are that you're going to increase your sales.
Twitter thinks I'm a man.
And I know this not only from all the beard care products that it advertises to me.
And the videos of buff guys working out, which is actually okay.
But you can go on Twitter, you can see what it thinks about you.
It thinks I'm aged between 13 and 55.
It's got that right, anyway.
And it thinks I'm a man.
And I don't know why, because I've never told it.
So, you know, it's not infallible,
which I think is... there's some hope.
It's not always right.
I feel bad that as a man I'm not getting those things.
What do you get on Twitter?
Well, I never get buff men working out.
I don't know what you've been Googling.
But this is... Well, I don't know.
Have you been Googling...
Have you been Googling buff men working out?
That'll be it.
I do remember, actually, my mother saying
that she'd been looking for...
There was a particular scene from a movie
where there was a very moving scene of, I think,
it was Edward Woodward acting, and it was a movie about a prison,
and there was this very moving, emotional scene in the shower where
he was acting really well. And so she Googled "prison shower scene".
And she said she didn't find Edward Woodward, and a lot of the acting wasn't that good.
it was quite blurry
and you couldn't really tell what was going on at all.
I mean, you've got me worried now
because this afternoon I was literally Googling Penisland
and IP Anywhere.
Hannah, we're talking about human behaviour here.
So you gave an example, actually, of a single individual
in a case where you can predict something about them.
How accurately can we predict individuals' behaviours?
And then, I suppose, groups of individuals,
does that become easier to predict?
Yeah, well, so groups of...
You can certainly look at what a lot of people are doing.
And actually, there was...
In getting my notes ready for this, I was researching something
on my work computer that I slightly regretted, because it's quite an interesting story about
Pornhub. So, during the Hawaii...
We don't know what that is. What is... I mean, it's got a backronym
somewhere, I'm not sure.
So during the Hawaii missile alerts,
Pornhub have released the data of how much they were being used in Hawaii at the time.
And as the alert came out, the first text message saying, you know, there's going to be
a missile coming, there was an 80% drop in usage. Still not down to zero.
Still 20% of people there. They were too busy to read their text messages.
That's why.
But as soon as the follow-up message
came through to say that
everything was all okay,
Pornhub then spiked to
50% higher than normal usage
for that time.
What a way to celebrate.
Exactly, exactly.
But I mean, you can certainly observe how people are behaving,
especially at the level of a population like that.
In terms of predicting what an individual will do,
you can do a better job with algorithms and with data
than you can just by guessing,
just by another person sort of trying to make a prediction.
But you can't get absolute perfection.
And that, I think, is one of the slight concerns, really, about all of these algorithms being used.
So to give you an example, algorithms based on data of people's past are being used to try and predict whether or not they'll go on to commit a crime.
Now, this is in a particular scenario, so
when a judge is trying to decide whether to give someone bail or not, for instance,
but also now increasingly if they're sentencing an individual and sending them to jail.
And this is something that I think, you know, the whole sort of big data community has really been
tussling with, because if you're a judge, you sort of do need to make a prediction. You need to
make a prediction of whether letting someone out and, you know, giving them their freedom
before they face trial is a good thing. You know, you've got to make a prediction
of whether they're going to, you know, betray your trust and break the conditions of
their bail. So in some sense, actually, an algorithm that makes that prediction better is better.
Well, but are they better? And also, how would you feel about being sentenced because people like you
in the past went on to re-offend or didn't go on to re-offend? Because what categories are they
using? Durham Police are working on their own version of this, which,
to do them credit, they've looked at the American ones, which have been subject to some controversy
and gone, we want something to help us make this decision whether to kind of keep people out of
jail and put them onto a rehabilitation scheme instead. But we want it to be transparent. We
want everybody to know what's going on, how they're being judged. But the biggest predictor of whether they're going to re-offend
turns out to be the postcode.
So, I mean, I don't know what kind of area you guys live in,
but do you want to go to jail or not on the basis of your postcode?
Is that fair?
Danny, how do you feel about this collection of information?
I mean, are you careful?
Are you canny when you are on the internet, for instance,
and, you know, certain things, you know, ticking a box, whatever it might be,
are you methodical, or do you kind of worry about...
No, I sort of, in a weird way, I find
it kind of comforting, in a strange way. It's that kind of, you know, it's
the electronic version of the nanny state kind of looking after me, making sure my mother has a mattress, things like that, reminding her.
I worry, though, about the next stage.
You know, we've all got all these devices that have microphones in them.
And we've got one of those robot ladies in our house.
We have a little box, and you can talk to her.
And you can just go, you know, hello, what's the weather like?
And I don't know why I ask her that, because I've got windows.
I'm able to, you know, do that for myself.
But you start to wonder, you know, are they listening all the time?
You know, is that how more information will be got?
Sorry, what is this?
Because I...
When you say that, I'm confused as well.
I was just seeing one of those old-fashioned barometers
where either someone comes out with an umbrella
or a lady comes out because it's sunny.
So this doesn't sound very modern at all.
That's exactly what it is.
No, you know, like Siri or Alexa.
Now I've come up with two names.
I can say them because it's all branded, isn't it?
But, you know, you can go, Alexa, do this for me.
And she will. She's very obedient.
But, I mean, although there was a story not long ago
about these... these
Alexas in particular, or Alexi, seeing as I'm on Radio 4, where every now and again...
You know, they'll only really respond if you say their name. But just recently they've started
to just, every now and again, when you're having a conversation with someone else or, terrifyingly,
in the middle of the night, they begin to laugh
maniacally.
Which is not what you want to,
you know, we could do with a few in here.
But you don't want to hear that
at sort of three in the morning, just hearing a
disembodied voice, a woman just laughing
downstairs, or just mocking you.
So I start to think, you know, there are bugs
in this, but could this be the sort of
the next step where we just accept that these things are always on
and then your smart TV is communicating with your phone
and your phone is talking to Alexa
and they're all going, his mum wants a mattress.
See, this sounds like it has, in terms of AI,
I don't know how you feel about it, Timandra,
if we now have these machines which have reached the stage of,
you know, mocking us and probably also understanding irony,
have we now got, you know, have these machines reached that point
of passing the Turing test?
It's like that moment where it's not the chess machine winning,
it's when the chess machine gloats as well.
Well, that I think...
Apparently there is somebody in London actually working on
trying to get AI to understand irony.
And I'm like, don't do it, don't do it.
Because when they do become more intelligent than us,
our only hope will be in irony.
We'll live underground and have a secret language based on sarcasm.
Because it is the only thing they don't understand.
I mean, now they can do stairs.
I once talked to a roboticist and I asked him that kind of,
you know, that question that everyone sort of ends up asking a roboticist
which is, you know, will they rise up
against us? And
he sort of laughed and said, probably.
And I said, well, what
are we doing about this?
And he didn't really have many ideas and he just
sort of went, well, at the moment we're making sure that the
off switch is quite readily accessible.
I feel safe.
But in a sense, that's the wrong question
because I actually had really funny
conversations when I was out in California
at Google HQ for an event
and really
spookily, I basically, I was going out there for
this Google event, at the last minute I stuck my
radio recording kit in the bag, just on a whim
and I landed at the airport and I got this message from the BBC saying, this is a few years ago,
saying, yeah, no, get in touch with us urgently, because we want you to co-present this programme
about the singularity, like the super intelligent AI that's going to be more intelligent than us.
And I went, that's quite spooky, because I'm basically on my way to where it probably already
is.
And so I was talking to people in Google and saying, you know, how long do you think we've got before the super intelligent AI?
And several of them said, well, we're in it.
You know, it's already here. Look, think about it.
If you were a super intelligent AI, would you burst on the scene going, no, you are mine, humans?
Or would you quietly sit in Silicon Valley, attracting really clever people,
giving them nice food and drink, getting them to service you and bring you all the data that
you need, and then maybe some robotic cars and maybe some drones and who knows what else?
You know, that would be what you would do. And I'm sitting there in Google,
in the restaurant, talking to this guy, going, but that means it can hear what we're saying.
I actually don't think it's already here, but, you know, if it's going to come, it'll come
like that. It'll come with clever people in Silicon Valley going, here it is, we've built it, isn't it
great?
I think calling AI... Well, I just think, actually, I have a slight problem with
the label AI, full stop. I think what we've seen recently is a revolution
in computational statistics, not a revolution in intelligence.
And I admit that that is nowhere near as sexy.
Statisticians of the world unite.
Unless you really like statistics.
Statisticians of the world unite.
You have nothing to lose but your Markov chains.
All right?
That's one of the things, actually, for both you and
Hannah, I imagine. I found a quote about this, which said, about where you are with mathematics and
statisticians, it's like an arms race to hire statisticians nowadays. Mathematicians are suddenly
sexy. I mean, in terms of big data at that moment going, it's changed people's conception of both mathematics and mathematicians, hasn't it?
Yeah, I think so.
I mean, I think suddenly we can use all of these techniques
that we were able to use in science for so many years,
and now suddenly we can apply them to ourselves.
And Timandra's making a face.
Should we? That's the question.
Because it's one thing to, like, you know,
like Brian does underground in Switzerland,
applying big data and a Large Hadron Collider to protons, isn't it?
Anyway, those subatomic things.
To be honest, he's not as involved as he used to be.
Oh, well...
Yeah.
Telly or great discoveries?
Ooh.
I'm resisting.
We get on very well, by the way.
That's why I said that.
Because one little loo is if to go...
I'm resisting the urge to deflect the programme
from the subject into the...
I'll talk about particle physics all night.
But that's the natural world.
The natural world is absolutely...
Math is brilliant at studying the natural world and statistics,
but human beings, we are part of the natural world,
but we're also not part of it.
Like, we don't just behave like particles.
And, you know, we have free will and we're awkward
and we do things for reasons that we then have to explain
and people don't understand.
And I get squeamish about saying,
oh, well, you know, but you can model how people behave
and we basically behave the same as, I don't know, ball bearings.
There's a difference here, though, isn't there? Because we've talked about... It sounded quite
sinister, actually, when you talk about predicting whether someone is likely to re-offend. But
in terms of groups of people, in terms of movement of people through shopping centres or cities,
then I think it sounds less sinister and more sensible, doesn't it?
Yeah.
Is it easier to take, to say,
well, I suppose to predict how crowds will behave
in certain environments rather than individuals?
Yeah, it's a perfect example.
You know, you take something like a transport system,
like the tube network, say.
You know, it's actually really important
to have a really clear idea through data
of how people are using your system,
where they're going, when there's a problem,
where they redirect to.
You know, it's absolutely integral
to getting something that works efficiently.
And I think that you can say the same thing about,
you know, to a degree,
and this is where Timandra and me disagree,
but I think to a degree,
you can say the same thing about making,
you know
your policing as efficient as possible, and working out where the best places to place
your forces are in the city. And I think, you know, actually across the board, really, you know,
in health care, I think in, yeah, everything, every system that humans are part of,
I think that we can learn about ourselves by thinking of ourselves through the
eyes of data and make it more efficient.
Timandra, do you feel there's a difference between using
big data to predict the way that crowds will behave, or large groups of people, and individuals?
Is it because you said you were concerned, but is it really about the individual
rather than group behaviour? It really is. I mean, that's the root of it, is that...
I mean, Hannah's right.
It can be really, really useful to look at how people behave en masse
in order to find solutions en masse.
And, you know, in crime, for example, if you did find a postcode,
which happened to have a lot of criminals,
it would be useful to go, that's weird, what's happening there?
Is it particularly deprived?
Is there something we can do en masse,
on a population level? Where I would get really worried is when you then jump and go, OK,
well, if we know that, I don't know, 30% of the people in this postcode will end up unemployed, then you look
at an individual and go, you're 30% likely to be unemployed, so we are going to treat you as if you are basically a potential unemployed person
without any regard to you as an individual
and what you might think and want and do.
So that is part of it.
But I do...
I'm also a bit squeamish about the idea
that you kind of guide us without consulting us.
If you look at, I don't know,
wanting us all to live healthier lives
and walk more and get the bus less, us without consulting us if you look at i don't know wanting us all to live healthier lives and
walk more and get the bus less and you go okay well what we'll do is we'll redesign the system
so that you have to actually really put yourself out to get a bus and we're just going to nudge
you into walking without ever saying to you do you want to walk more or not and and that's another
thing i just get a bit you know i mean you, Danny, you seem quite happy with the idea of the Nanny State doing things for your good.
I'm a bit more like, well, ask me.
You know, I might want to walk more.
But I do think it should be my decision, not just kind of nudged into it.
I mean, smart cities is, oh, smart cities and everything works really efficiently and it's a great system.
But the problem with a smart city is it seems to kind of assume
that we're dumb and the city is smarter than we are.
And I don't like that.
I was actually at an event in Hong Kong.
Robin was there as well.
And it was a trade panel and we were talking about things.
And then there was a Hong Kong entrepreneur,
a property entrepreneur there.
And he sort of woke up. I'd said something. I made some joke about Brexit or something, I can't remember what it was, and he looked up and he said, this is... We're going to beat you in China. We're going to
beat you. And the reason is that data in China is owned by the government.
So you do not have the right to restrict data, for example, about your movement through a city.
And he was using that as the example.
So a city like Shanghai, for example,
his assertion was it will be a better city,
it will be more efficient,
because the data is freely available
to the planners of the city,
and therefore you can build a better city,
which is, you can see the point.
And I suppose really what we're talking about here is
we should separate the two in the discussion, really,
from what we can do with big data and then the oversight that government has.
Yeah, what we should do with big data.
And actually, China is a very interesting example
because China has had ID cards for a really long time,
which obviously we rejected in the UK several times. But essentially
that means that the database of ID cards means that the government knows what everyone's face
looks like, right? It owns that data on everyone's face. So facial recognition software is now
widespread across China. There's even an example, actually, in some toilets in Beijing,
where the facial recognition system would notice you as you went in, and then if you
came back... Oh, it would only release, I think, 60 centimetres of toilet paper every time.
And if you came back within nine minutes, it would lock off all of the toilet paper system, because clearly toilet paper theft within this particular toilet in China
was so extreme that they needed to register your face.
That's a terrifying moment in 2001.
That's what Kubrick did.
I'm not going to give you any more toilet paper, Dave.
I'm not going to open the toilet door, Dave.
You could have just had some dodgy sushi or something.
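The rule in that toilet story is, in effect, a per-face rate limiter, and it can be sketched as follows. This is a hedged illustration built only from the two figures mentioned (60 centimetres per visit, a nine-minute window); the class name `Dispenser`, the `face_id` strings and the choice that a refused attempt does not restart the clock are all assumptions, not details of the real system.

```python
SHEET_CM = 60              # paper released per successful request (from the story)
LOCKOUT_SECONDS = 9 * 60   # coming back sooner than this is refused (from the story)


class Dispenser:
    """Hypothetical dispenser that remembers when each face was last served."""

    def __init__(self):
        # face_id -> time (in seconds) of that face's last successful release
        self._last_served = {}

    def request(self, face_id, now):
        """Return how many centimetres of paper are released at time `now`."""
        last = self._last_served.get(face_id)
        if last is not None and now - last < LOCKOUT_SECONDS:
            # Recognised face returning inside the window: release nothing.
            # Assumption: a refused attempt does not reset the timer.
            return 0
        self._last_served[face_id] = now
        return SHEET_CM
```

In this sketch, a face served at time 0 gets nothing if it comes back 100 seconds later, while a different face arriving at the same moment is served as normal. The first face is served again once the full nine minutes have passed.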
Well, I want to ask Timandra if you have examples of extremely positive outcomes from looking at big data sets and analysing data.
Oh, definitely. And, you know, it is true that obviously it can be used to make things more efficient.
I would never say we shouldn't use it.
I think it really is all about oversight.
And we could, for example, contrast, well, China,
where they're using facial recognition everywhere
and have no compunction.
But even the cities in Europe,
which are introducing all sorts of different smart systems
where different private companies
are just gathering up a lot of data about individuals
and nobody even knows what they are, with the city of
Oakland in California, where the citizens basically heard their council had got a federal
grant to put in a very integrated surveillance system with facial recognition and number plate
recognition and all sorts of things, and went, "Um, can we just ask, where is your privacy policy
on this?" And when the council went, they had a big campaign.
And they now have, not only is that system kind of quite controlled and scaled back,
but they have a standing privacy commission,
which includes citizens and civil liberties organisations.
And every time they want to bring in new technology,
they sit there and go, OK, well, what do you want the data for?
What are you going to do with it? How long are you going to keep it?
And it has democratic oversight.
And I think that's perfect because they still get the benefits of the technology, but they know what it is.
But if you want to talk about how great big data is and what it could do,
my favorite example, my kind of poster boy of big data, is a professor in Southern California called Eamonn Keogh.
And he's a professor of electrical engineering,
but what he works with is insects.
And I said, well, what's the connection there?
And he said, well, you know how when your emails come in,
you've got an algorithm that sorts them and gets rid of the spam
and can forward emails automatically?
I want to do that with insects.
I want to be able to delete them
and forward them. And I'm afraid my first thought was, great, so you could forward wasps to somebody
else's office. That wasn't what he meant at all. Basically, he's using big data to classify
insects. He's got, like, a global database of insects based essentially on the sound that the wings make. But to not have background noise,
he's got this mad little device using lasers and photodiodes.
So you've got this red light falling on a light gate,
and it produces an electrical signal which you can turn into sound.
And when an insect flies through that or anything interrupts it,
it interrupts the electrical signal, it makes a sound.
So essentially, if an insect flies through this light gate,
then the electrical signal is the sound of its wings,
but without any background noise.
And he's used big data techniques
with millions and millions of these recordings of insects
all around the world to classify this sound
as this species of mosquito.
And there are something like 5,236 species of mosquito?
That's right.
And it can not only tell the species,
but tell whether it's male or female,
and whether it has already sucked blood from some creature.
And so you could track, like,
are the Zika-carrying insects moving across Africa?
We can trap all the ones.
There's all sorts of things you could do with that to control insects,
to know where they are, to know what diseases they're carrying.
And that just made me go, this is it.
This is a great use of this technology.
We can understand insects.
We can do something about the diseases they carry or the crops they eat.
So he's my kind of big data hero.
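As a rough illustration of the light-gate idea described above: if an insect's wingbeat modulates the electrical signal, the dominant frequency of that signal can be compared with reference values. The Python below is only a toy sketch under stated assumptions. The reference table, the labels, the sample rate and the nearest-frequency rule are all hypothetical, and the transcript does not say which techniques Professor Keogh's group actually uses beyond "big data techniques".

```python
import math

# Hypothetical reference wingbeat frequencies in Hz (made-up values and labels).
REFERENCE_WINGBEAT_HZ = {
    "species_a_female": 400,
    "species_a_male": 600,
    "species_b_female": 250,
}


def dominant_frequency(samples, sample_rate_hz, min_hz=100, max_hz=1000, step_hz=5):
    """Return the frequency in the band whose Fourier component has the most power."""
    mean = sum(samples) / len(samples)
    centred = [s - mean for s in samples]  # remove the constant light level
    best_hz, best_power = None, -1.0
    for hz in range(min_hz, max_hz + 1, step_hz):
        angle = 2 * math.pi * hz / sample_rate_hz
        re = sum(x * math.cos(angle * i) for i, x in enumerate(centred))
        im = sum(x * math.sin(angle * i) for i, x in enumerate(centred))
        power = re * re + im * im
        if power > best_power:
            best_hz, best_power = hz, power
    return best_hz


def classify(samples, sample_rate_hz):
    """Pick the reference label whose wingbeat frequency is closest to the signal's."""
    hz = dominant_frequency(samples, sample_rate_hz)
    return min(REFERENCE_WINGBEAT_HZ, key=lambda label: abs(REFERENCE_WINGBEAT_HZ[label] - hz))


def synthetic_recording(wingbeat_hz, sample_rate_hz=8000, duration_s=0.1):
    """Fake light-gate signal: a wingbeat tone plus a weaker second harmonic."""
    n = int(sample_rate_hz * duration_s)
    return [
        math.sin(2 * math.pi * wingbeat_hz * i / sample_rate_hz)
        + 0.3 * math.sin(2 * math.pi * 2 * wingbeat_hz * i / sample_rate_hz)
        for i in range(n)
    ]


if __name__ == "__main__":
    print(classify(synthetic_recording(610), 8000))
```

A real system would need far more than one number per recording, which is presumably where the millions of recordings come in; the sketch only shows why a clean signal without background noise makes the problem tractable.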
What about the individual insect, though?
You shouldn't label them in the letter.
That's just because they're insects.
I don't care about them.
I didn't mean that.
I suppose there's a real challenge here,
because obviously collecting large amounts of data for no apparent reason
at first sight can be problematic.
You might think someone should define what they want to do with it before it's collected.
But of course much of the opportunity is in finding patterns in the data, isn't it? So
really it's not only the data you collect but what you do with it, and finding new ways of
interrogating the data.
Yes. Now, I totally agree with that. I think people who work with data are, you know,
notoriously greedy. You know, give us everything and we'll find something. And things are changing,
though, I think. There's a new bit of European legislation that is now out that changes the sort of ownership of data from the company
and shifts it slightly more into the hands of the individual. It's called GDPR. That should just
give us a little bit more control over what companies know about us, and in particular, on
that point that you just made there, we need to know what our data will be used for, a complete list of what it will be used
for, before they're allowed to own it.
Is that a good thing, though? Because part of the
opportunity, I suppose, is to find patterns. So, just to invent something, it may be that
people who engage in a certain activity are more likely to develop heart disease or something, but
it might be something very unexpected, like eat too many apples. I don't know what it is, but something that
no one ever suspected. So do we close off the possibility of making really important public
health discoveries if we restrict the usage of data?
It's a very, very difficult question
without an easy answer. But I think one thing that I do know, so there was an example of
where all health records of... Timandra, maybe you know this one slightly better than me,
the Royal Free example.
Oh yeah, the Royal Free Hospital set up a partnership with
Google DeepMind, which is the source of artificial intelligence,
although not to Hannah's standards,
but the AI program that beat the world champion at Go.
And they set up a partnership so that the patient's health records
could be analyzed using a new system Google were developing
to basically just move data around within the hospital more efficiently. It was
fairly innocuous, what it was doing. I mean, other projects they're doing are about finding
patterns in the data. This one really was just about moving it around more efficiently and
tracking it. But they didn't ask the permission of the patients to do this with the data. The hospital
went, well, they've signed up to be our patients, they must be fine with it. And afterwards there were a lot of wrists slapped, because the Information Commissioner's
Office said, no, you really should have said to the patient, are you OK with us giving your data
to Google DeepMind? Because it's an outside organization.
And I think there's a really important trust question here. I'm actually really in favor of us sharing our health data for
everyone's benefit. I mean, it is a real example of how we could all benefit enormously from sharing
health data and, like you say, Brian, suddenly finding that, oh, actually there's a relationship
between this and this, and this could be really important. But in order to do that, we
have to feel that we trust the organizations that are using it and that they have our best interests at heart. So, you know, we all love the
NHS and it saves everyone's lives and so on, but there are also cases where the NHS says, well, I'm
sorry, we were short of money, so fatties and smokers, you're to the back of the queue when you need an
operation. So you can imagine if there's this
great health data sharing research going on and they suddenly go, oh, well, sorry, Brian, your
store card says that you bought pizza every week for the last 15 years and so you're not
getting that operation.
My name is not Brian and I don't know where you got that information from.
And that's why this is terrifying now.
You've conflated the two things.
Danny, would you...
I suppose there are two things here.
One is that you can...
Huge amounts of data about you could be collected.
But if it's anonymised, would you care?
Is it really the personalisation,
the identification of you with that data that matters?
Oh yeah, no, I think we're all, you know, very protective of who we are and what we
get up to, even if what we're getting up to is fairly innocuous. But if there's a greater good,
I think the health thing is, you know, the best example possible. You know, you have apps
on your phone that tell you how many steps you've taken and where you've been, and you
can add all this other stuff in. And if you're adding all this information in there and it can
be a central database that can look at these patterns and, you know, track your health and see,
you know, what troubles you've come up against, then like you say, you know, if they can find
new treatments or patterns that have never been spotted before that help the greater good,
then absolutely. But yes, we all
want to keep that to ourselves instinctively.
But how easy is it
to actually remove the anonymity? Because I was reading an article the other day which
was about, apparently, the system that was used in New York taxis, and it was just to see about the
routes of taxis and various different, you know, the pay that was given on different routes.
And some journalists managed to work out
which ones... how much different celebrities...
Yeah.
Now, that seems incredible because...
That was very clever, actually.
Well, the way that they undid all of the data
because there was a very weak encryption.
The data was released for all of the yellow cabs in New York
across a year so that people could do these beautiful visualisations,
work out efficiency in the city, so on and so on and so on.
But they put a very weak encryption on it
so you could work out what cab was what at what time.
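Why a weak scheme can be undone so easily is worth a quick sketch. Assume, purely for illustration, that each cab identifier was replaced by an unsalted hash and that identifiers follow a short fixed pattern (one digit, one letter, two digits). The transcript says only "very weak encryption", so the hash function, the identifier format and the sample row below are all assumptions. With so few possible identifiers, every one can be hashed in advance and the release reversed by lookup:

```python
import hashlib
import string
from itertools import product


def pseudonymise(cab_id: str) -> str:
    """Replace an identifier with its unsalted hash (the assumed weak scheme)."""
    return hashlib.sha256(cab_id.encode("ascii")).hexdigest()


def build_lookup_table() -> dict:
    """Hash every possible identifier of the assumed digit-letter-digit-digit form."""
    table = {}
    for d1, letter, d2, d3 in product(
        string.digits, string.ascii_uppercase, string.digits, string.digits
    ):
        cab_id = f"{d1}{letter}{d2}{d3}"
        table[pseudonymise(cab_id)] = cab_id
    return table


# Only 10 * 26 * 10 * 10 = 26,000 candidates, so this takes a fraction of a second.
LOOKUP = build_lookup_table()

# A made-up row as it might appear in a released data set.
released_row = {"cab": pseudonymise("5X41"), "fare": 12.5, "tip": 0.0}

# Reversing the pseudonym is a dictionary lookup.
recovered = LOOKUP[released_row["cab"]]
print(recovered)
```

The point of the sketch is that the strength of the hash function does not matter when the set of possible inputs is this small; a secret key or random per-cab tokens would be needed to stop the lookup.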
And then someone else realised
that if you took paparazzi photographs of celebrities getting into cabs,
if you could see the registration number of the taxi
and know what day it was on, you could
work backwards. You put those two data sets together and work out often where celebrities lived,
but also exactly how much they tipped.
Isn't this also called stalking?
Yeah, well, yeah, it totally
is. But I also think that there's this idea about choice, and I think that we're slightly kidding
ourselves if we think that we have much choice in this matter. Timandra and I,
about a couple of months
ago went to a crypto party.
I don't know if you've ever been.
We go to the best parties.
Now the audience at home don't know
this but if you could sense the envy going
on in the studio at this moment
it is palpable. Keep it
below the surface. Crypto parties
I didn't know what it was until I went to one.
Crypto parties are where you go and people teach you how to hide from everything.
So it was people showing you how to have an operating system that only exists on a USB key
so that you can take your whole computer with you when you leave and no trace of you will remain.
How to use the dark web, how to
change all of the settings on your phone so
that no one could track you. And it was
very interesting. I went and I was researching
my book. Timandra, I think,
a similar story.
I couldn't help looking around the room
and thinking, what have these people
got to hide?
Paranoid parties are the best parties.
Come on, wouldn't you want to go to a party
with people that have something to hide?
Surely.
Turn up, it's just an empty room.
No, I think that's unfair.
When people say, you know, nothing to hide, nothing to fear,
I say, nothing to hide, you haven't really lived.
Everyone should have something to hide by the time we're adults. But I also think that getting together in a room and going, we're going to be really technical and spend
ages changing our settings, is not really the answer, because you're not actually helping
everybody do that. I think the answer is just to make everything more transparent so we can
genuinely choose, do we want our data to be part of this? And also just to generally say, what do we want it to
be used for? Do we want to be like China, where every individual can be tracked and everyone can
be given a credit rating or a social credit rating based on how well behaved they are, how polite they
are, which could affect their chances of getting loans and things? Or do we want to kind of draw
some lines and say, OK, GCHQ, you can hack into my phone in order to save me from being blown
up by terrorists, but you can't hack into my phone to check that I'm not letting my dog poop where it
shouldn't?
That's the line, is it? Dog pooping.
Two very different Liam Neeson films there.
So we asked the audience a question, as usual,
and today the question was,
what is the strangest question you have ever asked the internet?
And I can tell you now, this is the largest number of answers
we've had to throw away
due to the fact that it's just not suitable for 4.30.
What is the strangest question you've ever asked the internet? Why am I so inexplicably
attracted to Brian Cox? Oh, Dominic, it's very explicable. Where is the hippo in Hippocampus?
I really did. Danny, you got some as well? Yeah, Katie Adam, this is a great
question. What is the capital
of space?
Brian?
Well, it's, there
isn't a centre to the universe.
It's the ultimate Copernican principle
at all points. It's called,
it's invariant, essentially, in every direction.
They're all the same.
Thank you.
That was a lot less exciting than I was hoping for.
It's homogeneous and isotropic.
And almost exactly the same question here.
Is a fraggle a muppet?
If there are an infinity of quantum worlds,
why am I stuck in this one?
I'll tell you, John, you should see the others.
Honestly, it's not as bad as you think.
If things can only get better, when?
Have you got one?
I don't know the answer to this one.
How do whales breastfeed?
Why did you Google that?
What, not the others, just that?
I think
I actually know that one.
This is brilliant. What a moment for me.
Well, the milk is
a lot thicker and
you'll like this, more viscous.
Meaning it doesn't just dissolve
and go away in the water.
And that's little bull whales that do that.
And also they sleep vertically.
Thanks, guys.
So, during the show, we've been collating data on those who have been listening
via the BBC's patented Soul Thieving Conundrum Machine,
or, er, P-S-I-T-I-C-A-M-M-E.
Got to work on the acronym, don't you?
So, whilst this show's been on, we've been using P-S-I-T-I-C-A-M-M-E
to find out what you have been searching for on the internet
during this broadcast, and here are the top three searches.
Top three searches were,
why won't my lightsaber cut ham?
Jacob Rees-Mogg Mars Base?
What time does Rumbelows close today?
Thank you very much
to our panel, Hannah Fry, Timandra Harkness and Danny
Wallace. Goodbye.
APPLAUSE In the infinite monkey cage.
Turned out nice again.