Hi everyone. So, I'd like to start off with a question, and that question is: what do Minority Report, Black Mirror and 1984 all have in common?

Now, it's not the fact that they're all forms of media, you know, books, films, TV shows, nor is it the fact that they're about dystopian futures. Instead, it's the fact that they each talk about predicting crime in one form or another, whether that's the precogs in Minority Report, the Recaller in Black Mirror, or the Thought Police in 1984. Each of these forms of media looks at how we could predict crime but, more specifically, the repercussions of doing so. And that's what I'm going to be talking about today: I'm going to talk about how we can use natural language processing to predict crime.

But those of you that know me well know I'm not a mathematician, and I'm also not a police officer. So why am I talking about natural language processing, which is quite mathsy, and predictive policing, which, as the name may suggest, is all to do with law, with crime? Well, it comes down to this quote: the idea that intrusion analysis, security
analysis, is about far more than the tools we use. It's about innovating and looking at new ways that we can protect ourselves from attacks, but also predict those attacks in the first place.

So what am I actually going to be talking about today? Well, I want to break it up into three main areas. I want to talk about what predictive policing actually is, I want to talk about what natural language processing is, and then finally I want to talk about how we can merge these two ideas together: how we can predict crime using natural language processing.

For those of you that don't know me, my name is James Stevenson, and this time two years ago now I was a student at the University of South Wales studying computer security. Before that I was an intern at Alert Logic, a cloud security company, and these days I'm a software engineer at BT Security.

So, what is predictive policing? I keep talking about it, but what actually is it? Because if we're going to use natural language processing to predict crime, we kind of need to know what predictive
policing is. And it comes down to two main areas: location-based predictive policing and individual-based predictive policing.

Now, location-based predictive policing, as the name suggests, is all about looking at an area, so saying: in this area, in the future, is a crime likely to occur? Now, this map is a great example of location-based predictive policing. So this is a map of London between a specific time period, where the darker the color, the more crime has occurred. Now, this is a great example because we can say: OK, if a crime has occurred under these specific circumstances in the past, it means that crime is likely to occur under these same circumstances in the future.

Now, today we're actually going to be focusing on individual-based predictive policing. Now, individual-based predictive policing is all about looking at an individual and saying: how likely is this individual to commit a crime? And when it comes to this type of predictive policing, we'll ask different questions, we'll go down different routes, different avenues, and we'll be left with that score.
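The location-based idea described above, where areas with more recorded crime in the past are treated as more likely to see crime in the future, can be sketched as a simple count of incidents per area. This is only a minimal illustration under stated assumptions: the area names and incident records are made up, and the helper names `crime_counts` and `rank_areas` are hypothetical rather than anything shown in the talk.

```python
from collections import Counter

# Hypothetical incident log of (area, crime_type) pairs. Made-up data.
incidents = [
    ("Camden", "burglary"),
    ("Camden", "theft"),
    ("Hackney", "theft"),
    ("Camden", "burglary"),
    ("Westminster", "theft"),
    ("Hackney", "burglary"),
]


def crime_counts(records, crime_type=None):
    """Count past incidents per area, optionally for a single crime type."""
    return Counter(
        area
        for area, kind in records
        if crime_type is None or kind == crime_type
    )


def rank_areas(records, crime_type=None):
    """Return areas ordered from most to fewest recorded incidents.

    The first entries correspond to the 'darkest' areas on a crime map.
    """
    counts = crime_counts(records, crime_type)
    return [area for area, _ in counts.most_common()]


if __name__ == "__main__":
    print(crime_counts(incidents))
    print(rank_areas(incidents, crime_type="burglary"))
```

A count like this only reflects what was recorded in the past, which is why the quality of the underlying data matters so much for this kind of prediction.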
Now, today we're going to focus on three types, or three approaches, that can allow us to do just this.

The first of these approaches we're going to look at is called strain theory. Now, strain theory is the idea that society puts pressure on individuals to achieve specific goals, like the American dream, but when individuals lack the means to achieve those goals, they're more likely to commit crimes so that they can achieve them.

The next theory that we're going to look at is called social control theory. Now, social control theory is the idea that people who lack close relationships, commitments, values and norms are more likely to commit crimes, because they don't have those relationships or values as an anchor in society.

And the final theory that we're going to look at today is called social disorganization theory. Now, what this theory states is that location is key. If you live or work in an area known for a specific type of crime, this theory states there's an intrinsic link: by being there, you're more likely to commit that crime.

So far we've looked at what
predictive policing is, different types of predictive policing, and how we can use predictive policing approaches to predict crime. But this talk is all about natural language processing, so about how we can use natural language processing to do just that.

But before we talk about natural language processing, we need to talk about language. For us as human beings, language comes down to these three main areas: speaking, reading and writing, things that we all do every day. So because we do these things every day, most of us, or maybe some of us, will be able to answer this question: Paris minus France plus England equals what?

Now, the answer is London, because Paris is to France as London is to England. Now, if we did know that that was the answer, why did we know that? Well, we would have known that that was the answer because of the experiences we've had. We've read books, gone on the internet, spoken to people, and that's all built our knowledge base and our understanding.

So then, if we were to give that question to our natural language processing machine, would it be
able to answer it? Well, yes, but only if we gave it the right context. So this is the Wikipedia article for London, and if we fed this into our natural language processing machine, it would learn from that surrounding context. It would learn that London is a city, it would learn that London is in the UK, of which England is as well, building that knowledge base and building that understanding.

So that's how natural language processing works. How does sentiment analysis work? Well, sentiment analysis is all about that tool looking at a specific piece of text and saying: what is the emotion, what is the sentiment behind that text?

When it comes to us as human beings, we have eight main pillars to our emotions, but with sentiment analysis we really only care about two: that's positive emotions and negative emotions. So how do we translate those eight pillars down to two? Well, when we're talking about positive emotions, we're really talking about trust, joy, anger and surprise, anger being the red herring. And when we're talking about negative emotions, we're
talking about disgust, sadness, fear and anticipation.

And so, if those are the emotions that we're talking about when we refer to sentiment analysis, how do we actually get that emotion from text? Well, the same as in most machine learning approaches, we take a massive dataset. Now, that dataset for us is going to be restaurant reviews. Each element in that set is broken down into two subsections: the actual review, and then the sentiment of that review. For example, "I love my local pizza restaurant": positive sentiment. "Well, this place has really gone downhill": negative sentiment.

We then break this dataset down into two: we have our training set and we have our testing set. When it comes to training our natural language processing machine, we ask it to go through those entries and to have a look at what keywords are more prominent with a positive sentiment and what keywords are more prominent with a negative sentiment. Then, when it comes to testing, we ask it to go through the remaining entries and for it to tell us what the sentiment is, and then if that
matches the sentiment we know them to have, good. If it doesn't, it means something has gone wrong.

So that's how natural language processing works, that's how sentiment analysis works. Where does it exist? What are some examples of natural language processing in the real world?

Well, this is AWS Comprehend, or specifically Comprehend Medical, which is AWS's approach to natural language processing when it comes to medicine and healthcare. A doctor or healthcare professional will type in a patient's details, symptoms, information; the natural language processing tool will go off, do its thing, and it will come back with key bits of information, the things that the healthcare professional ought to know.

Next we have Tay AI. Now, Tay AI was Microsoft's approach to natural language processing when it came to a Twitter chatbot. It tailored its responses to people depending on how people spoke to it. Now, it was quite controversial, it lasted just under 24 hours, but nonetheless it's a great example.

And then finally we have predictive text. So whether you have an Android or an
iPhone, that predictive text probably works on your device by using natural language processing.

So there we have three great examples: healthcare, communications and mobile phones. But none of those examples look at how we can use natural language processing to predict crime, which is why we're here for this talk. So let's do that.

Well, this is Alice, and it's Alice's job to do just that. It's Alice's job to predict crime. The way she currently does this, she individually and manually goes to different websites, chat forums and social media accounts, and profiles individuals on their likelihood to commit crime. But that's slow and laborious, so how can we take this to the next level?

Well, we can automate it. We can scrape these websites for the same information that she would, we can use natural language processing on the response, and then we can return to Alice a score, a score of how likely that individual is to commit a crime. And this is what we're going to be focusing on for the rest of today: we're going to be talking about how we could
build a conceptual framework that allows us to do just this.

So if we were to create this framework, what would we need to do? Well, first of all, Alice would need to sit down and decide on who we'll be profiling and what the impact of those individuals is. If that individual was to commit that crime, or was to perform that attack, what would the impact be? And this comes down to those three main areas: the loss of confidentiality, integrity and availability.

Once we have that impact, we work out our likelihood. What is the likelihood of that individual committing that crime or performing that attack? Well, this is where we're going to go back to those predictive policing approaches that we mentioned earlier on.

First of all, we scrape these websites for that text, we use natural language processing on the response, and now we're saying: does that text contain reference to any goals or aspirations? And then, if it does, what is the sentiment?

Next, we take that same bit of text and we say: now, does that text contain reference to any close
relationships, any individuals, any groups or any organizations? And if so, what is the sentiment?

And then finally, we take that same bit of text once again and we say: now, does that text contain reference to the individual's location? If so, is that location known for that type of crime? And then finally, what is the sentiment?

We then go through each of these trees, aggregating a score as we go, and that score is our overall likelihood. We can then use that, together with our impact, to get our risk, and that risk is the risk that this individual poses to Alice's entity.

Now that we have that risk, it's just about collecting as much information as we can. Using natural language processing, we can collect information like common topics, trends, age, gender and race, occupation, salary and religion, and then any dates and times. Now, the reason why we haven't focused on this information today is because this information has the scope and the potential of becoming significantly more complex, and that's really a talk for another day.

So finally, as part of this conceptual framework, it's about creating a naming
convention, a naming convention that we can instantly glean information from without including any of that personally identifiable information. So this name is broken down into four main areas: the source of the data, the time it occurred, the risk score, and then a pseudo-random word that gives the name some uniqueness.

So we've looked at what predictive policing is, we've looked at what natural language processing is, and we've also looked at how we can use these two ideas together to predict crime. But you might be thinking: well, James, that's great, but why are we talking about predictive policing at a computer security conference? And, well, it comes back to this quote: the idea that intrusion analysis, security analysis, is about far more than the tools we use. It's about innovating, but it's also about thinking outside of the box and looking at new ways that we can protect ourselves from attacks, but also predict those attacks in the first place.

So, to finish off, I'm going to go through some questions that I often get as part of this talk, I'll close the talk off,
and then if we have any additional questions I'll get to them.

So the first question we have here is: is predictive policing better than what there is now? I think the answer here is no. Predictive policing is a tool, it's a supplement, it's something that should be used in addition to normal policing. It isn't there to replace police, nor is it there to replace police intuition.

The next question we have is: is predictive policing biased? Again, the short answer is yes, but even policing is quite biased. The reason for that is predictive policing is garbage in, garbage out. If our data is biased, it means our frameworks are going to be biased also. And the problem with crime data is that it's intrinsically biased: because we have crimes that aren't documented, maybe like assault, it means that the data we get from that isn't accurate, nor is it representative of the real world. And that's fine, as long as we remember the answer to that first question, as long as we remember that predictive policing is a tool, it's a supplement, and it's to be used in addition to what we're here to
do.

The next question there is: is predictive policing used in the real world? So, predictive policing is used in the UK, and it's also used in the US. One example that received quite a lot of media attention was the LAPD in the States. They had a scheme called LASER. The way that LASER worked is it assigned a score to ex-offenders; that score was based on their behavior, and the way that that worked is anyone in that top bracket would then receive a follow-up visit from police.

The penultimate question we have here is: how good is natural language processing at picking up differences or nuances in text? And there's an example I like giving for this, and it was an example of a natural language processing tool that was created to simultaneously understand two different languages. I think that's really interesting, because that goes to show that actually, in some cases, natural language processing can be better at understanding text than some of us as human beings.

And finally, what's next? Well, specifically for this framework, it's
about creating a proof of concept, a proof of concept that allows us to do just this: that allows us to look at risk levels for individuals, that allows us to pivot off social media information, that allows us to power automation, and that allows us to look at crime signals, so indicators of crime.

However, that is this talk come to a close. If you do have any questions, feel free to ask me now, come find me afterwards, or I'm also on Twitter at underscore James [inaudible]. Thanks.
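As a rough illustration of what such a proof of concept might look like, the sketch below walks the three theory checks from the talk (goals, close relationships, location), aggregates a likelihood, combines it with an impact to get a risk, and builds a name from the source, time, risk score and a pseudo-random word. Everything specific here is an assumption made for the example: the word lists, the toy keyword-based sentiment, the use of whole-text sentiment rather than per-topic sentiment, the 1 to 5 impact scale, the choice of risk as likelihood multiplied by impact, and the exact name format are not taken from the talk.

```python
import random
import re
from datetime import datetime

# Illustrative assumptions only: none of these word lists come from the talk.
POSITIVE = {"love", "great", "happy", "proud"}
NEGATIVE = {"hate", "hopeless", "angry", "stuck"}
TOPICS = {
    "strain": {"job", "career", "money", "dream"},          # goals, aspirations
    "social_control": {"family", "friends", "team"},        # close relationships
    "social_disorganization": {"estate", "neighbourhood"},  # location
}
WORDS = ["amber", "falcon", "harbor", "willow"]  # pool for the name suffix


def tokens(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def sentiment(words):
    """Toy lexicon sentiment in [-1, 1]; 0.0 when no known word appears."""
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return 0.0 if pos + neg == 0 else (pos - neg) / (pos + neg)


def likelihood(text):
    """Aggregate a score in [0, 1] across the three theory checks.

    A theory contributes only if its topic is mentioned and the overall
    sentiment of the text is negative (a simplification of the talk's trees).
    """
    words = tokens(text)
    score = sentiment(words)
    if score >= 0:
        return 0.0
    hits = sum(1 for keywords in TOPICS.values() if keywords & set(words))
    return hits / len(TOPICS) * -score


def risk(text, impact):
    """Risk as likelihood times impact (impact on an assumed 1-5 scale)."""
    return likelihood(text) * impact


def record_name(source, when, risk_score, rng=random):
    """Build a name from source, time, risk score and a pseudo-random word."""
    word = rng.choice(WORDS)
    return f"{source}-{when:%Y%m%d%H%M}-{risk_score:.1f}-{word}"


if __name__ == "__main__":
    post = "I hate my job and I feel stuck on this estate"
    score = risk(post, impact=3)
    print(score)
    print(record_name("forum", datetime(2020, 1, 2, 3, 4), score))
```

A real implementation would replace the keyword lists with a trained sentiment model and proper entity recognition, and, as the talk stresses, its output would only ever be a supplement to human judgement.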