1 00:00:03,240 --> 00:00:09,370 So, hello, good evening. My name is Wu and 2 00:00:06,970 --> 00:00:11,739 I'll try to keep this short because I 3 00:00:09,370 --> 00:00:14,219 guess everyone just wants to go and have 4 00:00:11,740 --> 00:00:18,550 dinner and a beer. I sure do. 5 00:00:14,220 --> 00:00:20,349 So, actually, some of the issues that were 6 00:00:18,550 --> 00:00:22,689 discussed during the Q&As from 7 00:00:20,349 --> 00:00:24,810 previous presentations were the topic of 8 00:00:22,689 --> 00:00:28,329 my thesis, which is the enrichment and 9 00:00:24,810 --> 00:00:31,179 creation of IOCs of quality out of 10 00:00:28,329 --> 00:00:34,030 OSINT. It was my master's thesis, 11 00:00:31,179 --> 00:00:37,570 those are my advisers, and I did it at 12 00:00:34,030 --> 00:00:40,060 the University of Lisbon. Why do 13 00:00:37,570 --> 00:00:42,190 we care about this? For the same 14 00:00:40,060 --> 00:00:45,280 reason why we know we will not be out of a 15 00:00:42,190 --> 00:00:48,339 job in the near future: it costs a lot of 16 00:00:45,280 --> 00:00:50,680 money. 4 percent of global GDP next year 17 00:00:48,340 --> 00:00:55,660 is expected to be lost to cybercrime. And 18 00:00:50,680 --> 00:00:57,820 why does this happen? Two reasons. We're 19 00:00:55,660 --> 00:00:59,919 working against people that are highly 20 00:00:57,820 --> 00:01:02,290 focused and highly dedicated to what 21 00:00:59,920 --> 00:01:06,040 they're doing; they see that the return 22 00:01:02,290 --> 00:01:10,030 on investment has a lot of potential. 23 00:01:06,040 --> 00:01:12,340 And they also are using new ways of 24 00:01:10,030 --> 00:01:14,290 attacking, which also increases the 25 00:01:12,340 --> 00:01:19,479 difficulty that we have in combating 26 00:01:14,290 --> 00:01:22,600 this threat. So where are we? We started 27 00:01:19,479 --> 00:01:24,700 by having no defenses whatsoever. In the 28 00:01:22,600 --> 00:01:26,649 beginning, the internet was something 29 00:01:24,700 --> 
00:01:29,290 that was supposed to be for sharing, and 30 00:01:26,650 --> 00:01:33,810 no one even considered the hypothesis of 31 00:01:29,290 --> 00:01:37,000 its being misused. Quickly we 32 00:01:33,810 --> 00:01:39,100 understood the error of our ways and we 33 00:01:37,000 --> 00:01:40,930 started installing perimeter defenses. 34 00:01:39,100 --> 00:01:43,780 The problem with the perimeter defense 35 00:01:40,930 --> 00:01:47,560 is, if you build a wall, if someone 36 00:01:43,780 --> 00:01:50,200 manages to cross that wall, you don't 37 00:01:47,560 --> 00:01:53,500 have anything else. So we moved to in-depth 38 00:01:50,200 --> 00:01:55,360 defenses, which work fine until you find 39 00:01:53,500 --> 00:01:57,850 an adversary that is capable of adapting 40 00:01:55,360 --> 00:02:01,600 to what you have within your network 41 00:01:57,850 --> 00:02:05,380 and will change the way he 42 00:02:01,600 --> 00:02:07,750 is attacking to adapt to 43 00:02:05,380 --> 00:02:11,519 what you're monitoring. And so we moved 44 00:02:07,750 --> 00:02:16,140 to dynamic response defense, which is 45 00:02:11,520 --> 00:02:18,970 three things: advanced malware detection, 46 00:02:16,140 --> 00:02:21,279 event anomaly detection and, what brings 47 00:02:18,970 --> 00:02:25,300 us all here, intelligence-driven defense. 48 00:02:21,280 --> 00:02:29,080 And that's where we decided to work. So 49 00:02:25,300 --> 00:02:31,650 where are we now in this field? We moved 50 00:02:29,080 --> 00:02:34,420 from manually sharing the knowledge 51 00:02:31,650 --> 00:02:37,120 to creating platforms to help us share 52 00:02:34,420 --> 00:02:40,899 the knowledge. However, and most of you 53 00:02:37,120 --> 00:02:43,630 are surely aware, there was a report by 54 00:02:40,900 --> 00:02:47,170 ENISA at the beginning of this year 55 00:02:43,630 --> 00:02:48,790 which criticized the initiatives, not in 56 00:02:47,170 --> 00:02:50,619 the sense that they weren't needed but 57 
00:02:48,790 --> 00:02:52,810 in the sense that we still have a lot of 58 00:02:50,620 --> 00:02:55,000 work until we get to a place where we 59 00:02:52,810 --> 00:02:57,190 can be comfortable with the efforts 60 00:02:55,000 --> 00:03:00,280 we're putting into this field. And so 61 00:02:57,190 --> 00:03:02,590 they indicated a lot of issues that 62 00:03:00,280 --> 00:03:04,540 appear. First of all, we have a high 63 00:03:02,590 --> 00:03:06,940 volume of information that is shared, 64 00:03:04,540 --> 00:03:08,980 which makes it hard for people to 65 00:03:06,940 --> 00:03:10,560 understand what they're seeing and to 66 00:03:08,980 --> 00:03:13,660 get the information that they need. 67 00:03:10,560 --> 00:03:16,090 Furthermore, we don't have common sharing 68 00:03:13,660 --> 00:03:18,640 standards. We have certain standards that 69 00:03:16,090 --> 00:03:22,450 are more used than others, but we should 70 00:03:18,640 --> 00:03:25,779 move towards something that everyone 71 00:03:22,450 --> 00:03:27,609 agrees is the way to go, and all use 72 00:03:25,780 --> 00:03:29,290 the same standards, instead of having, in 73 00:03:27,610 --> 00:03:30,970 my platform, to have a converter, and 74 00:03:29,290 --> 00:03:33,519 at the end to have other 75 00:03:30,970 --> 00:03:35,650 converters, so that I can receive input 76 00:03:33,519 --> 00:03:38,470 and then output the information as 77 00:03:35,650 --> 00:03:42,819 something that other people can use. 78 00:03:38,470 --> 00:03:46,720 And we are collecting data; we're not 79 00:03:42,819 --> 00:03:49,510 doing intelligence. This is a very 80 00:03:46,720 --> 00:03:52,030 serious issue, because collecting data is 81 00:03:49,510 --> 00:03:54,250 not the same as creating intelligence. 82 00:03:52,030 --> 00:03:57,599 Creating intelligence is answering a 83 00:03:54,250 --> 00:04:01,810 question that you need to have answered. 84 00:03:57,599 --> 00:04:03,488 Collecting data is just ordering, and that's 85 
00:04:01,810 --> 00:04:06,069 what most of the platforms are currently 86 00:04:03,489 --> 00:04:08,560 doing: we're ordering data, but then we 87 00:04:06,069 --> 00:04:11,768 have a hard time making it into 88 00:04:08,560 --> 00:04:14,080 something useful. So out of these issues, 89 00:04:11,769 --> 00:04:16,539 for the thesis we had to choose something 90 00:04:14,080 --> 00:04:19,840 to focus on, and we decided to focus on 91 00:04:16,539 --> 00:04:21,760 three challenges. First, we wanted to 92 00:04:19,839 --> 00:04:24,460 reduce the quantity of information that 93 00:04:21,760 --> 00:04:26,349 reaches the analysts. We don't want the 94 00:04:24,460 --> 00:04:27,960 analysts to have to sort through all the 95 00:04:26,350 --> 00:04:30,210 information that they are receiving 96 00:04:27,960 --> 00:04:32,430 on a daily basis. If we consider that there 97 00:04:30,210 --> 00:04:35,068 are tens of thousands of new malware 98 00:04:32,430 --> 00:04:37,169 samples that are detected daily, it's 99 00:04:35,069 --> 00:04:40,199 impossible for a human being to deal 100 00:04:37,169 --> 00:04:41,758 with that kind of information. Second, we 101 00:04:40,199 --> 00:04:44,310 want to increase the quality of the 102 00:04:41,759 --> 00:04:45,210 information that is being shared, or of 103 00:04:44,310 --> 00:04:48,419 the intelligence. 104 00:04:45,210 --> 00:04:50,400 This means four things. We have to improve 105 00:04:48,419 --> 00:04:52,229 timeliness, reducing the time between 106 00:04:50,400 --> 00:04:54,989 detection and the information actually 107 00:04:52,229 --> 00:04:57,090 reaching its goal, so the 108 00:04:54,990 --> 00:04:59,190 information analysts, the security 109 00:04:57,090 --> 00:05:01,619 analysts or the systems that are 110 00:04:59,190 --> 00:05:03,479 defending our network. We have to 111 00:05:01,620 --> 00:05:05,460 guarantee that it's both accurate and 112 00:05:03,479 --> 00:05:07,400 relevant, so we have to guarantee that 113 00:05:05,460 --> 00:05:10,799 the 
information that we are using 114 00:05:07,400 --> 00:05:13,560 actually answers our needs. And 115 00:05:10,800 --> 00:05:16,259 if I work in a bank, and I 116 00:05:13,560 --> 00:05:16,740 actually do, I don't care about the 117 00:05:16,259 --> 00:05:19,199 threats 118 00:05:16,740 --> 00:05:21,930 that affect, like, the navigation 119 00:05:19,199 --> 00:05:26,159 systems of an airplane. I just care about 120 00:05:21,930 --> 00:05:28,289 threats that will impact what I have 121 00:05:26,159 --> 00:05:31,199 within my network. And finally, 122 00:05:28,289 --> 00:05:34,710 completeness. And here it's something 123 00:05:31,199 --> 00:05:37,650 that was approached in the last talk, 124 00:05:34,710 --> 00:05:40,530 which is: if we use 125 00:05:37,650 --> 00:05:44,698 different names when we're analyzing 126 00:05:40,530 --> 00:05:46,830 the same sample, when we reach the 127 00:05:44,699 --> 00:05:49,500 end, if I'm analyzing the network 128 00:05:46,830 --> 00:05:52,620 part of the malware and someone is 129 00:05:49,500 --> 00:05:55,800 analyzing the system part, we'll reach 130 00:05:52,620 --> 00:06:00,599 two different IOCs that won't be 131 00:05:55,800 --> 00:06:03,180 related. And in the end, maybe they are in 132 00:06:00,599 --> 00:06:05,520 my database, both of them, but I'll lose 133 00:06:03,180 --> 00:06:06,240 part of that information because they 134 00:06:05,520 --> 00:06:11,159 aren't connected. 135 00:06:06,240 --> 00:06:14,400 And finally, automation is key. We don't 136 00:06:11,159 --> 00:06:16,500 want to lose time having someone 137 00:06:14,400 --> 00:06:19,530 work on that information. We want to 138 00:06:16,500 --> 00:06:21,719 have something that we let go: we 139 00:06:19,530 --> 00:06:23,758 configure it and after that it's just 140 00:06:21,719 --> 00:06:27,599 running on our system and completing it. 141 00:06:23,759 --> 00:06:29,490 So we looked at MISP as a solution to 142 
00:06:27,599 --> 00:06:32,460 start implementing our solution, and what 143 00:06:29,490 --> 00:06:35,759 we found is: new events can be duplicates, 144 00:06:32,460 --> 00:06:37,979 which means it increases the storage 145 00:06:35,759 --> 00:06:40,960 requirements, and it's information that's 146 00:06:37,979 --> 00:06:45,070 not useful to anyone that 147 00:06:40,960 --> 00:06:47,770 we're storing there. Second, MISP creates 148 00:06:45,070 --> 00:06:49,930 direct connections, which means, and we've 149 00:06:47,770 --> 00:06:52,120 seen multiple representations during the 150 00:06:49,930 --> 00:06:53,740 day, you have an event, you see all the 151 00:06:52,120 --> 00:06:56,710 events that share attributes with that 152 00:06:53,740 --> 00:07:00,100 one. What if the next one on that level 153 00:06:56,710 --> 00:07:03,120 is something useful to us? You would have 154 00:07:00,100 --> 00:07:05,560 to go manually or lose the information, 155 00:07:03,120 --> 00:07:08,880 which means that you're either losing 156 00:07:05,560 --> 00:07:13,180 time or losing information. So we tried to 157 00:07:08,880 --> 00:07:16,240 resolve this situation. How? First of all, 158 00:07:13,180 --> 00:07:18,280 we considered clustering and aggregating 159 00:07:16,240 --> 00:07:20,979 information that is related to one 160 00:07:18,280 --> 00:07:23,948 another so as to create a new enriched 161 00:07:20,979 --> 00:07:26,710 IOC, which brings all the information 162 00:07:23,949 --> 00:07:31,270 that is connected into a single report 163 00:07:26,710 --> 00:07:33,520 that will reach the analyst. This means 164 00:07:31,270 --> 00:07:35,320 working at two levels of the 165 00:07:33,520 --> 00:07:36,758 architecture. We have to work on the 166 00:07:35,320 --> 00:07:39,009 configuration of the threat intelligence 167 00:07:36,759 --> 00:07:41,560 platform, which means we have to know 168 00:07:39,009 --> 00:07:43,360 what we want to answer so that we make 169 00:07:41,560 --> 00:07:46,659 the correct choices 
when preparing our 170 00:07:43,360 --> 00:07:48,930 platform to answer them. And second, we 171 00:07:46,659 --> 00:07:51,520 have to work on the internal processing 172 00:07:48,930 --> 00:07:54,580 capabilities, which is: the platform needs 173 00:07:51,520 --> 00:07:59,229 to be able to do this operation by 174 00:07:54,580 --> 00:08:03,760 itself instead of 175 00:07:59,229 --> 00:08:06,669 the analyst doing it. And 176 00:08:03,760 --> 00:08:09,400 then lastly, automation by design, 177 00:08:06,669 --> 00:08:11,620 which is basically something that is 178 00:08:09,400 --> 00:08:16,239 a prerequisite for anything 179 00:08:11,620 --> 00:08:20,020 we do. So we designed this solution. We 180 00:08:16,240 --> 00:08:24,400 have the sources, which should be focused 181 00:08:20,020 --> 00:08:27,609 on threat feeds that matter to us. We 182 00:08:24,400 --> 00:08:30,130 have a layer of other threat 183 00:08:27,610 --> 00:08:33,669 intelligence platforms, to try to use 184 00:08:30,130 --> 00:08:35,799 what they 185 00:08:33,669 --> 00:08:37,838 bring to the product, and by this I mean, 186 00:08:35,799 --> 00:08:41,828 for instance, using IntelMQ to 187 00:08:37,839 --> 00:08:43,870 enrich IPs and DNS, so that when we have 188 00:08:41,828 --> 00:08:46,390 the information reaching what we 189 00:08:43,870 --> 00:08:48,700 developed, we have hooks, and hooks here 190 00:08:46,390 --> 00:08:52,300 are information that will allow us to 191 00:08:48,700 --> 00:08:54,820 create connections to other events that 192 00:08:52,300 --> 00:08:57,400 we have in our database. And we created 193 00:08:54,820 --> 00:09:00,160 two modules: the deduplicator module, 194 00:08:57,400 --> 00:09:03,040 which, as the name indicates, allows us to 195 00:09:00,160 --> 00:09:05,260 eliminate information that serves no 196 00:09:03,040 --> 00:09:07,150 purpose because it's a duplicate, and a 197 00:09:05,260 --> 00:09:09,910 
correlator module, which has an 198 00:09:07,150 --> 00:09:13,480 aggregation part and a representation 199 00:09:09,910 --> 00:09:17,140 part, to create the enriched IOC. So how does 200 00:09:13,480 --> 00:09:20,020 the deduplication work? We considered an 201 00:09:17,140 --> 00:09:22,569 event as a set, and if we consider an 202 00:09:20,020 --> 00:09:26,530 event as a set of attributes, you can use 203 00:09:22,570 --> 00:09:30,670 set theory to compare two events and to 204 00:09:26,530 --> 00:09:32,589 decide which one should be kept. This means 205 00:09:30,670 --> 00:09:35,050 that you have to have criteria, and the 206 00:09:32,590 --> 00:09:37,840 criteria are found within the metadata. 207 00:09:35,050 --> 00:09:40,390 So you'll see, for instance, if you have 208 00:09:37,840 --> 00:09:42,910 two events that have the same 209 00:09:40,390 --> 00:09:45,069 information, you see if one of them has 210 00:09:42,910 --> 00:09:47,680 already been validated by a human, so if 211 00:09:45,070 --> 00:09:50,020 the trust level is higher, you can 212 00:09:47,680 --> 00:09:52,770 consider that that one is more valuable 213 00:09:50,020 --> 00:09:57,600 than the previous one, which 214 00:09:52,770 --> 00:10:01,780 still hadn't been analyzed, and so forth. 215 00:09:57,600 --> 00:10:04,030 Regarding aggregation methods, we 216 00:10:01,780 --> 00:10:05,949 defined two methods. One of them is 217 00:10:04,030 --> 00:10:08,560 closer to what MISP has, which is the 218 00:10:05,950 --> 00:10:11,770 naive method, and in the naive method we basically focus 219 00:10:08,560 --> 00:10:15,189 on direct connections. 220 00:10:11,770 --> 00:10:17,170 So we look at an event, we see if it 221 00:10:15,190 --> 00:10:20,380 shares attributes with other events, and 222 00:10:17,170 --> 00:10:22,449 then we take that group of events as a 223 00:10:20,380 --> 00:10:26,470 cluster and a potential new enriched 224 00:10:22,450 --> 00:10:29,770 IOC. This has a problem: an event can 225 
00:10:26,470 --> 00:10:34,090 appear in multiple clusters, which is 226 00:10:29,770 --> 00:10:36,880 logical. We have another alternative, 227 00:10:34,090 --> 00:10:38,680 which is the n-level aggregation. In the 228 00:10:36,880 --> 00:10:42,640 n-level aggregation, what we do, 229 00:10:38,680 --> 00:10:45,910 and similarly to what you were doing, is we 230 00:10:42,640 --> 00:10:49,210 create a graph where the nodes are 231 00:10:45,910 --> 00:10:52,150 new events or all the events in our 232 00:10:49,210 --> 00:10:55,150 database. We then look at shared 233 00:10:52,150 --> 00:10:58,150 attributes to create edges, and we set 234 00:10:55,150 --> 00:11:00,730 filters to allow only certain edges to 235 00:10:58,150 --> 00:11:03,310 be created. And then we identify all 236 00:11:00,730 --> 00:11:06,130 the subgraphs, and those subgraphs will 237 00:11:03,310 --> 00:11:08,380 form a new enriched IOC, or a new 238 00:11:06,130 --> 00:11:12,670 possible enriched IOC. 239 00:11:08,380 --> 00:11:14,650 Just an image to give an example. If we 240 00:11:12,670 --> 00:11:16,990 were using the naive approach, consider 241 00:11:14,650 --> 00:11:19,510 that we have these events in the 242 00:11:16,990 --> 00:11:21,880 database. When we look at the first 243 00:11:19,510 --> 00:11:24,520 event, it shares one attribute with the 244 00:11:21,880 --> 00:11:27,400 second one: enriched IOC. We move to the 245 00:11:24,520 --> 00:11:31,660 second: another 246 00:11:27,400 --> 00:11:34,590 enriched IOC, and so forth. If we move to 247 00:11:31,660 --> 00:11:37,990 an n-level aggregation, 248 00:11:34,590 --> 00:11:41,560 using exactly the same database, we would 249 00:11:37,990 --> 00:11:46,060 first create the nodes, represent the 250 00:11:41,560 --> 00:11:50,439 nodes. We would then, for each relation, 251 00:11:46,060 --> 00:11:54,310 create an edge, and finally we would 252 00:11:50,440 --> 00:11:56,560 identify all the subgraphs that exist. 253 
00:11:54,310 --> 00:11:59,310 In this case we have two, and these two 254 00:11:56,560 --> 00:12:04,599 would be possible 255 00:11:59,310 --> 00:12:06,400 enriched IOCs. So we needed to make a 256 00:12:04,600 --> 00:12:09,820 proof of concept, so we developed it 257 00:12:06,400 --> 00:12:12,520 in Python 3 over a MISP installation. We 258 00:12:09,820 --> 00:12:15,010 focused on these two modules, so we 259 00:12:12,520 --> 00:12:17,290 basically used everything that we 260 00:12:15,010 --> 00:12:19,270 could use out of MISP so as not to lose 261 00:12:17,290 --> 00:12:22,000 time. I didn't have that much time to 262 00:12:19,270 --> 00:12:24,250 develop, so it was basically: get the most 263 00:12:22,000 --> 00:12:27,580 out of MISP. And we made the 264 00:12:24,250 --> 00:12:30,790 implementation to allow two choices. 265 00:12:27,580 --> 00:12:32,950 The first one is we could choose subsets 266 00:12:30,790 --> 00:12:37,300 of IOCs that are in our database and 267 00:12:32,950 --> 00:12:40,180 do the selection that way. 268 00:12:37,300 --> 00:12:42,069 Well, at this time we were just working 269 00:12:40,180 --> 00:12:45,069 on a proof of concept, so we didn't have 270 00:12:42,070 --> 00:12:47,200 a specific target. In the future, this 271 00:12:45,070 --> 00:12:50,170 would allow us, if we want to focus on 272 00:12:47,200 --> 00:12:52,540 specific sectors or on specific threats, 273 00:12:50,170 --> 00:12:54,099 to select only those IOCs that are 274 00:12:52,540 --> 00:12:56,380 in our database that relate to that 275 00:12:54,100 --> 00:13:01,210 issue, instead of losing time analyzing 276 00:12:56,380 --> 00:13:04,420 the whole database. Then we also 277 00:13:01,210 --> 00:13:06,400 defined valid relationships, so we set 278 00:13:04,420 --> 00:13:09,430 different filters that could be used in 279 00:13:06,400 --> 00:13:11,590 the creation of the enriched IOCs. 280 00:13:09,430 --> 00:13:14,349 And we made two important assumptions. 281 
00:13:11,590 --> 00:13:16,870 The first one is: trust level correlates 282 00:13:14,350 --> 00:13:20,020 to quality, which means that we are 283 00:13:16,870 --> 00:13:22,020 trusting the other participants that 284 00:13:20,020 --> 00:13:24,930 contribute to our database that, 285 00:13:22,020 --> 00:13:26,970 if they set a trust level at two, they 286 00:13:24,930 --> 00:13:28,979 did their job, and that the quality of 287 00:13:26,970 --> 00:13:31,740 the information of that event is 288 00:13:28,980 --> 00:13:33,630 actually better than another event that 289 00:13:31,740 --> 00:13:36,180 is in the network without having been 290 00:13:33,630 --> 00:13:38,820 certified. And that blacklists are 291 00:13:36,180 --> 00:13:41,279 correctly tagged. This is extremely 292 00:13:38,820 --> 00:13:43,950 important, and blacklists aren't the 293 00:13:41,279 --> 00:13:46,260 only case; there are other types of 294 00:13:43,950 --> 00:13:49,140 events that appear that can mess up the 295 00:13:46,260 --> 00:13:50,580 way we're doing things, because if 296 00:13:49,140 --> 00:13:53,580 you have an event that creates 297 00:13:50,580 --> 00:13:56,100 relationships that aren't indeed useful, 298 00:13:53,580 --> 00:13:56,670 it will create an enriched IOC that has 299 00:13:56,100 --> 00:13:58,920 no value. 300 00:13:56,670 --> 00:14:01,740 So if you have a blacklist, it will 301 00:13:58,920 --> 00:14:04,199 bring events from different incidents, 302 00:14:01,740 --> 00:14:06,690 because some of them only list IPs 303 00:14:04,200 --> 00:14:09,779 without caring if they are related to a 304 00:14:06,690 --> 00:14:13,070 single threat, and they will correlate 305 00:14:09,779 --> 00:14:16,830 everything around them, and it's a mess. 306 00:14:13,070 --> 00:14:20,880 So we selected an experimental set. We 307 00:14:16,830 --> 00:14:23,339 opened 34 feeds from 308 00:14:20,880 --> 00:14:25,770 organizations, collected eleven hundred 309 00:14:23,339 --> 
00:14:29,310 and seventy-four events. Most of them, as 310 00:14:25,770 --> 00:14:33,060 you can see, are of a high trust level, so 311 00:14:29,310 --> 00:14:36,989 in our vision of the world they actually 312 00:14:33,060 --> 00:14:39,630 have high quality. And we ran our 313 00:14:36,990 --> 00:14:40,200 platform on that dataset to see how 314 00:14:39,630 --> 00:14:44,220 it would work. 315 00:14:40,200 --> 00:14:46,620 And so we selected the subset: 316 00:14:44,220 --> 00:14:48,839 only those with a trust level of two 317 00:14:46,620 --> 00:14:52,800 were selected, and we eliminated all 318 00:14:48,839 --> 00:14:55,529 those that had a tag of blacklist. And we 319 00:14:52,800 --> 00:14:58,290 said, okay, let's see, with all the filters 320 00:14:55,529 --> 00:15:00,510 that are currently connected, how does it 321 00:14:58,290 --> 00:15:02,969 work? And as you can see, there are two 322 00:15:00,510 --> 00:15:07,020 factors here that are important, or that 323 00:15:02,970 --> 00:15:10,079 are interesting. As the filter goes 324 00:15:07,020 --> 00:15:12,360 deeper in detail, you reduce the number 325 00:15:10,079 --> 00:15:14,430 of potentially enriched 326 00:15:12,360 --> 00:15:17,399 IOCs that you have, which makes sense 327 00:15:14,430 --> 00:15:19,439 because you allow fewer connections, and 328 00:15:17,399 --> 00:15:22,410 the connections that you allow are those 329 00:15:19,440 --> 00:15:24,149 that interest you the most. And if you 330 00:15:22,410 --> 00:15:25,860 use the naive approach, you get a lot 331 00:15:24,149 --> 00:15:28,680 more potential enriched IOCs 332 00:15:25,860 --> 00:15:31,050 than if you use the cluster approach. So 333 00:15:28,680 --> 00:15:33,959 after doing these first experiments, we 334 00:15:31,050 --> 00:15:35,819 focused on the cluster approach with the 335 00:15:33,959 --> 00:15:40,880 most restrictive filter, 336 00:15:35,820 --> 00:15:44,130 and what we got was this: we found 11 337 
potentially enriched IOCs; we 338 00:15:44,130 --> 00:15:49,920 then manually went through 339 00:15:46,920 --> 00:15:53,790 their components, and they make sense. And here 340 00:15:49,920 --> 00:15:56,099 I say "make sense" because 341 00:15:53,790 --> 00:15:58,800 the dataset is still not... we needed a 342 00:15:56,100 --> 00:16:01,280 bigger dataset, we needed to 343 00:15:58,800 --> 00:16:05,810 evaluate this in a different way. We have 344 00:16:01,280 --> 00:16:09,120 developed metrics, like you did, to try 345 00:16:05,810 --> 00:16:11,579 to make sense of the information and try 346 00:16:09,120 --> 00:16:15,030 to validate what we have, but it's still 347 00:16:11,580 --> 00:16:17,490 an ongoing process to be able to state 348 00:16:15,030 --> 00:16:20,250 for certain the relevance of these 349 00:16:17,490 --> 00:16:22,140 enriched IOCs. And actually, you'll 350 00:16:20,250 --> 00:16:26,130 see that that's work that 351 00:16:22,140 --> 00:16:28,890 is currently being done. So this is a 352 00:16:26,130 --> 00:16:31,740 representation of what we got when we 353 00:16:28,890 --> 00:16:33,569 represented the graph, and we're going to 354 00:16:31,740 --> 00:16:37,170 look just at the details of one of them. 355 00:16:33,570 --> 00:16:40,200 So this is the enriched IOC 9. It's 356 00:16:37,170 --> 00:16:43,380 composed of several different 357 00:16:40,200 --> 00:16:46,190 events that are all related, or mostly 358 00:16:43,380 --> 00:16:48,870 related, through this vulnerability, and 359 00:16:46,190 --> 00:16:51,330 another one appears related through this 360 00:16:48,870 --> 00:16:54,210 vulnerability. The filter we were using 361 00:16:51,330 --> 00:16:59,430 was if the events shared vulnerabilities 362 00:16:54,210 --> 00:17:01,650 or attackers, so as to bring forth 363 00:16:59,430 --> 00:17:04,800 something that made more sense and would 364 00:17:01,650 --> 00:17:08,760 be more useful. And so here we have the 
365 00:17:04,800 --> 00:17:11,760 list of events that compose it. Not 366 00:17:08,760 --> 00:17:13,709 very interesting. One interesting factor, 367 00:17:11,760 --> 00:17:16,589 and that's something that has 368 00:17:13,709 --> 00:17:19,140 appeared in the literature, is that you can 369 00:17:16,589 --> 00:17:21,179 see, if you look at this kind of data, you 370 00:17:19,140 --> 00:17:22,770 can see the evolution of an attack, or in 371 00:17:21,180 --> 00:17:27,180 this case, for instance, the evolution 372 00:17:22,770 --> 00:17:31,530 from 2014 to 2016 of a vulnerability 373 00:17:27,180 --> 00:17:34,380 over time: who used it, when, why, to attack 374 00:17:31,530 --> 00:17:36,750 whom. So with this sort of information, at the 375 00:17:34,380 --> 00:17:39,300 strategic level we are already getting 376 00:17:36,750 --> 00:17:41,820 something that is useful. But now, at the 377 00:17:39,300 --> 00:17:44,070 tactical level, where we're going to 378 00:17:41,820 --> 00:17:47,399 use these IOCs to inject into a defense 379 00:17:44,070 --> 00:17:49,200 network, it needs some work, just 380 00:17:47,400 --> 00:17:49,380 to be sure that when we create rules we 381 00:17:49,200 --> 00:17:51,060 are 382 00:17:49,380 --> 00:17:53,840 actually creating rules that are useful 383 00:17:51,060 --> 00:17:58,379 and not just cluttering the system. So, 384 00:17:53,840 --> 00:18:01,280 conclusion. We created a new system to 385 00:17:58,380 --> 00:18:03,660 create intelligence out of OSINT. We 386 00:18:01,280 --> 00:18:05,760 defined two methods to correlate and 387 00:18:03,660 --> 00:18:07,410 aggregate threat intelligence. We 388 00:18:05,760 --> 00:18:11,400 developed the platform that proves that 389 00:18:07,410 --> 00:18:14,700 the methods sort of work, and we did an 390 00:18:11,400 --> 00:18:16,860 experiment that shows that our proof of 391 00:18:14,700 --> 00:18:19,650 concept was working. And as I was saying, 392 00:18:16,860 --> 00:18:21,990 currently, this was 
work within 393 00:18:19,650 --> 00:18:24,660 the project of DiSIEM, which is a European 394 00:18:21,990 --> 00:18:27,530 sponsored project, and we're working with 395 00:18:24,660 --> 00:18:32,810 a partner to evaluate the risk score of 396 00:18:27,530 --> 00:18:32,810 the detected IOCs. And thank you. 397 00:18:33,110 --> 00:18:41,908 [Applause] 398 00:19:55,700 --> 00:20:01,950 So, repeating the question, if I 399 00:19:58,770 --> 00:20:05,100 understood it correctly, it is whether the 400 00:20:01,950 --> 00:20:07,140 approach we're using is not counterproductive 401 00:20:05,100 --> 00:20:08,969 in the long run, because we're 402 00:20:07,140 --> 00:20:11,970 limiting the access to the information. 403 00:20:08,970 --> 00:20:15,150 And it's a good point, the one you 404 00:20:11,970 --> 00:20:19,490 make. It's a good point because indeed it 405 00:20:15,150 --> 00:20:22,890 can happen, and 406 00:20:19,490 --> 00:20:25,919 when it happens, like, you could have had 407 00:20:22,890 --> 00:20:28,620 the information in the past. But this is 408 00:20:25,919 --> 00:20:31,770 still, like, this is the beginning of the 409 00:20:28,620 --> 00:20:33,870 beginning of doing this. One part of the 410 00:20:31,770 --> 00:20:37,168 problem with this issue is, for instance, 411 00:20:33,870 --> 00:20:39,899 for threat intelligence quality there is 412 00:20:37,169 --> 00:20:41,700 nothing to quantify it. I've looked for 413 00:20:39,900 --> 00:20:44,090 it, because it should be something that 414 00:20:41,700 --> 00:20:46,679 you should be able to quantify, but 415 00:20:44,090 --> 00:20:48,480 it's something that's on a 416 00:20:46,679 --> 00:20:52,559 case-by-case basis. And what you're 417 00:20:48,480 --> 00:20:55,350 saying makes sense, but what I can say is 418 00:20:52,559 --> 00:20:57,149 it all depends on the configurations, and 419 00:20:55,350 --> 00:20:59,189 that's why it's so important at the 420 00:20:57,150 --> 00:21:01,679 beginning. I'm 
not saying that you're 421 00:20:59,190 --> 00:21:04,320 wrong; you're absolutely correct that you 422 00:21:01,679 --> 00:21:07,020 should have the best database possible. 423 00:21:04,320 --> 00:21:09,120 The idea is to only get the information 424 00:21:07,020 --> 00:21:11,070 that matters at that moment to the 425 00:21:09,120 --> 00:21:13,979 people that are working, because you can 426 00:21:11,070 --> 00:21:17,040 always have that in storage. You can 427 00:21:13,980 --> 00:21:19,050 accumulate whatever knowledge you think 428 00:21:17,040 --> 00:21:21,418 will be useful in the future in your 429 00:21:19,050 --> 00:21:24,780 storage and then process it 430 00:21:21,419 --> 00:21:27,240 later on, because the idea is 431 00:21:24,780 --> 00:21:31,770 that this should work automatically. 432 00:21:27,240 --> 00:21:35,250 So currently it's on an "I need to run 433 00:21:31,770 --> 00:21:37,260 the platform" basis, but it was created, 434 00:21:35,250 --> 00:21:39,179 and I designed it, so that you could put 435 00:21:37,260 --> 00:21:42,540 it as a thread and it will basically be 436 00:21:39,179 --> 00:21:44,700 running over and over on your database 437 00:21:42,540 --> 00:21:47,070 and checking for the creation of new 438 00:21:44,700 --> 00:21:50,070 enriched IOCs, or potentially enriched 439 00:21:47,070 --> 00:21:53,939 IOCs. And yes, currently I have a concern 440 00:21:50,070 --> 00:21:57,450 about false positives, because we don't 441 00:21:53,940 --> 00:22:01,410 know what a false positive is. So 442 00:21:57,450 --> 00:22:04,200 we got eleven potential enriched 443 00:22:01,410 --> 00:22:06,660 IOCs, and that's for me, and it's an 444 00:22:04,200 --> 00:22:08,940 issue that I have with 445 00:22:06,660 --> 00:22:11,370 my advisers, because for them it's: yeah, 446 00:22:08,940 --> 00:22:15,929 you created these enriched IOCs. 447 00:22:11,370 --> 00:22:19,080 No, I created eleven events that have the 448 
00:22:15,929 --> 00:22:22,080 information from other events that 449 00:22:19,080 --> 00:22:24,600 should be correlated, but there is no way 450 00:22:22,080 --> 00:22:27,059 to be certain at this point that 451 00:22:24,600 --> 00:22:29,750 what we've created is actually 452 00:22:27,059 --> 00:22:32,908 better than what we had in the past. And 453 00:22:29,750 --> 00:22:34,950 so right now we have the proof of 454 00:22:32,909 --> 00:22:37,320 concept and we created something. The 455 00:22:34,950 --> 00:22:39,600 future is, with the risk score, seeing if 456 00:22:37,320 --> 00:22:43,980 that something that we're creating has 457 00:22:39,600 --> 00:22:45,840 added value. And we can see with the 458 00:22:43,980 --> 00:22:48,300 human eye some added value, because you 459 00:22:45,840 --> 00:22:50,928 can see, for instance, differences in a 460 00:22:48,300 --> 00:22:55,168 vulnerability that reappears over time, 461 00:22:50,929 --> 00:22:58,980 which would probably not be that 462 00:22:55,169 --> 00:23:02,070 easy to detect. You can create other, like, 463 00:22:58,980 --> 00:23:04,740 we created measures to see how 464 00:23:02,070 --> 00:23:07,710 an event evolves over 465 00:23:04,740 --> 00:23:11,600 time, so you can get those metrics. But 466 00:23:07,710 --> 00:23:19,140 this is just the beginning of a new age. 467 00:23:11,600 --> 00:23:21,199 More questions? No? Thank you.
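The deduplication step and the two aggregation methods described in the talk (naive and n-level) can be sketched roughly as follows. This is a minimal illustration in Python, not the speaker's implementation: the event records, the `attrs` and `trust` field names, the `type:value` attribute strings and every function name are assumptions made for the example, and nothing here calls the MISP API.

```python
from collections import defaultdict

# Hypothetical toy events; field names are illustrative, not MISP's schema.
EVENTS = {
    "e1": {"attrs": {"ip:203.0.113.5", "vuln:CVE-A"}, "trust": 2},
    "e2": {"attrs": {"vuln:CVE-A", "domain:bad.example"}, "trust": 2},
    "e3": {"attrs": {"domain:bad.example"}, "trust": 2},
    "e4": {"attrs": {"ip:198.51.100.7"}, "trust": 2},
    "e5": {"attrs": {"ip:198.51.100.7"}, "trust": 0},  # same content as e4
    "e6": {"attrs": {"ip:198.51.100.7", "hash:abc"}, "trust": 2},
}


def deduplicate(events):
    """Keep one event per identical attribute set, preferring higher trust."""
    best = {}
    for name, ev in events.items():
        key = frozenset(ev["attrs"])
        if key not in best or ev["trust"] > events[best[key]]["trust"]:
            best[key] = name
    return {name: events[name] for name in best.values()}


def shared(a, b, allowed_types=None):
    """Attributes two events share, optionally limited to some types (the filter)."""
    common = a["attrs"] & b["attrs"]
    if allowed_types is not None:
        common = {x for x in common if x.split(":", 1)[0] in allowed_types}
    return common


def naive_clusters(events, allowed_types=None):
    """One cluster per event: the event plus its direct neighbours only."""
    clusters = []
    for name, ev in events.items():
        members = {name} | {
            other for other, oev in events.items()
            if other != name and shared(ev, oev, allowed_types)
        }
        if len(members) > 1:
            clusters.append(members)
    return clusters


def n_level_clusters(events, allowed_types=None):
    """Connected components of the graph whose edges are shared attributes."""
    adjacency = defaultdict(set)
    names = list(events)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if shared(events[a], events[b], allowed_types):
                adjacency[a].add(b)
                adjacency[b].add(a)
    seen, components = set(), []
    for start in names:
        if start in seen or start not in adjacency:
            continue  # already clustered, or an isolated event
        stack, component = [start], set()
        while stack:  # depth-first walk over the component
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        components.append(component)
    return components
```

On this toy data, `deduplicate(EVENTS)` drops `e5` in favour of the higher-trust `e4`; the naive method then produces five overlapping clusters, while the n-level method produces two disjoint ones, which mirrors the difference in candidate counts mentioned in the talk. Passing `allowed_types={"vuln"}` plays the role of a restrictive filter and leaves a single cluster.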