Hello, and welcome to the presentation of our work on GNN-based control flow attestation for embedded devices. We did this work together with our colleagues from the Technical University of Darmstadt, NVIDIA, and the University of Southampton.

First of all, why do we even need control flow attestation? Over recent years we saw a steady increase in remote code execution vulnerabilities and, at the same time, an increase in IoT devices worldwide. This leads to an increased exposure to code reuse attacks in these IoT devices with low security.

Some background on control flow attestation, CFA. What is control flow attestation? In the most generic, high-level view we have three parties: the verifier, the prover, and a third party. The third party wants to be sure of the state of execution of the prover. It goes like this: the verifier sends an attestation challenge to the prover and asks it to provide evidence of its state. The prover computes or collects this evidence and returns it to the verifier.
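As a rough sketch, a generic challenge-response attestation round of this kind could look as follows. This is illustrative only, not the RAGE protocol: the function names, the hash-based evidence, and the reference database are all made up for the example.

```python
import hashlib
import secrets

# Illustrative sketch of generic challenge-response attestation.
# Names, evidence format, and the database are hypothetical.

KNOWN_GOOD_TRACES = {("0x100", "0x104", "0x1f0")}  # verifier's reference DB

def send_challenge() -> bytes:
    """Verifier: fresh nonce so old evidence cannot be replayed."""
    return secrets.token_bytes(16)

def collect_evidence(trace: tuple, nonce: bytes) -> bytes:
    """Prover: bind the observed execution trace to the verifier's nonce."""
    return hashlib.sha256(nonce + repr(trace).encode()).digest()

def verify(evidence: bytes, nonce: bytes) -> bool:
    """Verifier: check the evidence against every known benign trace."""
    return any(collect_evidence(t, nonce) == evidence for t in KNOWN_GOOD_TRACES)

nonce = send_challenge()
benign = collect_evidence(("0x100", "0x104", "0x1f0"), nonce)
tampered = collect_evidence(("0x100", "0xdead", "0x1f0"), nonce)
print(verify(benign, nonce), verify(tampered, nonce))  # True False
```

The signed report for the third party is omitted here; it would simply be the verifier signing the boolean outcome together with the nonce.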
The verifier then checks this evidence, for example against a database of matching evidence, or through other means such as timing. It then disseminates a report, most of the time signed by the verifier, which shows the third party the state of the prover, for example whether it is benign and whether the computations are correct or not.

Also some background on code reuse attacks. I mentioned them earlier, but what are code reuse attacks, and what examples are there? Here on the right you see a control flow graph, also called a CFG. This control flow graph has two different benign executions, for example the green one from V1 to V7, but also the violet one from V1 to V7, over different nodes. A return-oriented programming attack, also called a ROP attack, adds extra transitions to this control flow graph. As you can see here, from V1 to V7 the execution has basically the same start and end point, but it jumps from V6 to V3, for example to create malicious functionality that is not in the initial benign control flow graph.
On the other hand, a data-oriented programming attack, also called a DOP attack, reuses the benign transitions but still alters the control flow, though not the control flow graph. As you can see here, the execution goes from V1 over V4 to V7: the first part is reused from the green benign execution, and the second part is reused from the violet benign execution.

This brings us to the related work on control flow attestation schemes. You can divide them into three groups: schemes relying on a complete control flow graph as a reference, schemes relying on access to the memory content, and schemes relying on custom hardware. The problem is that having a complete control flow graph is a very strong assumption; it is sometimes not possible to compute one. Access to the memory content can lead to information leakage, for example from the prover while it is collecting the evidence and sending it to the verifier.
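Going back to the CFG example for a moment, the ROP-versus-DOP distinction can be made concrete with a toy edge-set check. The benign paths below are stand-ins, since the slide's exact node sequences are not given in the talk:

```python
# Toy illustration of ROP vs. DOP on a small CFG.
# The two benign paths are assumptions standing in for the slide's example.

green  = ["V1", "V2", "V4", "V5", "V7"]   # one benign execution
violet = ["V1", "V3", "V4", "V6", "V7"]   # another benign execution

def edges(path):
    """Set of CFG transitions taken by a trace."""
    return set(zip(path, path[1:]))

benign_edges = edges(green) | edges(violet)

# ROP: same start/end, but introduces a transition (V6 -> V3) that no
# benign execution ever takes.
rop_trace = ["V1", "V3", "V4", "V6", "V3", "V4", "V6", "V7"]

# DOP: stitches together pieces of the two benign paths, so every single
# edge is benign, yet the overall control flow is neither benign path.
dop_trace = ["V1", "V2", "V4", "V6", "V7"]

def has_new_edges(trace):
    return bool(edges(trace) - benign_edges)

print(has_new_edges(rop_trace))          # True: new edge (V6, V3)
print(has_new_edges(dop_trace))          # False: invisible to an edge check
print(dop_trace in (green, violet))      # False: control flow still altered
```

This is exactly why DOP is the harder case: a detector that only looks for unknown edges cannot see it.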
Either on the way or on the verifier side, this information can be leaked. And custom hardware is expensive to design and not applicable to off-the-shelf devices.

Which brings us to our research question: is it possible to relax these assumptions by leveraging machine learning, so that we do not have to rely on such strong assumptions?

So we looked into suitable machine learning techniques and chose graph neural networks as a promising solution. As evidence, we chose to collect the basic block addresses of the execution trace. This avoids information leakage to the verifier, because we only send program counter values. We then process the trace into an execution graph while extracting features, and here we do not impose any assumptions about its completeness, because it is just extracted from one execution trace. Finally, we extract per-node embeddings from the graph using a graph neural network.
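The trace-to-graph step could look roughly like this sketch: consecutive program counter values become weighted edges, and simple per-node features are computed. The features shown here (visit count, out-degree) are illustrative, not necessarily the ones RAGE extracts:

```python
from collections import Counter

# Sketch: build an execution graph from a trace of basic-block addresses.
# The node features (visit count, out-degree) are illustrative only; the
# talk does not specify the exact feature set at this point.

trace = [0x100, 0x104, 0x110, 0x104, 0x110, 0x1F0]

nodes = sorted(set(trace))
edge_counts = Counter(zip(trace, trace[1:]))   # weighted edges from consecutive PCs

visit_count = Counter(trace)
out_degree = Counter(src for src, _ in edge_counts)

features = {n: (visit_count[n], out_degree.get(n, 0)) for n in nodes}
print(len(nodes), len(edge_counts))   # 4 4
```

Note that nothing here needs a complete CFG: the graph is exactly what this one execution exercised, which is the point made above.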
This way we maintain a correlation between elements in the different representations. As you can see here in red, from the execution trace, over the execution graph, to the execution embeddings, a one-to-one correlation is maintained. This is our key intuition.

This key intuition is then utilized by our approach, called RAGE, which, as I said earlier, does not rely on complete control flow graphs. We use unsupervised graph neural networks to relax this assumption, and we show that this model learns the execution patterns and correctly separates benign and malicious executions. It also does not leak any data to the verifier, or on the way between verifier and prover, because we only collect program counter addresses.

It is also not platform-specific, as we will show later; it can run on any platform that has a TEE. We do not need any extra hardware, and we also do not need data labeling on the machine learning side.
It is very lightweight: the model counts fewer than 9,000 parameters and can run even on a Raspberry Pi, for which I will also show an evaluation later. And, as stated, for the code reuse attacks, the model can precisely detect ROP and DOP attacks, including real-world attacks that were never seen before.

We use a variational graph autoencoder in our graph neural network approach. Variational graph autoencoders, also called VGAEs, are state-of-the-art generative models trained to reconstruct input graphs as output graphs, and we designed a custom encoder able to extract the latent features of the input graph. You can say that the control flow behavior is therefore captured in the output embeddings. This customized version of a VGAE is used in our approach, which I will now explain in detail.

Here on the right you can see our custom encoder design, which consists of four graph convolutional layers varying in dimensionality from 15 to 48, connected by dropout layers to regularize the training.
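A minimal numpy sketch of such a stack of graph convolutional layers is shown below. The exact layer sizes are an assumption (the talk only says they vary from 15 to 48, expanding and then shrinking), and the variational mu/sigma heads and dropout of a real VGAE encoder are omitted for brevity:

```python
import numpy as np

# Rough sketch of a 4-layer GCN encoder stack in the spirit described.
# The dimensions 15 -> 32 -> 48 -> 32 -> 24 are assumptions; dropout and
# the VGAE's variational heads are left out.

rng = np.random.default_rng(0)

def normalize_adj(A):
    """Symmetrically normalized adjacency with self-loops: D^-1/2 (A+I) D^-1/2."""
    A_hat = A + np.eye(A.shape[0])
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return D_inv_sqrt @ A_hat @ D_inv_sqrt

def gcn_layer(A_norm, H, W):
    return np.maximum(A_norm @ H @ W, 0.0)   # ReLU(A_hat @ H @ W)

n_nodes, dims = 5, [15, 32, 48, 32, 24]
A = rng.integers(0, 2, size=(n_nodes, n_nodes))
A = ((A + A.T) > 0).astype(float)
np.fill_diagonal(A, 0)

H = rng.standard_normal((n_nodes, dims[0]))  # 15 input features per node
A_norm = normalize_adj(A)
for d_in, d_out in zip(dims[:-1], dims[1:]):
    W = rng.standard_normal((d_in, d_out)) * 0.1
    H = gcn_layer(A_norm, H, W)
print(H.shape)   # (5, 24): one embedding per node
```

The per-node output rows are what "per-node embeddings" refers to: each row corresponds to exactly one node of the execution graph.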
We selected the VGAE as our model of choice as it matched our requirements. Specifically, through a phase of model selection we designed our custom encoder, keeping in mind the trade-off between the complexity of the model and its performance at extracting embeddings that are rich enough to capture the control flow behavior. As I said earlier, the encoder consists of four layers with varying dimensions: we first increase the dimensionality to expand the features we extracted during the feature engineering phase, and then reduce it slightly again to discard the noisiest features.

Which brings us to our system overview. To train our approach, we trace the executed software, as stated earlier. This one trace is pre-processed into a graph representation and features are extracted. This graph is then used to train the model, extract the embeddings, and tune the attestation threshold on the validation set.
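One simple way this threshold tuning and the attestation decision it feeds could look is sketched below. The distance metric (Euclidean distance to the mean training embedding) and the threshold rule (maximum validation distance) are assumptions for illustration, not necessarily what RAGE uses:

```python
import numpy as np

# Sketch of a threshold-based attestation decision. The metric and the
# threshold rule here are illustrative assumptions.

rng = np.random.default_rng(1)
train_emb = rng.normal(0.0, 0.1, size=(100, 32))  # embeddings of benign runs
val_emb = rng.normal(0.0, 0.1, size=(20, 32))     # held-out benign runs

center = train_emb.mean(axis=0)
threshold = max(np.linalg.norm(e - center) for e in val_emb)

def attest(embedding):
    """Benign iff its distance to the benign cluster stays under the threshold."""
    return np.linalg.norm(embedding - center) <= threshold

benign_run = center + 0.001 * rng.standard_normal(32)  # close to benign cluster
attacked_run = rng.normal(2.0, 0.1, size=32)           # far from benign cluster
print(attest(benign_run), attest(attacked_run))  # True False
```

Tuning the threshold on a validation set, as described above, keeps the false-positive rate on benign executions under control without ever needing labeled attack data.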
To attest an execution, the software is traced again, pre-processed, and forwarded to the VGAE encoder, which extracts the embeddings and computes the distance between them and the training embeddings. This distance is then tested against the threshold, and the system discerns between a benign and a malicious execution.

We did a real-world evaluation on code reuse attacks. We extended the RIPE framework, a collection of code reuse (ROP) attacks, with a gadget-length configuration option for testing different ROP chain lengths. We also extended it to include benign application logic, to collect traces from additional software, and we chose to use software from the Embench-IoT dataset. We tested this on a BeagleBone Black, a Xilinx UltraScale+, and an NVIDIA Jetson device. As you can see here on the right, RAGE correctly separates benign and attack traces in every case, with a great distance.
This is because the RIPE framework attacks are not stealthy: the ROP attack basically does not return the execution flow to the benign application, but terminates the program, resulting in a shell.

To also be able to evaluate stealthy attacks, we simulated ROP and DOP attacks on benign traces with an algorithm that injects malicious transitions exhibiting ROP and DOP behavior into those traces. We did it in such a way that the control flow always returns to the program, which, as I stated earlier, RIPE attacks do not, so these are more stealthy. We evaluated on 23 pieces of software, collecting around one terabyte of trace data. Here on the right you can see the datasets we used: OpenSSL and the aforementioned Embench-IoT dataset. As you can see, for ROP and DOP attacks the F1 score on both sides is in the high-90s range, which means we are able to detect even stealthy simulated ROP and DOP attacks, which are even more stealthy than real-world ones.
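The injection idea described here can be sketched in a few lines: the simulated attack leaves the benign path, executes some gadget addresses, and then resumes the benign flow exactly where it left off (unlike RIPE, which never returns). The gadget addresses and the splice position are made up for illustration:

```python
# Sketch of injecting a stealthy ROP-style detour into a benign trace.
# Gadget addresses and splice position are hypothetical.

def inject_rop(trace, position, gadgets):
    """Splice gadget addresses into the trace, then resume the benign flow."""
    return trace[:position] + gadgets + trace[position:]

benign = [0x100, 0x104, 0x110, 0x1F0]
gadgets = [0x8040, 0x8072]              # hypothetical gadget addresses
attacked = inject_rop(benign, 2, gadgets)
print([hex(a) for a in attacked])
# -> ['0x100', '0x104', '0x8040', '0x8072', '0x110', '0x1f0']
```

Because the prefix and suffix of the attacked trace are fully benign, only the short detour in the middle distinguishes it, which is what makes these simulated attacks stealthier than the RIPE ones.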
However, a ROP attack is easier to detect, as it creates new edges in the control flow graph, while DOP attacks are harder, as they reuse benign edges. The detection of DOP attacks also heavily relies on the extracted features; we have an evaluation of this in the paper, so please refer to the paper for it. We also evaluated different ROP chain lengths in the paper, so again, refer to the paper.

Here, in detail for the OpenSSL dataset, you can see that for both the ROP and the DOP attack the distance between the benign traces, in yellow, and the malicious ones, in blue, is rather high, and most are easy to detect.

Finally, we also evaluated the performance. We used a Raspberry Pi 4B, 2-gigabyte version, to show that the prover and the verifier can both be an edge device such as a Raspberry Pi. On the prover side, you can see that the average trace length ranges from around 100,000 to 10 million steps, but the pre-processing time is still only around 4 seconds.
The processed trace size is on average only around 0.73 kilobytes, which is what is sent over the network. You could also do the pre-processing on the verifier side, but then the whole trace needs to be sent over, so this is a trade-off between pre-processing on the prover and sending everything over the network.

On the verifier side, the initial model training takes only 5 minutes on a Raspberry Pi, which is feasible, and the inference runtime of the VGAE model is only 0.005 seconds per execution trace.

Thank you very much.