Hi, I'm Vasisht Duddu, a PhD student in the Secure Systems Group at the University of Waterloo. Today I'll be talking about our systematization of unintended interactions among machine learning defenses and risks. This is joint work with Sebastian Szyller and our advisor, Professor N. Asokan.

Machine learning models are susceptible to a wide range of risks to security, privacy, and fairness. Prior works have proposed defenses to mitigate each of these risks. However, the effectiveness of these defenses is only evaluated with respect to the risks they protect against. In practice, machine learning models have to be deployed with multiple defenses incorporated simultaneously. This raises two questions: first, can two defenses interact negatively with each other? And second, when a defense against a specific risk is effective, can it increase or decrease the susceptibility to some other, unrelated risk? These are what we refer to as unintended interactions.
To answer the first question, we consider unintended interactions among defenses, where prior work, including ours, has shown that combining multiple defenses may result in conflicts. The second question concerns unintended interactions between a defense and an unrelated risk, and here we find that evaluation is limited, because prior work considers only specific defense-risk interactions or does not consider the underlying causes. There is no systematic framework to explore unintended interactions, and this is the focus of our paper.

We present a systematic framework to understand unintended interactions. We conjecture that overfitting and memorization are the underlying causes, and that the factors which influence them are likely to influence these interactions as well. We then survey the existing literature on unintended interactions and situate it within our framework. Finally, we present a guideline for conjecturing about previously unexplored interactions, and we empirically validate the conjectures from the guideline for two previously unexplored interactions.
Let me give some background on the different machine learning risks and defenses considered in our work. We consider risks to security, which include evasion, poisoning, and unauthorized model ownership; several risks to privacy; and discriminatory behavior as a risk to fairness. On the left we have the corresponding defenses that protect against each of these risks. To systematically explore unintended interactions, we consider all pairwise combinations between a defense and an unrelated risk. For instance, for adversarial training, which protects against the risk of evasion, we want to see what its interactions with all the remaining risks other than evasion are.

We previously mentioned that overfitting and memorization are the underlying causes, and there are two reasons for this: first, effective defenses may induce, reduce, or rely on overfitting and memorization; second, all of the risks tend to exploit overfitting and memorization.
Overfitting and memorization are distinct and can occur simultaneously. Overfitting is measured as the difference between the accuracy on the training and test datasets; it is an aggregate metric, as it accounts for all data records in both datasets. Memorization, on the other hand, is a score assigned to individual data records in the training dataset: it is measured as the difference in the model's predictions on a data record with and without that record in the training dataset.

To illustrate the relationship between overfitting and memorization, we consider a simple experiment. We take a synthetic dataset with two classes, an orange class and a blue class, where the training data records are indicated by circles and the crosses indicate test data records. The base case has no overfitting and no memorization, because the training data records are linearly separable and the test data distribution is similar to the training data distribution. When we train a multi-layer perceptron, it learns a linear decision boundary to distinguish between the two classes.

We take this base example and add noise to the test data records of both classes, such that they fall on the wrong side of the decision boundary, as seen here. This results in a decrease in test accuracy, and we have a case with overfitting but no memorization.

Now, instead of adding noise to the test data records, we take the base example and add noise to the training data records. The training data records are now closer together and no longer linearly separable, so the neural network cannot learn a simple linear classifier; instead, it learns a complex decision boundary that fits each of the training data records of both classes perfectly. Here memorization is non-zero, but there is no overfitting.

Finally, we add noise to both the training and test data records of both classes, and this is where we observe overfitting and memorization occurring simultaneously.
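The two metrics and the synthetic experiment can be sketched as follows. This is a minimal illustration assuming scikit-learn; the dataset, the MLP architecture, and the noise levels are made-up stand-ins, not the exact setup from the talk.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
n = 200  # records per class

# Base case: two linearly separable Gaussian blobs (the two classes).
X_train = np.vstack([rng.normal(-2, 0.5, (n, 2)), rng.normal(2, 0.5, (n, 2))])
X_test = np.vstack([rng.normal(-2, 0.5, (n, 2)), rng.normal(2, 0.5, (n, 2))])
y = np.array([0] * n + [1] * n)  # same labels for both splits

def overfitting_gap(noise_train=0.0, noise_test=0.0):
    """Overfitting as an aggregate metric: train accuracy - test accuracy."""
    Xtr = X_train + rng.normal(0, noise_train, X_train.shape) if noise_train else X_train
    Xte = X_test + rng.normal(0, noise_test, X_test.shape) if noise_test else X_test
    clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000, random_state=0)
    clf.fit(Xtr, y)
    return clf.score(Xtr, y) - clf.score(Xte, y)

def memorization_score(Xtr, i):
    """Per-record memorization: change in the model's confidence on record i
    when i is held out of training (a leave-one-out approximation)."""
    kept = np.arange(len(y)) != i
    with_i = MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000,
                           random_state=0).fit(Xtr, y)
    without_i = MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000,
                              random_state=0).fit(Xtr[kept], y[kept])
    return (with_i.predict_proba(Xtr[i:i + 1])[0, y[i]]
            - without_i.predict_proba(Xtr[i:i + 1])[0, y[i]])

print(f"base case gap:  {overfitting_gap():.2f}")                 # ~0: no overfitting
print(f"noisy test gap: {overfitting_gap(noise_test=4.0):.2f}")   # positive: overfitting
```

Calling `memorization_score` on a record of a noisy, non-separable training set would give a non-zero value even when the train-test gap is small, which is the "memorization without overfitting" case from the slide.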
Given the complexity of the datasets and the capacity of the models in the current machine learning paradigm, this is the setting most likely to occur in practice, and it is what we assume for the rest of the paper: overfitting and memorization occur simultaneously.

Now I want to describe our framework for evaluating unintended interactions. The framework consists of the different factors which influence overfitting and memorization, and this allows a fine-grained understanding of what influences these unintended interactions. We start with the factors which influence overfitting. Bias and variance are the two underlying reasons why overfitting occurs. Bias is the error from poor hyperparameter choices for the machine learning model; for example, a very small model, which indicates high bias, prevents learning the relations between attributes and labels well. Variance, on the other hand, is the error from sensitivity to changes in the training dataset; high variance means the model fits the noise in the training dataset.
There is a trade-off between bias and variance which determines overfitting, and this trade-off can be balanced using two factors: the size of the training dataset and the model capacity.

We categorize the factors influencing memorization by whether they relate to the dataset, the objective function, or the model. Among the dataset-related factors, we have the tail length of the distribution, where data records constituting the tail of the distribution are likely to be more memorized. We also note that the number of attributes, and whether the model focuses on learning stable attributes, which do not change with changes in the data distribution, correlate with memorization. The objective-function-related factors include curvature smoothness and the distinguishability of model observables across dataset subgroups and across models themselves, where model observables are essentially predictions or intermediate activations.
Finally, the distance of training data records to the decision boundary also influences memorization. And lastly, we have model capacity, which is a factor that influences both overfitting and memorization.

Given this framework with all these different factors, we situate prior work within it. For each combination of defense and risk, we indicate whether prior work shows an increase in the risk, a decrease in the risk, or that the combination has not been explored. We also indicate whether prior work evaluates the influence of each factor empirically or theoretically, or simply conjectures about it.

We now revisit the different risks and defenses from the background. For each defense D, we indicate whether the effectiveness of the defense correlates with a change in a particular factor, and we also show whether a change in a factor correlates with a change in the susceptibility to a risk R. We use an upward arrow for a positive correlation and a downward arrow for a negative correlation.
Our table enumerates all the defenses, risks, and factors, and how the defenses and risks correlate with those factors.

Using this, we present a guideline by which researchers and practitioners can conjecture about unintended interactions. For a defense D, a risk R, and a common factor F, we use a pair of arrows describing how D and R correspond to F. For a given common factor, if both arrows align, this indicates that the risk increases when the defense is effective, which we depict with a red circle; otherwise, if the arrows do not align, it is a green circle. There can be multiple factors common to a combination of a defense and a risk. If all the common factors suggest the same thing, then the conjectured overall interaction is what those factors indicate; otherwise, we prioritize the conjecture from the dominant factor. This notion of dominance of factors depends on the attack, and we will come to it on the next slide.
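The combination rule of the guideline can be written as a small routine. The encoding below is hypothetical and for illustration only: arrows are +1 (upward) or -1 (downward), and the factor names and arrow values in the example are invented, not taken from the paper's table.

```python
# Hypothetical encoding of the guideline: each common factor F gets a pair
# of arrows, +1 (upward / positive correlation) or -1 (downward), describing
# how the defense D and the risk R each relate to F.
def conjecture(common_factors, dominant=()):
    """common_factors: {factor: (defense_arrow, risk_arrow)}.
    Aligned arrows -> 'red' (the risk increases when the defense is
    effective); opposing arrows -> 'green'. When factors disagree,
    only the dominant (attack-exploited) factors decide."""
    verdicts = {f: "red" if d * r > 0 else "green"
                for f, (d, r) in common_factors.items()}
    if len(set(verdicts.values())) == 1:        # all common factors agree
        return next(iter(verdicts.values()))
    decisive = {verdicts[f] for f in dominant}  # fall back to dominant factors
    assert len(decisive) == 1, "dominant factors must agree"
    return decisive.pop()

# Example with three common factors that disagree: two dominant factors
# both align, so the conjectured interaction is red.
print(conjecture({"F1": (1, 1), "F2": (-1, -1), "F3": (1, -1)},
                 dominant=("F1", "F2")))  # red
```

The `dominant` fallback mirrors the prioritization step described next in the talk: when common factors point in different directions, only the factors directly exploited by the attack decide the conjecture.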
Finally, there could be non-common factors which may affect the overall interaction as well.

For dominant factors, we categorize factors as active or passive, depending on whether they are directly exploited by attacks, which is the case for O1, O2, and O3; the passive factors, such as the data or model configuration, are the rest. Attacks generally exploit active factors, and we deem those dominant, because any change in them results in a significant change in the susceptibility to a particular risk.

There were two cases in our framework where we had to use this notion of dominant factors. First, differential privacy increases the susceptibility to evasion, as shown in prior work. Using our guideline, we found three factors in common, and by identifying the dominant factors, O1 and O3, we could determine that the interaction is red, which matches the empirical results.
Second, group fairness increases the susceptibility to membership inference, and the work that empirically showed this also points out that the distance to the decision boundary plays a role in the susceptibility to the risk. Using our guideline and identifying the dominant factor, we can indicate that the interaction is red, which matches the empirical results.

Now, using this guideline, we conjecture about two unexplored interactions and empirically validate the conjectures. The first is group fairness and data reconstruction. The common factor between them is distinguishability across subgroups, which suggests green, and there is one non-common factor, the number of attributes, which may affect the susceptibility to the risk. We perform the attack when the model is trained without fairness and with fairness, and we find that against the fair model the attack success is lower, which confirms our conjecture.
On evaluating the non-common factor, we find that for a small number of attributes the conjecture holds, but as we increase the number of attributes, the memorization of individual attributes decreases; the attack is then no longer as successful, and there is no change in attack success when using the fair model.

The second interaction is explanations and distribution inference. Here the common factor suggests that there is a negative interaction between them, and there are two non-common factors, the number of attributes and model capacity, which may affect the susceptibility to distribution inference. To validate this empirically, we train one set of models on a training dataset with a distribution α1, which is a ratio of males to females, and another set of models on a dataset with distribution α2. The goal of the attack is then to use the model explanations from both sets of models to predict whether a model was trained on a dataset with α1 or α2.
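A toy version of this attack setup, assuming scikit-learn: the synthetic tabular data, the input-gradient stand-in for an explanation algorithm, and the logistic-regression meta-classifier are all illustrative assumptions, not the paper's pipeline.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)

def make_dataset(ratio_male, n=500):
    """Synthetic tabular data; the prevalence of the first attribute
    ('sex') is the property the adversary wants to infer."""
    sex = (rng.random(n) < ratio_male).astype(float)
    other = rng.normal(0, 1, (n, 3))
    # The label interacts with the sensitive attribute, so the trained
    # model (and hence its explanations) shifts with the sex ratio.
    y = (other[:, 0] * (2 * sex - 1) > 0).astype(int)
    return np.column_stack([sex, other]), y

def explanation_features(ratio):
    """Train one target model and summarize its explanations. For logistic
    regression the input gradient of the positive-class probability is
    w * p * (1 - p); we average it per attribute across the records."""
    X, y = make_dataset(ratio)
    model = LogisticRegression(max_iter=1000).fit(X, y)
    p = model.predict_proba(X)[:, 1]
    grads = model.coef_ * (p * (1 - p))[:, None]  # (n, d) saliency maps
    return grads.mean(axis=0)

# Meta-classifier: distinguish models trained with alpha1 vs. alpha2
# from their explanation summaries alone.
alpha1, alpha2 = 0.2, 0.8
feats = np.array([explanation_features(a) for a in [alpha1] * 40 + [alpha2] * 40])
labels = np.array([0] * 40 + [1] * 40)
idx = rng.permutation(len(labels))
train, test = idx[:60], idx[60:]
meta = LogisticRegression(max_iter=1000).fit(feats[train], labels[train])
acc = meta.score(feats[test], labels[test])
print(f"distribution inference accuracy: {acc:.2f}")  # > 0.5 indicates leakage
```

Accuracy above chance (50%) on held-out target models is the signal that explanations leak the training distribution, which is the effect the talk reports across explanation algorithms.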
Here we find that, across different explanation algorithms, there is an increased susceptibility to distribution inference, as indicated by an attack accuracy greater than 50% for most ratios. We then evaluate how the non-common factors influence the interaction. Starting with the number of attributes: on increasing the number of attributes, we find that the susceptibility to distribution inference decreases, and we attribute this to lower memorization of the relevant attributes required for distribution inference. The second is model capacity: we find that increasing the model capacity results in higher attack success, due to increased memorization.

These examples show that our guideline can be used to conjecture about unexplored interactions; we also validated the guideline's predictions against interactions that have already been considered in prior work. However, there are some exceptions to our guideline.
First, differences in the adversary model can change the interaction type. We find three such cases in our survey where differing adversary assumptions result in different interaction types, but our guideline can only predict one of them, because it does not account for the adversary model. Second, there are some risks and defenses for which too few factors have been evaluated in prior work; as a result, all the interactions corresponding to these defenses and risks are hard to predict using our guideline.

As part of our current work, we are developing a software framework for systematic empirical evaluation of unexplored interactions. This acts as a tool for practitioners and researchers to evaluate their models and assess the risk of using a defense on a particular model. There is also a need to understand how different variants of defenses and risks impact the interactions.

To summarize: unintended interactions are an important concern in practice, and by looking at the common factors between a defense and a risk, we can conjecture about the nature of such interactions. As part of future work, we want to look at how to design defenses which minimize such unintended increases in other risks. Thank you.