Hi, I'm Vasisht Duddu, a PhD student in the Secure Systems Group at the University of Waterloo. Today I'll be talking about our systematization of unintended interactions among machine learning defenses and risks. This is joint work with Sebastian Szyller and our advisor, Professor N. Asokan.

Machine learning models are susceptible to a wide range of risks to security, privacy, and fairness. Prior works have proposed defenses to mitigate each of these risks. However, the effectiveness of these defenses is only evaluated with respect to the risks they protect against. In practice, machine learning models have to be deployed with multiple defenses incorporated simultaneously. This raises two questions: first, can two defenses interact negatively with each other? And second, when a defense against a specific risk is effective, can it increase or decrease the susceptibility to some other, unrelated risk? These are what we refer to as unintended interactions.
To answer the first question, we consider unintended interactions among defenses, where prior work, including ours, has shown that combining multiple defenses may result in conflicts. The second question concerns unintended interactions between a defense and an unrelated risk, and here we find that evaluation is limited, because prior work considers only specific defense-risk interactions or does not consider the underlying causes. There is no systematic framework to explore unintended interactions, and this is the focus of our paper.

We present a systematic framework to understand unintended interactions. We conjecture that overfitting and memorization are the underlying causes, and that the factors which influence them are likely to influence these interactions as well. We then survey the existing literature on unintended interactions and situate it within our framework. Finally, we present a guideline for conjecturing about previously unexplored interactions, and we empirically validate the conjectures from the guideline for two previously unexplored interactions.
Let me give some background on the different machine learning risks and defenses considered in our work. We consider risks to security, which include evasion, poisoning, and unauthorized model ownership; several risks to privacy; and discriminatory behavior as a risk to fairness. On the left we have the corresponding defenses that protect against each of these risks. To systematically explore unintended interactions, we consider all pairwise combinations between a defense and an unrelated risk. For instance, for adversarial training, which protects against the risk of evasion, we want to see what its interactions with all the remaining risks other than evasion are.

We previously mentioned that overfitting and memorization are the underlying causes, and there are two reasons for this: first, effective defenses may induce, reduce, or rely on overfitting and memorization; second, all of the risks tend to exploit overfitting and memorization.
Overfitting and memorization are distinct and can occur simultaneously. Overfitting is measured as the difference between the accuracy on the training and test datasets; it is an aggregate metric, as it accounts for all data records in both datasets. Memorization, on the other hand, is a score assigned to individual data records in the training dataset: it is measured as the difference in the model's predictions on a data record with and without that record in the training dataset.

To illustrate the relationship between overfitting and memorization, we consider a simple experiment. We take a synthetic dataset with two classes, an orange class and a blue class, where the training data records are indicated by circles and the crosses indicate test data records. The base case has no overfitting and no memorization, because the training data records are linearly separable and the test data distribution is similar to the training data distribution. When we train a multi-layer perceptron, it learns a linear decision boundary to distinguish between the two classes.

We take this base example and add noise to the test data records of both classes, such that they fall on the wrong side of the decision boundary, as seen here. This results in a decrease in test accuracy, and we have a case with overfitting but no memorization.

Now, instead of adding noise to the test data records, we take the base example and add noise to the training data records. The training data records are now closer together and no longer linearly separable, so the neural network cannot learn a simple linear classifier; instead, it learns a complex decision boundary that fits each of the training data records of both classes perfectly. Here memorization is non-zero, but there is no overfitting.

Finally, we add noise to both the training and test data records of both classes, and this is where we observe overfitting and memorization occurring simultaneously.
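The two metrics and the synthetic experiment can be sketched as follows. This is a minimal illustration assuming scikit-learn; the dataset, the MLP architecture, and the noise levels are made-up stand-ins, not the exact setup from the talk.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
n = 200  # records per class

# Base case: two linearly separable Gaussian blobs (the two classes).
X_train = np.vstack([rng.normal(-2, 0.5, (n, 2)), rng.normal(2, 0.5, (n, 2))])
X_test = np.vstack([rng.normal(-2, 0.5, (n, 2)), rng.normal(2, 0.5, (n, 2))])
y = np.array([0] * n + [1] * n)  # same labels for both splits

def overfitting_gap(noise_train=0.0, noise_test=0.0):
    """Overfitting as an aggregate metric: train accuracy - test accuracy."""
    Xtr = X_train + rng.normal(0, noise_train, X_train.shape) if noise_train else X_train
    Xte = X_test + rng.normal(0, noise_test, X_test.shape) if noise_test else X_test
    clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000, random_state=0)
    clf.fit(Xtr, y)
    return clf.score(Xtr, y) - clf.score(Xte, y)

def memorization_score(Xtr, i):
    """Per-record memorization: change in the model's confidence on record i
    when i is held out of training (a leave-one-out approximation)."""
    kept = np.arange(len(y)) != i
    with_i = MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000,
                           random_state=0).fit(Xtr, y)
    without_i = MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000,
                              random_state=0).fit(Xtr[kept], y[kept])
    return (with_i.predict_proba(Xtr[i:i + 1])[0, y[i]]
            - without_i.predict_proba(Xtr[i:i + 1])[0, y[i]])

print(f"base case gap:  {overfitting_gap():.2f}")                 # ~0: no overfitting
print(f"noisy test gap: {overfitting_gap(noise_test=4.0):.2f}")   # positive: overfitting
```

Calling `memorization_score` on a record of a noisy, non-separable training set would give a non-zero value even when the train-test gap is small, which is the "memorization without overfitting" case from the slide.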
Given the complexity of the datasets and the capacity of the models in the current machine learning paradigm, this is the setting most likely to occur in practice, and it is what we assume for the rest of the paper: overfitting and memorization occur simultaneously.

Now I want to describe our framework for evaluating unintended interactions. The framework consists of the different factors which influence overfitting and memorization, and this allows a fine-grained understanding of what influences these unintended interactions. We start with the factors which influence overfitting. Bias and variance are the two underlying reasons why overfitting occurs. Bias is the error from poor hyperparameter choices for the machine learning model; for example, a very small model, which indicates high bias, prevents learning the relations between attributes and labels well. Variance, on the other hand, is the error from sensitivity to changes in the training dataset; high variance means the model fits the noise in the training dataset.
There is a trade-off between bias and variance which determines overfitting, and this trade-off can be balanced using two factors: the size of the training dataset and the model capacity.

We categorize the factors influencing memorization by whether they relate to the dataset, the objective function, or the model. Among the dataset-related factors, we have the tail length of the distribution, where data records constituting the tail of the distribution are likely to be more memorized. We also note that the number of attributes, and whether the model focuses on learning stable attributes, which do not change with changes in the data distribution, correlate with memorization. The objective-function-related factors include curvature smoothness and the distinguishability of model observables across dataset subgroups and across models themselves, where model observables are essentially predictions or intermediate activations.
Finally, the distance of training data records to the decision boundary also influences memorization. And lastly, we have model capacity, which is a factor that influences both overfitting and memorization.

Given this framework with all these different factors, we situate prior work within it. For each combination of defense and risk, we indicate whether prior work shows an increase in the risk, a decrease in the risk, or that the combination has not been explored. We also indicate whether prior work evaluates the influence of each factor empirically or theoretically, or simply conjectures about it.

We now revisit the different risks and defenses from the background. For each defense D, we indicate whether the effectiveness of the defense correlates with a change in a particular factor, and we also show whether a change in a factor correlates with a change in the susceptibility to a risk R. We use an upward arrow for a positive correlation and a downward arrow for a negative correlation.
Our table enumerates all the defenses, risks, and factors, and how the defenses and risks correlate with those factors.

Using this, we present a guideline by which researchers and practitioners can conjecture about unintended interactions. For a defense D, a risk R, and a common factor F, we use a pair of arrows describing how D and R correspond to F. For a given common factor, if both arrows align, this indicates that the risk increases when the defense is effective, which we depict with a red circle; otherwise, if the arrows do not align, it is a green circle. There can be multiple factors common to a combination of a defense and a risk. If all the common factors suggest the same thing, then the conjectured overall interaction is what those factors indicate; otherwise, we prioritize the conjecture from the dominant factor. This notion of dominance of factors depends on the attack, and we will come to it on the next slide.
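The combination rule of the guideline can be written as a small routine. The encoding below is hypothetical and for illustration only: arrows are +1 (upward) or -1 (downward), and the factor names and arrow values in the example are invented, not taken from the paper's table.

```python
# Hypothetical encoding of the guideline: each common factor F gets a pair
# of arrows, +1 (upward / positive correlation) or -1 (downward), describing
# how the defense D and the risk R each relate to F.
def conjecture(common_factors, dominant=()):
    """common_factors: {factor: (defense_arrow, risk_arrow)}.
    Aligned arrows -> 'red' (the risk increases when the defense is
    effective); opposing arrows -> 'green'. When factors disagree,
    only the dominant (attack-exploited) factors decide."""
    verdicts = {f: "red" if d * r > 0 else "green"
                for f, (d, r) in common_factors.items()}
    if len(set(verdicts.values())) == 1:        # all common factors agree
        return next(iter(verdicts.values()))
    decisive = {verdicts[f] for f in dominant}  # fall back to dominant factors
    assert len(decisive) == 1, "dominant factors must agree"
    return decisive.pop()

# Example with three common factors that disagree: two dominant factors
# both align, so the conjectured interaction is red.
print(conjecture({"F1": (1, 1), "F2": (-1, -1), "F3": (1, -1)},
                 dominant=("F1", "F2")))  # red
```

The `dominant` fallback mirrors the prioritization step described next in the talk: when common factors point in different directions, only the factors directly exploited by the attack decide the conjecture.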
Finally, there could be non-common factors which may affect the overall interaction as well.

For dominant factors, we categorize factors as active or passive, depending on whether they are directly exploited by attacks, which is the case for O1, O2, and O3; the passive factors, such as the data or model configuration, are the rest. Attacks generally exploit active factors, and we deem those dominant, because any change in them results in a significant change in the susceptibility to a particular risk.

There were two cases in our framework where we had to use this notion of dominant factors. First, differential privacy increases the susceptibility to evasion, as shown in prior work. Using our guideline, we found three factors in common, and by identifying the dominant factors, O1 and O3, we could determine that the interaction is red, which matches the empirical results.
Second, group fairness increases the susceptibility to membership inference, and the work that empirically showed this also points out that the distance to the decision boundary plays a role in the susceptibility to the risk. Using our guideline and identifying the dominant factor, we can indicate that the interaction is red, which matches the empirical results.

Now, using this guideline, we conjecture about two unexplored interactions and empirically validate the conjectures. The first is group fairness and data reconstruction. The common factor between them is distinguishability across subgroups, which suggests green, and there is one non-common factor, the number of attributes, which may affect the susceptibility to the risk. We perform the attack when the model is trained without fairness and with fairness, and we find that against the fair model the attack success is lower, which confirms our conjecture.
On evaluating the non-common factor, we find that for a small number of attributes the conjecture holds, but as we increase the number of attributes, the memorization of individual attributes decreases; the attack is then no longer as successful, and there is no change in attack success when using the fair model.

The second interaction is explanations and distribution inference. Here the common factor suggests that there is a negative interaction between them, and there are two non-common factors, the number of attributes and model capacity, which may affect the susceptibility to distribution inference. To validate this empirically, we train one set of models on a training dataset with a distribution α1, which is a ratio of males to females, and another set of models on a dataset with distribution α2. The goal of the attack is then to use the model explanations from both sets of models to predict whether a model was trained on a dataset with α1 or α2.
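A toy version of this attack setup, assuming scikit-learn: the synthetic tabular data, the input-gradient stand-in for an explanation algorithm, and the logistic-regression meta-classifier are all illustrative assumptions, not the paper's pipeline.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)

def make_dataset(ratio_male, n=500):
    """Synthetic tabular data; the prevalence of the first attribute
    ('sex') is the property the adversary wants to infer."""
    sex = (rng.random(n) < ratio_male).astype(float)
    other = rng.normal(0, 1, (n, 3))
    # The label interacts with the sensitive attribute, so the trained
    # model (and hence its explanations) shifts with the sex ratio.
    y = (other[:, 0] * (2 * sex - 1) > 0).astype(int)
    return np.column_stack([sex, other]), y

def explanation_features(ratio):
    """Train one target model and summarize its explanations. For logistic
    regression the input gradient of the positive-class probability is
    w * p * (1 - p); we average it per attribute across the records."""
    X, y = make_dataset(ratio)
    model = LogisticRegression(max_iter=1000).fit(X, y)
    p = model.predict_proba(X)[:, 1]
    grads = model.coef_ * (p * (1 - p))[:, None]  # (n, d) saliency maps
    return grads.mean(axis=0)

# Meta-classifier: distinguish models trained with alpha1 vs. alpha2
# from their explanation summaries alone.
alpha1, alpha2 = 0.2, 0.8
feats = np.array([explanation_features(a) for a in [alpha1] * 40 + [alpha2] * 40])
labels = np.array([0] * 40 + [1] * 40)
idx = rng.permutation(len(labels))
train, test = idx[:60], idx[60:]
meta = LogisticRegression(max_iter=1000).fit(feats[train], labels[train])
acc = meta.score(feats[test], labels[test])
print(f"distribution inference accuracy: {acc:.2f}")  # > 0.5 indicates leakage
```

Accuracy above chance (50%) on held-out target models is the signal that explanations leak the training distribution, which is the effect the talk reports across explanation algorithms.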
Here we find that, across different explanation algorithms, there is an increased susceptibility to distribution inference, as indicated by an attack accuracy greater than 50% for most ratios. We then evaluate how the non-common factors influence the interaction. Starting with the number of attributes: on increasing the number of attributes, we find that the susceptibility to distribution inference decreases, and we attribute this to lower memorization of the relevant attributes required for distribution inference. The second is model capacity: we find that increasing the model capacity results in higher attack success, due to increased memorization.

These examples show that our guideline can be used to conjecture about unexplored interactions; we also validated the guideline's predictions against interactions that have already been considered in prior work. However, there are some exceptions to our guideline.
First, differences in the adversary model can change the interaction type. We find three such cases in our survey where differing adversary assumptions result in different interaction types, but our guideline can only predict one of them, because it does not account for the adversary model. Second, there are some risks and defenses for which too few factors have been evaluated in prior work; as a result, all the interactions corresponding to these defenses and risks are hard to predict using our guideline.

As part of our current work, we are developing a software framework for systematic empirical evaluation of unexplored interactions. This acts as a tool for practitioners and researchers to evaluate their models and assess the risk of using a defense on a particular model. There is also a need to understand how different variants of defenses and risks impact the interactions.

To summarize: unintended interactions are an important concern in practice, and by looking at the common factors between a defense and a risk, we can conjecture about the nature of such interactions. As part of future work, we want to look at how to design defenses which minimize such unintended increases in other risks. Thank you.