1
00:00:10,490 --> 00:00:16,520
hi everyone thanks Daniel for the

2
00:00:13,400 --> 00:00:19,070
introduction so I'm very excited to be

3
00:00:16,520 --> 00:00:20,830
here and today I will present our paper

4
00:00:19,070 --> 00:00:24,410
when the signal is in the noise

5
00:00:20,830 --> 00:00:27,349
exploiting Diffix's sticky noise first

6
00:00:24,410 --> 00:00:29,900
I will give a little bit of context so

7
00:00:27,350 --> 00:00:31,820
suppose you have a nice data set and you

8
00:00:29,900 --> 00:00:35,180
would like to share this data set with

9
00:00:31,820 --> 00:00:38,080
an analyst how do you make sure that the

10
00:00:35,180 --> 00:00:40,580
analyst can analyze the data without

11
00:00:38,080 --> 00:00:42,379
compromising the privacy of the people

12
00:00:40,580 --> 00:00:46,070
who contributed their data

13
00:00:42,379 --> 00:00:48,019
well you could anonymize the data which

14
00:00:46,070 --> 00:00:49,970
essentially means removing direct

15
00:00:48,020 --> 00:00:52,820
identifiers from the records such as

16
00:00:49,970 --> 00:00:54,110
phone number name and so on the problem

17
00:00:52,820 --> 00:00:56,480
is that we know by now that

18
00:00:54,110 --> 00:00:59,660
anonymization does not really work

19
00:00:56,480 --> 00:01:06,470
because anonymized datasets can often be

20
00:00:59,660 --> 00:01:09,020
re-identified so the issue with anonymization

21
00:01:06,470 --> 00:01:11,539
is that in the end the analyst still

22
00:01:09,020 --> 00:01:13,970
gets to see the individual level data

23
00:01:11,539 --> 00:01:17,240
and she can do whatever she wants with

24
00:01:13,970 --> 00:01:20,179
it and this is a problem and so we need

25
00:01:17,240 --> 00:01:23,899
a safer solution and a possible approach

26
00:01:20,180 --> 00:01:26,840
are data query systems so here the idea

27
00:01:23,900 --> 00:01:29,210
is that the data is kept behind a

28
00:01:26,840 --> 00:01:32,060
protected server and the analyst can

29
00:01:29,210 --> 00:01:36,798
send queries to the server and get

30
00:01:32,060 --> 00:01:39,799
results in return but what if the

31
00:01:36,799 --> 00:01:42,880
analyst is malicious in this case the

32
00:01:39,799 --> 00:01:46,070
analyst could try and send queries to

33
00:01:42,880 --> 00:01:48,380
infer sensitive information to extract

34
00:01:46,070 --> 00:01:50,899
some sensitive information about a

35
00:01:48,380 --> 00:01:53,539
single individual in the data set so

36
00:01:50,900 --> 00:01:55,850
intuitively you would like to allow only

37
00:01:53,540 --> 00:01:58,460
queries that perform some kind of

38
00:01:55,850 --> 00:02:01,520
aggregation however in practice this is

39
00:01:58,460 --> 00:02:04,820
not easy to do for example consider the

40
00:02:01,520 --> 00:02:07,429
following query how many people named

41
00:02:04,820 --> 00:02:11,299
Bob have a salary less than 2,000 pounds

42
00:02:07,430 --> 00:02:13,250
of course you would like to block this

43
00:02:11,299 --> 00:02:15,320
kind of query and more generally you

44
00:02:13,250 --> 00:02:18,919
would like to block any query that

45
00:02:15,320 --> 00:02:22,769
selects a small number of users in the

46
00:02:18,919 --> 00:02:25,830
data set however such a mechanism

47
00:02:22,770 --> 00:02:29,070
is easily circumvented by using two

48
00:02:25,830 --> 00:02:31,080
queries the first one is how many people

49
00:02:29,070 --> 00:02:33,959
have a salary less than 2,000 pounds and

50
00:02:31,080 --> 00:02:36,870
the second one is how many people not

51
00:02:33,960 --> 00:02:40,830
named Bob have a salary less than 2,000

52
00:02:36,870 --> 00:02:43,890
pounds and of course these two

53
00:02:40,830 --> 00:02:45,750
queries would select many users

54
00:02:43,890 --> 00:02:47,940
so they would be allowed but by taking

55
00:02:45,750 --> 00:02:52,070
the difference in the output this can be

56
00:02:47,940 --> 00:02:54,960
either 0 or 1 depending on Bob's salary

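The two-query difference attack just described can be sketched in Python as follows. This is a minimal illustration, assuming exact (noise-free) counts and a hypothetical five-row table of (name, salary in pounds) in which Bob is the only person with that name; `count` and `difference_attack` are illustrative names, not part of any real system.

```python
# Hypothetical toy table: each row is (name, salary in pounds).
ROWS = [
    ("Alice", 2500),
    ("Bob", 1800),
    ("Carol", 1500),
    ("Dave", 3200),
    ("Eve", 1900),
]

def count(rows, predicate):
    """Exact count query: how many rows satisfy the predicate."""
    return sum(1 for row in rows if predicate(row))

def difference_attack(rows, target):
    """Infer whether `target` earns less than 2000 using two 'large' queries."""
    q1 = count(rows, lambda r: r[1] < 2000)                     # everyone under 2000
    q2 = count(rows, lambda r: r[0] != target and r[1] < 2000)  # same, minus target
    return q1 - q2 == 1  # the difference is exactly the target's contribution

print(difference_attack(ROWS, "Bob"))    # True  (Bob earns 1800)
print(difference_attack(ROWS, "Alice"))  # False (Alice earns 2500)
```

Both queries select several people, so a rule that only blocks small selections would let them through, yet their difference reveals Bob's salary bracket.
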
57
00:02:52,070 --> 00:02:56,940
this is called a differencing

58
00:02:54,960 --> 00:03:01,440
attack but many more attacks are known

59
00:02:56,940 --> 00:03:04,550
in the literature so a core idea to

60
00:03:01,440 --> 00:03:07,710
protect against these kinds of attacks is

61
00:03:04,550 --> 00:03:10,050
randomness addition and here the idea is

62
00:03:07,710 --> 00:03:12,600
that the data curator adds a little bit

63
00:03:10,050 --> 00:03:17,220
of random noise to the outputs before

64
00:03:12,600 --> 00:03:19,560
releasing them to the analyst and the noise

65
00:03:17,220 --> 00:03:22,170
could be for example drawn from a

66
00:03:19,560 --> 00:03:25,620
normal distribution centered at 0 so

67
00:03:22,170 --> 00:03:30,079
that the smaller values of the

68
00:03:25,620 --> 00:03:33,600
noise are more likely to be drawn

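A minimal sketch of zero-centred Gaussian noise addition applied to the same two salary queries, assuming a hypothetical `noisy_count` helper and an arbitrary noise scale of sigma = 2. It only illustrates the idea described above; it is not Diffix's mechanism and, as written, makes no formal privacy claim.

```python
import random

def noisy_count(rows, predicate, sigma, rng):
    """Exact count plus zero-centred Gaussian noise with standard deviation sigma."""
    true_count = sum(1 for row in rows if predicate(row))
    return true_count + rng.gauss(0.0, sigma)

# Hypothetical (name, salary in pounds) table; sigma = 2 is an arbitrary choice.
rows = [("Alice", 2500), ("Bob", 1800), ("Carol", 1500), ("Dave", 3200), ("Eve", 1900)]
rng = random.Random(0)  # seeded only to make the sketch reproducible

q1 = noisy_count(rows, lambda r: r[1] < 2000, sigma=2.0, rng=rng)
q2 = noisy_count(rows, lambda r: r[0] != "Bob" and r[1] < 2000, sigma=2.0, rng=rng)
# Bob's 0 or 1 is now mixed with noise of standard deviation 2 * sqrt(2)
print(round(q1 - q2, 2))
```

The difference q1 - q2 is now Bob's 0 or 1 plus noise with standard deviation 2 * sqrt(2), so a single pair of answers no longer reveals his salary bracket reliably.
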
69
00:03:30,080 --> 00:03:34,940
however implementing noise addition is

70
00:03:33,600 --> 00:03:38,280
not easy

71
00:03:34,940 --> 00:03:40,170
and indeed in 2003 Dinur and Nissim

72
00:03:38,280 --> 00:03:43,200
proposed the first reconstruction attack

73
00:03:40,170 --> 00:03:45,030
and without going into details they

74
00:03:43,200 --> 00:03:47,489
showed that if the noise is not enough

75
00:03:45,030 --> 00:03:50,070
then an attacker can reconstruct the

76
00:03:47,490 --> 00:03:53,250
full dataset in polynomial time and

77
00:03:50,070 --> 00:03:57,660
since then the attack has been generalized and

78
00:03:53,250 --> 00:04:01,950
improved many times so how can we add

79
00:03:57,660 --> 00:04:05,040
enough noise and how can we do so what's

80
00:04:01,950 --> 00:04:06,989
the right way to add noise so a possible

81
00:04:05,040 --> 00:04:10,200
answer is given by differential

82
00:04:06,990 --> 00:04:12,270
privacy which was proposed in 2006 and

83
00:04:10,200 --> 00:04:16,500
I'm sure many of you have heard about it

84
00:04:12,270 --> 00:04:18,870
already so differential privacy is a

85
00:04:16,500 --> 00:04:20,519
very broad topic and it's very hard to

86
00:04:18,870 --> 00:04:23,550
make statements that are a hundred

87
00:04:20,519 --> 00:04:25,830
percent correct but I think that most

88
00:04:23,550 --> 00:04:27,900
people would agree that differential

89
00:04:25,830 --> 00:04:32,240
privacy has positive and negative

90
00:04:27,900 --> 00:04:35,429
aspects on the upside it gives

91
00:04:32,240 --> 00:04:36,540
meaningful and provable guarantees of

92
00:04:35,430 --> 00:04:38,310
privacy

93
00:04:36,540 --> 00:04:41,070
and it provides a mathematical framework

94
00:04:38,310 --> 00:04:45,570
to reason about the privacy utility

95
00:04:41,070 --> 00:04:48,000
trade-off on the other hand current

96
00:04:45,570 --> 00:04:51,120
differential privacy mechanisms often

97
00:04:48,000 --> 00:04:55,950
add too much noise to the outputs making

98
00:04:51,120 --> 00:04:57,990
utility pretty bad and moreover it's

99
00:04:55,950 --> 00:05:00,690
pretty hard with differential privacy to

100
00:04:57,990 --> 00:05:03,780
allow many queries to the analyst and to

101
00:05:00,690 --> 00:05:07,800
provide good usability and flexibility

102
00:05:03,780 --> 00:05:10,739
in the platform for the analyst and for

103
00:05:07,800 --> 00:05:13,230
these reasons people have started

104
00:05:10,740 --> 00:05:16,130
to look for alternatives to differential

105
00:05:13,230 --> 00:05:18,510
privacy that are perhaps not based on

106
00:05:16,130 --> 00:05:21,690
mathematical guarantees of privacy but

107
00:05:18,510 --> 00:05:25,830
rather on heuristics and Diffix is one

108
00:05:21,690 --> 00:05:27,780
of them so Diffix is a patented

109
00:05:25,830 --> 00:05:30,330
commercial system developed by the

110
00:05:27,780 --> 00:05:32,190
company Aircloak and some researchers

111
00:05:30,330 --> 00:05:35,400
at the Max Planck Institute for Software

112
00:05:32,190 --> 00:05:38,100
Systems and specifically Diffix is a

113
00:05:35,400 --> 00:05:41,190
privacy-preserving database system that

114
00:05:38,100 --> 00:05:43,580
in practice operates as an SQL proxy

115
00:05:41,190 --> 00:05:46,170
between the analyst and the database and

116
00:05:43,580 --> 00:05:49,919
Diffix provides some unique features

117
00:05:46,170 --> 00:05:52,830
such as rich SQL syntax little noise

118
00:05:49,920 --> 00:05:56,940
added to the outputs and infinitely many

119
00:05:52,830 --> 00:06:01,770
queries allowed to every analyst and

120
00:05:56,940 --> 00:06:03,930
these features are precisely meant to

121
00:06:01,770 --> 00:06:07,890
address the

122
00:06:03,930 --> 00:06:12,750
limitations of differential privacy so

123
00:06:07,890 --> 00:06:14,849
the way Diffix protects privacy is by

124
00:06:12,750 --> 00:06:17,520
means of a novel noise addition

125
00:06:14,850 --> 00:06:20,940
mechanism that they call sticky noise so

126
00:06:17,520 --> 00:06:24,030
here is how it works so suppose that an

127
00:06:20,940 --> 00:06:26,190
analyst submits a count query Q to Diffix

128
00:06:24,030 --> 00:06:28,739
such as this one that selects all the

129
00:06:26,190 --> 00:06:32,300
users which satisfy

130
00:06:28,740 --> 00:06:36,090
condition one condition two and so on so

131
00:06:32,300 --> 00:06:39,570
Diffix's output to this query would be

132
00:06:36,090 --> 00:06:43,080
the true count of the query plus static

133
00:06:39,570 --> 00:06:44,700
noise plus dynamic noise so without

134
00:06:43,080 --> 00:06:47,940
going into the details of how these

135
00:06:44,700 --> 00:06:49,979
noises are computed the main ideas are

136
00:06:47,940 --> 00:06:53,180
that first the static noise

137
00:06:49,980 --> 00:06:55,890
depends only on the query syntax of Q

138
00:06:53,180 --> 00:06:58,680
second the dynamic noise depends on the query

139
00:06:55,890 --> 00:07:01,229
syntax and also on the user set of Q

140
00:06:58,680 --> 00:07:05,220
which is the set of user IDs that are

141
00:07:01,230 --> 00:07:08,330
selected by Q in the data set and both

142
00:07:05,220 --> 00:07:12,120
noises are sticky which means that

143
00:07:08,330 --> 00:07:14,880
repeating the same query to Diffix will

144
00:07:12,120 --> 00:07:21,090
always give the same noise value for that

145
00:07:14,880 --> 00:07:24,630
query so another thing to keep in

146
00:07:21,090 --> 00:07:26,849
mind about Diffix is that both the

147
00:07:24,630 --> 00:07:30,380
static noise and the dynamic noise are

148
00:07:26,850 --> 00:07:34,470
made of smaller noise values and

149
00:07:30,380 --> 00:07:36,360
specifically one per condition so for

150
00:07:34,470 --> 00:07:39,150
example in a query with three conditions

151
00:07:36,360 --> 00:07:42,570
we would have that the output of Diffix

152
00:07:39,150 --> 00:07:45,479
is the true count plus three noise

153
00:07:42,570 --> 00:07:47,760
values for the static noise and three

154
00:07:45,480 --> 00:07:50,780
noise values for the dynamic noise and

155
00:07:47,760 --> 00:07:55,710
each noise value is drawn from a

156
00:07:50,780 --> 00:07:57,809
standard normal distribution so in the

157
00:07:55,710 --> 00:08:01,590
paper that describes Diffix the authors

158
00:07:57,810 --> 00:08:03,540
explain why this mechanism protects

159
00:08:01,590 --> 00:08:06,000
against some attacks that are

160
00:08:03,540 --> 00:08:08,760
known in the literature and they also

161
00:08:06,000 --> 00:08:10,560
present other measures

162
00:08:08,760 --> 00:08:12,180
that are implemented in Diffix but I

163
00:08:10,560 --> 00:08:16,040
will not cover them in this

164
00:08:12,180 --> 00:08:19,500
presentation okay so I can finally

165
00:08:16,040 --> 00:08:22,170
present our noise exploitation

166
00:08:19,500 --> 00:08:23,760
attacks on Diffix and the reason why we

167
00:08:22,170 --> 00:08:27,360
call them noise exploitation is that

168
00:08:23,760 --> 00:08:29,580
they actually exploit the fact that part

169
00:08:27,360 --> 00:08:32,520
of the noise Diffix adds actually

170
00:08:29,580 --> 00:08:36,689
depends on the data and we can use this

171
00:08:32,520 --> 00:08:39,419
as a signal for sensitive information so

172
00:08:36,690 --> 00:08:42,500
here is the attack model first we take a

173
00:08:39,419 --> 00:08:45,210
data set that has D attributes and

174
00:08:42,500 --> 00:08:48,900
particularly the last attribute is a

175
00:08:45,210 --> 00:08:51,900
secret attribute and the attacker

176
00:08:48,900 --> 00:08:56,040
targets one user at a time which we call

177
00:08:51,900 --> 00:08:56,850
Bob and the attacker's goal is to infer

178
00:08:56,040 --> 00:08:59,520
Bob's

179
00:08:56,850 --> 00:09:01,740
secret attribute s so for

180
00:08:59,520 --> 00:09:03,689
simplicity here we assume that the

181
00:09:01,740 --> 00:09:06,030
secret attribute is binary but the attack

182
00:09:03,690 --> 00:09:09,180
can be generalized to non-binary

183
00:09:06,030 --> 00:09:11,240
attributes we also assume that the

184
00:09:09,180 --> 00:09:15,239
attacker has some auxiliary information

185
00:09:11,240 --> 00:09:18,240
about Bob first she knows that Bob's

186
00:09:15,240 --> 00:09:20,040
record is in the data set and second she

187
00:09:18,240 --> 00:09:23,760
knows the value

188
00:09:20,040 --> 00:09:25,500
of K attributes about Bob and here's an

189
00:09:23,760 --> 00:09:28,200
example with D equals three and K equals

190
00:09:25,500 --> 00:09:30,360
two so as before we can have a

191
00:09:28,200 --> 00:09:33,870
data set that includes attributes age

192
00:09:30,360 --> 00:09:36,540
department and high salary the secret

193
00:09:33,870 --> 00:09:39,660
attribute would be high salary and Bob's

194
00:09:36,540 --> 00:09:41,520
record would be age equals 40

195
00:09:39,660 --> 00:09:43,800
department equals computing and high

196
00:09:41,520 --> 00:09:46,260
salary equals true but what the attacker

197
00:09:43,800 --> 00:09:48,120
would know is only that Bob is forty

198
00:09:46,260 --> 00:09:50,430
years old and is in Department of

199
00:09:48,120 --> 00:09:55,340
computing and she would like to find out

200
00:09:50,430 --> 00:09:59,250
that Bob has high salary equals true so

201
00:09:55,340 --> 00:10:04,230
here's our first attack which we call

202
00:09:59,250 --> 00:10:06,780
differential attack and assume that Bob

203
00:10:04,230 --> 00:10:08,850
is the only person in the data set which

204
00:10:06,780 --> 00:10:11,550
is forty years old and is in the

205
00:10:08,850 --> 00:10:14,190
department of computing then we would

206
00:10:11,550 --> 00:10:17,819
issue the two queries that you see at

207
00:10:14,190 --> 00:10:20,040
the top we would get the answers from

208
00:10:17,820 --> 00:10:23,000
Diffix with the noise and we would

209
00:10:20,040 --> 00:10:28,319
consider the difference between the two

210
00:10:23,000 --> 00:10:30,150
outputs so q1 minus q2 would be of

211
00:10:28,320 --> 00:10:34,260
course the difference between the true

212
00:10:30,150 --> 00:10:38,430
counts plus all the noise layers for q1

213
00:10:34,260 --> 00:10:41,189
and all the noise layers for q2 and this

214
00:10:38,430 --> 00:10:43,410
is quite a bit of noise but actually

215
00:10:41,190 --> 00:10:45,690
with a little bit of work we can see

216
00:10:43,410 --> 00:10:49,110
that some of the noise layers are the

217
00:10:45,690 --> 00:10:51,840
same and even more

218
00:10:49,110 --> 00:10:55,140
interestingly we can see that some noise

219
00:10:51,840 --> 00:10:59,370
layers cancel out depending on Bob's

220
00:10:55,140 --> 00:11:02,880
attribute specifically if Bob has high

221
00:10:59,370 --> 00:11:07,110
salary equals true then four static noise

222
00:11:02,880 --> 00:11:10,770
layers cancel out but if Bob does not have

223
00:11:07,110 --> 00:11:11,220
high salary then also four dynamic noise

224
00:11:10,770 --> 00:11:16,560
layers

225
00:11:11,220 --> 00:11:17,490
cancel out so what this means is that q1

226
00:11:16,560 --> 00:11:21,170
minus q2

227
00:11:17,490 --> 00:11:25,140
follows two different distributions

228
00:11:21,170 --> 00:11:27,719
depending on Bob's secret attribute high

229
00:11:25,140 --> 00:11:29,790
salary and specifically it follows

230
00:11:27,720 --> 00:11:31,890
a normal distribution with

231
00:11:29,790 --> 00:11:35,490
mean zero and variance

232
00:11:31,890 --> 00:11:37,410
two if high salary equals true and it

233
00:11:35,490 --> 00:11:39,540
follows a normal

234
00:11:37,410 --> 00:11:42,569
distribution with mean one and

235
00:11:39,540 --> 00:11:46,230
variance 2k plus two if high salary

236
00:11:42,570 --> 00:11:49,470
is false and here K again is the number

237
00:11:46,230 --> 00:11:53,730
of attributes known to the attacker

238
00:11:49,470 --> 00:11:55,170
about the victim so in practice the

239
00:11:53,730 --> 00:11:59,040
differential attack has several

240
00:11:55,170 --> 00:12:01,620
limitations and the main ones are that

241
00:11:59,040 --> 00:12:05,849
first it assumes that Bob is unique in

242
00:12:01,620 --> 00:12:08,700
the data set and second some attack

243
00:12:05,850 --> 00:12:10,560
queries are likely to be suppressed by

244
00:12:08,700 --> 00:12:12,420
an additional measure that Diffix

245
00:12:10,560 --> 00:12:15,750
put in place and that I didn't

246
00:12:12,420 --> 00:12:18,839
explain so ultimately this means that

247
00:12:15,750 --> 00:12:21,390
the accuracy is not great in some cases

248
00:12:18,839 --> 00:12:23,820
for this attack and this is the reason

249
00:12:21,390 --> 00:12:26,040
why we developed a second

250
00:12:23,820 --> 00:12:27,990
improved attack which we call the cloning

251
00:12:26,040 --> 00:12:30,660
attack which achieves much better

252
00:12:27,990 --> 00:12:33,240
accuracy so unfortunately I will not

253
00:12:30,660 --> 00:12:35,100
have time to describe this attack

254
00:12:33,240 --> 00:12:37,140
because it's quite complicated but I

255
00:12:35,100 --> 00:12:40,709
would like to mention that it relies on

256
00:12:37,140 --> 00:12:44,130
a weaker notion of uniqueness that we

257
00:12:40,709 --> 00:12:46,829
named value uniqueness so we say that

258
00:12:44,130 --> 00:12:49,620
a record is value unique with respect

259
00:12:46,829 --> 00:12:52,410
to a set of attributes if all records

260
00:12:49,620 --> 00:12:55,200
sharing the same attributes also have

261
00:12:52,410 --> 00:12:59,850
the same secret attribute and you can

262
00:12:55,200 --> 00:13:01,709
see an example here so the first record

263
00:12:59,850 --> 00:13:05,250
which is Bob's record is value unique

264
00:13:01,709 --> 00:13:07,410
because it shares the same age and

265
00:13:05,250 --> 00:13:11,190
department attributes with the second

266
00:13:07,410 --> 00:13:12,839
record and it also shares the secret

267
00:13:11,190 --> 00:13:15,149
attribute high salary because it's the

268
00:13:12,839 --> 00:13:17,160
same on the other hand the last record

269
00:13:15,149 --> 00:13:19,950
Alice's record is not value unique because

270
00:13:17,160 --> 00:13:21,759
the high salary attribute is not the

271
00:13:19,950 --> 00:13:26,499
same between the 3rd and

272
00:13:21,759 --> 00:13:30,970
the fourth record I would also like to

273
00:13:26,499 --> 00:13:33,339
clarify that we do not assume that the

274
00:13:30,970 --> 00:13:35,410
attacker knows that Bob's record is

275
00:13:33,339 --> 00:13:36,819
value unique this is because actually

276
00:13:35,410 --> 00:13:40,149
value uniqueness is detected

277
00:13:36,819 --> 00:13:44,738
automatically by our cloning attack with

278
00:13:40,149 --> 00:13:47,079
pretty good confidence and here are the

279
00:13:44,739 --> 00:13:49,869
results of the cloning attack on three

280
00:13:47,079 --> 00:13:52,959
real-world data sets and one synthetic

281
00:13:49,869 --> 00:13:55,839
data set so on the x-axis you have the

282
00:13:52,959 --> 00:13:58,479
number of attributes known to the

283
00:13:55,839 --> 00:14:01,209
attacker about the victim and on the

284
00:13:58,480 --> 00:14:05,100
y-axis you have the fraction of all

285
00:14:01,209 --> 00:14:08,108
records in the data set so the gray line

286
00:14:05,100 --> 00:14:10,600
indicates the fraction of value unique

287
00:14:08,109 --> 00:14:13,739
records and so you see that as K grows

288
00:14:10,600 --> 00:14:17,769
almost all users become value unique and

289
00:14:13,739 --> 00:14:19,769
the black line is the fraction of users

290
00:14:17,769 --> 00:14:23,499
in the data set that are attacked and

291
00:14:19,769 --> 00:14:26,589
correctly inferred and you can see that

292
00:14:23,499 --> 00:14:28,749
for large numbers

293
00:14:26,589 --> 00:14:32,379
of known attributes the number of

294
00:14:28,749 --> 00:14:36,039
correctly inferred users goes almost up

295
00:14:32,379 --> 00:14:39,639
to the entire data set around 90% of all

296
00:14:36,039 --> 00:14:43,239
users so the cloning attack in

297
00:14:39,639 --> 00:14:45,970
its original form uses a few

298
00:14:43,239 --> 00:14:48,220
hundred queries per user but

299
00:14:45,970 --> 00:14:51,100
actually we modified the attack in a way

300
00:14:48,220 --> 00:14:53,470
that targets about half of the users in

301
00:14:51,100 --> 00:14:57,399
the data set but can work with as few

302
00:14:53,470 --> 00:15:01,449
as 32 queries per user and still achieves

303
00:14:57,399 --> 00:15:05,709
almost perfect accuracy so Aircloak

304
00:15:01,449 --> 00:15:08,019
proposed a patch for our attack and

305
00:15:05,709 --> 00:15:12,339
the patch is supposed to be implemented

306
00:15:08,019 --> 00:15:16,419
in Diffix by the fourth quarter of this

307
00:15:12,339 --> 00:15:19,509
year so the patch essentially removes

308
00:15:16,419 --> 00:15:22,809
dangerous conditions from the queries

309
00:15:19,509 --> 00:15:25,929
and it does so in a way that again

310
00:15:22,809 --> 00:15:28,569
depends on the data and so the technical

311
00:15:25,929 --> 00:15:31,089
details are not yet available for their

312
00:15:28,569 --> 00:15:32,790
patch but our comment is that in our

313
00:15:31,089 --> 00:15:35,010
opinion

314
00:15:32,790 --> 00:15:38,010
the patch does not really address the

315
00:15:35,010 --> 00:15:40,290
core vulnerability that we pointed out

316
00:15:38,010 --> 00:15:42,600
in our attacks namely that data

317
00:15:40,290 --> 00:15:45,360
dependent noise leaks information about

318
00:15:42,600 --> 00:15:48,480
the data and potentially this patch

319
00:15:45,360 --> 00:15:53,580
introduces new vulnerabilities because

320
00:15:48,480 --> 00:15:55,440
it is again data dependent so I'd like

321
00:15:53,580 --> 00:15:58,230
to mention that other attacks have been

322
00:15:55,440 --> 00:16:00,450
proposed on Diffix the first one is a

323
00:15:58,230 --> 00:16:03,810
membership attack by jollies and

324
00:16:00,450 --> 00:16:07,050
others and it is based on a previous

325
00:16:03,810 --> 00:16:11,010
paper that they published in NDSS in

326
00:16:07,050 --> 00:16:13,349
2019 the second one is a linear

327
00:16:11,010 --> 00:16:16,319
reconstruction attack by Aloni Cohen and

328
00:16:13,350 --> 00:16:19,950
Nissim and actually this is based on

329
00:16:16,320 --> 00:16:22,220
the original attack from 2003 but it was

330
00:16:19,950 --> 00:16:29,220
actually tweaked quite a bit to work

331
00:16:22,220 --> 00:16:32,520
against Diffix so to conclude we think

332
00:16:29,220 --> 00:16:36,510
that data query systems are the right

333
00:16:32,520 --> 00:16:41,579
response to the failure of anonymization

334
00:16:36,510 --> 00:16:44,520
and they are the way forward for privacy

335
00:16:41,580 --> 00:16:48,030
preserving data publishing however we

336
00:16:44,520 --> 00:16:52,290
think that correctly implementing data

337
00:16:48,030 --> 00:16:55,079
query systems is hard and for this

338
00:16:52,290 --> 00:16:57,719
reason relying on a single mechanism to

339
00:16:55,080 --> 00:17:01,500
protect privacy such as sticky noise is

340
00:16:57,720 --> 00:17:03,540
risky so we believe that deployed systems

341
00:17:01,500 --> 00:17:06,020
should also implement some defense in

342
00:17:03,540 --> 00:17:08,940
depth measures such as for example

343
00:17:06,020 --> 00:17:12,839
pre-auditing, query rate limiting and so

344
00:17:08,940 --> 00:17:15,329
on but also we think that alternatives

345
00:17:12,839 --> 00:17:18,240
to differential privacy are useful and

346
00:17:15,329 --> 00:17:22,190
actually a system like Diffix with some

347
00:17:18,240 --> 00:17:25,050
modification can achieve a reasonable

348
00:17:22,190 --> 00:17:26,730
privacy utility trade-off in some

349
00:17:25,050 --> 00:17:30,020
settings especially in trusted

350
00:17:26,730 --> 00:17:32,270
environments of course in all cases

351
00:17:30,020 --> 00:17:34,530
transparency is fundamental to give the

352
00:17:32,270 --> 00:17:38,660
researchers the possibility to study the

353
00:17:34,530 --> 00:17:42,420
system and assess potential

354
00:17:38,660 --> 00:17:44,600
vulnerabilities and actually we welcome

355
00:17:42,420 --> 00:17:46,230
the fact that Aircloak decided to

356
00:17:44,600 --> 00:17:50,668
publish

357
00:17:46,230 --> 00:17:52,259
publicly the specification of Diffix

358
00:17:50,669 --> 00:17:55,710
and we hope that they will

359
00:17:52,259 --> 00:17:57,389
continue to do so in the future thank

360
00:17:55,710 --> 00:18:00,040
you for your attention and I'll be happy

361
00:17:57,389 --> 00:18:01,800
to take any questions you may have

362
00:18:00,040 --> 00:18:04,960
[Applause]

363
00:18:01,800 --> 00:18:04,960
[Music]

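The sticky-noise mechanism and the differential attack described in the talk can be simulated with a toy model. This is a sketch under stated assumptions, not Diffix's actual implementation: each condition contributes one static N(0, 1) layer seeded by a hash of its text and one dynamic N(0, 1) layer seeded by its text plus the selected user IDs; the query pair below is a hypothetical construction that reproduces the N(0, 2) versus N(1, 2k + 2) split (mean, variance), not necessarily the exact queries from the paper; and Bob is assumed unique on his k known attributes. The names `layer`, `sticky_count`, `differential_attack` and `toy_rows` are illustrative.

```python
import hashlib
import math
import random

def layer(seed_text):
    """One N(0, 1) noise value derived deterministically from seed_text, so it is 'sticky'."""
    digest = hashlib.sha256(seed_text.encode("utf-8")).digest()
    return random.Random(int.from_bytes(digest[:8], "big")).gauss(0.0, 1.0)

def sticky_count(rows, conditions):
    """Toy count query. conditions: list of (text, predicate); text stands in for SQL syntax."""
    selected = [r for r in rows if all(pred(r) for _, pred in conditions)]
    user_set = ",".join(sorted(str(r["uid"]) for r in selected))
    noise = 0.0
    for text, _ in conditions:
        noise += layer("static|" + text)                    # depends on syntax only
        noise += layer("dynamic|" + text + "|" + user_set)  # syntax and user set
    return len(selected) + noise

def log_normal_pdf(x, mean, var):
    """Log-density of N(mean, var) at x."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)

def differential_attack(rows, known, secret):
    """Guess Bob's binary attribute `secret` from his k known values (Bob assumed unique)."""
    (a1, v1), *rest = known.items()
    shared = [(f"{secret} = False", lambda r: not r[secret])]
    shared += [(f"{a} = {v!r}", lambda r, a=a, v=v: r[a] == v) for a, v in rest]
    q1 = sticky_count(rows, shared)
    q2 = sticky_count(rows, shared + [(f"{a1} != {v1!r}", lambda r: r[a1] != v1)])
    k = len(known)
    # The k shared static layers always cancel. If Bob's secret is True he is in
    # neither user set, the sets coincide and the shared dynamic layers cancel too:
    # q1 - q2 ~ N(0, 2). Otherwise only q1 selects him: q1 - q2 ~ N(1, 2k + 2).
    diff = q1 - q2
    return log_normal_pdf(diff, 0.0, 2.0) > log_normal_pdf(diff, 1.0, 2 * k + 2)

def toy_rows(trial, bob_secret):
    """Hypothetical table; Bob is the only user with age 40 + trial in computing."""
    age, base = 40 + trial, 4 * trial  # vary values and IDs so trials get fresh noise
    return [
        {"uid": base, "age": age, "dept": "computing", "high_salary": bob_secret},
        {"uid": base + 1, "age": age, "dept": "maths", "high_salary": True},
        {"uid": base + 2, "age": age + 1, "dept": "computing", "high_salary": False},
        {"uid": base + 3, "age": age, "dept": "maths", "high_salary": False},
    ]

rows = toy_rows(0, True)
print(differential_attack(rows, {"age": 40, "dept": "computing"}, "high_salary"))
```

Because answers are sticky, repeating q1 and q2 returns the same values, so the attacker cannot average the noise away and each guess rests on a single sample. The sketch models only the noise layers and leaves out the other Diffix measures mentioned in the talk, such as the suppression of some queries.
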
364
00:18:10,820 --> 00:18:18,229
thanks for the presentation so is it

365
00:18:15,109 --> 00:18:20,320
easy to generalize your technique so

366
00:18:18,229 --> 00:18:26,029
that you can recover non-binary

367
00:18:20,320 --> 00:18:28,489
attributes yes so it's a relatively easy

368
00:18:26,029 --> 00:18:31,729
one of the possible ways to do this is

369
00:18:28,489 --> 00:18:34,309
by replacing the last condition which checks

370
00:18:31,729 --> 00:18:36,919
the exact value of the secret attribute

371
00:18:34,309 --> 00:18:40,940
with an inequality and

372
00:18:36,919 --> 00:18:42,739
then essentially narrowing down with

373
00:18:40,940 --> 00:18:45,139
inequalities to the right

374
00:18:42,739 --> 00:18:48,019
attribute of course this might require

375
00:18:45,139 --> 00:18:49,758
more queries but actually Diffix is

376
00:18:48,019 --> 00:18:51,979
implemented in a way that allows

377
00:18:49,759 --> 00:18:53,089
infinitely many queries so

378
00:18:51,979 --> 00:19:00,979
this would actually be possible

379
00:18:53,089 --> 00:19:02,208
yesterday photos was not in this area

380
00:19:00,979 --> 00:19:04,190
I'm wondering if you can give us a sense

381
00:19:02,209 --> 00:19:06,769
of the impact of this what types of

382
00:19:04,190 --> 00:19:08,389
systems use this kind of setup like who

383
00:19:06,769 --> 00:19:12,729
are customers of this company that

384
00:19:08,389 --> 00:19:15,488
provides Diffix so the customers

385
00:19:12,729 --> 00:19:20,799
of Diffix are not publicly available

386
00:19:15,489 --> 00:19:24,759
but we think that it might be primarily

387
00:19:20,799 --> 00:19:27,918
companies that would like to share data

388
00:19:24,759 --> 00:19:30,399
across different departments of the same

389
00:19:27,919 --> 00:19:32,929
company we are not sure but this is our

390
00:19:30,399 --> 00:19:34,579
guess so

391
00:19:32,929 --> 00:19:36,619
yeah this would probably be the case I mean

392
00:19:34,579 --> 00:19:39,739
for the moment at least this is

393
00:19:36,619 --> 00:19:46,819
not used to share data as open data for

394
00:19:39,739 --> 00:19:49,240
example okay thank you very much not

395
00:19:46,819 --> 00:19:54,279
around here class

396
00:19:49,240 --> 00:19:54,279
[Applause]
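
The value-uniqueness notion defined in the talk can be checked directly. This is a small sketch, assuming records are Python dicts; apart from Bob's record (age 40, computing, high salary true), the rows are hypothetical stand-ins for the slide example, and `is_value_unique` and `fraction_value_unique` are illustrative names. The second function computes the share of value-unique records, the quantity the talk describes as the gray line in the results plot.

```python
from collections import defaultdict

def is_value_unique(rows, target, known_attrs, secret):
    """True if every record matching target on known_attrs also shares target's secret."""
    key = tuple(target[a] for a in known_attrs)
    return all(r[secret] == target[secret]
               for r in rows if tuple(r[a] for a in known_attrs) == key)

def fraction_value_unique(rows, known_attrs, secret):
    """Share of records that are value unique with respect to known_attrs."""
    secrets_seen = defaultdict(set)  # known-attribute values -> secret values observed
    for r in rows:
        secrets_seen[tuple(r[a] for a in known_attrs)].add(r[secret])
    hits = sum(len(secrets_seen[tuple(r[a] for a in known_attrs)]) == 1 for r in rows)
    return hits / len(rows)

# Hypothetical rows mirroring the slide: Bob and Dan agree on everything,
# Carol and Alice agree on age and department but not on high_salary.
rows = [
    {"name": "Bob", "age": 40, "dept": "computing", "high_salary": True},
    {"name": "Dan", "age": 40, "dept": "computing", "high_salary": True},
    {"name": "Carol", "age": 35, "dept": "maths", "high_salary": False},
    {"name": "Alice", "age": 35, "dept": "maths", "high_salary": True},
]
known = ["age", "dept"]
print(is_value_unique(rows, rows[0], known, "high_salary"))  # True (Bob)
print(is_value_unique(rows, rows[3], known, "high_salary"))  # False (Alice)
print(fraction_value_unique(rows, known, "high_salary"))     # 0.5
```

By definition, a record that is unique on the known attributes is also value unique, since the only record it has to agree with is itself.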