1 00:00:10,490 --> 00:00:16,520 hi everyone thanks Daniel for the 2 00:00:13,400 --> 00:00:19,070 introduction so I'm very excited to be 3 00:00:16,520 --> 00:00:20,830 here and today I will present our paper 4 00:00:19,070 --> 00:00:24,410 when the signal is in the noise 5 00:00:20,830 --> 00:00:27,349 exploiting Diffix's sticky noise first 6 00:00:24,410 --> 00:00:29,900 I will give a little bit of context so 7 00:00:27,350 --> 00:00:31,820 suppose you have a nice data set and you 8 00:00:29,900 --> 00:00:35,180 would like to share this data set with 9 00:00:31,820 --> 00:00:38,080 an analyst how do you make sure that the 10 00:00:35,180 --> 00:00:40,580 analyst can analyze the data without 11 00:00:38,080 --> 00:00:42,379 compromising the privacy of the people 12 00:00:40,580 --> 00:00:46,070 who contributed their data 13 00:00:42,379 --> 00:00:48,019 well you could anonymize the data which 14 00:00:46,070 --> 00:00:49,970 essentially means removing direct 15 00:00:48,020 --> 00:00:52,820 identifiers from the records such as 16 00:00:49,970 --> 00:00:54,110 phone number name and so on the problem 17 00:00:52,820 --> 00:00:56,480 is that we know by now that 18 00:00:54,110 --> 00:00:59,660 anonymization does not really work 19 00:00:56,480 --> 00:01:06,470 because anonymous datasets can often be 20 00:00:59,660 --> 00:01:09,020 re-identified so the issue with anonymization 21 00:01:06,470 --> 00:01:11,539 is that in the end the analyst still 22 00:01:09,020 --> 00:01:13,970 gets to see the individual level data 23 00:01:11,539 --> 00:01:17,240 and she can do whatever she wants with 24 00:01:13,970 --> 00:01:20,179 it and this is a problem and so we need 25 00:01:17,240 --> 00:01:23,899 a safer solution and a possible approach 26 00:01:20,180 --> 00:01:26,840 are data query systems so here the idea 27 00:01:23,900 --> 00:01:29,210 is that the data is kept behind a 28 00:01:26,840 --> 00:01:32,060 protected server and the analyst can 29 00:01:29,210 -->
00:01:36,798 send queries to the server and get 30 00:01:32,060 --> 00:01:39,799 results in return but what if the 31 00:01:36,799 --> 00:01:42,880 analyst is malicious in this case the 32 00:01:39,799 --> 00:01:46,070 analyst could try and send queries to 33 00:01:42,880 --> 00:01:48,380 infer sensitive information to extract 34 00:01:46,070 --> 00:01:50,899 some sensitive information about a 35 00:01:48,380 --> 00:01:53,539 single individual in the data set so 36 00:01:50,900 --> 00:01:55,850 intuitively you would like to allow only 37 00:01:53,540 --> 00:01:58,460 queries that perform some kind of 38 00:01:55,850 --> 00:02:01,520 aggregation however in practice this is 39 00:01:58,460 --> 00:02:04,820 not easy to do for example consider the 40 00:02:01,520 --> 00:02:07,429 following query how many people named 41 00:02:04,820 --> 00:02:11,299 Bob have a salary less than 2,000 pounds 42 00:02:07,430 --> 00:02:13,250 of course you would like to block this 43 00:02:11,299 --> 00:02:15,320 kind of query and more generally you 44 00:02:13,250 --> 00:02:18,919 would like to block any query that 45 00:02:15,320 --> 00:02:22,769 selects a small number of users in the 46 00:02:18,919 --> 00:02:25,830 data set however such a measure 47 00:02:22,770 --> 00:02:29,070 is easily circumvented by using two 48 00:02:25,830 --> 00:02:31,080 queries the first one is how many people 49 00:02:29,070 --> 00:02:33,959 have a salary less than 2,000 pounds and 50 00:02:31,080 --> 00:02:36,870 the second one is how many people not 51 00:02:33,960 --> 00:02:40,830 named Bob have a salary less than 2,000 52 00:02:36,870 --> 00:02:43,890 pounds and of course both of these two 53 00:02:40,830 --> 00:02:45,750 queries would select many users 54 00:02:43,890 --> 00:02:47,940 so they would be allowed but by taking 55 00:02:45,750 --> 00:02:52,070 the difference in the output this can be 56 00:02:47,940 --> 00:02:54,960 either 0 or 1 depending on Bob's salary 57 00:02:52,070 -->
00:02:56,940 this is called a differencing 58 00:02:54,960 --> 00:03:01,440 attack but many more attacks are known 59 00:02:56,940 --> 00:03:04,550 in the literature so a core idea to 60 00:03:01,440 --> 00:03:07,710 protect against these kinds of attacks is 61 00:03:04,550 --> 00:03:10,050 random noise addition and here the idea is 62 00:03:07,710 --> 00:03:12,600 that the data curator adds a little bit 63 00:03:10,050 --> 00:03:17,220 of random noise to the output before 64 00:03:12,600 --> 00:03:19,560 releasing them to the analyst and the noise 65 00:03:17,220 --> 00:03:22,170 could be for example drawn from a 66 00:03:19,560 --> 00:03:25,620 normal distribution centered at 0 so 67 00:03:22,170 --> 00:03:30,079 that the smaller values of the 68 00:03:25,620 --> 00:03:33,600 noise are more likely to be extracted 69 00:03:30,080 --> 00:03:34,940 however implementing noise addition is 70 00:03:33,600 --> 00:03:38,280 not easy 71 00:03:34,940 --> 00:03:40,170 and indeed in 2003 Dinur and Nissim 72 00:03:38,280 --> 00:03:43,200 proposed the first reconstruction attack 73 00:03:40,170 --> 00:03:45,030 and without going into details they 74 00:03:43,200 --> 00:03:47,489 showed that if the noise is not enough 75 00:03:45,030 --> 00:03:50,070 then an attacker can reconstruct the 76 00:03:47,490 --> 00:03:53,250 full dataset in polynomial time and 77 00:03:50,070 --> 00:03:57,660 since then the attack has been generalized and 78 00:03:53,250 --> 00:04:01,950 improved many times so how can we add 79 00:03:57,660 --> 00:04:05,040 enough noise and how can we do so what's 80 00:04:01,950 --> 00:04:06,989 the right way to add noise so a possible 81 00:04:05,040 --> 00:04:10,200 answer is given by differential 82 00:04:06,990 --> 00:04:12,270 privacy which was proposed in 2006 and 83 00:04:10,200 --> 00:04:16,500 I'm sure many of you have heard about it 84 00:04:12,270 --> 00:04:18,870 already so differential privacy is a 85 00:04:16,500 --> 00:04:20,519
very broad topic and it's very hard to 86 00:04:18,870 --> 00:04:23,550 make statements that are a hundred 87 00:04:20,519 --> 00:04:25,830 percent correct but I think that most 88 00:04:23,550 --> 00:04:27,900 people would agree that differential 89 00:04:25,830 --> 00:04:32,240 privacy has positive and negative 90 00:04:27,900 --> 00:04:35,429 aspects on the upside it gives 91 00:04:32,240 --> 00:04:36,540 meaningful and provable guarantees of 92 00:04:35,430 --> 00:04:38,310 privacy 93 00:04:36,540 --> 00:04:41,070 and it provides a mathematical framework 94 00:04:38,310 --> 00:04:45,570 to reason about the privacy utility 95 00:04:41,070 --> 00:04:48,000 trade-off on the other hand current 96 00:04:45,570 --> 00:04:51,120 differential privacy mechanisms often 97 00:04:48,000 --> 00:04:55,950 add too much noise to the outputs making 98 00:04:51,120 --> 00:04:57,990 utility pretty bad and moreover it's 99 00:04:55,950 --> 00:05:00,690 pretty hard with differential privacy to 100 00:04:57,990 --> 00:05:03,780 allow many queries to the analyst and to 101 00:05:00,690 --> 00:05:07,800 provide good usability and flexibility 102 00:05:03,780 --> 00:05:10,739 in the platform for the analyst and for 103 00:05:07,800 --> 00:05:13,230 these reasons people have been starting 104 00:05:10,740 --> 00:05:16,130 to look for alternatives to differential 105 00:05:13,230 --> 00:05:18,510 privacy that are perhaps not based on 106 00:05:16,130 --> 00:05:21,690 mathematical guarantees of privacy but 107 00:05:18,510 --> 00:05:25,830 rather on heuristics and Diffix is one 108 00:05:21,690 --> 00:05:27,780 of them so Diffix is a patented 109 00:05:25,830 --> 00:05:30,330 commercial system developed by the 110 00:05:27,780 --> 00:05:32,190 company Aircloak and some researchers 111 00:05:30,330 --> 00:05:35,400 at the Max Planck Institute for Software 112 00:05:32,190 --> 00:05:38,100 Systems and specifically Diffix is a 113 00:05:35,400 --> 00:05:41,190 privacy-preserving database
system that 114 00:05:38,100 --> 00:05:43,580 in practice operates as an SQL proxy 115 00:05:41,190 --> 00:05:46,170 between the analyst and the database and 116 00:05:43,580 --> 00:05:49,919 Diffix provides some unique features 117 00:05:46,170 --> 00:05:52,830 such as a rich SQL syntax little noise 118 00:05:49,920 --> 00:05:56,940 added to the outputs and infinitely many 119 00:05:52,830 --> 00:06:01,770 queries allowed to every analyst and 120 00:05:56,940 --> 00:06:03,930 these features are precisely meant to 121 00:06:01,770 --> 00:06:07,890 address the 122 00:06:03,930 --> 00:06:12,750 limitations of differential privacy so 123 00:06:07,890 --> 00:06:14,849 the way Diffix protects privacy is by 124 00:06:12,750 --> 00:06:17,520 means of a novel noise addition 125 00:06:14,850 --> 00:06:20,940 mechanism that they call sticky noise so 126 00:06:17,520 --> 00:06:24,030 here is how it works so suppose that an 127 00:06:20,940 --> 00:06:26,190 analyst submits a count query Q to Diffix 128 00:06:24,030 --> 00:06:28,739 such as this one that selects all the 129 00:06:26,190 --> 00:06:32,300 users which satisfy 130 00:06:28,740 --> 00:06:36,090 condition one condition two and so on so 131 00:06:32,300 --> 00:06:39,570 Diffix's output to this query would be 132 00:06:36,090 --> 00:06:43,080 the true count of the query plus static 133 00:06:39,570 --> 00:06:44,700 noise plus dynamic noise so without 134 00:06:43,080 --> 00:06:47,940 going into the details of how these 135 00:06:44,700 --> 00:06:49,979 noises are computed the main ideas are 136 00:06:47,940 --> 00:06:53,180 that first the static noise 137 00:06:49,980 --> 00:06:55,890 depends only on the query syntax of Q 138 00:06:53,180 --> 00:06:58,680 the dynamic noise depends on the query 139 00:06:55,890 --> 00:07:01,229 syntax and also on the user set of Q 140 00:06:58,680 --> 00:07:05,220 which is the set of user IDs that are 141 00:07:01,230 --> 00:07:08,330 selected by Q in
the data set and both 142 00:07:05,220 --> 00:07:12,120 noises are sticky which means that 143 00:07:08,330 --> 00:07:14,880 repeating the same query to Diffix will 144 00:07:12,120 --> 00:07:21,090 always give the same noise value for that 145 00:07:14,880 --> 00:07:24,630 query so another thing to keep in 146 00:07:21,090 --> 00:07:26,849 mind about Diffix is that both the 147 00:07:24,630 --> 00:07:30,380 static noise and the dynamic noise are 148 00:07:26,850 --> 00:07:34,470 made of smaller noise values and 149 00:07:30,380 --> 00:07:36,360 specifically one per condition so for 150 00:07:34,470 --> 00:07:39,150 example in a query with three conditions 151 00:07:36,360 --> 00:07:42,570 we would have that the output of Diffix 152 00:07:39,150 --> 00:07:45,479 is the true count plus three noise 153 00:07:42,570 --> 00:07:47,760 values for the static noise and three 154 00:07:45,480 --> 00:07:50,780 noise values for the dynamic noise and 155 00:07:47,760 --> 00:07:55,710 each noise value is drawn from a 156 00:07:50,780 --> 00:07:57,809 standard normal distribution so in the 157 00:07:55,710 --> 00:08:01,590 paper that describes Diffix the authors 158 00:07:57,810 --> 00:08:03,540 explain why this mechanism protects 159 00:08:01,590 --> 00:08:06,000 against some attacks that are 160 00:08:03,540 --> 00:08:08,760 known in the literature and also 161 00:08:06,000 --> 00:08:10,560 they present other measures 162 00:08:08,760 --> 00:08:12,180 that are implemented in Diffix but I 163 00:08:10,560 --> 00:08:16,040 will not cover them in this 164 00:08:12,180 --> 00:08:19,500 presentation okay so I can finally 165 00:08:16,040 --> 00:08:22,170 present our noise exploitation 166 00:08:19,500 --> 00:08:23,760 attacks on Diffix and the reason why we 167 00:08:22,170 --> 00:08:27,360 call them noise exploitation is that 168 00:08:23,760 --> 00:08:29,580 they actually exploit the fact that part 169 00:08:27,360 -->
00:08:32,520 of the noise Diffix adds actually 170 00:08:29,580 --> 00:08:36,689 depends on the data and we can use this 171 00:08:32,520 --> 00:08:39,419 as a signal for sensitive information so 172 00:08:36,690 --> 00:08:42,500 here is the attack model first we take a 173 00:08:39,419 --> 00:08:45,210 data set that has d attributes and 174 00:08:42,500 --> 00:08:48,900 particularly the last attribute is a 175 00:08:45,210 --> 00:08:51,900 secret attribute and the attacker 176 00:08:48,900 --> 00:08:56,040 targets one user at a time which we call 177 00:08:51,900 --> 00:08:56,850 Bob and the attacker's goal is to infer 178 00:08:56,040 --> 00:08:59,520 Bob's 179 00:08:56,850 --> 00:09:01,740 secret attribute s so for 180 00:08:59,520 --> 00:09:03,689 simplicity here we assume that the 181 00:09:01,740 --> 00:09:06,030 secret attribute is binary but actually it 182 00:09:03,690 --> 00:09:09,180 can be generalized to non-binary 183 00:09:06,030 --> 00:09:11,240 attributes we also assume that the 184 00:09:09,180 --> 00:09:15,239 attacker has some auxiliary information 185 00:09:11,240 --> 00:09:18,240 about Bob first she knows that Bob's 186 00:09:15,240 --> 00:09:20,040 record is in the data set and second she 187 00:09:18,240 --> 00:09:23,760 knows the value 188 00:09:20,040 --> 00:09:25,500 of K attributes about Bob and here's an 189 00:09:23,760 --> 00:09:28,200 example with D equals three and K equals 190 00:09:25,500 --> 00:09:30,360 two so as before we can have a 191 00:09:28,200 --> 00:09:33,870 data set that includes attributes age 192 00:09:30,360 --> 00:09:36,540 department and high salary the secret 193 00:09:33,870 --> 00:09:39,660 attribute would be high salary and Bob's 194 00:09:36,540 --> 00:09:41,520 record would be age equals 40 195 00:09:39,660 --> 00:09:43,800 department equals computing and high 196 00:09:41,520 --> 00:09:46,260 salary equals true but what the attacker 197 00:09:43,800 -->
00:09:48,120 would know is only that Bob is 40 198 00:09:46,260 --> 00:09:50,430 years old and is in the Department of 199 00:09:48,120 --> 00:09:55,340 computing and she would like to find out 200 00:09:50,430 --> 00:09:59,250 that Bob has high salary equals true so 201 00:09:55,340 --> 00:10:04,230 here's our first attack which we call the 202 00:09:59,250 --> 00:10:06,780 differential attack and assume that Bob 203 00:10:04,230 --> 00:10:08,850 is the only person in the data set who 204 00:10:06,780 --> 00:10:11,550 is 40 years old and is in the 205 00:10:08,850 --> 00:10:14,190 department of computing then we would 206 00:10:11,550 --> 00:10:17,819 issue the two queries that you see at 207 00:10:14,190 --> 00:10:20,040 the top we would get the answers from 208 00:10:17,820 --> 00:10:23,000 Diffix with the noise and we would 209 00:10:20,040 --> 00:10:28,319 consider the difference between the two 210 00:10:23,000 --> 00:10:30,150 outputs so q1 minus q2 would be of 211 00:10:28,320 --> 00:10:34,260 course the difference between the true 212 00:10:30,150 --> 00:10:38,430 counts plus all the noise layers for q1 213 00:10:34,260 --> 00:10:41,189 and all the noise layers for q2 and this 214 00:10:38,430 --> 00:10:43,410 is quite a bit of noise but actually 215 00:10:41,190 --> 00:10:45,690 with a little bit of work we can see 216 00:10:43,410 --> 00:10:49,110 that some of the noise layers are the 217 00:10:45,690 --> 00:10:51,840 same and actually even more 218 00:10:49,110 --> 00:10:55,140 interestingly we can see that some noise 219 00:10:51,840 --> 00:10:59,370 layers cancel out depending on Bob's 220 00:10:55,140 --> 00:11:02,880 attribute specifically if Bob has high 221 00:10:59,370 --> 00:11:07,110 salary equals true then four static noise 222 00:11:02,880 --> 00:11:10,770 layers cancel out but if Bob has not 223 00:11:07,110 --> 00:11:11,220 high salary then also four dynamic noise 224 00:11:10,770 --> 00:11:16,560 layers 225 00:11:11,220 -->
00:11:17,490 cancel out so what this means is that q1 226 00:11:16,560 --> 00:11:21,170 minus q2 227 00:11:17,490 --> 00:11:25,140 follows two different distributions 228 00:11:21,170 --> 00:11:27,719 depending on Bob's secret attribute high 229 00:11:25,140 --> 00:11:29,790 salary and specifically it follows 230 00:11:27,720 --> 00:11:31,890 a normal distribution with 231 00:11:29,790 --> 00:11:35,490 mean zero and a standard deviation 232 00:11:31,890 --> 00:11:37,410 two if high salary equals true and it 233 00:11:35,490 --> 00:11:39,540 follows a normal 234 00:11:37,410 --> 00:11:42,569 distribution with mean one and standard 235 00:11:39,540 --> 00:11:46,230 deviation 2k plus two if the high salary 236 00:11:42,570 --> 00:11:49,470 is false and here K again is the number 237 00:11:46,230 --> 00:11:53,730 of attributes known to the attacker 238 00:11:49,470 --> 00:11:55,170 about the victim so in practice the 239 00:11:53,730 --> 00:11:59,040 differential attack has several 240 00:11:55,170 --> 00:12:01,620 limitations and the main ones are that 241 00:11:59,040 --> 00:12:05,849 first it assumes that Bob is unique in 242 00:12:01,620 --> 00:12:08,700 the data set and second some attack 243 00:12:05,850 --> 00:12:10,560 queries are likely to be suppressed by 244 00:12:08,700 --> 00:12:12,420 an additional measure that Diffix 245 00:12:10,560 --> 00:12:15,750 puts in place and I didn't 246 00:12:12,420 --> 00:12:18,839 explain so ultimately this means that 247 00:12:15,750 --> 00:12:21,390 the accuracy is not great in some cases 248 00:12:18,839 --> 00:12:23,820 for this attack and this is the reason 249 00:12:21,390 --> 00:12:26,040 why we developed a second attack an 250 00:12:23,820 --> 00:12:27,990 improved attack which we call the cloning 251 00:12:26,040 --> 00:12:30,660 attack which achieves much better 252 00:12:27,990 --> 00:12:33,240 accuracy so unfortunately I will not 253 00:12:30,660 --> 00:12:35,100 have
time to describe this attack 254 00:12:33,240 --> 00:12:37,140 because it's quite complicated but I 255 00:12:35,100 --> 00:12:40,709 would like to mention that it relies on 256 00:12:37,140 --> 00:12:44,130 a weaker notion of uniqueness that we 257 00:12:40,709 --> 00:12:46,829 named value uniqueness so we say that 258 00:12:44,130 --> 00:12:49,620 a record is value unique with respect 259 00:12:46,829 --> 00:12:52,410 to a set of attributes if all records 260 00:12:49,620 --> 00:12:55,200 sharing the same attributes also have 261 00:12:52,410 --> 00:12:59,850 the same secret attribute and you can 262 00:12:55,200 --> 00:13:01,709 see an example here so the first record 263 00:12:59,850 --> 00:13:05,250 which is Bob's record is value unique 264 00:13:01,709 --> 00:13:07,410 because it shares the same age and 265 00:13:05,250 --> 00:13:11,190 department attributes with the second 266 00:13:07,410 --> 00:13:12,839 record and it also shares the secret 267 00:13:11,190 --> 00:13:15,149 attribute high salary because it's the 268 00:13:12,839 --> 00:13:17,160 same on the other hand the last record 269 00:13:15,149 --> 00:13:19,950 Alice's record is not value unique because 270 00:13:17,160 --> 00:13:21,759 the high salary attribute is not the 271 00:13:19,950 --> 00:13:26,499 same between the third and 272 00:13:21,759 --> 00:13:30,970 the fourth record I would also like to 273 00:13:26,499 --> 00:13:33,339 clarify that we do not assume that the 274 00:13:30,970 --> 00:13:35,410 attacker knows that Bob's record is 275 00:13:33,339 --> 00:13:36,819 value unique this is because actually 276 00:13:35,410 --> 00:13:40,149 value uniqueness is detected 277 00:13:36,819 --> 00:13:44,738 automatically by our cloning attack with 278 00:13:40,149 --> 00:13:47,079 pretty good confidence and here are the 279 00:13:44,739 --> 00:13:49,869 results of the cloning attack on three 280 00:13:47,079 --> 00:13:52,959 real-world data sets and one synthetic 281 00:13:49,869 --> 00:13:55,839
data set so on the x-axis you have the 282 00:13:52,959 --> 00:13:58,479 number of attributes known to the 283 00:13:55,839 --> 00:14:01,209 attacker about the victim and on the 284 00:13:58,480 --> 00:14:05,100 y-axis you have the fraction of all 285 00:14:01,209 --> 00:14:08,108 records in the data set so the gray line 286 00:14:05,100 --> 00:14:10,600 indicates the fraction of value unique 287 00:14:08,109 --> 00:14:13,739 records and so you see that as K grows 288 00:14:10,600 --> 00:14:17,769 almost all users become value unique and 289 00:14:13,739 --> 00:14:19,769 the black line is the fraction of users 290 00:14:17,769 --> 00:14:23,499 in the data set that are attacked and 291 00:14:19,769 --> 00:14:26,589 correctly inferred and you can see that 292 00:14:23,499 --> 00:14:28,749 for large numbers 293 00:14:26,589 --> 00:14:32,379 of known attributes the number of 294 00:14:28,749 --> 00:14:36,039 correctly inferred users goes almost up 295 00:14:32,379 --> 00:14:39,639 to the entire data set around 90% of all 296 00:14:36,039 --> 00:14:43,239 users so the cloning attack in 297 00:14:39,639 --> 00:14:45,970 its original form uses about a few 298 00:14:43,239 --> 00:14:48,220 hundred queries per user but 299 00:14:45,970 --> 00:14:51,100 actually we modified the attack in a way 300 00:14:48,220 --> 00:14:53,470 that targets about half of the users in 301 00:14:51,100 --> 00:14:57,399 the data set but can work with as little 302 00:14:53,470 --> 00:15:01,449 as 32 queries per user and that's with 303 00:14:57,399 --> 00:15:05,709 still almost perfect accuracy so Aircloak 304 00:15:01,449 --> 00:15:08,019 proposed a patch for our attack and 305 00:15:05,709 --> 00:15:12,339 the patch is supposed to be implemented 306 00:15:08,019 --> 00:15:16,419 in Diffix by the fourth quarter of this 307 00:15:12,339 --> 00:15:19,509 year so the patch essentially removes 308 00:15:16,419 --> 00:15:22,809 dangerous conditions from the queries
309 00:15:19,509 --> 00:15:25,929 and it does so in a way that again 310 00:15:22,809 --> 00:15:28,569 depends on the data and so the technical 311 00:15:25,929 --> 00:15:31,089 details are not yet available for the 312 00:15:28,569 --> 00:15:32,790 patch but our comment is that in our 313 00:15:31,089 --> 00:15:35,010 opinion 314 00:15:32,790 --> 00:15:38,010 the patch does not really address the 315 00:15:35,010 --> 00:15:40,290 core vulnerability that we pointed out 316 00:15:38,010 --> 00:15:42,600 in our attacks namely that data 317 00:15:40,290 --> 00:15:45,360 dependent noise leaks information about 318 00:15:42,600 --> 00:15:48,480 the data and potentially this patch 319 00:15:45,360 --> 00:15:53,580 introduces new vulnerabilities because 320 00:15:48,480 --> 00:15:55,440 it is again data dependent so I'd like 321 00:15:53,580 --> 00:15:58,230 to mention that other attacks have been 322 00:15:55,440 --> 00:16:00,450 proposed on Diffix the first one is a 323 00:15:58,230 --> 00:16:03,810 membership attack by Pyrgelis and 324 00:16:00,450 --> 00:16:07,050 others and it is based on a previous 325 00:16:03,810 --> 00:16:11,010 paper that they published in NDSS in 326 00:16:07,050 --> 00:16:13,349 2019 the second one is a linear 327 00:16:11,010 --> 00:16:16,319 reconstruction attack by Cohen and 328 00:16:13,350 --> 00:16:19,950 Nissim and actually this is based on 329 00:16:16,320 --> 00:16:22,220 the original attack from 2003 but it was 330 00:16:19,950 --> 00:16:29,220 actually tweaked quite a bit to work 331 00:16:22,220 --> 00:16:32,520 against Diffix so to conclude we think 332 00:16:29,220 --> 00:16:36,510 that data query systems are the right 333 00:16:32,520 --> 00:16:41,579 response to the failure of anonymization 334 00:16:36,510 --> 00:16:44,520 and they are the way forward for privacy 335 00:16:41,580 --> 00:16:48,030 preserving data publishing however we 336 00:16:44,520 --> 00:16:52,290 think that correctly implementing data
337 00:16:48,030 --> 00:16:55,079 query systems is hard and for this 338 00:16:52,290 --> 00:16:57,719 reason relying on a single mechanism to 339 00:16:55,080 --> 00:17:01,500 protect privacy such as sticky noise is 340 00:16:57,720 --> 00:17:03,540 risky so we believe that deployed systems 341 00:17:01,500 --> 00:17:06,020 should also implement some defense in 342 00:17:03,540 --> 00:17:08,940 depth measures such as for example 343 00:17:06,020 --> 00:17:12,839 query auditing query rate limiting and so 344 00:17:08,940 --> 00:17:15,329 on but also we think that alternatives 345 00:17:12,839 --> 00:17:18,240 to differential privacy are useful and 346 00:17:15,329 --> 00:17:22,190 actually a system like Diffix with some 347 00:17:18,240 --> 00:17:25,050 modification can achieve a reasonable 348 00:17:22,190 --> 00:17:26,730 privacy utility trade-off in some 349 00:17:25,050 --> 00:17:30,020 settings especially in trusted 350 00:17:26,730 --> 00:17:32,270 environments of course in all cases 351 00:17:30,020 --> 00:17:34,530 transparency is fundamental to give the 352 00:17:32,270 --> 00:17:38,660 researchers the possibility to study the 353 00:17:34,530 --> 00:17:42,420 system and assess potential 354 00:17:38,660 --> 00:17:44,600 vulnerabilities and actually we welcome 355 00:17:42,420 --> 00:17:46,230 the fact that Aircloak decided to 356 00:17:44,600 --> 00:17:50,668 publish 357 00:17:46,230 --> 00:17:52,259 publicly the specification of Diffix 358 00:17:50,669 --> 00:17:55,710 and we hope that they will 359 00:17:52,259 --> 00:17:57,389 continue to do so in the future thank 360 00:17:55,710 --> 00:18:00,040 you for your attention and I'll be happy 361 00:17:57,389 --> 00:18:01,800 to take any questions you may have 362 00:18:00,040 --> 00:18:04,960 [Applause] 363 00:18:01,800 --> 00:18:04,960 [Music] 364 00:18:10,820 --> 00:18:18,229 thanks for the presentation so is it 365 00:18:15,109 --> 00:18:20,320 easy to generalize your technique so 366
00:18:18,229 --> 00:18:26,029 that you can recover non-binary 367 00:18:20,320 --> 00:18:28,489 attributes yes so it's relatively easy 368 00:18:26,029 --> 00:18:31,729 one of the possible ways to do this is 369 00:18:28,489 --> 00:18:34,309 by replacing the last condition which 370 00:18:31,729 --> 00:18:36,919 checks the exact value of the 371 00:18:34,309 --> 00:18:40,940 condition with an inequality and 372 00:18:36,919 --> 00:18:42,739 then essentially narrow down with 373 00:18:40,940 --> 00:18:45,139 inequalities to the right 374 00:18:42,739 --> 00:18:48,019 attribute of course this might require 375 00:18:45,139 --> 00:18:49,758 more queries but actually Diffix is 376 00:18:48,019 --> 00:18:51,979 implemented in a way that allows 377 00:18:49,759 --> 00:18:53,089 infinitely many queries so 378 00:18:51,979 --> 00:19:00,979 this would be actually possible 379 00:18:53,089 --> 00:19:02,208 [inaudible] not in this area 380 00:19:00,979 --> 00:19:04,190 I'm wondering if you can give us a sense 381 00:19:02,209 --> 00:19:06,769 of the impact of this what types of 382 00:19:04,190 --> 00:19:08,389 systems use this kind of setup like who 383 00:19:06,769 --> 00:19:12,729 are customers of this company that 384 00:19:08,389 --> 00:19:15,488 provides Diffix so the customers 385 00:19:12,729 --> 00:19:20,799 for Diffix are not publicly available 386 00:19:15,489 --> 00:19:24,759 but we think that it might be primarily 387 00:19:20,799 --> 00:19:27,918 companies that would like to share data 388 00:19:24,759 --> 00:19:30,399 across different departments of the same 389 00:19:27,919 --> 00:19:32,929 company we are not sure but this is our 390 00:19:30,399 --> 00:19:34,579 guess so 391 00:19:32,929 --> 00:19:36,619 yeah this would be probably I mean 392 00:19:34,579 --> 00:19:39,739 for the moment at least this is 393 00:19:36,619 --> 00:19:46,819 not used to share data as open data for 394
00:19:39,739 --> 00:19:49,240 example okay thank you very much another 395 00:19:46,819 --> 00:19:54,279 round of applause 396 00:19:49,240 --> 00:19:54,279 [Applause]
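
The differential attack described in the talk can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not Diffix's actual implementation: the hash-based seeding, the `salt` parameter (standing in for independent noise seeds), the helper names `sticky_gauss`, `noisy_count`, `make_rows`, `attack_difference` and `simulate`, the toy data set, and the exact pair of queries are all hypothetical, since the talk refers to queries on a slide without spelling them out. The sketch only models what the talk states: one static and one dynamic noise value per condition, each drawn from a standard normal distribution, with static noise depending on the query syntax and dynamic noise depending also on the user set.

```python
import hashlib
import random
import statistics


def sticky_gauss(*key):
    """Deterministic N(0, 1) draw: the same key always yields the same value."""
    digest = hashlib.sha256(repr(key).encode("utf-8")).hexdigest()
    return random.Random(int(digest, 16)).gauss(0.0, 1.0)


def noisy_count(rows, conditions, salt):
    """True count plus one static and one dynamic noise value per condition.

    conditions is a list of (syntax, predicate) pairs. Static noise is keyed
    on the condition syntax only; dynamic noise is also keyed on the user set.
    """
    users = tuple(sorted(r["uid"] for r in rows
                         if all(pred(r) for _, pred in conditions)))
    static = sum(sticky_gauss(salt, "static", syntax)
                 for syntax, _ in conditions)
    dynamic = sum(sticky_gauss(salt, "dynamic", syntax, users)
                  for syntax, _ in conditions)
    return len(users) + static + dynamic


# Hypothetical conditions for the talk's example (k = 2 known attributes).
NOT_AGE_40 = ("age <> 40", lambda r: r["age"] != 40)
DEPT = ("dept = 'computing'", lambda r: r["dept"] == "computing")
LOW_SALARY = ("high_salary = false", lambda r: not r["high_salary"])


def make_rows(bob_high_salary):
    """Toy data set in which Bob (uid 0) is the only 40-year-old in computing."""
    rows = [{"uid": 0, "age": 40, "dept": "computing",
             "high_salary": bob_high_salary}]
    for uid in range(1, 41):
        rows.append({"uid": uid, "age": 20 + uid % 15,
                     "dept": "computing" if uid % 2 else "physics",
                     "high_salary": uid % 3 == 0})
    return rows


def attack_difference(rows, salt):
    """Noisy answer to q1 minus noisy answer to q2; only q1 can contain Bob."""
    q1 = noisy_count(rows, [DEPT, LOW_SALARY], salt)
    q2 = noisy_count(rows, [NOT_AGE_40, DEPT, LOW_SALARY], salt)
    return q1 - q2


def simulate(bob_high_salary, trials=2000):
    """Mean and variance of the difference over many independent noise seeds."""
    rows = make_rows(bob_high_salary)
    diffs = [attack_difference(rows, salt) for salt in range(trials)]
    return statistics.mean(diffs), statistics.pvariance(diffs)
```

In this toy model, `simulate(True)` gives a mean near 0 and a variance near 2, because the two user sets coincide and only the two noise values of the extra condition survive. `simulate(False)` gives a mean near 1 and a variance near 2k + 2 = 6, because the user sets differ and none of the dynamic noise values cancel. The spread of the noise itself therefore hints at Bob's secret attribute.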