A college announces that it will publicly release on its website a dataset containing the grades (total scores) assigned to its students in all the courses offered in one specific year.


1::The names of the courses and, for each course, the grade assigned to each student. Any student identifier (e.g. name, student reference number) will be omitted or replaced by a meaningless reference number. Other personal information about the students (age, sex, etc.) may be included in the list.
3::The names of the courses and, for each course, the grades assigned, without any other information.
2::Only the grades assigned to the students, with no other information (not even the corresponding courses).

Unfortunately, this is not the correct answer. A data list should be considered anonymized if it is not possible to identify any person in it, taking into account all the means reasonably likely to be used by any other person. Replacing identifiers with meaningless reference numbers therefore does not achieve anonymization; it is pseudonymization. For example, if we know that John is the only student who was assigned grade 10 in course A, we can find John's reference number in the list and, from it, all the grades John was assigned in all courses. Even if all unique identifiers are simply omitted, a combination of other personal information may still lead to identification (e.g. if we know that John is the only 21-year-old man in the list).
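The two re-identification routes described above can be sketched in Python. This is a minimal illustration under stated assumptions, not a general re-identification tool: the toy dataset `RELEASE`, its column layout and the helper names (`reference_from_known_grade`, `grades_of`, `grades_by_attributes`) are hypothetical and chosen only to mirror the John example.

```python
# Hypothetical pseudonymized release in the style of option 1: names replaced
# by meaningless reference numbers, age and sex kept. Toy data, not real.
RELEASE = [
    # (reference_number, course, grade, age, sex)
    ("R17", "A", 10, 21, "M"),
    ("R17", "B", 6, 21, "M"),
    ("R17", "C", 8, 21, "M"),
    ("R42", "A", 7, 22, "F"),
    ("R42", "B", 9, 22, "F"),
    ("R08", "A", 7, 21, "F"),
    ("R08", "C", 5, 21, "F"),
]


def reference_from_known_grade(rows, course, grade):
    """Return the reference number if exactly one student has this grade in this course."""
    refs = {ref for ref, c, g, _, _ in rows if c == course and g == grade}
    return next(iter(refs)) if len(refs) == 1 else None


def grades_of(rows, ref):
    """Collect every (course, grade) pair linked to one reference number."""
    return {c: g for r, c, g, _, _ in rows if r == ref}


# The same release with the reference numbers omitted altogether.
NO_IDS = [(course, grade, age, sex) for _, course, grade, age, sex in RELEASE]


def grades_by_attributes(rows, age, sex):
    """Return all (course, grade) pairs whose row matches the given age and sex."""
    return sorted((c, g) for c, g, a, s in rows if a == age and s == sex)


# Background knowledge: John is the only student with grade 10 in course A.
ref = reference_from_known_grade(RELEASE, "A", 10)
print(ref, grades_of(RELEASE, ref))  # R17 {'A': 10, 'B': 6, 'C': 8}

# Background knowledge: John is the only 21-year-old man in the list.
print(grades_by_attributes(NO_IDS, 21, "M"))  # [('A', 10), ('B', 6), ('C', 8)]
```

Both attacks combine only the published list with a piece of background knowledge, which is why such a release is pseudonymized rather than anonymized.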

The correct answer in this scenario is the names of the courses and, for each course, the grades assigned, without any other information. In general, it is quite difficult to achieve full anonymization if a list containing useful data is to be published. Simply removing or replacing personal identifiers does not amount to anonymization; the publisher should apply specific anonymization methods to address the various risks of de-anonymization. For more information, see the ENISA report Privacy by Design in Big Data: https://www.enisa.europa.eu/publications/big-data-protection


This is the correct answer. A data list should be considered anonymized if it is not possible to identify any person in it, taking into account all the means reasonably likely to be used by any other person. In general, it is quite difficult to achieve full anonymization if a list containing useful data is to be published. Simply removing or replacing personal identifiers does not amount to anonymization; the publisher should apply specific anonymization methods to address the various risks of de-anonymization. For more information, see the ENISA report Privacy by Design in Big Data: https://www.enisa.europa.eu/publications/big-data-protection


This is only partially correct. This list, being a plain list of grades (i.e. numbers), is indeed fully anonymized, since it is not possible to identify any person in it, taking into account all the means reasonably likely to be used by any other person. However, it is not useful for the intended purpose of the publication: for example, we cannot compute the average grade assigned in course A. The correct answer in this scenario is the names of the courses and, for each course, the grades assigned, without any other information. In general, it is quite difficult to achieve full anonymization if a list containing useful data is to be published. Simply removing or replacing personal identifiers does not amount to anonymization; the publisher should apply specific anonymization methods to address the various risks of de-anonymization. For more information, see the ENISA report Privacy by Design in Big Data: https://www.enisa.europa.eu/publications/big-data-protection
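The difference in usefulness between a per-course list and a grades-only list can be sketched as follows. The toy data and the names `BY_COURSE`, `GRADES_ONLY` and `course_summary` are hypothetical; the sketch only assumes Python 3.8+ for `statistics.fmean`.

```python
from statistics import fmean

# Hypothetical release in the style of the correct answer: course names and,
# for each course, the grades assigned, with no other information. Toy data.
BY_COURSE = {
    "A": [10, 7, 7],
    "B": [6, 9],
    "C": [8, 5],
}

# Hypothetical "grades only" release: the same grades with course labels dropped.
GRADES_ONLY = [grade for grades in BY_COURSE.values() for grade in grades]


def course_summary(release):
    """Return the average, minimum and maximum grade for each course."""
    return {
        course: {"mean": fmean(grades), "min": min(grades), "max": max(grades)}
        for course, grades in release.items()
    }


summary = course_summary(BY_COURSE)
print(summary["A"])  # {'mean': 8.0, 'min': 7, 'max': 10}
print(sorted(GRADES_ONLY))  # [5, 6, 7, 7, 8, 9, 10]
```

From `GRADES_ONLY` one can still compute an overall average, minimum and maximum, but not per-course figures, because the link between each grade and its course has been discarded.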

The purpose of the release is to allow everyone to process the data further for scientific/statistical analysis, i.e. to compute average, maximum and minimum values and other statistical information on the grades of each course. The college states in its announcement that the list will be fully anonymized.

Which data would you expect to see in such a fully anonymized list?