It is difficult to strike a balance between the challenges of research based on open source information (OSINF) and the benefits that society derives from this type of research. The potential harms to individuals need to be clearly identified and addressed, while recognizing that shortcomings might derive from: (i) the very nature of the research (e.g., using social media as the source of the collected data), and (ii) the data protection legal framework. The challenges, and the effects that research conducted on the basis of open source information has for individuals, are briefly discussed below.
Open Source Information from Social Media and the Nature of Research
Research based on open source information from social media presents many advantages. However, before engaging in such research, one needs to consider the challenges created by the complexity of interactions between individuals, groups, and technical systems in the digital world [1]. These challenges include the self-selecting nature of social media users, inequalities in access to social media platforms and data, the difficulty that analysts face in obtaining meaning from heterogeneous data of variable quality and provenance, and a dependency on observing and interpreting what is ‘out there’ in a way that differs from traditional approaches [2].
The greatest challenges for researchers making use of publicly available data from social media are undoubtedly the ethical ones [3]. Firstly, there might be variable perceptions of, and unclear boundaries between, ‘public’ and ‘private’ spaces. Secondly, there might be difficulties in ensuring anonymity and preserving the privacy of data subjects, whose identities may not be disguised or may be easily deduced from their personal postings and affiliations. Thirdly, data that might reveal sensitive information, as well as data from minors and other vulnerable individuals, might be inadvertently processed [4].
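The anonymity challenge described above can be illustrated with a minimal sketch. The record layout, field names, and salted-hash approach below are illustrative assumptions, not a prescribed method: salted hashing pseudonymizes account identifiers while keeping the dataset analyzable, but, as the text notes, the free-text content itself may still allow identities to be deduced.

```python
import hashlib
import secrets

# One random salt per project, stored separately from the data; without it,
# pseudonyms cannot be trivially reversed by re-hashing known usernames.
SALT = secrets.token_bytes(16)

def pseudonymize(username: str) -> str:
    """Replace a username with a truncated salted SHA-256 digest."""
    return hashlib.sha256(SALT + username.encode("utf-8")).hexdigest()[:16]

# Hypothetical harvested records (illustrative only).
posts = [
    {"username": "user_1990", "text": "First post about the journey"},
    {"username": "user_1990", "text": "Second post, same account"},
]

pseudonymized = [
    {"user_id": pseudonymize(p["username"]), "text": p["text"]} for p in posts
]

# The same account maps to the same pseudonym, preserving analytic utility,
# while the raw username no longer appears in the research dataset.
assert pseudonymized[0]["user_id"] == pseudonymized[1]["user_id"]
assert all("username" not in p for p in pseudonymized)
```

Note that under the GDPR pseudonymized data generally remain personal data (art 4(5) [7]) as long as re-identification is possible, so a measure of this kind is a safeguard, not an exemption from the obligations discussed below.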
Awareness of the potential privacy implications of sharing personal information on social media is growing. Previous incidents, such as Facebook’s experiments in emotion manipulation [5] or the use of social media by data analytics companies seeking insights into citizens’ political attitudes and networks in order to influence voter behavior [6], have certainly changed the perception of researchers, but it is not clear to what extent they have influenced the online behavior of individuals.
Open Source Information and Data Protection
From a data protection perspective, the use of open source information that falls under the category of personal data [7] poses challenges for compliance with some of the GDPR principles and standards [8].
Personal Data
“‘personal data’ means any information relating to an identified or identifiable natural person (‘data subject’); an identifiable natural person is one who can be identified, directly or indirectly, in particular by reference to an identifier such as a name, an identification number, location data, an online identifier or to one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that natural person;” — GDPR art 4(1) [7]
Accuracy and Reliability
Open source information from social media is not necessarily accurate or reliable. The data should be updated or rectified when necessary. However, the data are often processed in the state in which they were originally harvested from the social media source. In principle, data subjects have a right to rectify or erase incorrect personal data, as well as to object to and to restrict processing (see GDPR Chapter 3 [7]).
Personal data shall be:
“accurate and, where necessary, kept up to date; every reasonable step must be taken to ensure that personal data that are inaccurate, having regard to the purposes for which they are processed, are erased or rectified without delay (‘accuracy’);” — GDPR art 5(1)(d) [7]
Adequate, Relevant and Proportionate Personal Data
Closely linked to the above consideration is the challenge of determining whether the data used are to be considered adequate at the moment of processing. Individuals might, in the meantime, have modified or deleted the content. Thus, it is difficult to ensure compliance with the right to be forgotten, or erasure [9].
Personal data shall be:
“adequate, relevant and limited to what is necessary in relation to the purposes for which they are processed (‘data minimisation’);” — GDPR art 5(1)(c) [7]
Purpose Limitation
Open source information is published on social media for the purpose of communication with other users. Social media providers host the data for the same purpose. Harvesting the data and processing them for scientific research purposes is recognized by the law as compatible with these initial purposes. However, if the data are further processed beyond the needs of scientific research, there is a risk that the principle of purpose limitation will not be fulfilled.
Personal data shall be:
“collected for specified, explicit and legitimate purposes and not further processed in a manner that is incompatible with those purposes; further processing for archiving purposes in the public interest, scientific or historical research purposes or statistical purposes shall, in accordance with Article 89(1), not be considered to be incompatible with the initial purposes (‘purpose limitation’);” — GDPR art 5(1)(b) [7]
Consent
It is often assumed that individuals who post on social media consent to the use of these data for other purposes. This assumption does not take into account the fact that individuals might often not be aware of the technicalities and privacy filters introduced by social media providers, and thus the publicity of their data is not the result of informed consent or of a clear and positive affirmation (see e.g., [5], GDPR art 7 [7], [10], [11]). In the ‘Ethics and data protection’ guidance note, the European Commission requires a case-by-case assessment of the use of these data for research, stating that: “If your research project uses data from social media networks and you do not intend to seek the data subjects’ explicit consent to the use of their data, you must assess whether those persons actually intended to make their information public (e.g. in the light of the privacy settings or limited audience to which the data were made available). It is not enough that the data be accessible; they must have been made public to the extent that the data subjects do not have any reasonable expectation of privacy”.
Open Source Information and Risks for Individuals
Apart from the challenges of research discussed above, the potential harms for individuals need to be assessed. For example, given the sensitive context of research dealing with migration issues and the potential vulnerability of individuals, one needs to consider the risks that profiling individuals as ‘migrants’ might entail. While it is not the intention of research to directly affect any individuals, there is a risk of potential misuse of data and of potential harm caused by any reported research findings. The profiling of individuals as ‘migrants’ alone could expose them to harm, including hate speech, detention, removal, and, for people fleeing persecution, potential pressures from homeland authorities on family members who remain there [12]. For instance, a picture posted on social media may be used by anti-migration groups to feed racist campaigns [13], or the profiling of social media accounts to predict migration flows and close migration routes may push people toward even more dangerous border crossings [14]. These unwanted consequences contradict the basic ethical standards of scientific research, which aim not to cause any harm to individuals. Thus, potential harms need to be identified and assessed, and proper safeguards for protecting individuals need to be in place.
Dr Jonida Milaj-Weishaar
Dr Jonida Milaj-Weishaar is Assistant Professor in Technology Law and Human Rights at the University of Groningen (the Netherlands) and a member of the Security, Technology and e-Privacy (STeP) research group. Her main research focus is on the challenges that technology creates for the protection of fundamental rights of individuals. She is a research fellow at the Information Society Law Center of the University of Milan (Italy) and a visiting lecturer at the Central University of Political Science and Law in Beijing (China).
Sources:
- Sean A Munson, et al., ‘Sociotechnical challenges and progress in using social media for health’, 15/10 J Med Internet Res e226. (2013)
- Joanna Taylor, Claudia Pagliari, ‘Mining social media data: How are research sponsors and researchers addressing the ethical challenges’, 14/2 Research Ethics 1-39. (2018)
- David M Berry, ‘Internet research: privacy, ethics and alienation: an open source approach’, 14/4 Internet Research 323–332. (2004)
- Albena Kuyumdzhieva, ‘Data Ethics and Ethics Review Process. Ethics compliance under GDPR’. Presentation. (2018)
- Jukka Jouhki, et al., ‘Facebook’s Emotional Contagion Experiment as a Challenge to Research Ethics’, 4/4 Media and Communication 75-85. (2016)
- Jim Isaak, Mina J Hanna, ‘User Data Privacy: Facebook, Cambridge Analytica, and Privacy Protection’, 51/8 Computer 56-59. (2018)
- Regulation (EU) 2016/679 of the European Parliament and of the Council of 27 April 2016 on the protection of natural persons with regard to the processing of personal data and the free movement of such data, and repealing Directive 95/46/EC (GDPR) OJ L119/1, art 4(1). (2016) [pdf]
- Els De Busser, ‘Open Source Data and Criminal Investigations: Anything You Publish Can and Will be Used Against You’, 2/2 GroJIL: Privacy in International Law: Regulating the Internet. (2014) [doi]
- Case C-131/12 Google Spain and Google EU:C:2014:317, para 99.
- Case C-673/17 Planet49 EU:C:2019:801, para 62.
- Case C-61/19 Orange Romania EU:C:2020:901, para 36.