At a time of heated debate over each new AI achievement, it is necessary to revisit the notions connected with the database design process that enables the creation of new machine-learning-based solutions. In this context, open source data (including content under Creative Commons licenses) plays an important role. Research results and grassroots analyses indicate that, in recent years, sets of sensitive data have been created from content shared by Internet users on open source sites. Users are not aware of the extent to which their content can be used to develop AI algorithms. Especially controversial is the use of photographs shared on photo hosting portals (such as Flickr) to develop facial recognition algorithms. The research project carried out for AI_Commons (Open Future Foundation) involved a qualitative and quantitative analysis of answers to survey questions addressed to users of photo hosting portals. The respondents were asked about their experience, and their reactions to examples of photographs of faces being included in databases for algorithm development were observed. The results suggest that users of such platforms have limited knowledge about the possible uses of the photos they share, and that they object to the use of those photos for algorithm development, especially in a commercial context. The article expands the discussion of the research results by accounting for such phenomena as the privacy paradox and surveillance apathy. On the basis of this holistic analysis, the author identifies critical areas that require a broader debate among experts and offers guidelines for digital product and service designers who wish to publish open source content.
Keywords: artificial intelligence (AI), UX research, open source, data sets, privacy paradox