‘Data crawling’ is a technique that collects data from SNS (Social Network Services) and apps for use as AI (Artificial Intelligence) training or research data. However, its legal boundaries remain unclear, which has made it a source of controversy. Let’s look at the current situation and where to go from here. ..........................Ed
With the start of the Fourth Industrial Revolution, ‘Big Data’ has emerged as a new issue. Companies analyze consumers through data collection, provide them with customized information, and promote their products to generate profits. Data collection technology is key to determining the quality of big data services. As a result, collecting as much personal information as possible has become a competitive advantage. On the other hand, when companies that collect and use personal information without permission are discovered by consumers, they face consequences. Even the global company Google has been sued for nearly $5.3 billion for illegally tracking users’ personal information.
Recently in Korea, it was claimed that personal information was collected from users without their permission during the development of the chatbot Lee Luda. The claim was raised through a Cheong Wa Dae National Petition. On January 12, a message was posted on the Cheong Wa Dae National Petition Bulletin Board, saying, “We demand an end to the full-scale service of Scatter Lab, which created AI chatbots through the unauthorized use and leakage of users’ personal information.” The petitioner claimed, “Science of Love, an application launched in 2016, analyzes conversation patterns, such as response time, and shows the degree of affection, by illegally taking users’ data off the platform without any notice or consent.”
Scatter Lab said in an official statement on January 11 that it had used users’ conversations collected through the company’s service ‘Science of Love’ to develop Lee Luda, and that it had suspended the service. Currently, the Personal Information Protection Commission and the Korea Internet & Security Agency (KISA) are investigating the case, and users of Science of Love are reportedly preparing to sue.
The purpose of the Personal Information Protection Act is that citizens should be able to consent to the processing of their personal information of their own free will. However, Scatter Lab’s apps go against this purpose. Therefore, civic groups demanded that the Personal Information Protection Commission investigate all of Scatter Lab’s products, including ‘Science of Love,’ ‘Text At,’ and Lee Luda.
Even before Lee Luda, companies and researchers acquired personal information in a plethora of unethical ways. One such example is the case of the U.S. online dating site ‘OkCupid.’ On May 8, 2016, Danish researchers Emil Kirkegaard and Julius Daugbjerg Bjerrekaer used bots (autonomous programs on the Internet or another network that can interact with systems or users) to collect and compile profiles that OkCupid users had made public. The compiled dataset was then released publicly on a psychology research website. The collected contents included usernames, age, gender identity, physical location, and sexual orientation. When controversy arose, Kirkegaard, the lead researcher of the study, said, “It is public data, and what we have done is just collect and present it in a more useful form.” The dataset is no longer public, but it remains an example of how researchers acquired data through questionable means. Other instances include Harvard University collecting more than 1,700 Facebook profiles in 2008 for research purposes, and former Apple employee Pete Warden collecting more than 100 GB of Facebook profiles, friend lists, and fan pages for the same purpose of ‘research.’
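To give a concrete sense of the bot-based collection described above, the sketch below shows, in miniature, what such a program does: it parses a page’s HTML and extracts profile fields from it. This is an illustrative assumption only; the markup, class names, and fields are invented, not taken from any real site, and it uses only Python’s standard library.

```python
from html.parser import HTMLParser

# Hypothetical profile-page markup, standing in for a fetched SNS page.
SAMPLE_PAGE = """
<html><body>
  <span class="username">user123</span>
  <span class="age">27</span>
  <span class="location">Seoul</span>
</body></html>
"""

class ProfileParser(HTMLParser):
    """Collects the text of <span> tags, keyed by their class attribute."""
    def __init__(self):
        super().__init__()
        self.fields = {}        # extracted field name -> value
        self._current = None    # class of the <span> we are inside, if any

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            self._current = dict(attrs).get("class")

    def handle_data(self, data):
        if self._current and data.strip():
            self.fields[self._current] = data.strip()

    def handle_endtag(self, tag):
        if tag == "span":
            self._current = None

parser = ProfileParser()
parser.feed(SAMPLE_PAGE)
print(parser.fields)  # {'username': 'user123', 'age': '27', 'location': 'Seoul'}
```

A real bot would fetch pages over HTTP and loop over many thousands of profiles; the triviality of automating this at scale is exactly why the questions of notice, consent, and scope of collection raised in the cases above matter so much.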
An example of a company illegally acquiring data for profit is the ‘Facebook-Cambridge Analytica data scandal.’ In 2013, a British company called Cambridge Analytica developed a psychological examination app named ‘This Is Your Digital Life.’ The app rewarded people for completing a questionnaire of several hundred items for “research purposes.” However, it was eventually found that the app collected information, such as Facebook profiles and friend lists, without the users’ consent, going beyond the information collection policy the users had agreed to. Cambridge Analytica was shown to have used the questionnaire answers to construct psychological profiles of the users, and it further analyzed the illegally obtained Facebook information to infer their political stances. This information was then used to target users with advertisements, such as exposing promotional materials of certain politicians to users with matching political profiles. This technology was proven to have been used to assist Ted Cruz and Donald Trump in their presidential runs in 2016. In the case of Trump’s run, his supporters were shown pictures of Trump looking triumphant along with the locations of nearby polling stations, while politically neutral people who were likely to vote for Trump were shown negative images of his then-rival Hillary Clinton.
Cambridge Analytica’s unethical collection of information led to disciplinary action by both the U.S. FTC (Federal Trade Commission) and the British Information Commissioner’s Office following the accounts of whistleblowers in 2018. Facebook was fined $5 billion and £500,000, respectively, “for violating users’ right to privacy,” while Cambridge Analytica filed for bankruptcy.
As unethical data collection grows and diversifies in method, companies and researchers alike are making efforts to protect data ethics. After the then-Google CEO made the company’s first public statement on AI ethics at a developer conference in 2016, Google hired full-time AI policy and ethics staff in 2018, and companies like IBM began hiring AI ethics experts soon after. The research community has since developed and complied with several self-written codes of conduct, including the ‘Oxford-Munich Code of Conduct: Ethical Data Science’ and the Data Science Association’s ‘Code of Conduct.’
Now that Lee Luda has sparked this debate, Korean companies and researchers also need to be alert to the ethics of data collection, and countries should establish clearer legal definitions of the boundaries of data collection. In a society where data on individuals is highly commercialized, here’s hoping that human rights will not be the next thing traded on the market.
Noh Yun-jung (ST Cub-Reporter)
Lee Tae-ran (ST Reporter)