Article published on Wed, 15 Feb 2023 17:36:28 GMT
Written by Bart Nijbakker
Many people have recently been asking me about ChatGPT, so let's dive in.
It's the hot topic of the moment: the launch of ChatGPT. Some are declaring a state of emergency, while others seem entirely unaffected. To clarify what I am talking about, let me quote Wikipedia's description of ChatGPT:
ChatGPT (Chat Generative Pre-trained Transformer) is a chatbot developed by OpenAI and launched in November 2022. It is built on top of OpenAI's GPT-3 family of large language models and has been fine-tuned (an approach to transfer learning) using both supervised and reinforcement learning techniques.
Allow me to remind you that this article is based on research that I have done, and for the most part consists of my personal opinion which might not be shared by others. It is always best to do your own research and draw your own conclusions.
ChatGPT is certainly useful in some scenarios. Its vast knowledge of computer code, for example, can help developers debug their programs or find bottlenecks. It can help writers improve their work and scholars save time on their studies. However, the way I see it, the downsides of this seemingly incredible technology outweigh the benefits.
Of course, I had to try ChatGPT myself, but creating an account is required to access it. This surprised me, but I continued, filling in an anonymous name and email address, which most people should be able to do.
However, ChatGPT requires something else in order to register: a phone number. This is a major problem, because in today's society, a phone number is like a social security number. It is directly linked to your real identity, and acquiring an anonymous phone number is impractical, to say the least.
OpenAI will likely justify this requirement by claiming it fights spam and restricts non-human access to the chatbot. This is a valid concern, as submitted text is used to further improve the program. However, a simple CAPTCHA would suffice here. There is no need to know people's identities in order to confirm that they are human.
To me, the fact that people are forced to submit their phone number in order to access ChatGPT shows that OpenAI does not want them to remain anonymous. In fact, the chatbot collects user-submitted text to further train itself. Be aware, therefore, that your conversations with ChatGPT are not confidential: they may resurface in responses to other users of the service or be read by OpenAI employees.
Besides requiring people to reveal their real identity, asking for a phone number actually makes many people, myself included, unable to access the chatbot. It might come as a surprise, but not everyone is willing to hand over their phone number for a seemingly "free" product.
The "Open" in OpenAI implies values like transparency, equal access and public benefit, values of a non-profit. However, OpenAI has turned for-profit in 2019, and accepted an immense multi-billion dollar investment from Microsoft in exchange for sole access to GPT-3's source code.
This is a big red flag to me, because it flies directly in the face of the values OpenAI appears to stand for, such as transparency and free access.
Read more about GPT-3 on Wikipedia.
GPT was built with data from the Common Crawl dataset, a collection of copyrighted articles, internet posts, web pages, and books scraped from 60 million domains over a period of 12 years. TechCrunch reports that this training data includes copyrighted material from the BBC, The New York Times, Reddit, the full text of online books, and more.
Everything ChatGPT "knows" is scraped off the internet. This includes content in the public domain, but also immense amounts of copyrighted work, such as literature, articles, lyrics and source code. ChatGPT does not properly credit its sources, and therefore infringes on their copyright. This is a major problem for artists and authors, but also people's freedoms, as public works are now used in non-public programs.
ChatGPT's knowledge also contains personal and even sensitive information, which can now be presented outside of its original context. In my opinion, this heavily infringes on people's right to privacy, as facts may be presented in a misleading way, and private information may be shared unchecked and in unpredictable ways.
Although ChatGPT was trained using all kinds of information from the public domain, it has a proprietary license and is therefore not in the public domain. GPT-3's source code is also proprietary. This surprised me, again, because the name "OpenAI" suggests that it is publicly available.
I will quote Wikipedia again here, as some of these concerns have already been thoroughly explained there.
It was revealed by a TIME magazine investigation that to build a safety system against toxic content (e.g. sexual abuse, violence, racism, sexism, etc.), OpenAI used outsourced Kenyan workers earning less than $2 per hour to label toxic content. These labels were used to train a model to detect such content in the future. The outsourced laborers were exposed to such toxic and dangerous content that they described the experience as "torture". OpenAI's outsourcing partner was Sama, a training-data company based in San Francisco, California.
Because ChatGPT can generate almost human-like essays and articles, it has become much harder for teachers and institutions to detect generated material. It also becomes harder to trace content back to its original sources, because ChatGPT combines various works in an opaque manner.
Although I am not afraid of doomsday predictions or AI taking over the world just yet, I am worried about our human rights such as privacy and freedom, and the future of important topics such as copyright and the public domain. Programs such as ChatGPT pose a serious threat to our freedoms by restricting access to otherwise publicly accessible information and knowledge.