AI Thought Leadership Series | Responsible Business Center: Artificial Intelligence — Who Owns the Content and Output, and How Will Intellectual Property Rights be Protected?
Faculty, Responsible Business Center | Sep 05, 2024 | Gabelli School of Business
This article is the second of a four-part series on AI and responsible business brought to you by the Gabelli School of Business faculty members and Responsible Business Center staff. Stay tuned for future editions of this series!
Two years ago, Columbia University Press (CUP) published a biography of Roger F. Murray that Paul Sonkin and I wrote. Most readers of this essay will not know who Murray is. The full details of his life are not important for this discussion, but a few comments will put him in context. Murray had a long and successful career as a professional money manager at Bankers Trust and then as a full-time Columbia Business School (CBS) professor. He is important because, in 1956, he took over teaching the Security Analysis course that Benjamin Graham had created and taught at CBS for 29 years, starting in 1927. Graham is considered the father of Value Investing, and many see Murray as his natural successor. Murray was also the bridge between Graham and Bruce Greenwald, one of CBS’s most influential tenured professors over the past 30 years. Murray had last taught Security Analysis at CBS in 1977, and the course had not been offered since; in the fall of 1993, Greenwald, with Murray’s assistance, decided to reinstate the Value Investing program at CBS. Paul Sonkin and I decided to write Murray’s biography because he kept the Security Analysis and Value Investing program alive at CBS and continued the legacy Ben Graham had initiated.
ChatGPT, built on GPT-3.5, was launched to the public in November 2022. Shortly after the launch, a student showed me how he had used ChatGPT to help with one of his assignments for my course. I was impressed by the output and a bit mystified by the program he had used (I had not heard of ChatGPT before that discussion). My college roommate had studied neural networks in graduate school in the mid-1980s, so I was familiar with the core technology and its history. Still, like many people, I had mistakenly lost interest in neural nets because the core technology did not seem to be making enough progress. I soon realized that the technology had advanced significantly over the past decade and that ChatGPT and similar large language models (LLMs) were the modern evolution of neural nets. When ChatGPT 4.0 was introduced in March 2023, I decided to learn more about its capabilities, and I was generally impressed with the output. One day, I asked ChatGPT to tell me everything it knew about Roger F. Murray, a subject I knew in depth. I was shocked and disappointed by the output. The response suggested that ChatGPT’s only source was Murray’s obituary, published by The New York Times in 1998, and perhaps a few other random articles that a simple Google search would turn up.
I will admit that I am an avid Wikipedia user. I never assume that the content is entirely accurate, but the entries are an excellent way to get up to speed quickly on a topic. The best part of the platform is the list of sources and references at the end of each entry. Although I have found that many of the links are broken, they make it possible to consult the source material and double-check the claims. My biggest concern with ChatGPT (and LLMs in general) is that there is no way to check facts or verify sources, as I discovered the moment I queried ChatGPT about Murray. In one of his many great essays on the misinterpretation of evolutionary forces, the esteemed evolutionary biologist and prolific writer Stephen J. Gould tells the story of an inaccurate graphical representation of the evolution of the horse that was first published in 1874 and then repeated, unaltered, in several generations of science textbooks. Each of those authors should have caught the mistake, yet the error was copied from one textbook to the next. LLMs are prone to the same failure: they do not know which of the “facts” they collect are accurate and which are simply wrong, and they treat essentially all input the same way. This issue should concern everyone, particularly when biases enter the models.
Writing Murray’s biography was challenging, and it took a lot of effort to find source material on his life. Interestingly, according to Murray’s son, his father was a bit of a packrat who kept every piece of paper he accumulated: research articles he had written, correspondence he had sent and received, internal memos, and other documents. The son told us that after his father died, he had stored all of Murray’s papers in several dozen boxes in his garage. However, a year before we started the project, a severe winter storm flooded the garage, and the water damage was so extensive that he disposed of all the boxes. It was clear that the readily available corpus ChatGPT had to work with when building knowledge about Murray and his life was extremely limited. This realization was a critical “aha” moment for me: LLMs are only as good as the information to which they have access, and in Murray’s case, the amount of information was close to nothing. My second “aha” moment was the realization that our biography was not part of the corpus, which I found interesting, at least initially. However, the more I thought about that specific issue, the more I realized that I did not want our biography to be part of ChatGPT (or any other LLM), because I would not receive any reward, recognition, or remuneration from the output. I doubt many people will consult ChatGPT to learn about Murray, but that does not change the one-sided nature of these models.
A third insight followed quickly: I am not the only creator facing this issue. I have become sympathetic to ALL creators who must deal with the challenge of receiving no reward for being part of a model’s corpus. One of the most critical questions we must resolve is who owns the output from these models. For instance, if two people ask an LLM the same question, they may receive the same, or nearly the same, response. Does either user “own” the output? Can the first user claim priority of ownership and prevent others from using that work product? What if a user wants to give credit to the source material? How would they know who or what to acknowledge? Looking back on my use of ChatGPT, I wonder which prior work or reference material I used without realizing or acknowledging it.
In July 2024, I launched a new publishing company called Partners Media with my close friend Myles Thompson. Myles has over 40 years of publishing experience, and I have purchased (although not read!) over 4,000 books (many of which I have sold back to the Strand in New York City). We knew we would have fun working together to build a next-generation publishing house, as we were both frustrated with the traditional publishing model. Our business will develop in the age of AI, and we realize that we need a plan for dealing with many of the issues outlined in this essay.
Myles tells a story about a well-respected professor who was fired from his tenured position for plagiarism. One of the TAs helping him write questions for his textbook had taken questions from another textbook on the same subject. The professor claimed that he was unaware of the TA’s actions. Nonetheless, the professor had (intentionally or inadvertently) committed plagiarism. This issue is significantly more complicated in a world with LLMs, where a user can reproduce someone else’s work without ever seeing the original. It is not yet clear whether we can even define plagiarism any longer, a question that should concern all academics and one we all need to resolve.
Like many others, I am intrigued by the potential of the new AI models. I have increased my usage and have been generally impressed by the results. However, I think we need clarity on the content issue, both on the input to these models and on the rules for using their output.
Written by: Paul Johnson, adjunct professor and director of the Center for Global Security Analysis, Fordham University, Gabelli School of Business
OpenAI. (2024). ChatGPT 4.0 [Large language model].