AI is terrifying, but not for the reasons you think!
The robots are going to take over. That's the fear, isn't it? With the evolution of artificial intelligence moving at an almost incomprehensibly fast pace, it's easy to understand why we get preoccupied with this idea. Everywhere we turn, there are headlines about AI stealing human jobs. Goldman Sachs even published a report last year saying that AI could replace the equivalent of 300 million full-time jobs. Generative AI is more accessible than ever, and workers are anxious.
A PricewaterhouseCoopers (PwC) survey from May 2022 found that almost one-third of respondents were worried about their roles being replaced by technology in the next three years. Creatives worldwide fear that art, as we know it, faces an existential threat from the proliferation of AI. For the first time, we're seriously asking whether human authenticity is still a necessary part of the world.
Of course, the worst fear is that artificial intelligence will reach a point of self-improvement so advanced that it becomes uncontrollable. If AI can teach itself and achieve intelligence superior to us mere mortals, what will become of our future? These doomsday scenarios are an important part of the conversation. The truth is that nobody knows what will happen in 10 or 20 years, let alone 10 or 20 minutes. We can try to predict the path that AI will take, but two short years ago, we were all playing around with the first public release of ChatGPT, completely enthralled by its mere existence. Now it's just a regular part of many people's lives.
Besides, we don't need to preoccupy ourselves with being controlled by robots; there's plenty happening right now that should raise some red flags. Generally speaking, we think advanced technology is synonymous with sustainability, but that's often not the case. There are always trade-offs. The hope is that the technology is beneficial enough to society and the environment that the trade-offs are worth it. It might feel like AI exists out there in the cloud, pinging our computers and phones when we need it, and that feeling isn't entirely wrong. But as we all know, the cloud isn't just floating up in the sky. AI's cloud is built of metal and silicon. It's powered by energy, and every AI query that comes through carries a cost to the planet.
A team of roughly 1,000 researchers joined together to try and address this growing concern. They created an AI model called BLOOM (the BigScience Large Open-science Open-access Multilingual Language Model), built with an emphasis on ethics, transparency, and consent. They found that training even this comparatively environmentally conscious model used as much energy as 30 homes consume in a year and emitted 20 tons of carbon dioxide. Compared to a behemoth like ChatGPT, BLOOM is small potatoes, so AI researchers estimate that bigger models like GPT use at least 20 times more energy. The exact number remains a mystery, though, because tech companies aren't required to disclose information on energy consumption. Not to mention that the current trend in AI follows the rule of "bigger is better."
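To put those figures in perspective, here's a quick back-of-the-envelope calculation using only the numbers cited above. It loosely assumes that emissions scale with energy use, so treat it as an illustration rather than a measurement.

```python
# Rough arithmetic using the figures cited above (reported estimates, not measurements).
bloom_training_co2_tons = 20   # reported CO2 emitted while training BLOOM
low_end_scaling_factor = 20    # researchers' low-end guess for how much more energy bigger models use

# Simplifying assumption: emissions scale roughly with energy use.
larger_model_co2_tons = bloom_training_co2_tons * low_end_scaling_factor
print(f"Low-end estimate for a larger model: {larger_model_co2_tons} tons of CO2")  # -> 400 tons
```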
Large language models like ChatGPT and Google's Gemini grew 2,000 times in size over the last five years. With that growth come inevitable and often undiscussed environmental impacts. Chief among them is the amount of energy computers need to process the enormous volume of information required to run these AI systems. Most of that energy comes from non-renewable sources, which only worsens our climate crisis.
If you want to do something about the climate crisis, then you should check out the sponsor of today's episode: Solar Slice. Solar Slice is a startup that lets you fund the construction of large-scale solar farms, accelerating the transition to clean energy. All you need to do is sponsor a slice of their large-scale solar farm—a solar slice—which adds 50 watts of solar to the grid and reduces harmful emissions. To measure just how much impact you're making, their app allows you to track real-time data on your slice's energy production and carbon savings. As your slice generates clean energy, you earn Eco points, which you can then use to buy more slices, plant trees, or fund other meaningful climate-friendly projects. To make even more impact, you can share your progress with others, create group impact goals with friends, or send solar slices to your eco-conscious friends as gifts. To learn more, visit solarslice.com. There you'll find a link to their Kickstarter campaign, which will help fund the construction of their first solar farm and the development of their app.
Back to our story. Unlike the environmental costs, the growing copyright issues surrounding how these AI models are trained have been discussed extensively. Simply stated, copyright law protects intellectual property and content from being used or sold without permission from the copyright holder. Until recently, the implications were relatively easy to define and, when necessary, to enforce in court.
With AI, it's a different story. Recently, OpenAI was called out for using YouTube videos to train its models. These large language models need massive amounts of data to work effectively. Yes, it's important that they can answer simple questions like what temperature to cook chicken at, but perhaps more importantly, they need to be able to generate coherent, human-like sentences. But how do they learn to talk like a human? From other humans, of course. But is it ethical or legal for a company like OpenAI to scrape online sources like YouTube that might not approve of such scraping?
OpenAI reportedly used its audio transcription model, Whisper, in an attempt to get over the hump of hazy AI copyright law. The model transcribed the audio from YouTube videos into plain text documents, creating the data sources needed to train its AI chatbots. Whisper transcribed over a million hours of YouTube videos uploaded by millions of users, some of whom derive part or all of their income from creating content on the platform. OpenAI knew this was legally questionable but believed it could claim fair use of online content. OpenAI president Greg Brockman was hands-on in collecting videos used in the training, and the company maintains that it uses publicly available data to train its AI models. The scraping violated YouTube's rules, which prohibit using its content for applications independent of the platform.
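The transcription step itself is not exotic; Whisper has an open-source release, and at a small scale the pipeline might look something like the sketch below. The file names and the loop are hypothetical stand-ins, not a description of OpenAI's actual system.

```python
# A minimal sketch of an audio-to-text step using the open-source "whisper"
# package (pip install openai-whisper). File names are hypothetical.
import whisper

# Load a small pretrained speech-recognition model.
model = whisper.load_model("base")

# Hypothetical audio tracks pulled from downloaded videos.
audio_files = ["video_0001.mp3", "video_0002.mp3"]

for path in audio_files:
    # Transcribe the audio track into plain text.
    result = model.transcribe(path)
    # Save the transcript as a text document that could feed a training corpus.
    with open(path.replace(".mp3", ".txt"), "w", encoding="utf-8") as f:
        f.write(result["text"])
```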
Interestingly, Google, which owns YouTube, knew about OpenAI's actions but didn't report them, allegedly because it is doing some content scraping of its own for its Gemini AI model. YouTube isn't the only organization pushing back against AI training. In 2023, the New York Times accused OpenAI of stealing intellectual property and sued both it and Microsoft, OpenAI's financial backer, for copyright infringement. With this move, the Times became the first major American media organization to sue an artificial intelligence company over its content being used to train chatbots. The suit calls for companies like OpenAI to destroy chatbot models and training datasets that include copyrighted New York Times material. It's an early test of the legal issues around generative AI and could have major implications for how large language models are trained.
While the Times understandably has issues with its catalog of millions of articles being used without permission, News Corp, which owns the New York Post and the Wall Street Journal, has taken the polar opposite approach. As of May 2024, the company has a multi-year licensing deal in place, reportedly worth $250 million, that grants OpenAI access to much of its content. OpenAI has also inked deals with Vox Media and The Atlantic, perhaps a concession to the harsh reality that artificial intelligence companies like it will be facing moving forward.
All of the major players building these massive language models are starting to hit the limit of the data available to train them. Google now has a deal with Reddit to license content from the site to train Gemini. Meta even considered buying the book publisher Simon & Schuster outright so it could get access to the company's nearly 100 years of material.
While these companies fight it out over who gets access to what, there are real implications for the people who create this content. Visual artists, musicians, and writers are watching their work show up in AI-generated text and images. This happens when an AI is trained on their texts and images and learns to identify and replicate patterns in the data. For a program to generate music, art, or text, the data it trains on has to be created by humans in the first place. Notable authors including Jonathan Franzen, George R.R. Martin, and John Grisham filed a lawsuit after learning that AI models had absorbed tens of thousands of books. Actress and comedian Sarah Silverman sued Meta and OpenAI for using her memoir as a training text.
Just as with chatbots, it's difficult to identify which art has been used to train these models, because companies like OpenAI, which owns the popular image generator DALL-E, don't disclose their datasets. Others, like Stability AI, which makes the generative AI model Stable Diffusion, are clearer about which data they're using, but they are still taking artists' work without permission or payment. Legal recourse for artists is difficult. Experts are of two minds: some feel that this type of AI training infringes copyright law, while others feel it's still above board and that the lawsuits will fail. The truth is that nobody knows, because we're in uncharted territory.
Some of this territory once seemed like the stuff of science fiction movies. In the 2013 Spike Jonze movie "Her," Joaquin Phoenix's character falls in love with an AI virtual assistant voiced by Scarlett Johansson. Eleven years later, life imitated art when OpenAI announced a new personal assistant called Sky, and it was easy to notice that her voice sounded a lot like Johansson's. Sam Altman, the company's CEO, has noted that "Her" is one of his favorite movies. It turns out he'd been courting Johansson to voice the new AI assistant, but she declined the offer. After hearing how much Sky sounded like her, Johansson threatened legal action against OpenAI.
For actors, politicians, athletes, or anyone else in the public eye, it's easy to see how AI could completely upend someone's life if their image, voice, or likeness is replicated. That upending is already happening right now. And while it is clear that AI companies are knowingly pushing the limits of copyright law, they're also causing harm in less deliberate ways. Whether or not the companies intend it, AI models are inevitably trained on the discriminatory data littered across the internet, and they end up encoding patterns and beliefs that reflect racism, sexism, and other prejudices.
If biased models are deployed in settings like law enforcement, they can do tangible damage to innocent people. For example, if AI models are shown more images of white faces than of darker skin tones, they will have more trouble identifying the features of dark-skinned people. If police use AI to try and catch criminals, the odds are higher that their systems will mistakenly identify dark-skinned individuals. And if AI is used to generate forensic sketches, the model will take all of the biases it has been fed and spit them back out when given prompts like "gang member" or "terrorist," producing a stereotype that could be completely off the mark.
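This is exactly the kind of disparity that bias audits try to surface. As a purely illustrative sketch, with made-up records and group labels, here is how one might compare false-match rates across groups for a hypothetical face-matching system.

```python
# A hypothetical, simplified audit of a face-matching system's bias.
# Each record below is a made-up test result: which demographic group the
# test image belongs to, whether the system declared a match, and whether
# that match was actually correct.
test_results = [
    {"group": "lighter-skinned", "predicted_match": True,  "correct": True},
    {"group": "lighter-skinned", "predicted_match": False, "correct": True},
    {"group": "darker-skinned",  "predicted_match": True,  "correct": False},
    {"group": "darker-skinned",  "predicted_match": True,  "correct": True},
    # ... a real audit would use thousands of records
]

def false_match_rate(results, group):
    """Share of declared matches for a group that were wrong."""
    matches = [r for r in results if r["group"] == group and r["predicted_match"]]
    if not matches:
        return 0.0
    wrong = sum(1 for r in matches if not r["correct"])
    return wrong / len(matches)

for group in ("lighter-skinned", "darker-skinned"):
    print(f"{group}: false-match rate = {false_match_rate(test_results, group):.0%}")
```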
The implications in law enforcement are easy to see, but they reach much further. In healthcare, computer-aided diagnosis systems have returned lower-accuracy results for Black patients than for white patients. In job applicant screening, Amazon stopped using a hiring algorithm after discovering that it favored resumes containing words like "executed" and "captured," which appeared more often on men's resumes. AI biases perpetuate human societal biases and can stem from historical or current social inequality. If you ask an AI to generate an image of a scientist, it will most likely show a middle-aged white man with glasses. What does that say to young girls of color who want to be scientists? These missteps foster mistrust among marginalized groups and could slow the adoption of some AI technology.
The ethical issues aren't solely embedded in the training and use of these models; they're happening right here in the physical world as well. Content moderation is a famously difficult job. People sift through some of the worst images, descriptions, and sounds on social media platforms, online forums, and retail sites. They ensure that disturbing scenes don’t wind up on our screens or in our ears. AI might be getting smart, but it doesn't self-moderate.
Time magazine did a deep dive into a company called Sama in January 2023. Sama provided OpenAI with laborers tasked with combing through some of the worst extremist, sexual, and violent content on the internet to ensure it didn't end up in the AI training regimen. Former Sama employees said they suffered post-traumatic stress disorder both on the job and afterward from sifting through this horrific material. To make matters worse, the employees, mostly located in Kenya, were paid less than $2 an hour. The company claimed it was lifting people out of poverty, but the Time article described workers who called the work torture. They regularly had to work past their assigned hours, and despite some wellness services being offered, many experienced lasting emotional effects.
The narrative that AI can eliminate workers is true, but the workers it takes to make AI possible are already suffering. So, what's the solution? Is there one? For artists, a company called Spawning has created a tool that helps them understand and control which of their art ends up in training databases. Stability AI does train its models on existing text and images available online, but it's looking at ways to ensure that creatives are paid royalties when their work is used. Another tool, CodeCarbon, runs alongside AI training code and estimates its emissions, which might help users make informed choices about which AI model to use based on how sustainable its operations are.
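For a sense of what that looks like in practice, here is a minimal sketch using the open-source codecarbon package; the train_model function is a hypothetical stand-in for a real training loop.

```python
# A minimal sketch of estimating the emissions of a training run with the
# open-source codecarbon package (pip install codecarbon).
from codecarbon import EmissionsTracker

def train_model():
    # Hypothetical stand-in for an actual training loop.
    return sum(i * i for i in range(10_000_000))

tracker = EmissionsTracker(project_name="demo-training-run")
tracker.start()
train_model()
# stop() returns the estimated emissions of the tracked code in kg of CO2-equivalent.
emissions_kg = tracker.stop()
print(f"Estimated emissions: {emissions_kg:.6f} kg CO2eq")
```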
These are important and worthy starts, but no single tool can solve such complex issues. By creating tools that measure AI's social, legal, and environmental impacts, we can start to understand how bad these problems really are. That, hopefully, leads to guardrails and helps legislators develop new regulations for artificial intelligence. It might feel like AI is moving quickly, and that's because it is. The existential worry about robots taking over is a fun and scary one to entertain.
However, real issues with our would-be digital overlords are unfolding as we speak. It's not too late to create an artificially intelligent world that we all want to live in, but users and companies alike have to choose that path together.