
Generative AI is harmful to creative workers and cannot exist without their prior work

Hello /r/antiwork, I'm a software engineer with an interest in ethical computing. I've worked at several Silicon Valley startups, and my education included a fair amount of exposure to the field of machine learning. I'm here to explain why generative AI is shitty.

What is Machine Learning?

Machine learning is a fairly broad field of computer science that, at its core, is about leveraging data and statistical techniques to produce models that can make useful predictions about the world. Examples include collecting data about your typing habits on your phone to make better autocomplete predictions, distinguishing spam from important emails, and forecasting the weather. In general, many of these models are very useful and generate real value for society in a positive way.
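
To make the spam example concrete, here's a minimal sketch of a spam filter as a machine learning model, written with the scikit-learn library. The tiny dataset is made up purely for illustration; a real filter would be trained on millions of messages.

    # A minimal sketch of a spam filter as a machine learning model.
    # The toy dataset below is made up purely for illustration.
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.pipeline import make_pipeline

    emails = [
        "win a free prize now",    # spam
        "claim your free money",   # spam
        "meeting moved to 3pm",    # not spam
        "lunch tomorrow?",         # not spam
    ]
    labels = [1, 1, 0, 0]  # 1 = spam, 0 = not spam

    # Turn text into word counts, then fit a simple statistical model on them.
    model = make_pipeline(CountVectorizer(), MultinomialNB())
    model.fit(emails, labels)

    # The trained model makes a prediction about new, unseen input.
    print(model.predict(["claim your free prize"]))  # likely [1]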

Machine learning has actually been around since the 1950s, but computing power, data infrastructure, and distributed computing techniques have only caught up to the models in the past two decades, and research and funding in the field have exploded.

Machine learning does not mean the computer can learn in the traditional cognitive sense that humans do. Human cognition is still fairly poorly understood. As impressive as machine learning models are, they do not understand context or domain knowledge, and they are not capable of interpreting their results with the general intelligence that humans have. You may have heard of techniques called neural nets or deep learning. It's true that these models took inspiration from naturally occurring neurons, and that they can exhibit emergent behavior when you place a lot of these simulated neurons in a multi-layer model. However, the neurons in these models are based on a much more rudimentary model of a neuron, called the perceptron, from the early stages of AI research. Human neurons are far more complex, as well as more energy- and space-efficient.
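
For the curious, a single perceptron is simple enough to write in a few lines. This is just a sketch to show how rudimentary the basic unit is compared to a biological neuron; real deep learning stacks millions of these with extra machinery on top.

    import numpy as np

    def perceptron(inputs, weights, bias):
        # Weight the inputs, sum them, and "fire" (1) or not (0)
        # depending on whether the total crosses the threshold.
        return 1 if np.dot(inputs, weights) + bias > 0 else 0

    # Weights and bias chosen by hand so this perceptron acts like a logical AND.
    weights = np.array([1.0, 1.0])
    bias = -1.5
    print(perceptron(np.array([1, 1]), weights, bias))  # 1
    print(perceptron(np.array([1, 0]), weights, bias))  # 0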

When we say a machine learning model has been trained, we mean it has settled on a fixed set of parameters through a cycle of training iterations. Most of the time, when you're using the model, the model is not changing in response to new input. If you have new data, you usually have to retrain the whole model. Once you have the model, it's essentially a very sophisticated black box that takes input and gives you some desired output. We understand how to build these models pretty well now, but their interpretability (how easy they are to understand) is still quite lacking.
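
As a rough illustration with scikit-learn: training fixes the parameters, using the model doesn't change them, and new data means running a whole new training cycle. The numbers here are toy data.

    import numpy as np
    from sklearn.linear_model import LinearRegression

    # Toy data for illustration: y is roughly 2*x.
    X = np.array([[1], [2], [3], [4]])
    y = np.array([2.1, 3.9, 6.2, 8.0])

    model = LinearRegression()
    model.fit(X, y)                        # training fixes the parameters
    print(model.coef_, model.intercept_)

    model.predict(np.array([[10]]))        # using the model does NOT update it
    print(model.coef_, model.intercept_)   # same parameters as before

    # New data? In the simplest setup you retrain from scratch on everything.
    X_new = np.vstack([X, [[5]]])
    y_new = np.append(y, 10.1)
    model.fit(X_new, y_new)                # a fresh training cycle, new parameters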

What is Generative AI and why is it happening now?

Generative AI is a machine learning system that takes an input (like a prompt) and produces some creative output (like an image) based on the model it has learned. These models have been trained on, in some cases, billions of examples. The outputs can include computer code, artistic images, fake photographs, pictures of real people, or written work. I will reiterate that these models don't understand what they're producing. They have no capacity for interpretation, nor are they self-aware.
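
As a small-scale example of "prompt in, content out": the open Hugging Face transformers library can run a tiny public model like GPT-2 in a few lines. The commercial systems discussed below are vastly larger, but the shape of the interaction is the same.

    # Prompt in, generated text out, using the small public GPT-2 model.
    # Commercial systems work on the same principle at a much larger scale.
    from transformers import pipeline

    generator = pipeline("text-generation", model="gpt2")
    result = generator("The future of creative work is", max_length=30)
    print(result[0]["generated_text"])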

To go back to the point about training data, the reason a lot of these models are coming out now is that not only has there been a lot of research into how to build them, but today there's a ton of infrastructure to support training them. Companies like OpenAI, Stability AI, and GitHub collect data from the internet (this is also called scraping), then use that data to train these models. Bigger datasets mean better, more accurate, and more impressive models. The infrastructure and understanding of distributed systems gained from decades of cloud software at many different companies also means these companies have the techniques and knowledge to train these models faster, and to distribute them.
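
To give a sense of what scraping means in practice, here's a bare-bones sketch using the requests and BeautifulSoup libraries. The URL is a placeholder, and real crawlers run across billions of pages with far more machinery, but the principle is the same: fetch pages, pull out the content, keep it.

    # A bare-bones scraping sketch: fetch a page and collect its image URLs.
    # The URL below is a placeholder, not a real gallery.
    import requests
    from bs4 import BeautifulSoup

    page = requests.get("https://example.com/some-artist-gallery")
    soup = BeautifulSoup(page.text, "html.parser")

    image_urls = [img.get("src") for img in soup.find_all("img") if img.get("src")]
    print(image_urls)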

There's also been a lot of interest from investors now that high-profile models like DALL-E 2 and GPT-3 are publicly available.

Why you should care about this technology in the context of labor

OpenAI has raised 1 billion dollars in venture capital while Stability AI has raised about 100 million dollars. Venture capital and large corporate interests want this technology to succeed, because they see the “value” in it.

The value being, of course, that they can do more with fewer workers. I was listening to a podcast recorded by a few of these investors, and the main thing they were excited about was cutting their design staffs. These systems are a very real threat to workers like artists, graphic designers, journalists, writers, programmers, etc. Shareholders will demand that companies use these systems to reduce staff and save money.

Here's the most perverse part about these systems: none of it works without prior work. Those prompt-to-image machines require the works of millions of artists to exist. Without their work, there is no generative AI. OpenAI is already monetizing DALL-E 2, selling access to the system. Artists are not credited or compensated for their work, and the AI launders any association its output has with that work.

What now?

There has been some chatter about allowing artists to opt out of making their work available to these companies through a system the companies set up. I don't think this goes far enough. These companies ought to be required to publish their training data and show that they have explicit permission to use each and every image in their training sets.

Some will argue that this is fair use because it's transformative, or that art is publicly available and so is fair game, or that there's no way to prove that prior work is represented in an output image. I believe copyright laws are currently insufficient, and need to be amended to consider generative systems and protect creative workers. The people who wrote the copyright laws did not envision a future where a computer system could produce thousands of works a day.

Additionally, lawsuits are already emerging in this space. Microsoft is being sued over Copilot reproducing open source code without proper attribution or license compliance. What this means is that publicly available projects on GitHub carry various open source licenses. Code from some of those projects can be used in any project, public or private, free or monetized, but the license requires you to provide attribution to the original repo. Other licenses require projects that use their code to be open-sourced themselves, or it's illegal to use the code at all. Copilot can be demonstrated to reproduce code from these projects verbatim without proper attribution. The legal counsel at my company has advised us not to use code from Copilot because we could get involved in a legal dispute.
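
For readers who haven't dealt with these licenses: the MIT license, for example, lets you reuse code almost anywhere, but only if you keep the copyright and permission notice with it. The project and author below are made up, but a proper attribution looks roughly like this; Copilot has been shown to emit code verbatim with no notice at all.

    # Adapted from the (hypothetical) project "example/sorting-utils" on GitHub,
    # distributed under the MIT license. The license requires keeping this notice:
    #
    #   Copyright (c) 2021 Example Author
    #   Permission is hereby granted, free of charge, to any person obtaining
    #   a copy of this software... (full MIT text continues)

    def insertion_sort(items):
        for i in range(1, len(items)):
            current = items[i]
            j = i - 1
            while j >= 0 and items[j] > current:
                items[j + 1] = items[j]
                j -= 1
            items[j + 1] = current
        return items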

I believe cases like this will shape the law surrounding these systems, and we should all be paying attention, because they're going to have a huge impact on our lives, and we should push to protect the creative workers VCs are trying to replace with soulless computers.
