I am currently spinning in my office chair and thinking about all the ways in which Sam Altman could leverage public celebrity outrage so that no one ever stops talking about OpenAI. I find it unsurprising, and actually quite infuriating, that his favourite movie is Her, and that he is doing everything he can to make us befriend his virtual assistants, just like in the movie. He should just come out and say that he wants to fuck a disembodied female-sounding AI voice.
I’m not sure what the Scarlett Johansson complaint does other than prove that outrage is as effective as wonder in convincing the masses that generative AI systems are worth paying attention to — even though we don’t really know what they’re for yet (use your imagination while we waste billions on GPU production, I guess?).
The Content Production Cinematic Universe is morphing into something that’s hard to recognise right now: the demands of generative AI appear to be creating a funnel through which all content must flow, and its final destination is a sad waiting room for training data. With every passing keystroke, I feel less like a consumer of the internet and more like a production assistant whose addiction to posting things online is now, to those with power, indistinguishable from labour.
First, we must look past how ScarJo’s precious voice was underhandedly harvested for our pleasure (she is not the victim here, I don’t care how pretty she is), and focus directly on Google’s new AI Overviews, which are saying that glue is definitely a pizza ingredient, and that you should definitely try it (which Katie Notopoulos did, god bless her). This proves, once again, that generative AI models want nothing more than to try and be useful to you, even though they are inhuman (no taste buds) and cannot take a joke (glue will not affix cheese to pizza, actually).
These AI Overviews are, of course, hilarious, but not on purpose. Just once I would like AI to generate something that actually delights me, rather than inadvertently serving me rage bait. There is a second-order function to these AI Overviews, though: they let you tap faintly on the outer casing of Google's knowledge base and watch cracks appear quite quickly; in fact, your entire arm breaks through with tremendous ease, revealing that there is literally nothing in there. Because of course there isn't; Google is just a middle-man service that tells you which websites to visit, but now it's not even that.
The model Google is using for these overviews is either not fit for purpose, or the pre-prompt is bad, but this is kind of beside the point. This is just a bad feature and it never should have made it off the production line. Is Sundar Pichai embarrassed? Or just malnourished? Maybe he should go and eat a rock about it, as per The Onion, a respectable publication full of truth and wisdom; perfectly fine to take out of context.
Another by-product of Google's Very Bad Feature is the unrelenting panicked wailing of SEO experts. What if… no one ever clicks on a website again? As I've covered before, clicking on websites is the foundation of the internet's economy (also porn… which you access via, uh, websites. Oh no!).
Fear not, disgruntled clickbait consumer: websites will live to see another hype cycle. OpenAI are striking deal after deal with a number of online publications: Vox Media and The Atlantic (and others) can stave off irrelevancy for a little while longer by submitting their content to the training data waiting room. A sad place where ideas go to die; where all data will end up one day, because apparently all content forevermore should exist downstream of an AI model: a conveyor belt of garbage leading to another conveyor belt of garbage, eventually leading to… your eyeballs.
No one is safe from generative AI's empty promises. End users are sold the idea that they can produce anything at the click of a button, and media companies are sold the idea that a fragment of a sentence a human staff writer wrote once might appear in a hallucination one day. Everything is upside down. Elon Musk is reciting passages from The Communist Manifesto as if they were his own ideas. Late capitalism may have finally created all the conditions we need for fully automated luxury communism. I'm so happy.
Back to being serious for my final observations: if we accept that every online person (whether they mean to or not) is producing content that will eventually end up in training data, then we should be aware that this process is both open and invisible. It's open because, for some reason, generative AI companies have decided that absolutely all of the internet is up for grabs, even copyrighted material. It's invisible because we have no idea how the models work: how they 'understand' the data, or how they ingest it and spit it back out as something that vaguely makes sense.
This is a tension (between open and closed) that presents itself quite a lot in technology. I've been doing a lot of work for Computer Says Maybe recently, and they just released a podcast in which Alix Dunn interviewed Kate Sim, a former Google employee who got fired for protesting Project Nimbus. This was a project in which Google were providing infrastructure for the Israeli military, but they did not make their workers privy to this. Naturally, those workers were outraged when they learned their labour was going towards genocide.
At the other end of the spectrum, I am also in the middle of a research project on open source; one of the experts I interviewed very aptly pointed out that not everyone is cut out to work in the open, because they may not be comfortable with some of the ways people use their technology, and the nature of open source means they have little control over that.
These two examples obviously don't map perfectly onto the way we passively produce training data, but I think having no control over, or knowledge of, how value is extracted from your efforts and digital vapours is typical of contemporary technology; it's the idea that everything we produce is neutral, part of an entirely organic ecological system in which we act as custodians and interpreters. In actuality this is all by design, and what we are witnessing is a series of intentional strategy decisions that are now playing out in a sort of worldwide testing ground.