🥼 Crushing It
The limitations of generative AI’s new UI paradigm — and pinning a computer to your chest (also Star Trek)
Hello and welcome to this week’s mind meld. Firstly, I just want to thank all of you for reading Horrific/Terrific; for those of you who have been here since the beginning, it’s changed a lot over the years and I’m glad you’re still here. It became really exhausting being a kind of loose aggregator of tech news, and also quite boring. For me, this new direction is useful because it’s more interesting and challenging to write, and it forces me to think a lot more deeply about tech policy and the like.
The feedback I’m getting suggests that people are mostly into it — and I’m getting more unsubscribes than I used to, which to me is a good thing because it shows that this newsletter actually has a clear message and direction now, enough that certain people have realised it’s not for them. Anyway on with the main thing!
Last week a company called Humane (who I’d never heard of until now) released what they’ve called an ‘Ai Pin’. It’s a small AI-enabled device that you pin to your clothes like a civilian wearing a police body camera. The idea is that it completely replaces your phone. There is no screen, but it can project a simple UI onto your hand or other surface, and it’s all powered by an AI assistant which is built on GPT-4.
I have a lot of thoughts about this so I’ll do my best to unpack them in a coherent way. Firstly, one of the main apparent draws of this gadget is that it’s not invasive like a normal phone, because you don’t have to pick it up and look at it — it’s just pinned to your chest. I’d argue that something that is always there and attached to you like an appendage is just as invasive as anything else, but okay. Furthermore, I honestly think the only people who will willingly walk around with these visibly fastened to their clothes are startup founders who’ve never once suffered embarrassment, and are so thoroughly embedded in the ‘tech is everything’ life that they’ve lost touch with reality.
But the main thing here really is how this pin represents an intentional (but not necessarily good) shift into a new UI paradigm: we are moving away from command-based interactions, where we tell machines to complete a series of tasks in a certain order to achieve a goal. With the addition of AI assistants or agents or whatever they end up being called, we are getting closer to intent-based interactions, where all you do is tell the machine what you want, and it provides.
So, with command-based interactions, you give a computer or device a series of commands: open a web browser, navigate to a supermarket website, select items, then specify where to deliver them and which bank account to take the money from. With an intent-based interaction, you simply say ‘buy these groceries’ and all those actions just happen; you don’t necessarily know, or have to know, the minute details of the steps the AI took to get you your groceries.
With intent-based UIs, apps become irrelevant; the AI agent simply engages with a range of services that would have made up the backend of whatever apps you used to use. So, in the above example, there is no more supermarket app, and maybe even no more supermarket, but none of that will matter to you. The only things you need to know are how much your groceries cost, and whether the ones you want are even available to you.
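To make that distinction concrete, here’s a minimal sketch in Python. Everything in it is hypothetical (the `Browser` class, the `GroceryAgent`, the example URL); it exists only to show who decides the steps, you or the agent.

```python
# Hypothetical sketch: command-based vs intent-based interaction.
# None of these classes or functions correspond to a real API; they
# only illustrate who decides the steps.

class Browser:
    def navigate(self, url: str) -> None:
        print(f"navigating to {url}")

    def select_items(self, items: list[str]) -> None:
        print(f"selected {items}")

    def checkout(self, address: str, account: str) -> None:
        print(f"delivering to {address}, paying from {account}")


def command_based(items: list[str]) -> None:
    # Command-based: the user dictates every step, in order.
    browser = Browser()
    browser.navigate("https://supermarket.example")
    browser.select_items(items)
    browser.checkout("12 Example Street", "current account")


class GroceryAgent:
    # Intent-based: the agent decides the steps; the user never sees them.
    def fulfil(self, intent: str) -> str:
        # Behind this one call the agent might query several grocery
        # services, compare prices, and place an order itself. The user
        # only learns the outcome: the cost, and what was available.
        return f"done: '{intent}' ordered, total £23.50"


if __name__ == "__main__":
    command_based(["milk", "bread"])
    print(GroceryAgent().fulfil("buy my usual groceries"))
```

Everything interesting (and everything worrying) lives inside that one `fulfil` call.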
I think a single monolithic interface that ‘just works’, obscuring an entire buffet of background tasks, is the end goal for many techno-capitalists. I recently read a Cory Doctorow piece explaining the key differences between websites and apps: websites are places anyone can visit; you can bookmark them, look at the underlying code if you want, and share them easily. Apps, on the other hand, are more like this:
“Apps are ‘closed’ in every sense. You can't see what's on an app without installing the app and ‘agreeing’ to its terms of service. You can't reverse-engineer an app (to add a privacy blocker, or to change how it presents information) without risking criminal and civil liability. You can't bookmark anything the app won't let you bookmark, and you can't preserve anything the app won't let you preserve”
The addition of AI agents to this kind of ecosystem only encloses it further. A system where all voiced ‘intents’ lead to a desired output (or maybe not desired, because we are humans and intentions change) leaves very little room for experimentation. This shift to intent-based UIs exemplifies the undying urge to optimise for ‘the one best way’: when a machine appears to work effortlessly with only a simple instruction from you, it’s because a company or some other small pool of people have all gotten together and decided, for you and everyone else, what ‘good’ looks like.
On a technical level, this makes sense: LLMs are notoriously difficult to control on the user side, which means companies need to pare down what models and tools can do so that there’s less room for rubbish, ineffective outputs. But on a non-technical level, what this does is flatten everything and make it hard to imagine use cases beyond, say, getting an AI assistant to summarise your inbox for you, which, honestly, is about all the Ai Pin looks like it can do. And you don’t even get written summaries, because it’s all voice-based!
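As a toy illustration of that paring-down (all names here are invented, and this is not any real product’s architecture): imagine the assistant can only ever route your intent to a short, pre-approved list of tools, and refuses everything else.

```python
# Toy illustration of guardrails via an allowlist of tools.
# Every name here is made up for the sake of the example.

ALLOWED_TOOLS = {
    # The company has decided in advance what 'good' looks like:
    "summarise_inbox": lambda: "You have 3 unread emails about invoices.",
    "set_reminder": lambda: "Reminder set.",
}


def handle_intent(tool_name: str) -> str:
    # Anything outside the allowlist is simply refused, no matter
    # how reasonable the user's intent was.
    tool = ALLOWED_TOOLS.get(tool_name)
    if tool is None:
        return f"Sorry, I can't help with '{tool_name}'."
    return tool()


print(handle_intent("summarise_inbox"))        # works
print(handle_intent("invent_a_new_use_case"))  # refused
```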
I’m not at all saying that the introduction of this Ai Pin (seemingly designed only for Silicon Valley VCs and divorced men) will necessarily change anything. The shift to intent-based UIs could facilitate transformations as major as the introduction of roads and public transport, or it could be as nothingy as vending machines. And seasoned readers will know that the level of transformation does not depend on the technical capability of AI systems, but rather on political will and corporate power.
I also think a UI shift that relies solely on natural language inputs is very unlikely. After all this time, we’re still stuck on the idea of making elements of Star Trek a reality, where pretty much all day-to-day activities are conducted via voice commands. One of my clients recently wrote about The End of Prompting, where he explains that within the next year or so, written or spoken prompts may no longer be the main way we trigger outputs from machines. For example, you won’t have to ask an LLM ‘hey, can you make this document sound more professional’; you’ll just use a slider and crank it all the way to the ‘professional’ end of the scale.
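As a rough sketch of that idea (mine, not from his piece, and entirely hypothetical): the slider’s position gets translated into an instruction behind the scenes, so the prompt doesn’t really disappear, it just moves out of the user’s hands and into the product’s code.

```python
# Hypothetical sketch of the 'end of prompting': a slider position
# becomes a hidden instruction sent to a model on the user's behalf.
# The mapping below is invented purely for illustration.

def tone_instruction(professionalism: float) -> str:
    # 0.0 = casual, 1.0 = fully professional.
    if professionalism > 0.8:
        return "Rewrite this in a formal, professional register."
    if professionalism > 0.4:
        return "Rewrite this in a neutral, workplace-friendly tone."
    return "Rewrite this casually, as if messaging a friend."


# Cranking the slider all the way to the 'professional' end:
print(tone_instruction(1.0))
```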
However, if you look at OpenAI’s recent product announcements, they are really pushing hard on prompts: you can produce a fully functional custom GPT using only natural language. But, as mentioned above, because predetermined guardrails decide what ‘good’ looks like for you, the possibilities of what you can make are significantly narrowed. So the ‘impressive’ part of these tools is not necessarily what they can do, but that the only way to instruct them is with natural language. They’ve basically made a drag-and-drop website builder, except instead of dragging and dropping you just type ‘can the profile pic be a cute cat’, and it doesn’t even really build a website.
I mentioned Star Trek above because I do think that earlier conceptions of sci-fi futures have played a part in what we consider worth aiming for these days. I also mentioned it because the concept of prompt engineering, or using and tweaking natural language to get a desired output, really reminds me of one episode of The Next Generation: Dr Crusher accidentally slips into a reality created by her own mind, and she panics as every crew member on the ship slowly disappears from that reality. Eventually, it’s just her alone with the ship’s computer, and all she can do is talk to it in order to figure out what’s going on. She quite literally uses a chain of prompts to understand the limitations of the reality she’s in, and the limitations of the computer itself. It went something like this:
- She asks the computer to read out the full roster of crew members, and the computer answers only with ‘Dr Beverly Crusher’.
- Then she asks what the purpose of the starship Enterprise is, and the computer tells her that it’s to ‘explore the universe’.
- Finally, she asks if she alone possesses the skills to explore the universe, and the computer is unable to answer, which is kind of her first breakthrough moment in the episode.
I really enjoy this example because there are parts where she gets frustrated with the computer’s answers, and other parts where she asks rhetorical questions out loud while she’s trying to work things out, and the computer attempts to respond, moving Dr Crusher to snap and say ‘I wasn’t talking to you!’ It was cool to see that even in the limitless tech-utopia of Star Trek, a franchise first conceived before home computers were a thing, a character became frustrated because the computer was not behaving in the way she was expecting.
This is very similar to how we experience technology now: when a device doesn’t do what we want within its narrowly scoped capabilities, it’s very annoying. And then we feel lost and powerless. Dr Crusher did not escape her alternate reality because the computer was helpful; she escaped because she was smart, and therefore understood the limitations of the technology around her. Dr Crusher eventually knew, unlike the computer, that she was trapped in another reality. So from now on, the refusal to take machine outputs at face value should be referred to as ‘Crushing It’. Thank you and goodbye.