My timeline is blowing up with reactions to the latest Gemini release from Google. I think Google can take care of itself (and kudos for releasing the open-weight LLM Gemma). This whole discussion, though, is yet another example of a wilful category confusion driven by our fascination with all things gen-AI.
The issue: Gemini returning "diverse" images in response to prompts and getting either celebrated or dunked on for it. I’m not going to link to the hysterics, but in summary, there are strong reactions to, for example, prompts asking for pictures of the Founding Fathers and the system returning imaginative outputs depicting non-white-male people doing things like signing the Declaration of Independence.
I see two problems lurking underneath - one old and one new - which also explain why it feels like Google gets dunked on more than others.
The Old
Google has always promised that it's a search engine that delivers the world's knowledge. In other words, that it gives us a way to ask questions for which there are well-defined answers, as if we were looking up facts in an encyclopedia.[1]
When the alternatives were SEO hacks, Google's promise was relatively true. And it worked ... for a while. But Google search has really always been a recommendation engine rather than an encyclopedia lookup - the content it delivered was driven by advertising, engagement, and the like. We continue to tell ourselves PageRank fairy tales while hugging our linear algebra textbooks, but what Google considered relevant was always based on a WEIRD conception of the most appropriate response to a query.
We've learnt to either ignore this distinction or work around it via more specialized interfaces. But it's always been that way. I bring this up because the older debates about why a Google search for "CEO" returns pictures of white male CEOs centered on the gap between our expectations and desires - an interface in which we hoped to see the knowledge we wanted - and Google’s lack of interest in any deeper epistemological exploration of what an answer to a question should be.
The arguments right now about Gemini outputs have this flavor. The people upset about this are upset because they think Gemini is woke-washing the truth, and they only want the truth. That Google has never promised “the truth” is beside the point - what Google chooses to present via Gemini has shifted away from what people expect.
I have always had mixed feelings about the “images of white CEOs” arguments. I’m not sure I want a search engine interfering with my query results. But if we accept that all search results are really curated recommendations, then I really do want transparency in the decision-making that led to the presentation of results, or at least some way to “prompt engineer” my way past potentially skewed results. By now we are used to doing this with search engines - we prompt engineer all the time to get to the results we are really looking for.
None of this goes away just because we have a new “sorta kinda search engine” in the form of Gemini. But the issue gets even more complicated because …
The New
LLMs aren’t even remotely like search engines. For all my complaining about Google search above, there is at least some explicit attempt to determine the relevance of the search results to the query. It’s a strong signal amongst all the signals used to produce the final ranked list. There’s a form of grounding via keywords, other fancy information retrieval technology, and so on. And above all, there’s verifiability: when a Google search produces results, you go and click on a link and see for yourself.[2]
LLMs don’t do explicit grounding. Whether you want to call them stochastic parrots or not (and btw parrots are crazy smart :)), they are engineered to complete prompts - their actual behavior is a miracle of training, neural networks, and our deep desire to make sense of and humanize the entities we interact with.
But Google, and OpenAI, and Microsoft, and everyone else, really really want you to use LLMs like a search engine, and will create all kinds of interfaces to make you think of them that way. If we thought of an LLM as an automated content generator[3] and asked it to produce representative images of Founding Fathers, we might be less likely to complain when it produced images that we didn’t expect, and might instead be happy at the creativity.
But a tool that gives you creative responses is a toy. The tech companies are not in the business of making toys. They are in the business of solving Problems, generating Information, and creating artificial Intelligence. They will RAG their way to something that can resemble search, and they will fail to inform you of the lack of reliability of the LLM backends powering the search engines.
It’s all epistemologically shaky turtles, all the way down.
I don’t see these debates getting any clearer any time soon. We will continue to expect LLMs to behave like search engines (they’re not) and will continue to expect search engines to give us the truth we want (they won’t).
I hope that we can use this moment of flux to think more creatively about interfaces that would actually be useful and meaningful to us, and that are more honest about what they can and cannot do.
[1] I realize that in certain academic circles it’s required to put ‘fact’ within scare quotes. But humor me - this is just a post.
[2] The continuous attempts by search engines to integrate and inline information into the search results, rather than letting you go to links, undermine this verifiability.
[3] I recently proposed at a meeting with journalists that we should just call all such systems ‘automated content generators’ - it hasn’t caught on yet!