Redefining UX in the age of AI assistants

by Aleksandra Straczek

Executive summary: In the absence of artificial general intelligence equivalent to human intelligence, there is no such thing as “UX for AI”. Different AI subfields and applications each demand distinct design considerations. Conversational interfaces are driving a shift away from traditional graphical user interfaces, prompting designers to enhance interactions, embrace multimodality, and explore AI applications beyond current boundaries.

As long as there is no artificial general intelligence matching human-level intelligence, there is no such thing as “UX for AI”. Currently available artificial intelligence spans a broad spectrum of subfields, each developed for specific applications, and each requires its own consideration of the most suitable user experience.

For example, the use of computer vision (subfield) in self-driving cars (application) presents a very different set of design challenges from the use of machine learning in stock market predictions. This article focuses on the UX of AI assistants, an application enabled by natural language processing.

No more clicking: beyond traditional interactions 

The launch of ChatGPT has led many to contend that we are on the brink of a significant paradigm shift. After command-line interfaces (CLI) and graphical user interfaces (GUI), we have now entered the era of conversational user interfaces (CUI), where we interact with computers as we would with humans. Bill Gates writes on his blog: “Your main way of controlling a computer will no longer be pointing and clicking or tapping on menus and dialogue boxes.” This concept, while not new, is evolving with advancements in natural language processing, making text-based user interfaces (known from chatbots) and voice user interfaces (used in voice assistants such as Amazon’s Alexa and Apple’s Siri) capable of generating coherent and contextually relevant responses on any topic.

Enhancing discoverability and interactions 

What is the problem then, you may ask? Let’s compare graphical interfaces to conversational ones. A traditional GUI makes it easy to understand what actions can be taken within an app by using a set of repeatable visual cues. Most people know what will happen when they click a trash icon with the text label “Delete”.

CUI, by enabling users to ask about anything, can lead to confusion about what the system is capable of. In simple terms, conversational interfaces are great for findability, which means users can easily locate the functions or content they believe the system should have by simply asking. However, these interfaces fall short in terms of discoverability, which refers to how effortlessly users can come across functions or content they didn't know existed.

It's beneficial to guide users in conversations through text or voice prompts. These prompts can encourage users to ask certain questions or issue specific commands, with the level of guidance tailored to the app's intended focus. For a versatile, open-ended assistant, offering intriguing prompts at opportune moments can inspire users to think more broadly. However, for an AI-based diagnostic app, more precise prompts are crucial to ensure the system collects all necessary information.

We are already seeing effective prompting techniques being established. For example, clickable options, or “chips”, at the end of a chat response help guide users by presenting them with relevant next steps. These chips can suggest actions (“Share your results”), request specific details (“Enter your location”), offer choices (“Save as PDF” or “Save as CSV”), or confirm decisions (“Confirm order” or “Cancel order”), all within the chat’s context.
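
To make the pattern concrete, here is a minimal TypeScript sketch of how such chips might be modeled and wired into a chat response. All type and function names are illustrative rather than taken from any particular framework.

```typescript
// Suggestion "chips" attached to a chat response — an illustrative sketch.

type ChipKind = "action" | "detail" | "choice" | "confirmation";

interface Chip {
  kind: ChipKind;
  label: string;   // text shown on the clickable chip
  payload: string; // message fed back into the chat when clicked
}

interface AssistantMessage {
  text: string;
  chips: Chip[];   // relevant next steps offered in the chat's context
}

// Example: the order-confirmation case described above.
const response: AssistantMessage = {
  text: "Your cart is ready. What would you like to do next?",
  chips: [
    { kind: "confirmation", label: "Confirm order", payload: "confirm_order" },
    { kind: "confirmation", label: "Cancel order", payload: "cancel_order" },
    { kind: "choice", label: "Save as PDF", payload: "export_pdf" },
  ],
};

// Clicking a chip simply sends its payload back into the conversation,
// so the user never has to guess what the system can do next.
function onChipClick(chip: Chip, send: (message: string) => void): void {
  send(chip.payload);
}
```

The design choice that matters here is that chips are just pre-authored conversational turns: they keep the interaction inside the chat while restoring the discoverability of a clickable menu.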

Specialized GPTs for more effective communication

Understanding the capabilities of a general assistant can be challenging, much like the initial awkwardness of small talk. This is because we often lack sufficient knowledge about someone's expertise and interests to engage in meaningful conversation. To address this issue, OpenAI introduced specialized versions of ChatGPT, known as GPTs, each designed for a specific purpose. For example, “Laundry Buddy” is an expert in all things laundry, from removing stains to sorting clothes, while “Tech Advisor” specializes in helping you set up your printer. By focusing these Large Language Models (LLMs) on specific areas relevant to the user's current needs, they become much easier to understand and use.
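
One way to approximate this specialization in practice is to scope a general model with a system prompt. The sketch below uses the “Laundry Buddy” example from above; the role/content message shape follows a common chat convention, and sendToModel is a hypothetical stand-in for whatever LLM call a product actually uses.

```typescript
// Scoping an assistant with a system prompt — an illustrative sketch.

const laundryBuddySystemPrompt = `
You are Laundry Buddy, an expert in all things laundry.
You help with stain removal, sorting clothes, detergents and machine settings.
If a question is unrelated to laundry, say so and suggest what you can help with.
`;

type ChatMessage = { role: "system" | "user" | "assistant"; content: string };

async function askLaundryBuddy(
  question: string,
  sendToModel: (messages: ChatMessage[]) => Promise<string>, // hypothetical LLM call
): Promise<string> {
  const messages: ChatMessage[] = [
    { role: "system", content: laundryBuddySystemPrompt },
    { role: "user", content: question },
  ];
  return sendToModel(messages);
}
```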

Establishing trust with AI

A “black box” system is one where the inputs and outputs are visible, but the internal process remains a mystery. For example, in ChatGPT, users can see the questions they ask and the answers they receive, yet the method of arriving at these responses is hidden. This opacity makes it difficult for users to judge when ChatGPT might produce errors or misinformation. To enhance transparency, what if we tilted the box a little bit? This means pre-prompting the assistant to explain its reasoning or the steps it's taking to answer a question, similar to showing its “chain of thought.” It could also reference the sources of its information, blending the clarity of code comments with the credibility of academic citations. This method aims to make the assistant's responses more understandable and trustworthy.
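
As a rough sketch of what such pre-prompting could look like, the snippet below prepends a transparency preamble to every conversation. The prompt wording and the helper function are illustrative, not a fixed recipe.

```typescript
// "Tilting the box": pre-prompting the assistant to show its steps and
// sources — an illustrative sketch, not a definitive implementation.

const transparencyPreamble = `
Before giving your final answer:
1. Briefly list the steps of your reasoning, like comments in code.
2. Name the sources your answer draws on, like academic citations.
Then give the answer itself.
`;

type ChatMessage = { role: "system" | "user" | "assistant"; content: string };

async function answerWithTransparency(
  question: string,
  sendToModel: (messages: ChatMessage[]) => Promise<string>, // hypothetical LLM call
): Promise<string> {
  return sendToModel([
    { role: "system", content: transparencyPreamble },
    { role: "user", content: question },
  ]);
}
```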

Embracing multimodality

The future of CUI extends beyond text and speech alone. We must remember that in some scenarios, traditional clicking is much quicker for accomplishing a task, especially a standard action like saving the response generated by an LLM.

That’s why integrating the simple input field used in a chat with additional graphical controls where necessary is a sound approach. And if there is no screen – add one. We have already witnessed the first attempts at voice-first AI devices, such as the Humane AI Pin, a small wearable that attaches magnetically to clothing. Despite its small size, the Pin is actually a multimodal device. In addition to talking to it and hearing answers back, users can turn their own palm into a graphical user interface thanks to a technology called “laser ink” and navigate it with gestures. A similar device, the Rabbit R1, which presents itself as a modern reinterpretation of Nintendo’s Game Boy, also qualifies as a multimodal device with its screen navigable via an analog scroll wheel.

So if there is one takeaway from this article, let it be the following: let’s finally decouple AI capabilities (like generating text) from the interface used to present those capabilities to the user. ChatGPT’s success has led us to closely associate natural language processing with chat interfaces. However, these capabilities can be embodied in various forms, from a chat-based co-pilot integrated with the main graphical user interface to a voice-first interface or a Word-like AI text editor devoid of any conversational elements. The choice is ours to make.
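
A small sketch makes the point: below, a single text-generation capability backs two very different surfaces, a chat window and an editor command, without either interface leaking into the capability itself. All names are hypothetical.

```typescript
// Decoupling an AI capability from its interface — an illustrative sketch.

// The capability: plain text in, plain text out. No interface assumptions.
type GenerateText = (prompt: string) => Promise<string>;

// Surface 1: a chat co-pilot appends the result as a new message bubble.
async function chatSurface(generate: GenerateText, userMessage: string) {
  const reply = await generate(userMessage);
  appendChatBubble(reply); // hypothetical chat-UI call
}

// Surface 2: an AI text editor rewrites the current selection in place,
// with no conversational elements at all.
async function editorSurface(generate: GenerateText, selection: string) {
  const rewritten = await generate(`Rewrite more concisely:\n${selection}`);
  replaceSelection(rewritten); // hypothetical editor call
}

// UI stubs so the sketch is self-contained.
function appendChatBubble(text: string): void { console.log("chat:", text); }
function replaceSelection(text: string): void { console.log("editor:", text); }
```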

Key terminologies and definitions:

  • UX for AI: The design of user experiences specifically for artificial intelligence applications, tailored to the unique requirements of different AI subfields.
  • Conversational user interfaces (CUI): Interfaces that allow users to interact with computers through natural language, either typed or spoken.
  • Natural language processing (NLP): A field of AI focused on enabling computers to understand, interpret, and generate human language.
  • Findability: The ease with which users can locate a function or content within a system.
  • Discoverability: How easily users can find new functions or content they were previously unaware of.
  • Large language models (LLMs): Advanced AI models capable of understanding and generating human language based on vast amounts of text data.
  • Black box system: A system whose internal workings are not visible or known to the user, only the input and output can be observed.
  • Multimodal devices: Devices that support multiple modes of interaction, such as voice, text, and touch, offering a more versatile user experience.

Aleksandra Straczek
Senior UX Designer at Star

Aleksandra is a Product and Service Designer with a background in brand strategy. With experience in the HealthTech and Automotive & Mobility industries, she works on digital products at all stages, from research and workshop facilitation to concept definition, wireframing, prototyping, user testing, and UI design.
