How to Successfully Integrate Language Models into Your Application
Applications are integrating language models left and right as the AI hype cycle churns on. Some of these features are useful, but a surprising number aren't helpful at all. Let's fix that.
Bouvet Island is a volcanic island located approximately 1,100 miles (1,700 km) north of Antarctica and 1,600 miles (2,600 km) southwest of the southern tip of Africa. It is considered one of the most remote places on the planet, given its location and how difficult it is to access.
Uninhabited and desolate, Bouvet Island may be one of the few places on Earth where you can avoid hearing the term "AI". Since 2022, this buzzword has embedded itself into virtually every aspect of the tech industry. Investors want to know what the "AI strategy" is, and companies are rushing to embed language models like GPT-4 into their applications.
This is because the industry sees practical value in the technology. However, many have realized that these new integrations haven't actually increased their productivity. You can't log into a new application without being prompted to "Check out our latest ✨AI Feature", and yet the feature often fails to deliver.
Many factors contribute to this problem, but chief among them is plain bad design. Making a feature that is actually valuable to an end user is hard, especially when new technologies are involved. Today we'll discuss how to integrate language models effectively, by understanding what they're actually good at and relying on tried-and-true design philosophy.
Thinking about "AI"...
If the marketing teams got their way, then you might believe that language models are magical machines that can think at a superhuman level. We're on the brink of "Artificial General Intelligence", and pretty soon AI will take all of our jobs, run the world government, and shovel food into our mouths like babies until we poop out paperclips.
However, as much as companies might love for you to believe that they're on the brink of humanity's future, this is probably not true. Current models are excellent at many tasks, and automated reasoning is more advanced than ever. But these models don't act or think like us, and they certainly aren't as intelligent as us.
I mention this because when you're integrating a technology, one of the quickest ways to fail is misunderstanding what it is. Language models aren't automated people. They are simply programs–albeit complex ones–which take input and produce output. See through the magic to understand the value.
For this specifically, I'd recommend learning about Transformers, which I've written about here, as a digestible starting point. Additionally, you might check out my post about Prompt Engineering, which aims to cut through the noise and explain what it actually is. However, when working with language models, it really boils down to this:
Language models are fitted to the patterns of natural language, and are adept at approximating the likely next word
This is to say, when you give a model a prompt, it's trying to predict what the next token in the sequence would be. It does this again and again until the likely next token is "stop", at which point it stops. Each prediction is made with the help of fancy math that essentially condenses a massive set of language examples into an educated approximation. The model doesn't stop to think.
Since the model is trained on actual human language, it picks up on the patterns within that language, and language is made up of patterns: logical explanations, poetry, humor, and so on are all patterns that language models have access to. You can think of language models as excellent at predicting what a continuation of some text might look like.
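To make that loop concrete, here's a toy sketch in Python. The bigram table is a made-up stand-in for a real model's billions of learned parameters, but the generation loop has the same shape as the real thing: predict a likely next token, append it, and repeat until "stop" comes up.

```python
import random

# Toy "model": a bigram table standing in for a real model's learned
# parameters. The job is the same either way: map the tokens so far to
# a probability for each possible next token.
BIGRAMS = {
    "the": {"cat": 0.6, "dog": 0.4},
    "cat": {"sat": 0.7, "<stop>": 0.3},
    "dog": {"ran": 0.7, "<stop>": 0.3},
    "sat": {"<stop>": 1.0},
    "ran": {"<stop>": 1.0},
}

def generate(prompt: str, max_tokens: int = 10) -> str:
    tokens = prompt.split()
    for _ in range(max_tokens):
        # Look up the distribution over next tokens and sample from it.
        probs = BIGRAMS.get(tokens[-1], {"<stop>": 1.0})
        next_token = random.choices(list(probs), weights=list(probs.values()))[0]
        if next_token == "<stop>":  # the likely next token is "stop"
            break
        tokens.append(next_token)
    return " ".join(tokens)

print(generate("the"))  # e.g. "the cat sat" -- prediction, not thought
```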
Strengths and weaknesses...
I'd like to clarify that what language "looks like" isn't the same as what it is. That is to say, language models can create illusions. In fact, language models are so good at creating illusions that we have a term for it: "Hallucination".
There's a good chance that you've encountered this phenomenon before. If you've chatted with ChatGPT, then you've likely asked it a question and received a fluent, accurate-sounding answer... that was wrong! This is because the model is trained to predict what language should look like, how it should sound, the nuances of voice, but without the underlying human mechanisms.
(Check out this article for some specific examples of hallucination.)
This proclivity to hallucinate fuels many of the fundamental weaknesses of language models, and therefore of the features we create with them. They simply aren't 100% accurate. To be fair, humans aren't always accurate either, but at least we usually know when we don't know something. A language model will simply make something up when it doesn't know, because it doesn't think like us.
So what does this mean practically? Essentially, these models are good at producing typical, middle-of-the-road output, but fall apart on specifics. It also means that they work best when they have more relevant data to make predictions from. If some information isn't prominent in the context the model is drawing on for its prediction, then it's likely to fill in the blanks.
To highlight this point, let's take a look at some areas where language models struggle, and some where they succeed.
Things Language Models are Bad At:
Generating high-quality content (Essays, Blogs, etc.) from scratch
Producing accurate facts without additional context
Giving you the exact output that you're looking for
Knowing when to stop talking
Things Language Models are Good At:
Making quality improvements to existing content toward a specific goal
Producing accurate statements when given additional context within a prompt
Matching examples closely with a high degree of flexibility
Judging the approximate quality of some text relative to a goal
There's a trend here; do you see it? Language models work best when applied to existing data. They work well from an initial point of reference, and often the more context you can provide them, the better they are at approximating a useful result.
Asking a language model to write a blog post from scratch often doesn't work out well. When it attempts to create the content, it doesn't have much of a point of reference, and will likely produce something that isn't up to your standards. Accurate facts are challenging because it doesn't have the relevant context readily available.
Exact output is challenging because models only approximate; they don't produce exact results by default. However, if you can settle for something close, you can retain a degree of determinism while gaining much more flexibility across diverse use-cases.
A language model might not be good at determining when some content is ready to publish, because it's trained to always provide an answer. It can, however, be very good at constantly moving the needle towards 100% readiness without ever reaching it.
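You can lean into that tendency deliberately: rather than asking a model whether a draft is finished, ask it to push the draft closer to the goal. Here's a minimal sketch of that framing in Python using the OpenAI client; the model name and editor instructions are placeholders, not a prescription.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def move_the_needle(draft: str, goal: str) -> str:
    """Ask for concrete improvements toward a goal -- never a verdict."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": (
                "You are an editor. Given a goal and a draft, list the "
                "three highest-impact improvements. Do not declare the "
                "draft finished or ready to publish."
            )},
            {"role": "user", "content": f"Goal: {goal}\n\nDraft:\n{draft}"},
        ],
    )
    return response.choices[0].message.content
```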
The bottom line to keep in mind is this:
Lean into language models' approximate nature, and don't give them the chance to fill in the blanks. Have them work off of a point of reference rather than starting from scratch.
What makes a good feature?
Before we can understand what makes a good language model feature, we need to understand what makes a good feature in the first place. Ultimately, a feature should serve the app's core purpose: its value to the user. When a user downloads your application, they want it to solve a specific problem for them. Every single feature in your application, therefore, should serve that solution.
Let's take Airbnb as an example. If you're looking to find a place to stay when visiting a certain area, you have a problem. There are often hundreds to thousands of different places to stay, all with varying amenities, features, and requirements. That's a lot to handle. Airbnb, among other things, simplifies this process through their solution. Their application allows you to browse through rental properties in a given location, and easily filter based on various criteria.
So the core value of Airbnb is something like "provide a bridge between renters and rentals". Every feature of Airbnb is in service of that. For example, being able to search for a specific type of property, in a specific area, for a specific time. For reference, let's take a look at some of Airbnb's other features:
You can input your desired location and length of stay
Easy-to-choose categories let you quickly narrow down your selection
Price estimates are displayed prominently on every listing
You can add listings to saved lists for later review
You can browse pictures of listings to get a feel for each one
Each one of these features contributes to the larger value of the Airbnb solution, and good design requires careful integration of each. The system should be frictionless, and workflows should make sense. Each of these features either makes the solution easier to use, or more seamless in some way. They do not make things unnecessarily complicated.
Implementing a good feature means keeping this in mind. What is your product's core value to the user? Is your feature contributing to that? Or is it noise? There is no good feature in and of itself, only one that solves a specific problem as a subset of the greater solution.
How to do it wrong...
You might also ask yourself this:
Is this a solution in search of a problem?
It's surprisingly easy to convince yourself that something would be useful when you simply want it to exist. This can happen for a variety of reasons; the one I'm most frequently guilty of is "shiny object syndrome". I often find myself drawn to new technologies that promise fancy features and improvements over the boring current versions of things.
Language models are a prime opportunity for this to happen. We see them in action and think, "this is like magic". Suddenly there's a programmatic interface for performing automated reasoning over custom data. The implications are huge, because language models are so flexible and capable. However, that doesn't mean they're always a good fit.
If a good feature is one which acts in service of the core value of the application, then a bad feature is something which doesn't. In fact, some features go as far as to increase friction, or downright decrease the value of the application altogether. One type of feature which I've grown to dislike is what I like to call the "Glorified FAQ".
Upwork, for example, provides an option in their Help Center to "Chat with Support", with the description "Get immediate support by starting a chat". At a glance, this feature seems like it might connect you with someone behind the scenes who can actually help you fix your problem. However, when you attempt to start a chat in search of assistance, it becomes apparent that this is not the case.
I began my conversation with "Upwork Support" like so:
I'm having issues with my identity verification
And received the following response:
I'm sorry to hear that you're having issues with your identity verification. You can access your personalized video verification link through "Settings > Identity Verification" and a new agent will continue with the process.
If you continue to experience major issues, please let me know, and I can assist further.
This is quite obviously an automated message, and they don't necessarily hide that fact ("Powered by Forethought" is displayed). However, it's also obvious that there is little to no value in this chatbot. It has simply regurgitated a link to the very identity verification flow that gave me issues in the first place. This does nothing to solve my problem, and once the bot realizes it can't, it simply redirects me to a support ticket.
Upwork is only one of many companies with an automated support system powered by LLMs, and I haven't seen a good one. The issue I see with these features is that they integrate an LLM in service of a problem that has already been solved. You can easily search the FAQs on Upwork's support page to find the exact advice about identity verification that the "Upwork Support" chatbot gave me.
It might be a useful feature if Upwork's assistant actually had the ability to close support tickets in some capacity. Perhaps it could read the error message associated with my account, or interface with Upwork's administration system and walk me through getting my identity verified. However, it does none of that, and it competes heavily with the search bar in terms of utility.
Avoiding the "assistant"...
One type of LLM-powered feature which has grown in popularity is the "assistant". This is a sort of all-purpose chatbot that can answer questions, and maybe perform some app-specific functions. At face value, this might seem like one of the most effective ways to integrate language models into your application.
Simply add a chat window and let the user talk to the assistant. Hook the assistant up to your backend, giving it access to some functions it can perform when instructed. Then, through the power of conversation, your users can use your app like never before! This is how humans think and communicate naturally, after all. The wiring usually looks something like the sketch below.
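For concreteness, here's a minimal sketch of that pattern using OpenAI-style tool calling; the function name, schema, and model are hypothetical stand-ins for whatever your backend exposes.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Expose one backend action to the model -- the "button" it can press.
# The action and its schema are hypothetical examples.
TOOLS = [{
    "type": "function",
    "function": {
        "name": "archive_project",
        "description": "Archive a project by name.",
        "parameters": {
            "type": "object",
            "properties": {"project_name": {"type": "string"}},
            "required": ["project_name"],
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Please archive the Q3 report project."}],
    tools=TOOLS,
)

# If the model chose to call the function, dispatch it to the backend.
for call in response.choices[0].message.tool_calls or []:
    print(call.function.name, call.function.arguments)
    # ...route to the real archive_project() -- i.e., press the button.
```

Notice the shape of this: a prompt the user had to write, plus a round trip through a language model, to accomplish what a single labeled button does directly.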
There's one small issue: this is just a worse version of your existing interface. Remember, the entire point of your interface is to provide an intuitive surface for users to interact with your application's functionality. You designed your entire application around certain workflows, and built features to optimize for those workflows. Bolting a chat interface on top, one that can interact directly with your backend or press buttons for you, isn't necessarily going to increase the value of your application.
In fact, it might be a detriment. A chat often isn't a great way to use an application; navigation becomes linear and one-directional. Instead of a button that says "do the thing", you now have to craft a prompt that gets the chatbot to press that button for you. Instead of simply interacting with a UI that puts every feature in front of you, you have to talk to a chatbot and coax it into working with you.
All this in service of... what exactly? Now we can "talk" to the application instead of clicking around, but this isn't actually more intuitive. It's like having a hammer and, instead of just using the hammer like a hammer, building an assistant that you can ask to use the hammer for you. You'll have to tell it exactly how to use the hammer, and it will probably get it wrong from time to time, but at least now you don't have to move your hand as much.
I don't want to give up entirely on the assistant, however. There is some promise to the idea, especially for complex workflows. Perhaps "wizards" can be replaced by wizard chatbots that walk you through specific workflows iteratively. In the meantime, though, we have to ask whether this new interface is actually useful, or a solution in search of a problem, as described above.
Invisible AI...
Ok, so now that I've rambled on about how NOT to do it, let's talk about making good features.
I'd like to start by highlighting this YouTube video by Enrico Tartarotti, which discusses the issues with the current landscape of language model features. They're everywhere, but they aren't really working: things are complicated, users are annoyed, popups are prominent, and everything's a mess. In search of a solution, Enrico coined a term that resonated with me: "Invisible AI".
It's not about more AI, actually it's about getting rid of the AI, making AI invisible
So, what does that even mean? Think back to the previous section on making a useful feature: every single feature in your application should serve the solution. The idea of invisible AI is to keep making useful features, like before, but use language models to increase their functionality. Now that we have this technology, we can handle more dynamic tasks than before.
There's no need to start by imagining how to use language models to add new features to your application. In fact, that's a little backwards. Instead, figure out how to add new features that your users might appreciate, and then, if language models fit the use-case, use them!
For example, one very common workflow when building applications is the form. I was working on a web app recently where a user would fill out a form with details from a job description. While this is a fairly simple task, it can be quite tedious, so I sought a way to fill the form automatically when given a link to a job posting.
There are many ways to get data off of a web page, but most require writing a custom script for each site that might be accessed. Rigidity is also an issue: a custom script will break when the website changes. This is an area where language models turn out to be rather useful.
I decided to create a system where the user could input a link to a job description, and a language model would parse the text from that page, identify the data for each field in the form, and send it back to the system, which would then populate the form for the user.
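The exact stack doesn't matter much, but here's a minimal sketch of the idea in Python; the field list, model name, and libraries (requests, BeautifulSoup, the OpenAI client) are stand-ins for whatever your app already uses.

```python
import json

import requests
from bs4 import BeautifulSoup
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Placeholder form fields -- swap in whatever your form actually needs.
FIELDS = ["title", "company", "location", "salary", "description"]

def populate_form(url: str) -> dict:
    # 1. Grab the visible text of the page -- no site-specific scraper.
    html = requests.get(url, timeout=10).text
    page_text = BeautifulSoup(html, "html.parser").get_text(" ", strip=True)

    # 2. Give the model the page text as its point of reference and ask
    #    it to map that text onto the form fields.
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        response_format={"type": "json_object"},
        messages=[
            {"role": "system", "content": (
                "Extract these fields from the job posting as a JSON "
                "object: " + ", ".join(FIELDS) + ". Use null for any "
                "field not stated in the text."
            )},
            {"role": "user", "content": page_text},
        ],
    )
    return json.loads(response.choices[0].message.content)
```

Asking for null on missing fields is the important part: it denies the model the chance to fill in the blanks, per the bottom line from earlier.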
The integration was simple: to the end user, it was a single text field with a button that said "Populate". But it contributed to the value of the application by decreasing friction for the end user. I won't pretend this is the most impressive feature, but it is an example of "Invisible AI".
You don't have to (and probably shouldn't) advertise that your application is "AI-powered". You just need to make features that your users actually find helpful. Language models are a great way to build more complex and useful features, but you don't even need to tell your users they're being used.
Takeaways
In the coming months and years, the AI hype cycle will burn out. Language models may plateau, or at least become increasingly difficult to train. Advancements will still be made, but we'll find that a large percentage of the promise of AI never came to fruition. Our tools will look different and be more powerful, but they'll work much the way they do now.
That's why invisible AI is important. Don't ride the hype cycle and create an illusion of value for both yourself and your users. Instead, create genuinely valuable features, and use language models where they make sense.
Note from the author
Thank you very much for reading :), I hope you found this article valuable. I have some articles planned that will delve a bit deeper into this topic. The next few years are going to be very fun. I look forward to seeing how things go!
Credits
Background Music:
Music track: Marshmallow by Lukrembo
Source: https://freetouse.com/music, Copyright Free Background Music