Have you ever wondered how voice assistants, one of the most fascinating applications of Artificial Intelligence, learn to answer your questions? How do they know how to respond to a wide range of queries, from the weather to cooking recipes? The answer lies in the way the language models behind these assistants are trained. Recently, OpenAI, a leader in Artificial Intelligence research, published a paper, “Let's Verify Step by Step,” that sheds light on an effective technique for training these models, called process supervision.
Process supervision is like a patient teacher who guides a student step by step through a problem and checks each step, rather than only checking the final answer. This is particularly useful for complex problems that require multiple steps to solve. The paper studies this on mathematical problems, but the idea is easy to picture in everyday terms: if you ask an assistant built on a model such as GPT-4 how to bake a cake, it will need to explain multiple steps, from mixing the ingredients to the baking temperature and time, and any one of those steps can go wrong.
OpenAI found that this “step-by-step teaching” approach, in which each reasoning step is rewarded or corrected, worked significantly better on challenging math problems than rewarding only the final answer. That makes sense, right? When learning something new, it’s often more helpful to understand the process, rather than just the end result.
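To make the difference concrete, here is a minimal sketch in Python, not OpenAI's implementation. It assumes we already have a probability that each step is correct (the numbers below are made up for illustration) and scores the whole solution as the product of those probabilities. The function names are hypothetical.

```python
from math import prod


def outcome_score(final_answer, expected_answer):
    """Outcome supervision: a single signal for the whole solution."""
    return 1.0 if final_answer == expected_answer else 0.0


def process_score(step_probabilities):
    """Process supervision: one signal per step.

    The solution score is taken as the probability that every step is
    correct, approximated here by multiplying the per-step probabilities.
    """
    return prod(step_probabilities)


def first_doubtful_step(step_probabilities, threshold=0.5):
    """Return the index of the first step below the threshold, or None."""
    for index, probability in enumerate(step_probabilities):
        if probability < threshold:
            return index
    return None


# Hypothetical solution: the final answer is right, but step 2 is shaky.
step_probabilities = [0.9, 0.2, 0.8]  # illustrative values, not real data

print(outcome_score("42", "42"))                # the end result looks fine
print(process_score(step_probabilities))        # the process score is low
print(first_doubtful_step(step_probabilities))  # index of the weak step
```

The outcome score cannot tell a lucky guess from sound reasoning, while the per-step view also points to where the reasoning went wrong.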
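The study also highlighted the importance of active learning. In the paper, this means choosing carefully which of the model's solutions are shown to human reviewers: priority goes to solutions that look convincing but reach a wrong final answer, because labeling those teaches the model the most for the same amount of human effort. This is not the same as a voice assistant learning on the spot when you correct it, although feedback collected from users can inform later rounds of training.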
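As a rough sketch of active learning in the labeling sense, where the most informative examples are picked for human review, the snippet below selects the wrong-answer solutions that the current model rates most highly. It is a simplification under stated assumptions: the field names, scores, and the `select_for_labeling` helper are hypothetical, and the selection rule used in the paper is more elaborate.

```python
def select_for_labeling(candidates, budget):
    """Pick the most convincing wrong-answer solutions for human review.

    Each candidate is a dict with (hypothetical) keys:
      'id'                   -- identifier of the solution
      'model_score'          -- how convincing the current model finds it
      'final_answer_correct' -- whether the final answer matched the key
    """
    wrong = [c for c in candidates if not c["final_answer_correct"]]
    # Highest-scored mistakes first: these are the errors the model
    # currently fails to notice, so labels on them are most informative.
    wrong.sort(key=lambda c: c["model_score"], reverse=True)
    return wrong[:budget]


# Illustrative candidates, not real data.
candidates = [
    {"id": "a", "model_score": 0.95, "final_answer_correct": True},
    {"id": "b", "model_score": 0.90, "final_answer_correct": False},
    {"id": "c", "model_score": 0.30, "final_answer_correct": False},
    {"id": "d", "model_score": 0.70, "final_answer_correct": False},
]

chosen = select_for_labeling(candidates, budget=2)
print([c["id"] for c in chosen])
```

With a budget of two, the reviewers would see solutions "b" and "d": both are wrong, and both are rated highly enough that the model would otherwise trust them.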
However, there is a challenge we need to be aware of. Sometimes, these language models can generate information that has no basis in reality or the input data, something researchers call “hallucinations.” It’s as if the voice assistant makes up a response that sounds plausible but is actually incorrect. OpenAI is working on ways to detect and correct these hallucinations to improve the accuracy of responses.
So what does this mean for us, the average tech user? It means that the way we interact with our voice assistants or other AI tools can affect the quality of the answers we receive. If we provide helpful feedback and guide the assistant through a problem step by step, we are more likely to get accurate and helpful answers.
A practical example of this can be seen in a hypothetical situation where we ask a voice assistant about the daily schedule of a fictional character, Steven. The assistant, using GPT-4, analyzes the information provided, such as the times Steven was seen doing various activities, and makes educated guesses about when Steven might have gone to the bookstore. The assistant’s response is based on a logical analysis of the information provided, demonstrating GPT-4’s ability to process information and provide useful and relevant answers.
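The kind of reasoning involved can be sketched in a few lines of Python. Everything here is hypothetical: the article does not give Steven's actual schedule, so the hours below are invented for illustration, and `free_windows` is simply a helper that finds the gaps between the times Steven was seen elsewhere.

```python
def free_windows(day_start, day_end, sightings):
    """Return the time gaps in [day_start, day_end] not covered by sightings.

    Times are whole hours on a 24-hour clock; each sighting is a
    (start, end) pair during which Steven was seen somewhere else.
    """
    windows = []
    cursor = day_start
    for start, end in sorted(sightings):
        if start > cursor:
            windows.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < day_end:
        windows.append((cursor, day_end))
    return windows


# Hypothetical facts: Steven woke up at 7 and the bookstore closes at 17.
sightings = [
    (7, 9),    # seen at the gym
    (9, 12),   # seen at the office
    (14, 17),  # seen at a cafe
]

print(free_windows(7, 17, sightings))
```

With these made-up facts, the only unaccounted-for window is from 12 to 14, so that is when Steven could have gone to the bookstore. A language model asked the same question is expected to walk through the same elimination in words.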
However, it is important to note that the effectiveness of this interaction depends on the alignment of the language model with the user's instructions. Alignment is a measure of how well the language model follows the user's instructions and provides answers that are in line with the user's expectations. If the language model is not aligned with the user's instructions, it may provide answers that are technically correct but not useful or relevant to the user.
Therefore, to improve our interaction with AI tools and get better answers, it is crucial that we work to improve alignment. This can be done by providing clear and specific instructions, providing feedback to the language model to help it learn and improve, and using process supervision techniques to ensure that the language model is following instructions correctly.
In short, effective interaction with AI is a two-way street. It’s not just about how the AI responds, but also about how we, as users, interact with the AI. By improving alignment and providing clear instructions and feedback, we can improve the quality of the responses we receive and make AI a more useful and effective tool.
And if you’re wondering how you can apply these lessons in your own company or organization, Uebit is here to help. At Uebit, we have a team of trained, up-to-date professionals who work with the latest technologies in artificial intelligence. We offer consulting for companies and training for employees on how to use technologies like ChatGPT effectively. Contact us to learn more about how we can help you improve your interactions with artificial intelligence and get more accurate and useful answers.
References:
Improving mathematical reasoning with process supervision (openai.com)
[2202.01344] Formal Mathematics Statement Curriculum Learning (arxiv.org)
[2305.20050] Let's Verify Step by Step (arxiv.org)
Text: Guilherme Silveira
Revision: Victory Parise
Image: Midjourney