With ChatGPT’s release, what seemed like science fiction only a few months ago now appears plausible: AI is finally here. Attention-grabbing headlines everywhere describe how AI can write code, teach languages, compose music, and generate art. It seems like artificial intelligence is finally, well… intelligent. However, according to OpenAI CEO Sam Altman, many of the outrageous claims circulating online are sensationalist exaggerations. To put AI to the test, I gave ChatGPT (Feb 2023 release) problems similar to those I encounter in my work as an applications engineer. I found that AI is a promising tool, but it has a long way to go before it can compete with human intelligence. Here, I’ll present some experiments and my evaluation of ChatGPT’s performance on various engineering prompts.
ChatGPT is known to be excellent at aggregating and summarizing information, which explains why prompts describing generic problems (even ones that name specific part numbers) produce excellent responses. Ask ChatGPT about common part issues (e.g., “Why am I not seeing output on the DOUT pin of ___ part?”) and it provides great general troubleshooting recommendations, such as checking the connections, power supplies, and input signal range. These responses are not particularly exciting or novel, but they are incredibly useful because most product issues are resolved with standard troubleshooting procedures. ChatGPT excels at these kinds of general-purpose replies.
Another query elicits a similarly high-level response: “I’m using an I2C device and am not getting any response from the slave device. I have 7 devices attached to the master device, using 10k pullups at maximum data rate transfer. Can you help troubleshoot?” Here, ChatGPT demonstrates impressive implicit domain knowledge, even though it recommends fairly standard I2C troubleshooting steps. While the output itself isn’t particularly novel, ChatGPT’s ability to rapidly aggregate knowledge lets it generate useful responses to generalized queries, even in narrow domains that would normally require extensive background. This suggests that this kind of AI could be highly useful for providing first steps and basic information, even in highly specialized problem domains.
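One typical check of this kind can be made concrete. The pull-up resistance and the total bus capacitance set the SDA/SCL rise time, and the I2C-bus specification caps that rise time for each speed mode. The sketch below is illustrative only: it assumes roughly 10 pF per device (a common per-pin figure) plus a hypothetical 20 pF for wiring, treats the “10k pullups” as 10 kΩ, and models the edge as an ideal RC charge. The helper names are made up for this example.

```python
import math

# Maximum SDA/SCL rise times (30% to 70% of VDD) from the I2C-bus specification.
RISE_TIME_LIMIT_NS = {
    "Standard-mode (100 kHz)": 1000.0,
    "Fast-mode (400 kHz)": 300.0,
    "Fast-mode Plus (1 MHz)": 120.0,
}

# An RC edge climbs from 30% to 70% of VDD in RC * ln(0.7 / 0.3), about 0.8473 * RC.
RC_FACTOR = math.log(0.7 / 0.3)


def bus_capacitance_pf(n_devices: int, pin_pf: float = 10.0, wiring_pf: float = 20.0) -> float:
    """Rough estimate (assumed values): one pin capacitance per device plus a wiring lump."""
    return n_devices * pin_pf + wiring_pf


def rise_time_ns(r_pullup_ohm: float, c_bus_pf: float) -> float:
    """Rise time of an ideal RC edge; ohms * pF gives ps, so divide by 1000 for ns."""
    return RC_FACTOR * r_pullup_ohm * c_bus_pf / 1000.0


def max_pullup_ohm(limit_ns: float, c_bus_pf: float) -> float:
    """Largest pull-up resistance that still meets a given rise-time limit."""
    return limit_ns * 1000.0 / (RC_FACTOR * c_bus_pf)


c_bus = bus_capacitance_pf(n_devices=8)  # 7 slave devices + 1 master: 100 pF under these assumptions
t_r = rise_time_ns(10_000, c_bus)        # 10 kΩ pull-ups
for mode, limit in RISE_TIME_LIMIT_NS.items():
    verdict = "OK" if t_r <= limit else "too slow"
    print(f"{mode}: t_r ≈ {t_r:.0f} ns (limit {limit:.0f} ns) -> {verdict}; "
          f"R_p(max) ≈ {max_pullup_ohm(limit, c_bus):.0f} ohm")
```

Under these assumptions, 10 kΩ pull-ups give a rise time of roughly 850 ns: fine for Standard-mode, but far beyond the 300 ns Fast-mode limit, which would call for something closer to 3.5 kΩ. The sketch ignores the lower bound on the pull-up, which is set by the devices’ sink-current rating.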
An alarming failure mode of ChatGPT emerges as we narrow the problem scope: it generates responses that seem correct but, on closer examination, are not. For example, when asked whether one 56-pin TQFN part can be used as a drop-in, pin-for-pin replacement for another, ChatGPT confidently asserts that the parts have 5 pins. It also provides (among other errors) incorrect supply voltages and output current limits, and claims that the two parts are incompatible. Every supposed “difference” that ChatGPT reports between these products (in this case, the MAX4936 and MAX4937) is untrue; the parts are nearly identical! As this “pin-for-pin” prompt demonstrates, highly specific prompts quickly reveal ChatGPT’s algorithmic origins and hint at the limits of how far we can trust AI-generated information.
Despite its appearance of intelligence, ChatGPT is fundamentally a probability-driven machine. As one of a class of AI models called “large language models” (LLMs), it operates by predicting the statistically most likely next word (or “token”) in a string of text. Even when it claims to, it doesn’t consult a datasheet or look up information; it isn’t even connected to the internet! What looks like “reasoning” is therefore pattern recognition and extrapolation, producing whatever is most likely to be the case based on its training data. This explains its failures in well-defined problem spaces: an extrapolation that is statistically likely in general can be wholly untrue for a narrow, specific query.
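To make the “most likely next word” idea concrete, here is a deliberately tiny, hypothetical sketch: a bigram model that counts which word follows which in a made-up corpus and always emits the most frequent successor. Real LLMs use neural networks over subword tokens and usually sample from a probability distribution rather than counting word pairs, so this is an analogy, not how ChatGPT is implemented. Note how the toy model reports “5 pins” only because that phrase is more frequent in its training text, echoing the pin-count error described earlier.

```python
from collections import Counter, defaultdict
from typing import Optional

# Toy, made-up "training data": the wrong pin count simply appears more often.
CORPUS = (
    "the package has 5 pins . "
    "the package has 5 pins . "
    "the package has 56 pins ."
)


def train_bigrams(text: str) -> dict[str, Counter]:
    """Count how often each word is followed by each other word."""
    words = text.lower().split()
    counts: dict[str, Counter] = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts


def most_likely_next(counts: dict[str, Counter], word: str) -> Optional[str]:
    """Return the most frequent successor of `word`, or None if it was never seen."""
    options = counts.get(word.lower())
    if not options:
        return None
    return options.most_common(1)[0][0]


def generate(counts: dict[str, Counter], start: str, length: int) -> str:
    """Greedily append the most likely next word, at most `length` times."""
    out = [start]
    for _ in range(length):
        nxt = most_likely_next(counts, out[-1])
        if nxt is None:
            break
        out.append(nxt)
    return " ".join(out)


counts = train_bigrams(CORPUS)
print(generate(counts, "the", 4))  # prints "the package has 5 pins"
```

Nothing in the model checks whether “5 pins” is true; it only knows that “5” followed “has” more often than “56” did.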
In addition to providing potentially wrong information, the structure of LLMs raises serious issues of trust and of our ability to use them safely. ChatGPT’s probabilistic output conforms to statistical regularities, so a wrong answer reads just as fluently and confidently as a right one, undermining the heuristics we use to distinguish fact from fiction. When these heuristics fail, the best way to check for errors is to solve the problem yourself… but at that point, what’s the use in asking a computer to do the work for you?
This paradox hints at a likely misuse of artificial intelligence: AI is only as useful as the trust we can give it, but given the incentives, it seems inevitable that we would compromise on trust to use it more broadly. Understanding how AI mimics intelligence and demonstrating its failure modes might break the illusion enough to recalibrate our expectations and prevent such a naive mistake.
I asked ChatGPT why it generated incorrect responses. Part of its response summarized the issue perfectly:
OpenAI's language models, including GPT-3, are trained on a large corpus of text, which includes many errors and inaccuracies. The models use statistical patterns in the data to predict the next word or sequence of words, so it's possible for the models to generate incorrect information.
Finally, I decided to ask, “Can I trust your responses?”
As an AI language model, I don't have feelings or intentions, so the concept of trust does not really apply to me. However, I can assure you that I am designed to provide accurate and helpful responses to the best of my ability based on the data and algorithms that have been used to train me. I strive to be as reliable and impartial as possible. However, it is always a good idea to exercise critical thinking and fact-checking when using any source of information, including AI language models like me.
Indeed. AI has tremendous power to transform our world, but as general-purpose artificial intelligence becomes more ubiquitous, accessible, and advanced, it is essential that we continue researching its capabilities and limitations. Over time, we will undoubtedly trust AIs like ChatGPT with more critical tasks, and we must be educated enough to make informed decisions and use these tools responsibly.
* To see a full transcript of my conversation with ChatGPT, click here.