
The Limits of AI: Testing ChatGPT’s Performance in Engineering

With ChatGPT’s release, what seemed like science fiction only a few months ago now appears plausible: AI is finally here. Attention-grabbing headlines everywhere demonstrate how AI can write code, teach languages, compose music, and generate art. It seems like artificial intelligence is finally, well… intelligent. However, according to OpenAI’s CEO, Sam Altman, many of the outrageous claims seen online are sensationalist exaggerations gripping the public. To put AI to the test, I decided to task ChatGPT (Feb 2023 release) with problems similar to those encountered in my work as an applications engineer. I found that AI is a promising tool, but it has a long way to go before it can compete with human intelligence. Here, I’ll present some experiments and my evaluation of ChatGPT’s performance on various engineering prompts. 

General Purpose Troubleshooting 

ChatGPT is known to be excellent at information aggregation and summarization, which explains how prompts with generic problems (even using specific part numbers) generate excellent responses. Ask ChatGPT about common part issues (e.g. “Why am I not seeing output on the DOUT pin of ___ part?”) and it provides great recommendations for general troubleshooting, including checking the connections, power supplies, and input signal range. These responses are not particularly exciting or novel, but they are incredibly useful because most product issues are resolved with standard troubleshooting procedures. ChatGPT excels at these kinds of general-purpose replies. 

Another query that prompts for a similar high-level response: “I’m using an I2C device and am not getting any response from the slave device. I have 7 devices attached to the master device, using 10k pullups at maximum data rate transfer. Can you help troubleshoot?” In this case, ChatGPT demonstrates impressive implicit domain knowledge, even as it recommends fairly standard I2C troubleshooting steps. While the output itself isn’t particularly novel, GPT’s ability to rapidly aggregate knowledge allows it to generate useful responses to generalized queries, even in smaller domains that might require extensive background. This indicates promise for this kind of AI to be highly useful in providing first steps and information, even in highly specific problem domains. 
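
One of the standard checks behind that kind of advice can be worked out by hand: whether the pull-ups are strong enough for the bus speed. The sketch below is my own illustration, not ChatGPT's output. It uses the RC rise-time relation (30% to 70% of VDD) and the maximum rise times from the I2C-bus specification. The 100 pF total bus capacitance is an assumed figure for a bus with 7 devices; the real value depends on the parts and the board layout, and the function names are hypothetical.

```python
import math

# Maximum allowed rise time (seconds) per bus mode, per the I2C-bus specification.
MAX_RISE_TIME_S = {
    "standard": 1000e-9,   # 100 kHz
    "fast": 300e-9,        # 400 kHz
    "fast-plus": 120e-9,   # 1 MHz
}

def rise_time_s(pullup_ohms, bus_capacitance_f):
    """Time for an RC-charged line to rise from 30% to 70% of VDD."""
    return pullup_ohms * bus_capacitance_f * math.log(0.7 / 0.3)

def check_bus(pullup_ohms, bus_capacitance_f):
    """Return, for each mode, whether the rise time meets that mode's limit."""
    t_r = rise_time_s(pullup_ohms, bus_capacitance_f)
    return {mode: t_r <= limit for mode, limit in MAX_RISE_TIME_S.items()}

# ASSUMPTION: 100 pF total bus capacitance (7 devices plus traces).
ASSUMED_BUS_CAPACITANCE_F = 100e-12
PULLUP_OHMS = 10_000  # the 10k pull-ups from the prompt

t_r = rise_time_s(PULLUP_OHMS, ASSUMED_BUS_CAPACITANCE_F)
print(f"rise time: {t_r * 1e9:.0f} ns")  # rise time: 847 ns
print(check_bus(PULLUP_OHMS, ASSUMED_BUS_CAPACITANCE_F))
```

Under that assumed capacitance, 10k pull-ups meet the standard-mode limit but not the fast-mode or fast-mode-plus limits, so weak pull-ups would be a reasonable first suspect at higher data rates.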

Generating Pseudo-Truths 

An alarming failure mode of ChatGPT emerges as we narrow the problem scope: it generates responses that seem correct but, upon further examination, are not. For example, when asked about using one part as a drop-in, pin-for-pin replacement for another 56-pin TQFN part, ChatGPT confidently asserts that the parts have 5 pins. It also provides (among other errors) incorrect supply voltages and output current limits, and claims that the two parts are incompatible. Every supposed “difference” that ChatGPT reports between these products (in this case, the MAX4936 and MAX4937) is untrue; the parts are nearly identical! As this “pin-for-pin” prompt demonstrates, prompts with high specificity quickly reveal ChatGPT’s algorithmic origins and hint at the limits of trust in AI-generated information.

Guessing Passes Tests 

Despite the appearance of intelligence, ChatGPT is fundamentally a probability-fueled machine. As a member of a class of AI called “large language models” (LLMs), it operates by predicting the statistically most likely next word (or “token”) in a string of text. Even when it claims to do so, it doesn’t consult a datasheet or look up information; it’s not even connected to the internet! Therefore, what seems like “reasoning” is really pattern recognition and extrapolation, providing what is most likely to be the case based on training data. This explains its failures in well-defined problem spaces: statistically likely extrapolation becomes wholly untrue when conditioned on narrow queries.
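
To make the idea of next-word prediction concrete, here is a deliberately tiny sketch: a bigram counter that always returns the word that most often followed the previous one in its training text. This is nowhere near how ChatGPT is actually built (real LLMs are neural networks working on sub-word tokens with long context), and the training sentences are invented for illustration. It only shows how "most likely" and "true for this specific part" can come apart.

```python
from collections import Counter, defaultdict

def train_bigram(corpus):
    """Count, for every word, which words follow it in the training sentences."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for current, following in zip(words, words[1:]):
            counts[current][following] += 1
    return counts

def most_likely_next(counts, word):
    """Return the most frequent follower of `word`, or None if it was never seen."""
    followers = counts.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]

# INVENTED toy corpus: most sentences describe 8-pin parts, one a 56-pin part.
toy_corpus = [
    "the part has 8 pins",
    "the part has 8 pins",
    "the part has 56 pins",
]
counts = train_bigram(toy_corpus)

# The model answers with the most common continuation, regardless of which
# part the question is really about.
print(most_likely_next(counts, "has"))  # 8
```

Asked about the 56-pin part, this toy model would still answer "8", because that is what its training text makes most likely. Nothing in the procedure checks the answer against a datasheet.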

Beyond the risk of wrong information, the structure of LLMs raises dangerous issues regarding trust and our ability to use them safely. ChatGPT’s probabilistic output conforms to statistical regularities, which challenges the heuristics we use to distinguish fact from fiction. When these heuristics fail, the best way to check for errors is to directly solve the problem yourself… but at that point, what’s the use in asking a computer to do the work for you, anyway?

Gaining Public Trust 

This paradox hints at a likely misuse of artificial intelligence: AI is only as useful as the trust we can give it, but given the incentives, it seems inevitable that we would compromise on trust to use it more broadly. Understanding how AI mimics intelligence and demonstrating its failure modes might break the illusion enough to recalibrate our expectations and prevent such a naive mistake. 

I asked ChatGPT why it generated incorrect responses. Part of its response summarized the issue perfectly: 

OpenAI's language models, including GPT-3, are trained on a large corpus of text, which includes many errors and inaccuracies. The models use statistical patterns in the data to predict the next word or sequence of words, so it's possible for the models to generate incorrect information. 

Finally, I decided to ask, “Can I trust your responses?” 

As an AI language model, I don't have feelings or intentions, so the concept of trust does not really apply to me. However, I can assure you that I am designed to provide accurate and helpful responses to the best of my ability based on the data and algorithms that have been used to train me. I strive to be as reliable and impartial as possible. However, it is always a good idea to exercise critical thinking and fact-checking when using any source of information, including AI language models like me. 

Indeed. AI has tremendous power to transform our world, but as generalized artificial intelligence becomes more ubiquitous, accessible, and advanced, it is essential that we continue research to understand its capabilities and limitations. Over time, we will undoubtedly trust AIs like ChatGPT with more critical tasks, and we must be educated to make informed decisions and use ChatGPT responsibly.

 

* To see a full transcript of my conversation with ChatGPT, click here

  • This is one of the best practical summaries of ChatGPT's abilities that I have seen. If Google is one side of an information coin, then ChatGPT is the other: it is excellent at understanding context and meaning, and inferring your intentions, and its response matches that. On the other hand, it fails in narrow fields and especially with specifics - whereas Google is excellent when you can provide names, product model numbers, or rare terminology.

  • Alexander, you do a great job keeping us honest about the capabilities and limitations of ChatGPT. The real way to make the most of ChatGPT within engineering is to marry it with an enterprise search tool. Enterprise search tools are very good at retrieving internal company knowledge no matter where it lives. Results come back as snippets or links with little blurbs of the relevant content found in a document. If you have ChatGPT summarize snippets retrieved by search, you get answers that can be verified for correctness. You can verify correctness by going into the source document retrieved by search.

  • Great analysis. Some companies are promoting AI as a timesaver, replacing human-written research and writing with AI. A company could staff their support centers with AI-driven bots, but given your narrow-focused query, a bot could end up giving incorrect information in the semiconductor industry.

  • Doug Mercer pointed your posting out to me. We have been talking about AI.

    Your thoughts have encouraged me to think more about how to configure student teams with AI Agent members. Sounds like it has some real potential. One thing to think about - where would one put an AI Agent in the Dreyfus Model of Expertise? (Novice-Advanced Beginner-Competent-Proficient-Expert) I suspect that AI Agents are more like consultants who have specialized expertise in a narrow area - good at finding simple versions of information maybe. 

  • I strongly believe that teams with AI agents would often perform better than those without. Pretty sure that most of the time, the best performance comes from humans most able to leverage the latest technology. ChatGPT isn't as good as having an actual expert with you, but how many of those can you have across all of the fields relevant to your problem? And how long do you need them to assist you? In solutions which span a broad range of material, you might not need a lot of depth (which an expert could provide) but simply need the range to get a baseline understanding — and this is where ChatGPT and other LLMs excel. 

    Also, I like your point about AI agents helping students get started. I would replace "students" with "anyone relatively new" because it can give a lot of the high-level introductory understanding that anyone might need when first entering a new problem space. I'm not as worried about students not thinking for themselves, if only because the reasoning that this technology can do right now "bottoms out" at the point where most students would usually end up struggling and finding learning value. Long-term, though, the tech will become better and make this a real concern, but when that point is, is anyone's guess.

    Thanks for reading and sharing your thoughts!

  • This is great. I recently worked on a proposal with some folks from an unnamed university to explore what it means to have teams that include AI agents. That was only part of the proposal, but it does really interest me. I have been treating ChatGPT as a type of colleague, kind of based on ideas I generated while working on the proposal. This article is really helpful and reminds me of the issues that we face on teams when there is a very large range of experience/background. There are other fascinating issues in what we are now calling convergent research where teams are made up of people from starkly different disciplines. Working with psychologists can be great fun, but it requires a lot of translation and dealing with misunderstanding. One can go weeks thinking that everyone is in agreement on something but different groups were working from different definitions of basic ideas or even words. What also makes this article fun is its emphasis on troubleshooting. I can see how an AI agent would help students get started with something they usually find challenging. The worry is that they do not end up thinking for themselves, but the inadequacies of AI agents should remain obvious for some time so that students should know what to be skeptical about. 
    Thanks again for this interesting exploration of AI.