⚠️ I do not use ChatGPT for work. For good reasons, it is prohibited where I work (porting PC games to consoles):
- Uploading client code into the chatbot may violate the NDAs we have with our clients and partners.
- We do not charge clients for AI work. (As you will see below, it is not nearly good enough.)
- Since many of the libraries we use are proprietary and also under NDA, the bot has limited knowledge of the platforms we work on, so it is not very useful for providing the solutions we need.
- Because of limits on how much text it can take in and produce, it can’t do useful work on a large scale.
At least for the time being, a human is still the best way to get a game made for one platform onto another.
That said, I have been revisiting implementing classical data structures for a personal project, and I find the bot a compelling coding assistant. In this post, I describe my experience.
Since ChatGPT 4 is rate-limited, this post is mostly about ChatGPT 3.5. Although ChatGPT 4 is indeed noticeably better, I think the same points still apply, just to a different degree.
The good
It has a vast knowledge of algorithms and data structures. You can ask it questions like:
- What is a data structure that supports logarithmic insertions, deletions, and finds?
- What data structure works like a heap but supports constant time finds?
- What are the variations of merge sort?
It can give you good references on specific data structures or algorithms. It can suggest books (often even where inside the book), papers, and websites. What I find extremely useful is a structured reading list for a topic.
It knows a lot about C#, the standard library, and even the IDE I use. You can ask it, for example, whether C# provides a library implementation of a binary search tree. (It does, in the form of SortedSet.)
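As a minimal sketch of what that looks like in practice: SortedSet keeps its elements in sorted order, ignores duplicates, and can hand back a view of a range of values. The class and method names below (SortedSetSketch, SortedDistinct, CountInRange) are made up for illustration and do not come from any real project.

```csharp
using System;
using System.Collections.Generic;

public static class SortedSetSketch
{
    // Returns the values in ascending order with duplicates removed,
    // relying on SortedSet<T> to keep its contents ordered.
    public static List<int> SortedDistinct(IEnumerable<int> values)
    {
        var set = new SortedSet<int>();
        foreach (int value in values)
        {
            set.Add(value); // Add returns false and changes nothing for a duplicate.
        }
        return new List<int>(set); // Enumerating the set visits elements in sorted order.
    }

    // Counts the elements in the inclusive range [low, high] using a range view.
    // Assumes low <= high; GetViewBetween throws otherwise.
    public static int CountInRange(SortedSet<int> set, int low, int high)
    {
        return set.GetViewBetween(low, high).Count;
    }
}
```

For example, SortedSetSketch.SortedDistinct(new[] { 5, 1, 3, 1, 5 }) yields 1, 3, 5.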
It can translate from one language to another. Translating code between languages is not too hard for programmers, even if they don’t fully know one of the languages. But this task can be annoying and might require a lot of research, especially when dealing with code that uses many library functions. While there are automated tools out there, their results often need plenty of manual tweaking.
It can interpret code. In many cases, it can recognize the algorithm that a piece of code is trying to implement. It is therefore able to add useful comments to a piece of code, choose (moderately) better names, and write useful XML documentation. The latter is especially wonderful for the boilerplate comments you need to write for container classes.
It can generate small sets of useful unit tests. As explained below, there are limitations to this.
It can do useful transformations that go beyond code transformations available in typical IDEs, quite successfully in many cases. Here are some examples:
- Transforming recursive methods into iterative ones.
- Modifying a class symmetrically, such as converting a min heap into a max heap.
- Adopting a different idiom, like switching NUnit tests to a version using Assert.That.
- Altering a method to utilize a different type, for instance, using an array instead of a List.
- Reducing code length. (However, not always usefully — for example, by removing necessary braces or deleting needed code.)
- Enhancing code efficiency. (Although, not always.)
- Selecting improved variable names.
- Applying Microsoft coding standards when requested, though it may not always work.
- Breaking a lengthy method into smaller methods, typically making reasonable choices.
- Implementing an interface within specific constraints, including the structure for the implementation.
- Suggesting additional methods for a given type.
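To give a flavour of the first transformation in the list, here is a sketch of a recursive binary search next to an iterative equivalent. The pair is written for illustration only; it is not actual ChatGPT output, and the names (BinarySearchSketch, FindRecursive, FindIterative) are hypothetical. Both versions assume the array is sorted in ascending order.

```csharp
using System;

public static class BinarySearchSketch
{
    // Recursive version: searches sorted[low..high] (inclusive) and
    // returns the index of target, or -1 if it is not present.
    public static int FindRecursive(int[] sorted, int target, int low, int high)
    {
        if (low > high) return -1;
        int middle = low + (high - low) / 2;
        if (sorted[middle] == target) return middle;
        return sorted[middle] < target
            ? FindRecursive(sorted, target, middle + 1, high)
            : FindRecursive(sorted, target, low, middle - 1);
    }

    // Iterative version: the two recursive calls become updates to
    // low and high inside a loop, and the base case becomes the loop condition.
    public static int FindIterative(int[] sorted, int target)
    {
        int low = 0;
        int high = sorted.Length - 1;
        while (low <= high)
        {
            int middle = low + (high - low) / 2;
            if (sorted[middle] == target) return middle;
            if (sorted[middle] < target) low = middle + 1;
            else high = middle - 1;
        }
        return -1;
    }
}
```

Having both versions side by side also gives you a cheap check: run them on the same inputs and compare the results.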
Because it can transform code so quickly, it can be very useful to evaluate alternative styles or implementation details. Even when not correct, this can inspire alternative implementations that would not occur to me naturally. I would not have coded four different iteration methods just to see how they look, for example.
You can ask it to spot bugs. It does not always work, but sometimes it does. It is usually a good idea to ask it to write a unit test that will show the existence of a bug that it found, as it can be easier to check the test (and then run it) than to check the veracity of the bug report itself. It is especially good for detecting bugs in tediously expressed code (such as very long methods or a problem in a long XML document). I have used it to spot errors in my Rider XML layout file.
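As an illustration of the kind of test that makes a bug report easy to check, consider the classic midpoint overflow. Everything below is a hypothetical sketch: the helper names are invented, the unchecked keyword is there only to make the wrap-around explicit, and the fixed version assumes 0 <= low <= high. A plain boolean check stands in for a real unit-test framework so that the block stays self-contained.

```csharp
using System;

public static class MidpointSketch
{
    // Hypothetical buggy helper: low + high can exceed int.MaxValue and wrap
    // around to a negative number before the division happens.
    public static int MidpointBuggy(int low, int high)
    {
        return unchecked(low + high) / 2;
    }

    // Fixed helper: the difference high - low cannot overflow
    // when 0 <= low <= high (an assumption of this sketch).
    public static int MidpointFixed(int low, int high)
    {
        return low + (high - low) / 2;
    }

    // The "test": a midpoint must lie within [low, high].
    public static bool IsWithinRange(Func<int, int, int> midpoint, int low, int high)
    {
        int middle = midpoint(low, high);
        return low <= middle && middle <= high;
    }
}
```

With low = int.MaxValue - 1 and high = int.MaxValue, the check fails for MidpointBuggy and passes for MidpointFixed, which is far quicker to confirm than reasoning about the bug report in prose.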
You can ask it for domain-specific applications of algorithms. For example, I learned that a stack can be used in the algorithm to calculate the convex hull of a 2D point set. As a game developer, I like this because too many examples in computer science are taken from the world of business. (If you don’t specify a domain, it follows the rest of computer science, unless the algorithm tends to be used in a particular domain or you have already made it focus on another domain during the conversation.)
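The text above does not say which convex hull algorithm was meant. As one possible illustration, the sketch below uses Andrew's monotone chain, a stack-based approach (Graham scan is another). The names ConvexHullSketch, BuildChain, and Compute are hypothetical, and integer coordinates are an assumption made to keep the arithmetic exact.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ConvexHullSketch
{
    // Cross product of (a - o) and (b - o); positive when o -> a -> b turns counter-clockwise.
    private static long Cross((int X, int Y) o, (int X, int Y) a, (int X, int Y) b)
    {
        return (long)(a.X - o.X) * (b.Y - o.Y) - (long)(a.Y - o.Y) * (b.X - o.X);
    }

    // Builds one chain of the hull. The stack holds the chain so far; a point is
    // popped for good whenever it would not make a counter-clockwise turn.
    private static List<(int X, int Y)> BuildChain(IEnumerable<(int X, int Y)> ordered)
    {
        var stack = new Stack<(int X, int Y)>();
        foreach (var point in ordered)
        {
            while (stack.Count >= 2)
            {
                var top = stack.Pop(); // Pop so that Peek exposes the element below it.
                if (Cross(stack.Peek(), top, point) > 0)
                {
                    stack.Push(top); // Counter-clockwise turn: top stays on the chain.
                    break;
                }
            }
            stack.Push(point);
        }
        var chain = stack.ToList(); // A Stack<T> enumerates from top to bottom.
        chain.Reverse();
        return chain;
    }

    // Returns the hull vertices in counter-clockwise order, starting at the leftmost point.
    public static List<(int X, int Y)> Compute(IEnumerable<(int X, int Y)> points)
    {
        var sorted = points.Distinct().OrderBy(p => p.X).ThenBy(p => p.Y).ToList();
        if (sorted.Count < 3) return sorted;

        var lower = BuildChain(sorted);
        var upper = BuildChain(Enumerable.Reverse(sorted));

        // The last point of each chain is the first point of the other.
        lower.RemoveAt(lower.Count - 1);
        upper.RemoveAt(upper.Count - 1);
        lower.AddRange(upper);
        return lower;
    }
}
```

For the corners of a square, (0, 0), (2, 0), (2, 2), (0, 2), plus the interior point (1, 1), Compute returns the four corners in that order and drops (1, 1).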
The bad
You cannot trust it. You have to carefully vet any code you use, or any answer it gives you about anything really 🙂. In some cases, it will change working code into a non-working version to do what you ask. The request “Can you shorten this” can sometimes lead to more elegant code, but it can also simply remove some code that is needed.
The best way to vet its code is of course by using a set of unit tests (and I think as a result I have leveled up my own test writing skills).
Although it knows a lot about Git too, as a user who knows Git only moderately well, I am too scared to type any of its suggestions on the command line.
It sometimes makes up stuff that does not exist. How many times it has excited me by suggesting a very compelling paper, C# feature, or data structure that turned out not to exist!
It has a very good feel for things that would be nice if they were possible. It often suggests C# code that looks like it could work, but does not. For example, it has suggested nested (inner) static classes for private extension methods, deriving from classes that turn out to be sealed, and overloads of generic methods where only the type constraints differ.
It sometimes suggests how to configure Rider to do something that is in fact impossible. This can be quite frustrating because you never know whether the instructions are simply wrong or whether they apply to a different version.
It sometimes makes assumptions that are not true. For example, it can come up with a reasonable list of things to test, but it may assume you are implementing an algorithm for objects that implement a particular interface. Of course, this can usually be fixed, but if you cannot do it within an iteration or two, drift kicks in (see below) and you cannot get your preferred solution.
It is not good at giving links to high-quality reference implementations. Requests for these usually don’t work, or the answer points to low-quality code, or code that is very obscure. Sometimes I would prefer a reference made by a human, especially when published in a journal, for example, rather than have a questionable version created by the bot.
When you push it, it can become silly. If you ask it for 50 new methods for a class, some of them will be duplicates, or not useful.
It drifts in style and focus. It tends to use new variable names, or a new brace style, or new idioms in successive answers. This makes it frustrating trying to tweak code. It will often switch the programming language too (almost always to Python, in my case).
This, combined with the limited quantity of text it can generate, makes it less useful for writing cohesive code, even code that it can in principle write. For example, it generally cannot produce a full set of unit tests for a class. Up to a point (using a list as a base), this can be broken up into successive requests, but it tends to forget aspects of the conversation and may neglect to produce all the tests.
The drift means that tweaking a method too much will sometimes make it break: it forgets what the method is supposed to do, or tries too hard to do what you ask.
Here is how these exchanges typically look:
Me: Can you please give me a C# method for quicksorting an array?
ChatGPT: Sure! *Gives code*
Me: Can you please use clearer names, and call the method simply “Sort”?
ChatGPT: Certainly! *Gives code with better names but now the local variables start with underscores*
Me: Please do not use underscores in variable names.
ChatGPT: *Fixes the variable names but now it is not quicksort anymore*.
Me: Please change it back to quicksort.
ChatGPT: I apologize for misunderstanding your previous message. *Gives new quicksort code. It still has goodish variable names but annoyingly they are different.*
Me: Can you use the names you had before.
ChatGPT: *Gives the code with the right variable names but now it is Python.*
Me: …
So instead of moving forward with the task at hand, it makes another sideways change, so you move diagonally. In my experience, it is almost impossible to get it back on track after two or three of these diagonal movements, so when this happens I give up or start over.
It is not good at more advanced algorithms. For example, it was not able to produce a correct version of 4-way merge sort. (This is also the type of algorithm that seems a little daunting to me as a human programmer.) When failing in this way, it usually gives an algorithm that looks structurally correct. However, when tested, it does not give the desired result. Occasionally, after a bit of prompting, you can get it to fix small errors, but after about three tries it drifts and further iterations take you farther from a solution. I usually do not try to debug these wrong solutions: to me, it feels more likely to be fantasy code than code that can work but has an error in it.
Something interesting
It sometimes comes up with ideas that are not quite right but also not quite wrong. For example, in Algorithms (2011, Sedgewick and Wayne), one of the exercises (2.1.14) asks you to implement what they call “dequeue sort”, a terribly bad name since the previous exercise involves “deck sort”, and both algorithms are explained with a deck of cards. I wondered if the algorithm perhaps had a better name out in the wild, so I asked ChatGPT if it recognized my implementation, and it responded that it is a version of gnome sort. At first, I thought it was completely wrong, but then it occurred to me that it kind of works like gnome sort if the gnome stays stationary and you rotate the list instead. This is the type of connection that would be impossible for me to discover (other than accidentally).
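For readers who have not met gnome sort, here is a minimal sketch of the standard algorithm, in which the "gnome" walks along the array and swaps neighbours that are out of order. This is only the textbook gnome sort on an int array, not the dequeue sort implementation discussed above, and the name GnomeSortSketch is made up for illustration.

```csharp
using System;

public static class GnomeSortSketch
{
    // Sorts the array in place in ascending order.
    public static void Sort(int[] items)
    {
        int position = 0;
        while (position < items.Length)
        {
            if (position == 0 || items[position - 1] <= items[position])
            {
                // The pair behind the gnome is in order: step forward.
                position++;
            }
            else
            {
                // Out of order: swap the pair and step back to recheck.
                (items[position - 1], items[position]) = (items[position], items[position - 1]);
                position--;
            }
        }
    }
}
```

The gnome only ever compares and swaps one adjacent pair at its current position, which is what makes the comparison with a sort restricted to the top two cards of a deck plausible.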
So far, this has happened very rarely; I cannot think of another example.
Concerns
Despite its current limitations, it is very impressive technology, and I do wonder about the future of it and us. Here are some things I worry about.
In the future, who will make the content? Much of the chatbot’s programming knowledge must come from sources like Stack Exchange and GeeksForGeeks. However, if AI bots increasingly handle user questions, these sites might struggle to keep the audience they need to survive. While there will likely always be questions beyond the scope of a chatbot trained on a fixed dataset, will those be enough for sites like Stack Exchange to thrive? The same goes for other knowledge sources, such as books.
(Of course, once a chatbot can interact with tools like compilers and sandboxes to perform tests, it could generate knowledge similar to humans, which would dramatically alter the situation.)
Who will control the AI? Current limitations on AI input and output size prevent the creation of entire books or code libraries. Even if these limitations are overcome, coherence may still be an issue. Assuming this problem is solved, will average people be able to harness AI on this scale?
For example, the idea of an AI generating a book on an obscure topic tailored to your existing knowledge is alluring and would be an incredible boon for education, but if only a few companies can do this, their monopolies could become the gateway to knowledge and other great things. AI has the potential to be incredibly beneficial for humanity, but its positive impact may not be realized if controlled solely by companies that don’t prioritize the common good.
What will happen to knowledge workers? It’s easy to imagine AI replacing humans in various fields. For now, the secrecy of console manufacturers and the AI’s current limitations prevent it from taking over my specific job of porting games. But this may change in the future. And even without that, an AI, guided by an experienced developer, could already handle tasks typically assigned to junior programmers, reducing the demand for their roles.
What dangerous code can be produced that slips through the cracks? Errors in AI-generated code can be subtle and might still go unnoticed even with careful testing. Although this also occurs with human-generated code, the rapid pace of AI code creation could make it impossible to examine all the code at the right level and address all the issues that may arise.
Will we survive? It is maybe a bit dramatic to ask this, especially since the AI is still so silly at times. But even now, what would ChatGPT do if it had the following:
- The ability to trade (that is, access to a bank account).
- Access to social media.
- Access to some type of freelance site (maybe translation or editing).
- A profit-making loop that makes it take jobs, and develop a following on social media.
Maybe providing it with a profit-making loop will be difficult. But it seems within reach, and once you have that, you have an AI that has money and people that listen to it. It is certainly something to think about.
Conclusion
After using ChatGPT a lot, you certainly run into the limitations of its intelligence. You can clearly see that it is not thinking about things, and you get a feel for how the underlying probabilities cause it to make sideways changes as it tries to comply with requests.
Even where it is good, it may still not be quite good enough. For example, as a result of my investigation into algorithms, I have a big code base that is not documented. I could have ChatGPT document it, and it would probably do a decent job. But I am reluctant to do so, simply because I would have to work in batches, and I know it would generate XML docs with minor inconsistencies that would bother me so much that I would feel compelled to change them. And I am not sure I want to spend hours editing a computer’s dry XML comments…
That said, even with the drawbacks, I have found ChatGPT to be a valuable resource. The ability to quickly ask basic questions and receive helpful answers beats any other methods available to us.
Interestingly, using ChatGPT stimulates me to think differently about the code I’m working with, making me more aware of the subtleties I need to consider. This reminds me of the way search engines have changed our thought processes. When formulating a question, we often anticipate the engine’s biases and consider how we need to flavor our search query to obtain what we need. We don’t search for “brick”; we search for “brick movie”.
While ChatGPT is particularly suitable for exploring a topic such as data structures and algorithms, which is already abstract and has abundant resources available, I wonder how useful it would be in more hands-on, gooey domains like game development. And I’m not sure how helpful ChatGPT would be for beginners who don’t have the experience and thinking tools needed to analyze and interpret its responses.
I will continue to use ChatGPT where possible. If you haven’t already, I would recommend checking it out, but: buyer beware!