ChatGPT and other bots aren’t even general artificial intelligence. They don’t understand anything. Yet some people are now pushing the misconception that the large language models we have today are already AI and can replace human workers.
I saw this article on social media with the title:
AI chatbots were tasked to run a tech company. They built software in under 7 minutes — for less than $1.
https://www.businessinsider.com/ai-builds-software-under-7-minutes-less-than-dollar-study-2023-9
The chatbots did not run any company and they didn’t do any real programming.
What would even be a “team of AI bots”? I guess it was phrased like that because of the stock photo that was used, and they actually think the bots are androids. None of this is real. The paper just makes wild claims, but in reality there are still no companies replacing actual programmers. Maybe there will be a breakthrough in actual, general AI soon and I will be out of a job. But I find that rather unlikely to happen in the next few years. Maybe in ten years. Or maybe it takes much longer. Nobody knows, because we do not yet have general artificial intelligence. We have chatbots, and they only generate answers that we are likely to find satisfying. ChatGPT does not generate text that is factually correct.
Before I start, I would like to point out that what is commonly referred to as “AI” is used by many companies, including the one I work for. There are many applications, and it’s often impressive what can be done. But as of now, this technology doesn’t run companies or replace human programmers.
I won’t just copy the complete article, but I will comment on some pieces of it. I did read the paper, and it does mention a lot of the problems, such as hallucinations, incomplete implementations of functions, missing dependencies, and potentially undiscovered bugs. But I would rather comment on the article, which will be read by more people. The paper is interesting, and what they are working on might be useful in the future. I will reference the actual paper wherever it is relevant.
And it’s not $1. Creating those LLMs cost a lot of energy. It might look cheap now, but the real cost is more than that.
From there, researchers assigned AI bots specific roles by prompting each one with “vital details” that described the “designated task and roles, communication protocols, termination criteria, and constraints.”
So it’s not a realistic project at all. Most clients change their requirements all the time, and most real programming work has to be agile. However, the paper mentions that they used the waterfall model, so it’s not agile or iterative. The article doesn’t mention that. I actually find it interesting that they let the bots create documentation and even a user manual. I kind of doubt that those generated documents are accurate and useful, but this type of use might actually help humans write better documentation, as long as they do the proofreading.
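To make the setup the article describes a bit more concrete, here is a purely illustrative Python sketch of what such a fixed, waterfall-style pipeline of role-prompted bots could look like. This is not the paper’s code; the role descriptions and phases are my own guesses, and the chat function is only a stand-in for the actual LLM calls.

# Illustrative sketch only; not the paper's code, roles and prompts are guessed.
ROLES = {
    "CEO": "You decide what the product should be.",
    "CTO": "You choose the technology and outline the design.",
    "Programmer": "You write the code for the agreed design.",
    "Tester": "You look for bugs and report them.",
}

# Fixed phases, each a conversation between two roles: no iteration and no
# changing requirements, which is what makes this waterfall rather than agile.
PHASES = [
    ("CEO", "CTO", "Agree on the product and the programming language."),
    ("CTO", "Programmer", "Turn the design into code."),
    ("Programmer", "Tester", "Find bugs until the termination criterion is met."),
]

def chat(role_a, role_b, task):
    # Stand-in for the actual LLM calls; it only records what would be discussed.
    return f"{role_a} ({ROLES[role_a]}) and {role_b} ({ROLES[role_b]}) work on: {task}"

for a, b, task in PHASES:
    print(chat(a, b, task))

The point of the sketch is only to show how rigid this is: the phases and roles are fixed in advance, so nothing about it resembles a real project with changing requirements.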
During each stage, the AI workers chatted with one another with minimal human input to complete specific parts of the software-development process — from deciding which programming language to use to identifying bugs in the code — until the software was complete.
As if software were ever “complete”. The waterfall model isn’t iterative like Scrum, but once you release version 1.0 you usually don’t just say it’s complete and the project is over. The article doesn’t mention waterfall, so this statement is meaningless.
Maybe after 10 years you get software that can be considered stable. But even then you still have to adapt it to new technologies. After 4 weeks you wouldn’t have that. But at the same time, they claim they did in 7 minutes what humans would need 4 weeks for.
And then what? Would you use the bots to maintain the code? What if they run into a problem they can’t solve? Do they design the code so humans can take over and fix the problems?
Researchers, for example, tasked ChatDev to “design a basic Gomoku game,” an abstract strategy board game also known as “Five in a Row.”
I can go to GitHub and clone a Gomoku game in less than 7 minutes. What took them so long? In the paper I can see that the bots generated images for the UI, but those must have been generated from assets of existing games. That’s a rather simple task: doing it in Photoshop wouldn’t take long, and a human designer could do it within a few minutes.
And what I don’t understand is how they tested this. Does the game actually follow the correct rules? Is it stable? According to the paper, the bots did all the testing themselves, so we simply do not know. I can’t find a link to the source code, so I can’t test it.
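To put the “design a basic Gomoku game” task in perspective: the core rule, five in a row, fits in a handful of lines. The sketch below is my own minimal Python version, not the generated code from the paper (which isn’t linked), and it is exactly the kind of logic the tests would need to cover.

# Minimal sketch of the core Gomoku rule (my own code, not the paper's):
# did the stone just placed at (row, col) complete five or more in a row?
def wins(board, row, col):
    color = board[(row, col)]
    for dr, dc in ((1, 0), (0, 1), (1, 1), (1, -1)):  # the four line directions
        count = 1
        for sign in (1, -1):  # walk outward in both directions along the line
            r, c = row + sign * dr, col + sign * dc
            while board.get((r, c)) == color:
                count += 1
                r, c = r + sign * dr, c + sign * dc
        if count >= 5:
            return True
    return False

# A quick sanity check of the rule:
board = {(3, i): "X" for i in range(5)}  # five X stones in row 3
assert wins(board, 3, 2)
assert not wins({(0, 0): "O"}, 0, 0)

Whether the generated game actually implements this rule correctly is exactly what we cannot check without the source code.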
At the designing stage, the CEO asked the CTO to “propose a concrete programming language” that would “satisfy the new user’s demand,” to which the CTO responded with Python. In turn, the CEO said, “Great!” and explained that the programming language’s “simplicity and readability make it a popular choice for beginners and experienced developers alike.”
The user of the game isn’t the programmer, so the “simplicity and readability” is completely irrelevant. And the bots will do the programming, so it could just as well be assembly or Brainfuck, if they were really that good at programming. But they are not. They just generate code from existing open source code without understanding any of it.
The bots didn’t understand anything, because they are not (general) AI. They just generate text. They don’t understand what a user is or what a game is. They only know that lots of such basic games are written in Python, so they picked it without understanding anything. And they know that some CEOs do not understand anything about the actual work that is done at the company and just write “Great!”, so nobody finds out that they have no idea what is going on. Maybe they are good at replacing CEOs. We need further research on that.
Conclusion
What large language models can do today is often impressive, and it can be fun. The research being done is just as impressive, and it shows us how the technology can be used. The paper explains a lot. The sensationalism of the article, on the other hand, is not helping anyone understand what is really going on right now.
So-called AI is already used by many companies, but not yet for creating software. However, there are LLMs offering help to programmers. I don’t use them, but I wouldn’t mind having an assistant based on some LLM that improves my unit tests or makes helpful suggestions, especially once it is good enough to inspect the running system based on a prompt and find the cause of a problem. As far as I know, that doesn’t exist yet. Maybe it will be a thing in the future. And I can imagine that there will be a lot of software made so that an LLM can configure existing components to build relatively complex but very common systems, such as a web shop, a system for creating PDFs, an app for an existing web API, or a UI for data entry with input validation. Many companies need such software, and using a framework with an LLM to configure it might be standard in a few years.
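To give an idea of what I mean by improving unit tests, here is a purely hypothetical example (all names invented, not an existing tool): a small input-validation function as it might appear in a data-entry UI, and the kind of edge-case tests an LLM assistant could suggest adding.

# Hypothetical example of LLM-assisted testing; the validator and tests are invented.
import pytest

def parse_quantity(raw: str) -> int:
    # Parse a quantity field from a form; must be an integer between 1 and 999.
    value = int(raw.strip())
    if not 1 <= value <= 999:
        raise ValueError(f"quantity out of range: {value}")
    return value

# The test a human writes first:
def test_accepts_normal_input():
    assert parse_quantity("42") == 42

# Edge cases an assistant could plausibly suggest:
def test_strips_whitespace():
    assert parse_quantity(" 7 ") == 7

def test_rejects_zero():
    with pytest.raises(ValueError):
        parse_quantity("0")

def test_rejects_non_numeric():
    with pytest.raises(ValueError):
        parse_quantity("many")

That is assistance, not replacement: the assistant fills in cases I might forget, and I still decide what the code is supposed to do.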
There are so many use cases for deep learning. But to replace general natural intelligence we would need general artificial intelligence, which doesn’t exist yet.