GitHub Copilot: Weighing the Pros and Cons of Auto-generated Code
  • Reporter Kim Seo-jin, Won John
  • Published 2021.09.05 22:29

▲Illustration of a programmer


If AI could automatically complete the code you had been struggling with for countless nights, would you truly trust such an unfamiliar technology? GitHub Copilot is a brand-new extension for Visual Studio Code, a popular source-code editor for software development. The recently unveiled tool uses AI to help developers write code. Copilot is powered by Codex, a deep neural network language model developed by OpenAI and GitHub. Codex is a descendant of OpenAI's previous language model, Generative Pre-trained Transformer 3 (GPT-3), repurposed to generate syntax-abiding code instead of ordinary text.
Copilot helps developers by automatically suggesting new lines of code that fit the context, or by writing entire functions based on the function's name or attached comments. Since Codex has been trained on both programming and human languages, it can convert natural-language instructions into computer-intelligible code, allowing for a streamlined workflow from conception to realization. For example, given only the name and docstring (a descriptive string literal embedded in source code) of a function that should “write text to filename”, Copilot generates the body of the function and presents it as a suggestion, as sketched below. A light touch on the Tab key then adds the suggestion to the code.
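
To make this workflow concrete, here is a minimal sketch in Python. The function name, signature, and example call are hypothetical stand-ins typed by the developer; the body shows the kind of completion Copilot might propose, not actual recorded output from the tool.

def write_text_to_file(filename, text):
    """Write text to filename."""
    # The developer types only the lines above; a completion like the
    # following is what Copilot might then suggest for the body.
    with open(filename, "w") as f:
        f.write(text)

# Pressing the Tab key accepts the suggestion, after which the function
# can be called like any hand-written code.
write_text_to_file("notes.txt", "Hello, Copilot!")

Even a suggestion this simple hints at why reviewing the output matters: the generated body runs, but nothing guarantees it handles edge cases such as text encoding or a missing directory.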
However, it must be noted that Copilot does not actually “understand” code. It is a deep-learning model that produces text that seems to fit the context of the given input. Copilot also does not test the code it suggests, so the output may not compile or run correctly as given. Hence, GitHub recommends that every responsible programmer always test and review the output. Overall, the initial reaction of the developer community at large was positive, with many regarding the new tool as an innovation that could significantly reduce the hours spent on repetitive coding. Some even see Copilot as the precursor to a paradigm shift toward comment-based programming, similar to the 20th-century shift from machine language to high-level programming languages.
As the programming community looked deeper into the details of the new technology, however, controversies arose around Copilot. Allegations of copyright infringement by AI, and of GitHub profiting commercially from open-source code, tainted Copilot's reputation even before its official launch. The controversies stem from the fact that Codex was trained on open-source repositories protected by copyright law. While reading open-source data and using it to train new learning models may not itself be a problem, selling the AI-generated code back to the developer community that provided the original data seemed to raise both moral and legal issues.
There have been rebuttals to this accusation, claiming that Copilot creates transformative works from the source data and should therefore be protected under fair use. GitHub backed up these claims by stating that only 0.1% of the generated code is identical to code in the training dataset. However, many testers who tried the software claim that Copilot regurgitated verbatim, untransformed code far more frequently than GitHub announced. As debates surrounding Copilot flared up across the Internet, some software industry executives announced that they would ban their employees from using Copilot, concerned that relying on legally uncertain software could expose their companies to dire consequences. Some developers have also started to turn away from GitHub, long regarded as the epitome of open-source development platforms, and are migrating to rival platforms such as GitLab or Bitbucket.
Copilot is currently treading on untested legal ground concerning the implications of AI-generated code and commercial AI products at large. Can a deep-learning model trained on public data be copyrighted? Is AI-generated code transformative or “creative” enough to warrant fair use protection? These are questions that need to be answered before Copilot's problems can be resolved. For the time being, there will be tough debate over Copilot's potential to drastically reduce development man-hours, the transformative nature of AI, the need to strictly abide by copyright law, and the moral issues of a corporation capitalizing on open-source code. Only time will tell whether Copilot will emerge as the herald of a new age of auto-generated software development, or collapse under legal conflicts and allegations.

▲GitHub logo / Dreamstime