One of the seminal events in artificial intelligence (AI) in 2023 was the decision by OpenAI, the creator of ChatGPT, to disclose almost no information about its latest large language model (LLM), GPT-4, when the company introduced the program in March.
That sudden swing to secrecy is becoming a major ethical issue for the tech industry, because no one outside OpenAI and its partner Microsoft knows what is going on in the black box in the companies' computing cloud.
The obfuscation is the subject of a report this month by scholar Emanuele La Malfa of the University of Oxford and collaborators at The Alan Turing Institute and the University of Leeds.
In a paper posted on the arXiv pre-print server, La Malfa and colleagues explore the phenomenon of “Language-Models-as-a-Service” (LMaaS), referring to LLMs that are hosted online, either behind a user interface, or via an API. The primary examples of that approach are OpenAI’s ChatGPT and GPT-4.
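The interaction model the authors describe can be illustrated with a short sketch. Everything here is hypothetical: the endpoint URL, model name, and payload fields are illustrative stand-ins for whatever a given provider's API actually expects. The point is that the client sends only text and receives only text, with no access to the model's weights or internal states.

```python
import json
import urllib.request

# Hypothetical LMaaS endpoint; real providers define their own URLs and fields.
ENDPOINT = "https://api.example-lmaas.com/v1/complete"

def build_request(prompt: str, api_key: str) -> urllib.request.Request:
    """Package a plain-text prompt as an HTTP request to a hosted model.

    The client never sees architecture, weights, or training data:
    it sends strings and gets strings (or tokens) back.
    """
    payload = json.dumps({
        "model": "example-model-v1",  # opaque identifier, not inspectable
        "prompt": prompt,
        "max_tokens": 128,
    }).encode("utf-8")
    return urllib.request.Request(
        ENDPOINT,
        data=payload,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )

req = build_request("Summarize the LMaaS paper in one sentence.", "sk-demo")
```

Whatever the provider, the shape is the same: an opaque model identifier, a text prompt in, a text completion out, and nothing else exposed.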
“Commercial pressure has led to the development of large, high-performance LMs [language models], accessible exclusively as a service for customers, that return strings or tokens in response to a user’s textual input — but for which information on architecture, implementation, training procedure, or training data is not available, nor is the ability to inspect or modify its internal states offered,” write the authors.
Those access restrictions “inherent to LMaaS, combined with their black-box nature, are at odds with the need of the public and the research community to understand, trust, and control them better,” they observe. “This causes a significant problem at the field’s core: the most potent and risky models are also the most difficult to analyze.”
The problem is one that has been pointed out by many parties, including competitors to OpenAI, especially those banking on open-source code to beat out closed-source code. For example, Emad Mostaque, CEO of generative AI startup Stability.ai, which produces tools such as the image generator Stable Diffusion, has said that enterprises cannot trust closed-source programs such as GPT-4.
“Open models will be essential for private data,” said Mostaque during a small meeting of press and executives in May. “You need to know everything that’s inside it; these models are so powerful.”
La Malfa and team review the literature on the various language models, and identify how obfuscation prevents auditing the programs along four critical factors: accessibility, replicability, comparability, and trustworthiness.
The authors note that these concerns are a new development in AI ethics: “These issues are specific to the LMaaS paradigm and distinct from preexisting concerns related to language models.”
Accessibility concerns the practice of keeping code secret, which, the writers allege, disproportionately benefits large companies with large R&D budgets.
“With the computational power distributed unevenly and concentrated in a tiny number of companies,” they write, “those with a technological, yet not computational, advantage face a dilemma: While open-sourcing their LMaaS would benefit them in terms of market exposure and contribution to their codebase by the community, releasing the code that powers a model may rapidly burn their competitive advantage in favour of players with higher computational resources.”
In addition, the uniform pricing of the LMaaS programs means people in less developed economies are at a disadvantage in accessing the tools. “A starting point to mitigate these issues is thus analyzing the impact of LMaaS and, more generally, pay-per-usage artificial intelligence services as a standalone, pervasive, and disruptive technology,” they suggest.
Another issue is the increasing gap in how LLMs are trained: the commercial ones can re-use customer prompts and thereby set themselves apart from programs that use only public data, the authors observe.
LMaaS’ commercial licenses, they write, “grant companies the right to use prompts to provide, maintain, and improve their services,” so that there’s no common baseline of training data from which everyone draws.
They offer a chart (below) assessing which language models gather customer prompts for training and “fine-tuning” (a stage that, in some cases, enhances a language model’s abilities), and whether those models let users opt out.
After describing at length the various risks, La Malfa and team propose “a tentative agenda” to address the four areas, urging, “we need to work as a community to find solutions that enable researchers, policymakers, and members of the public to trust LMaaS.”
For one, they recommend that “companies should release the source code” of their LMaaS programs, if not to the general public, then “LMaaS should at least be available to auditors/evaluators/red teams with restrictions on sharing.”
Companies, they propose, should not totally do away with older language models as they roll out new ones. Or, at least, “all the parameters that make up a model should be hashed, and a log of ‘model commits’ should be offered by model maintainers to the user, as the maintainer updates the model.” And the field, including journals and conferences, should “discourage the usage of models” that don’t take such precautions.
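The hashing-and-commit-log idea can be sketched in a few lines. This is an illustrative toy, not the authors' implementation: parameters are represented as a plain dict of floats, and each "commit" records a digest of the serialized parameters so users can detect when a model has been silently updated.

```python
import hashlib
import json

def hash_params(params: dict) -> str:
    """Deterministic digest of a model's parameters (toy representation)."""
    # Serialize with sorted keys so identical parameters always hash the same.
    blob = json.dumps(params, sort_keys=True).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()

def commit(log: list, params: dict, note: str) -> None:
    """Append a 'model commit' entry that users can later compare against."""
    log.append({"note": note, "sha256": hash_params(params)})

log = []
v1 = {"layer1.weight": [0.12, -0.53], "layer1.bias": [0.01]}
commit(log, v1, "initial release")

v2 = {"layer1.weight": [0.12, -0.48], "layer1.bias": [0.01]}  # quiet update
commit(log, v2, "maintenance update")

# Differing digests reveal that the hosted model changed between commits.
changed = log[0]["sha256"] != log[1]["sha256"]
```

The digests reveal nothing about the weights themselves, so a provider could publish such a log without open-sourcing the model.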
For benchmarking, tools need to be developed to test which elements of its prompts an LMaaS has digested, so that the baseline can be set accurately.
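One hedged sketch of such a tool is a "canary" probe: submit a unique marker string in a prompt, then later test whether the service can reproduce it, which would suggest the prompt was folded into training data. The models below are stubs standing in for a real hosted service; the probe logic, not the models, is the point.

```python
import secrets

def make_canary() -> str:
    """A unique marker string unlikely to occur in any natural corpus."""
    return f"CANARY-{secrets.token_hex(8)}"

def probe_digestion(model, canary: str) -> bool:
    """Return True if the model reproduces the canary, suggesting that
    earlier prompts containing it were absorbed into training data."""
    completion = model("Complete the registration code: CANARY-")
    return canary in completion

# Stub standing in for a service that trained on customer prompts:
memorized = make_canary()
def leaky_model(prompt: str) -> str:
    return memorized  # regurgitates the memorized canary

# Stub standing in for a service that discarded customer prompts:
def clean_model(prompt: str) -> str:
    return "no such code on record"

leaked = probe_digestion(leaky_model, memorized)    # True
safe = probe_digestion(clean_model, make_canary())  # False
```

Against a real LMaaS, the canary would be planted in ordinary-looking prompts and probed again after the provider's next model update.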
Clearly, with LMaaS, the topic of AI ethics has entered a new phase, one in which critical information is kept under lock and key, making ethical choices a more difficult matter for everyone than they have been in the past.