By THE INDEPENDENT UG
Research leads the authors to advocate a regulatory model that is both adaptable and tailored to specific contexts
COMMENT | S. ALEX YANG & ANGELA HUYUE ZHANG | The impending rollout of the European Union’s Artificial Intelligence Act represents the bloc’s latest attempt to cement its status as a regulatory powerhouse. This ambitious legislation, which aims to impose stringent regulations on AI technologies, underscores the EU’s commitment to proactive governance.
Meanwhile, the United States has taken a very different path. Despite the sweeping executive order issued by President Joe Biden in October 2023, the country still lacks a cohesive AI regulatory framework. Instead, a surge of litigation has overwhelmed U.S. courts, with leading AI firms being sued for copyright infringement, data-privacy breaches, defamation, and discrimination.
Given that litigation is expensive and often drags on for years, the EU’s strategy may appear more forward-looking. But the common-law system might actually prove to be a more effective mechanism for tackling the myriad challenges posed by generative AI. This is particularly evident in copyright law, where a growing number of artists, publishers, and authors are embroiled in legal battles against AI giants like Microsoft, OpenAI, and Meta over the use of copyrighted material.
At the core of these disputes is the question of whether the training of large language models (LLMs) should qualify as fair use, a classification that would exempt tech firms from compensating content creators. For its part, the EU’s AI Act includes a provision mandating the disclosure of copyrighted materials, enabling copyright holders to opt out of AI training databases. The hope is that this transparency requirement will facilitate compensation negotiations between content creators and AI firms.
But the EU’s sweeping regulation could backfire if European regulators fail to strike an appropriate balance between innovation and equity in addressing the question of fair use. For starters, restricting the use of copyrighted materials for LLM training could raise data-acquisition costs, potentially impeding the growth of the AI industry. Microsoft, for example, has raised concerns that the requirement to compensate copyright holders might disproportionately affect small and medium-size firms, especially those with limited financial and legal resources.
At the same time, a growing number of commentators and policymakers have warned that without ensuring fair compensation for content creators, the creative sector – especially the beleaguered news industry – could collapse. These fears were exacerbated in June, when publishing giant Axel Springer announced plans to cut 200 jobs at the German tabloid Bild and replace some of these roles with AI. This trend has persisted over the past few months, reflecting a wave of journalist layoffs triggered by an advertising crunch.
The news industry’s crisis could have grave consequences beyond immediate job losses. The future development of AI technologies depends heavily on the availability of high-quality, human-generated content. As studies have shown, training AI models on AI-generated data can degrade them (a phenomenon researchers call “model collapse”), potentially to the point of complete failure.
To be sure, striking the right balance between these two conflicting policy priorities will not be easy. In a recent paper, we provide the first analytical exploration of the fair-use dilemma. We identify three critical factors that shape regulatory outcomes: the availability of data for AI training, the models’ quality, and the industry’s competitive dynamics.
For example, imagine a scenario in which data for AI training are abundant, particularly in emerging areas like text-to-video generation. Under these circumstances, regulation has little effect on the amount of data available to startups aiming to refine their LLMs. By adopting a more permissive approach to fair use, regulators could enable firms to improve the quality of their models, thereby boosting profits for both AI companies and content creators and enhancing overall consumer welfare.
But these dynamics can shift quickly when data for training AI models – particularly models that rely heavily on new content – are scarce. This is especially true for relatively mature technologies such as text generation, given that companies such as OpenAI depend on a continuous influx of news content to train and update their chatbots.
In such a scenario, permissive fair-use policies could weaken incentives to produce new content, thereby shrinking the pool of data available for AI training. This shortage would be particularly acute in highly competitive markets, characterised by high demand for fresh training data. Moreover, the growing sophistication of AI models could deepen the training-data shortage by making creators overly reliant on AI for content generation.
These findings lead us to advocate a regulatory model that is both adaptable and tailored to specific contexts. The EU’s AI Act imposes a broad mandate on all firms, regardless of their industry sector; given the rapid pace of AI development and the competitive structure of the market, this one-size-fits-all approach increases the likelihood of serious unintended consequences. Consequently, the common-law system, which is based on case-by-case adjudication, may turn out to be a more appropriate institutional framework for regulating AI.
*****
S. Alex Yang is Professor of Management Science and Operations at the London Business School. Angela Huyue Zhang, Associate Professor of Law at the University of Hong Kong, is the author of the forthcoming ‘High Wire: How China Regulates Big Tech and Governs Its Economy’ (Oxford University Press, 2024).
Copyright: Project Syndicate, 2024.