‘Open Source’ Has a Definition, Let’s Get Serious About Defending It

The Llama large language model claims to be open source, but OpenInfra COO Mark Collier says it is anything but.

Apr 29th, 2024 10:00am by Mark Collier


Feature image by Braydon Anderson on Unsplash.

AI might be the rare transformational epoch in tech that is being UNDER-hyped. The multi-trillion-dollar market reality might just surprise us all. So, it’s not surprising to see the world’s largest companies and governments develop strategies, products and workflows to capture a portion of this market.

But somewhere in the overlapping Venn diagram of misguided AI regulations, undermined definitions of open source, and garden-variety megalomania, the global economy and the future of innovation are at risk.

Regulations can be shaped, and megalomania is incurable, so let’s focus on the remaining variable, the one triage demands we direct our immediate attention toward: the intentional or negligent collateral damage wreaked on the global open source community by players in the AI mosh pit.

Last week’s announcement of Llama 3 is only the latest threat to open source in a string of obscenities hurled at the term, which, we should all remember, has a definition that the community worked hard to create and defend. And while the Llama 3 LLM (like its predecessors) is highly impressive and deserves credit for achieving benchmarks that move practical AI forward, one inaccuracy has been reported repeatedly, and it needs emphatic correction: The Meta license for Llama 3 is not open source.

To be fair, there’s ambiguity here, and the Open Source Initiative is leading an open, community-driven process to detail how the open source definition applies in the context of an AI world. But even without that process, we already know that Meta’s custom license, by restricting usage and the ability to create derivative works, violates multiple tenets of both the current definition of open source and any AI-specific definition the OSI community ultimately produces.

This element of the open source definition, the unimpeded permission to create and exploit derivative works, is central to what has made open source a powerful enabler of innovation globally. Any license that restricts this is, by definition, not open source. The following excerpt from the Llama 3 license illustrates this point:

b. Redistribution and Use.
i. If you distribute or make available the Llama Materials (or any derivative works thereof), or a product or service that uses any of them, including another AI model, you shall (A) provide a copy of this Agreement with any such Llama Materials; and (B) prominently display “Built with Meta Llama 3” on a related website, user interface, blogpost, about page, or product documentation. If you use the Llama Materials to create, train, fine tune, or otherwise improve an AI model, which is distributed or made available, you shall also include “Llama 3” at the beginning of any such AI model name.

We can say with a high degree of confidence that the forthcoming definition of open source AI is unlikely to include model assets licensed under commercially restrictive terms such as those invoked by Meta and others, because such terms fundamentally restrict access and therefore freedoms.

The press has frequently created confusion by wrongly calling Meta’s Llama series of models “open source,” although at least one recent article was amended to clarify the situation.

Built on Open Source

It’s important that we recognize the strides that open source communities are already making to drive progress in support of AI workloads.

Nvidia relies on Kata Containers and Kubernetes for seamless migration of existing AI/ML workloads to confidential environments, while combining LLMs with GPU-accelerated computing. And a few weeks ago, the OpenStack community announced Caracal, the project’s 29th release, whose features include strengthened support for AI workloads, such as live GPU migration.

This was exciting progress for organizations such as StackHPC, which is already supporting AI workloads on Dawn, the U.K.’s largest supercomputer. The company posted on LinkedIn, “With StackHPC’s focus on delivering HPC and AI for private cloud, we look forward to bringing the features and benefits of the OpenStack Caracal release to our community and customers.”

Open source has a vital role to play in the evolution of AI, and it’s already delivering results. Given the threats open source is facing, from malignant relicensing to sleight-of-hand marketing phrases, the global community of developers and users who have come to rely on the bedrock principles of open source is understandably nervous and a little angry.

As new audiences are exposed to the term for the first time, our global community has a critical role to play in educating them about open source: namely, that it has unleashed immeasurable creativity and market value, and that it has a very specific definition that must be defended if that value is to be preserved for the AI revolution.

Mark Collier is Chief Operating Officer of the Open Infrastructure Foundation, where he is helping to expand and evolve the Foundation’s mission, bringing the unique, open collaboration method established by the OpenStack community to many new open source projects across...