Who Gets to Build with AI? Rethinking Openness in the Age of Foundation Models
As conversations around AI development continue to evolve, the open-source movement sits squarely at the heart of some of the field’s most pressing ethical, technical, and geopolitical debates. For those working on data infrastructure or AI governance, the term “open-source AI” is becoming increasingly central and increasingly contested. In this piece, we aim to unpack the meaning of open-source AI, explore the camps that have formed, and reflect on what is at stake for those of us trying to build and deploy AI in a manner that’s safe, equitable, and useful.
What Is Open-Source AI?
At its core, open-source AI refers to AI systems released under open licenses that give users the freedom to use, study, modify, and share the model or software, and, ideally, to reproduce or improve upon it. For this to be meaningful in the AI context, three components need to be available:
The source code: Including the architecture and any supporting implementation.
The model weights: The learned parameters that make the model useful.
The training data: Or at least transparent documentation of how the model was trained.
In theory, if any of these pieces are missing, we’re not really talking about full openness. In practice, however, many current AI releases only provide some of these pieces. It’s common to see model weights made public, but with the training data and methodology kept private. This halfway approach, sometimes criticized as openwashing, can create confusion. Are we examining a truly open-source model, or merely a publicly downloadable artifact?
The Open Source Initiative (OSI) stepped in to define what does and doesn’t count as open-source AI, drawing a clear line: if it’s not inspectable, reproducible, and modifiable end-to-end, it’s not truly open. Anything less might still be helpful, but it falls short of the transparency and collaborative potential that the term “open-source” implies.
The Core Tension: Risk or Safeguard?
At the heart of the open-source AI debate lies a bigger question: does openness make society safer, or does it increase risk?
For those who view open-source AI as a threat, the argument goes as follows: when powerful models are freely downloadable, they can be misused by anyone - spammers, propagandists, cybercriminals, or worse. Why publish a model that could write disinformation at scale or help a bad actor simulate a chemical attack?
The opposing view is that transparency is the best defense. Supporters of open AI argue that many of the risks already exist: dangerous knowledge on the internet, malicious actors, and weak digital infrastructure are not new. What’s new is the opportunity to democratize access to powerful tools, allowing researchers, civil society, and developers to identify flaws, develop countermeasures, and create safer, fairer, and more inclusive tools. In this framing, open-source AI is less like handing out blueprints for harm and more like building the public roads and bridges that society needs: auditable, shared, and improvable by all.
Two Main Camps
Camp 1: Closed = Safer
This group comprises most of the major labs (OpenAI, Google DeepMind, Anthropic) and a number of national security experts. Their view is that advanced AI is a dual-use technology, similar to nuclear research or bioengineering. They argue that open-sourcing powerful models too early could let anyone, even those without specialized knowledge, cause significant harm.
OpenAI, for example, initially committed to openness but now publishes far less than it once did. This more cautious stance has gained traction across the field, from Anthropic’s warnings about AI-generated bioweapons to former Google CEO Eric Schmidt’s calls for export controls on AI models. Underlying these positions is a growing concern that transparency, rather than preventing harm, may actually accelerate misuse.
In response, these organizations typically favor managed access - offering their models via APIs, gated licenses, or safety filters that restrict use to approved contexts. This helps them monitor deployments, enforce safeguards, and intervene when necessary. It also, not incidentally, aligns with their commercial interests.
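To make “managed access” concrete, here is a minimal sketch of what API-mediated use typically looks like, using the OpenAI Python SDK as one illustration; the model name and prompt are placeholders, and the point is simply that the weights, training data, and safety filters all stay on the provider’s side of the API.

```python
# Minimal sketch of managed access: the model runs on the provider's
# servers, behind their usage policies and safety filters. The user never
# touches weights or training data, only the API surface.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name, gated by the provider
    messages=[{"role": "user", "content": "Summarize this policy memo in three bullet points."}],
)
print(response.choices[0].message.content)
```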
Camp 2: Open = Safer
On the other side are open-source communities like LAION, AI startups like Mistral, and prominent researchers like Meta’s Yann LeCun. They argue that openness breeds trust, improves safety, and decentralizes power.
This camp often draws inspiration from the broader open-source software movement. The Transformer architecture, PyTorch, and even early versions of GPT were openly published, spurring a flood of innovation. In this view, AI advances more rapidly and becomes safer when more people can inspect, improve, and adapt it.
For example, Meta’s LLaMA 2 and Mistral 7B were released with openly available weights, and the response was immediate: within weeks, the community had built safety filters, plugins, fine-tunes, and other adaptations. These were improvements that no single company, however well-funded, could have accomplished alone.
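As one illustration of the kind of community adaptation described above, here is a sketch of a simple post-hoc safety wrapper around a locally hosted open-weight model; the model name, blocklist, and filtering logic are hypothetical and deliberately simplistic, not any particular project’s implementation.

```python
# Hypothetical sketch: a thin safety layer wrapped around a locally hosted
# open-weight model. Real community filters are far more sophisticated
# (classifier-based, multilingual, context-aware); this only shows where
# such a layer sits in the stack.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="mistralai/Mistral-7B-Instruct-v0.2",  # assumed open-weight model
)

BLOCKED_PHRASES = ["synthesize a nerve agent", "steal credit card numbers"]  # illustrative blocklist

def safe_generate(prompt: str, max_new_tokens: int = 200) -> str:
    # Refuse obviously unsafe requests before they ever reach the model.
    if any(phrase in prompt.lower() for phrase in BLOCKED_PHRASES):
        return "This request falls outside the usage policy."
    output = generator(prompt, max_new_tokens=max_new_tokens)
    return output[0]["generated_text"]

print(safe_generate("Draft a volunteer onboarding checklist for a food bank."))
```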
And it’s not just about technical gains. Openly available models also allow customization for local languages, cultures, and needs, something closed models can’t do effectively. Advocates view this as a means to challenge monopolies, expand access, and foster a more equitable AI future.
What’s at Stake
The debate over open-source AI isn’t just about technology. It’s about governance, trust, equity, and security.
Trust and Transparency: If people don’t understand how AI systems are trained or what data went into them, it’s hard to build trust. Open models offer a pathway to transparency and accountability.
Governance: Closed models consolidate control in a handful of companies. Open models invite collaborative oversight, which may be essential for democratic governance of powerful technologies.
Innovation: Openness accelerates progress. Instead of duplicating work behind closed doors, open-source fosters a shared foundation for experimentation and invention.
Equity: Who gets access to AI? Only well-funded firms in Silicon Valley, or also researchers, nonprofits, and startups around the world? Openness lowers the barrier to entry and levels the playing field.
Security: There’s no easy answer here. Open models could be misused, but they also enable more people to build defenses, audit vulnerabilities, and reduce reliance on opaque systems.
The Gray Area: Openness Is a Spectrum
It’s tempting to frame open-source AI as a binary (open or closed), but the reality is more complicated. Most models fall somewhere in between.
Meta’s LLaMA 2 is open in code and weights, but the training data is withheld, and the license includes some usage restrictions.
Mistral 7B is more permissive, but also doesn’t include the training dataset.
Even Stability AI’s Stable Diffusion model, which helped trigger the open-source image model boom, wasn’t fully open. It used datasets scraped from the web, raising copyright and privacy questions.
To further clarify these distinctions, a recent EU Data Protection report outlines key differences in privacy risks based on model openness:
Closed models & closed weights: Often minimal external transparency. Users rely entirely on the provider’s privacy safeguards, making it difficult to independently verify compliance with data protection regulations.
Open models & open weights: Risk of personal data exposure and security breaches if training data contains sensitive or harmful content. Partial access may prevent full scrutiny of model training data and privacy vulnerabilities.
Open source: Open-source models share the same privacy risks as open models and open-weight models. While open source fosters transparency and innovation, it also increases risk, as modifications may introduce security vulnerabilities or remove built-in safety measures.
These gray zones matter. For developers, understanding the terms of use and what’s open is essential. For policymakers, they highlight that new regulatory and ethical frameworks need to keep pace, not only with the technology, but with how it’s shared. For business owners and non-profit directors alike, these nuances shape critical decisions about what tools to adopt, how much control they have over their AI systems, and what long-term dependencies they’re accepting. A model that appears “open” on the surface may still come with licensing constraints, limited documentation, or restrictions on its use.
Our Take, And Why This Matters
Foundation LLMs are powerful. They provide organizations with quick access to advanced language capabilities and enable teams to accelerate the development of tools for tasks such as summarization, classification, or information retrieval. For many, these large pre-trained models offer a fast on-ramp to experimenting with generative AI.
But smaller open-source models (often referred to as Small Language Models, or SLMs) offer a different kind of opportunity. They allow organizations to move beyond consumption and begin shaping AI to reflect their own needs. This includes fine-tuning for specific tasks, but also for local languages, cultural contexts, or sector-specific domains. Open-source models unlock a layer of ownership that closed systems rarely allow.
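As a rough sketch of what that kind of ownership can look like in practice, the snippet below uses Hugging Face’s transformers and peft libraries to attach LoRA adapters to a small open-weight model for task-specific fine-tuning; the model name, target modules, and hyperparameters are illustrative placeholders, not a recommended recipe.

```python
# Sketch: adapting a small open-weight model to a specific domain with
# LoRA adapters, so only a small fraction of parameters is trained instead
# of the whole model. Model name and hyperparameters are illustrative.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base_model = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"  # placeholder small open-weight model
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(base_model)

lora_config = LoraConfig(
    r=8,                                   # adapter rank: small and cheap to train
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],   # attention projections; names vary by architecture
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of the base model

# From here, a standard Trainer loop over an organization's own documents
# (support tickets, local-language corpora, sector-specific text) produces
# a lightweight adapter that can be shared or kept entirely in-house.
```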
This flexibility matters for small businesses, non-profit organizations, and governments alike. For teams operating with limited resources or outside major tech hubs, being able to start from a well-built, downloadable model, rather than relying on access to a proprietary API, opens up a practical and strategic path forward. It enables organizations to experiment in secure environments, adapt models to their specific context, and enhance their understanding of how these systems function.
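To make the secure-environment point concrete, here is a minimal sketch of running a previously downloaded open-weight model entirely on local infrastructure, with the Hugging Face hub switched to offline mode so no calls leave the organization’s network; the model name is again a placeholder.

```python
# Sketch: once the weights are downloaded, inference can run entirely on
# infrastructure the organization controls, with no calls to an external
# provider. Model name is a placeholder.
import os
os.environ["HF_HUB_OFFLINE"] = "1"  # use only the local cache; no network access

from transformers import pipeline

pipe = pipeline("text-generation", model="mistralai/Mistral-7B-Instruct-v0.2")
result = pipe("Rewrite this public health notice in plain language:", max_new_tokens=150)
print(result[0]["generated_text"])
```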
And increasingly, these decisions have geopolitical weight. As Aubra Anthony argues in her recent Lawfare article, On the Path to AI Sovereignty, AI Agency Offers a Shortcut, countries around the world (particularly those in the Global South) are not just chasing sovereign AI for the sake of national pride or technological supremacy. They’re trying to avoid being locked into closed, foreign-controlled systems that don’t serve their languages, cultures, or priorities. For many, the more realistic path to autonomy lies in building AI agency. This means developing the internal capacity to shape, adapt, and govern AI tools without needing to own the full stack of compute infrastructure.
This is where open models can play a powerful role. They enable local adaptation without requiring a team or country to replicate Silicon Valley’s infrastructure from the ground up. They lower the barrier to participation while increasing transparency and auditability. And they create space for culturally relevant innovation, something that’s especially important when the dominant commercial models are trained almost entirely on Western internet data.
In our view, open-source models are not just useful; they are enabling infrastructure. They support everything from experimentation to localization, from internal security reviews to long-term cost control. And they open the door for organizations to shape AI in ways that align with their mission, rather than the incentives of a commercial vendor.
If the core question is Who gets to build with AI?, then open-source models help ensure the answer isn’t limited to a handful of well-resourced companies or countries. They provide a foundation for meaningful participation, both technically and socially. Whether you're a small organization refining a model on your own documents or a national government trying to preserve linguistic diversity, the ability to download, inspect, and adapt a model matters.
Final Reflections
The tensions here, between risk and access, control and collaboration, aren’t going away. But what we know is this: Open-source AI is a governance choice as much as a technical one. It shapes who gets to innovate, who benefits, and who gets to participate in steering the most powerful general-purpose technology of our time.
As a team working on applied technology and data systems, we’ve learned that complete transparency and collaboration come with tradeoffs, but also with enormous potential. If we want AI to be safe, fair, and widely applicable, we can’t just focus on what to build. We also need to ask: who gets to build with it?
And that’s where openness makes all the difference.