AI Archives - Creative Commons https://creativecommons.org/tag/ai/

Understanding CC Licenses and AI Training: A Legal Primer https://creativecommons.org/2025/05/15/understanding-cc-licenses-and-ai-training-a-legal-primer/ Thu, 15 May 2025 17:51:13 +0000

Whether you are a creator, researcher, or anyone licensing your work with a CC license, you might be wondering how it can be used to train AI. Many AI developers who wish to comply with the CC license terms are also seeking guidance.

The application of copyright law to AI training is complex. The CC licenses are copyright licenses, so it follows that applying CC licenses to AI training is just as complex. 

The short answer is: AI training is often permitted by copyright. This means that the CC license conditions have limited application to machine reuse. It also means that using a more restrictive CC license in an effort to prevent AI training is not an effective approach. In fact, restrictive licensing may end up preventing the kind of sharing you want (such as allowing for translation) while failing to block AI training.

For the long answer, read our new guide that provides a legal analysis and overview of the considerations when using CC-licensed works for AI training. 

👉  For an at-a-glance overview, head over to the Using CC-Licensed Works for AI training webpage

👉  For a more in-depth analysis, check out our handy PDF download

👉 For those who love a visual, take a look at our supplementary flowchart

If the CC licenses have limited application to machine reuse, what agency do creators have in the AI ecosystem? 

This is an important question. As you’ve heard us talk about before, we’re actively developing a CC preference signals framework to help bridge this gap. The framework is designed to offer new choices for stewards of large collections of content to signal their preferences when sharing their works, using scaffolding inspired by the architecture of the CC licenses. This is not mediated through copyright or the CC licenses. It is governed by something that tends to be even more widely adopted: a social contract. Stand by for the release of the paper prototype of the CC preference signals framework at the end of June 2025.

While you are here, please consider making an annual recurring donation via our Open Infrastructure Circle. This work will require substantial resources, over many years, to make happen.

The post Understanding CC Licenses and AI Training: A Legal Primer appeared first on Creative Commons.

CC @ SXSW: Protecting the Commons in the Age of AI https://creativecommons.org/2025/04/09/cc-sxsw-protecting-the-commons-in-the-age-of-ai/ Wed, 09 Apr 2025 15:18:38 +0000

SXSW by Creative Commons is licensed under CC BY 4.0

If you’ve been following along on the blog this year, you’ll know that we’ve been thinking a lot about the future of open, particularly in this age of AI. With our 2025-2028 strategy to guide us, we’ve been louder about a renewed call for reciprocity to defend and protect the commons as well as the importance of openness in AI and open licensing to avoid an enclosure of the commons. 

Last month, we took some of these conversations on the road and hosted the Open House for an Open Future during SXSW in Austin, TX, as part of a weekend-long Wiki Haus event with our friends at the Wikimedia Foundation. 

During the event, we spoke with Audrey Tang and Cory Doctorow about the future of open, especially as we look towards CC’s 25th anniversary in 2026. This wide-ranging conversation surfaced a number of themes that capture both where we’ve been over the last 25 years and where we should focus for the next 25, including:

  • The Fight for Technological Self-Determination: Contractual restrictions are increasingly being used to lock down essential technologies, from printer ink to hospital ventilators. The push for openness and economic fairness must go beyond just content-sharing and extend to fighting for the rights of people to repair, modify, and use technology freely.
  • Shifting from Resistance to Building Alternatives: The open movement is not just about opposing corporate restrictions but also about creating viable, open alternatives. Initiatives like Gov Zero show that fostering decentralized, user-controlled platforms can help counteract monopolistic digital ecosystems.
  • The Power of Exit as a Lever for Change: Simply having the option to leave restrictive platforms can influence corporate behavior. Efforts like Free Our Feeds and Bluesky aim to create credible exit strategies that prevent users from being locked into exploitative digital environments.
  • Beyond Copyright: New Frameworks for Openness and Innovation: While Creative Commons began as a response to copyright limitations, the next phase should focus on broader issues like supporting an infrastructure for open sharing, ethical AI development, and open governance models that empower communities rather than just limiting corporate control.
  • Reclaiming the Ethos of Open Source and Free Software: The movement must reconnect with its ethical roots, focusing on freedom to create, share, and innovate—not just openness for the sake of efficiency. This includes resisting corporate capture of “openness” and ensuring technological advances serve public interest rather than private profit.

Since the proliferation of mainstream AI, we’ve been analyzing the limitations of copyright (and, by extension, the CC licenses, since they are built atop copyright law) as the right lens for thinking about guardrails for AI training. This means we need new tools and approaches in this age of AI that complement open licensing while also advancing the AI ecosystem toward the public interest. Preference signals are based on the idea that creators and dataset holders should be active participants in deciding how and/or if their content is used for AI training. Our friends at Bluesky, for example, have recently put forth a proposal on User Intents for Data Reuse, which is well worth a read to conceptualize how a preference signals approach could work on a social media platform. We’ve also been actively participating in the IETF’s AI Preferences Working Group, since submitting a position paper on the subject in mid-2024.
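
To make the idea concrete: a preference signal could be as simple as a machine-readable declaration published alongside a collection. The sketch below is purely illustrative (the IETF working group has not finalized any syntax, and the `AI-Training` field name is invented here); it shows how a robots.txt-style file might express per-agent training preferences.

```python
# Illustrative sketch only: parses a hypothetical robots.txt-style
# preference file. No standard syntax exists yet; the "AI-Training"
# directive is an invented example, not an IETF-defined field.

def parse_preferences(text: str) -> dict:
    """Map each declared user-agent to its stated training preference."""
    prefs, agent = {}, None
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blanks
        if not line:
            continue
        key, _, value = line.partition(":")
        key, value = key.strip().lower(), value.strip()
        if key == "user-agent":
            agent = value
        elif key == "ai-training" and agent is not None:
            prefs[agent] = value
    return prefs

signal = """
# Hypothetical preference file for a content collection
User-Agent: *
AI-Training: disallow

User-Agent: research-crawler
AI-Training: allow
"""

print(parse_preferences(signal))
# {'*': 'disallow', 'research-crawler': 'allow'}
```

The point of the sketch is the social contract, not the syntax: a crawler that honors the signal would check its own agent name (falling back to `*`) before ingesting the collection into a training set.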

SXSW by Creative Commons is licensed under CC BY 4.0

As CC gets closer to launching a protocol based on prosocial preference signals—a simple pact between those stewarding the data and those reusing it for generative AI training—we had the opportunity during SXSW to chat with some great thought leaders about this very topic. Our panelists were Aubra Anthony, Senior Fellow, Technology and International Affairs Program at the Carnegie Endowment for International Peace; Zachary J. McDowell, PhD, Assistant Professor, Department of Communication, University of Illinois at Chicago; Lane Becker, President, Wikimedia LLC at the Wikimedia Foundation; and our very own Anna Tumadóttir, CEO, Creative Commons, who joined us to explore sharing in the age of AI. A few key takeaways from this conversation included:

  • Balancing Norms and Legal Frameworks: There is a growing interest in developing normative approaches and civil structures that go beyond traditional legal frameworks to ensure equitable use and transparency.
  • Navigating AI Traffic and Commercial Use: Wikimedia is adapting to the influx of AI-driven bot traffic and exploring how to differentiate between commercial and non-commercial use. The idea of treating commercial traffic differently and finding ways to fundraise off bot traffic is becoming more prominent, raising important questions about sustainability in an open knowledge ecosystem. From CC’s perspective, we’ve found that as our open infrastructures mature, they become increasingly taken for granted, a notion that is not conducive to a sustainable open ecosystem.
  • Openness in the Age of AI: There is growing reticence around openness, with creators becoming more cautious about sharing content due to the rise of generative AI (note, this is exactly what our preference signals framework is meant to address, so stay tuned!). We should emphasize the need for open initiatives to adapt to the broader social and economic context, balancing openness with creators’ concerns about protection and sustainability.
  • Making Participation Easy and Understandable: To encourage widespread participation in open knowledge systems and for preference signal adoption, tools will need to be simple and intuitive. Whether through collective benefit models or platform cooperativism, ease of use and clarity are essential to engaging the broader public in contributing to open initiatives.

Did you know that many social justice and public good organizations are unable to participate in influential and culture-making events like SXSW due to a lack of funding? CC is a nonprofit organization, and all of our activities must recover their costs. We’d like to sincerely thank our event sponsor, the John S. and James L. Knight Foundation, for making this event and these conversations possible. If you would like to contribute to our work, consider joining the Open Infrastructure Circle, which will help to fund a framework that makes reciprocity actionable when shared knowledge is used to train generative AI.

The post CC @ SXSW: Protecting the Commons in the Age of AI appeared first on Creative Commons.

From Strategy to Action: Focus Areas for 2025 https://creativecommons.org/2025/03/03/from-strategy-to-action-focus-areas-for-2025/ Mon, 03 Mar 2025 18:24:20 +0000

Astronomical Clock by olemartin is licensed under CC BY-NC-SA 2.0.

The team here at Creative Commons was delighted to publicly release our new organizational strategy on January 22, after almost a year of intensive team, community, and board consultations. For the next several years, our focus will be to:

  • Strengthen the open infrastructure of sharing
  • Defend and advocate for a thriving creative commons
  • Center community

These goals are high level, as they tend to be when packaged up as part of a multi-year strategy. They should also feel familiar for an organization whose mission is to empower individuals and communities around the world through technical, legal, and policy solutions that enable the sharing of education, culture, and science in the public interest. But there are important nuances in these goals and their short-, medium-, and long-term objectives that point to intentional and meaningful shifts in the ways we operate to meet this moment.

Of course the legal layer of the open infrastructure—the CC licenses and legal tools themselves—must be strengthened. But also, new sharing frameworks must be explored for changing times. 

Of course we must ensure the ongoing survival of the commons. But our strategies need to evolve beyond a sensible argument for opening up access to information. We know that greater access facilitates advances in education, in the scientific arena, and in our ability to understand and appreciate the diversity of cultural heritage. However, those who previously saw the obvious benefits of sharing may now hesitate, uncertain about how their works will be used or contextualized through advances in Artificial Intelligence (AI) and machine learning.

Finally, one might think that centering community goes without saying, but actually, it doesn’t. As an organization that has only achieved what it has because of a strong community of advocates bringing their expertise and passion to bear, we know we cannot continue to impact the social norms and legal frameworks of sharing without full participation.

So what does all of this mean for our work today, and throughout this year? Since we are currently operating in the age of AI, where all content also functions as data, we are focusing our work in two key areas:

  1. Data governance, shaped by legal and norms-based infrastructure to facilitate sharing.
  2. Sustaining open licensing in the age of AI, treating openly licensed collections as high-value contributions to the commons that must be sustained through reciprocity.

This focus is guided by CC’s core principle: ideas and facts should not be commodified. As we reimagine sharing in the age of AI, we also draw on our history, which reminds us to resist the reflex to expand copyright. Instead, we believe developing new norms, as part of a healthy data governance framework that prioritizes sharing in the age of AI, is the best approach to meeting our mission.

Data Governance

Our friends at Open Future define data governance as “how rules for data use are created and enforced. This includes laws, standards, and social norms that guide what people can and can’t do with data. Good governance ensures fair and responsible data sharing.”

CC plays a unique role within data governance across the open internet. The CC licenses provide a form of legal and social norms guidance that has facilitated sharing on the internet for the last 25 years. We think of CC’s role within data governance as providing critical infrastructure that enables community-driven, fair, and responsible data sharing. The challenge is that what is considered fair and responsible data sharing is not static; it evolves based on context. And while this has always been true, AI has brought issues of fairness, transparency, trust, accountability, and more to the forefront for CC and for our many collaborators and colleagues who are committed to human-centered approaches to data governance. 

In 2025, we need to continue to explain how the CC licenses interact with AI training, and to champion preference signals as a way to advance the data governance we need to meet this moment. You’ve heard from us on this subject in the past, and there is much more to come as we find partners to pilot this work in the coming months. Policy and legal environments will also continue to play a significant role in driving and influencing the data governance landscape of the future. As new legislation, particularly around AI, is passed and implemented, CC’s advocacy for balanced copyright and for policies that drive access to knowledge will be instrumental in representing civil society and the public interest.

Sustaining Open Licensing in the Age of AI

The use of the CC licenses has resulted in billions of items being released openly. Today, these items have also become parts of AI training sets—this is a significant shift that is influencing the norms around open licensing. Our priority is increasing sustainable sharing and access, but we now must consider “what about AI?”. We believe that openly licensed collections of content, which act as high-value contributions to the commons, must continue to be prioritized. 

However, many creators (artists, researchers, educators, and everyone in between) are understandably concerned about their contributions to the commons being reduced to small pieces of data within huge datasets where they lose agency over how their works are being used. We believe that the antidote to this is reciprocity. We believe it is time for the open movement to ask for something in return when there is disproportionate benefit from use of open datasets. We aim to do this by developing relationships with AI model builders on behalf of those who contribute to the commons, ensuring that training datasets remain collectively owned, sustain the commons, and that data governance principles are respected.

We need more open educational, cultural, scientific, and research data to allow more rapid scientific discovery and collaboration. Sharing must continue in the age of AI and we are committed to supporting open licensing at scale, taking the context of AI into consideration. 

There are new and layered complexities in the open sharing world, and we’re excited and determined to help clarify and address these challenges. We’d like to see open sharing grow as a collective strategy to advance the public interest. In 2025 (and beyond, I’m sure), we will be finding ways to facilitate agency for the movement and to enable even more sharing and access, while ensuring that the commons remains resilient and sustainable.

If you’d like to support this work, consider joining the Creative Commons Open Infrastructure Circle. Our most dedicated supporters ensure that every day we can show up and do the valuable work of preserving and growing the global commons of knowledge and culture from which we all benefit.

The post From Strategy to Action: Focus Areas for 2025 appeared first on Creative Commons.

The AI Action Summit & Civil Society’s (Possible) Impact https://creativecommons.org/2025/02/18/the-ai-action-summit-civil-societys-possible-impact/ Tue, 18 Feb 2025 18:51:45 +0000

The Conciergerie, Paris by Mustang Joe is marked with CC0 1.0.

On February 10 and 11, 2025, the government of France convened the AI Action Summit, bringing together heads of state, tech leaders, and civil society to discuss global collaboration and action on AI. The event was co-chaired by French President Macron and Indian Prime Minister Modi. This was the third such Summit in just over a year, the first two having been held in the UK and South Korea. The next will be hosted in India, with a firm date not yet set.

Creative Commons was invited to be an official participant in the Summit, and given room to speak on a panel about international AI governance. Given our continued advocacy for public interest AI, and on-the-ground work, particularly in the US and EU, to interrogate new governance structures for data sharing, open infrastructures, and data commons, the Summit was an important venue to contribute to the global conversation.

We focused on three things in our panel and direct conversations:

  1. Civil society matters, and must continue to be included. While we may not hold the pen on drafting declarations, or be in the negotiating room with world leaders and their ample security teams, we must continue to (loudly) bring our perspectives to these spaces. If we aren’t there, then nobody is. Without civil society, there can be no public interest. 
  2. The importance of openness in AI. What it means, who benefits from it, and how we think critically about ongoing (dis)incentives to participate in the open knowledge ecosystem.
  3. Local solutions for local contexts, local content, and local needs.

Civil Society Matters

Civil society matters because we represent real concerns from real people. A people-centered approach to AI must inevitably be a planet-centered approach as well; one simply cannot and should not exist without the other.

Included in the civil society contingent at the Summit were major philanthropic foundations that have long focused on public interest technology. Encouragingly (we hope), they have joined forces with private investment and governments to launch Current AI, a coalition advocating ‘global collaboration and local action, building a future where open, trustworthy technology serves the public interest’. The Summit also saw the launch of ROOST (Robust Open Online Safety Tools), which was born out of a conversation at a prior Summit about the absence of reliable, robust, high-quality open source tooling for trust and safety. ROOST adds a critical building block to the open source AI ecosystem: tools that allow anyone to run safety checks on datasets before use and training should (hopefully) result in safer model performance.

But philanthropy is not a business model for something set to become even more ubiquitous public infrastructure than the internet is today. The investments of philanthropy alone will not be enough to steer the public interest conversation to the top of the action agenda. There must be matching political will and public investment, and we’ll be watching closely for evidence that actions follow words.

Our view is that governments should prioritize investment in publicly accessible AI, which meets open standards and allows for equitable access. These are key drivers of innovation and every sector stands to benefit. Governments can lead the way on investing in compute, (re)training people, and preparing and encouraging high quality openly licensed datasets, to level the playing field for researchers, innovators, open source developers, and beyond.

Openness in AI

Openness in AI continues to be a broad and multifaceted topic: how do we keep open sharing resilient, safe, and trustworthy when our community tells us that some creators and organizations are now choosing more restrictive licenses, or hesitating to share at all, in an attempt to regain agency over how their content is used as training data? Our future depends on protecting the progress of the last 20 years of open practices. The answer does not lie in a misguided shift from CC BY to CC BY-NC-ND. We have to think more holistically.

The CC licenses alone are not a governance framework, but they represent absolutely critical components of the legal and social norms that support data governance in the public interest.

In the context of data governance, we see our role as helping negotiate preferences for the reuse of datasets containing openly licensed works. We need to ensure that folks are still incentivized to participate and contribute to the commons while feeling their voices are heard and their work contributes in mutually beneficial ways. If you are the steward of a large open dataset, we want to hear from you.

Local Solutions for Local Contexts

From CC’s perspective, local solutions for local contexts are where we need to put our energy. As Janet Haven from Data & Society frames it, let’s focus on collaboration for AI governance, rather than striving for a single, global governance structure. One size does not fit all, and even issues that are global needs, like planetary survival, will require very different efforts by country or region. It was rather encouraging to hear examples of “small” language models from across the world that emphasize language preservation and cultural context. Efforts to record, catalog, and digitize language and cultural artifacts are underway. This is yet another area where we see a need to systematically articulate and clearly signal preferences for reuse, so that local efforts thrive and are respected appropriately.

Where We Go From Here

We heard from many fellow civil society organizations that the tone in France differed markedly from previous Summits in the UK or South Korea. There was a welcome diversity of civil society voices on panels and in workshops, with a steady drumbeat of calls for safe, sustainable, and trustworthy AI. “Open source” and “public interest” were phrases uttered in many major interventions. But setting aside the volumes we could collectively fill defining those terms (sustainable for whom?), the real impact of the Summit will be seen in how we collaborate from now on.

The political discussions at the Summit focused heavily on the false dichotomy of regulation versus innovation, and yes, the language used heavily fed the narrative that the two are mutually exclusive. The strong emphasis on regional investment (and superiority), even while offering global collaboration, was mildly disheartening but fully expected. Political statements around public interest were repeated but vague. Canadian Prime Minister Trudeau emphatically urged everyone not to forget the people, stating that “the benefits must accrue to everyone”. Whether those in power will heed that message is anyone’s guess. Take, for example, The Paris Charter on Artificial Intelligence in the Public Interest, which says all the right things but lacks both widespread endorsement and meaningful steps toward implementation.

We are clear-eyed on the fact that AI is here, has been for quite some time, and will not go away. We need collaborative, pragmatic approaches to steer towards what we see as beneficial outcomes and public interest values. While there were glimmers of hope from some who hold legislative and executive power, it’s clear that civil society has a lot of advocacy work ahead of us.

The Summit culminated in countries signing onto a declaration, with notable omissions from the United States and the UK. As always, any lasting impact will only become visible once the media cycle moves on. In the meantime, let’s not wait for another global Summit to take action.

The post The AI Action Summit & Civil Society’s (Possible) Impact appeared first on Creative Commons.

Why Digital Public Goods, including AI, Should Depend on Open Data https://creativecommons.org/2025/01/27/why-digital-public-goods-including-ai-should-depend-on-open-data/ Mon, 27 Jan 2025 17:34:43 +0000

Acknowledging that some data should not be shared (for moral, ethical and/or privacy reasons) and some cannot be shared (for legal or other reasons), Creative Commons (CC) thinks there is value in incentivizing the creation, sharing, and use of open data to advance knowledge production. As open communities continue to imagine, design, and build digital public goods and public infrastructure services for education, science, and culture, these goods and services – whenever possible and appropriate – should produce, share, and/or build upon open data.

Open Data by Auregann is licensed under CC BY-SA 3.0.

Open Data and Digital Public Goods (DPGs)

CC is a member of the Digital Public Goods Alliance (DPGA) and CC’s legal tools have been recognized as digital public goods (DPGs). DPGs are “open-source software, open standards, open data, open AI systems, and open content collections that adhere to privacy and other applicable best practices, do no harm, and are of high relevance for attainment of the United Nations 2030 Sustainable Development Goals (SDGs).” If we want to solve the world’s greatest challenges, governments and other funders will need to invest in, develop, openly license, share, and use DPGs.

Open data is important to DPGs because data is a key driver of economic vitality with demonstrated potential to serve the public good. In the public sector, data informs policymaking and public service delivery, helping to channel scarce resources to those most in need, providing the means to hold governments accountable, and fostering social innovation. In short, data has the potential to improve people’s lives. When data is closed or otherwise unavailable, the public does not accrue these benefits.

CC was recently part of a DPGA sub-committee working to preserve the integrity of open data as part of the DPG Standard. This important update to the DPG Standard was introduced to ensure that only open datasets and content collections with open licenses are eligible for recognition as DPGs. It means datasets and content collections must meet the following criteria to be recognized as digital public goods.

  1. Comprehensive open licensing: The entire dataset or content collection must be under an acceptable open license. Mixed-licensed collections will no longer be accepted.
  2. Accessible and discoverable: All dataset and content collection DPGs must be openly licensed and easily accessible from a distinct, single location, such as a unique URL.
  3. Permitted access restrictions: Certain access restrictions, such as logins, registrations, API keys, and throttling, are permitted as long as they do not discriminate against users or restrict usage based on geography or any other factors.

The DPGA writes: “This new requirement is designed to increase trust and confidence in all DPGs by ensuring that users can fully engage with solutions without concerns over intellectual property infringement. Simplifying access and usage aligns with the DPGA’s goal of making DPGs truly open and accessible for widespread adoption… it helps foster an environment and ecosystem where innovation can thrive without legal uncertainties.”
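
As a rough illustration, the three criteria above can be expressed as a simple metadata check. The field names, license list, and restriction vocabulary below are invented for this sketch; they are not part of the actual DPG Standard submission format.

```python
# Illustrative sketch only: checks a dataset's metadata record against the
# three open-data criteria above. All field names here are hypothetical.

OPEN_LICENSES = {"CC0-1.0", "CC-BY-4.0", "CC-BY-SA-4.0"}  # example subset
NEUTRAL_RESTRICTIONS = {"login", "registration", "api-key", "throttling"}

def check_dpg_open_data(meta: dict) -> list[str]:
    """Return a list of problems; an empty list means all criteria pass."""
    problems = []
    # 1. Comprehensive open licensing: one open license for the whole set.
    licenses = set(meta.get("licenses", []))
    if len(licenses) != 1 or not licenses <= OPEN_LICENSES:
        problems.append("entire collection must be under a single open license")
    # 2. Accessible and discoverable: a single, distinct location.
    if not meta.get("url"):
        problems.append("collection must be accessible from a single URL")
    # 3. Permitted access restrictions: neutral mechanisms only.
    for r in meta.get("access_restrictions", []):
        if r not in NEUTRAL_RESTRICTIONS:
            problems.append(f"restriction {r!r} may discriminate against users")
    return problems

record = {
    "licenses": ["CC-BY-4.0"],
    "url": "https://example.org/dataset",
    "access_restrictions": ["api-key", "geo-blocking"],
}
print(check_dpg_open_data(record))
# ["restriction 'geo-blocking' may discriminate against users"]
```

The example record passes the licensing and accessibility checks but fails on a geography-based restriction, which is exactly the kind of condition the updated Standard rules out.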

AI and Open Data

As CC examines AI and its potential to be a public good that helps solve global challenges, we believe open data will play a similarly important role.

CC recognizes AI is a rapidly developing space, and we appreciate everyone’s diligent work to create definitions, recommendations, and guidance for and warnings about AI. After two years of community consultation, the Open Source Initiative released version 1.0 of the Open Source AI Definition (OSAID) on October 28, 2024. This definition is an important step in starting the conversation about what open means for AI systems. However, the OSAID’s data sharing requirements remain contentious, particularly around whether and how training data for AI models should be shared.

CC believes that the difficulty of building and releasing open datasets is no reason not to encourage doing so. In cases where training data cannot or should not be shared, we encourage detailed summaries that explain the contents of the dataset and give instructions for reproducibility; nonetheless, such data should be defined as closed. When data can be made open and shared, it should be.

We agree with Liv Marte Nordhaug, CEO, Digital Public Goods Alliance who said in a recent post: “With regards to AI systems, there is a need to ensure that we don’t inadvertently undermine the open data movement and open data as a category of DPGs by advancing an approach to AI systems that is more permissive than for other categories of DPGs. Maintaining a high bar on training data could potentially result in fewer AI systems meeting the DPG Standard criteria. However, SDG relevance, platform independence, and do-no-harm by design are features that set DPGs apart from other open source solutions—and for those reasons, the inclusion of [AI] training data is needed.”

Next Steps

CC will continue to work with the DPGA and other partners as it develops a standard for what qualifies an AI model as a digital public good. In that arena we will advocate for open datasets and for a tiered approach, so that components of an AI model can be considered digital public goods without the entire model needing to have every component openly shared. Updated recommendations and guidelines that recognize the value of fully open AI systems that use and share open datasets will be an important part of ensuring AI serves the public good.


¹Digital Public Goods Standard
²Data for Better Lives. World Bank (2021). CC BY 3.0 IGO

The post Why Digital Public Goods, including AI, Should Depend on Open Data appeared first on Creative Commons.

Six Insights on Preference Signals for AI Training https://creativecommons.org/2024/08/23/six-insights-on-preference-signals-for-ai-training/?utm_source=rss&utm_medium=rss&utm_campaign=six-insights-on-preference-signals-for-ai-training Fri, 23 Aug 2024 14:49:02 +0000 https://creativecommons.org/?p=75346 “Eagle Traffic Signals – 1970s” by RS 1990 is licensed via CC BY-NC-SA 2.0.. At the intersection of rapid advancements in generative AI and our ongoing strategy refresh, we’ve been deeply engaged in researching, analyzing, and fostering conversations about AI and value alignment. Our goal is to ensure that our legal and technical infrastructure remains…

The post Six Insights on Preference Signals for AI Training appeared first on Creative Commons.

“Eagle Traffic Signals – 1970s” by RS 1990 is licensed via CC BY-NC-SA 2.0.

At the intersection of rapid advancements in generative AI and our ongoing strategy refresh, we’ve been deeply engaged in researching, analyzing, and fostering conversations about AI and value alignment. Our goal is to ensure that our legal and technical infrastructure remains robust and suitable in this rapidly evolving landscape.

In these uncertain times, one thing is clear: there is an urgent need to develop new, nuanced approaches to digital sharing. This is Creative Commons’ specialty, and we’re ready to take on this challenge by exploring a possible intervention in the AI space: preference signals.

Understanding Preference Signals

We’ve previously discussed preference signals, but let’s revisit this concept. Preference signals would empower creators to indicate the terms by which their work can or cannot be used for AI training. Preference signals would represent a range of creator preferences, all rooted in the shared values that inspired the Creative Commons (CC) licenses. At the moment, preference signals are not meant to be legally enforceable. Instead, they aim to define a new vocabulary and establish new norms for sharing and reuse in the world of generative AI.

For instance, a preference signal might be “Don’t train,” “Train, but disclose that you trained on my content,” or even “Train, only if using renewable energy sources.”
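To make the idea concrete, here is a minimal sketch of how such a vocabulary might be modeled in software. This is purely hypothetical: CC has not defined these signal names or any formal syntax, and the structure below is illustrative only.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical preference-signal vocabulary; the keys and phrasings are
# illustrative, not part of any CC specification.
SIGNALS = {
    "no-train": "Don't train",
    "train-with-disclosure": "Train, but disclose that you trained on my content",
    "train-renewable-only": "Train, only if using renewable energy sources",
}

@dataclass
class Work:
    title: str
    license: str
    ai_preference: Optional[str] = None  # a SIGNALS key, or None if no signal given

def describe_preference(work: Work) -> str:
    """Return the human-readable preference attached to a work, if any."""
    if work.ai_preference is None:
        return "No AI-training preference expressed"
    return SIGNALS[work.ai_preference]

photo = Work("Eagle Traffic Signals", "CC BY-NC-SA 2.0", "train-with-disclosure")
```

The point of the sketch is the spectrum: a signal is not a binary opt-in/opt-out flag but one value among several, and the absence of a signal is itself a distinct state.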

Why Do We Need New Tools for Expressing Creator Preferences?

Empowering creators to be able to signal how they wish their content to be used to train generative AI models is crucial for several reasons:

  • The use of openly available content within generative AI models may not necessarily be consistent with creators’ intentions in sharing openly, especially when that sharing took place before the public launch and proliferation of generative AI.
  • With generative AI, unanticipated uses of creator content are happening at scale, by a handful of powerful commercial players concentrated in a very small part of the world.
  • Copyright is likely not the right framework for defining the rules of this newly formed ecosystem. Because the CC licenses exist within the framework of copyright, they are also not the correct tools to prevent or limit uses of content to train generative AI. We also believe that a binary opt-in or opt-out system for contributing content to AI models is not nuanced enough to represent the spectrum of choices a creator may wish to exercise.

We’re in the research phase of exploring what a system of preference signals could look like, and over the next several months we’ll be hosting more roundtables and workshops to discuss and gather feedback from a range of stakeholders. In June, we took a big step forward by organizing our most focused and dedicated conversation about preference signals to date, in New York City, hosted by the Engelberg Center at NYU.

Six Highlights from Our NYC Workshop on Preference Signals

  • Creative Commons as a Movement

Creative Commons is a global movement, making us uniquely positioned to tackle what sharing means in the context of generative AI. We understand the importance of stewarding the commons and the balance between human creation and public sharing. 

  • Defining a New Social Contract

Designing tools for sharing in an AI-driven era involves collectively defining a new social contract for the digital commons. This process is essential for maintaining a healthy and collaborative community. Just as the CC licenses gave options for creators beyond no rights reserved and all rights reserved, preference signals have the potential to define a spectrum of sharing preferences in the context of AI that goes beyond the binary options of opt-in or opt-out. 

  • Communicating Values and Consent

Should preference signals communicate individual values and principles such as equity and fairness? Adding content to the commons with a CC license is an act of communicating values; should preference signals do the same? Workshop participants emphasized the need for mechanisms that support informed consent by both the creator and user.

  • Supporting Creators and Strengthening the Commons

The most obvious and prevalent use case for preference signals is to limit use of content within generative AI models to protect artists and creators. There is also the paradox that users may want to benefit from more relaxed creator preferences than they are willing to grant to other users when it comes to their content. We believe that preference signals that meet the sector-specific needs of creators and users, as well as social and community-driven norms that continue to strengthen the commons, are not mutually exclusive. 

  • Tagging AI-Generated vs. Human-Created Content

While tags for AI-generated content are becoming common, what about tags for human-created content? The general goal of preference signals should be to foster the commons and encourage more human creativity and sharing. For many, discussions about AI are inherently discussions about labor issues and the risk of exploitation. At this time, the law has no concept of “lovingly human,” since humanness has been taken for granted until now. Is “lovingly human” the new “non-commercial”? Generative AI models also force us to consider what it means to be a creator, especially as most digital creative tools will soon be driven by AI. Is there a specific set of activities that needs to be protected in the process of creating and sharing? How do we address the inputs and outputs of collaboration between humans and generative AI?

  • Prioritizing AI for the Public Good

We must ensure that AI benefits everyone. Increased public investment and participatory governance of AI are vital. Large commercial entities should provide a public benefit in exchange for using creator content for training purposes. We cannot rely on commercial players to set forth industry norms that influence the future of the open commons. 

Next Steps

Moving forward, our success will depend on expanded and representative community consultations. Over the coming months, we will:

  • Continue to convene our community members globally to gather input in this rapidly developing area;
  • Continue to consult with legal and technical experts to consider feasible approaches;
  • Actively engage with the interconnected initiatives of other civil society organizations whose priorities are aligned with ours;
  • Define the use cases for which a preference signals framework would be most effective;
  • Prototype openly and transparently, seeking feedback and input along the way to shape what the framework could look like;
  • Build and strengthen the partnerships best suited to help us carry this work forward.

These high-level steps are just the beginning. Our hope is to be piloting a framework within the next year. Watch this space as we explore and share more details and plans. We’re grateful to Morrison Foerster for providing support for the workshop in New York.

Join us by supporting this ongoing work

You have the power to make a difference in a way that suits you best. By donating to CC, you are not only helping us continue our vital work, but you also benefit from tax-deductible contributions. Making your gift is simple – just click here. Thank you for your support.

Questions for Consideration on AI & the Commons https://creativecommons.org/2024/07/24/preferencesignals/?utm_source=rss&utm_medium=rss&utm_campaign=preferencesignals Wed, 24 Jul 2024 16:24:08 +0000 https://creativecommons.org/?p=75311 “Eight eyes. Engraving after C. Le Brun” by Charles Le Brun is licensed via CC0. The intersection of AI, copyright, creativity, and the commons has been a focal point of conversations within our community for the past couple of years. We’ve hosted intimate roundtables, organized workshops at conferences, and run public events, digging into the…

The post Questions for Consideration on AI & the Commons appeared first on Creative Commons.

“Eight eyes. Engraving after C. Le Brun” by Charles Le Brun is licensed via CC0.

The intersection of AI, copyright, creativity, and the commons has been a focal point of conversations within our community for the past couple of years. We’ve hosted intimate roundtables, organized workshops at conferences, and run public events, digging into the challenging topics of credit, consent, compensation, transparency, and beyond. All the while, we’ve been asking ourselves: what can we do to foster a vibrant and healthy commons in the face of rapid technological development? And how can we ensure that creators and knowledge-producing communities still have agency?

History and Evolution

When Creative Commons was founded over 20 years ago, sharing on the internet was broken. With the introduction of the CC licenses, the commons flourished. Licenses that enabled open sharing were perfectly aligned with the ideals of giving creators a choice over how their works were used.

Those who embrace openly sharing their work have a myriad of motivations for doing so. Most could not have anticipated how their works might one day be used by machines: to solve complex medical questions, to create otherworldly pictures of dogs, to train facial recognition systems – the list goes on.

Can we continue to foster a vibrant and healthy commons in today’s technological environment? How can we think innovatively about creator choice in this context?

Preference Signals

Preference signals for AI are the idea that an agent (a creator, rightsholder, or entity of some kind) is able to signal their preference with regard to how their work is used to train AI models. Last year, we started thinking more about this concept, as did many in the responsible tech ecosystem. But to date, the dialog is still fairly binary, offering only all-or-nothing choices, with little imagination for how creators or communities might want their work to be used.

Enabling Commons-Based Participation in Generative AI

What was once a world of creators making art and researchers furthering knowledge, has the risk of being reduced to a world of rightsholders owning, controlling, and commercializing data. In this bleak future, it’s no longer a photo album, a poetry book, or a family blog. It’s content, it’s data, and eventually, it’s tokens.

We recognize that there is a perceived tension between openness and creator choice. Namely, if we give creators choice over how to manage their works in the face of generative AI, we may run the risk of shrinking the commons. To overcome this tension, or at least better understand the effect of generative AI on the commons, we believe that finding a way for creators to indicate “no, unless…” would be positive for the commons. Our consultations over the course of the last two years have confirmed that:

  • Folks want more choice over how their work is used.
  • If they have no choice, they might not share their work at all (under a CC license or strict copyright).

If these views are as widely held as we perceive, we feel it is imperative that we explore an intervention and bring far more nuance into how this ecosystem works.

Generative AI is here to stay, and we’d like to do what we can to ensure it benefits the public interest. We are well-positioned with the experience, expertise, and tools to investigate the potential of preference signals.

Our starting point is to identify what types of preference signals might be useful. How do these vary or overlap in the cultural heritage, journalism, research, and education sectors? How do needs vary by region? We’ll also explore exactly how we might structure a preference signal framework so it’s useful and respected, asking, too: does it have to be legally enforceable, or is the power of social norms enough?

Research matters. It takes time, effort, and most importantly, people. We’ll need help as we do this. We’re seeking support from funders to move this work forward. We also look forward to continuing to engage our community in this process. More to come soon.

Recap & Recording: “Open Culture in the Age of AI: Concerns, Hopes and Opportunities” https://creativecommons.org/2024/06/05/recap-recording-open-culture-in-the-age-of-ai-concerns-hopes-and-opportunities/?utm_source=rss&utm_medium=rss&utm_campaign=recap-recording-open-culture-in-the-age-of-ai-concerns-hopes-and-opportunities Wed, 05 Jun 2024 17:19:59 +0000 https://creativecommons.org/?p=75193 In May, CC’s Open Culture Program hosted a new webinar in our Open Culture Live series titled “Open Culture in the Age of AI: Concerns, Hopes and Opportunities.” In this blog post we share key takeaways and a link to the recording.

The post Recap & Recording: “Open Culture in the Age of AI: Concerns, Hopes and Opportunities” appeared first on Creative Commons.

In May, CC’s Open Culture Program hosted a new webinar in our Open Culture Live series titled “Open Culture in the Age of AI: Concerns, Hopes and Opportunities.” In this blog post we share key takeaways and a link to the recording.

With CC considering new ways to engage with generative AI, we are excited to share highlights from the conversation that demonstrate some of the complex considerations regarding open sharing, cultural heritage, and contemporary creativity.

Suzanne Duncan, Chief Operating Officer at Te Hiku Media, New Zealand, said that her organization was born out of the Māori rights movement. It is collecting an archive of Māori language samples on its own platform to maintain data sovereignty. Te Hiku Media is now working to use AI tools to teach the language to heritage language reclaimers. Suzanne recommended that the best way to ensure diverse representation in AI outputs is to have communities involved in the building and testing of AI models, ideally by communities, for communities.

Minne Atairu, interdisciplinary artist and doctoral student in the Art and Art Education program at Teachers College, Columbia University, USA, shared examples of her works using the Benin Bronzes, artworks from Nigeria stolen by the British in the 19th century, and the changes that happened in the visual representation of art after the looting took place. Working with images of the stolen items, she used generative models to explore visuals and materials and to convert text into 3D models. Minne hopes that better ways of attribution and compensation can be re-envisioned, and that the wealth generated by AI and other technologies will be spread among creators, not just tech executives.

Bartolomeo Meletti, Head of Knowledge Exchange at CREATe, University of Glasgow, Scotland, spoke about copyright law and copyright exceptions in the UK, EU and US, focusing on what one can do with AI and copyrighted works without permission from the copyright owner, especially for purposes of research and education. He works to create guidance about how to navigate those permissions with generative AI in mind.

Michael Trizna, Data Scientist at the Smithsonian Institution, has explored how generative AI can help to speed up processes like providing “alt text” (text descriptions of visual materials) to images, without compromising the accuracy of the audio or visual description of works. He has also worked on an AI values statement, including labeling AI generated content as such and mechanisms for the audience to provide feedback. Mike raised concerns about the fact that only a few large cultural heritage institutions are resourced to engage with generative AI responsibly.

Overall, panelists conveyed a need for greater AI literacy to enable people to interrogate AI and ensure it can be used for good.

Watch the recording here.

CC is a non-profit that relies on contributions to sustain our work. Support CC in our efforts to promote better sharing at creativecommons.org/donate.

 

What is Open Culture Live?

In this series, we tackle some of the more complex challenges that face the open culture movement, bringing in speakers with personal and professional expertise on the topic.

Webinar: Open Culture in the Age of AI: Concerns, Hopes and Opportunities https://creativecommons.org/2024/04/29/webinar-open-culture-in-the-age-of-ai-concerns-hopes-and-opportunities/?utm_source=rss&utm_medium=rss&utm_campaign=webinar-open-culture-in-the-age-of-ai-concerns-hopes-and-opportunities Mon, 29 Apr 2024 19:06:25 +0000 https://creativecommons.org/?p=75082 On Wednesday, 8 May 2024, at 2:00 pm UTC, CC’s Open Culture Program will be hosting a new webinar in our Open Culture Live series titled “Open Culture in the Age of AI: Concerns, Hopes and Opportunities.”

The post Webinar: Open Culture in the Age of AI: Concerns, Hopes and Opportunities appeared first on Creative Commons.

“An Original Theory or New Hypothesis of the Universe, Plate XXXI” by Thomas Wright. Public Domain.

On Wednesday, 8 May 2024, at 2:00 pm UTC, CC’s Open Culture Program will be hosting a new webinar in our Open Culture Live series titled “Open Culture in the Age of AI: Concerns, Hopes and Opportunities.”

At CC, we promote better sharing and open access to cultural heritage to help build and sustain vibrant and thriving societies. With generative AI entering the scene, what are some of the issues to consider to ensure institutions make the most of this new technology and avoid its pitfalls as they fulfill their missions? In this panel we will discuss some of the opportunities and risks that come along with embracing generative AI in cultural heritage institutions, and some ideas for engaging in this new technology for the benefit of institutions, creators, as well as curious visitors and learners.

Firstly, looking inwards, what are some of the ways in which cultural heritage institutions might implement the use of AI to automate and improve labor-intensive processes, as well as to explore and enrich their data?

Secondly, looking outwards, when it comes to sharing their cultural heritage collections and related data online, potential use as AI training data is on the minds of many institutions. On the one hand, collections can offer important and useful training data for beneficial projects. Indeed, more diverse inputs to training datasets could aid in countering bias and ensuring outputs are more representative. On the other hand, especially in the age of AI, sharing collections needs to be done responsibly, respectfully and ethically, and institutions must remain guided by their public service missions. With generative AI here to stay, how can these considerations be adequately balanced? How can cultural heritage institutions play a role in contributing to the development of responsible AI?

We will be joined by a panel of experts including:

  • Suzanne Duncan, Chief Operating Officer at Te Hiku Media, New Zealand
  • Minne Atairu, interdisciplinary Artist, and doctoral student in the Art and Art Education program at Teachers College, Columbia University, USA
  • Bart Meletti, Head of Knowledge Exchange at CREATe, University of Glasgow, Scotland
  • Michael Trizna, Data Scientist, Smithsonian Institution, USA

Register here. 

CC is a non-profit that relies on contributions to sustain our work. Support CC in our efforts to promote better sharing at creativecommons.org/donate.

What is Open Culture Live?

In this series, we tackle some of the more complex challenges that face the open culture movement, bringing in speakers with personal and professional expertise on the topic.

Exploring a Books Data Commons for AI Training https://creativecommons.org/2024/04/08/exploring-a-books-data-commons-for-ai-training/?utm_source=rss&utm_medium=rss&utm_campaign=exploring-a-books-data-commons-for-ai-training Mon, 08 Apr 2024 15:00:35 +0000 https://creativecommons.org/?p=74919 What role do books play in training AI models, and how might digitized books be made widely accessible for the purposes of training AI? What dataset of books could be constructed and under what circumstances? A new paper investigates the concept of a responsibly designed, broadly accessible dataset of digitized books to be used in training AI models.

The post Exploring a Books Data Commons for AI Training appeared first on Creative Commons.

A colorful illustration of a set of books

Our work on copyright has long focused on supporting libraries and archives in the service of their missions to preserve and ensure access to culture. Our 2022 copyright reform agenda centers those sorts of institutions (and more generally GLAMs) and the critical role they play in society. Among other things, that agenda calls attention to the ways in which copyright might impede libraries and archives who wish to make their collections available for research uses, including use for AI training in order to fulfill their public interest missions.

That issue – AI training – has become ever more relevant. The concept of mass digitization of books, including to support text and data mining, of which AI training is a subset, is not new. But AI training is newly of the zeitgeist, and its transformative use makes questions about how we digitize, preserve, and make accessible knowledge and cultural heritage salient in a distinct way.

In 2023, multiple news publications reported on the availability and use of a dataset of books called “Books3” to train large language models (LLMs), a form of generative AI tool. The Books3 dataset contains text from over 170,000 books, which are a mix of in-copyright and out-of-copyright works. It is believed to have been originally sourced from a website that was not authorized to distribute all of the works therein. In lawsuits brought against OpenAI, Microsoft, Meta, and Bloomberg related to their LLMs, the use of Books3 as training data was specifically cited.

The Books3 controversy highlights a critical question at the heart of generative AI: what role do books play in training AI models, and how might digitized books be made widely accessible for the purposes of training AI for the public good? What dataset of books could be constructed and under what circumstances? 

Earlier this year, we collaborated with Open Future and Proteus Strategies on a series of workshops to explore these questions and more. We brought together practitioners on the front lines of building next-generation AI models, as well as legal and policy scholars with expertise in the copyright and licensing challenges surrounding digitized books. Our goal was also to bridge the perspective of stewards of content repositories, like libraries, with that of AI developers. A “books data commons” needs to be both responsibly managed, and useful for developers of AI models. Today, we’re releasing a paper based on those workshops and additional research. 

While this paper does not prescribe a particular path forward, we do think it’s important to move beyond the status quo. Today, large swaths of knowledge contained in books are effectively locked up and inaccessible to almost everyone. Large companies have huge advantages when it comes to access to books for AI training (and access to data in general). At the same time, as the paper highlights, there are already relevant examples of nonprofit and library-led efforts to provide responsible, fair access to books for many more people, not just the privileged few. We hope this paper can support further research, collaboration, and investment in this space.

Read the full paper
