Unlocking the Potential of Public Data In The Digital Services Act
What experts say about how to get Article 40.12 of the DSA right and my own recommendations
(One quick note to get started…this newsletter is occasionally going to get really into the weeds on internet policy and this is one of those times but I promise others will be much more light-hearted!)
On April 22 of last year, the European Union passed the Digital Services Act (the DSA). Along with the Digital Markets Act, it represents one of the most sweeping pieces of legislation anywhere in the world aimed at regulating large online platforms, including social media. One core component of the Digital Services Act is mandating more transparency, and it approaches transparency through a variety of mechanisms, ranging from third-party audits to required reporting to publicly available risk assessments and a lot more.
Importantly, the DSA also lays the groundwork for more transparency through data sharing…one of the main themes you’ll find me regularly writing about.
Most of the data sharing requirements are spelled out in Article 40. Within that article, Article 40.12 specifically focuses on the idea of public data. In some places, I’ve heard it referred to as the “CrowdTangle” provision, both because of the ways CrowdTangle demonstrated the value of making public data available to researchers and civil society and because of hopes that all platforms might eventually have to provide something similar.
So, why write about this right now?
Public comments about how to get Article 40.12 right are all due to the European Commission by midnight tonight. I wanted to use this newsletter to talk about some of those submissions and also share the recommendations that I submitted, which are pasted in full below.
The truth is that there are a *lot* of details to get right in the implementation of Article 40, let alone the entire DSA (let alone the DMA as well). That’s why I was so glad to see so many incredible people and organizations submit detailed recommendations. In total, there are more than 110 submissions from leading experts and organizations around the world. They cover everything from terms of service for researchers to how to structure “clean rooms” to best practices for ethical review of research proposals.
I used my submission to highlight the profound potential of public data to serve as a powerful tool for accountability, as well as collaboration and research, for large online platforms.
I also talked through a few specific things I think are critical to unlocking the potential of public data, including:
creating a vetting process that allows for tiered access to data and allows for a broad swath of civil society to participate (including journalists),
using the moment to protect scraping in the public interest, and
building out an intermediary body so that neither platforms nor regulators have too much power in the proposed system.
What’s really encouraging though is that after I submitted my own letter and began reading some of the other submissions, it became clear that a lot of experts and organizations were recommending the exact same things.
For instance, below are just a few of the other submissions that hit on all the same ideas, including ensuring broad access to data (including journalists), making the case for an intermediary body and pointing out how essential public interest scraping is:
Julia Angwin and a lot of prominent journalists from around the world
Daphne Keller from Stanford Law on the importance of protecting scraping
And those are just the ones I’ve noticed so far; I still have a lot more submissions to get through, and I’m sure there are more. I hope the Commission takes this feedback seriously and does its best to work those recommendations into the final Delegated Act.
There are also some other submissions that hit similar topics but get into the weeds on other important parts of the implementation that I also highly recommend, including:
I’m also sure there are more and as I read through the rest of them, I’ll try and update this list.
Ok, here’s my full letter, which you can also see here.
DSA Call for Evidence on Article 40.12: Data Access for Researchers
Submitted by Brandon Silverman
May 30, 2023
Summary
The Digital Services Act has the potential to be one of the most important pieces of internet regulation ever passed. However, like all ambitious legislation, its ultimate success is going to come down to getting the details right. I’m writing in order to highlight the particularly profound opportunities around Article 40.12 to deliver real, meaningful transparency and to recommend a few specific principles that I think are critical to ensuring it lives up to its promise.
My Background
For some background, I was the CEO and Co-Founder of CrowdTangle, a social media transparency tool that the New York Times called “perhaps the most effective transparency tool in the history of social media.”
We were a private company started in 2011 but were acquired by Meta in late 2016. By the end of 2020, we were providing usable, real-time public social media data to a wide swath of civil society, including academics, researchers, journalists, non-profits, human rights activists, and more. We were proof both that you can find a middle ground between public data and privacy and that if you provide public data in usable, real-time tools, you can deliver genuinely meaningful transparency.
CrowdTangle is still around today but its future is very much up in the air. That’s why over the last two years, I’ve helped design and advocate for regulations that would require platforms to share more data with the outside world, including testifying in the U.S. Senate. It’s also why I believe Article 40.12 has the potential to be so impactful.
Real-Time Public Data Is A Powerful Tool For Accountability
Over the course of the 10 years that I led CrowdTangle, I had a front-row seat to the profound importance of privacy-safe social data to the public interest and that’s why I believe so much in the potential of Article 40.12 to help us build a better internet. My team saw countless examples of how civil society can use data to serve the public interest when they’re given the tools & data they need, including:
protecting the integrity of elections,
supporting and empowering a free press,
preventing real-world violence,
identifying foreign interference,
fighting global pandemics, and more.
I also witnessed over and over how public data can lead to internal change at platforms. Whether it was recalibrating how news content was ranked, changing policies about harmful content, or assigning more resources to new risk areas, real-time public scrutiny (including by journalists) forced Meta to manage their products more responsibly.
While our work was a testament to what’s possible when you provide usable, real-time public data to civil society, the truth is that we were also just scratching the surface. We were often limited by the internal incentives and politics of working from inside a platform; we frequently struggled with uncertain government regulations that sometimes pulled our work in opposite directions; and most importantly, we only provided data for a small handful of platforms.
Article 40.12 has the potential to solve all the structural and practical limitations we faced in our work and to go even further. Combined with the rest of the transparency mechanisms in the DSA, Article 40.12 could help unlock a new era of public accountability for large platforms and enable a much more collaborative, & ultimately democratic, governance of our digital information ecosystems.
Getting The Details on Article 40.12 Right
When it comes to ensuring that Article 40.12 has the impact so many of us hope it does, there are a few key areas that are critical to get right, including:
(1) Providing tiered access to different types of data in order to ensure civil society & journalists can play a role in turning transparency into impact
Any comprehensive transparency regulation of large platforms should involve different types of data designed for different audiences operating under different governance structures and with different outcomes in mind. For instance, providing robust, non-public datasets to vetted researchers in responsible and privacy-protecting ways is critical; so are audits, anonymized reports, and transparency for users. However, what’s also critical is giving a broad, diverse swath of civil society, including journalists, the ability to monitor important and meaningful public content on large platforms in real-time.
The practical reality is that platforms, as well as the actors, behavior, and content that live on them, are all changing constantly. Meanwhile, we know that academic processes can be slow. We need oversight that empowers a diverse set of experts to provide accountability & understanding at the same speed as the platforms, their users, and the industries they exist in. Narrowly targeted academic access isn’t enough. The goal should include empowering responsible, open-ended monitoring in the public interest from a diverse group of stakeholders across civil society (including journalists) and giving those entities as much flexibility and openness in their use of the data as possible, in ways that still manage some of the risks.
Like other parts of the DSA, this also presents an opportunity for Europe to help establish best practices for research that could become industry norms. Lastly and perhaps most importantly, it’s also critical that platforms not be empowered to be the sole decision-makers on when & what data is provided; instead, as much as possible, a financially independent intermediary body should be the final decision-maker.
(2) Creating a global definition for meaningfully public content
One of the challenges in building more comprehensive transparency regulation for digital platforms is the lack of industry standards and regulatory norms around the definition of “public.” While GDPR attempted to make progress on that definition through the concept of “manifestly made public”, putting that term into practice continues to be a challenge. The Digital Services Act could use Article 40.12 to play a powerful role in setting global norms that define manifestly public content in more detail, in ways that would ultimately allow for the study & research of important & meaningfully public data, including by civil society, while also protecting user privacy (including embracing the idea of contextual privacy). If the DSA were able to make progress on this, it would be a powerful, industry-changing step forward, and one with global ramifications.
At CrowdTangle, while our methods were certainly imperfect at times, we used a variety of different considerations for ensuring that when content was included in our system, users had a reasonable understanding that their content was public. Users’ posts did not get included merely because their accounts were not set as private and thus were technically available to the public. Instead, we considered factors including:
using minimum thresholds for account size (the threshold varied based on platform and account structure),
looking at the design affordance of different account types (including account types that were broadcast by design, as well as public forums),
giving special consideration to accounts that represented public figures (including elected officials, media outlets, etc.) and more.
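To make the considerations above concrete, here is a minimal sketch of what that kind of decision heuristic could look like in code. This is purely illustrative and is not CrowdTangle’s actual implementation; all of the names, account types, and threshold values are assumptions invented for this example.

```python
# Hypothetical sketch of a "meaningfully public" heuristic like the one
# described above. NOT CrowdTangle's real code; names, account types, and
# thresholds are all illustrative assumptions.

from dataclasses import dataclass

# Illustrative per-platform minimum audience sizes; real thresholds
# varied based on platform and account structure.
MIN_FOLLOWERS = {"facebook_page": 25_000, "instagram": 50_000}

# Account types that are broadcast by design (public pages, public
# forums, etc.) rather than personal profiles.
BROADCAST_TYPES = {"facebook_page", "public_group", "instagram"}

@dataclass
class Account:
    platform_type: str      # e.g. "facebook_page", "personal_profile"
    followers: int
    is_private: bool
    is_public_figure: bool  # elected officials, media outlets, etc.

def is_meaningfully_public(account: Account) -> bool:
    """Return True only when there are affirmative signals that the
    account holder understands their content is public, rather than
    merely because the account is not set to private."""
    if account.is_private:
        return False
    # Public figures get special consideration regardless of size.
    if account.is_public_figure:
        return True
    # Otherwise require a broadcast-by-design account type AND a
    # minimum audience size for that platform.
    if account.platform_type not in BROADCAST_TYPES:
        return False
    threshold = MIN_FOLLOWERS.get(account.platform_type)
    return threshold is not None and account.followers >= threshold
```

The key design point the sketch captures is that "not private" alone is never sufficient: a personal profile with a huge following still fails the check, while a large public page or a public figure’s account passes.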
There have been some efforts to explore similar mechanisms in other legislative contexts around the world, including in the Platform Accountability and Transparency Act, a bill that was introduced in the U.S. Senate; however, the DSA is in a particularly powerful position to be the standard bearer in setting the first industry standards in the space.
(3) Ensuring usability
When it comes to sharing data, there’s a big difference between what data is technically available and what is actually usable. In order to ensure Article 40.12 delivers the impact we all hope for, including being as broadly accessible as possible, it should be designed in a way that focuses on making sure the data is actually usable. Among other things, that requires providing multiple modes of access, including real-time APIs, front-end interfaces that are searchable & customizable, and scraping (see below). All three are critical.
(4) Protecting scraping in the public interest
One critical component of accessibility to public data is creating reliable legal frameworks that protect responsible scraping in the public interest. There are academics, scholars, and journalists who have submitted recommendations that cover this topic more deeply, including Daphne Keller from Stanford University, Julia Angwin, formerly of The Markup, and others. I hope those recommendations are taken very seriously and included in the final implementation. They are critical.
The Risks & Challenges
There's no shortage of challenges to getting any details around transparency and data sharing right, including balancing data sharing with privacy concerns, protecting trade secrets, preventing surveillance issues, and more. Those issues are real. However, we owe it to ourselves to find a way through the challenges in order to get the transparency we all say we want.
The Foundation of a New, Better Internet
The power of usable public data is that it has the potential to serve as a single policy lever that can inform and shape all of our policy interests. Without it, we risk being stuck in an endless cycle of spending years slowly crafting intricately-tuned solutions for narrowly defined problems in an industry that is defined by a complicated web of constantly evolving platforms. On the other hand, with true, meaningful transparency that is available to a wide swath of experts & civil society, we can give ourselves a long-term foundation to identify, understand and respond to real risks in time to do something about them. Article 40.12 has the potential to be that underlying foundation. But it needs to get the details right.