Today, Meta announced that CrowdTangle would finally be shut down later this year.
It’s bittersweet for a lot of reasons (especially seeing all the love from so many old friends today), but it’s also not unexpected. I’ve spent a lot of time over the last two years working on issues related to transparency & data access, including following the Meta Content Library, and I wanted to share a few thoughts.
The timing seems incredibly irresponsible.
Closing down CrowdTangle 12 weeks before the U.S. Presidential election is hard to defend, given how much more work there is to do in the MCL (see below) and what’s likely to happen this cycle.
It’s a near certainty that Trump and his allies will try to interfere with the election with another wave of misleading information about voting processes, and the worst case is they’ll end up contesting the results (and maybe with violence all over again). Unless the MCL can get to real parity, CrowdTangle should have been left open until the election & a peaceful transfer of power are over.
And, of course, that’s to say nothing of all the other elections happening around the world this year.
Despite Meta’s claims, the Content Library isn’t close to being a replacement for CrowdTangle at the moment.
I’ve been following the development of the MCL for the last year or so, and I know how difficult the back-end work is for a project like this. I think the team behind it has done some incredible work, and in fact, what they’ve built has the potential to be much more powerful than CrowdTangle ever was. They’ve also made some early decisions that indicate they’re headed in the right direction around the types of organizations they are building for. That’s exciting.
But let’s be very clear: it’s not close to CrowdTangle yet.
There are some areas where the MCL has way more data than CrowdTangle ever had, including reach and comments in particular. Those are major improvements. But there are also some huge gaps in the tool, both for academics and civil society, and simply arguing that it has more data isn’t a claim that regulators or the press should take seriously.
One of the most important lessons I learned in 10 years of doing this work was that providing real transparency is rarely just about the amount of data you provide. It’s also about how useable the data is, who can use it, and the terms of use. Unfortunately, compared to CrowdTangle, there is a ton of missing functionality when it comes to actually getting any insights about the data, including being able to aggregate data at the account level or topic level, being able to export the data easily into the workflow of researchers through email notifications or browser extensions, being able to create live displays, having a way to benchmark the performance of individual posts to get ahead of viral stories, etc.
It’s also about how much support, training, and collaboration you offer to help understand the data. Real transparency of complex systems usually requires a sociotechnical effort. For instance, we had a team of over a dozen amazing partnership leads based all over the world, regularly working with partners to help them get the most out of CrowdTangle.
And, of course, it’s about who gets to use the Library.
CrowdTangle was available not just to academics, fact-checkers, and non-profit researchers but also to the broader news industry. It seems as if that door is being permanently shut to all those partners. Moreover, when I left two years ago, the CrowdTangle interface had tens of thousands of active users. Our CrowdTangle Chrome Extension had almost 100,000 registered users. On top of that, we regularly published Live Displays that were publicly available, some of which received millions of views. In fact, it was almost exactly four years ago to the day that Facebook announced the launch of over 150 CrowdTangle Live Displays as a part of Facebook’s official response to the beginning of the COVID-19 pandemic.
My understanding is that the MCL has a few hundred users, a very small partnerships team and limited ability to export any of the data outside of its clean room environment.
Of course, that doesn’t mean Meta and the MCL can’t eventually close these gaps and they’ve built what seems like a great foundation. They’re also early in their journey. But that’s also why making any claims of equivalency right now is premature and why shutting down CrowdTangle in five months makes me so worried about the U.S. election (and other elections happening this year).
The good news is that one of the most promising signs about the MCL has been how much they’ve been improving the program since it was first announced. That’s a great sign.
I first heard about the Meta Content Library last August, and even in that timeframe, it's gotten meaningfully better. For instance, in those months, they’ve added comments to their system (one of the most popular academic requests we got for years and something not originally included), revised some overly restrictive terms on how users could talk about the data they were seeing, started onboarding fact-checkers (a huge step forward), and, perhaps most importantly, begun to treat different types of public data with different privacy protections by letting users download high-profile public figure data (not all data carries the same risks).
If they continue to improve (especially when it comes to overall useability, particularly for civil society organizations), there’s a real path for the MCL to not only match CrowdTangle but eventually, maybe even surpass it.
However, if they can’t or don’t get there, we could end up looking back and realizing that this announcement was actually the beginning of the end for a lot of the outside world to meaningfully be able to monitor what’s happening on their platforms.
For election protection groups to monitor for voter suppression online.
For fact-checkers to meaningfully be able to track and respond to misinformation and disinformation.
For human rights groups to study war crimes and human rights violations online.
For local communities in the developing world to have a say in platform policies and content moderation.
For news outlets to understand what narratives and stories are spreading online.
And Meta would have made that decision during a year in which more people are going to vote in elections than at any point in human history.
Let’s hope that’s not true. A lot of us will be paying close attention.
But more than anything, I want to highlight that there’s a much larger story here than just Meta’s announcement, and it’s a reason to be hopeful. In fact, it could represent a profound change in our ability to monitor what's happening on the public internet.
The reason that Meta is launching the Content Library and the reason that CrowdTangle wasn’t shut down years ago isn’t because of any voluntary decisions Meta made about what they were interested in providing. It’s because of a wave of new data access laws being considered and passed around the world. And more than anything, the single one that has had the most impact at the moment is Article 40.12 in Europe’s Digital Services Act (the DSA).
The DSA is a sweeping piece of internet regulation passed last summer in Europe, and one of the core pillars of the entire legislation is mandating more transparency. Within the law, Article 40.12 specifically requires platforms to provide real-time access to public data (and is sometimes referred to as the “CrowdTangle” provision).
While a lot of the DSA is still waiting to get implemented, Article 40.12 is law right now.
And that’s why Meta's announcement is just a small part of what’s happening right now.
Over the last 12 months, over a dozen of the largest digital platforms in the world have quietly launched programs that let researchers get access to public data in real-time, including:
TikTok
Alibaba
Snapchat
Reddit
Alphabet (including YouTube)
Twitter/X
LinkedIn
And more
For most of those platforms, it represents the first time they’ve *ever* made that sort of data available.
You can see the full list of all the new programs and how to apply for them here. Note: If you’re a researcher, go apply today!
I left Meta convinced that if we want real transparency from some of the largest tech platforms in the world, we couldn’t rely on voluntary efforts. We had to start regulating the sort of data access we wanted. Europe has managed to pass some of those regulations, and Article 40.12 is starting to have a real impact.
That’s why I think the bigger story here is that we might actually be on the cusp of having more real-time access to what’s happening in our biggest & most influential public spaces than we’ve had since the early days of the internet.
It’s why I’m increasingly optimistic that the real long-term legacy of CrowdTangle will end up being to help inspire a permanent set of regulations that make real-time access to public data a legal requirement and an ongoing part of how we manage the internet responsibly & collaboratively.
Of course, like any piece of regulation, there’s still a lot of work to do (and a lot of ways all of this could go south)…on the part of the platforms, civil society, funders, and regulators.
For instance, given how the DSA was written, platforms have far too much discretion to determine how their 40.12 programs are set up. The European Commission needs to write guiding principles for 40.12 that spell out some of the details on what’s expected, including things like whether access is granted on a project-by-project basis or at an organizational level (spoiler: it should be at the organizational level), detailing what reasonable rate limits are, and more.
We also need mechanisms to make sure that platforms do what they say they’re going to do. We need ways to monitor the actual governance and access programs, including turnaround times, reasons for denials, etc. (for instance, is it reasonable for Meta to have a single U.S.-based research center in charge of vetting for the entire world?). We need ways to audit the actual data being provided to make sure it’s accurate and comprehensive (for instance, Article 40.12 requires access to all “publicly visible data,” but once again, Meta seems to have selectively chosen to exclude public fact-checking labels).
We also need more international standards that guide some of the technical nuances of these programs, in large part so that other countries can follow in the DSA’s footsteps around 40.12. We’ll also need philanthropic funding to start providing more resources for researchers to study this data, not to mention an entirely new infrastructure of open-source tools that can provide analytics and insights.
Needless to say, there’s a lot of work to do. But since the advent of social media, we’ve been talking about the role of transparency in helping build a better internet and even a better world. Civic leaders, elected officials, human rights advocates, trust & safety professionals, academics, whistleblowers, and even tech executives themselves have all talked endlessly about the importance of transparency.
But the truth is that platforms never provide the transparency we need.
That might finally be changing.
The reason Article 40.12 exists is because of the tireless work of a group of advocates around the world who have been dedicated to advancing data access and transparency.
Lastly, as proud as I am of all the work that we did at CrowdTangle (and I’m very proud!), I’ve been blown away by the small community of academics, civil society organizations, and regulators that have been working behind the scenes for years on how to design & pass laws that require more platform transparency and data sharing, including fighting for and advocating for Article 40.12. I’ve gotten to know some of them over the last two years and have been in awe of their thoughtfulness, knowledge, and sheer amount of work they’ve put into this issue.
While it’s impossible to list everyone, the people I’ve spent the most time with, each of whom has had a huge impact on the space, include Rebekah Tromble (I’ve lost track of trying to count the hours she has put into this work), Anna Lenhart, Claire Pershan, Hilary Ross, and Laura Edelson.
But the community also includes so many others, including Naomi Shiffman, Mathias Vermeulen, Luca Nicotra, Mark Scott, Nate Persily, John Perrino, Renee DiResta, Jeff Allen, Sahar Massachi, Cameron Hickey, Fabio Giglietto, Ethan Zuckerman, Brandi Geurkink, J. Nathan Matias, Daphne Keller, Rose Jackson, Paddy Leersen, Maria Ressa, Julia Angwin, John Sands, Eli Sugarman, Jamie Neikrie, Brian Boland, Becca Ricks, Alex Stamos, Brendan Nyhan, Talia Stroud, Joshua Tucker, and there’s a ton more as well. I’ve learned so much from this group and I’m so thankful for all the work they’re doing in this space.