What is meaningful transparency?
A few quick ways to evaluate transparency efforts by social platforms
Ok, trying out Substack for the first time…here goes!
From the Congressional hearing on the Twitter Files to TikTok’s launch of the Transparency Center in Los Angeles to the first mandated transparency reports in Europe, transparency continues to be very top of mind for platforms and lawmakers these days.
However, one of the most frequent questions I hear from lawmakers and journalists whenever they see the launch of some new platform transparency project is…“is this real transparency or not?”
In all honesty, it’s a complicated question and one that requires a lot more space than this post, but I’m going to try to share some of the ways I think about it.
Two quick thoughts before we dive in
Before I get into how I think about that, there are two important things to note.
First, the truth is that there are a lot of different types of transparency with a lot of different goals, and oftentimes there’s simply a disconnect between what the goals of a platform are and what the expectations and hopes of the outside world are (and of course, it’s usually that the platform’s goals are just narrower than the public’s hopes). From accountability transparency to adversarial transparency to collaborative transparency, what we need is much more clarity about the point of any given effort. I’m going to try to write more on this soon as well.
Second, what we ultimately need is a place where the various existing transparency reports & projects can actually be studied and graded on an ongoing basis; where folks with technical expertise and background in the space can go deep into these projects and report back on where they succeed, where they fail, and how they need to evolve.
Thankfully, I know some smart people are thinking about this exact need (I’m also trying to help with it), and I’m hopeful we’ll see something along these lines enter the public sphere in the next 3-12 months.
An incomplete list of a few heuristics to judge transparency
I tend to think of “meaningful transparency” as mechanisms that enable both accountability and collaboration, and that significantly contribute to the public’s understanding of the design and impact of platforms. So, with that basic understanding, here are a few of the ways I look at transparency initiatives out there in the industry, and some signs that something might not be real transparency:
How to evaluate self-curated “reports” - If a platform is releasing any sort of transparency report but has decided entirely on its own what metrics to include, how to calculate those metrics, and how frequently to release reports, it’s probably not meaningful transparency.
Content moderation reports have become a near-industry standard in the last few years and I think they’ve actually incorporated a lot of external feedback, so I generally don’t consider those reports as ones where platforms are exclusively choosing all their own metrics. However, they still leave a lot to be desired. In general, they need more auditing, and there are some glaring areas in which they are still inadequate (including where they haven’t responded to external feedback), which is partly why regulators are stepping in to require some of the metrics that the platforms aren’t giving up voluntarily or don’t want to commit the resources to maintain.
While some of these reports can be fascinating, letting a platform dictate the terms of the reports entirely means it’s far too easy to curate them in ways that are friendly to the narratives they’re trying to push, and that’s something auditing can’t account for. This is generally where I put Meta’s “Widely Viewed Content Reports”. It’s an interesting report that I’m glad exists, and I think it has led to some narrow but important impact (mostly internal), but given how limited the data is, the fact that the reports go out of their way to make the case for how non-representative the data is, the complete lack of even attempting to justify why they chose the metrics they did, the tiny sample size and more…it’s not the sort of meaningful transparency regulators should give any credence to.
These types of reports should all be audited, and we should get to a place where that’s a baseline requirement for their seriousness, but even that isn’t always sufficient to make the reports truly meaningful. Here is where Meta gets credit for being the first to have their reports audited, but it’s also a low bar, unfortunately (and also a surprisingly short audit).
Who gets access to data - If a platform is releasing raw data or representative datasets (versus curated reports, which are usually publicly available) to a specific audience, it should ensure that either (1) the audience is as independent as possible (including having no financial relationship or mutually-aligned interests with the platform), or (2) the data is released as broadly as possible.
I think one of the obvious problems with the Twitter Files, and something that came up repeatedly in the Congressional hearing on March 9th, was that it was not clear at all how Twitter chose the reporters it released the files to.
And once again, independent experts should have an ongoing ability to audit the pipelines to ensure they're set up correctly and in a way that matches the intent of the program.
The particular value of academic and researcher access and how to structure it - If you've followed any of my work over the last year, you'll know that I’ve come to believe in the necessary role of external academics & researchers being able to study platforms as a key piece of transparency (happy to go into why). So, for me, if researchers and academics can’t study sensitive datasets in privacy-protecting ways or collaborate on any sort of co-designed studies on your platform, you’re not being meaningfully transparent. You also have to get the specifics right, including (but not at all limited to):
If those researchers need pre-approval for their publications, you’re not being meaningfully transparent and this is where TikTok's latest researcher API completely fails.
There are a lot of structural inequities built into the academic system across the globe and any access mechanism should also be as conscious of those as possible and have built-in ways to ensure the data is fairly and responsibly accessible, especially to those communities that might need it the most.
To that point, if you are charging for access to those datasets, you’re not being meaningfully transparent, and this is where Twitter's new researcher API seems likely to fail.
Similarly, if you are threatening researchers, academics, or others who are conducting research in the public interest with lawsuits for scraping public data, you’re not being meaningfully transparent.
The importance of real-time access to important content and ways for the rest of civil society to see what’s happening - I also think real-time access to what’s happening with particularly important content is key (no surprise). Studying platforms by looking back at them 2-3 years later isn’t sufficient. So, if civil society groups, especially journalists, human rights activists, and election protection organizations, don’t have a way to easily monitor important public content in real-time, you’re not being meaningfully transparent.
One limitation of gated datasets that are only available to academics is that the field generally has a very slow turnaround time for its findings…which means platforms can avoid scrutiny and accountability, as well as any opportunity for meaningful collaboration, during particularly intense and important moments, like elections, natural disasters and more.
Transparency that is really just marketing - If a transparency initiative is almost entirely supporting a narrative that the platform wants, it’s probably not meaningful transparency. Similarly, if a transparency project is owned and managed by a comms or marketing team, it’s probably not real transparency.
Again, another issue with both the Twitter Files and the Widely Viewed Content Reports is the degree to which they very conveniently advanced a narrative the company wanted out there. It’s not a definitive way to judge a project, but it makes it very hard to trust projects where that’s the case (and in some cases, it’s an accurate sign that the project was in fact a marketing one).
I’d maybe even go a step further for people structuring these programs inside platforms: comms and marketing teams should have very little input at all on these projects.
Large gaps in data that don’t come with any explanation - This is pretty obvious, but if your transparency programs don’t include data about all the major products and formats on your platform, you’re not being meaningfully transparent. Relatedly, if a transparency initiative is missing key data without any reasonable explanation for why, or any effort to improve, it’s hard to take seriously as meaningful transparency.
For instance, the current version of CrowdTangle doesn’t include Reels, a product format that is quickly becoming one of the most important on the platform, if not the single most important. You can’t point to CrowdTangle as a meaningful transparency tool (which Meta recently did in its official reports to the EU’s Code of Practice on Disinformation) if it doesn’t include the most important format on your platform.
Transparency as administrative data - If a platform is only releasing metrics designed entirely to reflect and shed light on its own internal processes and priorities, it's a very limited type of transparency.
Angela Xiao Wu and Harsh Taneja call this “administrative data” in a great paper they wrote about the limits of platform data. While I think administrative data can actually be very instructive and valuable for better understanding a company, it has very specific and important limitations. Put another way, we shouldn’t only care about what platforms are doing and why...we also want the ability to audit and study the impact of those decisions (and come up with our own metrics).
Transparency without any impact - If your transparency programs don’t yield any actual product or policy changes, or produce any better understanding of your product or processes, you’re not being meaningfully transparent.
One version of transparency that platforms sometimes talk about is algorithmic transparency, but they’ll frame the mechanics of their algorithm in very simplified terms. So simplified, in fact, that it doesn’t yield any meaningful improvement in how the public or stakeholders understand the system. If nothing is changing, it’s not real transparency.
And to build off of that final point a bit…as someone who believes in transparency ultimately as a means towards simply building better platforms, if you want to evaluate the meaningfulness of any transparency initiative, I think one of the simplest questions you can ask is “what impact has this led to?”
(And by the way, if it’s just internal impact, that’s totally fine and can be incredibly important).
There are obviously a lot more types of transparency…for instance, I don’t have anything in here about user-level transparency, which is also incredibly important, or about how to structure algorithmic transparency & advertising transparency to make them meaningful…and there are even more ways to evaluate these kinds of efforts. I’m going to try to cover those too, and to simplify the entire rubric…but in the meantime, I’d love any comments or feedback!