Twitter's Commitment to Transparency: One Step Forward, 12 Million Steps Backwards
How Twitter Dramatically Changed Its Approach to Transparency In One Week But Not In The Way You Think
The last week of March was a pivotal one in the history of social media transparency.
For the first time ever, one of the largest social media platforms in the world posted its entire ranking algorithm online for the world to see. In some ways, this could be considered one of the most radical transparency efforts in internet history, setting a new standard for the rest of the industry.
Unfortunately, that’s not what actually happened.
In reality, the week of March 27th represented a dramatic step backward in Twitter's overall commitment to transparency and a seminal moment for the openness of our digital information ecosystems. It was the week Twitter went from being one of the most historically open platforms to one of the most closed.
In this post, I'm going to explain exactly what happened and why it’s so important.
Twitter Posts Its Algorithm Online (Kind Of)
In a rare moment of appearing to fulfill one of his promises, Elon posted the Twitter ranking algorithm to two different GitHub repositories (here and here). And in a sign of his overall enthusiasm for the gesture, he immediately jumped onto a Twitter Space to discuss the release himself.
Despite some misleading viral Twitter threads about how much one can learn from this code dump (which I won’t link to for now), the unfortunate truth is that there's much less to be learned from this sort of transparency effort than one might think, a reality that slowly became clear over the next few days as experts dug into the code release.
To put it very simply, the basic challenge is that critical parts of a ranking system like Twitter’s simply weren't included in this release, and never could be. The most significant missing components are the actual trained models that perform the ranking (as opposed to a glimpse of some of the code that trains the models, like the engagement formula), along with the data itself. For a lot of experts, it was always obvious that some of the most important parts of ranking could never be made public (for legitimate reasons), and the release only proved that out.
In some ways, it’s like having a few steps of a recipe but not having all the steps (or even the most important ones) and moreover, not even knowing what ingredients are being used: it's helpful, but a far cry from giving you the ability to assess how safe the meal is to eat.
If you’re looking to really get into the weeds, one of the clearest explanations I read was from Arvind Narayanan, and I encourage you to go straight to his article and read it directly. It's excellent. You should also read Solomon Messing, a former data scientist at Twitter and Meta, who also has a great write-up.
We Still Learned Some Interesting Things
That being said, there are absolutely some things that were learned from the code release.
For one, it does tell us some interesting and new things about how the engagement ranking works, and once again, Arvind does a great job of examining what we can learn from that.
Second, there were some intriguing (and occasionally embarrassing) nuggets once you delved into the details. Jeff Allen found a fascinating one in one of the associated readme files (which Twitter eventually removed). We also learned about the four types of Twitter users: Democrats, Republicans, power users, and Elon Musk. And honestly, people are still sifting through the code, and I expect more discoveries in the coming weeks.
The release also highlighted one of my favorite but often overlooked benefits of transparency: educating the bosses.
Once the code went live, Elon seemed shocked to learn about some of its details (especially the hard coding of his own metrics tracking). Honestly, I witnessed that same dynamic play out at Facebook when senior leaders learned about their own systems through public reporting on them (after which, teams would often finally get the resources needed to fix them).
So, the release of the code was undoubtedly a step forward, and Twitter should be applauded for doing it. Thank you, Twitter.
However, The Release Didn't Live Up to the Hype
Posting portions of an algorithmic model online doesn't provide the external world with answers to some of the most important questions about social media and the areas where we need the most accountability.
For instance: How fairly is Twitter enforcing its rules (a question Elon liked to ask a lot before buying the platform)? How much content is it removing that it shouldn't? How toxic is its ecosystem as a whole? To what degree are vulnerable communities being disproportionately impacted? What role does Twitter play in elections?
For example, the code release doesn't reveal any information about the list of 35 or so VIPs who are supposedly getting extra boosts in ranking, as reported by Platformer.
As Karissa Bell at Engadget points out, the code release didn't even live up to Elon's own expectations. He said in a TED Talk last April that he would open-source the algorithm to ensure "anyone can see that action has been taken, so there's no sort of behind-the-scenes manipulation, either algorithmically or manually."
But there’s no way to study that with this release.
The Most Important Transparency News Wasn't the Code Release
The truth is that when it comes to studying the most pressing questions about Twitter, we know one of the most effective and proven mechanisms is allowing independent experts to study and audit platforms. That’s how you can truly assess the impact of the entire system and not just the potential implications of a fraction of the code.
And that’s where we get to the transparency announcement that mattered the most.
On March 29th (two days before posting its code fragments on GitHub), Twitter announced its new pricing plans for external developers and researchers to access Twitter's APIs.
Researchers have used these datasets for nearly a decade to study Twitter, which historically made it the most open and transparent social network in the world. Rumors of potential changes had circulated, but no one knew the details beyond the fact that Twitter would charge more for access.
When the details were finally announced, it was the worst-case scenario for the research community.
Instead of continuing to provide free access to data for academics (a practice in place for over 10 years thanks to Twitter's commitment to transparency and research), Twitter terminated all academic plans. Now, researchers must pay huge fees to study Twitter, with starting plans at $42,000 a month and reaching up to $210,000 a month. And worse yet, even the priciest plan offers only a fraction of the data previously available for free to academics.
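To put those monthly figures in annual terms, here's a quick back-of-envelope calculation. It uses only the two tier prices named above; the tier labels are my own shorthand, and this is simple multiplication rather than additional reporting:

```python
# Annualize the reported Twitter API pricing tiers.
# The $42,000 and $210,000 monthly figures are from the article;
# the tier names below are placeholders, not Twitter's official labels.
MONTHS_PER_YEAR = 12

reported_monthly_tiers = {
    "cheapest enterprise plan": 42_000,
    "top reported plan": 210_000,
}

for name, monthly in reported_monthly_tiers.items():
    annual = monthly * MONTHS_PER_YEAR
    print(f"{name}: ${monthly:,}/month -> ${annual:,}/year")
```

Even the entry tier works out to over half a million dollars a year, which is far beyond a typical academic grant budget.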
For all intents and purposes, this marks the end of meaningful external research on Twitter. Universities and non-profits simply can’t afford those numbers.
Although the research community has relied too heavily on Twitter data over the years, that data remains vital for understanding one of the most important public spaces on the internet (for as long as that remains true), and now access to it has effectively ended. To give a sense of the impact, Chris Stokel-Walker reported at Wired that over 17,500 academic papers have cited Twitter data since 2020. To learn more about the impact of this decision, I highly encourage you to read this open letter from leading academics and researchers in social media who will be directly impacted.
(It’s also worth noting that this announcement comes on top of Twitter deprecating its Moderation Research Consortium and its ML Ethics, Transparency and Accountability team, both of which were doing fantastic work helping researchers study Twitter under the leadership of thoughtful and committed technologists like Rumman Chowdhury, Yoel Roth, Hilary Ross, and others.)
Studying Twitter Was Often About Studying the Open Internet More Broadly
Unfortunately, the other tragedy of this decision is how much it limits our ability to understand the *rest* of the internet. At its best, Twitter functioned as a chatroom for the internet, and being able to study Twitter meant also having a window into the rest of the open internet. And it was one of the few places you could do that.
In the end, the API announcement on March 29th represented a dramatic, industry-altering step backward for social media research and internet research more broadly. Ultimately, it was a decision that was far more important than Twitter sharing some limited fragments of its code online.
With one step forward and 12,000,000 steps back (roughly the annual cost to access the same amount of data researchers used to get for free), Twitter has transitioned from being the most transparent major platform to one of the most closed-off.
It’s also once again why the most reliable solution to social media transparency is going to be government-mandated transparency requirements (more on that soon).
And I’d probably be remiss not to point out that it’s very possible (if not already well on its way to happening) that Twitter will simply cease to exist as a meaningful space in the relatively near future, in which case this dramatic shift in transparency will just be another of the many casualties of Elon’s stewardship of the platform.
One other side note:
Matt Levine has a great write-up today about the state of FTX as its new CEO attempts to wrap his head around all the malfeasance that seems to have taken place, and one line stood out to me in particular. He writes that “in an internal communication, Bankman-Fried described Alameda as ‘hilariously beyond any threshold of any auditor being able to even get partially through an audit.’”
Honestly, if you talk to former integrity or trust & safety experts who worked at a large social platform, I think many of them would say this sounds very familiar. Many of the biggest platforms grew so quickly and became so complicated that their internal ranking systems were almost impossible to understand. I definitely saw this at times myself at Facebook, and ultimately I think it’s one of the superpowers of requiring more auditing and transparency: it forces platforms to streamline and clarify their own systems (which, as a side benefit, helps them understand those systems better themselves). It’s almost a kind of prophylactic transparency.