As repeatedly promised by Twitter CEO Elon Musk, Twitter has opened a portion of its source code to public inspection, including the algorithm it uses to recommend tweets in users’ timelines.
On GitHub, Twitter published two repositories containing code for many parts that make the social network tick, including the mechanism Twitter uses to control the tweets users see on the For You timeline. In a blog post, Twitter characterized the move as a “first step to be[ing] more transparent” while at the same time “[preventing] risk” to Twitter itself and people on the platform.
On that second point, the open source releases don’t include the code that powers Twitter’s ad recommendations or the data used to train Twitter’s recommendation algorithm. Moreover, they include few instructions on how to inspect or actually use the code — reinforcing the idea that the releases are strictly developer-focused.
“[We excluded] any code that would compromise user safety and privacy or the ability to protect our platform from bad actors, including undermining our efforts at combating child sexual exploitation and manipulation,” Twitter wrote. “We [also took] steps to ensure that user safety and privacy would be protected.”
Twitter says it’s working on tools to manage code suggestions from the community and sync changes to its internal repository. Presumably, those will be made available at a future date; there’s no sign of them at present.
At first glance, the algorithm is fairly complex, but not necessarily surprising from a technical standpoint. It’s made up of multiple models, including models for detecting “not safe for work” or abusive content, estimating the likelihood of one Twitter user interacting with another and calculating a Twitter user’s “reputation.” (It’s unclear what “reputation” refers to, exactly; the high-level documentation doesn’t say.) Several neural networks are responsible for ranking the tweets and recommending accounts to follow, while a filtering component hides tweets to (forgive the jargon) “support legal compliance, improve product quality, increase user trust, protect revenue through the use of hard-filtering, visible product treatments and coarse-grained downranking.”
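To make the difference between “hard-filtering” and “coarse-grained downranking” concrete, here is a minimal Python sketch. It is not taken from Twitter’s repositories: the `Candidate` fields, the `blocked` and `low_quality` flags and the 0.5 downrank factor are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    tweet_id: int
    score: float
    blocked: bool = False      # hypothetical flag: must never be shown
    low_quality: bool = False  # hypothetical flag: shown, but demoted


# Assumed multiplier; the real values are not given in the article.
DOWNRANK_FACTOR = 0.5


def apply_visibility_rules(candidates):
    """Drop blocked tweets (hard filter) and demote low-quality ones (downrank)."""
    kept = []
    for c in candidates:
        if c.blocked:
            continue  # hard-filtering: the tweet is removed outright
        score = c.score * DOWNRANK_FACTOR if c.low_quality else c.score
        kept.append(Candidate(c.tweet_id, score, c.blocked, c.low_quality))
    # Highest adjusted score first.
    return sorted(kept, key=lambda c: c.score, reverse=True)
```

In this sketch a blocked tweet never reaches the timeline, while a downranked one can still appear if its adjusted score stays competitive.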
In an engineering blog post, Twitter reveals more about the recommendation pipeline, which it claims runs approximately five billion times per day:
“We attempt to extract the best 1,500 tweets from a pool of hundreds of millions … Today, the For You timeline consists of 50% [tweets from people you don’t follow] and 50% [tweets from people you follow] on average, though this may vary from user to user,” Twitter wrote. “Ranking is achieved with a ~48M parameter neural network that is continuously trained on Tweet interactions to optimize for positive engagement (e.g. likes, retweets, and replies). This ranking mechanism takes into account thousands of features and outputs ten labels to give each Tweet a score, where each label represents the probability of an engagement.”
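Twitter’s description implies that each tweet’s per-engagement probabilities are folded into a single score before the best candidates are selected. The Python sketch below shows one plausible way to do that, a weighted sum followed by a top-k cut. The weighted-sum form, the label names and the weight values are assumptions for illustration, not details confirmed by the blog post, and only three labels are shown rather than ten.

```python
# Hypothetical weights: one per engagement label. Values are illustrative only.
WEIGHTS = {"like": 0.5, "retweet": 1.0, "reply": 2.0}


def engagement_score(probabilities, weights=WEIGHTS):
    """Combine per-label engagement probabilities into one score (assumed weighted sum)."""
    return sum(weights.get(label, 0.0) * p for label, p in probabilities.items())


def top_k(scored_tweets, k):
    """Return the k (tweet_id, score) pairs with the highest scores."""
    return sorted(scored_tweets, key=lambda pair: pair[1], reverse=True)[:k]


# Toy predictions standing in for the neural network's output.
predictions = {
    101: {"like": 0.20, "retweet": 0.10, "reply": 0.05},
    102: {"like": 0.60, "retweet": 0.05, "reply": 0.01},
    103: {"like": 0.05, "retweet": 0.01, "reply": 0.00},
}
scored = [(tweet_id, engagement_score(p)) for tweet_id, p in predictions.items()]
best = top_k(scored, k=2)
```

In production the cut would be on the order of the 1,500 tweets Twitter mentions rather than two, and the result would then be mixed across in-network and out-of-network sources.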
“Twitter reveals some of its source code, including its recommendation algorithm” by Kyle Wiggers, originally published on TechCrunch