Substance: 666 and How Twitter Samples Tweets in Streaming API

6/20/2013

Labels: 666, sample streaming API, snowflake, twitter

After having played around with Twitter data for a while, I had a question: how does Twitter sample the supposedly random tweets that it sends out through its sample streaming API?

I vaguely remember that the official documentation used to say "1% random sample" somewhere, but I can no longer find that statement. So I decided to investigate the question experimentally. The result turned out to be far more fascinating than I expected (including the appearance of 666).

This task would be trivial if I had firehose access, but I do not. I initially thought of crawling tweets with IDs near the ones received in the sample stream and then counting them. But I quickly found out how terribly inefficient that was: tweet IDs often seem to be very sparse. Then, thanks to Twitter's commitment to open source, I found their tweet ID generator on GitHub, wittily named snowflake (after a snowflake's large number of possible configurations, I suppose). Snowflake is a distributed solution to globally unique ID generation. Its essential idea is to combine a timestamp with a unique worker ID, so that each worker can generate unique IDs independently, without coordinating with the others.

The first thing I noticed in snowflake is that, whereas the 'created_at' property of the returned JSON tweet objects provides timing information at per-second resolution, one can recover per-millisecond timing information from the snowflake ID! With this more precise timing information, an intriguing pattern emerges from the tweets in the sample stream: within each second, all received tweets fall within a 10-millisecond-wide window. So we get 10/1000 = 1% of the millisecond timestamps, which translates to roughly 1% of all tweets (assuming tweet creation times are spread uniformly within each second), confirming the claim in my memory. But the surprise does not stop there: that sampling window is the same for every second! It always runs from the 657th to the 666th millisecond, inclusive. So there is the 666 in the title. I wonder what the story is behind choosing 666 and this particular scheme of "random" sampling.
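The millisecond recovery described above can be sketched in a few lines. This is only an illustration, assuming the bit layout and the custom epoch constant from Twitter's open-source snowflake code (a timestamp stored above 22 lower bits, counted in milliseconds from 1288834974657). The function names are my own and hypothetical.

```python
# Assumed constants, taken from Twitter's open-source snowflake project.
TWEPOCH_MS = 1288834974657  # snowflake's custom epoch, in ms since the Unix epoch
TIMESTAMP_SHIFT = 22        # 10 bits of machine ID + 12 bits of sequence sit below the timestamp


def snowflake_to_unix_ms(tweet_id: int) -> int:
    """Return the creation time encoded in a snowflake ID, in ms since the Unix epoch."""
    return (tweet_id >> TIMESTAMP_SHIFT) + TWEPOCH_MS


def ms_within_second(tweet_id: int) -> int:
    """Return the millisecond offset (0..999) of the ID within its second."""
    return snowflake_to_unix_ms(tweet_id) % 1000


def in_sample_window(tweet_id: int, lo: int = 657, hi: int = 666) -> bool:
    """Check whether the ID falls in the inclusive window observed in the sample stream."""
    return lo <= ms_within_second(tweet_id) <= hi
```

Running `ms_within_second` over the IDs collected from the sample stream and tallying the results is enough to reproduce the observation. One detail worth noticing under these assumptions: the epoch constant itself ends in 657 milliseconds, so the window 657 to 666 corresponds to raw timestamp values whose remainder modulo 1000 is 0 to 9.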

To make the post more complete, I should add two things: (1) snowflake is used not only for tweet IDs but also for direct message IDs; (2) before snowflake was activated sometime on 11/4/2010, Twitter used incremental IDs (the earliest existing tweet has ID 20).

To start playing with snow, you can use my little Python module to create and melt a snowflake ID. (Indeed, you might soon find that not every tweet is delivered, even within that 10-millisecond window.)
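For readers who want to see what creating and melting involves, here is a minimal sketch. It is not the module mentioned above, and its names (`Snowflake`, `make`, `melt`) are hypothetical. It assumes the field layout from Twitter's open-source snowflake code: from the lowest bit up, a 12-bit sequence number, a 5-bit worker ID, a 5-bit datacenter ID, and a 41-bit timestamp in milliseconds since snowflake's custom epoch.

```python
from typing import NamedTuple

# Assumed constants, taken from Twitter's open-source snowflake project.
TWEPOCH_MS = 1288834974657
SEQUENCE_BITS = 12
WORKER_BITS = 5
DATACENTER_BITS = 5
TIMESTAMP_BITS = 41

WORKER_SHIFT = SEQUENCE_BITS
DATACENTER_SHIFT = WORKER_SHIFT + WORKER_BITS
TIMESTAMP_SHIFT = DATACENTER_SHIFT + DATACENTER_BITS  # 22


class Snowflake(NamedTuple):
    unix_ms: int        # creation time, ms since the Unix epoch
    datacenter_id: int
    worker_id: int
    sequence: int       # per-worker counter within one millisecond


def melt(snowflake_id: int) -> Snowflake:
    """Split a snowflake ID into its fields."""
    return Snowflake(
        unix_ms=(snowflake_id >> TIMESTAMP_SHIFT) + TWEPOCH_MS,
        datacenter_id=(snowflake_id >> DATACENTER_SHIFT) & ((1 << DATACENTER_BITS) - 1),
        worker_id=(snowflake_id >> WORKER_SHIFT) & ((1 << WORKER_BITS) - 1),
        sequence=snowflake_id & ((1 << SEQUENCE_BITS) - 1),
    )


def make(unix_ms: int, datacenter_id: int, worker_id: int, sequence: int) -> int:
    """Pack the fields into a snowflake ID, rejecting values that do not fit."""
    fields = [
        ("timestamp", unix_ms - TWEPOCH_MS, TIMESTAMP_BITS),
        ("datacenter_id", datacenter_id, DATACENTER_BITS),
        ("worker_id", worker_id, WORKER_BITS),
        ("sequence", sequence, SEQUENCE_BITS),
    ]
    for name, value, bits in fields:
        if not 0 <= value < (1 << bits):
            raise ValueError(f"{name} out of range: {value}")
    return (
        ((unix_ms - TWEPOCH_MS) << TIMESTAMP_SHIFT)
        | (datacenter_id << DATACENTER_SHIFT)
        | (worker_id << WORKER_SHIFT)
        | sequence
    )
```

Melting an ID and making it again from the resulting fields should give back the same ID, which is a handy sanity check when experimenting with IDs from the stream.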

If you find this interesting, leave a comment. We can also talk on twitter: @falcondai

25 comments:

Unknown, 9/11/2013 05:02:00 PM
how about instagram?
Unknown, 10/20/2015 10:17:00 PM
Hi, good information on your website. May I ask, do you know of any source code or API that can retrieve up to 1000 tweets? The source code I have found is limited to 100 tweets.

Thanks for your response, sir.
Anonymous, 6/29/2016 09:16:00 AM
Here is the documentation.

https://twittercommunity.com/t/potential-adjustments-to-streaming-api-sample-volumes/31628
Anonymous, 7/29/2016 11:47:00 AM
I'd be interested if anyone has tried this again recently? Has the game changed? Also, I wonder if they miss tweets during high volumes, thus reducing the sample rate?
Anne, 1/20/2020 12:13:00 PM
I am quite new to the Twitter API and Tweepy, and I am confused by the rate-limiting concept. I am using the streaming API and I want to gather sample tweets without using any filters such as hashtags or location. Some sources state that I should not get rate limited with sample tweets, since I am getting 1% of tweets, and some state otherwise. I keep getting error 420 very often, and I was wondering if there is a way to avoid it or make it smoother. Thanks~ Anne from tailored software solutions