666 and How Twitter Samples Tweets in Its Streaming API

After having played around with Twitter data for a while, I had a question: how does Twitter sample the supposedly random tweets that it sends out through its sample streaming API?

I vaguely remember that it used to say "1% random sample" somewhere in the official documentation, but I can no longer find that statement. So I decided to investigate the question by experiment. The result turned out to be far more fascinating than I expected (such as the appearance of 666).

This task would be trivial if I had firehose access, but I do not. I initially thought of crawling tweets with IDs near the ones received in the stream sample and then counting them. But I quickly found out how terribly inefficient that was: the tweet IDs often seem to be very sparse. Then, thanks to Twitter's commitment to open source, I found their tweet ID generator on GitHub, wittily named snowflake (after a snowflake's large number of possible configurations, I suppose). Snowflake is a distributed solution to globally unique ID generation: its essential idea is to combine a timestamp with a unique worker ID, so that each worker can generate unique IDs independently of the others.
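To make the idea concrete, here is a minimal sketch of how such an ID could be composed. The bit layout (41 bits of milliseconds since a custom epoch, 5 bits of datacenter ID, 5 bits of worker ID, 12 bits of per-millisecond sequence number) and the epoch value are my recollection of the open-sourced snowflake code, so treat them as assumptions; the function name `make_snowflake` is hypothetical and is not part of any Twitter library.

```python
# Assumed constants, recalled from the open-sourced snowflake project.
TWITTER_EPOCH_MS = 1288834974657  # custom epoch, in Unix milliseconds
DATACENTER_BITS = 5
WORKER_BITS = 5
SEQUENCE_BITS = 12

WORKER_SHIFT = SEQUENCE_BITS
DATACENTER_SHIFT = SEQUENCE_BITS + WORKER_BITS
TIMESTAMP_SHIFT = SEQUENCE_BITS + WORKER_BITS + DATACENTER_BITS  # 22


def make_snowflake(timestamp_ms, datacenter_id, worker_id, sequence):
    """Pack the four fields into a single integer ID (hypothetical helper)."""
    if timestamp_ms < TWITTER_EPOCH_MS:
        raise ValueError("timestamp predates the snowflake epoch")
    if not 0 <= datacenter_id < (1 << DATACENTER_BITS):
        raise ValueError("datacenter_id out of range")
    if not 0 <= worker_id < (1 << WORKER_BITS):
        raise ValueError("worker_id out of range")
    if not 0 <= sequence < (1 << SEQUENCE_BITS):
        raise ValueError("sequence out of range")
    return (
        ((timestamp_ms - TWITTER_EPOCH_MS) << TIMESTAMP_SHIFT)
        | (datacenter_id << DATACENTER_SHIFT)
        | (worker_id << WORKER_SHIFT)
        | sequence
    )
```

Because the timestamp occupies the high bits, two workers that generate an ID in the same millisecond still produce different values, and IDs sort roughly by creation time. It would also explain why consecutive integers near a known tweet ID are so rarely valid tweets.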

The first thing I noticed in snowflake is that, whereas the 'created_at' property of the returned JSON tweet objects provides timing information at per-second resolution, one can recover per-millisecond timing information from the snowflake ID! With this more precise timing information, an intriguing pattern emerges from the tweets in the sample stream: within each second, all received tweets fall within a 10-millisecond-wide window. So we get 10/1000 = 1% of the millisecond timestamps, which translates to roughly 1% of all tweets (assuming tweet creation times are well spread out within each second), confirming the claim in my memory. But the surprise does not stop there: that sampling window is the same for every second! It is fixed exactly between the 657th and the 666th millisecond. So there is the 666 in the title. I wonder what the story is behind choosing 666 and this particular scheme of "random" sampling.
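The check described above can be sketched in a few lines. This assumes, from my recollection of the open-sourced snowflake code, that the timestamp sits above the low 22 bits of the ID and counts milliseconds from the epoch 1288834974657; the function names are hypothetical, and the window bounds 657 and 666 are simply the ones I observed.

```python
# Assumed constants, recalled from the open-sourced snowflake project.
TWITTER_EPOCH_MS = 1288834974657  # custom epoch, in Unix milliseconds
TIMESTAMP_SHIFT = 22              # bits below the timestamp field

# Window observed in the sample stream (inclusive on both ends).
WINDOW_START_MS = 657
WINDOW_END_MS = 666


def snowflake_timestamp_ms(tweet_id):
    """Return the creation time encoded in a snowflake ID, in Unix milliseconds."""
    return (tweet_id >> TIMESTAMP_SHIFT) + TWITTER_EPOCH_MS


def millisecond_of_second(tweet_id):
    """Return the millisecond part (0-999) of the encoded creation time."""
    return snowflake_timestamp_ms(tweet_id) % 1000


def in_sample_window(tweet_id):
    """True if the ID's millisecond part falls in the observed 657-666 window."""
    return WINDOW_START_MS <= millisecond_of_second(tweet_id) <= WINDOW_END_MS
```

Running `millisecond_of_second` over the `id` field of every tweet received from the sample stream and tallying the results is enough to see whether the window appears.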

To make the post more complete, I should add that: 1. snowflake is used not only for tweet IDs but also for direct message IDs; 2. before snowflake was activated sometime on November 4, 2010, Twitter used incremental IDs (the earliest existing tweet having ID 20).

To start playing with snow, you can use my little Python module to create and melt a snowflake ID. (Indeed, you might soon find that not every tweet is delivered, even within that 10-millisecond window.)

If you find this interesting, leave a comment. We can also talk on Twitter: @falcondai


  1. Replies
    1. I haven't used Instagram's API, so I wouldn't know; maybe you can try it and share your experience. This analysis takes advantage of Twitter's ID generator, i.e. snowflake (which is open-sourced), to study their streaming API's sampling scheme.

  2. Hi, good information on your website. May I ask, do you know of any source code or API that can retrieve a maximum of 1000 tweets? I have only found source code that has a limit of 100 tweets.

    Thank you for your response, sir.

  3. Here is the documentation.


  4. I'd be interested if anyone has tried this again recently? Has the game changed? Also, I wonder if they miss tweets during high volumes, thus reducing the sample rate?
