CONTENT ANALYSIS

We analyzed 4 million data points to see what makes it to the front page of Reddit. Here’s what we learned.

We applied machine learning to analyze 4,607,160 data points collected by scraping the front page of Reddit for 22 days. Read the post for insights on how the Reddit algorithms work.

Why are people so damn interested in getting to the front page of Reddit?

Look at any viral stories on news outlets like CNN.

Look at any viral videos floating around Facebook.

Look at any viral images going around on BuzzFeed.

Look at any trending story on Twitter.

… the breeding ground for all these viral stories often is Reddit.

 

“Reddit Scraping” is a common practice by news agencies to figure out which stories will be the most popular on their own websites.

“Reddit Scraping” is such a common practice now because several thousand random people already UPVOTED these images proving the story is “viral”.

This is why if something hits the front page of Reddit, you can expect to see it popping up on news sources and social media within hours (if not faster). 

For example, let’s look at how a post on the front page of Reddit turns into a “Viral Story” and gets re-adapted into articles for different sites:

 

As soon as something hits the front page, media outlets from around the world feature these posts. For example, a giraffe story from the Reddit front page made it to the CNN.com homepage within 2 hours.

 

Our Reddit Scraping Process to get 4 million+ data points:

We noticed that the results on the first page were changing very quickly, and set out to scrape the rankings every two minutes. We started on December 16, 2015 and continued up until January 8, 2016.
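The post does not say how the scraper itself was built. As a minimal sketch, assuming each scrape returns a response in the shape of Reddit's public listing JSON (a `data.children` array whose entries carry `id`, `score`, and `num_comments`), one snapshot could be flattened into rows like this. The `parse_listing` helper and the sample response are hypothetical.

```python
from datetime import datetime, timezone


def parse_listing(listing, scraped_at):
    """Turn one decoded listing response into a list of snapshot rows.

    Assumption: `listing` follows the listing JSON layout, where the posts
    sit in listing["data"]["children"] in ranked order.
    """
    rows = []
    for rank, child in enumerate(listing["data"]["children"], start=1):
        post = child["data"]
        rows.append({
            "scraped_at": scraped_at,
            "post_id": post["id"],
            "rank": rank,  # position in the listing, 1 = top
            "score": post["score"],
            "num_comments": post["num_comments"],
        })
    return rows


# Hypothetical two-post response, trimmed to the fields used above.
sample = {"data": {"children": [
    {"data": {"id": "abc", "score": 4804, "num_comments": 301}},
    {"data": {"id": "def", "score": 1622, "num_comments": 3449}},
]}}

rows = parse_listing(sample, datetime(2015, 12, 16, 9, 0, tzinfo=timezone.utc))
for row in rows:
    print(row["rank"], row["post_id"], row["score"], row["num_comments"])
```

Running something like this every two minutes and appending the rows to a table would give the three per-position metrics (score, comments, rank) described in this post.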

 

This resulted in the following data:

We scraped the top 100 posts every 2 minutes for 22 days, 3 metrics per position [score/upvotes, number of comments and rank] = 1,584,000 * 3 = 4,572,000 and 15 metrics related to each post out of 2,344 unique posts = 35,160 metrics.

 

Stats for the top 100 posts on the reddit page:

We found 2,344 unique posts appearing on the front page in the three-week period - this is about 106 different posts per day all getting in and out of the front page!

  • Top 100 ranks are collected every 2 minutes - 720 rankings per day.
    ….we later aggregated them to 15-minute intervals - 96 rankings per day.
  • For each post we look at the headline, the content if it is a text post, the subreddit, the number of subscribers of the subreddit, whether it is an image, video, or podcast, whether it is 18+, the sentiment (emotional polarity) of the headline, and whether it is an internal Reddit self-post or an external post (e.g. a photo hosted on Imgur).
  • For each post at each moment we capture the reddit score, the number of comments, and the rank of the post.
  • Then we deleted the posts which were in the top 100 for less than 2 minutes, and got 8,000+ posts.
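The aggregation from 2-minute snapshots to 15-minute intervals can be sketched as follows. The post does not say which statistic was used per interval, so averaging the score is an assumption here, and `aggregate_15min` is a hypothetical helper.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean

SNAPSHOTS_PER_DAY = 24 * 60 // 2    # one scrape every 2 minutes -> 720
INTERVALS_PER_DAY = 24 * 60 // 15   # one bucket every 15 minutes -> 96


def aggregate_15min(snapshots):
    """Average score per post per 15-minute bucket.

    snapshots: iterable of (timestamp, post_id, score) tuples.
    Returns {(bucket_start, post_id): mean_score}.
    """
    buckets = defaultdict(list)
    for ts, post_id, score in snapshots:
        # Floor the timestamp to the start of its 15-minute interval.
        start = ts.replace(minute=ts.minute - ts.minute % 15,
                           second=0, microsecond=0)
        buckets[(start, post_id)].append(score)
    return {key: mean(values) for key, values in buckets.items()}


# Hypothetical post scraped ten times, two minutes apart.
t0 = datetime(2015, 12, 16, 9, 0)
snaps = [(t0 + timedelta(minutes=2 * i), "abc", 100 + 10 * i) for i in range(10)]
result = aggregate_15min(snaps)
for (start, post_id), score in sorted(result.items()):
    print(start.strftime("%H:%M"), post_id, score)
```

Taking the last snapshot in each interval instead of the mean would work just as well for ranks; the choice mostly affects how much short-lived noise survives.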

There is quite a bit of action in Reddit's top 100!

Here is what we THOUGHT we would see:

Before we got Reddit's data, we thought that the number of comments drives the upvotes of Reddit posts and causes them to appear on the front page. And yet - posts with a small number of comments can often get super high scores and appear in the top 25 of all subreddits, and this does not only happen with images!

Reddit is also notorious for letting negative and cynical posts thrive - maybe due to its main demographic. We wanted to check whether this is actually true, or just an impression that arises because the eye simply notices negative statements more easily.

These interesting observations were made (and validated) exclusively based on the data. Here is what we discovered:

 

FINDING #1: Starting at 9am PST is the fastest time for getting upvotes

If we look at the average evolution of upvotes over a day, we see that the scores on the front page grow significantly starting from 9am PST and reach a stable peak between 5pm and 9pm PST.

 

FINDING #2: For text posts, Very Positive or Very Negative posts perform significantly better than Neutral ones.

In general, there is no relationship between the sentiment of a post's headline (how positive, negative, or neutral it is) and its popularity. However, if we look only at Reddit's own textual posts (no images), then polar headlines (positive or negative) perform significantly better than neutral ones.

 

FINDING #3: Textual self-posts with positive headlines stay on Reddit’s front page significantly longer

 

FINDING #4: Images get many more upvotes than text posts.

Images are definitely performing much better than textual posts in terms of the maximal scores, while the textual posts get significantly more comments.

 

FINDING #5: However….text posts get more comments and stay on the front page longer

Even though image posts on average get more upvotes…..text posts tend to get many more comments and stay on the front page longer.

This is probably because a text post can spark more of a conversation than an image post usually can. For example, the image post below is a cute picture of a dog cuddling a baby….which attracted a massive 4,804 upvotes, but only a measly 301 comments.

The text post, however, got only 1,622 upvotes and a massive 3,449 comments….because the post inherently lends itself to lively conversation.

 

FINDING #6: There are 5 Sub-Reddits that completely dominate the front page of Reddit.

Subreddits r/funny, r/pics, r/gifs, r/TodayILearned, and r/gaming dominate Reddit's front page. Posts from these subreddits get bigger scores, higher ranks, and are most frequently present in the top 25 of Reddit.

 

FINDING #7: The average life of a post on Reddit's front page is 4 hours and 15 minutes.

The average life of a post on Reddit's front page is 4 hours and 15 minutes. Some posts disappear after 15 minutes, some live for as long as 18 hours. Interestingly, textual self-posts with a positive headline live on the front page significantly longer than the ones with a neutral or even negative headline. It pays off to have a positive headline, even if your post is negative in content.
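One way to measure such a lifetime from the ranking snapshots is to count the 15-minute intervals in which a post sits in the top 25. This is a sketch under that assumption rather than the exact method used for the numbers above; `front_page_lifetimes` and the sample data are hypothetical.

```python
from collections import Counter

FRONT_PAGE_SIZE = 25    # the post treats the top 25 as the front page
INTERVAL_MINUTES = 15   # sampling interval after aggregation


def front_page_lifetimes(snapshots):
    """Minutes each post spent in the top 25.

    snapshots: iterable of (interval_index, post_id, rank) tuples,
    one entry per post per 15-minute interval.
    Posts that never reached the top 25 are left out of the result.
    """
    counts = Counter(post_id for _, post_id, rank in snapshots
                     if rank <= FRONT_PAGE_SIZE)
    return {post_id: n * INTERVAL_MINUTES for post_id, n in counts.items()}


# Hypothetical data: "a" holds rank 10 for 17 intervals, then drops to 40;
# "b" never climbs above rank 60.
snaps = ([(i, "a", 10) for i in range(17)]
         + [(17, "a", 40)]
         + [(i, "b", 60) for i in range(5)])

lifetimes = front_page_lifetimes(snaps)
print(lifetimes["a"] / 60, "hours")  # 17 intervals * 15 min = 4.25 hours
```

Note that this counts total time in the top 25, so a post that drops out and comes back is credited for both stretches.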

Top page Reddit posts stay on the front page an average of 4 hours 15 minutes

 

FINDING #8: The average scores of Reddit's posts labeled as 18+ are significantly higher at night. Hmm...why would this be? ;-)

 

FINDING #9: Putting a number in your headline increases chances of being among the top posts.

There are a lot more posts with numbers in the headline among the top posts. These posts also get slightly higher scores (the difference is statistically significant) than the posts without any numbers in the headline.
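Checking a headline for a number is easy to reproduce. The sketch below flags headlines containing at least one digit and compares the mean score of the two groups. It is only an illustration: the headlines and scores are made up, spelled-out numbers ("ten") are not detected, and no significance test is included.

```python
import re
from statistics import mean


def has_number(headline):
    """True if the headline contains at least one digit."""
    return re.search(r"\d", headline) is not None


def mean_score_by_number(posts):
    """Mean score for headlines with and without a number.

    posts: iterable of (headline, score) pairs.
    Returns {True: mean_with_number, False: mean_without_number},
    omitting a group if it is empty.
    """
    groups = {True: [], False: []}
    for headline, score in posts:
        groups[has_number(headline)].append(score)
    return {flag: mean(scores) for flag, scores in groups.items() if scores}


# Hypothetical headlines and scores.
posts = [
    ("10 things I learned in college", 3000),
    ("My dog turned 5 today", 5000),
    ("A quiet morning at the lake", 3500),
]
print(mean_score_by_number(posts))
```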

 

FINDING #10: On average, posts on the front page have 5.5 times more comments than the other posts in the top 100.

Front-page posts are not only ranked much higher: if we compare medians instead of averages, the gap to the other posts in the top 100 grows to 8 times.

 

 

HOW IMPORTANT ARE SUB-REDDITS?

You can’t just “post something on Reddit.” You must post in a Sub-Reddit.

 

EXAMPLE: A funny picture of a guy getting hit in the face with a water balloon would go on the sub-reddit r/funny.

 

ANOTHER EXAMPLE: A gif post about David Beckham kicking a really cool soccer shot would go under r/sports.

 

Some of the superpopular sub-Reddits are:

  • r/AskReddit with 10,230,000+ subscribers.
  • r/funny with 10,177,000+ subscribers.
  • r/TodayILearned with 10,089,200+ subscribers.

Each of these three examples has more than 10,000,000 subscribers….and an estimated 10x that number in “lurkers” (people on Reddit who read the content but don’t create a Reddit account).

By comparison, the average cable channel in the United States has between 500,000 and 3,000,000 subscribers.

The magic of Reddit is that there are smaller sub-Reddits for very specific interests like:

  • /r/photoshopbattles with 5,090,772+ subscribers (a channel for Photoshop Battles).
  • r/TwoXChromosomes with 4,254,940+ (a channel for women’s issues).
  • r/Europe with 547,500+ (a channel about…..well...Europe).
  • r/Belgium with 24,550+ subscribers. (a channel about Belgium)

But what's interesting is that posts from small subreddits still have a chance to appear on Reddit's front page, and frequently do.

This means if you’re trying to get to the Reddit front page, you don’t have to only post in super-popular sub-Reddits. 

So we watched the top rated posts on the frontpage of Reddit for 22 days and kept tracking data on multiple things:

  • How their ranks are related (if at all) with their upvotes and comments.
  • How they get there.
  • How long they stay and why?

 

We mean….who DOESN’T want to know how to get on Reddit's first page?

 

Which sub-Reddits perform best?

Reddit preselects the top 50 subreddits and pulls their best performing posts onto its front page. We looked at how frequently the posts from each subreddit get into the top 25 and got interesting stats:

Top Sub-Reddit:

r/funny has the highest fraction of posts - 9.8%!

Second best performing Sub-Reddit (a two-way tie):

r/pics = 7.2% of all posts.

r/gifs = 7.2% of all posts.

Third best performing Sub-Reddit (another two-way tie):

r/todayilearned = 5.6% of all top posts.

r/gaming = 5.6% of all top posts.
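Shares like these come down to counting unique front-page posts per subreddit and dividing by the total. A minimal sketch, with a hypothetical `subreddit_shares` helper and made-up sample data:

```python
from collections import Counter


def subreddit_shares(front_page_posts):
    """Percentage of unique front-page posts coming from each subreddit.

    front_page_posts: iterable of (post_id, subreddit) pairs; a post may
    appear many times (once per snapshot) and is counted only once.
    """
    unique_posts = dict(front_page_posts)      # post_id -> subreddit
    counts = Counter(unique_posts.values())
    total = sum(counts.values())
    # most_common() orders the result from largest to smallest share.
    return {sub: 100 * n / total for sub, n in counts.most_common()}


# Hypothetical snapshots: post "a" shows up twice but counts once.
seen = [("a", "funny"), ("b", "pics"), ("a", "funny"),
        ("c", "funny"), ("d", "gifs")]

shares = subreddit_shares(seen)
for sub, pct in shares.items():
    print(f"r/{sub}: {pct:.1f}%")
```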

HOVER over the graph to see the most viewed Sub-Reddits:

 

For the secret Geeks in the audience….

You'll be happy to know that the sub-reddit /r/science is the fourth most popular subreddit if we look at the top 100, but in the top 25 (front page posts) it falls to 14th place among the 48 default subs sending posts to the front page.

 

Should You Write Negative, Positive, or Neutral Posts?

We thought only negative stuff would work on Reddit (that’s the reputation it gets), but the truth is Reddit is not that bad at all!

The posts are mostly neutral in all Sub-Reddits.

 

We looked at sentiment distributions of the top 2,344 front page posts per Sub-Reddit.

 

The most positive Sub-Reddits were:

  • /r/AskReddit,
  • /r/LifeProTips,
  • /r/GetMotivated.

The most neutral Sub-Reddits were:

  • /r/WritingPrompts
  • /r/WorldNews

The most negative Sub-Reddits were:

  • /r/AskScience (we found this surprising),
  • /r/ShowerThoughts,
  • /r/MildlyInteresting.

 

HOVER over the graph for MORE INFO:

 

How long does it take to get to the front page of Reddit?

It turns out that, on average, posts live on the front page for 4 hours and 15 minutes.

The time to get to the front page varies among posts:

  • Some posts that appear in the top 100 immediately enter the top 25, but it takes others up to 10.5 hours to climb up to the top 25.
  • More than half of the posts get into the top 25 in under 2 hours!
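The time a post needs to reach the front page can be estimated as the gap between its first sighting in the top 100 and its first sighting in the top 25. This is a sketch under that definition, which the post does not spell out; `minutes_to_front_page` and the sample data are hypothetical.

```python
def minutes_to_front_page(snapshots, interval_minutes=15, front_page_size=25):
    """Minutes between a post's first appearance and its first top-25 rank.

    snapshots: iterable of (interval_index, post_id, rank) tuples, any order.
    Posts that never reach the top 25 are left out of the result.
    """
    first_seen = {}
    first_front = {}
    for idx, post_id, rank in sorted(snapshots):  # oldest interval first
        first_seen.setdefault(post_id, idx)
        if rank <= front_page_size:
            first_front.setdefault(post_id, idx)
    return {post_id: (first_front[post_id] - first_seen[post_id]) * interval_minutes
            for post_id in first_front}


# Hypothetical data: "a" climbs 80 -> 40 -> 20, "b" enters straight at
# rank 10, and "c" never gets past rank 90.
snaps = [(0, "a", 80), (1, "a", 40), (2, "a", 20),
         (5, "b", 10),
         (3, "c", 90)]

print(minutes_to_front_page(snaps))
```

With 15-minute intervals the result is only accurate to the nearest interval, so a post that "immediately" enters the top 25 shows up as 0 minutes.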

This means that within at most 10.5 hours, someone can go from totally unknown, to a worldwide internet celebrity!

 

This graph shows how long Front Page posts stay on the front page. For example:

  • 80% of the posts last on the front page for at least 1 hour.
  • 40% of the posts last on the front page for at least 5 hours.
  • 20% of the posts last on the front page for at least 10 hours.
  • 1% of the posts last on the front page for at least 18 hours.
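Percentages like the ones in this list are survival fractions: the share of posts whose front-page lifetime is at least a given number of hours. A minimal sketch, using a hypothetical `share_lasting_at_least` helper and made-up lifetimes rather than the real data:

```python
def share_lasting_at_least(lifetimes_minutes, hours):
    """Fraction of posts whose front-page lifetime is at least `hours`."""
    threshold = hours * 60
    lasting = sum(1 for minutes in lifetimes_minutes if minutes >= threshold)
    return lasting / len(lifetimes_minutes)


# Hypothetical front-page lifetimes of five posts, in minutes.
lifetimes = [15, 60, 120, 300, 600]

for hours in (1, 5, 10):
    share = share_lasting_at_least(lifetimes, hours)
    print(f"at least {hours} h: {share:.0%}")
```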

Almost no posts live on the front page for over 19 hours. This is why Reddit is so addictive: there’s always new content!

 

Some other interesting facts about staying on the front page of Reddit:

  • Top page Reddit posts stay on the front page an average of 4 hours 15 minutes.
  • The average lifetime on the front page of an image is 3.5 hours, while the text posts live for 4 hours and 45 minutes on average.
  • Internal self-posts LIVE SIGNIFICANTLY LONGER than external posts.
  • The average lifetime of a Reddit self-post is 5 hours and 15 minutes.
  • The average lifetime of an external post is only 3 hours and 45 minutes.
  • The average lifetime of text posts with a positive headline is significantly longer than the lifetime of posts with a neutral or negative headline.
  • Textual self-posts with positive headlines stay significantly longer on the front page.

 

Here’s the Mega List of Reddit Data Analysis Findings!

(if you were too lazy to read the whole post)….

  1. Top page Reddit posts stay on the front page an average of 4 hours 15 minutes.
  2. The average lifetime on the front page of an image is 3.5 hours, while the text posts live for 4 hours and 45 minutes on average.
  3. Internal self-posts LIVE SIGNIFICANTLY LONGER than external posts.
  4. The average lifetime of a Reddit self-post is 5 hours and 15 minutes.
  5. The average lifetime of an external post is only 3 hours and 45 minutes.
  6. The average lifetime of text posts with a positive headline is significantly longer than the lifetime of posts with a neutral or negative headline.
  7. Textual self-posts with positive headlines stay significantly longer on the front page.
  8. Starting at 9am PST is the fastest time for getting upvotes.
  9. For text posts, Very Positive or Very Negative posts perform significantly better than Neutral ones.
  10. Images get many more upvotes than text posts.
  11. However….text posts get more comments and stay on the front page longer.
  12. There are 5 Sub-Reddits that completely dominate the front page of Reddit.

 

If you want to hear about when DataStories puts out a new article, sign up with your email address at the bottom of the page!

 

P.S. We’d love if you shared this article with colleagues or friends who would find it interesting.

P.P.S. Check out our other data stories here:

DataStories Case Studies

Posted by
DataStories
A true team work

23 comments

These are some very interesting data points. Particularly about the "best-performing" subreddits.

Looking from a marketer's point of view, it's a bit disappointing though, since many of those five subs are very hard to pitch to, and really only gravitate to the outrageously awesome content that will have gained all of those valuable CNN and Buzzfeed links anyway.

DataStories's picture

Thank you Alex!

Indeed, the top emerging subreddits are hard to pitch to, unless you go super-visual. For me the surprising takeaway is that the communication needs to be transpated into pictures. Reddit has become undeniably visual.

Its like you reaad my mind! You appear to know a lot about this, like yyou wrote tthe book in it or something.

I think that you could do with a few pics to drive the message home a
bit, but instead oof that, this is magnificent blog.
A great read. I will definitely be back.

Keep on writing, great job!

A huge part of marketing is being creative and finding a way to take your niche/business and make it relevant to people you think might find it of interest. The subreddits that offer the biggest opportunity are also challenging - if it wasn't difficult, everyone would be doing it and it wouldn't work. Look at Digg back in the early 2000s - there was so much spam on it.

Saved as a favorite, I really like your site!

I found this extremely resourceful. I will share this on my site soon and link back. The data you brought forth can be of great value to a content marketer. Keep up the great work and thanks.

DataStories's picture

Thank you Jay! We are so happy you liked it. Doing our best.

Interesting article. I highly appreciate the specificities you have mentioned over here.

let me try today and see if we can rank on first page.

Thanks for spending your team's valuable time and getting the facts out of reddit ranking algo

I absolutely ⅼove ʏour blog.. Great colors & theme.
Did you cгeatе this web sitе youгseⅼf? Please reply back as I'm wanting to create my own site
and woulⅾ like to find out where you got this from or exactly what thе tɦeme іs called.
Сheers!

A-MAAA-ZIING data guys.

Ive seen a few posts about how to get on the front page of Reddit lately but this really shows awesome data points and things people haven't touched on.

I love it –– absolutely love it –– when people use their research skills to really deep dive into the "how" and "why" things work they way they do. Thanks for doing such such an in-depth study of Reddit's influence and influencers. You crunched a lot of data, but the basic strategies for (possibly) getting to the top of the site are pretty straightforward. I plan on putting this info to use in the near future.

Thanks for the hard work in getting us data on reddit - one key insightful take away I got was how a marketer can check Reddit to learn to anticipate what might be popular or trending in the next few hours or minutes! And be able to ride the wave or stay on top of mind share...

I can only wonder what would happen if Reddit disallowed scrapping on their website or made it difficult for websites like Buzzfeed to scrap or take articles from them?

What an awesome post guys. Will definitely subscribe to your emailing for more posts like this.

r/nosleep 100% positive? Are you sure?

DataStories's picture

Hey Sundeep,

The thing is that only one post from r/nosleep got into the top 25 over the two weeks that we were logging the data, and it was positive. You can find the number of posts per sub-reddit in a graph with orange bars (right above the "for the secret geeks" section). Hover over the graph and you will see how many posts per subreddit made it to the top. The sentiment per subreddit was only analyzed for the posts from the top 25.

"Even though image post on average get more shares".

What counts as a share on Reddit? Or did you mean votes?

DataStories's picture

Jon, thank you for reading the post carefully! We did mean that image posts get more upvotes (higher scores). Changed!

No problem. Another question. Isn't everyone's Front Page different based on the subreddits they subscribe to?

If so, I take it you did this analysis just using the default subs?

DataStories's picture

Yes, Jon, we used the default subreddits to make the analysis as objective as possible.

Interesting. I'd like to see the specific linguistic processes involved with positivity vs. negativity ratings.

On a slightly related note, I did a study comparing reddit's Front Page during the 2008 presidential election vs. the Front Page during the 2012 presidential election. Much less Obama love, a wider spread of subreddits, and more visual posts. Check it: https://goo.gl/pquNlH

DataStories's picture

What a beautiful post you created, Ian. Thank you for sharing! 

We do sentiment analysis with a text processing API (https://goo.gl/pUCq0x) that uses NLTK text classification. It is very simple, but does the job well for the English language.

Thanks for sharing the story; I learnt a great deal from it. I have done an analysis of hot posts on Weibo, the Chinese social media platform often compared to Twitter but kind of similar to Reddit in terms of user anonymity and the rawness and perceived "negativity" of its contents. My sample size had been smaller and the metrics examined not as comprehensive as what your study has described. If you are interested, here's the post: http://www.rpubs.com/Azzurra/WeiboAds
