Public data dump download numbers seem inconsistent with the actual stats page

chester.burbidge

07 Jan, 2018 11:38 AM

I'm trying to get the 1000 most-downloaded gems for a personal project.

I've downloaded the data from https://rubygems.org/pages/data and restored it into a Postgres database. I join versions to rubygems and gem_downloads to get the download counts with this command:

    COPY (
      select r.name, v.full_name, v.authors, d.count
      from versions v
      join rubygems r on v.rubygem_id = r.id
      join gem_downloads d on v.id = d.id
    ) TO '/tmp/gem_stats.csv' DELIMITER ',' CSV HEADER;

When I sort the results by download count, the ranking is wildly different from the one shown at https://rubygems.org/stats?page=1

Anyone know why this might be?
