
Top users and power laws

In a conversation related to my previous postings on power laws, a question came up: If a ranked distribution follows a power law, what percentage of the total is in the highest ranked bin? So for the example of a histogram of users ranked by the % of taggings, what percentage M of all taggings are made by the very top user?

[Figure: top user in a power law]

It turns out that this depends on whether the power law is an exact inverse power law (Zipf: a = 1) or a higher order power law (a > 1).

The top user u = 1 has a share M of all taggings (expressed as a fraction of 1), so the curve is t = Mu^-a, where t is the share of taggings made by the user of rank u. Each bar measures the share of taggings by that user, so the sum of all bars has to equal 1. So for N users we have

M + M/(2^a) + M/(3^a) + … + M/(N^a) = 1

or

M = 1/(1 + 1/(2^a) + 1/(3^a) + … + 1/(N^a)).

For a Zipf law with a = 1, the denominator is a partial sum of the harmonic series, which diverges, so the % of taggings by the top user keeps dropping as the number of users N gets larger. We can calculate M by remembering that the sum of the first N terms of the harmonic series approaches gamma + ln(N) as N gets large, where gamma is the Euler-Mascheroni constant (about 0.5772) and ln is the natural log. We can check that this is close enough by N = 100, so calculating N = 10 by hand and using this formula for the rest we have:
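As a rough check on the Zipf case, here is a minimal Python sketch that computes M both by summing the series directly and by using the gamma + ln(N) approximation. The function names and the particular values of N are my own choices for illustration, not anything from the original spreadsheet.

```python
import math

# Euler-Mascheroni constant, hard-coded because the math module does not provide it
EULER_GAMMA = 0.5772156649015329


def top_share_exact(n_users: int, a: float = 1.0) -> float:
    """M = 1 / (1 + 1/2^a + ... + 1/N^a), by direct summation."""
    return 1.0 / sum(u ** -a for u in range(1, n_users + 1))


def top_share_zipf_approx(n_users: int) -> float:
    """Approximate M for a = 1, using harmonic sum ~ gamma + ln(N)."""
    return 1.0 / (EULER_GAMMA + math.log(n_users))


# Illustrative system sizes (assumed, not taken from the post)
for n in (10, 100, 1_000, 10_000, 100_000):
    exact = top_share_exact(n)
    approx = top_share_zipf_approx(n)
    print(f"N = {n:>7}  exact M = {exact:.4f}  approx M = {approx:.4f}")
```

The two columns should agree more and more closely as N grows, and both should keep shrinking, which is the point: under a Zipf law the top user's share never settles down.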

Gotta love NumSum. But if a > 1, the series in the denominator converges, so that as the number of users N increases, the % of taggings by the top user M quickly settles to a constant:
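To see the convergence for a > 1, here is a small Python sketch for the case a = 2, which I picked only because its limit has a well-known closed form: the infinite series sums to pi^2/6, so M should approach 6/pi^2. The function name and the values of N are illustrative assumptions.

```python
import math


def top_share(n_users: int, a: float) -> float:
    """M = 1 / (1 + 1/2^a + ... + 1/N^a), by direct summation."""
    return 1.0 / sum(u ** -a for u in range(1, n_users + 1))


# For a = 2 the infinite series sums to pi^2 / 6, so M tends to 6 / pi^2
limit_a2 = 6 / math.pi ** 2

# Illustrative system sizes (assumed, not taken from the post)
for n in (10, 100, 1_000, 10_000):
    m = top_share(n, a=2.0)
    print(f"N = {n:>6}  M = {m:.4f}  distance from limit = {m - limit_a2:.6f}")
```

The distance from the limit should shrink rapidly, in contrast with the a = 1 case, where M keeps falling as N grows.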

This is all in follow-up to the fourth point from this post:

(4) While it is true that “bigger systems benefit from both higher heads *and* longer tails,” in general this usually just makes the histogram fit the curve better; it is rather the shape of the curve that determines whether or not “most activity is from a small group of highly active users.”

A Zipf law is a case where a bigger system actually has a distinct effect: the bigger the system, the lower the percentage resident in the highest ranked bin, resulting in a lower percentage of activity from the most active users. In the case of higher power laws, this percentage quickly settles to a steady constant, so size doesn’t have much of an effect once the system is reasonably big.

As an aside, I was also asked to post the graph presented at TagCamp showing a histogram that fits a “long tail” but not a power law, so here it is:

[Figure: false power law]

Although this looks similar to a power law, if we disregard the top two users the histogram actually fits the curve that corresponds to a perfect bell curve PDF. This means that, in contrast to a power law, where the average number of taggings per user is essentially meaningless, here this average is maximally meaningful.
