The long tail tagging the dog

In a previous post, I mentioned some interesting graphs that could be made from public URL tagging data such as that at del.icio.us. I keep wanting to see these graphs, so I figured I’d post some details and issue a request / challenge / wheedle to the real hackers out there to slap something together. Of course, I’m not sure the data is accessible enough to pull this off, so it may not be possible for a third party to do it…

What I’m picturing is a lot like Durl, except instead of history trends, URLs / tags / users would be associated with distribution graphs. There are many possible such graphs, so you have to pick the ones that answer the most interesting questions. The three that seem most interesting to me are:

Taggraphs (per item)

For example, entering a URL produces the first graph: each bar along the X axis represents a tag, and the Y axis gives the number of users who tagged the URL with that tag. The vertical line marks the median: half of the taggings used tags to its left and half used tags to its right. The long tail can thus be taken to be the portion to the right of this line, whose “area” equals that of the portion to the left. The horizontal line shows the average Y axis value (of arguable interest in most cases).
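As a rough sketch of the numbers behind that per-URL graph, here is one way the bars, the median line, and the average line could be computed in Python. The input shape (a list of `(user, tag)` pairs for a single URL) and the function names are assumptions made for illustration, not anything del.icio.us actually exposes, and the bars are assumed to be sorted by decreasing count.

```python
from collections import Counter


def tag_bars(taggings):
    """Return (tag, user_count) bars for one URL, tallest first.

    `taggings` is a hypothetical iterable of (user, tag) pairs.
    """
    # Deduplicate so each user counts at most once per tag.
    counts = Counter(tag for _, tag in set(taggings))
    # Sort by decreasing count; break ties alphabetically for stable output.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


def median_bar_index(bars):
    """Index of the bar at which the running total reaches half of all taggings."""
    total = sum(count for _, count in bars)
    running = 0
    for i, (_, count) in enumerate(bars):
        running += count
        if 2 * running >= total:
            return i
    return None  # no bars at all


def mean_height(bars):
    """Average bar height (the horizontal line)."""
    return sum(count for _, count in bars) / len(bars) if bars else 0.0


if __name__ == "__main__":
    sample = [
        ("a", "web"), ("b", "web"), ("c", "web"),
        ("a", "tools"), ("b", "tools"),
        ("c", "graphs"), ("d", "stats"),
    ]
    bars = tag_bars(sample)
    print(bars)                    # [('web', 3), ('tools', 2), ('graphs', 1), ('stats', 1)]
    print(median_bar_index(bars))  # 1
    print(mean_height(bars))       # 1.75
```

With real data the median will usually fall inside a bar rather than exactly between two, so where to draw the vertical line is a judgment call; this sketch simply reports the bar that contains it.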

One could also generate graphs using the entire set of URLs, tags, or users. Again, many such graphs are possible; here are some that answer what seem to me to be interesting questions:

Taggraphs (all)

After looking at all this, I realize that my previous post was a bit muddled on why Clay’s graph bothered me. The problem isn’t that the users were ordered by decreasing tag usage (also done above), it’s that the Y axis represented the number of tags ever used instead of the number of taggings performed. This makes the “area” under Clay’s curve in the long tail difficult to define, and so it’s hard (I think) to find a question that it answers. Or who knows, maybe if I look at it tomorrow, it’ll make perfect sense.
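To make the distinction concrete, here is a small Python sketch that computes both per-user measures from the same data. It assumes “tags ever used” means the number of distinct tags a user has applied, and it assumes a hypothetical list of `(user, url, tag)` triples; neither assumption comes from Clay’s post.

```python
from collections import defaultdict


def per_user_measures(taggings):
    """Map each user to (distinct tags ever used, taggings performed).

    `taggings` is a hypothetical iterable of (user, url, tag) triples.
    """
    distinct_tags = defaultdict(set)
    tagging_count = defaultdict(int)
    # Deduplicate so a repeated triple counts as a single tagging.
    for user, _url, tag in set(taggings):
        distinct_tags[user].add(tag)
        tagging_count[user] += 1
    return {u: (len(distinct_tags[u]), tagging_count[u]) for u in tagging_count}


if __name__ == "__main__":
    sample = [
        ("a", "u1", "web"), ("a", "u2", "web"), ("a", "u3", "web"),
        ("b", "u1", "web"), ("b", "u1", "tools"),
    ]
    measures = per_user_measures(sample)
    print(measures["a"])  # (1, 3)
    print(measures["b"])  # (2, 2)
    # Taggings sum to the total number of taggings in the data set...
    print(sum(t for _, t in measures.values()))  # 5
    # ...but distinct-tag counts sum to 3 although only 2 tags exist.
    print(sum(d for d, _ in measures.values()))  # 3
```

Summing taggings over users gives the total number of taggings, so the area under that curve means something. Summing distinct-tag counts double-counts any tag shared between users, which is one way to see why the area under a “tags ever used” curve is hard to interpret.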

So, anyone out there up for making these graphs real…?

3 Responses to “The long tail tagging the dog”

  1. Phil Says:

    The graph ideas look interesting; I hope someone with more time and better access to the data picks them up. I’m still uncomfortable with using ranking as an X-axis variable, though; in effect, both axes are carrying the same piece of information (“that bar’s the tallest *and* it’s the first!”). This may be a trivially obvious point, but I do think it’s been overlooked by a lot of the people who have spread the ‘power law’/‘long tail’ image; people have even contrasted ‘power law’ distributions with the ‘bell curve’ of a normal distribution, which (of course) you can’t possibly get if you’re using ranking along the X axis.

  2. Adam Says:

    Yeah, something still bothers me about that too. But the classic “long tail” is that of Amazon: products are ranked by sales, and significant overall sales volume comes from the many products that don’t sell much each. So I think that this is the right graph, but I agree people sometimes forget that the X axis doesn’t really represent an independent value, and start talking about bell curves, etc.

    In fact, I think the “power law” terminology itself lends itself to such confusions. Firstly, it’s really an inverse power law. But also, a 1/x graph has a “tail” that is equal to its “head,” but if you want a graph with a “heavy tail” you’d be looking at something like 1/x^.5; even “inverse power law” usually means something like 1/x^2. Plus the idea of comparing “areas” under these curves is problematic since integrals of all such functions are divergent…

  3. EconoMeta » Blog Archive » Turning rankings into graphs Says:

    [...] ly cleared up what was bothering me about those long tail graphs, prompted by Phil’s comment and helped a lot by this article. The issue is that long tail graphs have an x axis com [...]
