Tagging and long tails

Clay Shirky posted a great essay on social tagging vs. expert categorization. Tagging is a particularly interesting example of “stuff about stuff” being valuable, because it includes two extra ingredients: social network effects and the ability to address the “long tail” of both content and meaning.

In a system like del.icio.us where each person can tag a URL, value is created along two dimensions: the user who tagged it has presumably fit it into his/her own personal category scheme; and the aggregated tags of many users assign new semantic properties to the URL. Feedback can then be created between these dimensions via an auto-complete feature (e.g. John Resig’s extension) that, at the moment when a user assigns a tag, displays tags assigned to that URL by other users and/or algorithmically “related” tags based on what the user types.

Secondly, the fact that any user can assign a tag to any URL means that many more URLs can be tagged (at no cost) as compared to an expert categorization scheme, and that many more (weighted) meanings can be assigned to each URL. The tagging of obscure URLs addresses the “long tail” of URLs, while the aggregate “tag profile” of a given URL addresses the “long tail” of perceived meanings.

Clay’s essay covers these points in clear prose accompanied by some really helpful charts and graphs. One very minor issue I have, though, is with the “Tag Distributions” chart. Clay refers to “the characteristic long tail of people who use many fewer tags than the power taggers.” While this chart does exhibit a “long tail,” that is simply because the users were ordered by decreasing tag usage (also true of the following three charts); the X axis here doesn’t represent a value, it is just a sequence of users.
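To make the ordering point concrete, here is a minimal Python sketch. The data is entirely made up (65 hypothetical users, with tag counts drawn once from a flat distribution and once from a heavy-tailed one), and the helper names are mine, not anything from Clay's essay. It only shows that sorting users by decreasing tag count yields a downward-sloping curve whatever the underlying distribution looks like.

```python
import random

def rank_order(counts):
    """Sort per-user tag counts from largest to smallest, as in a rank-ordered chart."""
    return sorted(counts, reverse=True)

def is_non_increasing(values):
    """True if each value is no larger than the one before it."""
    return all(a >= b for a, b in zip(values, values[1:]))

random.seed(0)  # fixed seed so repeated runs use the same made-up data

# Hypothetical users whose tag counts are spread evenly between 1 and 100.
uniform_counts = [random.randint(1, 100) for _ in range(65)]

# Hypothetical users whose tag counts follow a heavy-tailed (Pareto) distribution.
skewed_counts = [int(random.paretovariate(1.2)) for _ in range(65)]

for name, counts in [("uniform", uniform_counts), ("skewed", skewed_counts)]:
    ranked = rank_order(counts)
    # Both rankings slope downward, because that is what sorting does.
    print(name, is_non_increasing(ranked), ranked[:5], ranked[-5:])
```

The downward slope is the same in both cases; only the steepness of the curve says anything about the data, which is why the X axis of a rank-ordered chart has to be read as a sequence rather than a value.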

The phrase “long tail” usually refers to the observation that for many distributions, the number of elements with outlying values (the “tail”) may be cumulatively significant compared to the number of elements clustered near the average. Clay might not even have been using the phrase in this way, but once a buzzword gets going, it’s best to use it as conservatively as possible (otherwise people start getting pissed off!).

Some “long tail” charts that would be interesting to see would be URLs by number of times a tag was assigned (showing whether the long tail of obscure URLs cumulatively comprises more tag assignments than the common URLs), or tags for a specific URL by number of times a tag was assigned (showing whether the long tail of obscure tags cumulatively comprises more tag assignments than the common tags). This last chart could also be averaged across many URLs to see if this long tail applies in general to arbitrary links.
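The comparison behind both charts can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the function name tail_share, the head_size cutoff that separates “common” from “obscure,” and the toy list of URLs, which is invented and not real del.icio.us data.

```python
from collections import Counter

def tail_share(assignments, head_size):
    """Fraction of all tag assignments that fall outside the head_size most-tagged items.

    assignments holds one entry per tag assignment, naming the item that
    received it (a URL for the first chart, a tag for the second).
    """
    counts = Counter(assignments)
    ranked = [n for _, n in counts.most_common()]  # per-item totals, largest first
    total = sum(ranked)
    if total == 0:
        return 0.0
    return sum(ranked[head_size:]) / total

# Invented example: two frequently tagged URLs and eleven URLs tagged once each.
url_assignments = (
    ["popular.example"] * 6
    + ["known.example"] * 3
    + ["obscure-%d.example" % i for i in range(11)]
)

share = tail_share(url_assignments, head_size=2)
print(share)  # a value above 0.5 means the tail outweighs the head
```

For the second chart, the same function would be given the list of tags assigned to one URL, and averaging the result over many URLs would give the general version. Where to draw the line between head and tail is a judgment call, so the result should be checked for several values of head_size.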

3 Responses to “Tagging and long tails”

  1. Phil Says:

    Thanks for flagging this up. Clay’s “long tail” graphic looked so ‘right’ that it didn’t occur to me to question whether it said what it appeared to say.

    It doesn’t, frankly. If you re-crunch the numbers Clay was working with, you do end up with a huge left skew and a long-ish (although bumpy) right tail – working from ‘0-25’ up to ‘550-575’, the figures are
    15,15,7,7,3,4,2,3,0,1,0,0,1,1,0,2,3,0,0,0,0,1,0

    But the skew is towards the *low* end of the distribution – what Clay calls the ‘long tail’ – and (perhaps more importantly) that’s a tiny, tiny sample.

  2. Adam Says:

    Thanks for the comment. It looks like you created a new chart of number of tags by number of users who use that many tags. This wasn’t one of the potential charts I mentioned; this distribution would show whether the long tail of users assigning a high number of tags cumulatively comprises more users than those assigning a smaller number closer to the average. From this sample, at least, it looks like this isn’t the case; most users assign a number of tags close to a median of somewhere around 50.

    I’d still love to see URLs by number of times a tag was assigned, or tags for a specific URL by number of times a tag was assigned.

  3. EconoMeta » Blog Archive » The long tail tagging the dog Says:

    [...] onalization and personal data The long tail tagging the dog In a previous post, I mentioned some interesting graphs that could be made from public URL tagging data such as th [...]
