In a previous post, I mentioned some interesting graphs that could be made from public URL tagging data such as that at del.icio.us. I keep wanting to see these graphs, so I figured I’d post some details and issue a request / challenge / wheedle to the real hackers out there to slap something together. Of course, I’m not sure the data is accessible enough to pull this off, so it may not be possible for a third party to do it…
What I’m picturing is a lot like Durl, except instead of history trends, URLs / tags / users would be associated with distribution graphs. There are many possible such graphs, so you have to pick the ones that answer the most interesting questions. The three that seem most interesting to me are:
For example, after you enter a URL to get the first graph, each bar along the X axis represents a tag, and the Y axis gives the number of users who tagged the URL with that tag. The bars are ordered by decreasing count. The vertical line marks the median: half of the taggings used tags to the left of the line, and half used tags to the right. The long tail can thus be considered the portion to the right of this line, whose “area” equals the area to the left. The horizontal line marks the average Y axis value (of arguable interest in most cases).
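To make the per-URL graph concrete, here is a rough Python sketch of the numbers behind it. It assumes the tagging data for one URL is already in hand as (user, tag) pairs; I don't know whether del.icio.us actually exposes it that way, so the input format, the function names, and the sample data are all made up for illustration.

```python
def tag_distribution(taggings):
    """Return (tag, user_count) pairs for one URL, largest count first.

    `taggings` is an iterable of (user, tag) pairs; this input format is
    an assumption, not something a real API is known to provide.
    """
    users_per_tag = {}
    for user, tag in taggings:
        # A set ensures a user is counted once per tag.
        users_per_tag.setdefault(tag, set()).add(user)
    counts = [(tag, len(users)) for tag, users in users_per_tag.items()]
    # Sort by decreasing count, breaking ties alphabetically.
    counts.sort(key=lambda item: (-item[1], item[0]))
    return counts


def median_position(counts):
    """Number of bars, from the left, that cover at least half the taggings.

    The vertical line would be drawn just after this bar.
    """
    total = sum(n for _, n in counts)
    running = 0
    for position, (_, n) in enumerate(counts, start=1):
        running += n
        if 2 * running >= total:
            return position
    return 0  # no taggings at all


def mean_height(counts):
    """Average bar height, i.e. where the horizontal line would go."""
    return sum(n for _, n in counts) / len(counts) if counts else 0.0


# Made-up sample: four users tagging a single URL.
sample = [
    ("ann", "python"), ("bob", "python"), ("cat", "python"), ("dan", "python"),
    ("ann", "programming"), ("bob", "programming"),
    ("cat", "code"), ("dan", "tutorial"),
]
bars = tag_distribution(sample)
```

With this sample, the first bar alone accounts for 4 of the 8 taggings, so the median line falls right after it and the remaining three bars form the long tail.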
One could also generate graphs using the entire set of URLs, tags, or users. Again, many such graphs are possible; here are some that answer what seem to me to be interesting questions:
After looking at all this, I realize that my previous post was a bit muddled on why Clay’s graph bothered me. The problem isn’t that the users were ordered by decreasing tag usage (also done above), it’s that the Y axis represented the number of tags ever used instead of the number of taggings performed. This makes the “area” under Clay’s curve in the long tail difficult to define, and so it’s hard (I think) to find a question that it answers. Or who knows, maybe if I look at it tomorrow, it’ll make perfect sense.
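The difference between the two Y axes can be sketched in a few lines of Python. I'm reading "tags ever used" as the number of distinct tags per user, which is my interpretation; the (user, url, tag) input format and the sample data are likewise invented for illustration.

```python
def per_user_counts(taggings):
    """Map each user to (taggings_performed, distinct_tags_used).

    `taggings` is an iterable of (user, url, tag) triples; this format
    is hypothetical.
    """
    performed = {}
    distinct = {}
    for user, _url, tag in taggings:
        performed[user] = performed.get(user, 0) + 1
        distinct.setdefault(user, set()).add(tag)
    return {user: (performed[user], len(distinct[user])) for user in performed}


# Made-up sample: ann reuses one tag on three URLs, bob uses two tags on one.
sample_taggings = [
    ("ann", "http://a.example", "python"),
    ("ann", "http://b.example", "python"),
    ("ann", "http://c.example", "python"),
    ("bob", "http://a.example", "python"),
    ("bob", "http://a.example", "code"),
]
user_counts = per_user_counts(sample_taggings)

# Summing the first column recovers the total number of taggings,
# so the "area" under that curve means something.
total_taggings = sum(performed for performed, _ in user_counts.values())

# Summing the second column counts shared tags once per user,
# so it matches neither the taggings nor the tag vocabulary.
summed_distinct = sum(tags for _, tags in user_counts.values())
```

Here the taggings column sums to 5, the size of the data set, while the distinct-tag column sums to 3 even though only 2 different tags exist. That mismatch is what makes the area under the second kind of curve hard to interpret.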
So, anyone out there up for making these graphs real…?