Degrees of anonymity

September 22nd, 2006

We have one primary goal with PrefPass when it comes to changing the user experience on the web: convenience. This aspect is a simple proposition: instead of a registration form, a link in an email, and then yet another password to remember, you can join a site with one click.

But another aspect of PrefPass is that, unlike previous solutions such as form-filling utilities or single sign-on systems, PrefPass keeps users anonymous. This aspect is a bit more complicated.

Anonymous literally means “of unknown name,” but in reality we mean a lot more than that when we use the term. Does it mean that you don’t know who I am in the real world? That you don’t know that I’m the same person who writes a different blog? That you don’t know that I’m the same person who wrote the last post on this blog?

As far as I can tell, there isn’t really a standard way of characterizing these different degrees of anonymity. But there are some standard terms whose meanings can be perhaps slightly bent to cobble together a basic ranking:

- Unknown: absolutely nothing is known about the user

- Anonymous: the user is associated with an identifier that applies across transactions at a single site. Cookies are a common way of anonymously identifying users on a temporary basis.

- Pseudonymous: the user is associated with a pseudonym (AKA handle, username, or nickname) that applies at other sites. Most single sign-on (SSO) systems are designed to prove ownership of a pseudonym.

- Personally identifiable: the user is associated with information which can potentially be used to uniquely identify, contact, or locate them. Personally identifiable information (PII) includes things like name, telephone number, street address, and e-mail address.

One problem with pseudonymity is that it is susceptible to correlation, which can lead to personal identification. For example, if data associated with one pseudonym is collected across many sites, this richer dataset may make it possible to personally identify the user. Or if one site is compromised, this can affect the user across all sites where that pseudonym is used.

For these reasons and others, at PrefPass we decided to provide users with anonymity, not just pseudonymity. To do this, we used what in digital identity circles is called a unidirectional identifier. That just means that when you click on the PrefPass badge at two different sites, each site is provided with a different identifier. That way, each site can recognize you, but no one can tell that the same person joined those two sites.
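To make the idea concrete, here is a minimal sketch of one common way to build a unidirectional identifier: a keyed hash of the site's domain under a per-user secret. This is an assumption for illustration only; the post doesn't say how PrefPass actually derives its identifiers, and the names `new_user_secret` and `site_identifier` and the example domains are hypothetical.

```python
import hashlib
import hmac
import secrets


def new_user_secret() -> bytes:
    """Create a random per-user secret (hypothetical: kept private by the identity service)."""
    return secrets.token_bytes(32)


def site_identifier(user_secret: bytes, site_domain: str) -> str:
    """Derive an identifier that is stable for one (user, site) pair.

    Without the secret, identifiers issued to two different sites
    cannot be linked back to the same user.
    """
    normalized = site_domain.strip().lower().encode("utf-8")
    return hmac.new(user_secret, normalized, hashlib.sha256).hexdigest()


secret = new_user_secret()
id_news = site_identifier(secret, "news.example")
id_blog = site_identifier(secret, "blog.example")

# The same site always sees the same identifier, so it can recognize a returning user.
print(id_news == site_identifier(secret, "news.example"))
# Two different sites see unrelated identifiers for the same user.
print(id_news != id_blog)
```

The design choice here is that the site never sees anything shared across sites: only the holder of the per-user secret could connect the two identifiers.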

Of course, PrefPass is all about user control, and with control comes responsibility. For example, you could manually enter a pseudonym at multiple sites where you use PrefPass to join. That would make you pseudonymous at those sites; but it would be your decision, not something built into PrefPass. And no other PrefPass sites would be affected by it.

Now, some might argue that true anonymity is impossible on the internet; that without extraordinary knowledge and care, anyone leaves a trail that can be pieced together to find out who they are in the real world. For example, see this recent blogosphere brouhaha.

But at least for me, that doesn’t at all mean that we should throw up our hands and sign our names on every page we visit. You can do a lot to remain anonymous on the internet, and as always, defaults matter. It’s true, if someone really wants to track you down, and is willing to dedicate time and money to doing so, there’s a decent chance they’ll succeed. But the same is true in the real world: if someone really wants to break into your house, they probably can. But that doesn’t mean that you shouldn’t lock your doors and safeguard the key. Most crimes are crimes of opportunity, and basic good habits will make a big difference in how likely you are to have problems.

I haven’t even touched here on another feature of PrefPass that complements anonymity: transparency. Details will have to wait for another day, but the basic fact is this: when you use PrefPass to personalize a site, the data being used for personalization is visible, editable, and controlled by you. The idea is that by making it easy for sites to ask you for your Prefs directly, they’ll have less reason to try to piece together your interests in some other way. In our view, the trade of Prefs for personalization is happening all the time on the web; so why not make it explicit, easy, anonymous, and transparent?

PS: “Degree of anonymity” is also actually a technical measure of how anonymous you are using a given anonymizing approach. Details here.

Introducing PrefPass

August 4th, 2006

PrefPass is now in private beta! It’s been an intense time getting everything ready to go, but it’s now out there (and in my sidebar).

In the requisite three words, what is PrefPass all about?

Personalization without registration.

As I was talking about in the last post, the idea is to keep things simple and anonymous for users, while giving sites exactly what they need:

- A user identifier (unidirectional: unique to the site and user)
- User prefs for personalization: keywords, interests, etc.

So what’s the point? To reduce the number of registrations and passwords you have to keep track of. How many times have you followed a link to a news site, or tried to check out a cool new app, and decided to bail out when you were faced with yet another registration form? PrefPass replaces that form with one click.

For sites, letting people bypass registration means attracting more users, providing better personalization, and earning higher ad revenues. PrefPass is really lightweight, so there’s no server-side integration and no security worries, just a simple JavaScript button you add to the site. For an example of some things you can do with PrefPass, check out the demo site we set up at YourSuperNews.com.

For blogs, PrefPass is so easy to use that you can essentially add instant personalization to your blog. Check out the widgets in my sidebar for an example (more coming soon). If your blog or app has ads, you can earn more from them by targeting them to user interests, as they are here at EconoMeta. Even if you don’t have any ads, PrefPass complements stats like hits and pageviews by showing you what your readers actually are interested in — an example of this is the Audience Cloud in the sidebar here.

The point is that pretty much any site can be made better by being customized to user interests. And the purpose of PrefPass is to make it as easy as possible for the site and user to form the relationship needed to do this.

We’re off and running, with a long list of cool stuff on the way. Give it a try, and let us know what you think!

Microchunking identity

July 8th, 2006

So as mentioned in my last post, I recently stopped by BarCamp SF (which was great!) and talked about “microchunking identity.” I figured it would be a good way to explain part of the motivation behind the startup I’m working on, PrefPass. This was also the first public demo of PrefPass, so it was pretty exciting for me.

Here’s what I talked about. The concept of “digital identity” has been around for a long time, and usually includes all kinds of complicated functionalities. For example, an identity can:

- Prove you are the same person you were last time you visited
- Prove that you are a specific person or have certain attributes
- Prove that you have authorization or a reputation verified by a third party
- Grant permission for one site to pass your data on to another

But what does “identity” mean in reality for most consumer web apps? Well, basically, it usually means a registration form, an email validation, and then another username and password to remember.

This is kind of a pain for users, which makes it worth asking: what’s the *real* reason sites require registration? For most apps, it’s to do one or more of the following:

- Ensure the user is a human and not a bot
- Associate the user with site data (e.g. settings)
- Associate the user with preference data (e.g. interests)
- Contact the user (e.g. to email a forgotten password)
- Target ads to the user (to make more money)
- Associate the user with a specific person (e.g. a blogger)

Looking at this list, the interesting thing is that only the last item really requires an “identity” as most of us think of it. You’d think that the rest could be done without the oftentimes complex machinery of most identity solutions.

Well, it can, and from a certain perspective, that’s what PrefPass is all about! Instead of the same old [form -> email -> response -> password] sequence, why not just [click]? And why not make it completely anonymous? After all, only the last item above requires you to “prove” that you’re someone in particular. With anonymity, there’s no ID to remember, no privacy issues, no namespace to worry about — just [click]!

Microchunking identity means reducing it to its smallest usable parts. For example, there are some cool solutions that focus on proving that you own a blog URL — that’s a microchunk. With PrefPass, we’re focusing on letting you tell a site that you’re the same person as last time, and that you’re associated with some anonymous metadata representing your interests or preferences. That’s it. By keeping it simple, we hope to be able to solve some real problems for both sites and users, while making a big change in how much users can control their own data.

We’re just about to launch a limited beta, so if this sounds interesting to you, please help us out! You can request a beta invite by clicking on the PrefPass button in my sidebar or by going to PrefPass.com. We’re also looking for additional sites who want to try out PrefPass during the beta. If 1-click registration, instant personalization, or user-targeted ads that pay more sound interesting to you, please give me a shout at adam at prefpass dot com.

Microchunking applications

June 27th, 2006

Many people have been talking about the idea of “microchunking.” This means taking an object, usually a media file, and reducing it to its smallest usable part. The idea is that instead of fighting against innovation, digital media can embrace new technology and still be profitable if it is microchunked, syndicated, and monetized wherever it is consumed.

This is a powerful idea; but why limit it to media? It seems to me that the same logic applies to applications. A big part of what I think is exciting about the latest batch of web apps is that they microchunk what was once a monolithic software application (e.g. Office), make it web-native, and monetize its use via advertising and/or premium service fees.

Going back to digital media, a big part of why it can be effectively microchunked now is that certain enabling technologies are widespread enough to reduce the advantage once held by centralized media: things like editing tools, RSS syndication, and aggregators. The same thing is true for apps; enabling technologies here would include widespread broadband, more active browser techniques like AJAX, and standardized data formats.

I think that there are two main remaining barriers to microchunking apps. The first is the lack of many needed standardized data formats. One big help in this regard could be microformats. The second big remaining stumbling block is that of identity. Big applications, whether on the web or not, have the significant advantage that you just log in once, and then can easily use the different components of the app together.

So how could this identity barrier be knocked down? Well, how about microchunking identity?

As it so happens (or more like, partly motivating this post), microchunking identity is what I just talked about at BarCamp SF. I’ll get to that in the next post.

Could nationalization correct for long-term oil costs?

May 11th, 2006

A recent Business Week cover story points out that despite recent massive profits at big oil companies, their future ability to meet world demand is uncertain at best. A big reason why is that more and more reserves are under nationalized control:

In the 1960s, 85% of known reserves worldwide were fully open to the international oil companies. That number is now 16%. The rest of the world’s oil and gas is either restricted or entirely cordoned off. National champions such as Saudi Aramco, Kuwait Petroleum, and Mexico’s Pemex outweigh publicly traded oil companies in the production contest.

When it comes to commodities, public ownership is a major meta. The article makes the well-documented point that national oil companies are not profit-driven, and so tend to lag in technology, production capacity, and information transparency. This has historically led to a more expensive, volatile market for oil. Now, while it seems clear that price volatility is an enormous negative, both for the world economy and for investments in new energy sources, it isn’t so clear that slower production and higher prices are such a negative.

Let’s assume that a perfectly competitive market for oil existed. Then prices would be close to the level needed to cover the cost of extracting enough oil to meet world demand at that moment in time. Adding in speculation and derivatives might help prices take into account likely world demand in the near future, but uncertainty wipes out the possibility of taking into account any longer-term costs.

So in this scenario, the likely outcome seems to be the one widely feared: that prices will remain relatively low until demand outstrips supply given current extraction techniques, at which point prices will spike. While new technologies will be deployed as prices rise enough to justify them, there’s certainly a good chance that the time window affecting prices will be shorter than the time needed to develop and deploy these new technologies. In this case, volatility would be extreme, and the high societal cost of this volatility would have been an externality, missed by the market.

Could it be that the poor performance of nationalized oil companies, by reducing effective supply prematurely, might help correct for this missed cost? By leading to higher prices earlier, it could be argued that they could lengthen the window available to develop new energy sources. The key seems to be that to encourage this development, the reduced supply and higher prices must be consistent and reliable.

Unfortunately, history seems to indicate that in general, national oil companies drift lower and lower in production until additional funds are desperately needed by the government. At this point international oil companies, always waiting in the wings, are called in to help fix things; then production surges, prices plummet, and any efforts to develop alternatives are wiped out. If anything, steady constriction of supply seems to work best under authoritarian regimes, such as OPEC since the 70s, where powerful governments are able to dictate production to oil companies, regardless of whether they are privately or publicly owned.

So it seems to me that the answer is no, nationalization doesn’t really help the situation overall. If it could somehow be ensured that national policies would take advantage of not needing to maximize profits, instead reliably and consistently restricting overall supply, then it could be possible that the impact of “peak oil” could be blunted. But at least in democracies, political motivations can be at least as short-sighted as economic ones; and if they are allowed to share in the profits, private oil companies will always be eager to help out. All this makes it hard to imagine a scenario where nationalization could effectively help matters.

PS: A great resource for thinking about this stuff is Daniel Yergin’s “The Prize”, as I was reminded by Ethan Stock’s latest at OnoTech.

Update: Hugo Chavez of Venezuela has made a novel proposal: to offer long-term deals for oil at $50 a barrel. His purpose is right in line with the above: to prevent future price plunges from destroying the worth of alternative sources invested in now. Chavez is interested in his own “tar sands,” which if economically recoverable would give Venezuela the highest reserves in the world. $50 is $15 less than current prices, but $10 more than the price needed to make this alternative source of oil viable.

This same approach could also make Canada’s tar sands viable, potentially blunting the power of Middle Eastern oil and removing the threat of any near-term shock from “peak oil”. Of course, the problem with this entire approach is that Middle Eastern oil is extractable at around $2 a barrel, much less than any tar sands source. Which brings the issue back to political will: if lower prices are offered by Middle Eastern sources, the enormous amount of money and economic leverage involved makes reneging on any such long-term deals (or smuggling) close to impossible to resist.

EconoMetaAds

May 9th, 2006

After all my theorizing, I’d been meaning to try out some ad networks here on this blog as a way to get some first-hand experience. Inspired by Battelle finally buying some ad space, I figured I’d finally sell some. So there they are, in the sidebar. It’s already been interesting, and I’m looking forward to learning more as I try different approaches…and of course, earning millions from my vast readership. :-)

Measures as meta and their economic impact

April 25th, 2006

Recently I keep seeing variations of the same theme pop up around the web: that in a complicated world, we have to try to simplify things by using easily stated and compared measures; but that these same measures tend to distort things, since they sometimes become more important than the reality they purport to represent in the first place.

A great example is GDP, recently dubbed “Grossly Distorted Picture” by the Economist (subscription required, unfortunately). GDP is a very limited and inaccurate measure of economic activity, let alone national well-being. In fact, GDP was initially created to just be a planning tool for wartime production in WWII. But nevertheless, boosting GDP has become a central goal of many countries, which sometimes can lead to strange distortions.

Another example is Alexa’s measure of pageviews to a web site, which has become an easy way to gauge the “success” of a site. But as Alex Castro points out, this measure is in some ways even worse than GDP. Not only can a site attract lots of pageviews without doing anything useful for users, but sites that use AJAX, Flash, or other modern technologies are perceived as less successful because of the fewer page views they generate.

The need to simplify isn’t going anywhere, and so neither are measures and their problems; but it seems worth keeping in mind that any given simplification might be hiding important details, and that it might be worth taking a closer look before drawing any conclusions.

Auctions and inefficiencies in online advertising

March 4th, 2006

The mentions of “Vickrey auctions” I recently came across via various posts led me to this paper. It’s a really interesting analysis concerning how bidding works on search engines like Google and Yahoo, and how advertisers are not necessarily being as well-served as they could be. It also clarifies what “Vickrey” means, and how it was mis-used by both Google and the NYT article referenced in the posts above.

A much-touted feature of many search engine keyword auctions is that even if you bid a high amount on a given keyword, the amount you actually pay is reduced to a penny above the next-highest bid below yours. This tends to make advertisers feel safer in placing high bids; in fact, at least in the past, it was argued that this made the best approach “truth-telling,” or bidding what the ad is actually worth to you. Google even justified this by referring to Nobel Prize-winning research showing that such truth-telling had been proven to be the optimal strategy for bidders in “Vickrey auctions” like Google’s.

Truthful bidding is simple and easy for advertisers, and by discouraging low-bidding, tends to increase search engine revenues; so it would seem everyone wins. But as the paper shows, the system used by search engines isn’t a true Vickrey mechanism and does *not* lead to truth-telling being the optimal strategy. In fact, the paper shows that truth-telling under this system always leads to higher prices for advertisers than in a true Vickrey system.

Here’s a quick summary of the paper. If there were only one position in sponsored search listings, reducing the top bidder’s price to be essentially equal to the next-highest bid would be a “second price” auction. In reality there are multiple positions, with different values in terms of click-through rates (CTRs). In this case the obvious generalization, which Google and Yahoo use, is to reduce each winning bidder’s price to the next-highest bid below it; this is called “generalized second price” or GSP. Vickrey-Clarke-Groves (VCG) is a more complicated system. VCG sets each winning bidder’s price equal to the value lost by all lower bidders from being knocked down a position. As the paper shows, for the same bids this price is never more than under GSP (and is lower for every position but the last), and it leads to truth-telling being the optimal strategy.
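The difference between the two rules is easier to see with numbers. The sketch below computes each slot's total payment under both mechanisms for the same set of bids. The function names are my own, and the inputs (three bidders, two slots) are made-up illustrative values, not data from Google or Yahoo. Bids are per click, and each slot's CTR is expressed as expected clicks per hour.

```python
def gsp_payments(bids, ctrs):
    """Total payment per slot under generalized second price.

    The bidder in slot i pays the next-highest bid for every click received.
    """
    ranked = sorted(bids, reverse=True)
    payments = []
    for i, ctr in enumerate(ctrs):
        next_bid = ranked[i + 1] if i + 1 < len(ranked) else 0.0
        payments.append(ctr * next_bid)
    return payments


def vcg_payments(bids, ctrs):
    """Total payment per slot under VCG.

    The bidder in slot i pays the value lost by every lower bidder,
    each of whom is pushed down one slot by that bidder's presence.
    """
    ranked = sorted(bids, reverse=True)
    padded = list(ctrs) + [0.0]  # falling off the last slot means zero clicks
    payments = []
    for i in range(len(ctrs)):
        total = 0.0
        for j in range(i + 1, len(ctrs) + 1):
            bid_j = ranked[j] if j < len(ranked) else 0.0
            # Bidder j would have had slot j-1 instead of slot j.
            total += (padded[j - 1] - padded[j]) * bid_j
        payments.append(total)
    return payments


bids = [10, 4, 2]   # hypothetical per-click bids
ctrs = [200, 100]   # hypothetical clicks per hour for slots 1 and 2

print(gsp_payments(bids, ctrs))  # [800, 200]
print(vcg_payments(bids, ctrs))  # [600.0, 200.0]
```

With these inputs the top slot costs 800 under GSP but 600 under VCG, while the last slot costs the same under both. That matches the summary above: for identical bids, VCG never charges more than GSP.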

So a true VCG auction would better serve advertisers by really making truth-telling the best strategy and by lowering prices paid. But it would probably reduce search engine revenues as compared to the current GSP approach, and would eliminate the systems used by some sophisticated advertisers to constantly try to game GSP, which is inherently unstable and presents fleeting opportunities for gain via low-bidding. The paper points out that Ask and MSN, having less invested into GSP, have an opportunity to attain a comparative advantage over Google and Yahoo by using VCG to reduce these inefficiencies.

All this reminds me of another inefficiency in the online ad market: the high CPCs currently being paid for contextual placements due to mixing search keyword bids with contextual keyword bids. Google’s AdSense is certainly the most relevant example here. In the same way that advertisers are encouraged to bid their true value even though it might not be the most effective strategy, advertisers are also encouraged to extend their keyword bids to contextually targeted ads, even though such ads are known to have lower value than search ads. In fact, although it’s possible to enter different bids for contextually targeted ads, or to skip them altogether, the default behavior is for keyword bids to apply to all ads placed by Google anywhere; and defaults matter.

The result is that for publishers, AdSense has two enormous advantages over other ad networks:

(1) Every search advertiser on Google is by default an advertiser under AdSense. This automatically provides a large and varied pool of advertisers.

(2) The CPC paid by advertisers is by default set by the value of an audience who is actively searching for the keywords bid on. This leads to much higher prices than those actually based on the value of an audience who is just reading text containing the keywords.

In general, these advantages come at the expense of advertisers; it seems to me that this situation presents another great opportunity for competitors like Ask and MSN to differentiate themselves.

An interesting question is raised here: assuming conversion rates for contextual clicks are lower than those for search clicks, automatic inclusion in AdSense must lower the overall conversion rates for advertisers on Google. A recent report shows that indeed, Google comes in dead last among major search engines when it comes to conversion rates. The report attributes this to demographics, but maybe AdSense has just as much to do with it.
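The arithmetic behind this is just a weighted average. Here is a minimal sketch; the click counts and conversion rates are invented purely to show the effect and are not taken from the report mentioned above.

```python
def blended_conversion_rate(search_clicks, search_rate, context_clicks, context_rate):
    """Overall conversion rate when search and contextual clicks are pooled."""
    total_clicks = search_clicks + context_clicks
    if total_clicks == 0:
        raise ValueError("need at least one click")
    conversions = search_clicks * search_rate + context_clicks * context_rate
    return conversions / total_clicks


# Hypothetical advertiser: 1000 search clicks converting at 4%,
# plus 500 contextual clicks converting at 1%.
search_only = blended_conversion_rate(1000, 0.04, 0, 0.0)
with_contextual = blended_conversion_rate(1000, 0.04, 500, 0.01)

print(search_only)
print(with_contextual)
```

Whenever the contextual rate is below the search rate, adding any contextual clicks pulls the pooled rate below the search-only rate, which is the point of the paragraph above.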

The real question, though, is this: have lower conversion rates led to lower keyword bids at Google as compared to other search engines? If so, this represents an effective transfer of money from search profits, where Google is already dominant, to buying market share in the placing of ads on other properties, where Google clearly plans to become dominant. As with the GSP/VCG issue, it’s impossible to say how much of this is accidental and how much is deliberate; but in the end, it amounts to both an impressively effective strategy for Google and a significant opportunity for competitors as these inefficiencies become understood and are wrung from the market.

Update: Ed Clarke, the Clarke in VCG, shows up in the comments! Blogs are amazing.

A (separate) emailed comment notes that Google’s “smart pricing” automatically lowers bids for contextual ad space using an algorithm that is supposed to take into account its lower value. I hadn’t been able to find any mention of this on the Google site, but searching more broadly, I found this and this: looks like this has been around since ’04. Thanks for the correction, and sorry I missed this!

The thing is, it makes me wonder even more: if “smart pricing” is really taking ROI into account accurately, why are overall conversion rates so low on Google? Maybe it really is demographics…or maybe the algorithm still results in higher than justified CPCs paid. It’s hard to know for sure, since the algorithm is not public, but two facts seem to remain: compared to other search engines, Google conversion rates are low; and compared to other contextual ad networks, Google CPCs paid are high.

Designing for power laws

February 13th, 2006

Ben Hyde makes the great point that systems are often designed with an implicit assumption of uniformity in traffic loads, when in fact these loads usually follow a power law. Ben focuses mostly on network design, but this point is just as valid for application design.

It’s understandable that this happens: you sit down to design the system, and figure “OK, if a user does X then we’ll look up record Y and do Z.” In reality, most actions will come from a small subset of the possible ones and will be initiated by a small subset of users. Keeping this in mind can change your design in a major way; *not* taking it into account can frustrate the users who account for the majority of activity and whose opinions have a big effect on the reputation of your app. At an even higher level, this gets beyond performance and into the realm of features. Specific solutions for dealing with “power users,” such as “expert modes,” have a mixed reputation, but ignoring the problem won’t make it go away.
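To get a feel for how lopsided this can be, here is a small sketch that assumes each user's activity is proportional to 1/rank raised to some exponent (a Zipf-like power law). The exponent, the user count, and the function name `head_share` are assumptions chosen for illustration, not measurements from any real system.

```python
def head_share(num_users, head_fraction, exponent=1.0):
    """Share of all activity generated by the most active users.

    Activity of the user at a given rank is taken to be proportional to
    1 / rank**exponent; exponent 0 gives the uniform case.
    """
    weights = [1.0 / (rank ** exponent) for rank in range(1, num_users + 1)]
    head_count = max(1, int(num_users * head_fraction))
    return sum(weights[:head_count]) / sum(weights)


# Uniform assumption: the top 10% of users produce 10% of the activity.
uniform = head_share(1000, 0.10, exponent=0.0)

# Power-law assumption: the same 10% produce well over half of it.
skewed = head_share(1000, 0.10, exponent=1.0)

print(round(uniform, 3), round(skewed, 3))
```

A design tuned for the uniform case spreads effort evenly, while the skewed case suggests spending it on caching, limits, and features for the small group that drives most of the load.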

It’s interesting that this point runs contrary to the usual “long tail” argument that focuses attention on the *less* active users or objects. I guess the lesson is that we should design for both the head and the tail, and not fall into the trap of designing for a mythical “average user” who doesn’t really exist.

Mashups

February 1st, 2006

MashupCamp looks to be a great event, although with 300 attendees, a museum host, and sponsorship by Sun, Microsoft, Yahoo, and eBay, it probably won’t be the same kind of sweaty, seat-of-the-pants affair that other “camps” have been.

In the meantime, I don’t know how I missed it until now, but John Musser’s ProgrammableWeb site on mashups is just amazing, especially the mashup matrix. If you haven’t seen it already, check it out.