Saturday, January 19, 2008

Personal control of personal information

Personal control of personal information: [...] there's a very interesting move afoot to set up an open standard to describe user "attention" data - which I gather includes browsing history, mostly, but could also certainly include any other information about what a user is interested in...Netflix reviews, Amazon purchases, search queries, you name it. The hope is to break this data away from the many different sites (each of which controls a piece of it) and put it in the hands of users, who can then go and get purchase recommendations (or what have you) from whoever does the best job. [...] [...] I don't see any particularly horrible technical roadblocks, but I do see a lot of interesting technical problems (e.g. reference resolution over this data!) and of course there might be pushback from the people that control the data now. (Via Cranial Darwinism.)

I see two big technical and political obstacles:

  • Privacy and security controls: how can the user control in a sufficiently fine-grained way who can use what pieces of their attention data for what? There's no solution on the horizon that avoids a trusted third party, and even then it will be difficult for the user and the various parties to reach a common understanding about particular uses (see the recent issues with social networking sites, which are not just political).
  • Data collector confidentiality: Attention data related to a particular site, say an online merchant, leaks out a bit of information about that site's operations. If enough attention data collected by Acme Corp. is made available by Acme's users to BottomFeeder Inc. for a consideration, BottomFeeder may be able to reverse engineer important aspects of Acme's recommender system, for instance. Preventing such leakages is very hard.

1 comment:

William Cohen said...

I guess a clarification is due here: there are plenty of research issues, but I don't see any obstacles to making the current situation better than it is.

So, "how can the user control in a sufficiently fine-grained way..."? A simple baseline would be: scrape your own data back from every place you have an account with, decide on a vendor-by-vendor basis what you want to release, and by default only release to the place it came from. Pretty safe, and if someone thinks they can do something interesting with my Amazon music purchases maybe I'll let them. Can we do better? I'm sure we can, but right now we're doing worse.
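For concreteness, here is a rough Python sketch of that baseline. Everything in it is made up for illustration (the AttentionStore class, its method names, the vendor names and the sample items), and it assumes the data has already been scraped back from each vendor; it only shows the release rule, not how you would actually get the data.

```python
from dataclasses import dataclass, field


@dataclass
class AttentionStore:
    """Hypothetical user-held store of attention data, keyed by source vendor."""

    # records[source] = items the user scraped back from that vendor
    records: dict[str, list[str]] = field(default_factory=dict)
    # grants[source] = other parties the user has explicitly approved for that source
    grants: dict[str, set[str]] = field(default_factory=dict)

    def add(self, source: str, item: str) -> None:
        """Record one item that came from the given vendor."""
        self.records.setdefault(source, []).append(item)

    def allow(self, source: str, recipient: str) -> None:
        """Vendor-by-vendor decision: let recipient see data from source."""
        self.grants.setdefault(source, set()).add(recipient)

    def release_to(self, recipient: str) -> dict[str, list[str]]:
        """Return only what recipient may see.

        Default rule: data goes back only to the place it came from,
        plus any party the user approved for that particular source.
        """
        released = {}
        for source, items in self.records.items():
            approved = self.grants.get(source, set())
            if recipient == source or recipient in approved:
                released[source] = list(items)
        return released


store = AttentionStore()
store.add("amazon", "music purchase: some album")
store.add("netflix", "review: some film")
# The user decides one third party may see the Amazon data only.
store.allow("amazon", "some-recommender")

print(store.release_to("amazon"))            # only the Amazon items
print(store.release_to("some-recommender"))  # only the Amazon items
print(store.release_to("someone-else"))      # nothing
```

The point of the default is that doing nothing never releases anything new: each vendor sees only what it already had, and every other release is an explicit per-source choice by the user.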

"Attention data related to a particular site, say an online merchant, leaks out a bit of information about that site's operations." True enough! On the other hand, you can make the same statement replacing "merchant" with "customer" and it's still true, yes?

I'm certainly allowed to tell you what I bought at Amazon (and in fact companies often encourage this)...which is all we're talking about, just at scale.