Opinion

Another View: Too much information on the ‘Net can damage privacy

When it comes to online privacy, we all appreciate the risk of publicizing juicy information such as incriminating photos or credit-card numbers. But few of us realize a subtler threat: In abundance, innocuous, everyday data can divulge sensitive information as well.

Some questions shouldn't be asked. Employers, for instance, generally are not allowed to discriminate based on marital status, sexual orientation and so on. But our growing digital footprint is threatening our ability to dodge inappropriate inquiries. Through data mining, employers, insurers, advertisers and others can infer the answers to private questions without even asking.

They need two things: a heap of personal data, and the techniques to crunch it. Both are readily available.

People generate and share more information than ever. Besides consciously generated Web content such as blogs, Facebook profiles and YouTube videos, a steady stream of data is exchanged in the background. Companies track our searches, browsing and shopping behavior. Personal electronic devices can silently disclose our location while we post status updates and photos to the Web. All this seems innocent enough -- and the more others do it, the safer we all feel. After all, what's one more Twitter update among millions?

But the crowd doesn't hide you. Instead, it is the key to coaxing value from your information.

Data mining relies on the principle that certain information -- though useless in isolation -- can take on new meaning when viewed en masse, or combined with other data. Scientists already use this technique.

There are two main approaches. First, data integration involves combining different types of data to learn something new. Consider a photograph of a bicycle: Alone, it's an abstract representation. But tag the photo with your home location and a time stamp -- and a public listing identifying the bike as stolen -- and suddenly it becomes very meaningful.
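To make data integration concrete, here is a minimal sketch in Python. The records, field names and the `integrate` helper are all hypothetical, invented for illustration; the point is only that two harmless-looking data sets become revealing once joined on a shared field.

```python
# Hypothetical records: neither data set is sensitive on its own.
photos = [
    {"photo_id": "p1", "bike_model": "red road bike",
     "location": "Elm St", "taken": "2009-05-30"},
    {"photo_id": "p2", "bike_model": "blue cruiser",
     "location": "Oak Ave", "taken": "2009-05-31"},
]
stolen_listings = [
    {"bike_model": "red road bike", "reported": "2009-05-28"},
]


def integrate(photos, listings):
    """Join photos to stolen-bike listings on the shared bike_model field."""
    stolen = {row["bike_model"]: row for row in listings}
    matches = []
    for photo in photos:
        listing = stolen.get(photo["bike_model"])
        if listing is not None:
            # The combined record carries a location, a time and a theft report.
            matches.append({**photo, "reported_stolen": listing["reported"]})
    return matches


matches = integrate(photos, stolen_listings)
```

Only the first photo matches a listing, and the joined record now ties a reported theft to a place and a date. A real match would need a far more reliable key than a free-text description.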

A second approach is data aggregation. Gather enough of a certain type of data, and trends emerge. For instance, a cell phone's location can be determined by tracking its signal. By aggregating enough location data from a single cell phone, we derive an increasingly reliable map of one person's regular routes of travel. From this, we can estimate where the phone's owner is likely to be at a given time and perhaps even guess his home location, income and so forth.
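Aggregation can be sketched just as briefly. The example below assumes a hypothetical log of (hour of day, cell tower) pings from one phone; the tower names, the `night_hours` window and both helper functions are illustrative assumptions, not a description of how any carrier actually works.

```python
from collections import Counter, defaultdict

# Hypothetical pings from a single phone: (hour of day, cell tower).
pings = [
    (2, "cell_A"), (3, "cell_A"), (23, "cell_A"), (2, "cell_A"),
    (10, "cell_B"), (11, "cell_B"), (14, "cell_B"), (10, "cell_B"),
    (15, "cell_C"), (22, "cell_D"),
]


def usual_location_by_hour(pings):
    """For each hour seen in the log, return the most frequent tower."""
    by_hour = defaultdict(Counter)
    for hour, cell in pings:
        by_hour[hour][cell] += 1
    return {hour: counts.most_common(1)[0][0]
            for hour, counts in by_hour.items()}


def guess_home(pings, night_hours=frozenset({22, 23, 0, 1, 2, 3, 4, 5})):
    """Guess home as the tower seen most often during night hours."""
    night = Counter(cell for hour, cell in pings if hour in night_hours)
    return night.most_common(1)[0][0] if night else None


routine = usual_location_by_hour(pings)
home = guess_home(pings)
```

A single ping says little. Ten of them already suggest that the owner sleeps near `cell_A` and spends the working day near `cell_B`, and the guess firms up as the log grows.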

Fusing these approaches is even more powerful: that is, combining and mining multiple data sets, each very large. Google did this last year, pairing aggregate Web search queries with location and timing data to predict which regions would next come down with the flu. It outperformed the Centers for Disease Control and Prevention.

Bringing data mining out of research labs and applying it to personal data is surprisingly straightforward.

Suppose a researcher wants to guess something about you -- say, your political party affiliation. This becomes the target variable, the hidden question. Almost any other individual feature -- smoking, hair color, preference in breakfast cereal -- can be correlated with this target. To do this, the researcher needs to do some background crunching on a "gold standard": a small, representative and reliable group for which the relationship between the target and a given feature is known. Individually, most such correlations are weak, and don't matter much. Some may not matter at all.

But even if no single variable correlates well with the target, a group of them together may. So the researcher characterizes many different combinations of variables and correlates them to a target variable, such as political affiliation. These correlations are often impossible to spot by eye, but computers excel in sniffing them out.

And given several modest correlations, a stronger prediction can emerge. This is the power of data mining: combining weak correlations to generate a powerful statistical predictor of the target variable. These predictions are rarely 100 percent correct, but they don't need to be. And the more correlations are known, the more an answer-seeker can rely on clusters of shadow attributes, innocent bits of freely available data that correlate strongly with the target.
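A rough calculation shows why weak signals add up. The sketch below makes a strong simplifying assumption that is not in the argument above: every feature guesses the target correctly with the same probability, and the features are independent of one another. Under that assumption, the chance that a simple majority vote is right follows the binomial distribution. The function name and parameters are hypothetical.

```python
from math import comb


def majority_vote_accuracy(n_features, p_single):
    """Probability that a majority of n independent weak signals is right.

    Assumes each signal is correct with probability p_single, independently
    of the others. Use an odd n_features so that ties cannot occur.
    """
    need = n_features // 2 + 1  # smallest number of correct votes that wins
    return sum(
        comb(n_features, k) * p_single**k * (1 - p_single) ** (n_features - k)
        for k in range(need, n_features + 1)
    )


one = majority_vote_accuracy(1, 0.6)
three = majority_vote_accuracy(3, 0.6)
many = majority_vote_accuracy(101, 0.6)
```

With one feature that is right 60 percent of the time, the prediction is right 60 percent of the time. Three such features voting together are right about 64.8 percent of the time, and 101 of them are right more than 95 percent of the time. Real features overlap and correlate with each other, so actual gains are smaller, but the direction is the same.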

In a world of unlimited data where all statistical correlations have been computed, it becomes unnecessary to ask a forbidden question. The answer can be approximated quietly elsewhere, everywhere, through the web of interconnected data that describes each of us. Today a job candidate can remove her wedding band or otherwise demur when questioned; tomorrow her reply may be impossible to conceal -- or withhold.

The larger our personal data trails grow, the more severe this threat becomes. So if we care about protecting individual privacy in a meaningful way, it is no longer enough just to forbid taboo questions. We must also prevent parties from harvesting and crunching data in a manner that circumvents the need to ask them at all.

Seringhaus is entering his third year at Yale Law School. Gerstein is the Albert L. Williams Professor of Biomedical Informatics and a professor of molecular biophysics and biochemistry and computer science at Yale University.

(June 2, 2009)

The following are comments from readers. In no way do they represent the views of the Hanford Sentinel.

Alihandero wrote on Jun 2, 2009 3:59 PM:

" In this case an identifier that might be considered as a political prejudice or a protected minority can be used as a positive tool for generating opportunities rather than denying rights or other exclusionary acts.

You can tailor marketing strategies and specific offers customized to all sorts of identified individuals and groups: blacks, anglos, hispanics, teachers, health care providers, geriatrics, homosexuals, etc.

However, too much specific unique private information should and must be protected by effective laws. This is easier said than done.

The individual must always be given the choice to be excluded and to opt out of the use of their private information by second and third parties.

Period. "



