Archive for the ‘Technology’ Category

Wired Magazine Doesn’t Understand Science
Thursday, June 26th, 2008

There’s been a lot of talk on the statistics/machine learning/computer science blogs this week about an article in Wired called The End of Theory. Basically, everyone thinks the author, one Chris Anderson, has lost his damn mind. The piece argues that the enormous amounts of data available to modern computers, combined with advances in statistical modeling and analysis techniques, will lead to a time when the old scientific method is no longer used. The argument is that we will give up the practice of building and testing hypotheses in favor of querying huge databases for correlations. I’ll use the same passage as Ed Felten to sum up the article:

[...] The scientific method is built around testable hypotheses. These models, for the most part, are systems visualized in the minds of scientists. The models are then tested, and experiments confirm or falsify theoretical models of how the world works. This is the way science has worked for hundreds of years.

Scientists are trained to recognize that correlation is not causation, that no conclusions should be drawn simply on the basis of correlation between X and Y (it could just be a coincidence). Instead, you must understand the underlying mechanisms that connect the two. Once you have a model, you can connect the data sets with confidence. Data without a model is just noise.

But faced with massive data, this approach to science — hypothesize, model, test — is becoming obsolete. Consider physics: Newtonian models were crude approximations of the truth (wrong at the atomic level, but still useful). A hundred years ago, statistically based quantum mechanics offered a better picture — but quantum mechanics is yet another model, and as such it, too, is flawed, no doubt a caricature of a more complex underlying reality. The reason physics has drifted into theoretical speculation about n-dimensional grand unified models over the past few decades (the “beautiful story” phase of a discipline starved of data) is that we don’t know how to run the experiments that would falsify the hypotheses — the energies are too high, the accelerators too expensive, and so on.

Among the interesting reactions, we’ve got Andrew Gelman, Drew Conway, Fernando Pereira, Cosma Shalizi, and Ed Felten. They have a range of more and less technical reasons for disagreeing, all of which are interesting and seem on-point to me. Dr. Felten’s explanation of his disagreement is the easiest to understand:

To take a simple example, suppose we discover a correlation between eating spinach and having strong muscles. Does this mean that eating spinach will make you stronger? Not necessarily; this will only be true if spinach causes strength. But maybe people in poor health, who tend to have weaker muscles, have an aversion to spinach. Maybe this aversion is a good thing because spinach is actually harmful to people in poor health. If that is true, then telling everybody to eat more spinach would be harmful. Maybe some common syndrome causes both weak muscles and aversion to spinach. In that case, the next step would be to study that syndrome. I could go on, but the point should be clear. Correlations are interesting, but if we want a guide to action — even if all we want to know is what question to ask next — we need models and experimentation. We need the scientific method.

It’s true that correlations are enough if all you want to do is make money selling ads. In that case, as Anderson says, “Who knows why people do what they do? The point is they do it, and we can track and measure it with unprecedented fidelity. With enough data, the numbers speak for themselves.” But scientists interested in human behavior would see this argument as completely backwards. To a scientist, the behavior is not “the point,” but a place to begin. Science is a process of forming an understanding of the world we live in, and the one thing data mining doesn’t produce is understanding. It may produce actionable predictions, but it won’t explain them to you.
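The spinach example can be made concrete with a toy simulation. This is only a sketch under invented assumptions: a hidden "health" variable drives both spinach eating and strength, spinach itself has zero causal effect, and the names (`simulate`, `pearson`, `health`) and all the numbers are made up for illustration, not drawn from any real data.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def simulate(n, randomized, seed=0):
    """Return the spinach/strength correlation in a made-up population.

    Assumption (hypothetical model): strength depends only on hidden
    health, never on spinach.
    """
    rng = random.Random(seed)
    spinach, strength = [], []
    for _ in range(n):
        health = rng.gauss(0, 1)
        if randomized:
            # Experiment: spinach is assigned independently of health.
            eaten = rng.gauss(0, 1)
        else:
            # Observation: healthier people happen to eat more spinach.
            eaten = health + rng.gauss(0, 1)
        spinach.append(eaten)
        strength.append(health + rng.gauss(0, 1))
    return pearson(spinach, strength)


observed = simulate(20000, randomized=False)
experiment = simulate(20000, randomized=True)
print(f"observational correlation: {observed:.2f}")
print(f"randomized experiment:     {experiment:.2f}")
```

In this toy world the observational data show a solid correlation even though spinach does nothing, and the correlation vanishes once spinach is assigned at random. The database query alone cannot tell the two worlds apart; the experiment can.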

For instance, here’s another claim from the article:

The best practical example of this is the shotgun gene sequencing by J. Craig Venter. Enabled by high-speed sequencers and supercomputers that statistically analyze the data they produce, Venter went from sequencing individual organisms to sequencing entire ecosystems. In 2003, he started sequencing much of the ocean, retracing the voyage of Captain Cook. And in 2005 he started sequencing the air. In the process, he discovered thousands of previously unknown species of bacteria and other life-forms.

It’s great that Craig Venter is able to sequence a bunch of genomes. Everyone agrees this is a cool project. But, more than anything else, it’s a starting point. A bunch of DNA sequences on disk may produce interesting correlations, but they don’t advance our biological understanding of global ecosystems until they’ve been used to build testable hypotheses.

A couple of months ago I was hanging around in the lobby before an invited lecture on machine learning, and wandered into a conversation between the speaker and a couple of the CS faculty here at UCI. Since I’m not entirely comfortable quoting professors from memory months after the fact and without asking permission, I won’t say exactly who, but it was one of the people high up on this list. Anyway, the person in question is an expert in the fields of machine learning and data mining. So I came into the conversation late, and just caught someone repeating a claim from elsewhere that soon machine learning would make the scientific method obsolete; it was a claim very much like Anderson’s. And this professor, whose research involves thinking up clever new ways to mine data, said, “I think that’s exactly the wrong way to think about it.” I don’t remember the rest of the quote verbatim, but the gist was: In a perfect world, machine learning and data mining would become unnecessary, because we would have a sufficiently complete understanding not to have to resort to them. They are effectively stop-gap measures, which we rely on to make predictions (and, in a lot of cases, money) when we’re willing to act without having (or understanding) interpretable reasons. But we shouldn’t look forward to a world where we can stop searching for that understanding.

Mathematica 6, Hardy Heron, Eclipse
Sunday, April 27th, 2008

Another nerdy post meant more for people Googling “Mathematica Hardy text disappeared” than for the regular blog patrons. Feel free to skip it.


Ubuntu Gutsy, ATI x1300, Suspend, Compiz, Eternal Happiness
Wednesday, February 20th, 2008

This is a post of supreme geekiness. I’m putting it up in case someone is out there Googling the same terms I did, all day today. Everyone else is welcome to ignore it.


Impressively Close, Under the Right Distance Metric
Friday, February 15th, 2008

I finally got around to acquiring the January update for my iPod touch this week. In general, this probably isn’t blog-worthy, but I was really impressed by the new Google Maps tool. On the iPhone, this application tries to locate you by triangulating your position based on cell phone signals. On the touch, that information isn’t available, so it uses wireless networks to approximate your location.

Here’s what happens when I use this tool from my apartment:

[Image: Impressively Wrong]

Depending upon how you look at it, the result is either completely wrong (off by 2,975 miles), or exactly right (but dated by 18 months). Going with the latter, I think that’s quite a feat.

Toys for the Interwebs
Friday, July 7th, 2006

Do me a favor and take a minute to play with this. It’s a demo application for a framework called Echo2. This framework “removes the developer from having to think in terms of ‘page-based’ applications and enables him/her to develop applications using the conventional object-oriented and event-driven paradigm for user interface development.”

Basically, it allows a person to create very nice web applications without having to deal with the usual technologies of the internet. Which is nice, because the sort of people who are capable of designing and implementing complex, elegant applications are all snobs who believe HTML is a pain and JavaScript is ugly.

When you’re done playing with that, tinker with Google Spreadsheets.

I’m sort of curious as to whether or not people think that these sorts of “internet applications” are good things. There’s already a lot being written about this sort of thing, so I doubt we’ll have anything new to add, but I’m curious nonetheless.

For my part, I remain curmudgeonly and unconvinced. I like for my applications, my configurations, and my data to live on my hardware. My life would be easier if I could change my mind; I use about five different computers in three different operating systems every week, which means learning a ton of different applications to do the same jobs. Each application on each computer requires its own configurations, and my data becomes spread out, which is a hassle. It seems like I am exactly the sort of person who should jump at these ideas. Nonetheless, I continue to configure Thunderbird to check my GMail account on every operating system on every computer, because the GMail interface simply doesn’t do it for me.

These ain’t ‘Conflict Diamonds’, is they Jacob?
Monday, May 8th, 2006

Clifford at Cosmic Variance scrubs the kitchen floor, and returns with this:

This might be old news to some, but did you know that you can get the remains of your loved ones turned into a diamond?! [...]

There is a company (or companies, e.g Life Gem) that does this for you, and sends you a nice piece of jewelry made out of your dearly departed. There are slight variations in colour of the finished product…. depending upon the non-carbon “impurities” (if you pardon the term) associated with the source material. So it is very personalised indeed.

Dr. Johnson does not report the price of the process, so I went and looked. Having the family turned into an heirloom costs anywhere from $3,500 for .2-.29 carats up to $13,000 for .9-.99 carats.

Isn’t That Where Scary Men Yell At You?
Friday, April 7th, 2006

Just one question for all of the people making such a fuss about Boot Camp.

Are you aware that dual booting sucks?

Gruber points out that it’s a good thing for Mac because it might remove the last stumbling block before a lot of people. But I’m not really sure that it will. I used to keep Windows on my Linux boxes, so that I could play games and whatnot. But all that really meant was that I never played any games and I wasted 5 or 6 gigabytes of hard disk space.

Suppose that you’re a Windows user, and for you Application X is the “killer app.” Whatever it is — Halo or Windows Paint or whatever. Sure, you might think for a second, “Sweet, now I can have a better experience from my OS, and just use Windows when I need Application X.” But after a minute, you’ll also think, “Wait, do I really want to have to stop everything else I’m doing and reboot my computer to use Application X? Do I really want to have to maintain two copies and configurations of a lot of software (mail reader, IM client, etc) in order to exist comfortably in the operating system where I use Application X?”

No, you don’t. So you’re back to choosing between Application X and your operating system.

It’s also true that this is a no-lose situation for Apple. But it isn’t newsworthy, because there’s just no way that it wins more than a handful of converts.

Someone Call the Smithsonian
Thursday, February 23rd, 2006

DVD Jon has, in an offhand manner, offered the book from which he learned assembly language for sale. It’s too bad that he’s the antithesis of people like Gates and Jobs, the types who pay way too much money for pieces of technological history, because that’s a pretty cool piece.

Sony’s Turn to Creep You Out
Wednesday, November 2nd, 2005

Here’s a good reason to stop buying CDs, if you hadn’t already.

In fact, while I’m at it, don’t use crappy media players that ship free with content, or Windows.

Well, That’s Creepy
Tuesday, October 18th, 2005

In case the government hasn’t creeped you out yet this week: you may want to know whether or not your printer is on this list.