The other day I was browsing books online (something I spend an embarrassing amount of my time on) when I discovered such curiosities as these. They came from a publishing house I’d never heard of, an imprint called Alphascript Publishing, whose titles were either a) identical to those of movies, comics, music videos, etc., or b) nearly nonsensical chains of free association. I became suspicious.
This week I have learned that it is impossible for me to read undergraduate prose within fifty feet of an unoccupied computer. Fighting through their awkward sentence constructions and stilted, thesaurus-driven vocabulary takes more concentration than I can muster when there is anything remotely interesting to do nearby. Honestly, it’s a mystery to me how so many college students can be so bad at putting coherent ideas together on paper.
But, with this understanding, I was able to make good progress for a while this afternoon by isolating myself at a table outside of Phoenix Grill, a little coffee-and-sandwiches deal on campus. Then I got distracted by an out-of-place smell. Confused, I turned to the student at the next table over, and he answered a number of my questions at once.
Me: Does it smell like weed to you?
Student: Heh. Yeah.
(short pause)
Student: I’m smoking it.
Student: <showed me his bowl>
Student: <grinned dopily>
Me: Oh. Oooooh.
There’s been a lot of talk on the statistics/machine learning/computer science blogs this week about an article in Wired called The End of Theory. Basically, everyone thinks the author, one Chris Anderson, has lost his damn mind. The piece argues that the enormous amounts of data available to modern computers, combined with advances in statistical modeling and analysis techniques, will lead to a time when the old scientific method is no longer used. The argument is that we will give up the practice of building and testing hypotheses in favor of querying huge databases for correlations. I’ll use the same passage as Ed Felten to sum up the article:
[...] The scientific method is built around testable hypotheses. These models, for the most part, are systems visualized in the minds of scientists. The models are then tested, and experiments confirm or falsify theoretical models of how the world works. This is the way science has worked for hundreds of years.
Scientists are trained to recognize that correlation is not causation, that no conclusions should be drawn simply on the basis of correlation between X and Y (it could just be a coincidence). Instead, you must understand the underlying mechanisms that connect the two. Once you have a model, you can connect the data sets with confidence. Data without a model is just noise.
But faced with massive data, this approach to science — hypothesize, model, test — is becoming obsolete. Consider physics: Newtonian models were crude approximations of the truth (wrong at the atomic level, but still useful). A hundred years ago, statistically based quantum mechanics offered a better picture — but quantum mechanics is yet another model, and as such it, too, is flawed, no doubt a caricature of a more complex underlying reality. The reason physics has drifted into theoretical speculation about n-dimensional grand unified models over the past few decades (the “beautiful story” phase of a discipline starved of data) is that we don’t know how to run the experiments that would falsify the hypotheses — the energies are too high, the accelerators too expensive, and so on.
Among the interesting reactions, we’ve got Andrew Gelman, Drew Conway, Fernando Pereira, Cosma Shalizi, and Ed Felten. They have a range of more and less technical reasons for disagreeing, all of which are interesting and seem on-point to me. Dr. Felten’s explanation of his disagreement is the easiest to understand:
To take a simple example, suppose we discover a correlation between eating spinach and having strong muscles. Does this mean that eating spinach will make you stronger? Not necessarily; this will only be true if spinach causes strength. But maybe people in poor health, who tend to have weaker muscles, have an aversion to spinach. Maybe this aversion is a good thing because spinach is actually harmful to people in poor health. If that is true, then telling everybody to eat more spinach would be harmful. Maybe some common syndrome causes both weak muscles and aversion to spinach. In that case, the next step would be to study that syndrome. I could go on, but the point should be clear. Correlations are interesting, but if we want a guide to action — even if all we want to know is what question to ask next — we need models and experimentation. We need the scientific method.
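Felten’s last scenario, a hidden syndrome driving both spinach aversion and weak muscles, is easy to fake in a toy simulation. This is only a sketch with made-up numbers (the 30% syndrome rate, the spinach servings, and the strength scores are all assumptions for illustration, not data from anyone’s study). Spinach never enters the strength calculation, yet the two still correlate across the whole population, and the correlation fades once you look inside each group.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient, standard library only."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate(n=10_000, seed=0):
    """Return (has_syndrome, spinach, strength) for n hypothetical people."""
    rng = random.Random(seed)
    people = []
    for _ in range(n):
        # Hypothetical syndrome affecting 30% of the population.
        sick = rng.random() < 0.3
        # The syndrome lowers spinach intake AND strength independently;
        # spinach itself has no effect on strength anywhere in this model.
        spinach = rng.gauss(1.0 if sick else 3.0, 1.0)
        strength = rng.gauss(40.0 if sick else 60.0, 10.0)
        people.append((sick, spinach, strength))
    return people


def correlations(people):
    """Spinach/strength correlation overall and within each group."""
    def corr(rows):
        return pearson([r[1] for r in rows], [r[2] for r in rows])

    return {
        "everyone": corr(people),
        "with syndrome": corr([p for p in people if p[0]]),
        "without syndrome": corr([p for p in people if not p[0]]),
    }


if __name__ == "__main__":
    for group, r in correlations(simulate()).items():
        print(f"{group}: r = {r:+.2f}")
```

With these invented parameters the pooled correlation comes out clearly positive, while the within-group ones hover near zero. The pooled number alone cannot tell you which of Felten’s stories is true; that takes a model and an experiment.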
It’s true that correlations are enough if all you want to do is make money selling ads. In that case, as Anderson says, “Who knows why people do what they do? The point is they do it, and we can track and measure it with unprecedented fidelity. With enough data, the numbers speak for themselves.” But scientists interested in human behavior would see this argument as completely backwards. To a scientist, the behavior is not “the point,” but a place to begin. Science is a process of forming an understanding of the world we live in, and the one thing data mining doesn’t produce is understanding. It may produce actionable predictions, but it won’t explain them to you.
For instance, here’s another claim from the article:
The best practical example of this is the shotgun gene sequencing by J. Craig Venter. Enabled by high-speed sequencers and supercomputers that statistically analyze the data they produce, Venter went from sequencing individual organisms to sequencing entire ecosystems. In 2003, he started sequencing much of the ocean, retracing the voyage of Captain Cook. And in 2005 he started sequencing the air. In the process, he discovered thousands of previously unknown species of bacteria and other life-forms.
It’s great that Craig Venter is able to sequence a bunch of genomes. Everyone agrees this is a cool project. But, more than anything else, it’s a starting point. A bunch of DNA sequences on disk may produce interesting correlations, but they don’t advance our biological understanding of global ecosystems until they’ve been used to build testable hypotheses.
A couple of months ago I was hanging around in the lobby before an invited lecture on machine learning, and wandered into a conversation between the speaker and a couple of the CS faculty here at UCI. Since I’m not entirely comfortable quoting professors from memory months after the fact and without asking permission, I won’t say exactly who, but it was one of the people high up on this list. Anyway, the person in question is an expert in the fields of machine learning and data mining. So I came into the conversation late, and just caught someone repeating a claim from elsewhere that soon machine learning would make the scientific method obsolete; it was a claim very much like Anderson’s. And this professor, whose research involves thinking up clever new ways to mine data, said, “I think that’s exactly the wrong way to think about it.” I don’t remember the rest of the quote verbatim, but the gist was: In a perfect world, machine learning and data mining would become unnecessary, because we would have a sufficiently complete understanding not to have to resort to them. They are effectively stop-gap measures, which we rely on to make predictions (and, in a lot of cases, money) when we’re willing to act without having (or understanding) interpretable reasons. But we shouldn’t look forward to a world when we can stop searching for that understanding.
I know this is going to come as a great shock to you all, but some of the moderators at Conservapedia willfully promote misinformation in order to advance Republican political agendas.
“Pshaw,” you say. “It says right on their about page ‘Conservapedia is a clean and concise resource for those seeking the truth,’” you say.
Yeah, I know. It’s crazy. I didn’t want to believe that this bastion of truthiness would stoop to conscious misrepresentation of obvious facts, but there it is, in the history for their page on Barack Obama.
Apparently Barack referred to himself as having been a “professor” at the University of Chicago, while his official title was “Senior Lecturer.”
“STOP THE PRESSES,” you say. “THAT IS A BIG DEAL.” The people at Conservapedia seem to be with you; they must believe this is very important, because it is the subject of the third sentence on the page.
But really, it isn’t a big deal. Even here, where they’ll give anyone an adjunct position. You should probably calm down. Specifically, the University of Chicago doesn’t see a meaningful difference between professors and Senior Lecturers, saying “From 1992 until his election to the U.S. Senate in 2004, Barack Obama served as a professor in the Law School [...] Senior Lecturers are considered to be members of the Law School faculty and are regarded as professors, although not full-time or tenure-track.”
Several brave soldiers for truth have tried to edit the page to reflect this fact, but so far every effort has been reverted by Conservapedia moderators. Except for the time that Jareddr beat the moderators to the punch, and explained his removal of the link to the University of Chicago’s statement thusly: “Wasn’t a professor, just ‘Served as a professor’”. Likewise, Jareddr isn’t a literate adult, he just serves as an Internet troll.
Andrew Gelman posted a quote today describing a problem with the literature in the statistics community: all of their functions are named “p”. The post is interesting, but at first it made no sense to me. This is because I follow the blog with Google Reader, and there the post looked like this:
“Wait, the only symbol in statistics is a-hat Euro o-e? I’ve never seen one of those. Maybe this explains why stats papers are confusing to me.”
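For the record, that symbol soup looks like a text-encoding mix-up rather than anything statistical. A curly opening quote stored as UTF-8 and then displayed as if it were Windows-1252 turns into exactly those three characters. I’m assuming that is what happened somewhere between the blog and Google Reader; I haven’t confirmed it. A minimal Python sketch of the effect:

```python
# Assumption: the feed's UTF-8 bytes were decoded as Windows-1252 (cp1252).
opening_quote = "\u201c"  # the curly opening quote in front of the p

# UTF-8 stores the quote as three bytes; cp1252 reads each byte as its own character.
garbled = opening_quote.encode("utf-8").decode("cp1252")
print(garbled)  # a-hat, Euro sign, o-e ligature

# Reversing the mistake recovers the original character.
repaired = garbled.encode("cp1252").decode("utf-8")
print(repaired == opening_quote)
```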
There is a student in my department who is known for his tendency to derail lectures. He always has a lot of questions, and even more ideas. He has a few pet methods (hidden Markov models and neural networks, in particular), and he always thinks the problem at hand could be solved via their application. In fact, he’d usually like for the class to listen along as he thinks out loud, until he has figured out exactly how this is all going to work.
Take today, for example. The class is split equally among computer scientists and biologists. The professor explained a problem, which every computer scientist immediately realized is a pretty canonical example of the sort of problem to which the “expectation-maximization” technique is applied. Then he asked if anyone had any ideas as to how to solve it.
Most of the computer scientists in the room turned to the nearest biologist and whispered, “Yeah, you use EM,” because we are all assholes who like to show off. But no one wanted to waste everyone’s time trying to explain EM without the use of prepared slides, so we waited quietly for the professor to continue.
Which, of course, is precisely the kind of opening that The Guy With Ideas was looking for. He pounced, and started a two-minute exposition that no one followed, but which ended with, “It’s kind of like you start with a hidden Markov model, and use something sort of like EM.”
While I was trying to figure out some way to get those two minutes back, the professor said, “Why do you need an HMM?”
“Because they’re good at solving lots of problems.”
“But they’re not applicable because [... there's a good reason].”
“Yeah, but I think they are, because [... nonsense, we've already covered the reasons why you don't need that kind of machinery].”
“Anyway, you’re half right. We’re going to use EM, but not hidden Markov models.”
Then the professor began to lay the foundation for explaining EM. It’s a long story, and it will probably take the whole next lecture to complete. Five minutes in, The Guy With Ideas piped back up.
“Wait, what if instead we set this variable to 60% and that one to 25%, and then we [... this went on for a while, and was impossible to follow]. Basically, it’s kind of like EM.”
He seriously said this like 1) he had just invented the ideas behind EM, and had to give us all an example, and 2) he hadn’t just had the same idea a few minutes ago, and 3) the professor hadn’t said that he was correct, way back then.
The professor, who, three meetings into the course, is clearly struggling to deal with the interruptions, just said, “You’re right. We’re going to use EM.”
The Guy With Ideas apparently couldn’t believe that one of his ideas was correct, because he said, “Wait, I’m right?” Then he turned to the girl beside him, pointed two fingers into her face, and yelled, “BOOM! You couldn’t figure it out, and I’m right.”
Score one for you, Guy With Ideas. You totally showed the girl next to you, and it only ate up a total of about ten minutes of class time. I’d say that’s a victory for the record books.
I’m not sure what went on behind the scenes at One Language Log Plaza to provoke this devastating take-down, but Geoff Pullum completely fucks shit up:
Certainly, it is possible that the phrase dada kraut psych mindblowing conscience expanding sublime acid oriented arcana coelestia weirdness has roughly nine stacked attributive modifiers; but one cannot really tell, because it all depends on how it is parsed: doubtless “consciousness-expanding” (I add the helpful hyphen) is intended as a syntactic unit, but one doesn’t know about “kraut psych” and so on. This is basically the problem one finds with quotes from chimpanzee language: chimps are occasionally reported as having signed things with transcriptions like BANANA BANANA HELP REFRIGERATOR GIMME OPEN BANANA GIMME, and syntactically one does not really know where or whether to begin.
Part of the problem here is that Eric is one of the younger staffers here at Language Log Plaza. They work with headsets on, they have X-men posters on their walls, they talk about whether Lara Croft’s breasts in the new Crystal Dynamics video game release are as big as before. The average age in their part of the building is approximately 19. They typically list their hobbies as (i)~being wicked cool, (ii)~dancing to their iPods in public places, (iii)~shopping at American Eagle, and (iv)~staying out all night. One does not see them at EVOO; they dine at place where the menu is a series of brightly colored pictures on glass with lights behind them, and often there is a neon sign in the window saying “BURRITOS AS BIG AS YOUR HEAD”. And their reading material does not fully meet the criteria for being called “language”.
Which raises the question: how much would you pay to see Belle Waring and Geoffrey K. Pullum in a heavyweight title bout?
Apologies for the comments not working. I’m looking into it; I need to talk to Adam about getting access to the server.
Meanwhile, a link for Monica, because she’s the only person I know who does this.
This school year, the University of Michigan Law School became the latest graduate school to block wireless Internet access to students in class, joining law schools at UCLA and the University of Virginia. [...]
“When you focus primarily on transcribing everything said, you are not making good use of the class as a practice opportunity,” she wrote in an e-mail to her law students, explaining her decision to ban laptops.
See. I knew it. This is why I never took notes as an undergraduate. Of course, Professor Hawks disagrees:
It seems to me there is an unrecognized selection effect here. Aren’t the students who take notes using laptops in graduate schools very likely to be the same few who did so as undergraduates? Except now they are many because they got admitted?
Or they’re not awesome enough to remember everything the first time. Law schools should only want awesome students.
GrrlScientist reports that Money magazine rated the top jobs in the country, and that college professor came in second. Grrl takes the interesting stance that, while she does desire the job greatly, there’s no way that it should be ranked that high, based on the criteria named in the article.
Unraveling the paradox, she says,
Don’t get me wrong; I wish to be a university biology professor because, despite everything, I still think it is the best job for me, but I never have engaged in that particular fantastical belief that being a college professor is a truly wonderful job for most people out there, all things being equal.
From there, she goes on to make several good points about the odd choices the magazine made when deciding who counted as a professor.
My own reaction was, “Sweet, someone in the world thinks it might make sense to someday want to rejoin academia.” Then I clicked through to the article, and I found out that the number one job is Software Engineer.
The best part is that they gave it the same “ease of entry” rating as for a college professor. Let’s see, on the one hand we have a job which requires a minimum of five years of post-bachelor education, a PhD thesis, multiple publications, teaching experience, possibly multiple years spent in a postdoc position, and a grueling interview process. On the other, a bachelor’s degree from any school whatsoever, with a few courses in C++ along the way, and the ability to solve trivial logic puzzles during interviews. Yeah, that’s definitely comparable.
Last year, when Tongue, but No Door was lost for a while, most of our old posts had not yet made it into the Internet Archive. However, they seem to be creeping in, which is nice, because I have some updates for you.
First, you may remember my post about the unique sexual relationships of Bitch PhD. Nothing new to add to that discussion, but in case you missed it, I wanted to point out that Adrianne and Nick interviewed Dr. B for the third episode of Love & Radio.
Secondly, you almost certainly remember Monica’s post about institutional sexism. Now (via Neurodudes) I bring you an interesting paper by Peter Lawrence on gender and academia.
The paper is the kind of thing that Lawrence Summers might have meant to say, if he weren’t in all likelihood a douche. Discussion and extensive quoting from the paper follow, below the fold.