08 October 2005 @ 17:39
A note to Google  

Dear Google:

Please trust me to specify all the terms I want you to search for. I have many years' experience dealing with computer programs that do exactly what I tell them to, and I'm pretty good at formulating search queries that will return the results I want. I don't need you to try to extrapolate other search terms I might be interested in; I just want you to perform the search I specify. If it doesn't turn up what I'm looking for, then I can probably revise it successfully myself.

For example, suppose I am looking for a midi file of the song "Jesse James," and I type the following into your search box:

"Jesse James" midi

If I then press Return or click the Search button, what I would like you to show me is a list of Web pages, sorted after your inimitable fashion, that contain the word midi and the phrase Jesse James. And I appreciate the fact that most of the results you give me do in fact meet the criteria I have specified. I am not, however, interested in results like these:

Harley Davidson Custom Parts at Discount Prices in Cyborg Cycles ...
Discount Harley Parts: Parts For Your Harley: Jesse James West Coast Chopper
www.cyborgcycles.com/ - 12k - Cached - Similar pages

Mid-South Coliseum 1996 (Jarrett)
Mid-South Coliseum drawing 1000. Jesse James Armstrong beat Tracy Smothers (7:35).
The Giant Warrior pinned King Cobra (7:12). ...
www.prowrestlinghistory.com/memphis/jarrett/1996.html - 17k - Cached - Similar pages

Paul Jesse James, Wichita KS ... KY, early 1800's then to Clay Co., IN, early to
mid-1800's, then my gg-grandfather, Thomas R. James and his brother went on ...
www.rootsweb.com/~daisy/jameskin.htm - 15k - 7 Oct 2005 - Cached - Similar pages

Yes, I know that mid is pretty close to midi orthographically, and that mid is semantically related to midi by virtue of the fact that many midi files end with the extension .mid. But I didn't ask you to search for "Jesse James" midi OR mid; I asked you to search for "Jesse James" midi, precisely because I didn't want any irrelevant results about the Mid-South Coliseum or the mid-1800s or mid-glide conversion kits (whatever those may be). I don't want to rule out pages with the word mid in them (as by entering "Jesse James" midi -mid), because that would potentially eliminate many relevant results, but neither do I want you to include any pages that contain mid but not midi.
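The matching rule requested above can be sketched in a few lines of Python. This is only an illustration of Do-What-I-Say matching, not a description of how Google works; the names `tokens` and `matches_literally` are hypothetical, and the tokenizer (lowercase runs of letters and digits, so that a hyphen splits a word) is an assumption made for the example.

```python
import re

def tokens(text):
    """Lowercase word tokens; hyphens split words, so 'Mid-South' yields 'mid'."""
    return re.findall(r"[a-z0-9]+", text.lower())

def matches_literally(page, phrase, required):
    """True only if the page contains the exact phrase and every required
    word as a whole token. No stemming, no related-word expansion."""
    words = tokens(page)
    phrase_words = tokens(phrase)
    n = len(phrase_words)
    has_phrase = any(
        words[i:i + n] == phrase_words for i in range(len(words) - n + 1)
    )
    return has_phrase and all(w.lower() in words for w in required)

# Hypothetical sample pages, the first adapted from the results quoted above.
wrestling = "Mid-South Coliseum drawing 1000. Jesse James Armstrong beat Tracy Smothers"
music = "Download a MIDI file of Jesse James (jesse.mid)"

print(matches_literally(wrestling, "Jesse James", ["midi"]))
print(matches_literally(music, "Jesse James", ["midi"]))
```

Under this rule a page that contains mid but not midi is rejected, while a page that contains both is kept, which is the behaviour that "Jesse James" midi -mid cannot express.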

I don't know how to specify this any more clearly in your search box, so please, please, just Do What I Say; don't try to Do What (you imagine) I Mean. I know how computer programs (are supposed to) work; I will jolly well Say What I Mean.

You're still an excellent search engine, and I did find what I was looking for this time, but if you start second-guessing the user, you'll find yourself on a slippery slope, at the bottom of which lie the hideous broken remains of Microsoft Excel. Don't go there.

Best regards,
Q. Pheevr

Current mood: ever so slightly frustrated
Current music: The Ballad of Jesse James
lascribe on 8 October 2005 15:20 (UTC)
Google's search is getting fuzzier and fuzzier. When you use wildcards -- and I'm glad I can, unlike on other search engines -- the results are sometimes way off what you explicitly indicate.

(Though I wonder what a mid-glide conversion kit may be.)
Q. Pheevr (q_pheevr) on 8 October 2005 15:38 (UTC)

Wildcards are such a nice feature. I wonder why Google is undermining their effectiveness by making non-wildcards act as if they were wildcards? Presumably they think this will help people (perhaps especially people who aren't so adept at formulating their queries) find what they're really looking for, but it's really unhelpful to people like you and me.

Maybe they should have two versions: Google-DWIM and Google-DWIS. Then we could stage Googlefights between them—same query, different search algorithms, who gets the best results?

Google-DWIS and Google-DWIM
Resolved to have a battle,
For Google-DWIS said Google-DWIM
Could do no more than prattle....
Merle (merle_) on 8 October 2005 18:09 (UTC)
I'll put my money on Google-DWIS, ten to one.

Hmm, Googlefight. Simpler than the rocks/sucks tests, but.
Q. Pheevr (q_pheevr) on 8 October 2005 16:18 (UTC)

p.s. — I'd like to believe that a "mid-glide conversion kit" is a device that helps one produce the sound /ə̯/ (very handy for producing those BBC English diphthongs), but it appears to have something to do with motorcycles instead.

lascribe on 8 October 2005 16:33 (UTC)
I don't need no stinkin' conversion kit to produce BBC vowels. If anything, my vowels are too BBC. But I balance it with odd rhoticizations. (But I do have problems with [w] ... one of the two points where my German native language sticks its head out. That and voiced word-final stops.)

You had me reach for the IPA chart there. A non-syllabic schwa, huh?

queenlizzie on 8 October 2005 15:23 (UTC)
The folks on the discussion board I monitor recently started a thread entitled something like "WTF happened to Google?" where they were discussing the craptastic results of recent Google searches. They must have changed something fundamental. I don't like it at all.
Q. Pheevr (q_pheevr) on 8 October 2005 15:53 (UTC)

Maybe they really have switched to PigeonRank™ this time.

Henry (tahnan) on 8 October 2005 16:04 (UTC)
Google does the same thing with names. Woe betide anyone who wants to search for "Phil" and not "Philip": Googling "blade runner" phil shows how useless that is. Or compare "blade runner" android -android (zero hits) to "blade runner" phil -phil (330,000 hits).

Well--some names. Not Liz, or Rich, or Bob, or Mike...one search with "Becca" turned up "Becky" highlighted, though "Becca" did appear elsewhere on the page. Is it just Phil?

But the morphological relation thing does extend beyond names and file extensions: "blade runner" androids -androids gets 47,000 hits, all of them with "android".

It's terribly annoying. Perhaps worse than the "oh I bet this is what you meant" nature of it is how unpredictable it is; searches on Mike, Bob, Becca do fine, and suddenly a search with Phil ambushes you.
Q. Pheevr (q_pheevr) on 8 October 2005 16:13 (UTC)

I think I've encountered Google's singular/plural substitutions before, but I didn't know about the Phil/Philip business—that's really bad. I'm unpleasantly reminded of the people whom I have had to tell explicitly (and sometimes repeatedly) that my first name is my first name, and not just the first half of it.

Merle (merle_) on 8 October 2005 17:08 (UTC)
It's not just singular/plural. They conjugate verbs, as I discovered just yesterday.

I was searching to find when CompUSA acquired Good Guys. So, a standard simple search, ["comp usa" "good guys" acquisition] (figuring that hits from before the acquisition would help bracket when it occurred). I was most consternated to find matches -- and highlighted terms -- like "acquired" and "acquiring".

NO! That was not what I was searching for!

Maybe there's some sort of advanced advanced search where you can tell it to leave off on correcting you and just show the things you asked for. I truly hope there is...
Merle (merle_) on 8 October 2005 17:50 (UTC)
["good guys" "comp usa" acquired]

Sorry, should have provided proof.
Henry (tahnan) on 8 October 2005 16:17 (UTC)
A followup: androids -androids gives zero hits. As you'd expect it to; what you wouldn't expect is that adding a search term makes the results less restrictive.

Also returning zero hits: {androids -androids blade} and {androids -androids runner}. But give it {androids -androids blade runner}, even without the quotes, and stand back.
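Why zero hits is the expected answer can be shown with a small sketch. It assumes purely literal matching, where every included word must appear and no excluded word may; the `search` function and the `PAGES` list are hypothetical and say nothing about Google's actual algorithm.

```python
import re

def tokens(text):
    """Set of lowercase word tokens on a page."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def search(pages, include, exclude):
    """Return pages containing every include word and no exclude word."""
    hits = []
    for page in pages:
        words = tokens(page)
        if all(w in words for w in include) and not any(w in words for w in exclude):
            hits.append(page)
    return hits

# Hypothetical corpus for illustration only.
PAGES = [
    "Blade Runner asks whether androids dream",
    "The android in Blade Runner is called a replicant",
    "A jade blunder at the auction house",
]

print(search(PAGES, ["androids"], ["androids"]))
print(search(PAGES, ["androids", "blade", "runner"], ["androids"]))
```

Under literal matching, a word that is both required and excluded can never be satisfied, so both searches come back empty, and adding further required words can only shrink a result set. Results that grow when terms are added suggest that the required word is being expanded to related forms while the excluded one is not.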
Q. Pheevr (q_pheevr) on 8 October 2005 16:25 (UTC)

Interesting. And it's not just the relatedness of the search terms that's doing it, either; you can get the same effect with androids -androids jade blunder; leave out either jade or blunder, and you get nothing, as expected; put them both in, and you get 213 results.

I guess it's not such a bad thing if a search that ought to come up empty doesn't, but I wish I knew what its little algorithm thinks it's doing.

lascribe on 8 October 2005 18:33 (UTC)
Just found out[1] that if you put quotes around your search term, Google doesn't do singular/plural substitution. Or add a "maybe" to that statement, given all the fuzziness.

[1] While investigating the canapés»canopies maybe-eggcorn
Merle: lambda (merle_) on 9 October 2005 08:55 (UTC)
Good find! It seems to block verb re-conjugation as well (at least on the three tests I tried).

Double-quotes: it's what's for Google.
Q. Pheevr (q_pheevr) on 9 October 2005 10:29 (UTC)

Well done! It appears to work for my test case, too; there are no spurious mid results on the first page, anyway, and there were in the quotation-markless search.

I guess the quotation-mark convention makes sense, but I would have thought (and hoped) that exact matches should be the default, rather than requiring to be specified.

(Deleted comment)
Vizcacha (chillyrodent) on 9 October 2005 08:01 (UTC)
I was just thinking how appealingly geeky this whole thread was. Like, I don't know what the hell they're saying, but it's so darned irresistible!

Q. Pheevr (q_pheevr) on 9 October 2005 10:44 (UTC)

Thanks! I've added you, too.

The pottery in your latest post is gorgeous.

the_deli (thedeli) on 9 October 2005 09:41 (UTC)
I know that it's witch-hunty to publicly agree to the obvious, but you couldn't be more right about the position of Excel™ on the slope. Word™ is nearby.

Meanwhile, will you please actually send that note to google, if you haven't?
Q. Pheevr (q_pheevr) on 9 October 2005 10:35 (UTC)
> Meanwhile, will you please actually send that note to google, if you haven't?
Oh, I'm sure they can find it. Anyway, it seems less urgent now, in light of lascribe's discovery, although some of the paradoxical results noted by tahnan and by the folks at Language Log probably deserve a note of their own.
Prof. Bleen: Shizuku (6_bleen_7) on 9 October 2005 11:07 (UTC)
The default settings in Microsoft Word are the scientific writer's worst nightmare, especially with respect to capitalization. I resent being informed that I have committed an error when I haven't, and resent even more my non-errors being automatically corrected. The enzyme ribonuclease is properly abbreviated RNase; I do not want it corrected to Rnase. Similarly, when I want to express the range of the index variable i, as in i = 1, 2, 3,..., n, I do not want i automatically capitalized.

Fortunately, all of these auto-correcting functions may be disabled in Word, but few people take the time to do so. Do you have a feeling for how default settings of word processors might be inadvertently guiding the evolution of written language?

(I'm a friend of cutiepi314 and of chillyrodent.)
wolfangel78 on 11 October 2005 16:53 (UTC)
The problem, of course, is that search engines are aiming themselves towards people with less computer knowledge/ability than you.

What bothers me most is their inconsistent morphing. (Especially in gmail.)

And Excel! The program which insists on opening a new window for each file but will not allow you to click the little red button to close just that window (unlike, say, Word).

