Tuesday, May 31, 2011

The sexy little software that spots smutty jokes

Many years ago, there was a cricket match between the West Indies and England. (Cricket's a little like baseball, only takes longer and often no one wins.)

The radio commentators were talking about a bowler (pitcher) called Michael Holding. The batsman (hitter) was Peter Willey. One of the commentators then said, quite naturally: "The bowler's Holding, the batsman's Willey."

After a nanosecond of silence, the whole commentating crew was reduced to monstrous giggles. They couldn't stop. There was nothing they could do. Because, well, one of them had suggested that the bowler was holding something very personal belonging to the batsman.

These little word plays tend to please boys without end. It's one of the things that distinguishes boys from arrogant twerps of metal like Watson, IBM's sad little Jeopardy player.

Talking of computer science (thank you for your patience), it seems that two boffins at the University of Washington have created software that actually, finally manages to detect a little nuance of smut beneath the skirts of a seemingly innocent sentence.

Chloe Kiddon and Yuriy Brun say their creation can detect all of those jokes so beloved by the recently departed Michael Scott from "The Office"--yes, the "that's what she said" jokes.

According to the New Scientist, these two have created a wonderful way for computers to finally stand with us in a bar and guffaw while holding a large beer.

The wording of the scientists' research abstract is really quite something: "We identify a subproblem--the 'that's what she said' problem--with two distinguishing characteristics: (1) use of nouns that are euphemisms for sexually explicit nouns and (2) structure common in the erotic domain."

Well, of course. I wish I could have put it that way myself. What they did was try to find every possible word that substitutes for every possible smutty noun and then cross-reference them with the way people, ahem, insert a little gentle filth into language.
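For the curious, here is a toy sketch of what those two characteristics might look like in code. It is emphatically not DEviaNT itself: the word lists, the weights, the threshold, and the function names (`twss_score`, `is_twss`) are all invented for illustration, and the real system presumably learns such things from data rather than from a list someone typed by hand.

```python
import re

# Hypothetical word lists, made up for this sketch (not from the paper).
EUPHEMISM_NOUNS = {"banana", "package", "sausage"}            # characteristic (1)
SUGGESTIVE_CUES = {"hard", "big", "long", "hold", "fit", "put"}  # stand-in for (2)


def tokenize(sentence):
    """Lowercase the sentence and split it into simple word tokens."""
    return re.findall(r"[a-z']+", sentence.lower())


def twss_score(sentence):
    """Return a crude score: weighted hits per token (assumed weights)."""
    tokens = tokenize(sentence)
    if not tokens:
        return 0.0
    noun_hits = sum(t in EUPHEMISM_NOUNS for t in tokens)
    cue_hits = sum(t in SUGGESTIVE_CUES for t in tokens)
    # Euphemistic nouns count double; the factor 2 is an arbitrary choice.
    return (2 * noun_hits + cue_hits) / len(tokens)


def is_twss(sentence, threshold=0.25):
    """Flag the sentence if its score reaches an assumed threshold."""
    return twss_score(sentence) >= threshold


if __name__ == "__main__":
    for s in ["That banana is too big to fit", "The meeting starts at noon"]:
        print(s, "->", is_twss(s))
```

A hand-made list like this is exactly the kind of word-based approach that goes stale the moment slang moves on, which is presumably why the researchers bothered with sentence structure as well.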

Because I am an idealist, I want to believe that this software--which is called Double Entendre via Noun Transfer, or DEviaNT--is perfect. I want to believe that I could tell it anything remotely double-entendrish and it would titter on cue.

My idealism rarely gets me anywhere. For, though I am sure theirs is a deep and worthy quest, there is some way to go. The New Scientist says there was a 70 percent success rate (with the scientists themselves claiming that with just a little better data, they could push this to 99.5 percent).

However, the scientists' abstract says: "Experiments on Web data demonstrate that our approach improves precision by 12 percent over baseline techniques that use only word-based features."
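Precision, for what it's worth, has a specific meaning here: of all the sentences the software flags as jokes, what fraction really are jokes. A minimal sketch of that calculation follows, with made-up labels; the `precision` helper and the sample data are assumptions for illustration and have nothing to do with the researchers' actual experiments.

```python
def precision(predicted, actual):
    """Fraction of flagged items that are truly positive.

    predicted and actual are equal-length lists of booleans.
    """
    # Keep the true label of every item the detector flagged.
    flagged = [a for p, a in zip(predicted, actual) if p]
    if not flagged:
        return 0.0  # nothing was flagged, so report zero by convention
    return sum(flagged) / len(flagged)


if __name__ == "__main__":
    # Invented example: four sentences, three flagged, two of those correct.
    predicted = [True, True, False, True]
    actual = [True, False, False, True]
    print(precision(predicted, actual))
```

Note that precision says nothing about the jokes the software misses entirely, so a detector can be precise and still sit stony-faced through most of the evening.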

I assume, therefore, that computers are 12 percent closer to being a little funnier--or rather to knowing when some fickle human is trying to be a little funnier.

Some might wonder whether this is all worth it, especially as double-entendres don't ever stand still. They seem to expand into new areas, with new nuances being thrust upon old words.

As if to nudge my suspicion, one paragraph from the scientists' work made me laugh, perhaps unreasonably: "Let SN be an open set of sexually explicit nouns. We manually approximated SN with a set of 76 nouns that are predominantly used in sexual contexts."

Will you be doing any manual approximation today?


