NLP is Hard

Thursday 17 September 2015

One of the areas often associated with Artificial Intelligence is that of Natural Language Processing, or NLP: the ability to communicate and interact with a computer using normal language, in my case English.

NLP is hard. This is easily demonstrated with the phrase:

“Time flies like an arrow, fruit flies like a banana.”

The construct “subject flies like object” can have two different meanings, depending on whether you interpret it as “noun verb preposition object” (something flies in the manner of the object) or “noun noun verb object” (a kind of fly is fond of the object). How we choose to interpret the sentence is based on context that we infer. There is, to my knowledge, no species of fly called the time fly, so in the first clause “flies” must be a verb and “like” a preposition. Similarly, while it is grammatically correct to assert that all fruit shares the flight characteristics of a banana, it is unlikely that all fruit does.

Thus it is more likely that in the second clause we are talking about the fruit fly, and “like” is a verb.
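
To make the ambiguity concrete, here is a minimal sketch in Python. The lexicon, the tag names and the two clause patterns are all made up for this illustration and are nowhere near a real grammar; the point is only that grammar alone accepts both readings for both clauses.

```python
from itertools import product

# Toy lexicon (an assumption for this sketch): each word maps to the
# parts of speech it is allowed to take.
LEXICON = {
    "time": ["NOUN"],
    "fruit": ["NOUN"],
    "flies": ["NOUN", "VERB"],
    "like": ["VERB", "PREP"],
    "an": ["DET"],
    "a": ["DET"],
    "arrow": ["NOUN"],
    "banana": ["NOUN"],
}

# Two hypothetical clause patterns, one per reading discussed above.
PATTERNS = {
    ("NOUN", "VERB", "PREP", "DET", "NOUN"): "subject flies in the manner of the object",
    ("NOUN", "NOUN", "VERB", "DET", "NOUN"): "a kind of fly is fond of the object",
}


def readings(clause):
    """Return every reading whose tag pattern fits the clause."""
    words = clause.lower().split()
    # Try every combination of tags the lexicon allows for these words.
    candidates = product(*(LEXICON[word] for word in words))
    return [PATTERNS[tags] for tags in candidates if tags in PATTERNS]


for clause in ("time flies like an arrow", "fruit flies like a banana"):
    print(clause, "->", readings(clause))
```

Both clauses come back with both readings. Choosing between them takes knowledge that is not in the grammar, such as the fact that fruit flies exist and time flies do not.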

But it gets worse. In our new offices we have a Board Room. As expected it contains a long table with rounded ends. You’d be forgiven for thinking that it’s called the Board Room because it contains a board room table. But then the meeting room also contains a similar table.

Perhaps it’s where we expect the board to meet. After all, it’s got the projector in it, and is the nicer of the two meeting rooms. Or perhaps it got the name because there is a blackboard wall, and the name quite literally means “the room with the blackboard in it”.

I honestly can’t remember when and why we started calling it the Board Room. Nor can we determine the exact reasoning through inference. We can only determine probable reasons.

It might be possible, by careful questioning of Rainbird members and previous tenants of the office, to determine a most probable reason for the name. An interesting etymological exercise, but pointless for NLP.

To fully understand the term “Board Room” you don’t want to determine whether it’s “board” (committee) or “board” (thin, flat piece of hard material). Instead you want to determine that it’s “Board Room”, a label for a location, whose exact meaning and derivation are not important.
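
One simple way to get that behaviour is to look up known multi-word names before looking at individual words. The sketch below assumes a hand-written list of location names and a naive whitespace tokeniser, both invented for illustration; it is not how any particular NLP system does it.

```python
# Hypothetical list of two-word location names, stored as word pairs.
LOCATIONS = {("board", "room"), ("meeting", "room")}


def label_tokens(text):
    """Label known two-word location names as a single unit."""
    words = text.lower().split()
    labelled = []
    i = 0
    while i < len(words):
        pair = tuple(words[i:i + 2])
        if pair in LOCATIONS:
            # Treat the pair as one opaque label; "board" is never
            # interpreted on its own.
            labelled.append((" ".join(pair), "LOCATION"))
            i += 2
        else:
            labelled.append((words[i], "WORD"))
            i += 1
    return labelled


print(label_tokens("See you in the Board Room"))
```

Because the name is matched as a whole, the question of which sense of “board” is meant never comes up.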

And if you think that’s bad, consider that “Buffalo buffalo Buffalo buffalo buffalo buffalo Buffalo buffalo” is a valid American English sentence. NLP is hard enough for humans, let alone computers.