Higher-dimensional vectors used to express meaning of words and sentences to computers
[quote]AS YOU read this article, your brain not only takes in individual words, but also combines them to extract the meaning of each sentence. It is a feat any competent reader takes for granted, but it's beyond even the most sophisticated of today's computer programs. Now their abilities may be about to leap ahead, thanks to a form of graphical mathematics borrowed from quantum mechanics.
"It's important for people like Google," says physicist Bob Coecke at the University of Oxford, who is pioneering the new approach to linguistics. At the moment computers "only understand sentences as a bag of different words without any structure".
Coecke's approach, aired at a recent workshop in Oxford, is based on category theory, a branch of mathematics that allows different objects within a collection, or category, to be linked. This makes it easy to express a problem in one area of mathematics as a problem in another, but for many years was viewed even by its creators as "general abstract nonsense".
That changed when Coecke and his colleague Samson Abramsky used a graphical form of category theory to formulate some problems in quantum mechanics in a way that can be understood more intuitively. It provided a way to link quantum objects, written as vectors, to each other. That's useful for representing quantum teleportation, say, when information passes instantaneously between certain locations via a specific route.
Coecke likens traditional approaches to such problems to watching television at a pixel level. "Rather than seeing the image, you get it in terms of 0s and 1s," he says. "It wouldn't mean anything to you." By translating quantum mechanical processes into pictures, higher-level structures become apparent.
More recently, Coecke, together with Mehrnoosh Sadrzadeh, also at Oxford, and Stephen Clark, now at the University of Cambridge, realised this graphical mathematics might also be useful in computational linguistics. The field aims to create a universal "theory of meaning" in which language and grammar are encoded in a set of mathematical rules.
Computers could, in principle, use the rules to make sense of language. In practice, most existing models of human language focus either on the meaning of individual words, allowing search engines to work out the general context of a web page, or on the rules of grammar, but not both.
To produce a model that uses the rules of grammar to encode the meaning of sentences, Coecke and his colleagues had to combine the existing model types. To do this, they adopted the graphical approach Coecke had developed for use in quantum mechanics.
Existing models for word meanings define words as vectors in a high-dimensional space, in which each dimension represents some key attribute. So the vector for "dog" might include the vectors for "eat", "sleep" and "run". "Cat" might be generated by a combination of similar words to "dog", but "banker" would be built from quite different words, such as "money" and "work". Defining words in this way allows a dictionary to be represented as a "neighbourhood" of words, with the distances between residents in the high-dimensional space defined by their vectors. The vector representations of "dog" and "cat" would ensure that these words live much closer to each other than either does to "banker".
Now Coecke's team has created a similar neighbourhood for sentences. To create a vector for a sentence, Coecke has devised an algorithm to connect individual words, using the graphical links that were developed to model the flow of quantum information. In this case, the links embody basic grammatical rules, such as the way the word "likes" can be linked to "John" or "Mary", and the different way it can be linked to the word "not" (see diagram).
The team has already shown that the method allows the two sentences "John likes Mary" and "John does not like Mary" to be represented as vectors and placed at the appropriate location. That's no small feat: while anyone who can read English knows that these sentences are directly opposite, to a computer this isn't obvious. The work will be published in the journal Linguistic Analysis.
Most sentences have more nuanced relationships than these two examples. The next stage of Coecke's work allows more complex sentences to be represented as vectors, with the vectors that represent verbs taking into account the meaning of their subject and object nouns. This ensures that "dogs chase cats" gets assigned a vector placing it closer in sentence space to "dogs pursue kittens" than to "cats chase dogs". This work will be presented next month at the International Conference on Computational Semantics.
The team plans to train the new system on a billion pieces of text, starting with formal, carefully written legal or medical documents which should be relatively easy to parse. From there they will work their way up to more challenging extracts such as ambiguous sentences or sloppily written pages on the web.
It is not yet clear whether the insights gained so far can deal with all the nuances of language. Sebastian Pado, who studies computational linguistics at Heidelberg University in Germany, says that Coecke's team needs to show its method working on text from the real world, rather than specially prepared examples. Coecke agrees: "We have shown many proof-of-concept examples which have been crafted by hand, but to really convince the whole world this is the way to do things, you need a huge experiment."[/quote]
[url=http://www.newscientist.com/article/mg20827903.200-quantum-links-let-computers-understand-language.html]Source, requires registration[/url]
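For anyone wondering what "words as vectors" and "sentences as vectors" from the article could look like in practice, here is a rough Python sketch. Everything in it is an assumption made for illustration: the numbers are invented, the context words and the names `WORD_VECTORS`, `NOUNS`, `VERBS` and `sentence_vector` are hypothetical, and the sentence step (a verb matrix multiplied element-wise with the subject/object outer product) is a simplified stand-in, not the algorithm Coecke's team actually uses.

```python
import math

# Toy numbers invented for illustration. Each dimension counts how often a
# word shows up near the context words ("eat", "sleep", "run", "money", "work").
WORD_VECTORS = {
    "dog":    [8.0, 6.0, 9.0, 0.0, 1.0],
    "cat":    [7.0, 9.0, 5.0, 0.0, 0.0],
    "banker": [2.0, 1.0, 0.0, 9.0, 8.0],
}


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Sentence part: nouns live in a tiny 2-d space ("barks", "purrs").
NOUNS = {
    "dogs":    [9.0, 1.0],
    "cats":    [1.0, 9.0],
    "kittens": [2.0, 8.0],
}

# Hypothetical verb representation: one weight per (subject dim, object dim)
# pair, so the verb "takes into account" both of its nouns.
VERBS = {
    "chase":  [[0.5, 1.0], [1.0, 0.5]],
    "pursue": [[0.4, 0.9], [1.0, 0.6]],
}


def sentence_vector(subject, verb, obj):
    """Flattened verb matrix weighted element-wise by the subject/object outer product."""
    s, o, m = NOUNS[subject], NOUNS[obj], VERBS[verb]
    return [m[i][j] * s[i] * o[j] for i in range(len(s)) for j in range(len(o))]


if __name__ == "__main__":
    print("dog vs cat:   ", cosine_similarity(WORD_VECTORS["dog"], WORD_VECTORS["cat"]))
    print("dog vs banker:", cosine_similarity(WORD_VECTORS["dog"], WORD_VECTORS["banker"]))

    base = sentence_vector("dogs", "chase", "cats")
    print("vs 'dogs pursue kittens':",
          cosine_similarity(base, sentence_vector("dogs", "pursue", "kittens")))
    print("vs 'cats chase dogs':    ",
          cosine_similarity(base, sentence_vector("cats", "chase", "dogs")))
```

With these made-up numbers, "dog" scores closer to "cat" than to "banker", and "dogs chase cats" scores closer to "dogs pursue kittens" than to "cats chase dogs". Swapping subject and object transposes the outer product, which is why word order matters here even though both sentences use exactly the same words.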
this is pretty cool and a great step forward for computer rights, because as we all know, to the vector goes the spoils.
[quote=OP]AS YOU read this article, your brain not only takes in individual words, but also combines them to extract the meaning of each sentence.[/quote]
Call me retarded if necessary, but I genuinely never considered that my brain does this.
[QUOTE=Hellduck;26670467]Call me retarded if necessary, but I genuinely never considered that my brain does this.[/QUOTE]
Well, if it didn't, "not bad" and "terrible" would seem to mean the same thing to you.
[QUOTE=DainBramageStudios;26670416][url=http://www.newscientist.com/article/mg20827903.200-quantum-links-let-computers-understand-language.html]Source, requires registration[/url][/QUOTE]
That's fine, nobody reads the source anyway.
[QUOTE=not_Morph53;26670578]That's fine, nobody reads the source anyway.[/QUOTE]
I'm tempted to put meatspin or something as the source and watch as I get perma'd out of the blue by some mod who was reading old articles years later
Thanks for posting FORBIDDEN NEWS from New Scientist, man. I was pissed when they made their site subscription-only.
[QUOTE=Turnips5;26670672]Thanks for posting FORBIDDEN NEWS from New Scientist, man. I was pissed when they made their site subscription-only.[/QUOTE]
I actually have a subscription for it but whenever I enter the details it rejects it
so now I'm restricted to looking at articles that are less than a week old with my free account
[QUOTE=DainBramageStudios;26670848]I actually have a subscription for it but whenever I enter the details it rejects it
so now I'm restricted to looking at articles that are less than a week old with my free account[/QUOTE]
I live at university. Everything is free.
Except the university. :v:
[QUOTE=Nerts;26670512]Well, if it didn't, "not bad" and "terrible" would seem to mean the same thing to you.[/QUOTE]
Oh, I understand it. It's just that I never considered it as such.
[QUOTE=Hellduck;26671069]Oh, I understand it. It's just that I never considered it as such.[/QUOTE]
yeah, the things that computers find easy we find incredibly hard and vice-versa
Didn't Microsoft do this like half a decade ago and shit didn't work?
I've always been puzzled by why programming language recognition would be so hard, considering the fact that it is an almost entirely mechanical process. If not completely. Computers should *excel* at this.
[QUOTE=BmB;26672963]Didn't Microsoft do this like half a decade ago and shit didn't work?
I've always been puzzled by why programming language recognition would be so hard, considering the fact that it is an almost entirely mechanical process. If not completely. Computers should *excel* at this.[/QUOTE]
Sure, analyzing the sentences probably isn't too difficult, but we humans use humor, sarcasm, figures of speech, all kinds of stuff that makes it extremely difficult.
[QUOTE=noctune9;26673319]Sure, analyzing the sentences probably isn't too difficult, but we humans use humor, sarcasm, figures of speech, all kinds of stuff that makes it extremely difficult.[/QUOTE]
No we don't.
We do.
[QUOTE=not_Morph53;26673674]No we don't.[/QUOTE]
Uuh, what? Did his point fly right over your head?
[QUOTE=not_Morph53;26673674]No we don't.[/QUOTE]
[i]SURE[/i] we don't...
[QUOTE=HeatPipe;26673770]We do.[/QUOTE]
[QUOTE=Sgt Doom;26673776]Uuh, what? Did his point fly right over your head?[/QUOTE]
Mods, these are computers, ban them.
If I say to my friend "Nice watch!", a computer could read it as any of these:
"[good watch item]!"
"[good], [watch friend]!"
"[obsolete watch]!"
"[obsolete], [watch friend]!"
The list gets more complicated once you introduce context. I already introduced a bit by assuming the computer knows about the friend and the friend's watch. The thing is, even with all that, it could still be sarcasm or a joke.
You know what's really crazy? You're doing something right now that's easiest to explain in terms of extra-dimensional vectors.
[QUOTE=ASmellyOgreV2;26674162]You know what's really crazy? You're doing something right now that's easiest to explain in terms of extra-dimensional vectors.[/QUOTE]
I'm sure I do.
[QUOTE=BmB;26672963]I've always been puzzled by why programming language recognition would be so hard, considering the fact that it is an almost entirely mechanical process. If not completely. Computers should *excel* at this.[/QUOTE]
I'm always puzzled by why teaching humans to calculate advanced mathematics would be so hard, considering the fact that it is an almost entirely methodical process. If not completely. Organics should *excel* at this.
:science:: it works, bitches.
[QUOTE=BmB;26672963]Didn't Microsoft do this like half a decade ago and shit didn't work?
I've always been puzzled by why programming language recognition would be so hard, considering the fact that it is an almost entirely mechanical process. If not completely. Computers should *excel* at this.[/QUOTE]
Quite simple: language isn't mechanical at all. Language is very much about nuance. You can't actually believe language is as simple as "mechanics". Language is the most complicated thing I can really think of. When you start breaking it down beyond a grade-school level, it devolves into thousands of assumptions and syntax and allegories and symbolisms. It's really not simple at all.
Sentience here we come!
Does this mean google will finally be able to translate my kawaii otaku-chan japanese hentai discussion forums properly without any inaccuracy?
[QUOTE=NickFury6;26680441]Does this mean google will finally be able to translate my kawaii otaku-chan japanese hentai discussion forums properly without any inaccuracy?[/QUOTE]
yes, we can rest easy now with our kawaii anime girl pillow xD
[QUOTE=Hellduck;26670467]Call me retarded if necessary, but I genuinely never considered that my brain does this.[/QUOTE]
your brain does a lot of things that nothing besides humankind has any chance of ever doing. there's a reason why we're so much more intelligent than even the 2nd most intelligent animal
[editline]13th December 2010[/editline]
[QUOTE=HumanAbyss;26677702]Quite simple: language isn't mechanical at all. Language is very much about nuance. You can't actually believe language is as simple as "mechanics". Language is the most complicated thing I can really think of. When you start breaking it down beyond a grade-school level, it devolves into thousands of assumptions and syntax and allegories and symbolisms. It's really not simple at all.[/QUOTE]
Slang and culture affect language heavily, and that's what computers would be hard-pressed to account for