People use the five senses (sight, sound, touch, and so on) to send and receive information. Humans do this in innumerable formats: written and spoken language, facial expressions, intonation, physical language like hugs, and more. Each of them is highly nuanced. For example, the slightest twitch of an eye can differentiate surprise from sarcasm. This is why it takes a computing system with about 100 billion nodes, i.e. our brain, for two people to communicate successfully.
This kind of processing power is not yet available outside nature. So we engineers do what we are best at: pick a niche and assume that nothing else is important.
If we focus on the simplest of these formats, the language, we are still faced with insane complexity. Language seems the simplest because it is consensus-based. Two persons cannot exchange information unless they agree on a language. Hence language has been codified. This is very useful for engineers who are trying to use structured, quantitative computing science for emulating human behavior.
The trouble is that this consensus-based codification is highly local: geographically, temporally, professionally, demographically, or along any other axis you care to cut it. For example, a sentence like “Today is a good day.” means completely different things to people living in San Francisco than to people living in Byrd Station, Antarctica. For people living in Washington DC, it means different things in August than in February. On the same day in Washington DC, say on the day of the Supreme Court’s decision on DACA, this sentence means different things to people on the political far right than to those on the left.
To the ~80% of Earth’s population that does not understand English, the sentence does not mean anything at all, but let us engineers do the niche magic again and ignore them for a while.
The same idea can be expressed using different words. For example, “launch”, “unveil”, “release”, “unlock”, “announce”, “beta test” and “introduce” may mean the same event in some contexts. The same fact can also be colored in different sentiments or related to multiple other facts in multiple ways based on opinions of different authors.
So any decent solution for understanding natural language (Natural Language Understanding, or NLU; engineers are not that creative in making acronyms either) must be able to capture the ideas expressed by a given set of keywords and, vice versa, the almost uncountable set of permutations of keywords that express the same idea. This mapping changes with a multivariate context (geography, time of day, and what not), as we saw earlier. In short, to emulate even the simplest form of human communication, we need to capture Big Data out of every small page.
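To make the two-way mapping concrete, here is a minimal Python sketch. Everything in it is an assumption made up for illustration: the context labels, the idea names, and the tiny hand-written table. A real system would have to learn such a mapping from data across far more contexts.

```python
# Toy (context, keyword) -> idea table. All entries are invented for illustration.
KEYWORD_TO_IDEA = {
    ("product", "launch"): "product_release",
    ("product", "unveil"): "product_release",
    ("product", "release"): "product_release",
    ("product", "introduce"): "product_release",
    ("aerospace", "launch"): "rocket_liftoff",
    ("legal", "release"): "release_from_custody",
}


def idea_for(context, keyword):
    """Return the idea a keyword expresses in a given context, or None if unknown."""
    return KEYWORD_TO_IDEA.get((context, keyword.lower()))


def keywords_for(context, idea):
    """Return every keyword that expresses the given idea in the given context."""
    return sorted(
        keyword
        for (ctx, keyword), mapped_idea in KEYWORD_TO_IDEA.items()
        if ctx == context and mapped_idea == idea
    )


print(idea_for("product", "launch"))    # same word ...
print(idea_for("aerospace", "launch"))  # ... different idea in another context
print(keywords_for("product", "product_release"))
```

Even this toy table needs one entry per context and keyword pair, which hints at how quickly the real mapping grows.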
Many current solutions try to make a good guess as soon as they come across a token. For example, Stanford’s NLP kit assigns parts of speech based on complex sentence parsing models built over statistical dictionaries. Another approach, as with Coseer’s Calibrated Quantum Mesh, is to treat every possibility as a hypothesis. As these hypotheses are processed against each other in increasingly constrained steps, many permutations simply perish. For example, in the sentence “A report is due.”, we can make two hypotheses about “report”: verb or noun. When we inter-relate these with hypotheses on the other words in the sentence, the verb hypothesis is voided. In either case, even the simplest decisions require immense computing power.
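The “A report is due.” example can be sketched in a few lines of Python. To be clear, this is not the Calibrated Quantum Mesh or Stanford’s parser; it is a toy constraint-pruning loop, and the lexicon, tag names, and list of allowed neighbouring tags are all assumptions invented for this one sentence.

```python
# Candidate parts of speech per word (a tiny hand-written lexicon, assumed).
LEXICON = {
    "a": {"DET"},
    "report": {"NOUN", "VERB"},
    "is": {"VERB"},
    "due": {"ADJ"},
}

# Pairs of tags (left, right) allowed to sit next to each other (toy grammar, assumed).
ALLOWED_PAIRS = {
    ("DET", "NOUN"),
    ("DET", "ADJ"),
    ("ADJ", "NOUN"),
    ("NOUN", "VERB"),
    ("VERB", "ADJ"),
    ("VERB", "DET"),
}


def prune(words):
    """Keep every tag hypothesis until no neighbouring hypothesis can support it."""
    hypotheses = [set(LEXICON[word.lower()]) for word in words]
    changed = True
    while changed:  # repeat until a full pass removes nothing
        changed = False
        for i, tags in enumerate(hypotheses):
            for tag in list(tags):
                left_ok = i == 0 or any(
                    (left, tag) in ALLOWED_PAIRS for left in hypotheses[i - 1]
                )
                right_ok = i == len(words) - 1 or any(
                    (tag, right) in ALLOWED_PAIRS for right in hypotheses[i + 1]
                )
                if not (left_ok and right_ok):
                    tags.discard(tag)  # this hypothesis perishes
                    changed = True
    return hypotheses


for word, tags in zip("A report is due".split(), prune("A report is due".split())):
    print(word, sorted(tags))
```

Here the verb hypothesis for “report” is dropped because no determiner-verb pair is allowed after “A”. With realistic lexicons and grammars, the number of hypotheses to cross-check grows very quickly, which is where the computing cost comes from.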
Let’s make this real. At 9:30 pm Pacific Time on June 6, 2020, we ran a basic Finder module on www.nytimes.com. This module focuses only on the spatial position of text elements and creates hypotheses on basic facts encountered in the text. The output is then used by multiple sophisticated algorithms to solve real problems. When we printed the JSON for the NY Times, the basic-mode output for a single page ran to 1.7 million lines. For our solutions for the Finance and Pharma sectors, our servers process 1–1.5 million documents every day. In other words, even for the simplest of our solutions, we need to process data points running into the trillions.
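A quick back-of-the-envelope check of that “trillions” figure, as a Python sketch. It assumes, purely for illustration, that every document produces about as many output lines as the NY Times page did; real documents will vary widely, so treat the result as an order of magnitude only.

```python
LINES_PER_DOCUMENT = 1_700_000            # from the NY Times run described above
DOCUMENTS_PER_DAY = (1_000_000, 1_500_000)  # stated daily range, low and high


def daily_lines(lines_per_document, documents):
    """Total output lines per day if every document is as rich as the sample page."""
    return lines_per_document * documents


low, high = (daily_lines(LINES_PER_DOCUMENT, docs) for docs in DOCUMENTS_PER_DAY)
print(f"{low:.2e} to {high:.2e} lines per day")  # prints "1.70e+12 to 2.55e+12 lines per day"
```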
With the developments in Big Data and cloud technologies, it has become possible to process this kind of information on commoditized hardware, so we can now attempt to emulate our brain and get computers to do things that have been exclusively human until now. Managing this ambiguity through explosive data permutations, within latencies that are sensible for real-time applications, is what NLU is all about.