A Glimpse Into the NLP Innovation at Hypersonix
Hypersonix has been identified as one of the hottest startups in the big data and AI space. One of the main USPs of Hypersonix is an internal metric called Time-To-Insight (TTI), which measures the time a business user takes to reach an actionable insight from the moment they land on our product’s landing page. We aim to keep TTI within seconds or low single-digit minutes. This is enabled by a key aspect of our NLP innovation, our natural language engine, which allows our users to get to insights at the speed of thought.
We have invested (and continue to invest) considerable resources in researching the fastest ways in which users can gain actionable insights through seamless inputs and interactions (User Experience, or UX), and have found that a combination of natural language UX and visual UX gets users there fastest. When a user needs an insight, the first thought in the user’s mind takes the form of a natural language query, i.e., the formulation of the question that needs to be answered (the “what”), followed by how they can get that answer (the “how”). The latter usually takes the form of dashboards, which existing BI tools construct in a visual medium by drag-and-drop of the necessary measures, dimensions and filters. At Hypersonix, we enable the user to engage with the platform right from the “what” and still offer the visual flexibility and familiarity of existing BI UX.
At Hypersonix, we have developed a custom NLP/NLU engine (internally codenamed Mozart) to handle a variety of NLP tasks outlined below.
Entity Extraction and Entity Resolution
While entity extraction is a well-known and well-researched problem, Mozart implements this module with a custom algorithmic approach to meet our customers’ enterprise-level accuracy and determinism requirements. While Named Entity Recognizers (NER) built on CRFs or language models are the go-to solutions for this task, they do not scale well to the level of multi-tenancy and the sheer number of dimensions that Hypersonix operates at.
- Hypersonix enables customers to search and filter against hundreds of dimensions that can take up to a million unique values.
- Being a multi-tenant SaaS company, we cater to multiple customers across industries, and each of these customers has multiple unique dimensions and filters.
Entity Extraction
Mozart identifies multiple entities from any input query in real-time. For example:
Entity Resolution
Once the entities are identified, Mozart needs to resolve these entities to their intended entity types.
- “sales” is resolved to the “net_sales” measure
- “avg check” is resolved to the “avg_check” measure
- “store” is resolved to the “store_name” dimension
- “last month” is resolved to a datetime entity and is parsed
- “California” is resolved to a filter that needs to be applied on the “state_name” dimension. While this is straightforward, we also let the user map this filter to other related dimensions, such as “region”, if available.
- “soda” is also a filter, but it needs special ambiguity resolution before it can be mapped to a dimension. Most customers have items such as “Kids Soda” and “16Oz Soda” across dimensions in their product hierarchy, none of which exactly match the input.
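The extraction and resolution steps above can be illustrated with a small, deterministic dictionary lookup. This is only a sketch under stated assumptions, not Mozart's actual algorithm: the lexicon, the function names, and the dimension names "item_name" and "category" are hypothetical, and a greedy longest-match scan stands in for whatever custom approach Mozart uses. The second function shows one simple way to surface candidates for an ambiguous term like “soda” that has no exact match.

```python
from typing import Dict, List, Tuple

# Hypothetical lexicon for one tenant: surface phrase -> (entity type, resolved name).
LEXICON: Dict[str, Tuple[str, str]] = {
    "sales": ("measure", "net_sales"),
    "avg check": ("measure", "avg_check"),
    "store": ("dimension", "store_name"),
    "last month": ("datetime", "last_month"),
    "california": ("filter", "state_name"),
}

# Longest phrase in the lexicon, measured in tokens.
MAX_PHRASE_TOKENS = max(len(phrase.split()) for phrase in LEXICON)


def extract_entities(query: str) -> List[Tuple[str, str, str]]:
    """Greedy longest-match scan: returns (phrase, entity type, resolved name)."""
    tokens = query.lower().split()
    found = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate phrase first so "avg check" beats "avg".
        for n in range(min(MAX_PHRASE_TOKENS, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in LEXICON:
                kind, name = LEXICON[phrase]
                found.append((phrase, kind, name))
                i += n
                break
        else:
            # No phrase starts at this token; skip it.
            i += 1
    return found


def candidate_values(
    term: str, values_by_dimension: Dict[str, List[str]]
) -> List[Tuple[str, str]]:
    """Return (dimension, value) pairs whose value contains the term as a word."""
    term = term.lower()
    hits = []
    for dimension, values in values_by_dimension.items():
        for value in values:
            if term in value.lower().split():
                hits.append((dimension, value))
    return hits
```

Because the lookup is a plain dictionary match, the same query always resolves the same way, which is the determinism property the section emphasizes. For “soda”, `candidate_values` returns every matching value together with its dimension, leaving the final choice to a later disambiguation step or to the user.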
Semantic Dependency Parsing
Even though our NLP interface is positioned as a “search bar” (to evoke UX familiarity with widely used search engines such as Google and Bing), its internal workings are far from how a normal search engine works. A standard search engine need not “understand” the query to bring up relevant results, since it simply brings up the closest matching documents or websites. At Hypersonix, we need to fully “understand” the query, i.e., know exactly what the user is asking for, since we need to run either a SQL query or one of our Intelligence Engines in order to provide the answer. While search engines can provide multiple closely matching results to the user, we need to provide a single, accurate result that answers the user’s query. This is where Mozart’s Dependency Parsing module comes into the picture.
Consider the following question:
“Show me top fifteen gross sales of all stores in california last quarter whose net sales is below $10000”
Short Range Dependency Parsing
Some queries require only short-range dependency parsing, where a condition sits right next to the entity it modifies. For example:
“Stores with sales above 50k last year”
Long Range Dependency Parsing
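Other queries require long-range dependency parsing, where a condition is separated from the entity it modifies by other parts of the query. For example: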
“Show me transaction count of all stores last fiscal quarter under 50k”
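The difference between the two cases can be sketched with a toy attachment rule. The code below assumes the entities have already been extracted and tagged, and it links each condition to the nearest preceding measure, skipping over any dimensions or datetime entities in between. The entity labels, the function name, and the rule itself are illustrative assumptions, not a description of Mozart's parser.

```python
from typing import List, Tuple

# Each entity is (text, kind). The kinds are hypothetical labels for this sketch.
Entity = Tuple[str, str]


def attach_conditions(entities: List[Entity]) -> List[Tuple[str, str, int]]:
    """Link each condition to the nearest preceding measure.

    Returns (measure, condition, distance), where distance is the number
    of entities sitting between the measure and the condition.
    """
    links = []
    for i, (text, kind) in enumerate(entities):
        if kind != "condition":
            continue
        # Walk backwards until a measure is found.
        for j in range(i - 1, -1, -1):
            if entities[j][1] == "measure":
                links.append((entities[j][0], text, i - j - 1))
                break
    return links


# "Stores with sales above 50k last year"
short_range = [
    ("stores", "dimension"),
    ("sales", "measure"),
    ("above 50k", "condition"),
    ("last year", "datetime"),
]

# "Show me transaction count of all stores last fiscal quarter under 50k"
long_range = [
    ("transaction count", "measure"),
    ("stores", "dimension"),
    ("last fiscal quarter", "datetime"),
    ("under 50k", "condition"),
]
```

In the first query the condition attaches at distance 0; in the second it has to reach back across two intervening entities to find “transaction count”. A real parser needs more than a nearest-measure rule, but the sketch shows why the second query is harder.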
Intent Identification
Intent identification, in NLP, refers to classification of an input natural language sentence into target classes and is usually used to determine the intent of the user. Unlike standard NLP intent classifiers that use the natural language input in raw form, Mozart uses the parse tree to determine the users’ intent.
“User Intent” has multiple meanings within Hypersonix.
- Firstly, Hypersonix’s customers tend to have multiple entities (and data sources), and we need to determine accurately, from the natural language input, which entity answers the user’s query.
- Secondly, while most queries can be answered directly with standard SQL, some queries require us to run custom Data Science microservices in order to arrive at the answer.
Mozart’s Intent Classification module uses the parse tree generated in the previous module in order to identify the entity (and data source) that needs to be queried as well as the type of Answering Module that needs to execute to answer a user’s query.
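As a rough illustration of the two decisions described above, the sketch below takes a simplified parse result and returns both a data source and an answering module. Everything in it is hypothetical: the `ParsedQuery` structure, the catalog of data sources, and the operations ("forecast", "anomaly") that trigger the data science path are invented for the example and are not taken from Mozart.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class ParsedQuery:
    """Simplified stand-in for a parse tree: resolved measures plus operations."""
    measures: List[str]
    operations: List[str] = field(default_factory=list)


# Hypothetical catalog: data source -> measures it can answer.
CATALOG: Dict[str, Set[str]] = {
    "pos_transactions": {"net_sales", "gross_sales", "transaction_count"},
    "labor": {"labor_hours", "labor_cost"},
}

# Hypothetical operations that plain SQL cannot answer.
DATA_SCIENCE_OPS = {"forecast", "anomaly"}


def classify_intent(parsed: ParsedQuery) -> Tuple[str, str]:
    """Return (data source, answering module) for a parsed query."""
    needed = set(parsed.measures)
    # Pick the first source that holds every requested measure.
    source = next(
        (name for name, measures in CATALOG.items() if needed <= measures),
        None,
    )
    if source is None:
        raise ValueError(f"no single data source covers {sorted(needed)}")
    uses_data_science = bool(DATA_SCIENCE_OPS & set(parsed.operations))
    module = "data_science" if uses_data_science else "sql"
    return source, module
```

Working from the parsed structure rather than the raw text means the classifier sees resolved names such as "net_sales" instead of the many surface phrasings a user might type.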
Smart Suggestions – Butler
Using a natural language interface to query a BI tool is fairly novel, and most users are unfamiliar with the UX. Hypersonix is blazing the path of a natural language UX for BI by leveraging a sophisticated suggestions module that educates users on the types of queries that can be run and helps them explore the full capabilities of our product. We achieve this with our in-house custom suggestions module, codenamed Butler.
Suggest Query Completion
We cannot expect the user to always type a “complete” query that translates to a full SQL statement. Hence, while the user is typing, if Butler determines that it cannot construct a full SQL statement, it suggests components the user can add that would lead to a meaningful answer.
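One simple way to picture this check is as a set of slots that must be filled before a statement can be built. The sketch below assumes, purely for illustration, that a query needs at least a measure and a time range; the slot names and hint texts are hypothetical and are not Butler's actual rules.

```python
from typing import Iterable, List

# Hypothetical rule: these slots must be present before SQL can be generated.
REQUIRED_SLOTS = {
    "measure": 'add a measure, e.g. "sales"',
    "datetime": 'add a time range, e.g. "last month"',
}


def suggest_completions(entity_kinds: Iterable[str]) -> List[str]:
    """Return a hint for every required slot the partial query has not filled."""
    present = set(entity_kinds)
    return [hint for slot, hint in REQUIRED_SLOTS.items() if slot not in present]
```

A partial query that so far contains only a dimension would receive both hints, while one with a measure and a time range would receive none and could proceed to SQL generation.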
Suggest Customer Data
Given that our customers deal with millions of products across hundreds of stores and other dimensions, it is impractical to expect users to remember the values present in their own database. Hence, Butler surfaces values from the customer’s DB to help complete a query based on the natural language context.
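A minimal sketch of the lookup side of this feature is a prefix search over a sorted list of values. It assumes the relevant dimension has already been chosen from the query context and that its values are kept lowercased and sorted; the function name and the sample values are hypothetical. Binary search keeps each lookup fast even when a dimension holds a very large number of values.

```python
import bisect
from typing import List


def suggest_values(sorted_values: List[str], prefix: str, limit: int = 5) -> List[str]:
    """Return up to `limit` values starting with `prefix`.

    `sorted_values` must be lowercased and sorted in ascending order.
    """
    prefix = prefix.lower()
    # First position whose value is >= prefix; all matches are contiguous from here.
    start = bisect.bisect_left(sorted_values, prefix)
    matches = []
    for value in sorted_values[start:]:
        if not value.startswith(prefix) or len(matches) == limit:
            break
        matches.append(value)
    return matches


# Hypothetical values for a "state_name"-like dimension.
STATE_VALUES = sorted(
    v.lower() for v in ["California", "Colorado", "Connecticut", "Texas", "Calgary"]
)
```

Typing "co" against these sample values would surface "colorado" and "connecticut", so the user does not have to remember how values are spelled in their own database.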
Conversational AI
The conversational agent at Hypersonix, nicknamed Jarvix, is our personal digital analyst, capable of extracting insights from query results and presenting them in a succinct, digestible and actionable manner. Jarvix performs multiple activities throughout our product. These range from dialogue management, where Jarvix helps guide the user through a conversation to arrive at an actionable insight, to Natural Language Generation (NLG), where Jarvix converts complex insights and tabular data into easily digestible natural language snippets that enable the user to see what happened, why it happened and what the user can do about it. This specific area is an ongoing research focus at Hypersonix.
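To make the NLG step concrete, here is a deliberately simple template-based sketch that turns two numbers from a result table into a one-sentence “what happened” snippet. It is a baseline illustration only: the function name, parameters, and wording are hypothetical, and nothing here reflects how Jarvix actually generates language.

```python
def describe_change(measure: str, period: str, current: float, previous: float) -> str:
    """Summarize a measure's value and its change versus the prior period."""
    if previous == 0:
        # A percentage change is undefined without a non-zero baseline.
        return f"{measure} was {current:,.0f} {period}; there is no prior value to compare against."
    change = (current - previous) / previous * 100
    if change == 0:
        return f"{measure} was {current:,.0f} {period}, flat versus the prior period."
    direction = "up" if change > 0 else "down"
    return (
        f"{measure} was {current:,.0f} {period}, "
        f"{direction} {abs(change):.1f}% versus the prior period."
    )
```

A template like this only covers the “what happened” part; explaining why it happened and what to do about it requires the deeper analysis the section describes as ongoing research.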
Visit our website to learn more about Hypersonix. Or, you can meet Hypersonix in 90 seconds!
✎ by Sujay S Kumar, Architect and Lead Engineer at Hypersonix, Inc.