Category: Education

  • Humans — Search for Things Not for Strings

    Humans — Search for Things Not for Strings

    Information Retrieval (IR) systems are a vital component in the core of successful modern web platforms. The main goal of IR systems is to provide a communication layer that enables customers to establish a retrieval dialogue with underlying data. However, the immense explosion of unstructured data and new ways to interact with IR systems (voice, image…) drives modern search applications to go beyond just fuzzy string matching, to invest in a deep understanding of user queries through the interpretation of user intention (we like the term Query-Understanding) in order to respond with a relevant result set. In short: humans search for things, not for strings.

    How to Master the Challenge of locating Things not Strings

    Since the problem is not always obvious, I spent some time to find relevant real-life examples that do not require a lot of domain knowledge to judge the result quality.

    The good thing, here at searchHub, is that we currently have access to more than 15,000,000 unique search queries from the e-commerce domain. Why is it good? — well, by having access to this kind of information I was able to pick a very representative query — “women red dress size s” — which made it into our top-1000 fashion queries in terms of frequency and into the top-100 queries in terms of value per search. In essence, this query (and it’s variants) drives almost 1 Million USD in revenue per day for our customers!

    Having found an economically relevant query example I went on to check the results for this query, and it’s top-variant on one of the biggest retail-sites on the planet walmart.com and this is what I received as feedback:

    search result for “women red dress size s” on walmartcom

    search result for “red dress size s” on walmart.com

    I’m not going to criticize these results in detail, but I’m pretty confident that these results do not represent the best possible feedback the system could come up with in terms of economic performance. Why am I saying that? — Well, firstly, there is and will always be a strong correlation between result-size and search conversion rate— the reason for this lies in the “paradox of choice” with search-effort. The impact of result-size vs. search conversion rate increases even more the more specific a query gets.

    Unfortunately, still most of the domain owners in search think that this challenge can be solved by a learning-to-rank approach. But you can’t — learning-to-rank will not solve the underlying precision problem. I agree that you might be able to reduce the negative impact by learning to push more relevant products to the top or to the first page, but you’ll still miss 70–80% of the underlying potential!

    Let’s be honest: when somebody is willing to spend the effort to articulate a quite specific query like — “women red dress size s” — he has a specific intent and wants to be understood. — Essentially, the visitor is looking for a “red colored dress in size s.

    Time for an experiment 🙂

    Since Walmart.com is unfortunately not one of our customers yet 🙂 we don’t have access to qualitative query performance data. But to test our hypothesis, we did a simple experiment with real-life users.

    We used the faceted navigation result representing the intent of the given query “red dress size s” on walmart.com. We think that from a precision perspective, this would be the most relevant set of products Walmart could reply to the given query. Then we rebuild the page so that it matches perfectly a search results page in terms of look and feel.

    original faceted navigation result representing the intent for the given query “red dress size s” on walmart.com

     fake search result for the given query “red dress size s” on walmart.com based on the faceted navigation result

    We did the same for our top-3 variant queries, whilst again maintaining the same look and feel. Afterwards, we established a crowdFlower Human-In-The-Loop Search Relevance experiment with 100 users to receive some relevancy judgments for the different variations.

    Test results for variation 1 compared to test results for variation 2

    Test results for variation 3

    Results

    The results speak for themselves. Understanding the intent behind a search query is superior to string matching in isolation. And the reason for this is quite obvious — “people search for things rather than strings” —

    The Solution — Query Understanding (We Know “Things”)

    Search can and should be a gateway to open-ended exploration and discovery. Sometimes searchers want to be inspired, but most likely come to search with a specific need in mind. Here at searchHub, we work towards understanding user intent with the aim of bridging the gap between user intent and relevant results. We think a search engine should encourage searchers to express their specific needs by driving searchers to create search queries that are more specific. Our goal is to make the leap from query-to-intent less tedious to a more seamless search approach.

    Our Approach to Help Humans find Things

    Deciphering users’ ‘intent’ is the mind-reading game we have to continuously play with our customers. But how do we do that?

    searchHub’s stages of Query Understanding

    1. In the first stage, we start by improving the user queries themselves.

    Here we take advantage of our superior contextual query correction service which uses our concept of controlled precision reduction control. It automatically handles typos, misspellings, term decomposition, stemming and lemmatization at a yet unmatched level of accuracy and speed.

    2. In the second stage, we try to understand the intent/meaning of the query

    at this stage, the system tries to extract the entities and concepts mentioned in a query. As a result, search|hub predicts one or more possible intentions with a certain probability, which is particularly important for ambiguous queries.

    Most other solutions that attempt to tackle the same problem are using predefined knowledge bases / ontologies to represent dependencies and invoking meaning into the system. We spend a lot of time evaluating existing models and decided to do it differently because we wanted to build a system that learns at scale (different domains), can solve the language gap between visitors and catalog domain experts (wording and language) and even more essential automatically reshapes / optimizes and adapts itself based on explicit and implicit feedback. Therefore, we have developed an automated entity relationship extraction and management solution based on reinforced learning. Another benefit of this solution is that it does not waste computation time and money on building knowledge we don’t need to have 🙂

    3. Continuously test and learn which query/queries perform best for a given intent/meaning

    Another big differentiator of search|hub is that it is entirely data-driven. For quite several queries, there is no single best intent / meaning. In these cases, the right intent / meaning might be dependent on context (time, region, etc.) Therefore, we automatically multivariate test ambiguous queries and query intentions.

    The technology behind searchHub is specifically designed to enhance our customer’s existing search, not replace it. With just two API calls, searchHub integrates as a proxy between your frontend application and your existing search engine(s) injecting its deep knowledge.

    If you’re excited about advancing searchHub technology and enabling companies to create meaningful search experiences for the people around us, join us! We are actively hiring for senior Java DEVs and Data Scientists to work on next-generation API technology.

    www.searchhub.io proudly built by www.commerce-experts.com

  • The Future of Ecommerce Site Search

    The Future of Ecommerce Site Search

    Which On-Site Search Changes to Watch Most Closely

    E-commerce site search has been evolving dramatically recently, moving away from a legacy keyword-driven approach, that often yielded inaccurate and confusing results, towards a leaner, more intuitive approach that puts user experience at its heart.

    The adoption of several worthy to mention technologies played a fundamental role in this evolution.

    NLP and Self-Learning Technologies

    Natural language processing (NLP) has already started to make a big impact in search, and its take-up is likely to accelerate in the coming years. NLP takes search away from the simplistic “is this keyword present in the title or description” approach, and starts asking “what is the customer really asking for?”. By taking a semantic angle to evaluating search results, the results produced are far more accurate and relevant to the customer’s actual search intent. Factor in the self-learning capabilities of many NLP-based engines, and suddenly site search starts to look very different.

    In the past, keyword-based search algorithms have often led to something of a ‘chicken and egg’ situation, with website visitors trying to ‘guess’ what terms they need to enter in the search box, in order to get the results they want from their search.

    Clearly, this is not an ideal approach, and, historically, it has led to high levels of frustration for information seekers.

    Natural language processing (NLP) is a component of artificial intelligence (AI) that enables computer programs and functions to understand human speech as it is spoken. In commerce-oriented websites and apps, NLP supports meaning-based search, allowing shoppers to search for items in their own language while still producing relevant results, even if the search terms do not directly match keywords in product records.

    All of this is possible because of NLP-based search, switches the focus from keywords to the actual meaning. Taking a semantic approach means that search results have a ‘connection’ to the search terms, rather than having to actually contain those search terms.

    Therefore NLP search can deliver accurate results and a successful user experience, where traditional text-based searches would typically fail.

    NLP and Linguistic Nuances, Synonyms, Misspellings

    NLP in site search must be able to recognize similar (though not identical) search terms, based on individual searchers unique lexicon preferences and context. NLP must be able to identify these items as being one and the same thing, producing the same relevant results regardless of the exact terminology being used.

    Most sites can’t afford to lose conversions to human errors. NLP can help ensure that a misspelled search for “red jacket” will deliver the same relevant results as a correctly typed search.

    NLP search can also outperform traditional search by learning about common miss-spellings, mixed-up brand names and other potential issues that could be entered into the search box. With self-learning abilities, NLP search never stands still, and continually becomes more accurate and better able to ‘understand’ customer intent.

    Conclusion

    Applying NLP to search can be incredibly powerful because it switches the focus from keywords to actual meaning, allowing humans to be humans while a machine does the work of accurate, intent-based interpretation. When applied to text-based site search, the NLP capabilities described above can play a key role in creating the kind of frictionless, seamless interactions that drive conversions, in a way that antiquated text-based searches simply can’t hope to.

    We are hiring

    If you’re excited about advancing our search|hub API and strive to enable companies to create meaningful search experiences, join us! We are actively hiring for Data Scientists to work on next-generation API & SEARCH technology.

    www.searchhub.io proudly built by www.commerce-experts.com