Blog

  • Farewell 2020 – What’s Next in 2021?


Farewell to 2020! As exciting as it was game-changing, this year now draws to an end. Like many of you, we at CXP searchHub.io had to adjust to the extraordinary situation. More than one vacation to faraway lands boasting paradisiacal beaches had to be postponed. We found local alternatives instead. Gardens were tended to, apartments and homes painted, or simply more time spent with family than usual. So what’s next in 2021?

Truth be told, this Corona year has fused our working day with our historically private family life more than ever before. However, what took some getting used to at the beginning of March has revealed new and exciting possibilities.

    Nevertheless, the sun is slowly rising on the horizon.

    Notable Achievements in This Corona Year 2020

In addition to revenue growth, customer acquisition, and technology development, it’s also important for an e-commerce start-up to attend to remedial tasks like cleaning out the basement, minding the costs, and investing time in each other. That goes for both inward and outward relationship building.

This was an impressively successful year for CXP SearchHub, and I’m quite proud of that. Thanks to our team, our customers, and our partners.

    My Personal Goal Accomplished in 2020

    Every year my long-time friends and I meet and hike for a week. This year, due to the Corona situation, we stayed in our local area and walked from Freiburg to Lake Constance. A wild and fun hike for almost 180 km. And we never once got lost. In fact, the only thing I lost was a few kilos. 🙂

    180 km hike with my best mates.

    SearchHub Software Expansion

We reached our goals alright. In fact, we clearly surpassed them. As an e-commerce SaaS (Software as a Service) provider, our most decisive KPI (key performance indicator) is ARR (annual recurring revenue). With growth of over 50%, we were able to confirm the business focus we set at the end of 2019. As a result, searchHub.io has played a central role in our business strategy for 2020 and will be expanded upon in the coming year, 2021.

Additionally, as part of searchHub, we also launched our searchInsights tool. This is an overview of all relevant search KPIs with a first-of-its-kind Findability Score (find out more about this KPI and why it’s important for sustainable search optimization in our blog post here) for daily optimizations and reporting.

    New Business Expansion

I’m overjoyed about the many customers who chose to place their trust in our business this year. They play a definitive role in the ongoing success of CXP SearchHub.io. Alongside new customers like Cyberport.de, Steg-Electronics.ch, and INTERSPORT.de, our journey continued with ALL our long-standing customers as well. In addition to all the growth, it’s these relationships of which I’m especially proud. Our long-term, trusted customer relationships are the fuel behind the motivation, innovation, and continued development at CXP. Thank you for the valuable feedback and insightful ideas.

    New Customers in Their Words

    searchHub.io enables us to develop our own eCommerce search solution based on Elasticsearch with a data-driven approach. The search experience we deliver to our clients is important for us, and searchHub supports us with their unique expertise in this area.

    Malte Polzin, CEO – STEG Electronics AG

    In CXP, we have not only found a technological vendor, but we also gained a partner who actively participates in our daily on-site search optimizations with an incredible depth of experience. The speed at which we connected to searchHub was a sprint, not a marathon, going from 0 to 100 in nothing flat. A short time later, increased revenue was proof of success.

    Carsten Schmitz, Chief Digital Officer – INTERSPORT Deutschland e.G.

With CXP searchHub.io, we can guide our customers even more effectively into our product assortment. The intelligent clustering of long-tail search traffic enables us to control our search solution even better. The maintenance effort for search merchandising campaigns has been reduced, and search analytics are even more transparent. With searchHub, we have been able to increase the value per session ad hoc by more than 20%.

    Dominik Brackmann, Managing Director – Ecommerce at POCO.de

    What’s in Development

With searchHub.io, we were able to release new features every month. In addition to the myriad technological developments within our clustering algorithm and new integrations (e.g., for Spryker), we have also added redirects, mapping statistics, search insights, as well as an online chat and onboarding videos, to both improve usability and make ramping up with our software easier.

    In 2020 SmartSuggest became part of our standard offering

Jonathan Ross joined our CXP team this past summer

    Along with the continued development of our OCSS – “Open Commerce Search Stack” – a special highlight has to be our SmartSuggest. This new technology runs on the same SearchHub knowledge base and is available to every customer as part of our standard offering.

    CXP Family Business

    There isn’t just one highlight this year. In sum, the entire TEAM is the highlight. But I know you won’t let me off that easy, so…

This year presented us, once again, with the opportunity to take our place at MICES 2020 – in connection with Berlin Buzzwords and Haystack! Here’s a link to the lecture from our Andreas Wagner on the topic “Diversification of Search”.

This year also heralded the arrival of Jonathan Ross, whom we gladly welcomed to our team. Jon will continue supporting us in 2021 with his many years of experience.

    Covid wasn’t gettin’ us down!

    Team Event was moved to September

    Gabriel Bauer became our youngest shareholder

A new shareholder was born. Yet another member of our staff became part-owner of the business. A core tenet of our company philosophy is built on the premise of employee ownership. Since 2014, long-standing employees have participated in the success of CXP. This year, we were happy to welcome Gabriel Bauer as our newest shareholder. As for discovering our knack for canoeing – it’s on our list for next year.

A further highlight was most definitely our global team meeting. Time to get to know new colleagues and old friends. COVID-19 wasn’t gettin’ us down!

Due to Corona, we were forced to reschedule to September and find an event worthy of a team meeting while conforming to all necessary restrictions. What began as kind of a downer for the staff quickly turned into a sunny and athletic team event, which strengthened our solidarity with one another. It goes without saying that we will repeat this type of event the next chance we get.

    What’s On the Horizon?

First off, I hope that everyone makes it healthily and comfortably into the new year, enjoying time together with their families. Once the lockdown is over, we’ll be starting 2021 with a few well-known new customers – more about this later on LinkedIn and here on our blog, just as soon as we can officially say more.

Notwithstanding Brexit, or maybe even as a result of it, we are taking our first dip across the pond to set foot on the island this coming year. Together with our new UK consultants, UK customers will soon be able to take advantage of searchHub.

In that vein: our partner ecosystem continues to expand. We’re excited to tell you more about stronger relationships with leading e-commerce agencies as well as third-party systems. With the development of the necessary system interfaces, we are building strong synergies.

And… what kind of SaaS company would we be without technological innovations? We are continuing development on things like “Kihon” – a relevant milestone in our clustering algorithm – and “Structured Queries”. Truly, search has never been simpler.

Stay tuned – 2021 will be exciting.

    Sunsets, like these, are beautiful endings

    Last but not least – I’m still hoping to see such a sunset in the year to come 🙂

    Merry Christmas, Happy Holidays, and see you soon. Stay positive – test negative 🙏

    Sincerely, Mathias & Team CXP searchHub

  • Ain‘t No Autoscaling People


2020 has been an awkward year. You’ve probably noticed. Right next to our office there is a great restaurant called Steg7. Marco and his team have hosted many of our business lunch meetings over the last few years, as well as most of our Christmas events. This year, Marco suffered a dramatic revenue loss, as did many other businesses. At the same time, our customers experienced (at least in their online businesses) significant growth above the rates of previous years. searchHub also grew much more this year than last. We managed to integrate with a number of leading e-commerce players: Cyberport, JAKO-O, Lampenwelt, INTERSPORT – just to name a few.

    Steg7 – in Pforzheim – Marco and his team aren’t helped by autoscaling

This discrepancy makes me think. It also gives me a bad conscience. Did Marco do anything wrong? Not at all. His burgers are still world-class, as are his pasta and his vegetarian dishes. And his solution: he started a pickup service early in the pandemic, and we love it! So, who did what wrong?

I have the impression that we sometimes look at our world as if it were a Kubernetes cluster (don’t know what this is? Watch this video, then come back). We love growth. We love scaling. Even more so, we love scaling and growing businesses. Autoscaling – what a beautiful word, created by some really clever Kubernetes marketing division.

When discussing the growing number of people infected with Covid-19, I often hear: our government has totally failed; they should have increased resources for the healthcare system so we would now be able to handle the situation appropriately. While I totally agree that our healthcare system has suffered a lot in terms of “make it cheaper”, “be more efficient”, and “treat patients as cases with fixed cost compensation”, I mostly disagree that short-term autoscaling is a solution to the current crisis. Why is that, you ask?

    Ain’t No Autoscaling for People.

Maybe you have already read my previous blog post, “Hire or Higher to Go Further”. What happens if you hire a new senior developer capable of doing excellent AI, ML, CI, CD, XY stuff? You’ll probably buy a premium dev laptop with a huge and fast SSD, a number-crunching CPU, and loads of RAM. Then you’ll buy a height-adjustable desk, a high-quality chair, noise-reduction headsets, and maybe even rent some new office space. And after that, you’ll even provide a Nespresso machine (with sustainable capsules from Vitaboni). And not until the fourth week of ramp-up and first tasks with your new developer do you finally realize that, in fact, you hired a totally gifted but inexperienced junior who somehow managed to convince you during the interview. You can scale hardware easily. But you can’t simply scale those senior developers and spawn a replica. Good news: in the software business, you can simply revert the bugs and bring your stack back to stable.

Imagine the same situation in an ICU department of a hospital. You can buy new ICU beds and new ventilation machines; you can even build new hospitals. But you can’t simply scale the experienced healthcare workers and doctors. So you need to rely on the newbies. But be careful: the ICU doesn’t have anything like a “git revert” if a totally gifted and dedicated but inexperienced healthcare worker treats an intensive ventilation patient incorrectly. “kubectl scale” does not work in the real world. Healthcare workers are not “replicas”. And patients are not pods that will simply respawn after being “deleted”.

    It’s Only Autoscaling if You Fully Meet Your SLA

With our SearchHub SLAs, we constantly keep a close eye on the growth of our customers. We love to see those lines pointing to the upper right corner of a traffic chart. And, of course, we prepare well for high traffic during peak seasons like 2020’s Black Week. SearchHub even gets better with more traffic – the more data points, the more knowledge, the better the search results. Our hybrid SaaS solution can handle significantly more than 10,000 requests per second. And our background data processing, which generates new AI models, auto-scales. Our SLA guarantees that we deliver stable performance and quality of service. Only when this guarantee is met does autoscaling deserve its name.

Have you ever experienced a classic standalone server running a critical process at 80% load? It will run almost as smoothly as at 20% load, and you won’t see a significant impact on request processing or stability. Have you ever experienced the same machine at 101% load? It will hardly manage to answer a single request without a timeout or some other kind of failure; most likely it will become totally unstable. The only way to get it up and running again is to stop the traffic.

Our healthcare system is like a classic server before Kubernetes. Like it or not: you can’t autoscale it without lowering your SLA. But lowering the SLA will cause fatal errors.

    We MUST stop the traffic.

Let’s do whatever it takes. Wear masks. Avoid contact. Maybe even celebrate the second most silent night in Christmas history. Use the vaccine. Get the world up and running again. Do it for Marco. Do it for healthcare workers. And do it for the thousands of people who would otherwise die way too early.

Merry Christmas and have a good and healthy (I mean it more than ever this year) new year.

Siegfried Schüle, CEO

  • Introducing Open Commerce Search Stack – OCSS


    Why Open-Source (also) Matters in eCommerce

There are plenty of articles already out there that dig into this question and list the different pros and cons. But as in most cases, the honest answer is “it depends”. So, I want to keep it short and pick – from my perspective – the biggest advantage and the main disadvantage of using open source in the context of e-commerce, or more specifically when it comes to a search solution. Along the way, I’ll introduce the Open Commerce Search Stack (OCSS) and show how it leverages that advantage and reduces the disadvantage. Let’s dig in!

    Pro: Don’t Reinvent the Wheel

Search is quite a complex topic. Even for bigger players, it requires a lot of time to build something new. There are already outstanding open-source solutions available, no matter if you’re eager to use some fancy AI or just a standard search solution. However, your solution won’t make a difference as long as it hasn’t solved the basic issues.

    In the case of e-commerce search, these are things like data indexation, synonym handling, and faceting. Not to forget operational topics like high availability and scalability. Even companies with a strong focus on search have failed in this area. So why bother with that stuff, when you can get it for free?

Solutions like Solr and Elasticsearch offer a good basis to get started with the essentials. This way, you can focus on implementing the nice ideas and special features that differentiate your solution. In my opinion, this is what matters in the end, and where SaaS solutions reach their limit: you can only ever get as good as the SaaS service you’re using.

    Con: Steep learning curve

In contrast to a paid SaaS solution, an open-source solution requires you to take care of everything on your own. Without the necessary knowledge and experience, it will be hard to achieve a comparable or competitive result. In most cases, it takes time to fully understand the technology and get it up and running. And even after you have understood what you’re doing, there is still a long, hard path to creating an outstanding solution. Not to mention the operational side of things, which needs to be taken care of – like, forever.

    Where we see demand for a search solution

So, why are we building the next search solution? A few years ago, we started a proof of concept to see if and how we could build a product search solution with Elasticsearch. We found a very nice guideline and implemented most of it. But even with that guideline and some years of experience, it took us quite a few months to get to a feasible solution.

The most significant difference to most SaaS solutions is the complex API of Elasticsearch. To get even somewhat relevant results, you have to build the correct Elasticsearch queries for the given search query. The same applies to getting the correct facets, implementing filtering correctly, and so on. It’s mostly the same with Solr. As a result, someone unfamiliar with these topics is going to need more time to get it right. In comparison, proprietary solutions come with impressive REST APIs that only require basic search and filter information.
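
To give you a feeling for this (a hedged sketch – the index and field names like title, brand, and price are made up for illustration): even a basic product search with one filter and one facet already requires a nested query like the following, shown here via the official Python client.

from elasticsearch import Elasticsearch

# A sketch of the Elasticsearch query DSL needed for a basic product search.
# Index and field names (products, title, brand, price, category) are
# illustrative examples, not part of any real schema.
es = Elasticsearch("http://localhost:9200")

response = es.search(
    index="products",
    body={
        "query": {
            "bool": {
                "must": {
                    "multi_match": {
                        "query": "black dress",
                        "fields": ["title^3", "brand^2", "description"],
                        "operator": "and",
                        "fuzziness": "AUTO",
                    }
                },
                "filter": [{"range": {"price": {"lte": 100}}}],
            }
        },
        "aggs": {"by_category": {"terms": {"field": "category"}}},
    },
)

Compare that to a proprietary REST API that accepts little more than a search term and a filter – the gap in required know-how is obvious.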

We are introducing Open Commerce Search Stack into this gap: a slim layer between your platform and existing open-source solutions. It comes with a simple API for indexation and searching. This way it hides all the complexity of search. Instead of reinventing the wheel, we care about building a nice tire – so to speak – for existing wheel rims out there. At the same time, we lower the learning curve. The result is a solution to get you up and running more quickly without having to mess with all the tiny details. Of course, it also comes with all the other advantages of open source, like flexibility and extensibility, so you always have the option to dive deeper.

    Our Goals for Open Commerce Search Stack

    To sum it up, these are the main goals we focused on when building the OCSS:

    • Extend what’s there: To this end, we take Elasticsearch off the shelf and use best practices to focus only on filling the gaps.
    • Lower the learning curve: With a simple API on top of our solution, we hide the complexity of building the correct queries to achieve relevant results. We also prepared a default configuration that should fit 80% of all use cases.
    • Keep it flexible: All the crucial parts are configurable. But batteries are included: the stack already comes with a proven and tested default configuration.
    • Keep it extensible: We plan to implement some minimal plugin mechanics to run custom code for indexation, query creation, and faceting.
    • Open for change: With separated components and the API-first approach, we are not bound to Elasticsearch. For example, we used pure Lucene to build the auto-suggest functionality. So it is easy to adopt other search solutions (even proprietary ones) using that API.

    Open Commerce Search Stack – Architecture Overview

    We’re just at the start, so there are only basic components in place. But more are on the horizon. Already, it’s possible to fulfill the major requirements for a search solution.

    • Indexer Service: Takes care of transforming standard key-value data into the correct structure, perfectly prepared for the search service. All controlled by configuration – even some data wrangling logic.
    • Search Service: Hidden behind the simple Search API (you can start with “q=your+term” – see the sketch after this list), quite complex logic takes care of the results. It analyzes the passed search terms and, depending on their characteristics, uses different techniques to search the indexed data. It also contains “fallback queries” that try some query relaxation in case the first attempt didn’t succeed.
    • Auto-Suggest: With a data-pull approach, it’s independent of Elasticsearch and still scalable. We use the same service to build our SmartSuggest module, but with cleansed and enriched searchHub data.
    • Configuration Service: Since the Indexer and Search Service are built with Spring Boot, we use Spring Cloud Config to distribute the configuration to these services. However, we’re already planning to build a solution that also allows changing the configuration – of course with a nice REST API. 🙂
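
As a hedged sketch of what this simplicity could look like in practice (host, port, path, and response shape are assumptions for illustration – check the project documentation for the real API):

import requests

# Sketch: calling a simple search API like the one OCSS exposes.
# URL, tenant name, and response shape are assumptions for illustration.
resp = requests.get(
    "http://localhost:8534/search/my-tenant",
    params={"q": "black dress"},
)
resp.raise_for_status()
for hit in resp.json().get("hits", []):
    print(hit)

One query parameter instead of a nested query object – that is the whole point of the slim layer.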

     

    You are welcome to take a look at the current state. In the next installment of this series, I will present a simple “getting started”, so you can get your hands dirty – well, only as much as necessary.

  • Use Site Search to Optimize Your Customer Journey


Largely, it remains the neglected stepchild of e-commerce optimization. Yet site-search optimization has the potential to catapult your customer journey strategy to a new level. The success of an e-commerce shop is tightly coupled with the quality of its site search. Customers cannot physically enter the store and look around. Instead, they interact with the shop’s search – whether via the navigation, if they haven’t a clear idea of what to buy, or via a search query, if they have something specific in mind. Google states that 64% of people in the midst of an “I want to buy” moment use search. 71% of these actually visit a retailer’s website. And of all purchases on retailers’ websites, 39% were influenced by a relevant search.

How to Leverage Site-Search to Optimize Your Customer Journey

Of course, the search volume of an online shop does not come close to that of Google Search, but you can learn a lot about visitor search behavior from this kind of analysis. In fact, if you are using Google Analytics and haven’t already, you can check it out yourself by navigating to Behavior > Behavior Flow > Site Search > Overview.

    Site-Search Analytics of a searchHub customer

    Site Search Reveals Your Customer Journey

These figures impressively show how important a well-functioning search is. What do you think is the worst thing that can happen to a retailer? A customer who is willing to buy cannot find what he’s looking for. And this happens every day, even though the shop has products in its range that match the search. The problem often goes deeper than merely one missed transaction. In fact, not finding what they are looking for can be the very thing that causes a customer to jump to a competitor and never come back. According to research done by Algolia, 12% of customers will go to a competitor’s site if they are dissatisfied with the search result.

    • Do you know how many shop visitors have had bad experiences with your search?
    • Do you have a dedicated resource responsible for optimizing your site search?

    Chances are… you don’t.

    Econsultancy report on Site Search Administration

You may think e-commerce has progressed beyond this statistic; however, around 42% of online shops still neglect site search completely. For another 42%, it’s just a side topic. The bottom line: start focusing on it! It’s easy to get started. Almost every search provider on the market offers built-in analytics, some more, some less. If your solution includes analytics, please use it! Your site-search analytics will help you determine next actions and improve the shopping experience of your customers. They will be thankful and buy from you again the next time. This point is driven home most recently in the book Marketing Metrics, by Bendle, Farris, Pfeifer, and Reibstein. In it, they speak to the importance of getting the customer journey right:

The probability of selling to an existing customer is between 60% and 70%, whereas the probability of selling to a new customer is only 5–20%. – Bendle, N. Marketing Metrics, 2016

Humans are creatures of habit. If a shopping experience is positive – meaning the search quickly found relevant results, and the product(s) arrive quickly and in good quality – there is no reason why a customer should not purchase again in the future. You see, search plays an integral role in the complete service a shop offers. Unfortunately, in the majority of cases, the customer journey will start – and sadly sometimes end – with a search!

    Optimize Search, Capitalize on Customer Lifetime Value

If you are not completely convinced yet, ask a trusted source to help you begin optimizing your current setup. After all, the costs of a one-time site-search optimization are considerably lower than those of expensive customer-acquisition marketing campaigns. Not only that, this kind of customer search journey optimization is more sustainable than a marketing campaign.

    Bain and Company underpin this with the following figures:

    Acquiring new customers is 5-25x more expensive than retaining existing customers.

    and

Increasing customer retention by 5% increases company profits by 75%.

Understanding these kinds of business cases has the potential to be quite compelling to a CFO on the fence about whether to invest in site-search optimization. The main reason companies fail to optimize site search is lack of budget. In 2019, companies claimed in 42.7% of cases that there was no budget for e-commerce search. Furthermore, in 38.8% of cases, there was no budget for employees to manage the search.

    Why Site-Search is Still a Neglected Stepchild

These figures align perfectly with those of the somewhat antiquated Econsultancy report and reveal a fundamental problem: businesses are still unaware of the dire significance, and subsequent consequences, of an optimally tuned site search. As a result, they are blind to the ultimate impact it has on cost savings and increased profits as well!

I’ll end this post with one last quote from the Gartner Group:

    80% of your company’s future revenue will come from just 20% of your existing customers. – Gartner Group

    Watch this space to learn more about what we are doing, on a practical level, to make your site-search analytics more profitable and transparent than anything you’ve seen to date. #searchHub #searchCollector

    Let’s go out and Make Search great again!

  • Three Pillars of Search Relevancy. Part 1: Findability


One of the biggest causes of website failure is that users simply can’t find stuff on your website. The first law of e-commerce states: “if the user can’t find the product, the user can’t buy the product.”

    Why is Measuring and Continuously Improving Site-Search so Tricky?

    This sentence seems obvious and sounds straightforward. However, what do “find” and “the product” mean in this context? How can we measure and continuously improve search? It turns out that this task isn’t easy at all.

    Current State of Search Relevancy in E-Commerce:

When I talk to customers, I generally see the following two main methods used to measure and define KPIs for the success of search: relevancy, and interaction and conversion.

    However, both have flaws in terms of bias and interpretability.

    We will begin this series: Three Pillars of Search Relevancy, by developing a better understanding of “Findability”. But first, let’s begin with “Relevancy”.

    Search Relevancy:

Determining search result relevance is a massive topic in and of itself. As a result, I’ll only cover it with a short, practical summary. In the real world, even in relatively sophisticated teams, I’ve only ever seen three main approaches to increasing search relevancy.

    1. Explicit Feedback: Human experts label search results in an ordinal rating. This rating is the basis for some sort of Relevance Metric.
    2. Implicit Feedback: Various user activity signals (clicks, carts, …) are the basis for some sort of Relevance Metric.
    3. Blended Feedback: The first two types of feedback combine to form the basis for a new sort of Relevance Metric.

     

In theory, these approaches look very promising. And in most cases, they are superior to just looking at search CR, search CTR, search bounce, and exit rates. However, these methods are heavily biased, with suboptimal outcomes.

    Explicit Feedback for Search Relevancy Refinement

Let’s begin with explicit feedback. There are two main issues with it. First: asking people to label search results to determine relevance oversimplifies the problem at hand. Relevance is, in fact, multidimensional. As a result, it needs to take many factors into account, like user context, user intent, and timing. Moreover, relevance is definitely not constant. For example, the query “evening dress” may offer good, valid results for one customer, and yet the very same list of results can be perceived as irrelevant by another.

    Since there is no absolute perception for relevancy, it can’t be used as a reliable or accurate search quality measurement.

    Not to mention, it is almost impossible to scale Explicit Feedback. This means only a small proportion of search terms can be measured.

    Implicit Feedback for Search Relevancy Refinement

Moving on to implicit feedback. Unfortunately, things don’t get much better. Even if a broad set of user activity signals is used as a proxy for search quality, we still have to deal with many issues. This is because clicks, carts, and buys don’t take the level of user commitment into account.

    For example, someone who had an extremely frustrating experience may have made a purchase out of necessity and that conversion would be counted as successful. On the other hand, someone else may have had a wonderful experience and found what he was looking for but didn’t convert because it wasn’t the right time to buy. Perhaps they were on the move, on the bus let’s say. This user’s journey would be counted as an unsuccessful search visit.

But there is more. Since you only receive feedback on what was shown to the user, you will end up at a dead end – unless you have some sort of randomization in your search results. All other results for a query that have yet to be seen will have a zero probability of contributing to a better result.

    Blended Feedback for Search Relevancy Refinement

In the blended scenario, we combine both approaches and try to even out their shortcomings. This will definitely lead to more accurate results. It will also help to measure and improve a larger proportion of search terms. Nevertheless, it comes with a lot of complexity and induced bias. This is the logical outcome, as you can still only improve results that have been seen by your relevancy judges or customers.

    Future State — Introducing Findability as a Metric

I strongly believe that we need to take a different approach to this problem, because “relevance” alone is not a reliable estimator of user engagement, and even less so of GMV contribution.

In my humble opinion, the main problem is that relevance is not a single dimension. Instead, relevance should be embedded in a multidimensional feature space. I came up with the following condensed feature-space model to make interpreting this idea somewhat more intuitive.

    Once you have explored the image above, please let it sink in for a while.

Intuitively, findability is a measure of the ease with which information can be found. And the more accurately you can specify what you are searching for, the easier finding it should be.

    Findability – a Break-Down

I tried to design the Findability feature (measure) to do exactly one thing extremely well: measure the clarity, effort, and success of the search process. The other important design criterion for the Findability score was that it should:

a) not only provide representative measures for the search quality of the whole website, but also

b) do so for specific query groups and even single queries, so that they can be analyzed and optimized.

Findability as it Relates to Interaction, Clarity, and Effort

Findability not only tries to answer, but goes a step further to quantify, the question

• “Did the user find what he was looking for?” — INTERACTION

It also tries to answer and quantify the questions

• “Was there a specific context involved when starting the search process?” and “Was the initial search response a perfect starting point for further result exploration?” — CLARITY

and

• “How much effort was involved in the search process?” — EFFORT

     

Appropriately, instead of merely considering whether a result was found and a product bought, we also consider whether the searcher had a specific or generic interest. Additionally, things like whether he could easily find what he was looking for, and whether the presented ranking of products was optimal, provide valuable information for our Findability score.

Intuitively, we would expect that for a specific query, Findability will be higher if the search process is shorter – in other words, if there is less friction to buy. The same applies to generic or informational queries, but with less impact on Findability.

We do this to ensure seasonal, promotional, and other biasing effects are decoupled from the underlying search system and its respective configuration. Only by decoupling these effects is it possible to optimize your search system in a systematic, continuous, and efficient way with respect to our goals to:

    • improve the customer experience (to increase conversions and CLTV)
    • increase the probability of interaction with the presented items
    • increase the success rate of purchase through search

    Building the Relevance and Findability Puzzle

To quantify the three dimensions – clarity, effort, and interaction – we are going to combine the following signals, or features.

    Clarity – as it Relates to Findability:

In this context, clarity is used as a proxy for query intent type – in other words, information entropy. In numerous instances, customers issue quite specific queries, for example: “Calvin Klein black women’s evening dress in size 40”. This query describes exactly what they are looking for, so the expected result is pretty clear. However, there are a significant number of cases where customers are either unable or unwilling to formulate such a specific query. The query “black women’s dress”, on the other hand, leaves many questions open: which brand, size, price segment? As a result, this type of query is not clear at all. That’s why clarity tries to model a query’s specificity.

Features:

• Query Information Entropy
• Result Information Entropy
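
To make the intuition behind result entropy concrete, here is a small sketch (my own illustration, not searchHub’s actual formula): the Shannon entropy of the category distribution in a result set can act as a rough clarity proxy – a specific query concentrates hits in a few categories (low entropy), a generic one spreads them out (high entropy).

import math
from collections import Counter

# Sketch: Shannon entropy of a result set's category distribution as a
# rough clarity proxy. An illustration of the idea, not the real metric.
def result_entropy(categories):
    counts = Counter(categories)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# A specific query concentrates hits in one category -> low entropy:
print(result_entropy(["dresses"] * 9 + ["shoes"]))               # ~0.47
# A generic query spreads hits across categories -> high entropy:
print(result_entropy(["dresses", "shoes", "bags", "hats"] * 3))  # 2.0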

    Effort – as it Relates to Findability:

Effort, on the other hand, attempts to model the exertion, or friction, necessary for the customer to find the information or product over the complete search process. Essentially, every customer interaction throughout the search journey adds a bit of effort to the overall search process, until the customer finds what he is looking for. We must try to reduce the effort needed as much as possible, relative to clarity, since every additional interaction could potentially lead to a bounce or exit.

Features (Top 5):

• Dwell time of the query
• Time to first refinement
• Time to first click
• Path length (query reformulation)
• Click positions

Based on these features, it is necessary, in our particular case, to formulate an optimization function that reflects our business goals: maximize the expected search engine result page (SERP) interaction probability while minimizing the needed path length (effort).
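
To make that concrete, here is a deliberately simplified sketch of such a function (the functional form and parameters are assumptions for illustration – the production formula differs):

# Simplified sketch of the idea behind Findability: reward the expected
# SERP interaction probability, penalize the effort (path length).
# Functional form and parameters are illustrative assumptions.
def findability(interaction_prob: float, path_length: int) -> float:
    """Returns a value in [0, 1]; higher means results were easier to find."""
    effort_penalty = 1.0 / (1.0 + path_length)  # shorter journeys score higher
    return interaction_prob * effort_penalty

# A query answered with an immediate interaction scores high ...
print(findability(interaction_prob=0.9, path_length=0))  # 0.9
# ... the same interaction after three reformulations scores much lower.
print(findability(interaction_prob=0.9, path_length=3))  # 0.225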

     

The result of our research is the Findability metric (a percentage value between 0 and 100%), where 0% represents the worst possible search quality and 100% the perfect one. The Findability metric is part of our upcoming search|hub Search Insights product, which is currently in beta testing.

I’m pretty confident that providing our customers with easier-to-understand and more resilient measures of their site search will allow them to improve their search experiences in a more effective, efficient, and sustainable way. The Findability score should therefore provide a solid and objective foundation for your daily and strategic optimization decisions. Simultaneously, it should give you an overview of whether your customers can efficiently interact with your product and service offerings.

  • How to Achieve Ecommerce Search Relevance


    Framing the Hurdles to Relevant Ecommerce Search

    Every e-commerce shop owner wants to achieve ecommerce search relevance with the search results on their website.

    But what does that really mean?

    As the German singer Herbert Grönemeyer once stated: “It could all be so easy, but it isn’t”.

    It could all be so easy, but it isn’t

It may mean that customers look for products matching their search intent.

    All of them.

    New ones on top.

    Or the top-sellers.

    Or the ones with the best ratings.

    Or the ones now available.

    Or maybe the cheap ones, where they can save the most money right now?

What is Search Relevance for the Shop Owner?

It may even mean something entirely different, like having the products with the best margin on top. Or the old ones, which should free up space in the warehouse.

It’s evident that the goals are not the same; sometimes they’re even contradictory.

    How to Overcome Hurdles to Ecommerce Search Relevance?

As with most things, the solution is an even blend of several strategies. These allow for a strong foundation that reaches a broad audience, while simultaneously retaining enough focus to meet individual customer intent.

    But even with the perfect ranking cocktail, you will still have to do your homework concerning the basic mechanics of finding all relevant products in the first place.

    So let’s start with that.

    Step #1 — Data Retrieval is Key in Making Search Relevant:

    Ask yourself what kind of data you need and if you are making use of all its potential yet.

    Don’t forgo the basics!

    It’s easiest if you begin this exercise with the following analogy top-of-mind: Imagine you are building a skyscraper!

    Data is the Foundation for Ecom Relevance

If the foundation is not level, you can try as hard as you want – the construction will fall apart.

Or, to borrow another analogy: painting a wrecked ship a fancy orange color still leaves you with a shipwreck. So don’t try to use fancy stuff like machine learning to compensate for crappy data.

Achieving ecommerce search relevance is just as much about wisely using every available piece of data you have throughout your databases as it is about conceiving a relevant structure to support it.

Keep in mind details like the findability of terms. Having many technical specifications is great. Having them in a normalized manner is even better.

    Create Relevant Search Results – Not New Paint on a Rusty Ship!

A simple example of this is colors. Brands tend to use fancy product color names like “space gray” or “midnight green”.

    But that is not what your customers will search for. At least not the majority of customers.

As a result, for the purposes of searchability and facetability, it is necessary to map all brand-specific terms to the generally used terms like gray and green.

    Keep it simple!

Further to normalization: if your customers search for sizes in different ways, e.g., 1 TB vs. 1000 GB, you need to make it convenient for them to find both.

Key to the success of this kind of approach is structurally separating facet data from search data. All variations must be findable, but only the core values should be used for faceting.

True, there are several software vendors out there who can help you normalize your product data. However, a few simple processing steps that you plug into your data pipeline will improve your data enough to considerably increase both findability and facetability.
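
As a hedged illustration of such processing steps (the mapping table, regex, and return values are invented for this example), two tiny normalizers for colors and storage sizes might look like this:

import re

# Sketch of two simple normalization steps you could plug into a product
# data pipeline. The color mapping and regex are illustrative examples.
COLOR_MAP = {"space gray": "gray", "midnight green": "green", "jet black": "black"}

def normalize_color(raw: str) -> str:
    """Map brand-specific color names to the generic terms customers search for."""
    return COLOR_MAP.get(raw.lower(), raw.lower())

def normalize_capacity(text: str) -> str:
    """Reduce storage sizes to one token, so '1 TB' and '1000 GB' match."""
    match = re.search(r"(\d+)\s*(tb|gb)", text, re.IGNORECASE)
    if not match:
        return text
    value, unit = int(match.group(1)), match.group(2).lower()
    gigabytes = value * 1000 if unit == "tb" else value
    return f"{gigabytes}gb"

print(normalize_color("Space Gray"))     # gray
print(normalize_capacity("1 TB SSD"))    # 1000gb
print(normalize_capacity("1000GB SSD"))  # 1000gb

Remember to apply the very same normalization at query time, so the customer’s “1 TB” meets your indexed “1000gb”.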

    Step #2: Data Structuring – for Ecommerce Search Relevance

Assuming you are satisfied with your general data quality, the next important step is to think about the database structure. This structure will support you and your customers not only in finding all products related to a given query, but also in ensuring they are returned in the right order. At least more or less the right order – but we’ll get to that later.

    Naturally, part of your data structure needs to be weighting the different pieces of information you declare searchable. This means the product name is more important than the technical features. However, features still take precedence over the long description when describing your product.

Actually, an often-missed piece of the relevancy puzzle is doing the necessary work to determine which parts of your data structure are essential for relevant, intent-based results.

In fact, in many cases, it has proven more lucrative to eliminate long descriptions altogether, as they unnaturally bloat your search results. Random hits are most likely not adding value to the overall experience.

    As mentioned previously, it’s always a tradeoff between “total-recall” (return everything that could be relevant, and live with additional false results) and precision (return the right stuff, albeit not every item).

    What About Stemming to Increase Relevance?

    Some search engines allow you to influence the algorithm in detail on a “per field” level.

    Stemming is useful on fields with a lot of natural language. But please — don’t use stemming on brand names!

On a similar note, technical features can have units, e.g., “55 inch” or “1.5 kg”. Making this kind of stuff findable can be tricky because people tend to search for it in different ways (e.g., “1.5 kg” vs. “1.5kg”).

    For this reason, it’s important to:

    1. normalize it in your data and,
    2. make sure to do the same steps during query time.

    How Best to Structure Multi-Language Product Data Feeds for Optimal Relevance?

If you sell into multiple countries with different languages, set up your indexes to use the correct type of normalization for special characters like umlauts or different spellings of the same character.

Recently, I ran into a case that illustrates this problem quite well, when I noticed people searching for iPhone with characters like í or ì instead of the normal i. Needless to say, it’s imperative these cases are handled correctly. And it’s not as if you have to configure everything on your own – there are ready-to-configure libraries available for a variety of search engines.
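
Elasticsearch, for example, ships with a built-in asciifolding token filter for exactly this purpose. A minimal index-settings sketch (the field and analyzer names are illustrative), written here as the Python dict you would pass when creating the index:

# Sketch: Elasticsearch index settings using the built-in asciifolding
# filter, so "íphone" is folded to "iphone" at index and query time.
settings = {
    "settings": {
        "analysis": {
            "analyzer": {
                "folding": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "asciifolding"],
                }
            }
        }
    },
    "mappings": {
        "properties": {
            "title": {"type": "text", "analyzer": "folding"}
        }
    },
}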

Ecommerce Product Ranking

As stated in the introduction, ranking the found items can be tricky due to the contradictory nature of a user’s intent and the goals of the shop manager.

However, under normal circumstances, you simply need to apply a few basic rules, like de-boosting accessory articles, to get the desired results. To achieve this, you must first be able to identify what an accessory item is. Ideally, this means you have a flag you can set in your data. If there is no flag, and you have no way of marking articles in the database, you may get lucky and have a well-maintained category structure. In this case, you can use an alternative method and de-boost articles from specific categories instead.

You may also find it helpful to identify accessory items by looking for “joining words” like “for” (case for smartphone XY, cartridge for printer YZ).

    If neither is the case (haha), I strongly suggest you start flagging your items now. Otherwise, it will be much harder to achieve ecommerce search relevance.
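
For Elasticsearch users, this kind of de-boosting maps nicely onto the boosting query. A hedged sketch (the is_accessory flag is the hypothetical field discussed above):

# Sketch: de-boost flagged accessories with Elasticsearch's boosting query.
# "is_accessory" is the hypothetical flag from the paragraphs above.
query = {
    "query": {
        "boosting": {
            "positive": {
                "multi_match": {"query": "smartphone", "fields": ["title", "features"]}
            },
            "negative": {"term": {"is_accessory": True}},
            "negative_boost": 0.2,  # accessories still match, just rank lower
        }
    }
}

Accessories stay in the result set – they are simply pushed below the main products.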

The remainder of the ranking rules depends on your audience and your preferences. Be sure you have ample data within your database to pull from – things like margin or sold-items count. This will give you the flexibility to use different approaches and even A/B test them. Please don’t hesitate to add more values to your data that you deem relevant for scoring your products!

    These types of rankings are applied globally, completely query-agnostic.

    Tracking and Search Term Boosting

    Now, we come to the part, where you let your customers do the work for you.

How, you ask? By making use of customer behavior within the shop to enhance the results. To do this, simply take the queries, clicks, add-to-carts, and buying events, and combine them at session level.

Why bother with the session? Isn’t it possible to just use the distinct “click path”? Let me take you through an example. Imagine your customer searches for something but doesn’t find it because of a typo or different naming in the shop. As a result, he might leave, or try to find the right product via the shop’s category navigation. If he finds what he’s looking for, you both get lucky: you now have a link between the former query and the correct product.

This may even result in you learning new synonyms. Nevertheless, be careful: should your thresholds be too low to filter out random links, you may end up with many false results.

    Now that you have a link between queries and products, you can attach the query to the products and use that for boosting at query time.
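
A minimal sketch of how such links could be derived (the event shape and support threshold are assumptions for illustration):

from collections import defaultdict

# Sketch: derive query->product links from session-level tracking events.
# The event shape and the support threshold are illustrative assumptions.
events = [
    {"session": "s1", "type": "search", "query": "plystation 5"},
    {"session": "s1", "type": "add_to_cart", "product": "playstation-5"},
    {"session": "s2", "type": "search", "query": "plystation 5"},
    {"session": "s2", "type": "add_to_cart", "product": "playstation-5"},
]

links = defaultdict(int)
last_query = {}  # most recent query seen per session

for event in events:
    if event["type"] == "search":
        last_query[event["session"]] = event["query"]
    elif event["type"] == "add_to_cart" and event["session"] in last_query:
        links[(last_query[event["session"]], event["product"])] += 1

MIN_SUPPORT = 2  # threshold to filter out random one-off links
boosts = {pair: n for pair, n in links.items() if n >= MIN_SUPPORT}
print(boosts)  # {('plystation 5', 'playstation-5'): 2}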

Keep in mind that boosting is pretty safe as long as your engine emphasizes precision over recall. If you are returning large result sets with blurry matches, you may want to stick to tracking click paths. For this reason, it’s essential to make sure the query truly belongs to the subsequent actions, so as not to attribute every action within a given session to it.

These optimizations will already be visible in better results – at least for your most popular products. To mitigate a positive feedback loop (popular products get all the attention), ensure new products get a fair chance of being shown. This is simple enough: add a boost to new products for a short time after their release.

    But How Do I Achieve Search Relevance for the Rest of my Products?

    Let’s expand this one level further and generalize the links we created in the last stage.

If, for example, for the search term “galaxy”, some real phones are being interacted with, we can infer that this behavior could also apply to the rest of the products from that category or product type. As mentioned previously, it is imperative that you have clean data, so as not to mix up things like “smartphones & accessories”. Good luck if you’re using this type of key to generalize your tracking links! Don’t do it – clean your data first!

In the example at hand, we want to create a link between the query and all products of the type “smartphone”. Subsequently, we can add a boost for all the smartphones found, and voilà…

You get a result with smartphones on top, the most relevant ones getting an extra punch from the direct query relation.

    And finally, the relevancy of the products is a stack of boostings:

    1. First by field weight
    2. Then by ranking criteria
    3. And finally by tracking events.

If you’ve gotten this far, you might also be interested in more advanced techniques like “learning to rank”.

This method applies the principles of machine learning to the product ranking mechanism. However, it will require some supervision to successfully learn the right things.

    Or perhaps you want to integrate personalization for individual visitors. Wait a minute… maybe that topic is so comprehensive, it would be better left for another blog post…

    So, now we’re done, right?

    Well, not so fast 😉

    Query Preprocessing for Ecommerce Search Relevance

The whole data part is only one side of the coin. Your customers may still need some help finding what they’re looking for.

    For this reason, you should implement some preprocessing of the incoming queries before forwarding them to your search engine.

Preprocessing can be as simple as removing so-called stop words, i.e., filler words like a, the, at, also, etc.

    Remove Obstacles to Relevant Search

If your engine does not come with a list of stop words, you can search the internet and adapt a list to meet your needs. In addition, counting the words in your data and checking which of the most frequent ones qualify as stop words for you can be very effective.
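
A quick sketch of that counting exercise (the corpus and cutoff are toy examples – always review the candidates manually before removing anything at query time):

from collections import Counter

# Sketch: derive stop-word candidates from your own product data by
# counting document frequency. The corpus and cutoff are toy examples.
descriptions = [
    "the perfect case for the new smartphone",
    "a cartridge for the printer",
    "the best smartphone on the market",
]

doc_freq = Counter()
for text in descriptions:
    doc_freq.update(set(text.split()))

# Words appearing in (nearly) every document are stop-word candidates:
candidates = [word for word, count in doc_freq.items() if count == len(descriptions)]
print(candidates)  # ['the']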

Some search engines even allow you to reduce the weight of those words to a bare minimum. This method can help you better rank the one product where the whole phrase actually matches (e.g., “live at Wembley” instead of “live … Wembley”).

We also mustn’t forget the need to support customers whose language differs from the one used to describe your products. For this reason, you need to establish a set of synonyms for the cases where you would otherwise end up with no-match results.

Please keep in mind: if your search engine also provides a way to define antonyms for similar words with diverging meanings, e.g., “bad” and “bat”, make sure you fully understand how this cleans/shapes the results. In some cases, products containing both words will be kicked out of the result for triggering antonyms on both sides of the spectrum.

If you’re able to, deboost the antonym instead of completely removing it. It can save your day!

    And finally, your customers might misspell words like I just did… did you notice? Well, your search engine will notice.

Or, what about the scenario where your search just won’t find the right things? Or anything at all, for that matter. Or maybe the result varies negatively, because the boosting for one frequent query is better than the ranking for an alternative spelling.

    In this case, you could add a preprocessing rule. And for some frequently used queries, it might work out.

But eventually, you will get completely lost in the long tail of queries. Tools like our searchHub can help you match all variations of a query to the perfect master query. This master query is then sent to your search engine – whatever flavor of search engine you might have.

searchHub identifies all types of misspellings, ambiguous albeit correct spellings (playstation 5, ps 5, PS5, play-station 5), and typos (plystation 5, plyastation 5, etc.).

    We know which query performs best so that you don’t have to!

If you want to see your shop’s queries clustered around top-performing master queries, and get to know the full potential this has for your conversions, please feel free to contact us!

  • Search Orchestration with Dumb Services


You may think the phrase “dumb services” in connection with software orchestration in the title is clickbait. However, I can assure you that “Orchestration with Dumb Services” is a real and simple software orchestration concept, certain to improve your sleep.

Any engineer, DevOps person, or software architect can relate to the stress of running a loaded production system. To do so well, it’s necessary to automate, provision, monitor, and provide redundancy and fail-over to hit those SLAs. The following paragraphs cut to the chase. You won’t see any fancy buzzwords. I aim to help you avoid pitfalls into which many companies stumble when untangling monolithic software projects – or, for that matter, even when building small projects from scratch. While the concept is not applicable to every use case, it does fit perfectly into the world of e-commerce search. It’s even applicable to full-text search. Generally, wherever search index reads and writes are separate pipelines, this is for you! So, what are we waiting for? Let’s start orchestrating with dumb services.

What is the Difference Between Regular and Dumb Services?

To begin, let’s define the term “service”:

    A service is a piece of software that performs a distinct action.

Nowadays, a container running as part of a Kubernetes cluster is a good example of a service. Multiple instances of the service can be spun up to meet demand. The configuration of a so-called regular service points it to the other services it may need – things like connections to databases, and so on.

Regular services in action are illustrated in the diagram to the right. As they grow, companies run many such hierarchically organized services.

    Regular Service Hierarchy

    Dumb Services

Now, let’s clarify what “dumb service” means. In this context, a dumb service is a service that knows nothing about its environment. Its configuration is reduced to performance-related aspects (e.g., memory limits). When you start such a service, it does nothing – no connecting to other services, no joining of clusters; it just waits to be told what to do.

    Orchestrator Services

    To create a full system composed of dumb services, you deploy another service type called an “orchestrator”. The orchestrator is the “brain”, the dumb services are the “muscle” — the brain tells the muscles what to do.

The orchestrator sends tasks to each service. Additionally, it directs the data exchange between services. Finally, it pushes data and configurations to the client-facing services. All service state changes are initiated by the orchestrator.
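
To make the pattern tangible, here is a minimal sketch in code (the names and the task shape are invented – this illustrates the concept, not a real framework):

# Minimal sketch of the pattern: a dumb service knows nothing about its
# environment and only executes tasks pushed to it by the orchestrator.
class DumbIndexerService:
    """The 'muscle': no config, no peers, no cluster joining."""

    def execute(self, task: dict) -> dict:
        # Do exactly what the orchestrator says, nothing more.
        return {"indexed": len(task["documents"])}

class Orchestrator:
    """The 'brain': holds all topology knowledge and pushes tasks and data."""

    def __init__(self, services: dict):
        self.services = services  # only the orchestrator knows who exists

    def run_indexing(self, documents: list) -> dict:
        # The orchestrator decides which muscle does what, and in which order.
        return self.services["indexer"].execute(
            {"action": "index", "documents": documents}
        )

orchestrator = Orchestrator({"indexer": DumbIndexerService()})
print(orchestrator.run_indexing([{"id": 1}, {"id": 2}]))  # {'indexed': 2}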

    Dumb Service Orchestration

    Let’s review our “regular vs. dumb” services in light of two key aspects of a software system — fault tolerance and scalability.

    Fault Tolerance

    Fault Tolerance with Regular Services

In the regular-case diagram, we illustrate a typical flow during a user request. The client-facing services at level 1 (labeled L1 in the diagram) need to call the internal services at levels 2 and 3 to complete the request. Naturally, in a larger system, this call hierarchy goes much deeper. To meet the SLA, all services must be up all the time, as any incoming request could call a service further down the hierarchy. This is obviously a hard task: combining N services, each with an uptime of 99.95%, does not result in 99.95% for the entire system. In the worst case, a request that hits 5 services goes down to roughly 99.75% (0.9995 to the power of 5).

Fault Tolerance with Dumb Services

Let’s compare this to the system composed of dumb services. The client-facing services at level 1 serve the request without any dependency on the services at levels 2 and 3. We only need to ensure the SLA of the L1 services to ensure the SLA of the entire client-facing part of the system – services at levels 2 and 3 could go down without affecting user requests.

Scalability

    Scaling Regular Services

Scaling a system composed of regular services necessarily means scaling the entire system. If only one layer is scaled, increasing user requests could overload the lower layers of the system. Scaling also requires more automation, as you need to correctly wire the services together.

    Scaling Dumb Services Architecture

Let’s take another look at our dumb-services architecture. Each service can be scaled independently, as it has no direct dependencies on any other service. You can spin up as many client-facing services as you like to meet increased user requests without scaling any of the internal systems. And vice versa: you can increase the number of nodes for internal services on demand to meet a heavy indexing task, and then easily spin them down again. All this without affecting user requests.

    What about Testing?

Finally, testing your service is simple: you start it and pass a task to it – no dependencies to consider.

    Wrapping it up

In conclusion, you can simplify your architecture significantly by applying this simple concept. However, as mentioned previously, it does not fit all use cases. Situations where your client-facing nodes are part of both the reading and writing data pipelines are harder to organize this way. Even so, any time you’re faced with designing a system composed of multiple services, think about these patterns – they may save you a few sleepless nights.

  • How to Deploy Elasticsearch in Kubernetes Using the cloud-on-k8s Elasticsearch-Operator


Many businesses run an Elasticsearch/Kibana stack. Some use a SaaS service for Elastic – e.g., the AWS Amazon Elasticsearch Service, the Elastic on Azure service from Microsoft, or the Elastic Cloud from Elastic itself. More commonly, Elasticsearch is hosted in a proprietary environment. Elastic and the community provide several deployment types and tips for various platforms and frameworks. In this article, I will show how to deploy Elasticsearch and Kibana in a Kubernetes cluster using the Elastic Kubernetes Operator (cloud-on-k8s), without using Helm (helm / helm-charts).

Overview of Elastic Deployment Types and Configuration:

    The Motivation for Using the Elasticsearch-Operator:

What might motivate you to use the Elasticsearch-Operator instead of a SaaS offering?

The first argument is, possibly, cost. Elastic Cloud is roughly 34% pricier than hosting your own Elasticsearch on the same instance in AWS. The Amazon Elasticsearch Service is even 50% more expensive than the self-hosted version.

Another argument could be that you already have a Kubernetes cluster running the application with which you would like to use Elasticsearch. In that case, you want to avoid spreading one application over multiple environments, so you are looking to use Kubernetes as your go-to standard.

    Occasionally, you may also have to build a special solution with many customizations that are not readily deployable with a SaaS provider.

An important argument for us was gaining hands-on experience hosting Elasticsearch, so we can give the best support to our customers.

    Cluster Target Definition:

For the purposes of this post, I will use a sample cluster running on AWS with the following features:

• a 6-node cluster (3 es-master, 3 es-data)
• master and data nodes spread over 3 availability zones
• a plugin installed to snapshot data to S3
• dedicated nodes on which only Elastic services run
• anti-affinities ensuring that no two Elastic nodes of the same type run on the same machine

Since this article focuses on how to use the Kubernetes Operator, we will not go into detail about the required instances, the reasons for creating different instance groups, or the rationale behind the various pod anti-affinities.

In our Kubernetes cluster, we have two additional instance groups for Elasticsearch, es-master and es-data, whose nodes carry special taints.

    (In our example case, the instance groups are managed by kops. However, you can simply add the labels and taints to each node manually.)
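
If you manage nodes by hand, the manual step could look like this (a sketch; the node name is the placeholder from the listing below, so substitute your own):

kubectl label node ip-host.region.compute.internal kops.k8s.io/instancegroup=es-master failure-domain.beta.kubernetes.io/zone=eu-north-1a
kubectl taint node ip-host.region.compute.internal es-node=master:NoSchedule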

The following is an example of what a node of the es-master instance group looks like:

apiVersion: v1
    kind: Node
    metadata:
      ...
      labels:
        failure-domain.beta.kubernetes.io/zone: eu-north-1a
        kops.k8s.io/instancegroup: es-master
        kubernetes.io/hostname: ip-host.region.compute.internal
        ...
    spec:
      ...
      taints:
      - effect: NoSchedule
        key: es-node
        value: master

    As you may have noticed, there are three different labels:

1. The failure-domain.beta.kubernetes.io/zone label contains the availability zone in which the instance is running.
2. The kops.k8s.io/instancegroup label indicates the instance group in which the node resides. This becomes important later, allowing master and data nodes to run on different hardware for performance optimization.
3. The kubernetes.io/hostname label acts as a constraint to ensure that only one master node runs on the specified instance.

The following is an example of an es-data instance with the appropriate label keys and respective values:

apiVersion: v1
    kind: Node
    metadata:
      ...
      labels:
        failure-domain.beta.kubernetes.io/zone: eu-north-1a
        kops.k8s.io/instancegroup: es-data
        kubernetes.io/hostname: ip-host.region.compute.internal
        ...
    spec:
      ...
      taints:
      - effect: NoSchedule
        key: es-node
        value: data

As you can see, the values of the es-node taint and the kops.k8s.io/instancegroup label differ. We will reference these values later to distinguish between data and master instances.
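
To double-check that the labels and taints are in place, you can list the nodes with the instance-group label as an extra column (a quick, optional check; the node name is again a placeholder):

kubectl get nodes -L kops.k8s.io/instancegroup
kubectl describe node ip-host.region.compute.internal | grep -A 2 Taints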

Now that we have illustrated our node structure and you have a clearer picture of the Kubernetes and Elasticsearch cluster setup, we can begin installing the Elasticsearch operator in Kubernetes.

    Let’s Get Started:

    First: install the Kubernetes Custom Resource Definitions, RBAC rules (if RBAC is activated in the cluster in question), and a StatefulSet for the elastic-operator pod. In our example case, we have RBAC activated and can make use of the all-in-one deployment file from Elastic for installation.

(Note: if RBAC is not activated in your cluster, remove lines 2555–2791 and all service-account references in the file):

kubectl apply -f https://download.elastic.co/downloads/eck/1.2.1/all-in-one.yaml

    This creates four main parts in our Kubernetes cluster to operate Elasticsearch:

    • All necessary Custom Resource Definitions
• All necessary RBAC permissions
    • A Namespace for the Operator (elastic-system)
    • A StatefulSet for the Elastic Operator-Pod
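
As an optional sanity check before moving on, you can confirm that the custom resource definitions and the operator namespace exist:

kubectl get crds | grep elastic.co
kubectl -n elastic-system get pods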

Now run kubectl logs -f on the operator’s pod and wait until the operator has successfully booted, to verify the installation. If any error message appears, resolve it before continuing.

kubectl -n elastic-system logs -f statefulset.apps/elastic-operator

Once we have confirmed that the operator is up and running, we can begin with our Elasticsearch cluster. We start by creating an Elasticsearch resource with the following main structure (see here for full details):

apiVersion: elasticsearch.k8s.elastic.co/v1
    kind: Elasticsearch
    metadata:
      name: blogpost # name of the elasticsearch cluster
      namespace: blog
    spec:
      version: 7.7.0 # elasticsearch version to deploy
      nodeSets: # nodes of the cluster
      - name: master-zone-a
        count: 1 # count how many nodes should be deployed
        config: # specific configuration for this node type
          node.master: true
        ...
      - name: master-zone-b
      - name: master-zone-c
      - name: data-zone-a
      - name: data-zone-b
      - name: data-zone-c

In the listing above, you can see how easily the name of the Elasticsearch cluster, the Elasticsearch version, and the different nodes that make up the cluster can be set. Our Elasticsearch structure is specified in the nodeSets array introduced above. As a next step, let’s take a more in-depth look at a single nodeSet entry and see how it must look to meet our requirements:

- name: master-zone-a
        count: 1
        config:
          node.master: true
          node.data: false
          node.ingest: false
          node.attr.zone: eu-north-1a
          cluster.routing.allocation.awareness.attributes: zone
        podTemplate:
          metadata:
            labels:
              component: elasticsearch
              role: es-master
          spec:
            volumes:
              - name: elasticsearch-data
                emptyDir: {}
            affinity:
              nodeAffinity:
                requiredDuringSchedulingIgnoredDuringExecution:
                  nodeSelectorTerms: 
              - matchExpressions: # trick: a single list entry makes these conditions a logical AND
                    - key: kops.k8s.io/instancegroup
                      operator: In
                      values:
                      - es-master
                    - key: failure-domain.beta.kubernetes.io/zone
                      operator: In
                      values:
                      - eu-north-1a
              podAntiAffinity:
                preferredDuringSchedulingIgnoredDuringExecution:
                - weight: 100
                  podAffinityTerm:
                    labelSelector:
                      matchExpressions:
                      - key: role
                        operator: In
                        values:
                        - es-master
                    topologyKey: kubernetes.io/hostname
            initContainers:
            - name: sysctl
              securityContext:
                privileged: true
              command: ['sh', '-c', 'sysctl -w vm.max_map_count=262144']
            - name: install-plugins
              command:
              - sh
              - -c
              - |
                bin/elasticsearch-plugin install -b repository-s3
            tolerations:
            - key: "es-node"
              operator: "Equal"
              value: "master"
              effect: "NoSchedule"
            containers:
            - name: elasticsearch
              resources:
                requests:
                  memory: 1024Mi
                limits:
                  memory: 1024Mi

The count key specifies how many Elasticsearch node pods should be created with this node configuration. The config object holds the untyped YAML configuration of Elasticsearch (Elasticsearch settings). The podTemplate contains a normal Kubernetes pod template definition; here we control the affinity and tolerations that pin our es-node to a special instance group, plus all pod anti-affinities. In the initContainers section, we handle kernel configuration and install the Elasticsearch repository-s3 plugin. One note on nodeSelectorTerms: if you want conditions combined with a logical AND instead of an OR, you must place them in a single matchExpressions array, not as two individual nodeSelectorTerms entries. For me, this was not clearly described in the Kubernetes documentation.
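
To make the AND/OR distinction concrete, here is a minimal sketch of both variants (only the nodeSelectorTerms fragment is shown; everything else stays as above):

# logical AND: both expressions live in ONE matchExpressions list
nodeSelectorTerms:
- matchExpressions:
  - key: kops.k8s.io/instancegroup
    operator: In
    values: ["es-master"]
  - key: failure-domain.beta.kubernetes.io/zone
    operator: In
    values: ["eu-north-1a"]

# logical OR: two separate terms, a node only needs to match one of them
nodeSelectorTerms:
- matchExpressions:
  - key: kops.k8s.io/instancegroup
    operator: In
    values: ["es-master"]
- matchExpressions:
  - key: failure-domain.beta.kubernetes.io/zone
    operator: In
    values: ["eu-north-1a"]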

    Once we have created our Elasticsearch deployment, we must create a Kibana deployment. This can be done with the Kibana resource. The following is a sample of this definition:

apiVersion: kibana.k8s.elastic.co/v1
    kind: Kibana
    metadata:
      name: blogpost
      namespace: blog
    spec:
      version: 7.7.0
      count: 1
      elasticsearchRef:
        name: blogpost
      podTemplate:
        metadata:
          labels:
            component: kibana

Notice that the elasticsearchRef object must refer to our Elasticsearch cluster so that Kibana can connect to it.

After we have created all the necessary deployment files, we can deploy them. In our case, I put them all in one big file called elasticsearch-blog-example.yaml; you can find the complete list of deployment files at the end of this blog post.

kubectl apply -f elasticsearch-blog-example.yaml

After applying the deployment file, you should have a new namespace with the following pods, services, and secrets (plus further resources that are not relevant for this initial overview):

(⎈ |blog.k8s.local:blog)➜  ~ kubectl get pods,services,secrets
    NAME                              READY  STATUS   RESTARTS  AGE
    pod/blogpost-es-data-zone-a-0     1/1    Running  0         2m
    pod/blogpost-es-data-zone-b-0     1/1    Running  0         2m
    pod/blogpost-es-data-zone-c-0     1/1    Running  0         2m
    pod/blogpost-es-master-zone-a-0   1/1    Running  0         2m
    pod/blogpost-es-master-zone-b-0   1/1    Running  0         2m
    pod/blogpost-es-master-zone-c-0   1/1    Running  0         2m
    pod/blogpost-kb-66d5cb8b65-j4vl4  1/1    Running  0         2m
    NAME                               TYPE       CLUSTER-IP     PORT(S)   AGE
    service/blogpost-es-data-zone-a    ClusterIP  None           <none>    2m
    service/blogpost-es-data-zone-b    ClusterIP  None           <none>    2m
    service/blogpost-es-data-zone-c    ClusterIP  None           <none>    2m
    service/blogpost-es-http           ClusterIP  100.68.76.86   9200/TCP  2m
    service/blogpost-es-master-zone-a  ClusterIP  None           <none>    2m
    service/blogpost-es-master-zone-b  ClusterIP  None           <none>    2m
    service/blogpost-es-master-zone-c  ClusterIP  None           <none>    2m
    service/blogpost-es-transport      ClusterIP  None           9300/TCP  2m
    service/blogpost-kb-http           ClusterIP  100.67.39.183  5601/TCP  2m
    NAME                                        DATA  AGE
    secret/default-token-thnvr                  3     2m
    secret/blogpost-es-data-zone-a-es-config    1     2m
    secret/blogpost-es-data-zone-b-es-config    1     2m
    secret/blogpost-es-elastic-user             1     2m
    secret/blogpost-es-http-ca-internal         2     2m
    secret/blogpost-es-http-certs-internal      3     2m
    secret/blogpost-es-http-certs-public        2     2m
    secret/blogpost-es-internal-users           2     2m
    secret/blogpost-es-master-zone-a-es-config  1     2m
    secret/blogpost-es-master-zone-b-es-config  1     2m
    secret/blogpost-es-master-zone-c-es-config  1     2m
    secret/blogpost-es-remote-ca                1     2m
    secret/blogpost-es-transport-ca-internal    2     2m
    secret/blogpost-es-transport-certificates   11    2m
    secret/blogpost-es-transport-certs-public   1     2m
    secret/blogpost-es-xpack-file-realm         3     2m
    secret/blogpost-kb-config                   2     2m
    secret/blogpost-kb-es-ca                    2     2m
    secret/blogpost-kb-http-ca-internal         2     2m
    secret/blogpost-kb-http-certs-internal      3     2m
    secret/blogpost-kb-http-certs-public        2     2m
secret/blogpost-kibana-user                 1     2m

As you may have noticed, I removed the EXTERNAL-IP column from the services and the TYPE column from the secrets, due to the formatting of the code block.

Once Elasticsearch and Kibana have been deployed, we can test the setup by making an HTTP GET request from the Kibana Dev Tools. First, we have to get the password of the elastic user, which the Elasticsearch operator generated for us. It is saved in the Kubernetes secret ending in -es-elastic-user, in our case blogpost-es-elastic-user.

(⎈ |blog.k8s.local:blog)➜  ~ kubectl get secret/blogpost-es-elastic-user -o yaml
    apiVersion: v1
    data:
      elastic: aW8zQWhuYWUyaWVXOEVpM2FlWmFoc2hp
    kind: Secret
    metadata:
      creationTimestamp: "2020-10-21T08:36:35Z"
      labels:
        common.k8s.elastic.co/type: elasticsearch
        eck.k8s.elastic.co/credentials: "true"
        elasticsearch.k8s.elastic.co/cluster-name: blogpost
      name: blogpost-es-elastic-user
      namespace: blog
      ownerReferences:
      - apiVersion: elasticsearch.k8s.elastic.co/v1
        blockOwnerDeletion: true
        controller: true
        kind: Elasticsearch
        name: blogpost
        uid: 7f236c45-a63e-11ea-818d-0e482d3cc584
      resourceVersion: "701864"
      selfLink: /api/v1/namespaces/blog/secrets/blogpost-es-elastic-user
      uid: 802ba8e6-a63e-11ea-818d-0e482d3cc584
    type: Opaque

The username for our cluster is the key located under data, in our case elastic. The password is the corresponding value of this key. It is Base64-encoded, so we have to decode it:

(⎈ |blog.k8s.local:blog)➜  ~ echo -n "aW8zQWhuYWUyaWVXOEVpM2FlWmFoc2hp" | base64 -d
    io3Ahnae2ieW8Ei3aeZahshi
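
Alternatively, both steps can be combined into a single command (a convenience sketch using jsonpath):

kubectl -n blog get secret blogpost-es-elastic-user -o jsonpath='{.data.elastic}' | base64 -d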

Once we have the password, we can port-forward the blogpost-kb-http service on port 5601 (the standard Kibana port) to localhost and access it with a web browser at https://localhost:5601:

(⎈ |blog.k8s.local:blog)➜  ~ kubectl port-forward service/blogpost-kb-http 5601
    Forwarding from 127.0.0.1:5601 -> 5601
    Forwarding from [::1]:5601 -> 5601

    Elasticsearch Kibana Login Screen

After logging in, navigate to the Kibana Dev Tools in the left-hand menu and perform a GET / request, as in the picture below:

    Getting started with your Elasticsearch Deployment inside the Kibana Dev Tools
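
If you prefer the command line over Kibana, the same smoke test works with curl against the port-forwarded Elasticsearch HTTP service (a sketch; -k is needed because the operator issues a self-signed certificate, and the password is the one decoded above):

kubectl -n blog port-forward service/blogpost-es-http 9200 &
curl -k -u "elastic:io3Ahnae2ieW8Ei3aeZahshi" "https://localhost:9200/"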

    Summary

We now have an overview of all officially supported methods of installing and operating Elasticsearch. Additionally, we successfully set up a cluster that meets the following requirements:

• a 6-node cluster (3 es-master, 3 es-data)
• master and data nodes spread over 3 availability zones
• a plugin installed to snapshot data to S3
• dedicated nodes on which only Elastic services run
• the constraint that no two Elastic nodes of the same type run on the same machine

    Thanks for reading!

    Full List of Deployment Files

apiVersion: v1
    kind: Namespace
    metadata:
      name: blog
    ---
    apiVersion: elasticsearch.k8s.elastic.co/v1
    kind: Elasticsearch
    metadata:
      name: blogpost
      namespace: blog
    spec:
      version: 7.7.0
      nodeSets:
      - name: master-zone-a
        count: 1
        config:
          node.master: true
          node.data: false
          node.ingest: false
          node.attr.zone: eu-north-1a
          cluster.routing.allocation.awareness.attributes: zone
        podTemplate:
          metadata:
            labels:
              component: elasticsearch
              role: es-master
          spec:
            volumes:
              - name: elasticsearch-data
                emptyDir: {}
            affinity:
              nodeAffinity:
                requiredDuringSchedulingIgnoredDuringExecution:
                  nodeSelectorTerms: 
                  - matchExpressions:
                    - key: kops.k8s.io/instancegroup
                      operator: In
                      values:
                      - es-master
                    - key: failure-domain.beta.kubernetes.io/zone
                      operator: In
                      values:
                      - eu-north-1a
              podAntiAffinity:
                preferredDuringSchedulingIgnoredDuringExecution:
                - weight: 100
                  podAffinityTerm:
                    labelSelector:
                      matchExpressions:
                      - key: role
                        operator: In
                        values:
                        - es-master
                    topologyKey: kubernetes.io/hostname
            initContainers:
            - name: sysctl
              securityContext:
                privileged: true
              command: ['sh', '-c', 'sysctl -w vm.max_map_count=262144']
            - name: install-plugins
              command:
              - sh
              - -c
              - |
                bin/elasticsearch-plugin install -b repository-s3
            tolerations:
            - key: "es-node"
              operator: "Equal"
              value: "master"
              effect: "NoSchedule"
            containers:
            - name: elasticsearch
      - name: master-zone-b
        count: 1
        config:
          node.master: true
          node.data: false
          node.ingest: false
          node.attr.zone: eu-north-1b
          cluster.routing.allocation.awareness.attributes: zone
        podTemplate:
          metadata:
            labels:
              component: elasticsearch
              role: es-master
          spec:
            volumes:
              - name: elasticsearch-data
                emptyDir: {}
            affinity:
              nodeAffinity:
                requiredDuringSchedulingIgnoredDuringExecution:
                  nodeSelectorTerms:
                  - matchExpressions:
                    - key: kops.k8s.io/instancegroup
                      operator: In
                      values:
                      - es-master
                    - key: failure-domain.beta.kubernetes.io/zone
                      operator: In
                      values:
                      - eu-north-1b
            initContainers:
            - name: sysctl
              securityContext:
                privileged: true
              command: ['sh', '-c', 'sysctl -w vm.max_map_count=262144']
            - name: install-plugins
              command:
              - sh
              - -c
              - |
                bin/elasticsearch-plugin install -b repository-s3
            tolerations:
            - key: "es-node"
              operator: "Equal"
              value: "master"
              effect: "NoSchedule"
            containers:
            - name: elasticsearch
      - name: master-zone-c
        count: 1
        config:
          node.master: true
          node.data: false
          node.ingest: false
          node.attr.zone: eu-north-1c
          cluster.routing.allocation.awareness.attributes: zone
        podTemplate:
          metadata:
            labels:
              component: elasticsearch
              role: es-master
          spec:
            volumes:
              - name: elasticsearch-data
                emptyDir: {}
            affinity:
              nodeAffinity:
                requiredDuringSchedulingIgnoredDuringExecution:
                  nodeSelectorTerms:
                  - matchExpressions:
                    - key: kops.k8s.io/instancegroup
                      operator: In
                      values:
                      - es-master
                    - key: failure-domain.beta.kubernetes.io/zone
                      operator: In
                      values:
                      - eu-north-1c
            initContainers:
            - name: sysctl
              securityContext:
                privileged: true
              command: ['sh', '-c', 'sysctl -w vm.max_map_count=262144']
            - name: install-plugins
              command:
              - sh
              - -c
              - |
                bin/elasticsearch-plugin install -b repository-s3
            tolerations:
            - key: "es-node"
              operator: "Equal"
              value: "master"
              effect: "NoSchedule"
            containers:
            - name: elasticsearch       
      - name: data-zone-a
        count: 1
        config:
          node.master: false
          node.data: true
          node.ingest: true
          node.attr.zone: eu-north-1a
          cluster.routing.allocation.awareness.attributes: zone
        podTemplate:
          metadata:
            labels:
              component: elasticsearch
              role: es-worker 
          spec:
            volumes:
              - name: elasticsearch-data
                emptyDir: {}
            affinity:
              nodeAffinity:
                requiredDuringSchedulingIgnoredDuringExecution:
                  nodeSelectorTerms:
                  - matchExpressions:
                    - key: kops.k8s.io/instancegroup
                      operator: In
                      values:
                      - es-data
                    - key: failure-domain.beta.kubernetes.io/zone
                      operator: In
                      values:
                      - eu-north-1a
            initContainers:
            - name: sysctl
              securityContext:
                privileged: true
              command: ['sh', '-c', 'sysctl -w vm.max_map_count=262144']
            - name: install-plugins
              command:
                - sh
                - -c
                - |
                  bin/elasticsearch-plugin install -b repository-s3
            tolerations:
            - key: "es-node"
              operator: "Equal"
              value: "data"
              effect: "NoSchedule"
            containers:
            - name: elasticsearch
      - name: data-zone-b
        count: 1
        config:
          node.master: false
          node.data: true
          node.ingest: true
          node.attr.zone: eu-north-1b
          cluster.routing.allocation.awareness.attributes: zone
        podTemplate:
          metadata:
            labels:
              component: elasticsearch
              role: es-worker 
          spec:
            volumes:
              - name: elasticsearch-data
                emptyDir: {}
            affinity:
              nodeAffinity:
                requiredDuringSchedulingIgnoredDuringExecution:
                  nodeSelectorTerms:
                  - matchExpressions:
                    - key: kops.k8s.io/instancegroup
                      operator: In
                      values:
                      - es-data
                    - key: failure-domain.beta.kubernetes.io/zone
                      operator: In
                      values:
                      - eu-north-1b
            initContainers:
            - name: sysctl
              securityContext:
                privileged: true
              command: ['sh', '-c', 'sysctl -w vm.max_map_count=262144']
            - name: install-plugins
              command:
                - sh
                - -c
                - |
                  bin/elasticsearch-plugin install -b repository-s3
            tolerations:
            - key: "es-node"
              operator: "Equal"
              value: "data"
              effect: "NoSchedule"
            containers:
            - name: elasticsearch
      - name: data-zone-c
        count: 1
        config:
          node.master: false
          node.data: true
          node.ingest: true
          node.attr.zone: eu-north-1c
          cluster.routing.allocation.awareness.attributes: zone
        podTemplate:
          metadata:
            labels:
              component: elasticsearch
              role: es-worker 
          spec:
            volumes:
              - name: elasticsearch-data
                emptyDir: {}
            affinity:
              nodeAffinity:
                requiredDuringSchedulingIgnoredDuringExecution:
                  nodeSelectorTerms:
                  - matchExpressions:
                    - key: kops.k8s.io/instancegroup
                      operator: In
                      values:
                      - es-data
                    - key: failure-domain.beta.kubernetes.io/zone
                      operator: In
                      values:
                      - eu-north-1c
            initContainers:
            - name: sysctl
              securityContext:
                privileged: true
              command: ['sh', '-c', 'sysctl -w vm.max_map_count=262144']
            - name: install-plugins
              command:
                - sh
                - -c
                - |
                  bin/elasticsearch-plugin install -b repository-s3
            tolerations:
            - key: "es-node"
              operator: "Equal"
              value: "data"
              effect: "NoSchedule"
            containers:
            - name: elasticsearch
    ---
    apiVersion: kibana.k8s.elastic.co/v1
    kind: Kibana
    metadata:
  name: blogpost
  namespace: blog
    spec:
      version: 7.7.0
      count: 1
      elasticsearchRef:
    name: blogpost
      podTemplate:
        metadata:
          labels:
            component: kibana
  • Business at the Speed of the Atom

    Business at the Speed of the Atom

Recently, I’ve been thinking about the speed at which we conduct business. Business at the speed of the atom, you could say. Advancements in computer technology, specifically artificial intelligence, afford us ever more momentary physical comfort. It’s always been that way. So, what’s different now?

Since the dawn of the First Industrial Revolution, we have consistently used technology to enhance our personal comfort and luxury. Or, if you prefer: to get more done in less time.

    So far, so good.

What happens, however, when our work provides us with enormous comfort and grand luxury, albeit dishonestly? What happens when digital software development becomes about lining my pockets instead of the value I bring to my customers?

    Why Should Business Speed Concern Me?

Psychologists see a positive correlation between honest work, reward for that work (whether monetary or social), and psychological well-being.

    Genuine happiness is impossible without authentic concern for and corresponding behavior towards the well-being of others. Nicole Torka (2019)

It follows that if I am going to ensure healthy customers and employees, I first need to design my business accordingly. Business must consider the smallest of atomic structures. As a result, I must take pains to authentically care for the needs of my customers as well as my workforce. The operative word here is authentic.

Virtual forms of labor and services are in a dangerous position because they run a disproportionate risk of becoming self-serving. By that I mean that we focus on creating software simply to get rich.

    What’s the Solution to Atomic Business Speeds?

Honest Digital Development. Simply because something can be made and brought to market to make a buck doesn’t mean it should be. Try performing the following litmus test before launching your next product or service. Make sure you’re in a private place when you do this. Otherwise, there’s great potential for shameless disingenuity: i.e., simply lying your pants off:

    The Litmus Test

Ask yourself: “who gets the most out of this?” If the answer is “me”, you will end up with customers who don’t trust you and a workforce that can’t wait to get away from you.

    What Does Honest Digital Business Look Like?

Digital solutions offer a unique opportunity to give an honest answer to how best to close the gap between technology, the immaterial, and the tangible sides of our lives.

    The following are some top-of-mind examples of the kinds of things I’m talking about:

    Make it Easier to Purchase Online.

Not ads or #bullshitmarketing. Remember, this is about honesty. Information is key to the buying process. It follows that digital solutions should consistently reveal, in context, the information I need to purchase what I’m looking for more quickly and confidently. How does your solution help with that?

    Supplying People with Information

    so they can chart new courses never dreamed of before. This field has been exploding for years, and I’m happy about what I see.

    And Thousands More Solutions …

These types of solutions may, logically, provide value to all parties involved — a win-win rather than a zero-sum game. Nevertheless, popular culture tends to err toward the path of least resistance. Presently, this has created a whole generation of stars grabbing microphones and getting in front of a camera to tell the world about the latest thing that is “sooo cool”.

And they’re right: there are a lot of cool new things out there every day (and I confess: I enjoy watching them too :-)). But to whom are all those hours of video providing value? Yep… to them (and the publishers and advertisers). It’s a zero-sum game: they get value, you get none. Nuff said.

    How Do We Stay Honest?

I think all the right pieces are at our fingertips, and several forays are certainly making a difference at the crossroads of the digital-to-tangible value discussion. However, the ease with which digital content is consumed, the amount of increased free time we have (machines helping us get things done faster and more efficiently), and the low cost of entry to highly professional equipment create a greater need for businesses to take an honest look in the mirror and self-correct.

    Be Ready to Let Things Die

    We want to leave a world for those who come after us to build upon. To do so, we must make an honest effort to wield the weapon of business to increase the quality of LIFE. Not merely the quality of mine.

This influences the way we work at searchHub and the kind of software and features we ultimately choose to introduce to the market. Daily, we find ourselves at a crossroads, asking: “this is cool, but is it really going to help people communicate better with their customers?” If, after much deliberation and analysis of the numbers, no compelling proof is found that our wonderful new toy will bring any significant return, it’s shelved. Simple.

    We move on.

    Let it die.

Think for a moment about the myriad things you have been sold, the content you have consumed, that ultimately provide zero value. There is a place for this type of content: for amusement and entertainment, to wind down after work. I get it. However, this constant infatuation with entertainment creates a greater need to restore balance in healthy living and work-life relationships.

Let’s take this a step further: being able to understand and acknowledge our value as humans is essential to our success. We must prioritize time in our day to disconnect from all things digital, look at our loved ones, and honestly share our joy in our productivity. For this to be true, my day’s work must be useful, helpful, and valuable not only to those around me, but to myself as well.

    All We Have

Real value is coveted and rare. The speed at which modern business develops grants us a kind of superpower to harness, process, and analyze information like never before. We can use it to keep people glued to their screens, or to differentiate ourselves and pave the way for something more valuable for everyone involved: TIME. Time is best invested in quality relationships, sustainability, patience, perseverance, long-suffering, love. As society at large fills its hours with more and more screen time, we begin to lust after these age-old qualities.

    Find a way to package these things into your next product, release, or service. You won’t regret it. We don’t.

  • Hire or Higher to go Further?

    Hire or Higher to go Further?

    Social Equity and Economic Cohesion within Teams

I love my team. Every one of them is exceptionally skilled and motivated. They have created an ingenious product that provides immediate value for online experts within minutes. It delivers a never-before-seen quality and precision of search-phrase matching, using an AI component that users can easily train and optimize directly through a top-of-the-line UI. Of course, we now want to grow and scale. But how do we hire to go higher?

If you have a team of outstandingly responsible and efficient colleagues (and who doesn’t want such a team?), there is ONE thing you can really do wrong: start hiring people of average skill at higher salaries, sacrificing to the God of Growth. As Joel Spolsky stated long before he launched stackoverflow.com:

    “It’s not just a matter of ‘10 times more productive.’ It’s that the ‘average, productive’ developer never hits the high notes that make great software.”

    Joel Spolsky – Stack Overflow

    Growing Pains – Getting Higher

Don’t get me wrong: I consider growth one of the most important motivations when running a company. But growth is not an end in itself. Growth should increase your business stability (being larger than your competitors can help a lot), your efficiency (economies of scale generate more profit), and your general profitability. In the end, growth should increase the happiness of the team. And if happiness can be better achieved by hiring new people than by raising salaries, we will do that.

Here at searchhub.io we regularly discuss whether we should hire new developers or pay higher salaries as soon as we see that our profit has increased sustainably. Everyone is involved in this discussion and, naturally, has a profound interest in which direction we take. Each team member involved in the daily business (development, devops, research, sales) must weigh increased fiscal freedom for the individual against a lower individual workload.

    Growth is a Social Responsibility – Hire Wisely

But there is another component as well. We are intimately attached to this industry. When I look around our office during our morning “daily”, I see the faces of my colleagues, some of whom have worked with me for over 20 years, building this industry from the ground up. Because of this team of developers, ecommerce businesses across Europe are more profitable and make more educated decisions about optimizing their customers’ search journeys. It makes me proud to play my part!

So expanding our team of developers, researchers, sales, and marketing is not simply about scaling our business. More importantly, it’s about maintaining a certain ambition, an ethos, a secret knowledge if you will. My team creates real solutions for real problems, solutions that make a user’s or customer’s day more efficient and profitable. This is our secret sauce. This is what allows our team such a close connection to our customers.

    Site-Search is not a trivial topic. Few understand it in detail. Even fewer have spent their careers building this space. This is who my team is. And, again, I’m proud of them!

    Going Further – Together – to take us Higher

To this end, we aim to move forward together, expanding our influence in ecommerce search. To go “higher” means creating more financial freedom for each individual at searchHub along the way. To achieve this, each employee we hire carries a marked responsibility: to take our software, and our customers, even higher.

    This is the backdrop that provides the framework for how we understand efficient business management and Fair Play salary design.