TABLE OF CONTENTS
- What is BM25?
- Example 1: Searching for “Blue Shirt”
- Example 2: Searching for “1 Inch Pipe”
- Why This Matters in Znode
Znode’s search functionality is powered by Elasticsearch, and by default, Elasticsearch uses the Okapi BM25 algorithm for ranking results. Understanding how BM25 works helps explain why some products appear higher in search results than others.
What is BM25?
BM25 is a ranking formula that decides which products best match a shopper’s search query. Instead of just checking if a keyword exists, BM25 scores each product by considering three important factors:
- Term Frequency (TF):
  - The more times a search word appears in a product’s title or description, the more relevant that product seems.
  - However, BM25 caps the benefit of repetition (term-frequency saturation). Mentioning “blue shirt” three times is better than once, but twenty times isn’t much better than three.
- Inverse Document Frequency (IDF):
  - Rare words are given more importance than common ones.
  - For example, if “shirt” appears in many products in the catalog, it carries less weight; if “blue” appears in fewer products, it carries more weight in ranking.
- Document Length:
  - Long text naturally contains more words, so BM25 compares each field’s length with the average field length and gives longer-than-average fields less credit per match, so they aren’t unfairly boosted.
  - A short product title like “Men’s Blue Cotton Shirt” may rank higher than a long description that only briefly mentions “blue shirt.”
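The three factors combine into a single score. The sketch below is a minimal Python version of the classic Okapi BM25 formula, assuming Elasticsearch’s default parameters (k1 = 1.2, b = 0.75) and the IDF variant Lucene uses. The function names are hypothetical, not part of Znode or Elasticsearch. Recent Lucene versions omit the constant (k1 + 1) factor, which rescales every score equally without changing the order, and Elasticsearch scores each field separately, so real numbers will differ in detail.

```python
import math

# Elasticsearch's default BM25 parameters.
K1 = 1.2   # how quickly extra repetitions stop adding to the score
B = 0.75   # how strongly scores are normalized by field length


def idf(docs_with_term: int, total_docs: int) -> float:
    """Inverse document frequency: rarer terms get a larger weight.

    Uses the log(1 + (N - n + 0.5) / (n + 0.5)) form that Lucene uses,
    which is always positive.
    """
    return math.log(1 + (total_docs - docs_with_term + 0.5) / (docs_with_term + 0.5))


def tf_weight(term_freq: int, field_len: int, avg_field_len: float) -> float:
    """Saturating term-frequency weight, adjusted for field length.

    Longer-than-average fields get a larger denominator, so each match
    counts for less; the result approaches K1 + 1 as term_freq grows.
    """
    length_norm = 1 - B + B * field_len / avg_field_len
    return term_freq * (K1 + 1) / (term_freq + K1 * length_norm)


def bm25_term_score(term_freq: int, field_len: int, avg_field_len: float,
                    docs_with_term: int, total_docs: int) -> float:
    """BM25 contribution of one query term to one field's score."""
    return idf(docs_with_term, total_docs) * tf_weight(term_freq, field_len, avg_field_len)


# Term frequency saturates: 3 mentions beat 1, but 20 barely beat 3.
for tf in (1, 3, 20):
    print(tf, f"{tf_weight(tf, field_len=10, avg_field_len=10):.2f}")
# 1 1.00
# 3 1.57
# 20 2.08
```

Because the TF weight can never exceed k1 + 1 (2.2 here), stuffing a keyword twenty times gains only a little over three mentions, while a rare term (high IDF) in a short field (small length normalization) produces the largest per-term contribution.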
Example 1: Searching for “Blue Shirt”
Here’s how BM25 would handle a simple ecommerce search query in Znode:
- Product A: “Blue Oxford Shirt”
  - Short, and contains both query words.
  - High score.
- Product B: “Blue Shirt for Men”
  - Also contains both keywords, but it has two non-matching words (“for” and “men”) instead of one, so the slightly longer title weakens each match a little.
  - Still scores high, but usually below “Blue Oxford Shirt.”
- Product C: “This men’s blue shirt is made from soft cotton. The blue shirt design is versatile for casual or formal wear.”
  - Contains “blue” and “shirt” twice each.
  - Scores well, but BM25 limits the benefit of repetition, and the much longer text pulls the score down.
- Product D: “Shirt available in black, white, and blue.”
  - Both words appear once, alongside several other color words.
  - Medium score.
- Product E: “Our store offers clothing like jackets, pants, and shirts in many colors including blue.”
  - Fairly long, with one mention of “blue” and one of “shirts,” which matches “shirt” only if the search analyzer applies stemming.
  - Low score due to diluted relevance.
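To make the comparison concrete, the sketch below scores the five products against “blue shirt” as a tiny one-field corpus. It is an illustration under stated assumptions: `PRODUCTS`, `tokenize`, and `bm25_scores` are hypothetical names, the regex tokenizer is a rough stand-in for an Elasticsearch analyzer and does no stemming (so “shirts” does not match “shirt”), and the parameters are Elasticsearch’s defaults. Because IDF and average length come from only these five texts, Product D edges out Product C here; in a real catalog the middle of the ranking depends on catalog-wide statistics and on whether each text sits in a title or description field.

```python
import math
import re

K1, B = 1.2, 0.75  # Elasticsearch's default BM25 parameters

# Hypothetical mini-catalog: each text is scored as a single field.
PRODUCTS = {
    "A": "Blue Oxford Shirt",
    "B": "Blue Shirt for Men",
    "C": "This men's blue shirt is made from soft cotton. "
         "The blue shirt design is versatile for casual or formal wear.",
    "D": "Shirt available in black, white, and blue.",
    "E": "Our store offers clothing like jackets, pants, and shirts "
         "in many colors including blue.",
}


def tokenize(text: str) -> list[str]:
    """Lowercase and split on non-alphanumerics (no stemming)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_scores(query: str, docs: dict[str, str]) -> dict[str, float]:
    """Score every document against the query with BM25."""
    tokens = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    n_docs = len(tokens)
    avg_len = sum(len(t) for t in tokens.values()) / n_docs
    scores = {}
    for doc_id, doc_tokens in tokens.items():
        # Longer-than-average texts get a larger normalization factor.
        norm = 1 - B + B * len(doc_tokens) / avg_len
        score = 0.0
        for term in tokenize(query):
            df = sum(1 for t in tokens.values() if term in t)  # docs containing term
            tf = doc_tokens.count(term)                         # mentions in this doc
            if df == 0 or tf == 0:
                continue
            idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
            score += idf * tf * (K1 + 1) / (tf + K1 * norm)
        scores[doc_id] = score
    return scores


scores = bm25_scores("blue shirt", PRODUCTS)
for doc_id in sorted(scores, key=scores.get, reverse=True):
    print(doc_id, f"{scores[doc_id]:.3f}")
# A 0.523
# B 0.494
# D 0.424
# C 0.390
# E 0.074
```

Product E scores far below the rest because, without stemming, only “blue” matches, and “blue” appears in every product, so its IDF is small.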
Example 2: Searching for “1 Inch Pipe”
BM25 applies the same word-matching logic to technical or numeric queries, such as when a shopper searches for “1 inch pipe.”
- Product A: “1 Inch Pipe”
  - Exact match of all three terms, short and direct.
  - Highest score.
- Product B: “1 Inch PVC Pipe”
  - Matches all three terms, plus the extra word “PVC.”
  - Still very strong, but slightly lower than “1 Inch Pipe” because the extra word makes the title longer.
- Product C: “1 1/2 Inch Pipe”
  - Contains “1,” “inch,” and “pipe,” but “1 1/2” is a different size.
  - BM25 doesn’t understand measurements; it only matches words and numbers as tokens. A standard tokenizer splits “1/2” into “1” and “2,” so this title actually contains “1” twice. It typically still ranks below the exact “1 Inch Pipe,” and depending on the rest of the catalog it can land just above or just below “1 Inch PVC Pipe.”
- Product D: “1 1/2 Inch by 3/4 Inch Pipe”
  - Contains “1” and “inch” twice each plus “pipe,” but also several unrelated numbers and words.
  - The repeated terms add little because of saturation, and the long title pulls the score down, so it ranks lowest of the group.
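The sketch below shows why the fractional sizes still match: after tokenization, “1 1/2” is just the tokens “1,” “1,” and “2.” As before, this is an illustration under assumptions: `PRODUCTS`, `tokenize`, and `bm25_scores` are hypothetical names, the regex tokenizer only approximates how Elasticsearch’s standard tokenizer splits “1/2,” and the scores come from a four-item toy catalog with Elasticsearch’s default k1 and b. In this toy catalog the doubled “1” is enough to lift Product C above Product B, though it stays below Product A; a larger catalog’s statistics may order them differently.

```python
import math
import re

K1, B = 1.2, 0.75  # Elasticsearch's default BM25 parameters

# Hypothetical mini-catalog of product titles for the query "1 inch pipe".
PRODUCTS = {
    "A": "1 Inch Pipe",
    "B": "1 Inch PVC Pipe",
    "C": "1 1/2 Inch Pipe",
    "D": "1 1/2 Inch by 3/4 Inch Pipe",
}


def tokenize(text: str) -> list[str]:
    """Lowercase and split on anything that is not a letter or digit,
    so "1/2" becomes the two tokens "1" and "2"."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_scores(query: str, docs: dict[str, str]) -> dict[str, float]:
    """Score every document against the query with BM25."""
    tokens = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    n_docs = len(tokens)
    avg_len = sum(len(t) for t in tokens.values()) / n_docs
    scores = {}
    for doc_id, doc_tokens in tokens.items():
        norm = 1 - B + B * len(doc_tokens) / avg_len  # length normalization
        score = 0.0
        for term in tokenize(query):
            df = sum(1 for t in tokens.values() if term in t)
            tf = doc_tokens.count(term)
            if df == 0 or tf == 0:
                continue
            idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
            score += idf * tf * (K1 + 1) / (tf + K1 * norm)
        scores[doc_id] = score
    return scores


print(tokenize(PRODUCTS["C"]))  # ['1', '1', '2', 'inch', 'pipe']

scores = bm25_scores("1 inch pipe", PRODUCTS)
for doc_id in sorted(scores, key=scores.get, reverse=True):
    print(doc_id, f"{scores[doc_id]:.3f}")
# A 0.383
# C 0.362
# B 0.350
# D 0.323
```

If exact sizes matter, the ranking has to come from something other than plain BM25 token matching, since BM25 by itself cannot tell that “1 1/2” is not “1.”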
Why This Matters in Znode
Because Znode uses Elasticsearch, and BM25 is Elasticsearch’s default ranking algorithm, every search in Znode is scored with this well-established method unless the index is configured to use a different similarity.
This means product search results in Znode are:
- Relevant: Products that closely match the shopper’s query rise to the top.
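- Balanced: Keyword stuffing and long-winded product descriptions gain little, because repeated terms saturate and long text is normalized by length.
- Efficient: BM25 relies on statistics already stored in the search index (term counts, document frequencies, and field lengths), so scoring stays fast even for large catalogs.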