Elasticsearch Definitions

Navigate to: Marketing & Site Search > Site Search

The following definitions are provided to help in understanding Znode Search.

Analyzer:

Used to tell Elasticsearch how text should be processed before it is indexed and searched. An analyzer combines character filters, a tokenizer, and token filters.

Tokenizer:

Used to decide how Elasticsearch divides a set of words into separate terms called “tokens”. Token generation depends on the type of tokenizer used; token filters can then modify the resulting tokens.
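As a rough illustration of how an analyzer ties these pieces together, the sketch below builds Elasticsearch index settings as a Python dictionary. The names dash_to_ampersand and example_analyzer are hypothetical and are not taken from Znode's actual configuration; the standard tokenizer and the lowercase and stop token filters are built into Elasticsearch.

```python
import json

# Hypothetical index settings: the custom names below are illustrative only
# and do not reflect Znode's actual configuration.
index_settings = {
    "settings": {
        "analysis": {
            "char_filter": {
                "dash_to_ampersand": {  # hypothetical name
                    "type": "mapping",
                    "mappings": ["- => &"],
                }
            },
            "analyzer": {
                "example_analyzer": {  # hypothetical name
                    "type": "custom",
                    # Order of processing: character filters, tokenizer, token filters.
                    "char_filter": ["dash_to_ampersand"],
                    "tokenizer": "standard",
                    "filter": ["lowercase", "stop"],
                }
            },
        }
    }
}

print(json.dumps(index_settings, indent=2))
```

A dictionary like this would typically be sent as the request body when an index is created.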

Character Filters:

Used to preprocess the stream of characters before it is passed to the tokenizer. A character filter receives the original text as a stream of characters and can transform the stream by adding, removing, or changing characters.

Ex: If the character mapping is defined as "- => &" and a user enters the text 3-3, it is converted to 3&3 before it is passed to the tokenizer
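The mapping example can be sketched in plain Python. This is a simplified stand-in for a character filter, not Elasticsearch's own implementation; the function name apply_char_mappings is hypothetical.

```python
def apply_char_mappings(text, mappings):
    """Replace each mapped character sequence in the raw text.

    Simplified sketch: replacements are applied one after another,
    which is enough for single-character mappings like "- => &".
    """
    for source, target in mappings.items():
        text = text.replace(source, target)
    return text


print(apply_char_mappings("3-3", {"-": "&"}))  # prints 3&3
```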

Stemmer:

Used to reduce a word to its root form to ensure variants of a word match during a search.

Ex: walking and walked can be stemmed to the same root word: walk
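A toy suffix-stripping function is enough to show the idea. This is a deliberately naive sketch and not the stemming algorithm Elasticsearch uses; the suffix list and the function name naive_stem are assumptions made for illustration.

```python
SUFFIXES = ("ing", "ed", "s")  # illustrative only, far from a complete rule set


def naive_stem(word):
    """Strip one common suffix, keeping at least three characters of the word."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


print(naive_stem("walking"), naive_stem("walked"))  # prints walk walk
```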

Fuzziness:

Used to match text, strings, or entries that are approximately similar but not exactly the same.

Ex: If the text entered = Cat, the search will also match Mat, Bat, Rat, Sat, etc.

Ex: If the text entered = Black, the search will also match Lack, Slack, etc.
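Fuzzy matching is usually described in terms of edit distance: the number of single-character insertions, deletions, or substitutions needed to turn one word into another. The sketch below uses plain Levenshtein distance with an assumed limit of one edit; Elasticsearch by default also counts swapping two adjacent characters as a single edit, which this simplified version does not. The function names are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (row-by-row dynamic programming)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete a character from a
                current[j - 1] + 1,      # insert a character into a
                previous[j - 1] + cost,  # substitute (or keep) a character
            ))
        previous = current
    return previous[-1]


def fuzzy_matches(term, candidates, max_edits=1):
    """Return the candidates within max_edits of the term, ignoring case."""
    term = term.lower()
    return [c for c in candidates if edit_distance(term, c.lower()) <= max_edits]


print(fuzzy_matches("Cat", ["Mat", "Bat", "Dog", "Cart"]))  # ['Mat', 'Bat', 'Cart']
```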

Token Filter:

A token filter is an operation that modifies tokens in some way, for example by changing, removing, or adding tokens.

The following token filters are used in Elasticsearch for Znode (base):

Lowercase:

Used to change token text to lowercase.

Ex: If the text entered = “The Quick Brown Fox”, it is converted to “the quick brown fox”

Synonyms:

Used to define a list of words that can be treated as alternatives to one another. The words are saved in the synonym list. When a user enters a search keyword, it is compared with the synonym list to find matches before the search results are displayed.

Ex: If the text entered = “Brown Fox” and the synonym list has Fox = “coyote, dingo”, then the search results display products that contain fox, coyote, or dingo
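Synonym expansion can be pictured as a lookup against the synonym list for each token. The sketch below assumes the tokens are already lowercased and uses a hypothetical function name, expand_synonyms; it is not how Elasticsearch stores or applies synonyms internally.

```python
def expand_synonyms(tokens, synonyms):
    """Keep every token and append any synonyms listed for it."""
    expanded = []
    for token in tokens:
        expanded.append(token)
        expanded.extend(synonyms.get(token, []))
    return expanded


synonym_list = {"fox": ["coyote", "dingo"]}
print(expand_synonyms(["brown", "fox"], synonym_list))
# ['brown', 'fox', 'coyote', 'dingo']
```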

Stopwords:

Used to ignore any stop words found in the search keyword. Stop words are usually words like “to”, “I”, “has”, “the”, “be”, “or”, etc. They are filler words that help sentences flow better but provide very little context on their own.

Ex: If the text entered = “The Quick Brown Fox”, it is treated as “quick brown fox”
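A minimal sketch of stop word removal, assuming the text is lowercased and split on whitespace first. The stop word set holds only the sample words listed in the definition, not a complete list, and the function name is hypothetical.

```python
# Sample stop words from the definition above, not a complete list.
STOP_WORDS = {"to", "i", "has", "the", "be", "or"}


def remove_stop_words(text):
    """Lowercase the text, split it on whitespace, and drop stop words."""
    return [token for token in text.lower().split() if token not in STOP_WORDS]


print(remove_stop_words("The Quick Brown Fox"))  # ['quick', 'brown', 'fox']
```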

Shingle:

Used to generate tokens by concatenating adjacent tokens. Shingles are generally used to help speed up phrase queries.

Ex: If the text entered = “The Quick Brown Fox”, then the tokens generated would be “the”, “the quick”, “quick”, “quick brown”, “brown”, “brown fox”, and “fox”
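Shingle generation can be sketched as emitting each token followed by the pair it forms with the next token. The sketch assumes two-word shingles with the single tokens kept, and lowercased input; the function name shingles is hypothetical.

```python
def shingles(tokens):
    """Emit each token, followed by the two-word shingle it starts (if any)."""
    output = []
    for index, token in enumerate(tokens):
        output.append(token)
        if index + 1 < len(tokens):
            output.append(token + " " + tokens[index + 1])
    return output


print(shingles("the quick brown fox".split()))
# ['the', 'the quick', 'quick', 'quick brown', 'brown', 'brown fox', 'fox']
```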

Ngram:

Used to break each word into short character sequences (n-grams) of a set length. This is helpful for creating tokens (sets of characters) that allow partial words to match in search results.

Ex: If the text entered = “Quick fox” and the n-gram length ranges from 1 to 2 characters, then the tokens generated would be [ Q, Qu, u, ui, i, ic, c, ck, k, f, fo, o, ox, x ]
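The token list in this example can be reproduced with a short sketch that slides over each word and emits every character sequence of length 1 and 2. The parameter names min_gram and max_gram mirror the Elasticsearch settings of the same name; the function itself is a hypothetical illustration, and splitting on whitespace stands in for the tokenizer.

```python
def char_ngrams(token, min_gram=1, max_gram=2):
    """Return the character n-grams of a token, ordered by start position."""
    grams = []
    for start in range(len(token)):
        for size in range(min_gram, max_gram + 1):
            if start + size <= len(token):
                grams.append(token[start:start + size])
    return grams


tokens = [gram for word in "Quick fox".split() for gram in char_ngrams(word)]
print(tokens)
# ['Q', 'Qu', 'u', 'ui', 'i', 'ic', 'c', 'ck', 'k', 'f', 'fo', 'o', 'ox', 'x']
```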