Quick and easy way to support stop words in Elasticsearch

A quick and easy way to support searching both with and without stop words in your Elasticsearch index is to use the search_quote_analyzer setting together with Elasticsearch's built-in stop word lists.

The search_quote_analyzer setting lets you specify a separate analyzer for phrases. This is particularly useful when you want stop words removed from ordinary queries but kept in phrase queries.

To do this, we need to define a stop word filter and two analyzers in the index settings: one that keeps the stop words and one that removes them.

First, set up your index settings. I have created a stop word filter (stopwords_no) that uses the standard Norwegian stop words provided by Elasticsearch, plus a Norwegian stemmer filter (mainstemmer_no). I have also defined two analyzers: default, the default analyzer for my index, which is used at index time and keeps the stop words; and default_search, the default search analyzer for my index, which removes them.

{
  "index": {
    "analysis": {
      "filter": {
        "stopwords_no": {
          "type": "stop",
          "stopwords": "_norwegian_"
        },
        "mainstemmer_no": {
          "type": "stemmer",
          "language": "norwegian"
        }
      },
      "analyzer": {
        "default": {
          "filter": [
            "standard",
            "lowercase",
            "mainstemmer_no"
          ],
          "char_filter": [
            "html_strip"
          ],
          "type": "custom",
          "tokenizer": "standard"
        },
        "default_search": {
          "filter": [
            "standard",
            "lowercase",
            "stopwords_no",
            "mainstemmer_no"
          ],
          "char_filter": [
            "html_strip"
          ],
          "type": "custom",
          "tokenizer": "standard"
        }
      }
    },
    "number_of_shards": "2",
    "number_of_replicas": "1"
  }
}
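To see what the two analyzers do with the same text, here is a rough Python simulation of both chains. It is a sketch, not Elasticsearch: a regex stands in for the standard tokenizer and for html_strip, the stemmer is left out, and STOP_WORDS is a small hypothetical English list (chosen to fit the “the brown fox” example used for the queries) rather than the real _norwegian_ list.

```python
import re

# Hypothetical stand-in for a stop word list. Elasticsearch's real
# _norwegian_ list contains Norwegian words, not these.
STOP_WORDS = {"the", "a", "an", "and", "of"}


def strip_html(text: str) -> str:
    """Crude stand-in for the html_strip char filter: replace tag-like runs with spaces."""
    return re.sub(r"<[^>]+>", " ", text)


def tokenize(text: str) -> list[str]:
    """Crude stand-in for the standard tokenizer: runs of word characters."""
    return re.findall(r"\w+", text)


def default_analyzer(text: str) -> list[str]:
    """Index-time chain ("default"): strip HTML, tokenize, lowercase. Stop words are kept."""
    return [t.lower() for t in tokenize(strip_html(text))]


def default_search_analyzer(text: str) -> list[str]:
    """Search-time chain ("default_search"): same as above, then drop stop words."""
    return [t for t in default_analyzer(text) if t not in STOP_WORDS]


print(default_analyzer("<p>The brown fox</p>"))         # ['the', 'brown', 'fox']
print(default_search_analyzer("<p>The brown fox</p>"))  # ['brown', 'fox']
```

As in the real settings, the stop filter runs after lowercasing, which is why “The” is still recognized as a stop word.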

Next, set up your type mappings. In the mapping below I have added search_quote_analyzer to the title and content properties. It points to the default analyzer defined in the index settings, since that is the analyzer that keeps the stop words. Even though the analyzers are already named “default” and “default_search” in the index settings, which makes them the index-wide defaults, you still have to set analyzer and search_analyzer explicitly on any property where you set search_quote_analyzer.

{
  "properties": {
    "keywords": {
      "type": "string"
    },
    "title": {
      "type": "string",
      "search_quote_analyzer": "default",
      "analyzer": "default",
      "search_analyzer": "default_search"
    },
    "content": {
      "type": "string",
      "search_quote_analyzer": "default",
      "analyzer": "default",
      "search_analyzer": "default_search"
    },
    "id": {
      "type": "string"
    },
    "productId": {
      "norms": {
        "enabled": true
      },
      "index": "not_analyzed",
      "type": "string"
    },
    "created": {
      "format": "dateOptionalTime",
      "type": "date"
    }
  }
}
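The rule about setting all three analyzers is easier to follow if you write down how a field picks its search-time analyzer. The sketch below models the documented fallbacks (search_analyzer defaults to analyzer, and search_quote_analyzer defaults to search_analyzer) plus the mapping rule described above. resolve_analyzer and check_mapping are hypothetical helpers working on plain dicts, not Elasticsearch code, and index-level defaults are not modeled.

```python
from typing import Optional


def resolve_analyzer(field: dict, phrase: bool) -> Optional[str]:
    """Hypothetical helper: the analyzer name a field would use at search time.

    search_analyzer falls back to analyzer; search_quote_analyzer falls back
    to search_analyzer. None means the field sets nothing itself, so the
    index-level defaults would apply (not modeled here).
    """
    search = field.get("search_analyzer", field.get("analyzer"))
    if phrase:
        return field.get("search_quote_analyzer", search)
    return search


def check_mapping(field: dict) -> None:
    """Hypothetical check mirroring the rule described in this post:
    search_quote_analyzer is only accepted together with analyzer and search_analyzer."""
    if "search_quote_analyzer" in field and not (
        "analyzer" in field and "search_analyzer" in field
    ):
        raise ValueError(
            "analyzer and search_analyzer must be set when search_quote_analyzer is set"
        )


title = {
    "type": "string",
    "search_quote_analyzer": "default",
    "analyzer": "default",
    "search_analyzer": "default_search",
}
check_mapping(title)
print(resolve_analyzer(title, phrase=False))  # default_search
print(resolve_analyzer(title, phrase=True))   # default
```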

The final step is to adjust your queries, which is simple. By default, stop words are removed when searching your index, but when the query is enclosed in quotes, the stop words are kept. In the example below, the match queries ignore stop words: if someone searches for “the brown fox”, the search engine strips away “the” and only searches for “brown” and “fox”. The match_phrase queries, on the other hand, keep the stop words, so a search for “the brown fox” has to match the whole phrase.

{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "title": {
              "query": "the brown fox",
              "operator": "and"
            }
          }
        },
        {
          "match_phrase": {
            "title": {
              "query": "\"the brown fox\""
            }
          }
        },
        {
          "match": {
            "content": {
              "query": "the brown fox",
              "operator": "and"
            }
          }
        },
        {
          "match_phrase": {
            "content": {
              "query": "\"the brown fox\""
            }
          }
        }
      ]
    }
  }
}
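If you search more fields than title and content, writing out a match and a match_phrase clause for every field gets repetitive. A small Python helper can generate the same body; build_stopword_query is a hypothetical name, and the result is simply the JSON above as a dict, ready to be sent with whatever client you use.

```python
import json


def build_stopword_query(text: str, fields: list[str]) -> dict:
    """Hypothetical helper: for each field, one match clause (stop words removed
    by the search_analyzer) and one quoted match_phrase clause (intended to be
    analyzed with the search_quote_analyzer), combined in a bool/should."""
    should = []
    for field in fields:
        should.append({"match": {field: {"query": text, "operator": "and"}}})
        # Wrap the phrase in literal quotes, as in the query above.
        should.append({"match_phrase": {field: {"query": f'"{text}"'}}})
    return {"query": {"bool": {"should": should}}}


body = build_stopword_query("the brown fox", ["title", "content"])
print(json.dumps(body, indent=2))
```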

This query ensures that documents containing the exact phrase “the brown fox” appear in the search results, but you will also get documents that contain the words “brown” and “fox” anywhere in the field. A document that matches the match_phrase query also matches the match query, but not the other way around. Because scores from matching should clauses add up, documents matching both queries will generally score higher than documents matching only the match query.
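The claim that a phrase match implies a plain match, but not the reverse, can be checked with a toy model. This is not how Lucene scores documents: it only counts how many of the two should clauses a single field satisfies, using lowercase whitespace tokens, no stemming, and a hypothetical one-word stop list.

```python
STOP_WORDS = {"the"}  # hypothetical stop list for this example only


def tokens(text: str) -> list[str]:
    """Toy analysis: lowercase and split on whitespace."""
    return text.lower().split()


def match_and(doc: str, query: str) -> bool:
    """Like match with operator "and": every non-stop-word query term must occur."""
    doc_terms = set(tokens(doc))
    return all(t in doc_terms for t in tokens(query) if t not in STOP_WORDS)


def match_phrase(doc: str, query: str) -> bool:
    """Like match_phrase with stop words kept: the whole token sequence must appear contiguously."""
    d, q = tokens(doc), tokens(query)
    return any(d[i:i + len(q)] == q for i in range(len(d) - len(q) + 1))


def clauses_matched(doc: str, query: str) -> int:
    """Toy stand-in for relevance: how many of the two should clauses match."""
    return int(match_and(doc, query)) + int(match_phrase(doc, query))


for doc in ["the brown fox jumps", "a fox that is brown", "the red fox"]:
    print(doc, "->", clauses_matched(doc, "the brown fox"))
# the brown fox jumps -> 2
# a fox that is brown -> 1
# the red fox -> 0
```

Since the phrase clause requires every query token, including “the”, any document it accepts already contains “brown” and “fox”, so it also satisfies the match clause.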
