Elasticsearch – Creating an english analyzer with asciifolding

August 3, 2016

When using the built-in english analyzer with my Elasticsearch index, I also wanted to apply the asciifolding filter so that accented characters are folded to their ASCII equivalents (for example, "é" becomes "e"). I couldn’t find a simple way to add a filter to the existing analyzer, so I ended up replicating the english analyzer as a custom analyzer and adding asciifolding to its list of filters. See below:

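# NOTE: `settings` and `mapping` come from the indexing DSL (e.g. elasticsearch-model),
# so this body must be evaluated where that DSL is available, such as a model that includes it.
module ItemIndex
  settings index: {
    # Replicates the built-in english analyzer, with asciifolding added to the filter chain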
    analysis: {
      filter: {
        english_stop: {
          type:       "stop",
          stopwords:  "_english_",
        },
        english_stemmer: {
          type:       "stemmer",
          language:   "english",
        },
        english_possessive_stemmer: {
          type:       "stemmer",
          language:   "possessive_english",
        },
      },
      analyzer: {
        english_with_folding: {
          tokenizer:  "standard",
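          filter: [
            # The only addition: fold accented characters to ASCII before the other filters run
            'asciifolding',
            # The remaining filters follow the order used by the built-in english analyzer
            'english_possessive_stemmer',
            'lowercase',
            'english_stop',
            'english_stemmer',
          ],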
        },
        # Folding-only analyzer (no stemming or stop words); not used by the mapping below
        folding: {
          tokenizer: "standard",
          filter:  [ "asciifolding" ],
        }
      },
    }
  } do
    mapping do
      indexes :name, type: :string, analyzer: 'english_with_folding'
    end
  end
end
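
To try the analyzer without going through the model, the same settings can be sent to Elasticsearch directly. The following is a minimal sketch that only builds the two JSON request bodies with Ruby's standard library and sends nothing. The index name `items` and the sample text are assumptions made for illustration. The first body would go to the create-index endpoint and the second to the index's `_analyze` endpoint.

```ruby
require 'json'

# Hypothetical index name, used only for illustration.
INDEX_NAME = 'items'

# The same analysis settings as in the module above, as a plain Ruby hash.
ANALYSIS = {
  filter: {
    english_stop: { type: 'stop', stopwords: '_english_' },
    english_stemmer: { type: 'stemmer', language: 'english' },
    english_possessive_stemmer: { type: 'stemmer', language: 'possessive_english' }
  },
  analyzer: {
    english_with_folding: {
      tokenizer: 'standard',
      filter: %w[asciifolding english_possessive_stemmer lowercase english_stop english_stemmer]
    }
  }
}.freeze

# Request body for creating the index (PUT /items).
create_index_body = JSON.pretty_generate({ settings: { analysis: ANALYSIS } })

# Request body for running sample text through the analyzer (GET /items/_analyze).
analyze_body = JSON.generate({ analyzer: 'english_with_folding', text: 'Crème brûlée' })

puts "PUT /#{INDEX_NAME}"
puts create_index_body
puts "GET /#{INDEX_NAME}/_analyze"
puts analyze_body
```

If the analyzer is set up as intended, sending the accented text and its unaccented spelling through `_analyze` should return the same tokens, which makes for a quick check that folding is taking effect.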
