When using the english analyzer with my Elasticsearch index, I also wanted to apply the asciifolding filter to remove accents. I couldn’t find a simple way to add a filter to the built-in analyzer, so I ended up replicating the existing english analyzer as a custom analyzer and adding asciifolding to its list of filters. See below:
module ItemIndex
  settings index: {
    # Replicates the english analyzer but simply adds asciifolding
    analysis: {
      filter: {
        english_stop: {
          type: "stop",
          stopwords: "_english_",
        },
        english_stemmer: {
          type: "stemmer",
          language: "english",
        },
        english_possessive_stemmer: {
          type: "stemmer",
          language: "possessive_english",
        },
      },
      analyzer: {
        english_with_folding: {
          tokenizer: "standard",
          filter: [
            'asciifolding',
            'english_possessive_stemmer',
            'lowercase',
            'english_stop',
            'english_stemmer',
          ],
        },
        folding: {
          tokenizer: "standard",
          filter: [ "asciifolding" ],
        }
      },
    }
  } do
    mapping do
      indexes :name, type: :string, analyzer: 'english_with_folding'
    end
  end
end
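
To inspect the filter chain outside of a model, the same analysis settings can be built as a plain Ruby hash and serialized to JSON, which is roughly the shape Elasticsearch expects under "settings" when an index is created. This is only a minimal sketch: the helper name `english_with_folding_settings` and the `ENGLISH_FILTERS` constant are hypothetical names introduced for illustration, and the code uses nothing beyond Ruby's standard `json` library.

```ruby
require "json"

# The filter chain from the analyzer above, minus asciifolding.
# These names refer to the custom filters defined in the same settings.
ENGLISH_FILTERS = %w[
  english_possessive_stemmer
  lowercase
  english_stop
  english_stemmer
].freeze

# Hypothetical helper (not part of any gem): builds the analysis settings
# as a plain hash, putting asciifolding at the front of the chain.
def english_with_folding_settings
  {
    analysis: {
      filter: {
        english_stop: { type: "stop", stopwords: "_english_" },
        english_stemmer: { type: "stemmer", language: "english" },
        english_possessive_stemmer: { type: "stemmer", language: "possessive_english" }
      },
      analyzer: {
        english_with_folding: {
          tokenizer: "standard",
          filter: ["asciifolding"] + ENGLISH_FILTERS
        }
      }
    }
  }
end

# Print the JSON form of the settings for inspection.
puts JSON.pretty_generate(english_with_folding_settings)
```

Once an index has been created with these settings, calling the `_analyze` API with `analyzer` set to `english_with_folding` and some accented text is a quick way to confirm that the folding happens as intended.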