Elasticsearch – Creating an english analyzer with asciifolding
When using the english analyzer with my Elasticsearch index, I also wanted to apply the asciifolding filter to remove accents. I couldn't find a simple way to add a filter to the built-in analyzer, so I ended up replicating the existing english analyzer as a custom analyzer and adding asciifolding to its list of filters. See below:
```ruby
module ItemIndex
  settings index: {
    # Replicates the english analyzer but simply adds asciifolding
    analysis: {
      filter: {
        english_stop: {
          type: "stop",
          stopwords: "_english_",
        },
        english_stemmer: {
          type: "stemmer",
          language: "english",
        },
        english_possessive_stemmer: {
          type: "stemmer",
          language: "possessive_english",
        },
      },
      analyzer: {
        english_with_folding: {
          tokenizer: "standard",
          filter: [
            'asciifolding',
            'english_possessive_stemmer',
            'lowercase',
            'english_stop',
            'english_stemmer',
          ],
        },
        folding: {
          tokenizer: "standard",
          filter: [ "asciifolding" ],
        }
      },
    }
  } do
    mapping do
      indexes :name, type: :string, analyzer: 'english_with_folding'
    end
  end
end
```
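To see what these settings amount to outside the model DSL, here is a minimal sketch that builds the same analysis hash in plain Ruby and prints it as the JSON body you would send when creating an index. The helper name `english_with_folding_settings` is hypothetical and exists only for this illustration. The filter and analyzer names are the ones used above.

```ruby
require "json"

# Hypothetical helper (not part of any gem): returns the analysis settings
# from the listing above as a plain Ruby hash.
def english_with_folding_settings
  {
    analysis: {
      filter: {
        english_stop: { type: "stop", stopwords: "_english_" },
        english_stemmer: { type: "stemmer", language: "english" },
        english_possessive_stemmer: { type: "stemmer", language: "possessive_english" }
      },
      analyzer: {
        english_with_folding: {
          tokenizer: "standard",
          # Same filter order as the listing above, with asciifolding first.
          filter: %w[
            asciifolding
            english_possessive_stemmer
            lowercase
            english_stop
            english_stemmer
          ]
        }
      }
    }
  }
end

# Print the body in the shape Elasticsearch expects under the "settings" key.
puts JSON.pretty_generate(settings: english_with_folding_settings)
```

Once an index exists with these settings, the `_analyze` API can be pointed at `english_with_folding` with a sample string containing accents to check which tokens come out.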