WARNING: Version 6.2 of Elasticsearch has passed its EOL date.
This documentation is no longer being maintained and may be removed. If you are running this version, we strongly advise you to upgrade. For the latest information, see the current release documentation.
HTML Strip Char Filter
The html_strip character filter strips HTML elements from the text and replaces HTML entities with their decoded value (e.g. replacing &amp; with &).
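As a rough illustration of this behavior, here is a minimal Python sketch. It is not how Elasticsearch works internally (html_strip is backed by Lucene's HTMLStripCharFilter, which handles many more cases). The function name html_strip and the BLOCK_TAG list are hypothetical. Turning block-level tags such as <p> into newlines is an assumption suggested by the example output in this section.

```python
import html
import re

# Hypothetical, incomplete list of block-level tags. This sketch replaces
# them with a newline, which only approximates Lucene's real rule set.
BLOCK_TAG = re.compile(r"</?(?:p|div|br|li|h[1-6])\b[^>]*>", re.IGNORECASE)
# Any other tag is dropped entirely.
ANY_TAG = re.compile(r"<[^>]*>")


def html_strip(text: str) -> str:
    """Approximate the html_strip char filter (illustrative sketch only)."""
    text = BLOCK_TAG.sub("\n", text)  # block-level tags -> newline
    text = ANY_TAG.sub("", text)      # remove all remaining tags
    return html.unescape(text)        # decode entities last, e.g. &amp; -> &


print(repr(html_strip("<p>I'm so <b>happy</b>!</p>")))
print(html_strip("AT&amp;T"))
```

In this sketch, entities are decoded only after the tags have been removed, so encoded markup such as &lt;b&gt; survives as the literal text <b> instead of being stripped.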
Example output
POST _analyze
{
  "tokenizer": "keyword",
  "char_filter": [ "html_strip" ],
  "text": "<p>I'm so <b>happy</b>!</p>"
}
Because the keyword tokenizer emits the entire input as a single token, the above example returns one term:
[ \nI'm so happy!\n ]
The same example with the standard tokenizer would return the following terms:
[ I'm, so, happy ]
Configuration
The html_strip character filter accepts the following parameter:

escaped_tags
    An array of HTML tags which should not be stripped from the original text.
Example configuration
In this example, we configure the html_strip character filter to leave <b> tags in place:
PUT my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "keyword",
          "char_filter": ["my_char_filter"]
        }
      },
      "char_filter": {
        "my_char_filter": {
          "type": "html_strip",
          "escaped_tags": ["b"]
        }
      }
    }
  }
}

POST my_index/_analyze
{
  "analyzer": "my_analyzer",
  "text": "<p>I'm so <b>happy</b>!</p>"
}
The above example produces the following term:
[ \nI'm so <b>happy</b>!\n ]
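To show how escaped_tags changes the result, the following Python sketch extends the idea with an escaped_tags argument. It is an approximation built on the standard library's html.parser, not Lucene's HTMLStripCharFilter. The class HtmlStripSketch, the function html_strip, and the BLOCK_TAGS set are hypothetical. Turning block-level tags into newlines is an assumption based on the example outputs above.

```python
from html.parser import HTMLParser

# Hypothetical, incomplete set of block-level tags that this sketch turns
# into a newline (an assumption, not Lucene's exact list).
BLOCK_TAGS = {"p", "div", "br", "li", "h1", "h2", "h3", "h4", "h5", "h6"}


class HtmlStripSketch(HTMLParser):
    """Drop tags, decode entities, and keep any tag listed in escaped_tags."""

    def __init__(self, escaped_tags=()):
        # convert_charrefs=True decodes entities such as &amp; before
        # the text reaches handle_data.
        super().__init__(convert_charrefs=True)
        self.escaped_tags = {t.lower() for t in escaped_tags}
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag in self.escaped_tags:
            self.out.append(self.get_starttag_text())  # keep the tag verbatim
        elif tag in BLOCK_TAGS:
            self.out.append("\n")

    def handle_startendtag(self, tag, attrs):
        # Treat a self-closing tag such as <br/> as a single start tag.
        self.handle_starttag(tag, attrs)

    def handle_endtag(self, tag):
        if tag in self.escaped_tags:
            self.out.append(f"</{tag}>")
        elif tag in BLOCK_TAGS:
            self.out.append("\n")

    def handle_data(self, data):
        self.out.append(data)


def html_strip(text, escaped_tags=()):
    parser = HtmlStripSketch(escaped_tags)
    parser.feed(text)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.out)


print(repr(html_strip("<p>I'm so <b>happy</b>!</p>", escaped_tags=["b"])))
```

Using an event-driven parser instead of regular expressions makes it easy to re-emit an escaped start tag exactly as written, including its attributes, through get_starttag_text().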