I think my problem has been solved by using

  <tokenizer class="solr.HTMLStripWhitespaceTokenizerFactory"/>

(to split on whitespace and strip HTML tags) and

  <filter class="solr.PatternReplaceFilterFactory" pattern="([^a-zA-Z0-9])" replacement="" replace="all" />

(to remove all non-alphanumeric characters). Is that right?

________________________________
From: Antonio Zippo <[EMAIL PROTECTED]>
To: solr-user@lucene.apache.org
Sent: Friday, 28 November 2008, 17:27:30
Subject: PatternReplaceFilterFactory and html tag

Hi all,

I have a text field that contains some HTML code, e.g.

  "blablabla <p>hi this is a paragraph</p> aaaa bbb"

I need to exclude these tags from the index or the query, so I think I need to use a PatternReplaceFilterFactory.

This filter removes every character other than a-zA-Z0-9 (so I can exclude punctuation, etc.):

  <filter class="solr.PatternReplaceFilterFactory" pattern="([^a-zA-Z0-9])" replacement="" replace="all" />

But I also need to add a replacement for "<p>", "</p>", "<br/>", "<br />", etc.

Could anyone help me find the right pattern?

Thanks in advance,
Zippo
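
For reference, here is a minimal sketch of how the tokenizer and filter from the reply at the top might be combined in schema.xml. The field type name "text_html" is a hypothetical placeholder and does not come from the original message; only the two class names and their attributes are taken from it.

```xml
<!-- Sketch only: "text_html" is a hypothetical field type name. -->
<fieldType name="text_html" class="solr.TextField">
  <analyzer>
    <!-- Strips HTML markup, then splits the remaining text on whitespace -->
    <tokenizer class="solr.HTMLStripWhitespaceTokenizerFactory"/>
    <!-- Removes every non-alphanumeric character from each token -->
    <filter class="solr.PatternReplaceFilterFactory"
            pattern="([^a-zA-Z0-9])" replacement="" replace="all"/>
  </analyzer>
</fieldType>
```

The order matters here: the tokenizer runs before any filter, so the tags are already gone when the pattern is applied. Applied on its own to a token such as "<p>hi", the pattern would only delete the angle brackets and leave "phi". Since the <analyzer> element has no type attribute, the same chain is used at both index and query time.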