Searching for escaped characters

Paul Thu, 28 Apr 2011 09:10:59 -0700

I'm trying to create a test to make sure that character sequences like
"&egrave;" are successfully converted to their equivalent Unicode
character (in this case, "è").


So, I'd like to search my Solr index using the equivalent of the
following regular expression:

&\w{1,6};

This would find any escaped sequences that might have slipped through.
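
For illustration, the check itself can be sketched outside Solr. This is
a minimal Python sketch, not part of the setup described here: the
function names and the sample string are hypothetical, and Python's
html.unescape merely stands in for whatever conversion step is actually
used before indexing. The pattern is the one given above.

```python
import html
import re

# Same pattern as above: "&", one to six word characters, then ";".
ENTITY_RE = re.compile(r"&\w{1,6};")


def find_unconverted_entities(text):
    """Return every substring that still looks like an escaped sequence."""
    return ENTITY_RE.findall(text)


def convert_entities(text):
    """Stand-in conversion step: replace character references with characters."""
    return html.unescape(text)


# Hypothetical sample input containing three escaped sequences.
raw = "cr&egrave;me br&ucirc;l&eacute;e"
converted = convert_entities(raw)

print(find_unconverted_entities(raw))        # sequences present before conversion
print(find_unconverted_entities(converted))  # should be empty after conversion
```

A test along these lines would pass only if the second list is empty.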

Is this possible? I have indexed these fields with text_lu, which
looks like this:

    <fieldtype name="text_lu" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StandardFilterFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldtype>

Thanks,
Paul
