I'm trying to create a test to make sure that character sequences like "&egrave;" are successfully converted to their equivalent Unicode character (that is, in this case, "è").
So, I'd like to search my Solr index using the equivalent of the following regular expression:

    &\w{1,6};

to find any escaped sequences that might have slipped through. Is this possible?

I have indexed these fields with text_lu, which looks like this:

    <fieldtype name="text_lu" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StandardFilterFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldtype>

Thanks,
Paul
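
For reference, here is a minimal sketch of the check I have in mind, written in Python and run against raw field values outside of Solr. It assumes the original, unanalyzed text is available to the test as plain strings. The function names are hypothetical and are not part of Solr or any existing code. Note that the pattern only matches named sequences such as "&egrave;". Numeric references such as "&#232;" contain "#", which \w does not match.

```python
import html
import re

# Same pattern as the regular expression above: "&", 1 to 6 word characters, ";".
ENTITY_PATTERN = re.compile(r"&\w{1,6};")


def find_escaped_sequences(text):
    """Return every entity-like sequence still present in text (hypothetical helper)."""
    return ENTITY_PATTERN.findall(text)


def decode_entities(text):
    """Convert HTML character references to their Unicode characters."""
    return html.unescape(text)


if __name__ == "__main__":
    sample = "caf&egrave; au lait"
    print(find_escaped_sequences(sample))
    print(find_escaped_sequences(decode_entities(sample)))
```

A test could then assert that find_escaped_sequences returns an empty list for every value after conversion.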