Re: Searching for escaped characters

2011-04-28 Thread Mike Sokolov
StandardTokenizer will have stripped the punctuation, I think. You might try searching for all the entity names, though: (agrave | egrave | omacron | etc...). The names are pretty distinctive, although you might have problems with Greek letters.

-Mike

On 04/28/2011 12:10 PM, Paul wrote:
I'm tr

Searching for escaped characters

2011-04-28 Thread Paul
I'm trying to create a test to make sure that character sequences like "&egrave;" are successfully converted to their equivalent UTF-8 character (that is, in this case, "è"). So, I'd like to search my Solr index using the equivalent of the following regular expression:

&\w{1,6};

To find any escaped seque
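
For reference, here is a minimal sketch of the check described above, run against raw text before indexing rather than against the Solr index itself. It uses the same pattern quoted in the message; the function names and the sample string are made up for illustration, and Python's standard `html.unescape` stands in for whatever conversion step the real pipeline uses.

```python
import html
import re

# The pattern quoted in the message: "&", one to six word characters, ";".
ENTITY_RE = re.compile(r"&\w{1,6};")


def find_unconverted_entities(text):
    """Return every substring that still looks like a named character entity."""
    return ENTITY_RE.findall(text)


def convert_entities(text):
    """Replace HTML character references with the characters they name.

    Assumption: html.unescape is only a stand-in for the real conversion step.
    """
    return html.unescape(text)


if __name__ == "__main__":
    sample = "caf&eacute; cr&egrave;me"  # hypothetical test input
    print(find_unconverted_entities(sample))
    print(find_unconverted_entities(convert_entities(sample)))
```

Running the check on the text before it reaches the tokenizer sidesteps the problem raised in the reply, where the punctuation that the pattern depends on may already have been stripped from the indexed terms.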