I won't guarantee this is the 'best algorithm', but here's what we use. (This is in a final class with only static helper methods):

    // Requires Apache Commons Lang on the classpath for StringUtils.replaceEach
    // (presumably org.apache.commons.lang.StringUtils).

    // Set of characters / Strings SOLR treats as having special meaning in a query,
    // and the corresponding Escaped versions.
    // Note that the actual operators '&&' and '||' don't show up here - we'll just
    // escape the characters '&' and '|' wherever they occur.
    private static final String[] SOLR_SPECIAL_CHARACTERS = new String[] {
        "+", "-", "&", "|", "!", "(", ")", "{", "}", "[", "]", "^", "\"", "~", "*", "?", ":", "\\"};
    private static final String[] SOLR_REPLACEMENT_CHARACTERS = new String[] {
        "\\+", "\\-", "\\&", "\\|", "\\!", "\\(", "\\)", "\\{", "\\}", "\\[", "\\]", "\\^", "\\\"", "\\~", "\\*", "\\?", "\\:", "\\\\"};

    /**
     * Escapes all special characters from the Search Terms, so they don't get confused with
     * the Solr query language special characters.
     * @param value - Search Term to escape
     * @return - escaped Search value, suitable for a Solr "q" parameter
     */
    public static String escapeSolrCharacters(String value)
    {
        // replaceEach makes a single pass over the input, so the backslashes it
        // inserts are not themselves escaped a second time.
        return StringUtils.replaceEach(value, SOLR_SPECIAL_CHARACTERS, SOLR_REPLACEMENT_CHARACTERS);
    }

Bob Sandiford | Lead Software Engineer | SirsiDynix
P: 800.288.8020 X6943 | bob.sandif...@sirsidynix.com
www.sirsidynix.com

> -----Original Message-----
> From: Bill Bell [mailto:billnb...@gmail.com]
> Sent: Sunday, September 25, 2011 12:22 AM
> To: solr-user@lucene.apache.org
> Subject: Best Solr escaping?
>
> What is the best algorithm for escaping strings before sending to Solr?
> Does someone have some code?
>
> A few things I have witnessed in "q" using DIH handler
> * Double quotes - " that are not balanced can cause several issues from an
>   error (strip the double quote?), to no results.
> * Should we use + or %20 and what cases make sense:
>   * "Dr. Phil Smith" or "Dr.+Phil+Smith" or "Dr.%20Phil%20Smith" - also what is
>     the impact of double quotes?
> * Unmatched parenthesis I.e. Opening ( and not closing.
>   * (Dr. Holstein
>   * Cardiologist+(Dr. Holstein
>
> Regular encoding of strings does not always work for the whole string due to
> several issues like white space:
> * White space works better when we use back quote "Bill\ Bell" especially
>   when using facets.
>
> Thoughts? Code? Ideas? Better Wikis?
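
On the "+ or %20" question in the quoted message: query-syntax escaping and URL encoding are two separate layers, and one plausible ordering is to escape for the query parser first and URL-encode the result afterwards. The sketch below is only an illustration under that assumption. The class SolrEscapeSketch and its methods escape and toQueryParam are hypothetical names, not part of Solr or of the helper class above. It uses the same character list as SOLR_SPECIAL_CHARACTERS but only the JDK, and like that list it leaves whitespace alone.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper class; the name and both methods are illustrative only.
final class SolrEscapeSketch {

    // Same character set as SOLR_SPECIAL_CHARACTERS in the post above.
    private static final String SPECIAL = "+-&|!(){}[]^\"~*?:\\";

    private SolrEscapeSketch() {}

    /** Prefixes every special character with a backslash in a single pass. */
    static String escape(String value) {
        if (value == null) {
            return null;
        }
        StringBuilder sb = new StringBuilder(value.length() + 8);
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    /** Escapes for the query parser first, then URL-encodes for the HTTP request. */
    static String toQueryParam(String value) {
        try {
            return URLEncoder.encode(escape(value), "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every JVM is required to support UTF-8, so this should not happen.
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }
}
```

With this ordering, the unbalanced input (Dr. Holstein becomes \(Dr. Holstein after escape, and toQueryParam then yields %5C%28Dr.+Holstein, because URLEncoder writes a space as + rather than %20. A client library that builds the request itself usually does the URL-encoding step already, in which case only the first layer would be needed.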