I won't guarantee this is the 'best algorithm', but here's what we use. (This is in a final class with only static helper methods):

    // Apache Commons Lang 2.4+; with Commons Lang 3 use org.apache.commons.lang3.StringUtils instead.
    import org.apache.commons.lang.StringUtils;

    // Class name is illustrative only; all we need is a final class with static helpers.
    public final class SolrQueryUtils
    {
        // Set of characters / Strings SOLR treats as having special meaning in a query, and the corresponding escaped versions.
        // Note that the actual operators '&&' and '||' don't show up here - we'll just escape the characters '&' and '|' wherever they occur.
        private static final String[] SOLR_SPECIAL_CHARACTERS = new String[] {
            "+", "-", "&", "|", "!", "(", ")", "{", "}", "[", "]", "^", "\"", "~", "*", "?", ":", "\\"};
        private static final String[] SOLR_REPLACEMENT_CHARACTERS = new String[] {
            "\\+", "\\-", "\\&", "\\|", "\\!", "\\(", "\\)", "\\{", "\\}", "\\[", "\\]",
            "\\^", "\\\"", "\\~", "\\*", "\\?", "\\:", "\\\\"};

        private SolrQueryUtils() {} // static helpers only; never instantiated

        /**
         * Escapes all special characters in the search term, so they don't get confused with
         * the Solr query language special characters. Whitespace is not escaped.
         * StringUtils.replaceEach makes a single pass, so the backslashes it inserts are not re-escaped.
         * @param value search term to escape
         * @return escaped search value, suitable for a Solr "q" parameter (null if value is null)
         */
        public static String escapeSolrCharacters(String value)
        {
            return StringUtils.replaceEach(value, SOLR_SPECIAL_CHARACTERS, SOLR_REPLACEMENT_CHARACTERS);
        }
    }
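Since every entry in that table is a single character mapped to a backslash plus itself, the same result can be had without Commons Lang, in one pass over the string. A rough sketch follows; the class name SolrEscaper and the escapeWhitespace flag are illustrative additions, not part of the code we run. The flag covers the whitespace point below: with it on, "Bill Bell" becomes the single term "Bill\ Bell".

```java
// Hypothetical dependency-free variant of escapeSolrCharacters.
final class SolrEscaper {
    // Same 18 characters as SOLR_SPECIAL_CHARACTERS, packed into one string.
    private static final String SPECIAL = "+-&|!(){}[]^\"~*?:\\";

    private SolrEscaper() {}

    /**
     * Prefixes each Solr query-syntax character with a backslash.
     * If escapeWhitespace is true, whitespace characters are escaped as well.
     */
    static String escape(String value, boolean escapeWhitespace) {
        if (value == null) {
            return null; // same as StringUtils.replaceEach for null input
        }
        StringBuilder sb = new StringBuilder(value.length() + 16);
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            if (SPECIAL.indexOf(c) >= 0 || (escapeWhitespace && Character.isWhitespace(c))) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }
}
```

For example, escape("Cardiologist+(Dr. Holstein", false) returns Cardiologist\+\(Dr. Holstein. If you're on SolrJ, ClientUtils.escapeQueryChars does much the same thing; as far as I know it also escapes whitespace and ';', so check the source for your version before relying on the exact set.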

Bob Sandiford | Lead Software Engineer | SirsiDynix
P: 800.288.8020 X6943 | bob.sandif...@sirsidynix.com
www.sirsidynix.com

> -----Original Message-----
> From: Bill Bell [mailto:billnb...@gmail.com]
> Sent: Sunday, September 25, 2011 12:22 AM
> To: solr-user@lucene.apache.org
> Subject: Best Solr escaping?
> 
> What is the best algorithm for escaping strings before sending to Solr? Does
> someone have some code?
> 
> A few things I have witnessed in "q" using the DIH handler:
> * Double quotes: a " that is not balanced can cause several issues, from an
>   error (strip the double quote?) to no results.
> * Should we use + or %20, and in which cases does each make sense?
>   * "Dr. Phil Smith" or "Dr.+Phil+Smith" or "Dr.%20Phil%20Smith"; also, what is
>     the impact of double quotes?
> * Unmatched parentheses, i.e. an opening ( without a closing one:
>   * (Dr. Holstein
>   * Cardiologist+(Dr. Holstein
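On the unbalanced quotes and parentheses: once a character is escaped, the query parser treats it as a literal, so an unmatched ( or " can no longer cause a parse error. If you want to keep user phrases (a balanced pair of double quotes makes a phrase query), one option besides stripping the quote is to leave double quotes alone only when their count is even, and escape them otherwise. A minimal sketch under that assumption; PhraseAwareEscaper is a hypothetical name, and an even count does not prove the user actually meant those quotes as phrases.

```java
// Hypothetical helper: escapes Solr query syntax, but keeps double quotes as
// phrase delimiters when there is an even number of them.
final class PhraseAwareEscaper {
    // Same character set as SOLR_SPECIAL_CHARACTERS, including '"' and '\'.
    private static final String SPECIAL = "+-&|!(){}[]^\"~*?:\\";

    private PhraseAwareEscaper() {}

    static String escape(String value) {
        if (value == null) {
            return null;
        }
        // An odd number of quotes means at least one is unbalanced: escape them all.
        long quotes = value.chars().filter(c -> c == '"').count();
        boolean keepQuotes = quotes % 2 == 0;
        StringBuilder sb = new StringBuilder(value.length() + 16);
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            if (c == '"' && keepQuotes) {
                sb.append(c); // leave balanced quotes as phrase delimiters
                continue;
            }
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\'); // everything else, including ( and ), becomes a literal
            }
            sb.append(c);
        }
        return sb.toString();
    }
}
```

Escaping inside a quoted phrase is harmless, since the parser strips the backslashes again within the phrase, so "a (b)" still searches for the phrase a (b).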
> Regular encoding of strings does not always work for the whole string due to
> several issues, such as white space:
> * White space works better when we escape it with a backslash, "Bill\ Bell",
>   especially when using facets.
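On + vs %20: that's a separate layer from query escaping. Escape first, then URL-encode the result as the value of q. In an application/x-www-form-urlencoded query string, '+' and '%20' both decode to a space on the server, so either works for spaces; a literal '+' that belongs to the query has to travel as %2B, or it arrives as a space. Double quotes go over the wire as %22 and reach Solr unchanged, where a balanced pair makes a phrase query. A minimal sketch using java.net.URLEncoder (which emits '+' for spaces); the class and method names are hypothetical, and if you go through SolrJ the client should take care of this layer for you.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper: URL-encodes an already-escaped Solr query so it can be
// used as the value of the q parameter in a GET request.
final class SolrUrlParam {
    private SolrUrlParam() {}

    static String encodeQueryParam(String escapedQuery) {
        try {
            // Form encoding: space becomes '+', '+' becomes %2B, backslash %5C,
            // '(' %28 and '"' %22; letters, digits and ".-*_" are left as-is.
            return URLEncoder.encode(escapedQuery, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException(e);
        }
    }
}
```

For example, encodeQueryParam("Bill\\ Bell") gives Bill%5C+Bell, and the server decodes it back to Bill\ Bell before the query parser sees it.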
> 
> Thoughts? Code? Ideas? Better Wikis?
> 
> 

