Synonym mappings are an easy way to handle specific cases like these:

C++ => cplusplus
C# => csharp
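
To make the idea concrete, here is a minimal Python sketch of what such a mapping does. It is not Solr code: the `analyze` function, the `SYNONYMS` table and the punctuation list are hypothetical names chosen for illustration. It assumes the text is split on whitespace, so that "+" and "#" are still part of the token when the mapping is applied, and that the same analysis runs at both index time and query time.

```python
# Illustrative sketch only; these names are hypothetical and not part of Solr.
SYNONYMS = {
    "c++": "cplusplus",
    "c#": "csharp",
}

# Punctuation trimmed from token edges. "+" and "#" are deliberately
# excluded so that "c++" and "c#" reach the synonym lookup intact.
EDGE_PUNCTUATION = ".,;:!?()\"'"


def analyze(text, synonyms=SYNONYMS):
    """Lowercase, split on whitespace, trim edge punctuation, map synonyms."""
    tokens = []
    for raw in text.lower().split():
        token = raw.strip(EDGE_PUNCTUATION)
        if not token:
            continue
        # Replace the token with its mapped form if one is defined.
        tokens.append(synonyms.get(token, token))
    return tokens


# The same function is applied to documents and to queries, so a search
# for "C++" and a document containing "C++" both end up as "cplusplus".
indexed = analyze("Senior C++ developer, also C#.")
query = analyze("c++")
```

With this sketch, `indexed` contains both "cplusplus" and "csharp", and every token in `query` is found in `indexed`, so the search matches.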
-Yonik
http://www.lucidimagination.com

On Thu, Mar 26, 2009 at 9:27 AM, Jana, Kumar Raja <kj...@ptc.com> wrote:
> Hi Leonardo,
> 1. You can change the fieldtype to "string", in which case no tokenizers
> will act on your data and the content will be stored as is.
> 2. If you are using Solr 1.4 (latest) then there is a provision to mention
> protected words for WordDelimiterFilterFactory which will take care of
> your issue.
>
> -Kumar
>
> -----Original Message-----
> From: Leonardo Dias [mailto:leona...@catho.com.br]
> Sent: Thursday, March 26, 2009 6:53 PM
> To: solr-user@lucene.apache.org
> Subject: How to search for "C++"?
>
> Hello there!
>
> Currently we're having a problem here and we're looking for some
> solutions. Right now we use the Standard Tokenizer to separate tokens
> and we just found out that we cannot search for "c++" in our index
> because it is not considered a word.
>
> Since we need this search to work properly (including a search for C#)
> we'd like to know what you guys are doing when people search for words
> that have symbols, like these programming languages. I thought there
> could be a list of "protected words" in the standard tokenizer, so that
> we could protect these tokens. Another possibility would be using the
> Pattern Tokenizer, but it seems to be kind of slow when it comes to
> indexing a huge amount of data, which is our case.
>
> What do you think the best solution would be?
>
> Best,
>
> Leonardo