RE: How to index and query "C#" as whole term?

Robert Petersen Mon, 16 May 2011 08:44:54 -0700

I have always just converted terms like 'C#' or 'C++' into 'csharp' and
'cplusplus' before indexing them and similarly converted those terms if
someone searched on them.  That always has worked just fine for me...
:)
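
A minimal sketch of that conversion step, in plain Python rather than anything Solr-specific. The mapping table and the function name `normalize_terms` are hypothetical and only for illustration. The idea is to run the same function over document text before indexing and over the user's query before searching, so both sides agree on 'csharp' and 'cplusplus'.

```python
import re

# Hypothetical alias table; extend it with whatever symbol-bearing terms matter.
TERM_ALIASES = {
    "c#": "csharp",
    "c++": "cplusplus",
}

# Longest aliases first, so a longer term is never shadowed by a shorter one.
# The lookarounds keep us from rewriting the tail of a longer word ("abc#")
# or a term followed by more word characters ("c++11").
_ALIAS_PATTERN = re.compile(
    r"(?<![\w#+])("
    + "|".join(re.escape(t) for t in sorted(TERM_ALIASES, key=len, reverse=True))
    + r")(?![\w#+])",
    re.IGNORECASE,
)


def normalize_terms(text: str) -> str:
    """Replace each known term (case-insensitively) with its searchable alias."""
    return _ALIAS_PATTERN.sub(lambda m: TERM_ALIASES[m.group(1).lower()], text)


# Apply the same function at index time and at query time.
indexed_text = normalize_terms("Senior C# and C++ developer")
query_text = normalize_terms("c#")
```

Because the replacement happens before the text reaches the analyzer, this works whatever the field type's filters do to punctuation.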


-----Original Message-----
From: Jonathan Rochkind [mailto:[email protected]] 
Sent: Monday, May 16, 2011 8:28 AM
To: [email protected]
Subject: Re: How to index and query "C#" as whole term?

I don't think you'd want to use the string type here. String type is 
almost never appropriate for a field you want to actually search on (it 
is appropriate for fields to facet on).

But you may want to use Text type with different analyzers selected.  
You probably want Text type so the value is still split into different 
tokens on word boundaries; you just don't want an analyzer set that 
removes punctuation.
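
To illustrate the difference, here is a rough Python imitation of two analysis chains. It is not Solr code, and both function names are made up for this sketch. The first chain splits on whitespace and lowercases, which is roughly what a field type built from solr.WhitespaceTokenizerFactory plus solr.LowerCaseFilterFactory would do. The second splits on every non-alphanumeric character, which is a simplified stand-in for a chain that strips punctuation.

```python
import re


def analyze_keep_symbols(text):
    """Split on whitespace and lowercase; '#' and '+' stay inside the token."""
    return [token.lower() for token in text.split()]


def analyze_strip_symbols(text):
    """Split on every non-alphanumeric character, which discards '#' and '+'."""
    return [token.lower() for token in re.split(r"[^A-Za-z0-9]+", text) if token]


sample = "Hiring C# and Java developers"
kept = analyze_keep_symbols(sample)       # contains "c#"
stripped = analyze_strip_symbols(sample)  # contains "c", not "c#"
```

Both chains still produce one token per word, so the field remains searchable word by word; only the first keeps "c#" intact.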

On 5/16/2011 10:46 AM, Gora Mohanty wrote:
> On Mon, May 16, 2011 at 7:05 PM, Gnanakumar<[email protected]>  wrote:
>> Hi,
>>
>> I'm using Apache Solr v3.1.
>>
>> How do I configure/allow Solr to both index and query the term "c#" as a
>> whole word/term?  From "Analysis" page, I could see that the term "c#" is
>> being reduced/converted into just "c" by solr.WordDelimiterFilterFactory.
> [...]
>
> Yes, as you have discovered the analyzers for the field type in
> question will affect the values indexed.
>
> To index "c#" exactly as is, you can use the "string" type, instead
> of the "text" type. However, you probably want some filters
> to be applied, e.g., LowerCaseFilterFactory. Take a look at the
> definition of the fieldType "text" in schema.xml, define a new field
> type that has only the tokenizers and analyzers that you need, and
> use that type for your field. This Wiki page should be helpful:
> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters
>
> Regards,
> Gora
>
