Re: Is there any special meaning for # symbol in solr.

Oliver Schihin Tue, 04 Sep 2012 01:05:30 -0700

You are not using a string type but a TextField, and in your analysis chain the
StandardTokenizer strips the number sign (#). You can check this on the
"Analysis" page of the Solr admin backend.
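To illustrate the effect, here is a rough Python approximation of the index and query chains for inputs like "C#". It assumes that keeping only runs of word characters (then lowercasing) behaves like StandardTokenizer plus LowerCaseFilter for these inputs. The real tokenizer follows the Unicode word-break rules (UAX #29) and differs in edge cases, and approx_standard_tokenize is a hypothetical name.

```python
import re


def approx_standard_tokenize(text: str) -> list[str]:
    """Rough stand-in for StandardTokenizer + LowerCaseFilter (assumption).

    Only runs of word characters are kept, so punctuation such as '#'
    and '+' is discarded. This is NOT Lucene's implementation.
    """
    return [token.lower() for token in re.findall(r"\w+", text)]


if __name__ == "__main__":
    for value in ["C#", "C++", "C", "F# developer"]:
        print(value, "->", approx_standard_tokenize(value))
```

Because "C#" and "C" both reduce to the single term "c", a query for techskill:c# matches every document that was indexed with a plain "c".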


You can either use a string type for searches like C#, C++ and the like, or map
the characters to something textual *before* tokenizing. My solution goes
something like this:

<charFilter class="solr.MappingCharFilterFactory" mapping="mapping-chars.txt"/>
where mapping-chars.txt is:
*****************
# ########
# Specials
# ########

# C+ => Cplus
# C++ => Cplusplus
"\u0043\u002B" => "Cplus"
"\u0063\u002B" => "Cplus"
"\u0043\u002B\u002B" => "Cplusplus"
"\u0063\u002B\u002B" => "Cplusplus"

# C#, C♯ => Csharp
"\u0043\u0023" => "Csharp"
"\u0063\u0023" => "Csharp"
"\u0043\u266f" => "Csharp"
"\u0063\u266f" => "Csharp"

# F#, F♯ => Fsharp
"\u0046\u0023" => "Fsharp"
"\u0066\u0023" => "Fsharp"
"\u0046\u266f" => "Fsharp"
"\u0066\u266f" => "Fsharp"

# J#, J♯ => Jsharp
"\u004A\u0023" => "Jsharp"
"\u006A\u0023" => "Jsharp"
"\u004A\u266f" => "Jsharp"
"\u006A\u266f" => "Jsharp"

# ♭ => b
"\u266d" => "b"

# @ => at
"\u0040" => "at"
*******************************

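Then use any tokenizer you like: the mapped terms (Csharp, Cplusplus, ...)
contain only letters, so they survive tokenization.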
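A rough Python sketch of what the char filter does before the tokenizer sees the text. It assumes MappingCharFilter's behaviour of replacing the longest matching key at each position (which is why "C++" becomes "Cplusplus" rather than "Cplus+"), uses only a subset of the mappings above, and reuses a simple regex as a stand-in for the tokenizer. MAPPINGS, apply_mappings and approx_analyze are hypothetical names; Solr reads mapping-chars.txt itself.

```python
import re

# Subset of mapping-chars.txt, written as a Python dict (illustration only).
MAPPINGS = {
    "C+": "Cplus",
    "c+": "Cplus",
    "C++": "Cplusplus",
    "c++": "Cplusplus",
    "C#": "Csharp",
    "c#": "Csharp",
    "C\u266f": "Csharp",
    "c\u266f": "Csharp",
    "@": "at",
}


def apply_mappings(text: str, mappings: dict[str, str]) -> str:
    """Replace mapped sequences, preferring the longest key at each position."""
    keys = sorted(mappings, key=len, reverse=True)  # longest keys tried first
    out = []
    i = 0
    while i < len(text):
        for key in keys:
            if text.startswith(key, i):
                out.append(mappings[key])
                i += len(key)
                break
        else:  # no key matched here: copy the character unchanged
            out.append(text[i])
            i += 1
    return "".join(out)


def approx_analyze(text: str) -> list[str]:
    """Char filter first, then a rough tokenizer + lowercasing (assumption)."""
    mapped = apply_mappings(text, MAPPINGS)
    return [token.lower() for token in re.findall(r"\w+", mapped)]


if __name__ == "__main__":
    for value in ["C#", "c++", "C\u266f developer", "C"]:
        print(value, "->", approx_analyze(value))
```

With the mapping in place, "C#" is indexed as "csharp" and no longer collides with "c". One thing to keep in mind: the mapping ignores word boundaries, so a string such as "ABC#" is rewritten as well (to "ABCsharp").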



-------- Original Message --------
Subject: Re: Is there any special meaning for # symbol in solr.
From: veena rani <veenara...@gmail.com>
To: solr-user@lucene.apache.org
CC: te <t...@statsbiblioteket.dk>
Date: 04.09.2012 09:49

This is the field type I'm using for techskill:

 <field name="techskill" type="text_general" indexed="true" stored="true" />

<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
    <!-- in this example, we will only use synonyms at query time
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
    -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>


On Tue, Sep 4, 2012 at 1:16 PM, veena rani <veenara...@gmail.com> wrote:

No, # is not a stop word.


On Tue, Sep 4, 2012 at 12:59 PM, 李赟 <liyun2...@corp.netease.com> wrote:

Is "#" in your stopwords list?


2012-09-04



Li Yun
Software Engineer @ Netease
Mail: liyun2...@corp.netease.com
MSN: rockiee...@gmail.com




From: veena rani
Sent: 2012-09-04  12:57:26
To: solr-user; te
CC:
Subject: Re: Is there any special meaning for # symbol in solr.

If I use this link,
http://localhost:8080/solr/select?&q=(techskill%3Ac%23)
Solr displays the techskill:c results,
but I want to display only the techskill:c# results.
On Mon, Sep 3, 2012 at 7:23 PM, Toke Eskildsen <t...@statsbiblioteket.dk> wrote:
On Mon, 2012-09-03 at 13:39 +0200, veena rani wrote:

 I have an issue with the # symbol in Solr. I'm trying to search for a string
 that ends with #, e.g. c#, and it throws an error like:
 org.apache.lucene.queryparser.classic.ParseException: Cannot parse
 '(techskill:c': Encountered "<EOF>" at line 1, column 12.

Solr only received '(techskill:c', which has unbalanced parentheses.
My guess is that you did not URL-encode the '#' (in a URL, '#' starts the
fragment, which the client never sends to the server), and that you were
sending something like
http://localhost:8080/solr/select?&q=(techskill:c#)
when you should have been sending
http://localhost:8080/solr/select?&q=(techskill%3Ac%23)
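The encoding step can be sketched with Python's standard library. This assumes the Solr URL used in this thread (http://localhost:8080/solr/select); BASE and build_select_url are hypothetical names. urlsplit shows where an unencoded '#' cuts the request off.

```python
from urllib.parse import quote, urlsplit

BASE = "http://localhost:8080/solr/select"  # URL from this thread (assumption)


def build_select_url(query: str) -> str:
    """Percent-encode the q parameter so ':' and '#' reach Solr intact.

    Parentheses are legal in a query string, so they are left unencoded
    here to match the URL used in this thread.
    """
    return f"{BASE}?q={quote(query, safe='()')}"


# Unencoded: everything after '#' becomes the fragment and is not sent.
raw = urlsplit(BASE + "?q=(techskill:c#)")
print(raw.query)     # q=(techskill:c
print(raw.fragment)  # )

print(build_select_url("(techskill:c#)"))
# http://localhost:8080/solr/select?q=(techskill%3Ac%23)
```

The truncated query "(techskill:c" is exactly what the ParseException above complains about.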

--
Regards,
Veena.
Bangalore.
