Just declare the field with type="string" and multiValued="true", and send the values to Solr as separate field entries:

<doc><field name="blah">1</field><field name="blah">133</field><field name="blah">999</field></doc>

Search:

blah:133
+blah:999 +blah:1 [both must match]

Just treat the numbers as untokenized text.
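
If the client is scripted, the splitting step is only a few lines. Here is a minimal Python sketch, not a definitive implementation: the helper names build_doc_xml and build_all_of_query are made up for illustration, "blah" is just the placeholder field name used above, and the resulting <doc> still has to be wrapped in the usual <add> element and posted to your update handler.

```python
from xml.sax.saxutils import escape, quoteattr


def split_ids(raw_ids):
    """Split a comma-separated string such as '1,325,488' into clean id strings."""
    return [part.strip() for part in raw_ids.split(",") if part.strip()]


def build_doc_xml(field_name, raw_ids):
    """Build a <doc> element with one <field> per id (hypothetical helper)."""
    fields = "".join(
        "<field name=%s>%s</field>" % (quoteattr(field_name), escape(value))
        for value in split_ids(raw_ids)
    )
    return "<doc>%s</doc>" % fields


def build_all_of_query(field_name, ids):
    """Build a query in which every id is required, e.g. '+blah:999 +blah:1'."""
    return " ".join("+%s:%s" % (field_name, value) for value in ids)


if __name__ == "__main__":
    print(build_doc_xml("blah", "1,133,999"))
    print(build_all_of_query("blah", ["999", "1"]))
```

Because each id arrives as its own field value, the stored values also come back one per entry rather than as a single comma-separated string.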

-Mike


On 4-May-08, at 2:30 AM, [EMAIL PROTECTED] wrote:

OK, thanks. However, I am still a bit confused. Since I know that these are only integers, can't I somehow make Solr use solr.IntField or solr.SortableIntField, but still tokenize like this? I tried the configuration below with TextField changed to IntField and indexed the document again, but then the search didn't work...

This is what I use now (after your suggestion):

   <fieldtype name="ids" class="solr.TextField">
     <analyzer type="query">
         <tokenizer class="solr.WhitespaceTokenizerFactory"/>
         <filter class="solr.WordDelimiterFilterFactory"/>
     </analyzer>
     <analyzer type="index">
         <tokenizer class="solr.WhitespaceTokenizerFactory"/>
         <filter class="solr.WordDelimiterFilterFactory"/>
     </analyzer>
   </fieldtype>

This works great when searching. But when I get the document back, I see that the stored value is still the comma-separated string, i.e.:

...
<str name="articleCategory">3,5</str>
...

I would have liked it like this instead:

...
<str name="articleCategory">3</str>
<str name="articleCategory">5</str>
...

Is this possible in Solr through some configuration? Am I really the only one who would like this behavior?

/Jimi

Quoting Otis Gospodnetic <[EMAIL PROTECTED]>:

I think you are after  
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#head-1c9b83870ca7890cd73b193cefed83c283339089

Otis

--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message ----
From: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
To: solr-user@lucene.apache.org
Sent: Saturday, May 3, 2008 11:57:37 PM
Subject: Tokenize integers?

Hi,

What is the recommended way to configure a fieldtype for a field that
looks like this in the source system?

categoryIds=1,325,488

The order of these ids is not important. I want to be able to fetch all the ids separately, i.e. I want them to be stored as multivalued, I guess... And I also want to be able to search on the individual ids, or on combinations (for example, search for all articles with category ids 1 and 488).

I know I can index this as multiple categoryId fields (and have them
as int or sint type), but that means I need to write preprocessing on
the "client" side. I would prefer a server side fix, so that the
client can send the xml like this:

...
1,325,488
...

And then the server (i.e. Solr) would transform this into a multivalued
int/sint field, using tokenizing or whatever it is called (or is
tokenizing not performed on the stored value?).

What are your suggestions? Maybe this is already documented in the
wiki or someplace else? I have searched for this, but not found
anything that helps.

Regards
/Jimi






