Just use a string field (type="string", multiValued="true" in schema.xml) and
send the values to Solr as separate values of that field:
<doc>
  <field name="blah">1</field>
  <field name="blah">133</field>
  <field name="blah">999</field>
</doc>
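A minimal sketch of producing that multivalued document on the client, assuming the raw value arrives as a comma-separated string. The helper name build_multivalued_doc is hypothetical; it uses only Python's standard xml.etree.ElementTree module.

```python
import xml.etree.ElementTree as ET


def build_multivalued_doc(field_name, csv_values):
    """Hypothetical helper: turn "1,133,999" into a <doc> with one
    <field> element per value, all sharing the same field name."""
    doc = ET.Element("doc")
    for raw in csv_values.split(","):
        value = raw.strip()
        if not value:
            continue  # skip empty entries, e.g. from a trailing comma
        field = ET.SubElement(doc, "field", name=field_name)
        field.text = value
    # encoding="unicode" returns a str without an XML declaration
    return ET.tostring(doc, encoding="unicode")


print(build_multivalued_doc("blah", "1,133,999"))
# <doc><field name="blah">1</field><field name="blah">133</field><field name="blah">999</field></doc>
```

In an actual XML update request the <doc> element is wrapped in <add>...</add> before it is posted to Solr.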
Search:
blah:133            (matches any document containing 133)
+blah:999 +blah:1   (both must match)
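The two query forms can be sketched as follows, assuming plain numeric IDs that need no escaping. require_all builds the Lucene query syntax used above (a leading + marks a required clause); matches_all is only a toy model of the matching rule, not Solr's query parser. Both helper names are hypothetical.

```python
def require_all(field_name, values):
    """Hypothetical helper: build "+f:v1 +f:v2 ..." so every value is required.
    Assumes values are plain IDs with no Lucene special characters."""
    return " ".join(f"+{field_name}:{value}" for value in values)


def matches_all(doc_values, wanted):
    """Toy model: a document matches only if its multivalued field
    contains every wanted value (exact, untokenized string comparison)."""
    return set(wanted).issubset(doc_values)


query = require_all("blah", ["999", "1"])
doc = {"1", "133", "999"}  # the values sent in the <doc> example
print(query, matches_all(doc, ["999", "1"]), matches_all(doc, ["999", "2"]))
# +blah:999 +blah:1 True False
```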
Just treat the numbers as untokenized text.
-Mike
On 4-May-08, at 2:30 AM, [EMAIL PROTECTED] wrote:
OK, thanks. However, I am still a bit confused. Since I know that
these are only integers, can't I somehow make Solr use
solr.IntField or solr.SortableIntField but still tokenize like
this? I tried the configuration below with TextField changed to
IntField and re-indexed the document, but then the search didn't
work...
This is what I use now (after your suggestion):
<fieldtype name="ids" class="solr.TextField">
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"/>
  </analyzer>
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"/>
  </analyzer>
</fieldtype>
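For plain numeric IDs, this analyzer chain roughly turns a value such as "3,5" into the separate index terms "3" and "5". The sketch below only approximates that for illustration: it splits on whitespace (like the whitespace tokenizer) and then on runs of non-alphanumeric characters. It is not WordDelimiterFilterFactory's actual algorithm, which has more behavior and options (case changes, letter/digit transitions, catenation).

```python
import re


def approx_analyze(text):
    """Rough stand-in for the chain above, for numeric IDs only:
    whitespace split, then split on non-alphanumeric delimiters."""
    terms = []
    for chunk in text.split():  # ~ WhitespaceTokenizerFactory
        # ~ delimiter splitting: "3,5" -> ["3", "5"]; drop empty pieces
        terms.extend(t for t in re.split(r"[^0-9A-Za-z]+", chunk) if t)
    return terms


print(approx_analyze("3,5"))        # ['3', '5']
print(approx_analyze("1,325,488"))  # ['1', '325', '488']
```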
This works great when searching, but when I get the document back,
the stored value is still the comma-separated string, i.e.:
...
<str name="articleCategory">3,5</str>
...
I would have liked it like this instead:
...
<str name="articleCategory">3</str>
<str name="articleCategory">5</str>
...
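The comma-separated value comes back because Solr returns the stored value exactly as it was sent; the tokenizer and filters only decide which terms go into the index. A toy model of that split, with a hypothetical split_ids function standing in for the analyzer:

```python
import re


def split_ids(text):
    """Stand-in analyzer: split on anything that is not a digit."""
    return [t for t in re.split(r"\D+", text) if t]


def index_field(raw_value):
    """Toy model of one stored + indexed field: analysis produces the
    searchable terms, but the stored copy is the original string."""
    return {"stored": raw_value, "indexed_terms": split_ids(raw_value)}


print(index_field("3,5"))
# {'stored': '3,5', 'indexed_terms': ['3', '5']}
```

So getting separate <str> values back means sending separate values to a multivalued field, as in Mike's reply.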
Is this possible in Solr through configuration? Am I really the
only one who would like this behavior?
/Jimi
Quoting Otis Gospodnetic <[EMAIL PROTECTED]>:
I think you are after
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#head-1c9b83870ca7890cd73b193cefed83c283339089
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
----- Original Message ----
From: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
To: solr-user@lucene.apache.org
Sent: Saturday, May 3, 2008 11:57:37 PM
Subject: Tokenize integers?
Hi,
What is the recommended way to configure a field type for a field
that looks like this in the source system?
categoryIds=1,325,488
The order of these IDs is not important. I want to be able to fetch
all the IDs separately, i.e. I want them to be stored as multivalued,
I guess... And I also want to be able to search on the individual
IDs, or combinations (for example, search for all articles with
category IDs 1 and 488).
I know I can index this as multiple categoryId fields (and have them
as int or sint type), but that means I need to write preprocessing on
the "client" side. I would prefer a server-side fix, so that the
client can send the XML like this:
...
1,325,488
...
And then the server (i.e. Solr) would transform this into a multivalued
int/sint field, using tokenizing or whatever it is called (or is
tokenizing not performed on the stored value?).
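For comparison, the client-side preprocessing mentioned above can be quite small. A minimal sketch, assuming each source record arrives as a line like categoryIds=1,325,488; parse_category_ids is a hypothetical name, and each returned integer would then be sent as its own categoryId value of int or sint type.

```python
def parse_category_ids(line):
    """Hypothetical helper: parse "categoryIds=1,325,488" into [1, 325, 488].
    Raises ValueError if the key is wrong or an entry is not an integer."""
    key, sep, value = line.partition("=")
    if not sep or key.strip() != "categoryIds":
        raise ValueError(f"unexpected line: {line!r}")
    # int() accepts surrounding whitespace; empty entries are skipped
    return [int(part) for part in value.split(",") if part.strip()]


print(parse_category_ids("categoryIds=1,325,488"))  # [1, 325, 488]
```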
What are your suggestions? Maybe this is already documented in the
wiki or somewhere else? I have searched for this but have not found
anything that helps.
Regards
/Jimi