OK, thanks. However, I am still a bit confused. Since I know that these
are only integers, can't I somehow make Solr use solr.IntField or
solr.SortableIntField, but still tokenize like this? I tried the
configuration below with TextField changed to IntField and reindexed
the document, but then the search didn't work.
This is what I use now (after your suggestion):
<fieldtype name="ids" class="solr.TextField">
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"/>
  </analyzer>
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"/>
  </analyzer>
</fieldtype>
This works great when searching. But when I get the document back, I
see that the stored value is still the comma-separated string, i.e.:
...
<str name="articleCategory">3,5</str>
...
I would have liked it like this instead:
...
<str name="articleCategory">3</str>
<str name="articleCategory">5</str>
...
Is this possible in Solr through some configuration? Am I really the
only one who would like this behavior?
/Jimi
Quoting Otis Gospodnetic <[EMAIL PROTECTED]>:
I think you are after
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#head-1c9b83870ca7890cd73b193cefed83c283339089
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
----- Original Message ----
From: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
To: solr-user@lucene.apache.org
Sent: Saturday, May 3, 2008 11:57:37 PM
Subject: Tokenize integers?
Hi,
What is the recommended way to configure a fieldtype for a field that
looks like this in the source system?
categoryIds=1,325,488
The order of these ids is not important. I want to be able to fetch
all the ids separately, i.e. I want them to be stored as multivalued, I
guess... And I also want to be able to search on the individual ids,
or combinations (for example, search for all articles with category ids
1 and 488).
I know I can index this as multiple categoryId fields (and have them
as int or sint type), but that means I need to write preprocessing on
the "client" side. I would prefer a server-side fix, so that the
client can send the XML like this:
...
1,325,488
...
And then the server (i.e. Solr) will transform this into a multivalued
int/sint field, using tokenizing or whatever it is called (or is
tokenizing not performed on the stored value?).
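For comparison, the client-side alternative mentioned above would mean
posting one field element per id. This is only a rough sketch: the
unique key field "id", its value "article-1", and the schema
declaration shown in the comment are assumptions for illustration, not
taken from an actual schema.

```xml
<!-- Assumed schema.xml declaration (hypothetical):
     <field name="categoryId" type="sint" indexed="true" stored="true" multiValued="true"/> -->
<add>
  <doc>
    <!-- "id" and "article-1" are placeholders for whatever unique key the schema uses -->
    <field name="id">article-1</field>
    <!-- one field element per category id, instead of a single "1,325,488" value -->
    <field name="categoryId">1</field>
    <field name="categoryId">325</field>
    <field name="categoryId">488</field>
  </doc>
</add>
```

With a document posted this way, a query such as
categoryId:1 AND categoryId:488 should match it, and each id should
come back as a separate stored value.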
What are your suggestions? Maybe this is already documented in the
wiki or someplace else? I have searched for this, but have not found
anything that helps.
Regards
/Jimi