A more interesting use case: analyzing text to compute a number, like the mean word length or the mean number of repeated words. These are standard tools for spam detection. To create these, we would want to shovel text into a text processing chain that produces an integer. We then want to both store that integer and index it. We don't want to store the shoveled text.
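As a rough illustration of the kind of chain meant here, the following Python sketch turns text into integers and keeps only those integers. It is not Solr code. The field names `meanWordLength` and `repeatedWords`, the tokenizing rule, and the reading of "repeated words" as "distinct words that occur more than once" are all assumptions made for the example.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens (assumed rule)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def mean_word_length(text):
    """Mean token length rounded to an integer; 0 when there are no tokens."""
    words = tokenize(text)
    if not words:
        return 0
    return round(sum(len(w) for w in words) / len(words))


def repeated_word_count(text):
    """Number of distinct words that occur more than once (assumed metric)."""
    counts = Counter(tokenize(text))
    return sum(1 for n in counts.values() if n > 1)


def build_document(doc_id, text):
    """Build the document to index: derived integers only, raw text dropped."""
    return {
        "id": doc_id,
        "meanWordLength": mean_word_length(text),    # hypothetical field name
        "repeatedWords": repeated_word_count(text),  # hypothetical field name
    }


doc = build_document("msg-1", "buy now buy cheap pills now")
```

Here the derivation happens on the client before the document is sent, which is one way to get the effect when the schema itself cannot do it.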
Solr does not do this today. I don't know if the Solr processing stack has this flexibility, or if it is worth adding it.

Lance

-----Original Message-----
From: Chris Hostetter [mailto:[EMAIL PROTECTED]
Sent: Thursday, January 17, 2008 6:11 PM
To: solr-user@lucene.apache.org
Subject: Re: copyField limitation

: But, the <copyField> directive in the schema has a limitation. It will only
: copy data between fields with the same type. If the two fields are a
: different type, the copy is ignored. This example would require <copyField>
: to translate 'sint' to 'integer'.

i can't reproduce this problem. with the following additions to the example schema...

    <field name="popularityI" type="integer" indexed="true" stored="true" default="0"/>
    ...
    <copyField source="popularity" dest="popularityI"/>

...i was able to see, sort, and search on the popularityI field with no problems.

: Another case is days (not times):
    ...
: This would express the date as a string 2008-xx-xxT00:00:00Z and store that
: into the day field. It is not as optimal as using '2008-xx-xx' but is still
: useful for wildcards.
    ...

I'm not entirely sure i understand what you are asking ... but i believe your point is that there is no easy way to do a copyField that reformats the data (ie: changing date formats, or converting the date to an int)

In my opinion, this class of situations isn't a limitation of copyField as much as it is a silly restriction in the way FieldTypes are handled by IndexSchema ... currently "TextField" is a special case because it's the only FieldType that can have an analyzer (i'm not even sure where this special case logic is ... i thought it was when the IndexSchema is initialized, but i can't find it now)

It would be nice if any FieldType could have an analyzer, and as long as the token(s) produced by that analyzer met the necessary conditions for the data type, things would go on their merry way ... DateReFormatFilters could be used to convert from any arbitrary date format to the one Solr expects, etc.... you could have a detailedDate field and <copyField> from that to a justDate string field that used a PatternReplaceFilter to strip off the time.

This still wouldn't help change the "stored" value of those fields though so that the data would look right when retrieving stored values. Perhaps we should add an optional hook for mutating the "stored" value of a fieldtype as well? ... it could be an Analyzer (ie: tokenizer+filterchain) so that we get reuse of existing concepts, with each resulting token being treated as a separate multivalue (for the common case of rejoining all the tokens into a single string, we can add a StringBufferConcatTokenFilter or something) ?

-Hoss
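
To make the justDate idea concrete, here is a minimal sketch in Python rather than Solr configuration. It shows the transformation a pattern-replace step would perform on a full timestamp, plus the "rejoin all the tokens" case mentioned for stored values. The function names `just_date` and `concat_tokens` are hypothetical and do not correspond to any Solr class.

```python
import re

# Matches timestamps such as 2008-01-17T18:11:00Z, with optional fractional seconds.
FULL_TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2})T\d{2}:\d{2}:\d{2}(?:\.\d+)?Z$")


def just_date(value):
    """Strip the time portion, keeping only the YYYY-MM-DD part."""
    match = FULL_TIMESTAMP.match(value)
    if match is None:
        raise ValueError("not a full UTC timestamp: %r" % value)
    return match.group(1)


def concat_tokens(tokens, separator=" "):
    """Rejoin the tokens from a filter chain into a single stored string."""
    return separator.join(tokens)


day = just_date("2008-01-17T18:11:00Z")
rejoined = concat_tokens(["2008", "01", "17"], separator="-")
```

In this sketch the same function could feed both the indexed and the stored value, which is the gap the proposed hook is meant to close.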