Re: copyField limitation

Grant Ingersoll Wed, 23 Jan 2008 14:40:24 -0800

This may be possible to do with Lucene's new SinkTokenizer/TeeTokenFilter functionality. You might find http://www.mail-archive.com/[EMAIL PROTECTED]/msg06863.html useful in that context. Also, search the Lucene dev list for discussion.
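The tee/sink idea can be illustrated without relying on Lucene's real classes, whose signatures are not reproduced here. The plain-Java sketch below is an assumption-laden illustration only: `TeeSketch`, `tee`, and `drain` are hypothetical names, not Lucene API. Each token is passed through to the first consumer and also copied into a "sink" list that a second field could read afterwards, so the text is analyzed once but feeds two destinations.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical plain-Java illustration of the tee/sink pattern; NOT Lucene's API.
class TeeSketch {
    // Wraps a token source: every token is returned unchanged to the caller
    // and also appended to the sink for a second consumer.
    static Iterator<String> tee(Iterator<String> source, List<String> sink) {
        return new Iterator<String>() {
            public boolean hasNext() {
                return source.hasNext();
            }

            public String next() {
                String token = source.next();
                sink.add(token); // copy kept for the second field
                return token;
            }
        };
    }

    // Consumes a token iterator completely (stands in for indexing the first field).
    static List<String> drain(Iterator<String> tokens) {
        List<String> out = new ArrayList<>();
        while (tokens.hasNext()) {
            out.add(tokens.next());
        }
        return out;
    }
}
```

After the first field drains the teed stream, the sink holds the same tokens, ready for a second field without re-analyzing the text.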


-Grant


On Jan 22, 2008, at 3:13 PM, Lance Norskog wrote:

A more interesting use case:
Analyzing text and computing a number, like the mean word length or the mean number of repeated words. These are standard tools for spam detection. To create these, we would want to shovel text into a text-processing chain that creates an integer. We then want to both store that integer and index it. We don't want to store the shoveled text.
Solr does not do this now. I don't know if the Solr processing stack has this flexibility, or if it is worth adding it.
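The numbers themselves are simple to compute; the open question is only where in the Solr stack such a chain would run. A minimal plain-Java sketch, assuming a simple lowercase, split-on-punctuation tokenizer (not any Solr analyzer) and assuming the mean word length is scaled by 100 and rounded so it fits an integer field. `SpamStats` and its methods are hypothetical names.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical spam-signal features; the tokenization below is an assumption,
// not Solr's analysis chain.
class SpamStats {
    // Lowercase, turn every run of non-letter/non-digit characters into one space, split.
    static String[] tokenize(String text) {
        String cleaned = text.toLowerCase(Locale.ROOT)
                .replaceAll("[^\\p{L}\\p{N}]+", " ")
                .trim();
        return cleaned.isEmpty() ? new String[0] : cleaned.split(" ");
    }

    // Mean word length scaled by 100 and rounded, so it can go into an integer field.
    static int meanWordLengthX100(String text) {
        String[] tokens = tokenize(text);
        if (tokens.length == 0) {
            return 0;
        }
        long totalChars = 0;
        for (String t : tokens) {
            totalChars += t.length();
        }
        return (int) Math.round(100.0 * totalChars / tokens.length);
    }

    // Number of tokens that repeat a word already seen earlier in the text.
    static int repeatedWordCount(String text) {
        Set<String> seen = new HashSet<>();
        int repeats = 0;
        for (String t : tokenize(text)) {
            if (!seen.add(t)) {
                repeats++;
            }
        }
        return repeats;
    }
}
```

For "Buy now, buy NOW!!" the tokens are buy, now, buy, now, so the scaled mean is 300 and the repeat count is 2. Only these integers would be stored and indexed, not the input text.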

Lance

-----Original Message-----
From: Chris Hostetter [mailto:[EMAIL PROTECTED]
Sent: Thursday, January 17, 2008 6:11 PM
To: [email protected]
Subject: Re: copyField limitation
: But, the <copyField> directive in the schema has a limitation. It will only
: copy data between fields with the same type. If the two fields are of a
: different type, the copy is ignored. This example would require <copyField>
: to translate 'sint' to 'integer'.
I can't reproduce this problem. With the following additions to the example
schema...
  <field name="popularityI" type="integer" indexed="true" stored="true" default="0"/>
  ...
  <copyField source="popularity" dest="popularityI"/>
...I was able to see, sort, and search on the popularityI field with no
problems.

: Another case is days (not times):
        ...
: This would express the date as a string 2008-xx-xxT00:00:00Z and store that
: into the day field. It is not as optimal as using '2008-xx-xx' but is still
: useful for wildcards.
        ...
I'm not entirely sure I understand what you are asking ... but I believe your point is that there is no easy way to do a copyField that reformats the data
(i.e., changing date formats, or converting the date to an int).
In my opinion, this class of situations isn't a limitation of copyField so
much as it is a silly restriction in the way FieldTypes are handled by
IndexSchema ... currently "TextField" is a special case because it's the
only FieldType that can have an analyzer (I'm not even sure where this
special-case logic is ... I thought it was when the IndexSchema is
initialized, but I can't find it now).
It would be nice if any FieldType could have an analyzer, and as long as the token(s) produced by that analyzer met the necessary conditions for the data type, things would go on their merry way ... DateReFormatFilters could be used to convert from any arbitrary date format to the one Solr expects, etc. ... you could have a detailedDate field and <copyField> from that to a justDate string field that used a PatternReplaceFilter to strip off the
time.
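The two transformations described above can be sketched as plain string functions. This is a minimal plain-Java sketch, not Solr code. `DateFieldSketch` is a hypothetical name, `toSolrDate` only imitates what the proposed (not yet existing) DateReFormatFilter might do, and the "MM/dd/yyyy" input format is an assumed example of an "arbitrary" date format. The regex in `justDate` imitates what a PatternReplaceFilter on the justDate field would strip.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.regex.Pattern;

// Hypothetical stand-ins for the filters discussed above; none of these are Solr classes.
class DateFieldSketch {
    // Assumed input format for the "arbitrary" date, e.g. "01/17/2008".
    private static final DateTimeFormatter US_DATE = DateTimeFormatter.ofPattern("MM/dd/yyyy");

    // Matches the time part of an ISO-8601 UTC timestamp, with optional fractional seconds.
    private static final Pattern TIME_PART = Pattern.compile("T\\d{2}:\\d{2}:\\d{2}(\\.\\d+)?Z$");

    // Reformat to the ISO-8601 form Solr expects (day precision, midnight UTC).
    static String toSolrDate(String usDate) {
        LocalDate day = LocalDate.parse(usDate, US_DATE);
        return day + "T00:00:00Z"; // LocalDate.toString() yields yyyy-MM-dd
    }

    // Strip the time portion, leaving only the day, as a justDate copy would.
    static String justDate(String solrDate) {
        return TIME_PART.matcher(solrDate).replaceAll("");
    }
}
```

So "01/17/2008" becomes "2008-01-17T00:00:00Z", and "2008-01-17T14:40:24Z" copied to justDate becomes "2008-01-17".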
This still wouldn't help change the "stored" value of those fields, though, so
that the data would look right when retrieving stored values.
Perhaps we should add an optional hook for mutating the "stored" value of a
FieldType as well? ... it could be an Analyzer (i.e.,
tokenizer + filter chain) so that we get reuse of existing concepts, with
each resulting token being treated as a separate value of a multivalued field (for the common
case of rejoining all the tokens into a single string, we can add a
StringBufferConcatTokenFilter or something).
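A rough model of the proposed stored-value hook, in plain Java rather than Solr's Analyzer classes. `StoredValueHookSketch` is a hypothetical name, whitespace splitting stands in for a tokenizer, and a list of string functions stands in for the filter chain (all assumptions). Each surviving token becomes one stored value, and `concat` covers the "rejoin into a single string" case that the suggested StringBufferConcatTokenFilter (also hypothetical) would handle.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical model of a "stored value" hook; not Solr or Lucene API.
class StoredValueHookSketch {
    // Tokenizer stand-in: split on whitespace (an assumption for this sketch).
    static List<String> tokenize(String raw) {
        String trimmed = raw.trim();
        return trimmed.isEmpty() ? new ArrayList<String>() : Arrays.asList(trimmed.split("\\s+"));
    }

    // Runs each token through the filter chain; each non-empty result is one stored value.
    static List<String> storedValues(String raw, List<UnaryOperator<String>> filters) {
        List<String> values = new ArrayList<>();
        for (String token : tokenize(raw)) {
            String t = token;
            for (UnaryOperator<String> f : filters) {
                t = f.apply(t);
            }
            if (!t.isEmpty()) {
                values.add(t); // drop tokens that a filter reduced to nothing
            }
        }
        return values;
    }

    // Common case: rejoin all values into a single stored string.
    static String concat(List<String> values, String separator) {
        return String.join(separator, values);
    }
}
```

With a lowercase filter followed by one that removes non-alphanumerics, "Cheap  PILLS !!" yields the stored values cheap and pills, or the single string "cheap pills" after concat.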

        ?


-Hoss


--------------------------
Grant Ingersoll
http://lucene.grantingersoll.com
http://www.lucenebootcamp.com

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ
