RE: Getting field information inside a Tokenizer

Steven A Rowe Tue, 03 May 2011 11:35:21 -0700

Hi FMC,

On 5/3/2011 at 12:37 PM, FatMan Corp wrote:
> Hi, I would like to get another field's information for the same document
> within a Tokenizer class.
> How can this be achieved?


Use <copyField>s in your schema 
<http://wiki.apache.org/solr/SchemaXml#Copy_Fields>, and associate different 
analysis pipelines with each field.  Each field's analysis pipeline will be fed 
the original raw text.
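
As a rough illustration, a schema.xml fragment along these lines would feed the 
same raw text to two different analysis pipelines.  The field names ("body", 
"body_ws") and the field types ("text", "text_ws") are assumptions made for the 
example; substitute whatever your schema actually defines.

```xml
<!-- Hypothetical field names and types; adjust to your own schema. -->
<field name="body"    type="text"    indexed="true" stored="true"/>
<field name="body_ws" type="text_ws" indexed="true" stored="false"/>

<!-- The raw, unanalyzed value of "body" is copied to "body_ws",
     which then runs its own analysis pipeline over it. -->
<copyField source="body" dest="body_ws"/>
```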

Presently Lucene's analysis pipeline is single-field only: you have to create 
separate analysis pipelines for each field, with an extra pass over the 
original text for each field. I personally think Lucene should provide 
multi-field analysis capabilities, but this would not be a simple change.  Even 
if Lucene does eventually gain this capability, modifying Solr to expose it 
would be an added layer of complexity, and given that <copyField> already 
exists as a workaround, there may be little motivation to do so.

Some of the use cases full multi-field analysis could serve are already handled 
in Lucene (but not yet in Solr) by TeeSinkTokenFilter 
<http://lucene.apache.org/java/3_1_0/api/core/org/apache/lucene/analysis/TeeSinkTokenFilter.html>.
An enterprising Lucene user could write a single-pass tokenizer that emits 
tokens with one type per target field, then employ one TeeSinkTokenFilter per 
field to approximate full multi-field analysis.  Adding TeeSinkTokenFilter 
support to Solr, though, would require substantial changes to Solr's code and 
schema format (schema schema?).
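
To make the idea concrete, here is a minimal plain-Java sketch of the pattern: 
one pass over the raw text emits typed tokens, and each token is routed to the 
sink named after its type.  It deliberately does not use any Lucene classes; 
TypedToken, tokenize(), route() and the "words"/"numbers" types are all 
hypothetical names invented for the illustration.  In Lucene itself the routing 
would be done by TeeSinkTokenFilter and its sink streams rather than a Map.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TeeSinkSketch {

    /** A token tagged with a type that names its target field (hypothetical). */
    static final class TypedToken {
        final String text;
        final String type;

        TypedToken(String text, String type) {
            this.text = text;
            this.type = type;
        }
    }

    /** Single pass over the raw text: all-digit tokens are typed "numbers", the rest "words". */
    static List<TypedToken> tokenize(String raw) {
        List<TypedToken> out = new ArrayList<TypedToken>();
        for (String piece : raw.trim().split("\\s+")) {
            if (piece.isEmpty()) {
                continue; // happens only when the input is blank
            }
            String type = piece.matches("\\d+") ? "numbers" : "words";
            out.add(new TypedToken(piece, type));
        }
        return out;
    }

    /** Stands in for the tee: sends each token to the sink whose name matches its type. */
    static Map<String, List<String>> route(List<TypedToken> tokens) {
        Map<String, List<String>> sinks = new LinkedHashMap<String, List<String>>();
        for (TypedToken t : tokens) {
            List<String> sink = sinks.get(t.type);
            if (sink == null) {
                sink = new ArrayList<String>();
                sinks.put(t.type, sink);
            }
            sink.add(t.text);
        }
        return sinks;
    }

    public static void main(String[] args) {
        Map<String, List<String>> perField = route(tokenize("order 42 shipped 2011"));
        System.out.println(perField); // {words=[order, shipped], numbers=[42, 2011]}
    }
}
```

The point of the sketch is only that the source text is scanned once, while 
each target field still ends up with its own token list.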

Steve

> -----Original Message-----
> From: FatMan Corp [mailto:fatmanc...@gmail.com]
> Sent: Tuesday, May 03, 2011 12:37 PM
> To: solr-user@lucene.apache.org
> Subject: Getting field information inside a Tokenizer
> 
> Hi, I would like to get another field's information for the same document
> within a Tokenizer class.
> How can this be achieved?
> 
> Thanks
