On 6/19/2014 4:51 PM, Huang, Roger wrote:
> If I have documents with a person and his email address: 
> u...@domain.com
>
> How can I configure Solr (4.6) so that the email address source field is 
> indexed as
>
> - the user part of the address (e.g., "user") is in Lucene index X
>
> - the domain part of the address (e.g., "domain.com") is in a
>   separate Lucene index Y
>
> I would like to be able to search as follows:
>
> - Find all people whose email addresses have user part = "userXyz"
>
> - Find all people whose email addresses have domain part =
>   "domainABC.com"
>
> - Find the person with exact email address = "user...@domainabc.com"
>
> Would I use a <copyField> declaration in my schema?
> http://wiki.apache.org/solr/SchemaXml#Copy_Fields

I don't think you actually want the data to end up in entirely different
indexes.  Although it is possible to search more than one separate
index, that's very likely NOT what you want to do, and it comes with its
own challenges.  What you most likely want is to put this data into
different fields within the same index.
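
As a rough sketch of what that looks like from the search side, the
three searches in the question each become a query against one field.
The field names email, email_user and email_domain are hypothetical (use
whatever your schema defines), the address is only an illustrative
value, and this assumes untokenized string-type fields queried with the
standard query parser:

```python
def field_query(field, value):
    """Build a field:"value" query string.

    Inside a quoted phrase only backslash and double quote need escaping.
    """
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'{field}:"{escaped}"'


# Hypothetical field names; all three fields live in the same index.
queries = [
    field_query("email_user", "userXyz"),          # by user part
    field_query("email_domain", "domainABC.com"),  # by domain part
    field_query("email", "userXyz@domainABC.com"), # by exact address
]

for q in queries:
    print(q)
```

Each string would go in the q parameter of an ordinary request to the
one index.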

You'll need to write custom code to accomplish this, especially if you
need the stored data to contain only the parts rather than the complete
email address.  A copyField can get the data into additional fields, but
I'm not aware of anything built into the schema that trims the unwanted
part from the new fields.  Even if there is, copyField copies the
original input, so the stored data in all three fields would still be
the complete address.  It's up to
you whether this custom code is in a user application that does your
indexing or in a custom update processor that you load as a plugin to
Solr itself.  Extending whatever user application you are already using
for indexing is very likely to be a lot easier.
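
A minimal sketch of the application-side approach, assuming your
indexing code builds each document as a set of field/value pairs before
sending it to Solr.  The function names and the field names (email,
email_user, email_domain) are made up for illustration, and sending the
document to Solr is left to whatever client you already use:

```python
def split_email(address):
    """Split an address into (user, domain) at the last '@'."""
    user, sep, domain = address.strip().rpartition("@")
    if not sep or not user or not domain:
        raise ValueError(f"not a usable email address: {address!r}")
    return user, domain


def build_doc(doc_id, name, address):
    """Build one document with the full address plus its two parts.

    Field names are hypothetical; match them to your schema.
    """
    user, domain = split_email(address)
    return {
        "id": doc_id,
        "name": name,
        "email": address.strip(),  # complete address, for exact matches
        "email_user": user,        # only the part before the '@'
        "email_domain": domain,    # only the part after the '@'
    }


doc = build_doc("1", "Example Person", "userXyz@domainABC.com")
print(doc)
```

Because the split happens before the document reaches Solr, the stored
value of each field is just that part, which copyField alone would not
give you.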

Thanks,
Shawn
