RE: A schema inside a Solr Schema (Schema in a can)

Dyer, James Fri, 17 Dec 2010 09:45:50 -0800

There's also one "gotcha" we've experienced when searching across multi-valued 
fields:  SOLR will match across field occurrences.  In the example below, if you 
were to search q=contrib_name:(james AND smith), you will get this record back. 
It matches one name from one contributor and another name from a different 
contributor.  This is not what our users want.


As a work-around, I am converting these to phrase queries with slop:  "james 
smith"~50 ... Just use a slop value smaller than your positionIncrementGap and 
bigger than the number of terms entered.  Because positionIncrementGap puts a 
position gap between successive values of a multi-valued field, a slop smaller 
than the gap prevents the cross-value matches yet still allows the words to 
occur in any order.  
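
For illustration, here is a minimal Python sketch of that conversion.  The 
function name, the default gap of 100 and the fixed slop of 50 are my own 
assumptions for the example, not anything Solr provides; it also does not 
escape query-syntax characters in the user's input.

```python
def slop_phrase_query(field, user_input, position_increment_gap=100):
    """Turn 'james smith' into field:"james smith"~50 (hypothetical helper).

    The slop must be bigger than the number of terms entered and smaller
    than the field's positionIncrementGap (assumed to be 100 here).
    """
    terms = user_input.split()
    if not terms:
        return ""
    if len(terms) == 1:
        # A single term cannot straddle two values, so no phrase is needed.
        return f"{field}:{terms[0]}"
    slop = min(50, position_increment_gap - 1)
    if slop <= len(terms):
        raise ValueError("positionIncrementGap is too small for this many terms")
    phrase = " ".join(terms)
    return f'{field}:"{phrase}"~{slop}'


print(slop_phrase_query("contrib_name", "james smith"))
```

With the assumed defaults this prints contrib_name:"james smith"~50.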

The problem with this approach is that Lucene doesn't support wildcards in 
phrases.  Unlucky for us, because our app automatically adds a wildcard to 
every term entered in Contributor searching.  So when we convert to SOLR we 
will have to disable this "feature" for multi-word queries.  I experimented 
with the double metaphone filter (too many false positive matches) and edge 
n-gram filter (could make the index very big) to alleviate this loss of 
functionality.  Currently I have it set up to index each name as the full name 
plus the first initial.  (so "j dyer" would match but not "ja dyer") If this is 
considered not-good-enough, we can probably see about doing the edge n-grams 
several characters out...  
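
To make the size trade-off concrete, here is a rough Python sketch of what 
edge n-grams produce for one name token.  This is plain string slicing, not 
the Solr filter itself, and the function name and parameters are made up for 
the example.  A max_gram of 1 corresponds roughly to indexing just the first 
initial; larger values are the "several characters out" option.

```python
def edge_ngrams(token, min_gram=1, max_gram=3):
    """Return the leading prefixes of token, from min_gram to max_gram chars."""
    upper = min(max_gram, len(token))
    return [token[:n] for n in range(min_gram, upper + 1)]


print(edge_ngrams("james", 1, 1))  # first initial only
print(edge_ngrams("james", 1, 3))  # prefixes up to three characters
```

Each extra character of max_gram adds one more indexed term per name token, 
which is where the index growth comes from.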

If anyone else has any other ideas I should try, please do speak up.  Thank you.

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311


-----Original Message-----
From: Dyer, James 
Sent: Friday, December 17, 2010 10:59 AM
To: solr-user@lucene.apache.org
Subject: RE: A schema inside a Solr Schema (Schema in a can)

Dennis,

I may be misunderstanding your question, but I think I've just worked through 
something similar.  We're indexing book metadata, and a book can have more than 
one Contributor.  We want to store the contributor's name, their Role and 
their id (from our rel db).  With our old system, we had to do something like 
this:

contrib:  dyer, james|author|123
contrib:  smith, sam|editor|456

But Lucene/Solr will guarantee that multivalued fields return in exactly the 
same order you put them in.  So with SOLR we can do this:

contrib_name: dyer, james
contrib_name: smith, sam
contrib_role: author
contrib_role: editor
contrib_id:123
contrib_id:456

The trick is to be very careful you put everything in the same order (it's easy 
if it is all from the same SQL query from a relational database).  If one of 
the data elements is a NULL you have to use a placeholder (like an empty string 
or a zero).
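
As a sketch of the bookkeeping this implies on the application side, assuming 
Python and rows of (name, role, id) coming from one SQL query: the helper 
names below are hypothetical, and the placeholders are the empty string and 
zero mentioned above.

```python
def to_parallel_fields(rows):
    """Build the three multivalued fields from (name, role, id) rows, in order."""
    doc = {"contrib_name": [], "contrib_role": [], "contrib_id": []}
    for name, role, contrib_id in rows:
        # NULLs get a placeholder so the three lists stay aligned.
        doc["contrib_name"].append(name if name is not None else "")
        doc["contrib_role"].append(role if role is not None else "")
        doc["contrib_id"].append(contrib_id if contrib_id is not None else 0)
    return doc


def from_parallel_fields(doc):
    """Re-assemble contributors by position; fails if the lists are misaligned."""
    names, roles, ids = doc["contrib_name"], doc["contrib_role"], doc["contrib_id"]
    if not (len(names) == len(roles) == len(ids)):
        raise ValueError("parallel contributor fields have different lengths")
    return list(zip(names, roles, ids))


rows = [("dyer, james", "author", 123), ("smith, sam", None, 456)]
doc = to_parallel_fields(rows)
print(from_parallel_fields(doc))
```

The length check is the cheap safeguard: if a NULL was skipped instead of 
replaced by a placeholder, the lists go out of step and every later 
contributor gets the wrong role or id.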

Another option is to use a dynamic field:

contrib_123: dyer, james
contrib_456: smith, sam

The problem here is if you want to display and use a fieldlist (fl=), you 
cannot use wildcards (ex: fl=contrib_* doesn't work).  Same for searching (q=, 
qf=).  You can only use dynamic fields if you know, at runtime, the names of 
the fields you need to deal with.
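
In practice that means the app has to already hold the contributor ids and 
spell out each field name itself.  A small Python sketch, with a hypothetical 
helper name and the contrib_ prefix from the example above:

```python
def contrib_field_list(contrib_ids, prefix="contrib_"):
    """Build an explicit fl= value, one dynamic field name per known id."""
    return ",".join(f"{prefix}{cid}" for cid in contrib_ids)


print("fl=" + contrib_field_list([123, 456]))
```

This prints fl=contrib_123,contrib_456, which only works when the ids are 
known before the query is sent.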

Both of these options might be more work for your app to deal with than the 
delimiter approach.  And, in our case, we could stick with the delimiter field 
and store it and then have a separate indexed field that just has the name (as 
this is all we search on).  You could even just have 1 field if you used a 
fancy analysis sequence that would only index the element(s) you wanted 
indexed...
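
For comparison, the delimiter approach needs only a split on the app side.  A 
minimal Python sketch, assuming the name|role|id layout shown earlier and that 
the pipe never appears inside a name (the function names are made up for the 
example):

```python
def parse_contrib(value):
    """Split one stored 'name|role|id' value into its three parts."""
    name, role, contrib_id = value.split("|")
    return {"name": name.strip(), "role": role.strip(), "id": int(contrib_id)}


def names_only(values):
    """Values for the separate, search-only name field."""
    return [parse_contrib(v)["name"] for v in values]


stored = ["dyer, james|author|123", "smith, sam|editor|456"]
print(parse_contrib(stored[0]))
print(names_only(stored))
```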

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311


-----Original Message-----
From: Dennis Gearon [mailto:gear...@sbcglobal.net] 
Sent: Friday, December 17, 2010 12:43 AM
To: solr-user@lucene.apache.org
Subject: A schema inside a Solr Schema (Schema in a can)

Is it possible to put name value pairs of any type in a native Solr Index field 
type? Like JSON/XML/YML?

The reason that I ask, since you asked, is I want my main index schema to be a 
base object, and another multivalue column to be the attributes of base object 
inherited descendants. 

Is there any other way to do this?

What are the limitations in searching and indexing documents with multivalue 
fields?

Dennis Gearon

Signature Warning
----------------
It is always a good idea to learn from your own mistakes. It is usually a 
better idea to learn from others’ mistakes, so you do not have to make them 
yourself. from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'

EARTH has a Right To Life,
  otherwise we all die.
