thanks for your reply ... it kind of solved our problem!

we were in fact using Tokenizers that produce multiple tokens ...
so i guess there is no other way for us than to use the copyField
workaround.
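for what it's worth, our workaround looks roughly like this in schema.xml
(field and type names are just examples from our setup):

  <!-- the unique key itself: an untokenized "string" type, so the
       indexed term is identical to the stored value -->
  <field name="id" type="string" indexed="true" stored="true"/>

  <!-- tokenized copy, used only for searching on IDs -->
  <field name="id_search" type="text" indexed="true" stored="false"/>

  <copyField source="id" dest="id_search"/>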
it would maybe be a good idea to have Lucene check the *stored* value for
duplicate keys ... that seems so much more logical to me!
(imho, it makes no sense to check the *indexed* value for duplicate keys,
but maybe there is a reason?)
or maybe give us the option to choose whether Lucene should check the
*stored* or *indexed* value for duplicate keys.

it is really confusing to get duplicate unique key *stored* values back 
from the server .... (and kind of frustrating)

since we now use a copyField to perform searches on the IDs, there is no
longer any reason to index our unique key field ...
what would happen if I set indexed=false on my unique id field?

Maarten :-)





Chris Hostetter <[EMAIL PROTECTED]> wrote on 16/03/2007 19:14
(to: solr-user@lucene.apache.org, subject: Re: Bug ? unique id)

: but can someone please answer my question :'(
: is it illegal to put filters on the unique id?
: or is it a bug that we get duplicate ids?
: or is this a known issue (since everybody is using copyFields?)

there's nothing illegal about using an Analyzer on your uniqueKey, but you
have to ensure that your Analyzer:
  1) never produces multiple tokens (ie: KeywordTokenizer is fine)
  2) never produces duplicate output for different (legal) inputs.
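for example, a uniqueKey field type along these lines satisfies both
rules (just a sketch; the type name is made up):

  <fieldType name="keystring" class="solr.TextField">
    <analyzer>
      <!-- KeywordTokenizer emits the entire input as one token,
           so the indexed term equals the raw key -->
      <tokenizer class="solr.KeywordTokenizerFactory"/>
    </analyzer>
  </fieldType>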

...so if your dataset can legally contain two different documents
whose keys are "foo bar" and "Foo Bar" you certainly wouldn't want
to use a Whitespace or StandardTokenizer -- but you also wouldn't ever want
to use the LowerCaseFilter.

If however you really wanted to ignore all punctuation in keys when
clients upload documents to you, and trust that doc "1234-56-7890" is the
same as doc "1234567890" then something like the pattern stripping filter
would be fine.
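something along these lines, say (a sketch, untested; double check that
the pattern replace filter is available in your build):

  <fieldType name="punctlesskey" class="solr.TextField">
    <analyzer>
      <tokenizer class="solr.KeywordTokenizerFactory"/>
      <!-- strip every non-digit, so "1234-56-7890" and "1234567890"
           both index as the single term 1234567890 -->
      <filter class="solr.PatternReplaceFilterFactory"
              pattern="[^0-9]" replacement="" replace="all"/>
    </analyzer>
  </fieldType>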


the thing to understand is that it's the *indexed* value of the uniqueKey
that must be unique in order for Solr to do things properly ... it has to
be able to search on that uniqueKey term to delete/replace a doc properly.
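concretely: with

  <uniqueKey>id</uniqueKey>

...adding a doc first deletes whatever existing doc indexes the same
uniqueKey *term*, so if one raw key analyzes to several tokens (or two
different raw keys analyze to the same token) that delete-by-term step
can't behave sensibly -- which is presumably where your duplicates came
from.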


-Hoss

