Re: uniqueKey and custom fieldType

Erick Erickson Sun, 15 Aug 2010 09:21:45 -0700

The short answer is that unique keys should be a single
term. String types are guaranteed to be a single term, since they
aren't analyzed. Your SplitUpStuff type *does* analyze
terms, and can make multiple tokens out of single strings
via WordDelimiterFilterFactory.

A common error when thinking about the "string" type is
not understanding that it is NOT analyzed. It's indexed as
a single term. So when you define a uniqueKey of type string,
it behaves as you expect. That is, documents are updated if
the ID field matches exactly: case, spaces, order and all.
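
As a rough illustration, a schema.xml setup along these lines keeps
the key unanalyzed. The field name "id" comes from the question
quoted below; the attribute values are assumptions modeled on a
typical schema, not taken from the poster's actual config, and the
three elements live in different sections of schema.xml.

```xml
<!-- Sketch only: attribute values are assumed, not the poster's real config. -->

<!-- solr.StrField is not analyzed, so each value is indexed as exactly one term. -->
<fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>

<!-- The key field uses the unanalyzed "string" type. -->
<field name="id" type="string" indexed="true" stored="true" required="true"/>

<!-- Updates replace an existing document only when this single term matches exactly. -->
<uniqueKey>id</uniqueKey>
```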

With your "SplitUpStuff" type as the uniqueKey, well,
I don't even know what behavior I'd expect. And whatever
behavior I happened to observe would not be guaranteed to
be the behavior of the next release.

Consider what you're asking for and you can see why you
don't want to analyze your uniqueKey field. Take a
simple text type (where each word is a term).
You have two values from two different docs:
doc1: "this is a nice unique key"
doc2: "My Keys are Unique and Nice"

It's quite possible, with combinations of analyzers and stemmers,
to index the exact same tokens, namely "nice", "unique" and "key",
for each document. Are these equivalent? Does order count?
Capitalization? It'd just be a nightmare to try to
explain/predict/implement.

Likely whatever behavior you do get is just whatever falls out of the
code. I'm not even sure any attempt is made to enforce uniqueness
on an analyzed field.
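
If the aim is to search on pieces of the key as well, one common
arrangement is to leave the uniqueKey as a string and copy its value
into a separately analyzed field. This is only a sketch: "id_split"
is a hypothetical field name, and the attribute values are
assumptions rather than anything from your schema.

```xml
<!-- Sketch only: "id_split" is a hypothetical field name used for illustration. -->

<!-- Unanalyzed key: one term per document, used for uniqueness. -->
<field name="id" type="string" indexed="true" stored="true" required="true"/>

<!-- Analyzed copy for searching; it is never used as the uniqueKey. -->
<field name="id_split" type="splitUpStuff" indexed="true" stored="false"/>

<!-- Copy the raw key value into the analyzed field at index time. -->
<copyField source="id" dest="id_split"/>

<uniqueKey>id</uniqueKey>
```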

HTH
Erick

On Sun, Aug 15, 2010 at 11:59 AM, j <jta...@gmail.com> wrote:

> I guess another way to pose the question is: what could cause
> <uniqueKey>id</uniqueKey>   to no longer be respected?
>
>
> The last change I made before I noticed the problem of non-unique docs
> was changing field "title" from "string" to "SplitUpStuff". But I
> don't understand how that could affect the uniqueness of a different
> field called "id".
>
> <fieldType name="splitUpStuff" class="solr.TextField"
> positionIncrementGap="100">
>      <analyzer type="index">
>         <filter class="solr.WordDelimiterFilterFactory"
> generateWordParts="1" generateNumberParts="0" c
>         <filter class="solr.StopFilterFactory"
>                ignoreCase="true"
>                words="stopwords.txt"
>                 enablePositionIncrements="false"
>                />
>        <filter class="solr.LowerCaseFilterFactory"/>
>        <filter class="solr.EnglishPorterFilterFactory"
> protected="protwords.txt"/>
>         <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>      </analyzer>
> </fieldType>
>
>
>
>
>
>
> In order to make even a guess, we'd have to see your new
> field type. Particularly its field definitions and the analysis
> chain...
>
> Best
> Erick
>
> On Fri, Aug 13, 2010 at 5:16 PM, j <jta...@gmail.com> wrote:
>
> > Does fieldType have any effect on the thing that I specify should be
> > unique?
> >
> > uniqueKey has been working for me up until recently. I change the
> > field that is unique from type "string" to a fieldType that I have
> > defined. Now when I do an update I get a newly created document (so
> > that I have duplicates).
> >
> > Has anyone else had this problem before?
> >
>
