Re: Records skipped when using DataImportHandler

Shalin Shekhar Mangar Fri, 05 Aug 2011 00:37:53 -0700

On Fri, Aug 5, 2011 at 3:38 AM, anand sridhar <anand.for...@gmail.com> wrote:


> Ok. After analysis, I narrowed the reduced result set down to the fact that
> the zipcode field is not indexed 'as is', i.e. the zipcode field values are
> broken down into tokens and then stored. Hence, if there are 10 documents
> with zipcode fields varying from 91000-91009, then the zipcode fields are
> not stored as 91000, 91001, etc. Instead, the most common recurrences are
> grabbed together and stored as tokens, hence resulting in a reduced
> result set.
> The net effect is that I cannot search for a value like 91000, since it's
> not stored as it is.
>
> I suspect this has something to do with the type of field the zipcode is
> associated with. Right now, zipcode is a field of type text_general, where
> the StandardTokenizerFactory may be breaking the values into tokens.
> However, I want to store them without tokenizing. What's the best field
> type to do this?
>
> I already explored the String fieldtype which is supposed to store the
> values as is, but I see that the values are still being tokenized.
>
>
A change in the schema will require re-indexing all documents. String types
are not tokenized. Also, what is your uniqueKey?
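
For illustration, here is a minimal schema.xml sketch of what a non-tokenized
zipcode field might look like. The field name "zipcode" and the attribute
values are assumptions based on the description above, not the actual schema
in use. solr.StrField is the class behind the stock "string" type, and it
indexes each value verbatim as a single term.

```xml
<!-- Assumed fragment of schema.xml; adapt the names to the real schema. -->
<types>
  <!-- StrField does no analysis: "91000" is indexed as the single term "91000". -->
  <fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
</types>
<fields>
  <!-- Hypothetical field definition for the zipcode column. -->
  <field name="zipcode" type="string" indexed="true" stored="true"/>
</fields>
```

Terms written under the old text_general type stay in the index until the
documents are indexed again, so the change only becomes visible after a
re-index, e.g. a DataImportHandler full-import with clean=true.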

-- 
Regards,
Shalin Shekhar Mangar.
