How are you indexing? There was a problem with indexing from SolrJ when you indexed documents in batches, i.e. server.add(doclist); that's fixed in 4.0 RC#. The work-around is to add docs singly: server.add(doc)
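A minimal SolrJ sketch of that work-around, looping over the documents and adding them one at a time instead of passing the whole collection to server.add(). The class name, field names, and Solr URL are illustrative assumptions, not from the original message, and running it requires a live Solr instance:

```java
import java.io.IOException;
import java.util.Collections;
import java.util.List;

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class SingleDocIndexer {

    // Instead of server.add(docs) -- the batched form affected by the bug --
    // add each document individually, then commit once at the end.
    public static void indexSingly(SolrServer server, List<SolrInputDocument> docs)
            throws SolrServerException, IOException {
        for (SolrInputDocument doc : docs) {
            server.add(doc);
        }
        server.commit();
    }

    public static void main(String[] args) throws Exception {
        // URL and field names are assumptions; point this at your own core.
        SolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("doc_id", "1");
        doc.addField("name", "example product");
        indexSingly(server, Collections.singletonList(doc));
    }
}
```

Once you move to a release where the batch bug is fixed, switching back to the batched server.add(docs) call avoids the per-document round trips.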
Second thing: Bad Things Happen if you don't have a _version_ field in your schema.xml. Solr 4.0 RC# isn't happy on startup if this field is missing...

Personally, I think you'd be better off using one of the release candidates. Robert cut one here:
http://people.apache.org/~rmuir/staging_area/lucene-solr-4.0RC1-rev1391144/solr/

There will be an RC2 sometime, as a couple of problems have been found, but using RC1 should minimize any update to the official 4.0, plus it has a lot of improvements over BETA...

Best
Erick

On Fri, Oct 5, 2012 at 10:25 AM, David Quarterman <da...@corexe.com> wrote:
> Hi,
>
> We've been using V4.x of SOLR since last November without too much
> trouble. Our MySQL database is refreshed daily and a full import is run
> automatically after the refresh and generally produces around 86,000
> products, obviously on unique doc_ids.
>
> So, we upgraded to 4.0 Beta a few days ago, with only mild difficulty,
> reindexed and all was fine. Except after the next data refresh and
> full-import, we had duplicate products appearing on different unique
> doc_ids. Not all products are being duplicated, just random ones. We've
> just deleted the data directory and reindexed and the product count has
> dropped from 116,711 to 86,543. There'll be another refresh/import early
> tomorrow morning and I fear we'll have more duplicates.
>
> The call to the import now contains clean=true, commit=true and
> optimize=true, but it seems to make no difference.
>
> Anyone have any ideas?
>
> Regards,
>
> David Q
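For reference, the _version_ field that Erick mentions is declared in the stock Solr 4.0 example schema.xml roughly as below (assuming your schema already defines a "long" field type, as the example schema does):

```xml
<!-- Required by Solr 4.0 for the update log and optimistic concurrency -->
<field name="_version_" type="long" indexed="true" stored="true"/>
```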