How are you indexing? There was a problem with indexing from SolrJ when you indexed documents in batches, i.e. server.add(doclist); that's fixed in 4.0 RC#. The work-around is to add docs singly: server.add(doc)
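A minimal SolrJ sketch of that work-around, looping over the documents and adding them one at a time instead of passing the whole collection to server.add(). The class name, field names, and Solr URL are illustrative assumptions, not from the original message, and running it requires a live Solr instance:

```java
import java.io.IOException;
import java.util.Collections;
import java.util.List;

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class SingleDocIndexer {

    // Instead of server.add(docs) -- the batched form affected by the bug --
    // add each document individually, then commit once at the end.
    public static void indexSingly(SolrServer server, List<SolrInputDocument> docs)
            throws SolrServerException, IOException {
        for (SolrInputDocument doc : docs) {
            server.add(doc);
        }
        server.commit();
    }

    public static void main(String[] args) throws Exception {
        // URL and field names are assumptions; point this at your own core.
        SolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("doc_id", "1");
        doc.addField("name", "example product");
        indexSingly(server, Collections.singletonList(doc));
    }
}
```

Once you move to a release where the batch bug is fixed, switching back to the batched server.add(docs) call avoids the per-document round trips.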
Second thing: Bad Things Happen if you don't have a _version_ field in your schema.xml. Solr 4.0 RC# isn't happy on startup if this field is missing...

Personally, I think you'd be better off using one of the release candidates. Robert cut one here:
http://people.apache.org/~rmuir/staging_area/lucene-solr-4.0RC1-rev1391144/solr/

There will be an RC2 sometime, as a couple of problems have been found, but using RC1 should minimize any update to the official 4.0, plus it has a lot of improvements over BETA...

Best
Erick

On Fri, Oct 5, 2012 at 10:25 AM, David Quarterman <da...@corexe.com> wrote:
> Hi,
>
> We've been using V4.x of SOLR since last November without too much
> trouble. Our MySQL database is refreshed daily and a full import is run
> automatically after the refresh and generally produces around 86,000
> products, obviously on unique doc_ids.
>
> So, we upgraded to 4.0 Beta a few days ago, with only mild difficulty,
> reindexed and all was fine. Except after the next data refresh and
> full-import, we had duplicate products appearing on different unique
> doc_ids. Not all products are being duplicated, just random ones. We've
> just deleted the data directory and reindexed and the product count has
> dropped from 116,711 to 86,543. There'll be another refresh/import early
> tomorrow morning and I fear we'll have more duplicates.
>
> The call to the import now contains clean=true, commit=true and
> optimize=true, but it seems to make no difference.
>
> Anyone have any ideas?
>
> Regards,
>
> David Q
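For reference, the _version_ field that Erick mentions is declared in the stock Solr 4.0 example schema.xml roughly as below (assuming your schema already defines a "long" field type, as the example schema does):

```xml
<!-- Required by Solr 4.0 for the update log and optimistic concurrency -->
<field name="_version_" type="long" indexed="true" stored="true"/>
```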