DIH deleting documents
I am adding documents with the Data Import Handler (DIH) from a MySQL database. I create a unique ID for each document by concatenating a couple of fields in the database. Every ID is unique. After the import, over half of the imported documents are deleted again, so fewer than half of the documents in the database end up in the Solr index.

Is there a way to get a list of the deleted documents, so that I can start troubleshooting what went wrong?

Thanks,
Csaba

-- View this message in context: http://lucene.472066.n3.nabble.com/DIH-deleting-documents-tp4041809.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: DIH deleting documents
Thanks Gora,

Sorry, I might not have been sufficiently clear. I start with an empty index, then add documents. About 9000 are added and about 6000 are immediately deleted again, leaving about 3000. I assume this can only happen with duplicate IDs, but that should not be possible! So I wanted to get a list of the deleted documents so that I could try to figure out why they were deleted immediately.

Thanks,
Csaba
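For reference, Solr replaces an existing document when a new one arrives with the same uniqueKey, and the replaced copy is what shows up in the deleted count. One common way for concatenated IDs to collide without anyone noticing is joining fields with no separator. The sketch below is only an illustration: the field names (`book_id`, `page`) and the sample rows are hypothetical, since the thread does not say which fields are concatenated. It groups rows by generated ID and reports every ID produced by more than one row.

```python
from collections import defaultdict


def find_id_collisions(rows, make_id):
    """Group rows by generated ID; return only IDs produced by more than one row."""
    groups = defaultdict(list)
    for row in rows:
        groups[make_id(row)].append(row)
    return {doc_id: members for doc_id, members in groups.items() if len(members) > 1}


def concat_id(row):
    # Hypothetical ID scheme: fields joined with no separator.
    book_id, page = row
    return f"{book_id}{page}"


def separated_id(row):
    # Same fields, joined with a separator that cannot occur in either field.
    book_id, page = row
    return f"{book_id}-{page}"


# Hypothetical (book_id, page) rows; each pair is distinct.
rows = [(1, 23), (12, 3), (2, 5)]

print(find_id_collisions(rows, concat_id))
print(find_id_collisions(rows, separated_id))
```

Running the same check over the rows the import query actually returns (exported from the database) would give the list of colliding IDs without needing anything from Solr.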
Re: DIH deleting documents
Hi Gora and Arcadius,

Thanks for your help. I'll try to answer both your questions here.

I am interested in three database tables. "Book" contains information about books, "page" has the content of each book page by page, and "chapter" contains the title of each chapter in every book and the page on which the chapter begins. It is a bit of a mess, because I need the contents of each chapter in every book, but I have to infer which pages each chapter contains from its page number. So the query is quite complex.

There are 8764 rows in the chapter table (so 8764 unique chapter headings) and 6870 books. When I import, I get:

Num Docs: 2784
Max Doc: 9488
Deleted Docs: 6704

Here is the config file (the relevant part):

Thanks,
Csaba
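The three statistics above are related by simple arithmetic, which is sketched here as a sanity check. The interpretation in the comments assumes the index really was empty before the import and that no segment merge had yet purged deleted documents; under those assumptions Max Doc approximates the number of documents sent to Solr and Num Docs the number of distinct IDs that survived.

```python
def deleted_docs(max_doc, num_docs):
    """Deleted Docs is the gap between Max Doc and Num Docs."""
    return max_doc - num_docs


# Figures quoted in the message above.
num_docs = 2784
max_doc = 9488
reported_deleted = 6704
chapter_rows = 8764

# The reported numbers are internally consistent.
print(deleted_docs(max_doc, num_docs) == reported_deleted)

# Assumption: starting from an empty index with no merges, max_doc is the
# number of documents the import emitted. It exceeds the chapter row count,
# which would mean the query returns more rows than there are chapters.
extra_rows = max_doc - chapter_rows
print(extra_rows)
```

If that reading holds, the import produced only 2784 distinct IDs from 9488 rows, which points at the ID expression or the join in the query rather than at Solr.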
Re: DIH deleting documents
I should also add that some of the books don't have chapters, so the query won't return anything for these books. In that case I expected the document not to be added at all, rather than first added and then deleted (which I now suspect is what happens).

It would be very helpful if I could see a list of the deleted documents! I tried looking in the terminal window (Jetty), but that did not help. I don't know where else Solr might put logs. I looked in /var/log.. but did not find anything that looked useful.
Re: DIH deleting documents
Thanks Gora.

I do not have automatic optimise enabled, but I am new to Solr, so I am not 100% familiar with all the steps involved. I will try your suggestion, but I was hoping first that I could get the data straight from Solr.

Thanks,
Csaba
Re: DIH deleting documents
Thanks Arcadius,

Excellent suggestion about the view. I'll try to simplify things and see how I go.

Thanks,
Csaba