DIH deleting documents

2013-02-21 Thread cveres
I am adding documents with data import handler from a mysql database. I
create a unique id for each document by concatenating a couple of fields in
the database. Every id is unique.

After the import, over half of the imported documents are deleted
again, leaving fewer than half of the documents in the database in
the Solr index.

Is there a way to get a list of the deleted documents, so that I can start
troubleshooting what went wrong? 

thanks,

Csaba



--
View this message in context: 
http://lucene.472066.n3.nabble.com/DIH-deleting-documents-tp4041809.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: DIH deleting documents

2013-02-21 Thread cveres
Thanks Gora,

Sorry, I might not have been sufficiently clear.

I start with an empty index, then add documents.
9000 are added and 6000 immediately deleted again, leaving 3000.
I assume this can only happen with duplicate IDs, but that should not be
possible! So I wanted to get a list of deleted documents so that I could try
and figure out why they were deleted immediately.
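
One way concatenated ids can repeat even when every database row is distinct is plain concatenation without a separator. The sketch below is only an illustration of that possibility, not a diagnosis of this import: the helper names, the field pairs, and the "-" separator are all hypothetical.

```python
from collections import Counter

def make_id(*fields, sep=""):
    """Build a document id by joining field values (hypothetical helper)."""
    return sep.join(str(f) for f in fields)

def duplicate_ids(rows, sep=""):
    """Return {id: count} for every id that more than one row produces."""
    counts = Counter(make_id(*row, sep=sep) for row in rows)
    return {doc_id: n for doc_id, n in counts.items() if n > 1}

# Hypothetical (book_id, chapter_no) pairs; all four rows are distinct.
rows = [(1, 23), (12, 3), (2, 5), (7, 1)]

# Without a separator, (1, 23) and (12, 3) both become "123".
print(duplicate_ids(rows))
# With a separator the two rows give "1-23" and "12-3".
print(duplicate_ids(rows, sep="-"))
```

Running the same kind of count over the ids the import query actually produces would list every id that occurs more than once, which may be easier than recovering the deleted documents from the index.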

thanks,

Csaba





Re: DIH deleting documents

2013-02-21 Thread cveres
Hi Gora and Arcadius,

Thanks for your help. I'll try and answer both your questions here.

I am interested in three database tables. "Book" contains information about
books, "page" has the content of each book, page by page, and "chapter"
contains the title of each chapter in every book and the page on which the
chapter begins. It is a bit of a mess because I need the contents of each
chapter in every book, but I have to infer which pages each chapter contains
from the page number on which it begins. So the query is quite complex.
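
The page inference described here can be sketched outside SQL. This is a minimal illustration under stated assumptions, not the query used in this import: the function names and sample data are hypothetical, every chapter is assumed to start at the top of a page, and no two chapters of a book are assumed to start on the same page.

```python
def chapter_page_ranges(chapter_starts, last_page):
    """Map each chapter title to an inclusive (first_page, last_page) range.

    chapter_starts: list of (title, start_page) for one book.
    last_page: number of the book's final page.
    """
    ordered = sorted(chapter_starts, key=lambda c: c[1])
    ranges = {}
    for i, (title, start) in enumerate(ordered):
        # A chapter ends on the page before the next chapter begins,
        # or on the last page of the book for the final chapter.
        end = ordered[i + 1][1] - 1 if i + 1 < len(ordered) else last_page
        ranges[title] = (start, end)
    return ranges

def chapter_text(pages, first, last):
    """Join the page contents for pages first..last (pages: {page_no: text})."""
    return "\n".join(pages[p] for p in range(first, last + 1) if p in pages)

# Hypothetical book with three chapters and twelve pages.
starts = [("Intro", 1), ("Methods", 4), ("Results", 9)]
pages = {n: f"page {n}" for n in range(1, 13)}

ranges = chapter_page_ranges(starts, last_page=12)
print(ranges)
print(chapter_text(pages, *ranges["Intro"]))
```

If two chapters did begin on the same page, the first of them would get an empty range under this rule, so that case would need separate handling.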

There are 8764 rows in the chapter table (so 8764 unique chapter headings)
and 6870 books.

When I import, I get:

Num Docs: 2784
Max Doc: 9488
Deleted Docs: 6704
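
These three numbers are consistent with each other: 9488 - 2784 = 6704. In Solr, adding a document whose unique key already exists replaces the older copy, and the older copy is counted as deleted until segments are merged. The sketch below simulates that bookkeeping; it is an illustration of how such stats could arise, assuming repeated ids and no merge during the import, and the id stream is made up.

```python
def simulate_import(ids):
    """Simulate adds where a repeated unique key overwrites the older doc.

    Returns (num_docs, max_doc, deleted_docs) in the sense of the stats above.
    """
    live = set()
    max_doc = 0
    for doc_id in ids:
        max_doc += 1      # every add occupies a new slot in the index
        live.add(doc_id)  # a repeated id replaces the older copy
    num_docs = len(live)
    return num_docs, max_doc, max_doc - num_docs

# Hypothetical id stream: 9488 adds that contain only 2784 distinct ids.
ids = [f"doc-{i % 2784}" for i in range(9488)]
print(simulate_import(ids))
```

If the real import behaves this way, the import query would be returning 9488 rows that map to only 2784 distinct ids.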

Here is the config file (the relevant part):

[DIH data-config.xml excerpt; the XML markup did not survive in the archive]

thanks,

Csaba





Re: DIH deleting documents

2013-02-21 Thread cveres
I should also add that some of the books don't have chapters, so the query
won't return anything for these books.
But in that case I expected that the document wouldn't be added at all,
rather than first added and then deleted (which I now suspect is the
case).
It would be very helpful if I could see a list of deleted documents! I tried
looking in the terminal window (Jetty), but that did not help. I don't
know where else Solr might put logs. I looked in /var/log but did not find
anything that looked useful.
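
Whether a book without chapters produces a row at all depends on the kind of join. The sketch below uses an in-memory SQLite database as a stand-in for MySQL, with hypothetical column names, to show the difference; it makes no claim about the actual import query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE chapter (id INTEGER PRIMARY KEY, book_id INTEGER, title TEXT);
    INSERT INTO book VALUES (1, 'With chapters'), (2, 'No chapters');
    INSERT INTO chapter VALUES (10, 1, 'One'), (11, 1, 'Two');
""")

# Inner join: book 2 has no chapter rows, so it yields no result row.
inner = conn.execute(
    "SELECT b.id, c.id FROM book b JOIN chapter c ON c.book_id = b.id "
    "ORDER BY c.id"
).fetchall()

# Left join: book 2 still yields one row, with NULL for the chapter id.
left = conn.execute(
    "SELECT b.id, c.id FROM book b LEFT JOIN chapter c ON c.book_id = b.id "
    "ORDER BY b.id, c.id"
).fetchall()

print(inner)
print(left)
```

With an inner join, a book that has no chapters never reaches the indexer, so it could not account for documents that are added and then deleted. With an outer join, it would arrive as a row with empty chapter fields, and any id built from those fields would be worth checking.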





Re: DIH deleting documents

2013-02-23 Thread cveres
Thanks Gora. I do not have optimise automatically enabled, but I am new to
Solr so I am not 100% familiar with all the steps that go on.

I will try your suggestion, but I was hoping first that I could get the data
straight from Solr. 

thanks, Csaba





Re: DIH deleting documents

2013-02-25 Thread cveres
Thanks Arcadius,

Excellent suggestion about the view. I'll try to simplify things and see how
I go.

thanks,

Csaba


