For (2), look at your admin/stats page. The difference between numDocs and maxDoc is the number of documents that have been deleted from your index but not yet purged by a segment merge or optimize...
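
As a rough sketch of that arithmetic in Python: the helper name `deleted_docs` is made up for illustration, and the sample values assume the index started empty and nothing has been merged away since the import, so that maxDoc matches the 1500 adds in your log and numDocs matches the 1384 hits. Read the real numbers off the stats page rather than trusting these.

```python
def deleted_docs(num_docs: int, max_doc: int) -> int:
    """Return how many documents are marked deleted but still held in the index."""
    if max_doc < num_docs:
        # maxDoc counts live plus deleted documents, so it cannot be smaller.
        raise ValueError("maxDoc should never be smaller than numDocs")
    return max_doc - num_docs


# Hypothetical values: assumes maxDoc equals the 1500 logged adds and
# numDocs equals the 1384 hits returned by the q=*:* query.
print(deleted_docs(num_docs=1384, max_doc=1500))  # 116
```

If the numbers on your stats page come out like that, 116 adds overwrote an existing document with the same key, which would fit the duplicate-key theory in (1).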
For (3), I don't have a clue.

Best
Erick

On Sat, Sep 17, 2011 at 7:20 PM, Pulkit Singhal <pulkitsing...@gmail.com> wrote:
> My DIH's full-import logs end with a trailing output saying that 1500
> documents were added, which is correct because I have 16 sources and
> one of them was down and each source is supposed to give me 100
> results:
> (1500 adds)],optimize=} 0 0
>
> But when I check my document count I get only 1384 results:
> INFO: [rss] webapp=/solr path=/select params={start=0&q=*:*&rows=0}
> hits=1384 status=0 QTime=0
>
> 1) I think I may have duplicates based on the primary key for the data
> coming in. Is there any other explanation than that?
> 2) Is there some way to get a log of how many documents were deleted?
> Because an update does a delete then add, this would allow me to make
> sure of what is going on.
>
> The sources I have are URL based; sometimes they appear to be down
> because the request gets denied, I suppose:
> SEVERE: Exception thrown while getting data
> java.io.FileNotFoundException:
> http://www.amazon.com/rss/tag/anime/popular/ref=tag_tdp_rss_pop_man?length=100
> Caused by: java.io.FileNotFoundException:
> http://www.amazon.com/rss/tag/anime/popular/ref=tag_tdp_rss_pop_man?length=100
> at
> sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1434)
>
> 3) Is there some way to configure the datasource to retry 3 times or
> something like that? I have increased the values for connectionTimeout
> and readTimeout but it doesn't help when sometimes the server simply
> denies the request due to heavy load. I need to be able to retry at
> those times. The onError has only the abort,skip,continue options, none
> of which really let me retry anything.
>
> Thank You.
> - Pulkit
>