My DIH full-import log ends with a trailing line saying that 1500
documents were added, which is correct: I have 16 sources, one of
them was down, and each source is supposed to give me 100
results:
(1500 adds)],optimize=} 0 0

But when I check my document count I get only 1384 results:
INFO: [rss] webapp=/solr path=/select params={start=0&q=*:*&rows=0}
hits=1384 status=0 QTime=0

1) I think I may have duplicates based on the primary key of the
incoming data. Is there any other explanation than that?
2) Is there some way to get a log of how many documents were deleted?
Because an update does a delete then add, this would allow me to make
sure of what is going on.
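
To test the duplicate theory outside Solr, I could count repeated keys
across the feeds with something like the sketch below. The function
names are made up for illustration, and it assumes I can collect the
uniqueKey values of all incoming items into one list. If duplicates are
the only cause, the difference should come out to 1500 - 1384 = 116.

```python
from collections import Counter


def duplicate_report(keys):
    """Return (total, unique, overwritten) for a list of uniqueKey values.

    Hypothetical helper: 'keys' is whatever field the schema uses as uniqueKey.
    """
    counts = Counter(keys)
    total = len(keys)
    unique = len(counts)
    # Every repeat of a key replaces an earlier document instead of adding one.
    return total, unique, total - unique


def repeated_keys(keys):
    """Return only the keys that occur more than once, with their counts."""
    return {key: n for key, n in Counter(keys).items() if n > 1}
```

For example, duplicate_report(["a", "b", "a"]) gives (3, 2, 1): three
adds, two surviving documents, one overwritten.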

The sources I have are URL based; sometimes they appear to be down,
I suppose because the request gets denied:
SEVERE: Exception thrown while getting data
java.io.FileNotFoundException:
http://www.amazon.com/rss/tag/anime/popular/ref=tag_tdp_rss_pop_man?length=100
Caused by: java.io.FileNotFoundException:
http://www.amazon.com/rss/tag/anime/popular/ref=tag_tdp_rss_pop_man?length=100
        at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1434)

3) Is there some way to configure the datasource to retry 3 times or
something like that? I have increased the values for connectionTimeout
and readTimeout, but that doesn't help when the server simply
denies the request due to heavy load. I need to be able to retry at
those times. The onError attribute has only the abort, skip and
continue options, none of which really lets me retry anything.
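
The behaviour I have in mind is roughly the following, written as a
standalone Python sketch rather than as DIH configuration, since I have
not found a DIH setting for it. The names fetch_with_retry and http_get,
the 3 attempts and the delay values are all my own assumptions.

```python
import time
import urllib.request


def http_get(url, timeout=30):
    """Fetch a URL and return the response body as bytes."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def fetch_with_retry(url, attempts=3, delay_seconds=2.0,
                     fetch=http_get, sleep=time.sleep):
    """Call fetch(url) up to 'attempts' times, waiting longer after each failure."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return fetch(url)
        except OSError as error:
            # urllib's URLError and HTTPError, and socket timeouts, are OSError subclasses.
            last_error = error
            if attempt < attempts:
                sleep(delay_seconds * attempt)  # simple linear backoff
    # All attempts failed: surface the last error to the caller.
    raise last_error
```

The fetch and sleep parameters exist only so the retry logic can be
exercised without touching the network.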

Thank You.
- Pulkit
