Hello, 

I've run into quite a snag and I'm wondering if anyone can help me out
here. Here is the situation.
I am using the DataImportHandler (DIH) to pull from a database and a Linux
file system. The database has the metadata; the file system has the document
text. I thought it had indexed all the files in the file system just fine.
However, when I was trying to filter out bad documents, I realized there were
two documents in the file system that the DIH was not indexing.
Well, I guess I shouldn't say that. When I run a faceted query with the
Authors facet enabled and *:* as the query to get all the results, it only
comes back with 279 of the 281 documents indexed. And at the bottom, when
I look at the authors of those two documents, the facet comes out as

   "Author #278",
        0,
  "Author #279",
        0

They have real names; those are just filler. So this got me thinking.

Debug Idea number 1: The documents do not exist in the file system, or the
link is bad, so it's pulling the metadata but not the document text. But no.
Both documents exist and both links are good.
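
For anyone who wants to repeat the check from Debug Idea 1, here is a minimal Python sketch of it. It assumes the file paths stored in the database have been exported to a plain list. The function name `find_unreadable` and the example paths are made up for illustration and are not part of my actual setup.

```python
import os

def find_unreadable(paths):
    """Return the paths that are missing, not regular files, or not readable."""
    bad = []
    for path in paths:
        # os.path.isfile follows symlinks, so a dangling link counts as missing.
        if not os.path.isfile(path) or not os.access(path, os.R_OK):
            bad.append(path)
    return bad

if __name__ == "__main__":
    # Hypothetical paths; replace them with the values from the metadata table.
    for path in find_unreadable(["/data/docs/a.txt", "/data/docs/b.txt"]):
        print("problem:", path)
```

An empty result only says the files are there and readable by the user running the script. It says nothing about whether the DIH itself can read or parse them.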

Debug Idea number 2: It isn't pulling the text, so I can't search it. Okay,
so I ran a debug query through the DataImportHandler.

The debug query only indexes the first ten documents, which I assume is
the default, so let me know if I'm wrong.
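
If the ten-document limit really is the debug default, then as far as I understand the DIH request parameters `start` and `rows` should let a debug import cover a different window of documents. Below is a minimal sketch of building such a request URL in Python. The host, the core name `mycore`, and the helper name `dih_debug_url` are assumptions for illustration. Setting `clean` and `commit` to false is my own cautious choice so that a debug run does not wipe or alter the existing index.

```python
from urllib.parse import urlencode

def dih_debug_url(base_url, start=0, rows=10, clean=False, commit=False):
    """Build a DataImportHandler full-import URL with debug output enabled."""
    params = {
        "command": "full-import",
        "debug": "true",
        "verbose": "true",
        "start": start,               # offset of the first document to import
        "rows": rows,                 # number of documents to import
        "clean": str(clean).lower(),  # false: keep what is already indexed
        "commit": str(commit).lower(),
    }
    return base_url + "?" + urlencode(params)

if __name__ == "__main__":
    # Hypothetical host and core name.
    print(dih_debug_url("http://localhost:8983/solr/mycore/dataimport",
                        start=270, rows=20))
```

Stepping `start` through the data set in small windows should show which window the two missing documents fall into and whether the verbose output reports anything for them.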

And the strangest part: when I run a query against the debug import session, I
can search for my documents, and it includes them in the faceted search.
There are only 8 documents to search through (it throws out 2 because they
only exist as hard copies), and I can find the two documents in question:

   "Author #278",
        1,
  "Author #279",
        1
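
To compare the two facet outputs above without reading them by eye, the flat value/count list that Solr returns for a facet field can be paired up and filtered for zero counts. This is only a sketch. The field name `Authors` is from my setup, the author names are the same filler as above, and the helper names are made up.

```python
from urllib.parse import urlencode

# Query parameters for the faceted *:* query described above.
FACET_QUERY = urlencode({
    "q": "*:*",
    "rows": 0,          # only the facet counts are needed, not the documents
    "wt": "json",
    "facet": "true",
    "facet.field": "Authors",
})

def facet_counts(flat):
    """Turn a flat [value, count, value, count, ...] list into a dict."""
    return dict(zip(flat[0::2], flat[1::2]))

def zero_count_values(flat):
    """Return the facet values whose count is zero."""
    return [value for value, count in facet_counts(flat).items() if count == 0]

full_import = ["Author #278", 0, "Author #279", 0]
debug_import = ["Author #278", 1, "Author #279", 1]

print(zero_count_values(full_import))   # ['Author #278', 'Author #279']
print(zero_count_values(debug_import))  # []
```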

What is going on? I am so very confused.


