I rather doubt that it's a Solr issue. Text is text after all. If
some docs display text, then it's probably a matter of not
getting the text in the first place.

My _guess_ is that you're not getting any text at all from
the document. Either the document isn't being found
or it's not in a form that Tika expects (perhaps the file's extension has
been changed and it's really a LibreOffice file). Or Tika has a bug.
Or your database doesn't have a value for TextContentURL. Or...
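One quick way to test the "extension was changed" theory is to look at the file's leading bytes. A genuine legacy Word .doc is an OLE2 compound file. A .docx or a LibreOffice .odt is a ZIP archive, and RTF starts with `{\rtf`. The sketch below uses only the JDK. The class and method names are hypothetical. Tika does its own detection as well, and `new Tika().detect(file)` will report the MIME type it thinks the file is.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical helper: guesses a file's real container format from its magic bytes,
// regardless of what its extension claims.
public class DocFormatSniffer {

    // OLE2 compound file signature (legacy binary .doc/.xls/.ppt).
    private static final byte[] OLE2 = {
        (byte) 0xD0, (byte) 0xCF, (byte) 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, (byte) 0x1A, (byte) 0xE1
    };
    // ZIP local file header ("PK\3\4"), used by .docx and .odt.
    private static final byte[] ZIP = {0x50, 0x4B, 0x03, 0x04};
    // Rich Text Format files begin with "{\rtf".
    private static final byte[] RTF = {'{', '\\', 'r', 't', 'f'};

    /** Classifies the first bytes of a file. */
    public static String sniff(byte[] head) {
        if (startsWith(head, OLE2)) return "OLE2 (legacy binary .doc/.xls/.ppt)";
        if (startsWith(head, ZIP)) return "ZIP (.docx, .odt, or other zip-based format)";
        if (startsWith(head, RTF)) return "RTF";
        return "unknown";
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        return data.length >= prefix.length
            && Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }

    /** Reads up to 8 leading bytes from the file and classifies them. */
    public static String sniff(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            return sniff(in.readNBytes(8));
        }
    }

    public static void main(String[] args) throws IOException {
        for (String name : args) {
            System.out.println(name + " -> " + sniff(Path.of(name)));
        }
    }
}
```

Run it against a .doc that indexes fine and one that comes back blank. If the blank one isn't reported as OLE2, its extension is lying about its contents.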

So what I'd do, since you know the name of the file in question, is
print out the text you get from it before you put it in the Solr doc, and go
from there.
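A minimal sketch of that logging step, assuming the indexer already has the extracted string in hand. For example, it could come from Tika's `new Tika().parseToString(file)`. The class and method names are hypothetical. The point is to tell apart a null result, an empty string, and whitespace-only text. A null may point at a missing TextContentURL or a file that wasn't found. Empty or whitespace-only text suggests Tika opened the file but found nothing it recognized as text.

```java
// Hypothetical helper: summarizes extracted text so the log shows exactly
// what was about to go into the Solr document.
public class ExtractedTextReport {

    public static String describe(String fileName, String text) {
        if (text == null) {
            return fileName + ": text is null";
        }
        if (text.isEmpty()) {
            return fileName + ": text is empty";
        }
        if (text.isBlank()) {
            return fileName + ": text is whitespace only (" + text.length() + " chars)";
        }
        // Collapse runs of whitespace so the preview fits on one log line.
        String preview = text.strip().replaceAll("\\s+", " ");
        if (preview.length() > 80) {
            preview = preview.substring(0, 80) + "...";
        }
        return fileName + ": " + text.length() + " chars, starts with \"" + preview + "\"";
    }

    public static void main(String[] args) {
        System.out.println(describe("good.doc", "Some body text"));
        System.out.println(describe("bad.doc", "\n\n  \n"));
        System.out.println(describe("missing.doc", null));
    }
}
```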

Best,
Erick

On Thu, Jul 9, 2015 at 9:59 AM, Paden <rumsey...@gmail.com> wrote:
> Haha, no need to reinvent wheels, especially when you don't know Java. It's just a
> prototype anyway.
>
> I made a very strong assumption that it was pulling the text as blank
> because I would copy the EXACT same text from one file in the file system
> and put it into another file under a different name, and instead of it showing
> as
>
> {
> Author:"Some author"
> text:"blank"
> }
>
> It would show as
>
> {
> Author:"Some author"
> text:"text that should have shown up in the other file but appeared as
> blank"
> }
>
> But I'm a lot more familiar with Solr now than I was about 4 weeks ago, so I'll
> run that debugger and see if I can find the problem. I just
> find it weird that it was ONLY .doc files, and that when I put the text into another
> .doc it was actually pulled. Thanks for the post, and let me know if there's any
> new info I should know.
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/SolrJ-Tika-custom-indexer-not-indexing-CERTAIN-doc-text-tp4216541p4216576.html
> Sent from the Solr - User mailing list archive at Nabble.com.
