Solr Cell seems to be indexing only the first N bytes of a text file

Ross Sat, 20 Mar 2010 17:40:59 -0700

Hi all

I'm trying to index some text files using Solr Cell. I'm using the
schema from Avi Rappoport's tutorial on indexing HTML and text files,
although I had the same problem with the example/solr setup.

My problem is that words past a certain point in a file are not being
indexed. I must be hitting some limit, but I haven't been able to
figure out which one. I'm hosting Solr in Tomcat and using cURL to
post files to /update/extract, as described in Avi's tutorial and
other docs. I don't think it's an HTTP limit on the POST, because the
whole file is being stored in Solr. I know this because when I
retrieve the stored file body with a query that does match, the word
that doesn't match appears further down in the returned contents. I'm
storing the contents only for testing; once this is working, the file
contents will probably be indexed but not stored.
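For reference, the same upload can be reproduced outside cURL. The sketch below is a hypothetical helper, not the tutorial's exact command: it assumes Tomcat on port 8080 with Solr deployed at /solr, and it posts the raw file bytes with a Content-Type header (the equivalent of `curl --data-binary @file -H 'Content-type:text/plain'`) rather than the multipart form that `curl -F` sends. `literal.id` and `commit` are standard Solr Cell request parameters.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_extract_request(path, doc_id, solr_base="http://localhost:8080/solr"):
    """Build (but do not send) a POST of a local file to /update/extract.

    solr_base is an assumed Tomcat/Solr location; adjust it to your setup.
    """
    # literal.id supplies the uniqueKey value; commit=true makes the document
    # searchable immediately after the upload.
    params = urlencode({"literal.id": doc_id, "commit": "true"})
    url = f"{solr_base}/update/extract?{params}"
    with open(path, "rb") as f:
        body = f.read()
    # Send the whole file as the request body, typed as plain text.
    return Request(url, data=body,
                   headers={"Content-Type": "text/plain"}, method="POST")


def post_file(path, doc_id):
    """Send the request; this needs a running Solr instance."""
    with urlopen(build_extract_request(path, doc_id)) as resp:
        return resp.status
```

Building the request separately from sending it makes it easy to confirm that the full file body is going out, which supports ruling out an HTTP-side limit.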

On a test file in which I've been moving a unique word around, searches
for it seem to stop matching once the word is beyond roughly the 100 KB
point in the file. I think another file gave a different result
earlier.
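One possibility worth checking is that the cutoff is a token count rather than a byte count. For example, Solr 1.4's example solrconfig.xml sets `maxFieldLength` to 10000, which limits how many terms of a field are indexed. That would also explain why another file gave a different result. The hypothetical helper below reports both the byte offset and an approximate token position of a word so the two can be compared across files. It splits on whitespace, which only approximates Solr's analyzers: those usually split on punctuation too, so the real term position may be higher.

```python
def locate_word(path, word):
    """Return (byte_offset, approx_token_index) of the first match, or None."""
    with open(path, "rb") as f:
        data = f.read()
    byte_offset = data.find(word.encode("utf-8"))
    if byte_offset == -1:
        return None
    # Count whitespace-separated tokens before the match as a rough stand-in
    # for the term position an analyzer would assign.
    token_index = len(data[:byte_offset].split())
    return byte_offset, token_index
```

If the failures line up with a roughly constant token index across files, rather than a constant byte offset, a per-field term limit is the likely culprit.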

Hopefully I'm missing something obvious.

Thanks for any help.

Ross

