"There is no current feature" is what I meant. Yes, it would be very
handy to do this.
I handled this problem in the DIH by creating two documents, both with
the same unique ID. The first doc just had the metadata. The second
document parsed the input with Tika, but had 'skip doc on error' set.
On Nov 14, 2010, at 3:02pm, Lance Norskog wrote:
Yes, the ExtractingRequestHandler uses Tika to parse many file formats.
Solr 1.4.1 uses a previous version of Tika (0.6 or 0.7).
Here's the problem with Tika and extraction utilities in general: they
are not perfect. They will fail on some files. In the
ExtractingRequestHandler's case, there i
Thanks for all the responses.
Govind: To answer your question, yes, all I want to search is plain text
files. They are located in NFS directories across multiple Solaris/Linux
storage boxes. The total storage is in the hundreds of terabytes.
I have just got started with Solr and my understanding is t
Another point of view you might want to think about: what kind of search
do you want? Is it just plain full-text search, or is there something more
to those text files? Are they grouped in folders? Do the folders imply a
certain kind of grouping, hierarchy, or tagging?
I recently was trying to help somebody who had files
About web servers: Solr is a servlet war file and needs a Java web
server "container" to run. The example/ folder in the Solr distribution
uses 'Jetty', and this is fine for small production-quality projects.
You can just copy the example/ directory somewhere to set up your own
running Solr; th
Think of the data import handler (DIH) as Solr pulling data to index
from some source based on configuration. So, once you set up
your DIH config to point to your file system, you issue a command
to Solr like "OK, do your data import thing". See the
FileListEntityProcessor.
http://wiki.apache.org/s
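
As a rough illustration of "issuing a command", the DIH is driven by plain
HTTP requests that carry a command parameter. The sketch below only builds
such a request URL. The base URL, the /dataimport handler path, and the
helper name dih_command_url are assumptions for illustration; the handler
path is whatever name the DIH is registered under in solrconfig.xml.

```python
from urllib.parse import urlencode


def dih_command_url(core_url, command, handler="/dataimport", **params):
    """Build the URL that asks a DataImportHandler to run a command.

    core_url and handler are assumptions: use the base URL of your own
    Solr instance and the path your DIH is registered under.
    """
    query = {"command": command}
    query.update(params)  # optional DIH parameters such as clean or commit
    return core_url.rstrip("/") + handler + "?" + urlencode(query)


if __name__ == "__main__":
    # Hypothetical local instance; fetch the printed URLs with curl.
    print(dih_command_url("http://localhost:8983/solr", "full-import"))
    print(dih_command_url("http://localhost:8983/solr", "status"))
```

Fetching the first URL (with curl, for example) starts the import; the
second one asks the handler to report on its progress.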
Hi Lance,
Thank you very much for responding (I am not sure how to reply to the
group, so I am writing to you directly).
Can you please expand on your suggestion? I am not a web guy, so I don't
know where to start.
What is the difference between SolrJ and DataImportHandler? Do I need to set
up web servers on all
Using 'curl' is fine. There is a library called SolrJ for Java and
other libraries for other scripting languages that let you upload with
more control. There is a thing in Solr called the DataImportHandler
that lets you script walking a file system.
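
For a sense of what a do-it-yourself upload involves, here is a minimal
sketch in Python (not SolrJ) that walks a directory tree and posts each
plain-text file to Solr's XML update handler, which is the same thing a
curl upload does. The update URL, the field names "id" and "text", the
.txt filter, and all function names are assumptions for illustration;
they have to match your own schema and setup.

```python
import os
import urllib.request
from xml.sax.saxutils import escape


def build_add_xml(doc_id, text):
    """Build a Solr XML <add> message for one document.

    The field names "id" and "text" are assumptions; use the fields
    defined in your own schema.xml.
    """
    return (
        "<add><doc>"
        '<field name="id">' + escape(doc_id) + "</field>"
        '<field name="text">' + escape(text) + "</field>"
        "</doc></add>"
    )


def iter_text_files(root):
    """Yield the path of every .txt file under root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if name.endswith(".txt"):
                yield os.path.join(dirpath, name)


def post_xml(update_url, xml_body):
    """POST one XML message to the update handler; return the HTTP status."""
    request = urllib.request.Request(
        update_url,
        data=xml_body.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
    )
    with urllib.request.urlopen(request) as response:
        return response.status


def index_tree(root, update_url):
    """Index every text file under root, then commit once at the end."""
    for path in iter_text_files(root):
        with open(path, encoding="utf-8", errors="replace") as handle:
            post_xml(update_url, build_add_xml(path, handle.read()))
    post_xml(update_url, "<commit/>")


if __name__ == "__main__":
    # Hypothetical directory and local Solr instance.
    index_tree("/data/docs", "http://localhost:8983/solr/update")
```

The file path doubles as the unique ID here, and the commit is sent once
at the end rather than per file. For very large collections you would
want to batch several documents into each <add> message.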
On Thu, Nov 11, 2010 at 8:38 PM, K. Seshadri Iye