Indexing rich documents from websites using ExtractingRequestHandler

ahammad Wed, 08 Jul 2009 07:41:01 -0700

Hello,

I can already index rich documents, such as PDFs, that are on the local
filesystem. Can ExtractingRequestHandler also index files that are only
reachable through a website?


For example, there is a file that can be reached like so:
http://www.sub.myDomain.com/files/pdfdocs/testfile.pdf

How would I go about indexing that file? I tried the following stream.file
values; the error each one produced is shown in parentheses:

stream.file=http://www.sub.myDomain.com/files/pdfdocs/testfile.pdf
    (The filename, directory name, or volume label syntax is incorrect)
stream.file=www.sub.myDomain.com/files/pdfdocs/testfile.pdf
    (The system cannot find the path specified)
stream.file=//www.sub.myDomain.com/files/pdfdocs/testfile.pdf
    (The format of the specified network name is invalid)
stream.file=sub.myDomain.com/files/pdfdocs/testfile.pdf
    (The system cannot find the path specified)
stream.file=//sub.myDomain.com/files/pdfdocs/testfile.pdf
    (The network path was not found)

I think I understand why I get those errors: stream.file appears to be
treated as a local filesystem path, so I am guessing it doesn't support web
addresses. Is there another parameter that does, or some alternative method
of doing this?
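
One candidate to try is Solr's stream.url request parameter, which takes a
URL rather than a local path. It generally only works when remote streaming
is enabled in solrconfig.xml, so treat the following as a minimal sketch
under assumptions rather than a confirmed fix. The Solr base URL
(http://localhost:8983/solr), the /update/extract handler path, the
literal.id value, and the helper name build_extract_url are all assumptions
for illustration and not part of the setup described above.

```python
from urllib.parse import urlencode


def build_extract_url(solr_base, doc_url, doc_id):
    """Build an ExtractingRequestHandler request URL for a remote file.

    Hypothetical helper: solr_base and the /update/extract path are
    assumptions about the local Solr configuration.
    """
    params = {
        "stream.url": doc_url,  # remote document that Solr should fetch itself
        "literal.id": doc_id,   # assumed unique key for the indexed document
        "commit": "true",       # commit so the document becomes searchable
    }
    # urlencode percent-encodes the nested URL so it survives as one parameter
    return solr_base.rstrip("/") + "/update/extract?" + urlencode(params)


request_url = build_extract_url(
    "http://localhost:8983/solr",
    "http://www.sub.myDomain.com/files/pdfdocs/testfile.pdf",
    "testfile.pdf",
)
print(request_url)
```

The sketch only builds the request URL; sending it (from a browser, curl, or
an HTTP client) is what would ask Solr to fetch and extract the PDF. If remote
streaming cannot be enabled, another option is to download the file first and
post its bytes to the same handler.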
-- 
View this message in context: 
http://www.nabble.com/Indexing--rich-documents-from-websites-using-ExtractingRequestHandler-tp24392809p24392809.html
Sent from the Solr - User mailing list archive at Nabble.com.
