Re: indexing rich documents

2010-07-16 Thread Lance Norskog
The libraries are searched in the solr/lib directory, not solr home. If using multicore, solr/core/lib. These are searched automatically. You can also tell Solr to search in other directories with the <lib> directive in solrconfig.xml. On Tue, Jul 13, 2010 at 11:48 PM, satya swaroop wrote: > > here

Re: indexing rich documents

2010-07-13 Thread satya swaroop
ya i checked the extraction request handler but couldn't get the info... i installed tika-0.7 and copied the jar files into the solr home library.. i started sending the pdf/html files and then i get a lazy error. i am using tomcat and solr 1.4

Re: indexing rich documents

2010-07-13 Thread satya swaroop
hi, yes i followed the wiki. can you now tell me the procedure for it? regards, swaroop

Re: indexing rich documents

2010-07-13 Thread Markus Jelsma
Hi, Are you sure you followed the wiki [1] on this subject? There is an example there but you need Solr 1.4.0 or higher. I am unsure if just patching 1.3.0 will really do the trick. The patch must then also include Apache Tika, which sits under the hood, extracting content and metadata from vario

Re: indexing rich documents

2010-07-13 Thread Nikola Garafolic
On 07/13/2010 02:11 PM, satya swaroop wrote: Hi all, i am new to solr and followed the wiki and got the solr admin running successfully. It is going well for xml files. But i am unable to index the rich documents. I followed the wiki to index the richer documents also, but i didnt

Re: Indexing rich documents from websites using ExtractingRequestHandler

2009-07-08 Thread Jay Hill
I haven't tried this myself, but it sounds like what you're looking for is enabling remote streaming: http://wiki.apache.org/solr/ContentStream#head-7179a128a2fdd5dde6b1af553ed41735402aadbf As the link above shows you should be able to enable remote streaming like this: and then something like t
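A minimal sketch of what such a request could look like once remote streaming is enabled (the `enableRemoteStreaming="true"` attribute on `requestParsers` in solrconfig.xml). It only builds the URL; `stream.url`, `literal.id` and `commit` are standard Solr request parameters, while the host, port, document URL, id and the function name `remote_extract_url` are assumptions made up for illustration, and the handler is assumed to be mapped at `/update/extract`.

```python
from urllib.parse import urlencode


def remote_extract_url(solr_base, doc_url, doc_id):
    """Build an extract request that asks Solr to fetch doc_url itself.

    Hypothetical helper: assumes the ExtractingRequestHandler is
    registered at /update/extract and remote streaming is enabled.
    """
    params = [
        ("stream.url", doc_url),   # remote document Solr should download
        ("literal.id", doc_id),    # value for the unique key field
        ("commit", "true"),        # commit right away (fine for testing)
    ]
    return solr_base.rstrip("/") + "/update/extract?" + urlencode(params)


if __name__ == "__main__":
    # Placeholder host and document URL, not taken from the thread.
    print(remote_extract_url(
        "http://localhost:8983/solr",
        "http://example.com/docs/report.pdf",
        "report-1",
    ))
```

Requesting the printed URL (for example with curl or a browser) should make Solr download and index the remote file, provided the server can reach that address.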

Re: Indexing rich documents from websites using ExtractingRequestHandler

2009-07-08 Thread Glen Newton
Try putting all the PDF URLs into a file, download with something like 'wget' then index locally. Glen Newton http://zzzoot.blogspot.com/ 2009/7/8 ahammad : > > Hello, > > I can index rich documents like pdf for instance that are on the filesystem. > Can we use ExtractingRequestHandler to index f
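The download-then-index approach could be sketched roughly as follows, using Python's standard library in place of wget. This is an illustration under assumptions, not a tested recipe: the function names, the `document-N.pdf` fallback name and the `application/pdf` content type are invented here, and the extract handler is assumed to be mapped at `/update/extract` and to accept the file as a raw POST body.

```python
import os
import urllib.request
from urllib.parse import urlencode, urlsplit


def local_name(url, index):
    """Derive a local file name from a URL, with a numbered fallback."""
    name = os.path.basename(urlsplit(url).path)
    return name or "document-%d.pdf" % index


def read_urls(path):
    """Read one URL per line, skipping blank lines."""
    with open(path) as fh:
        return [line.strip() for line in fh if line.strip()]


def download_all(urls, dest_dir):
    """Fetch every URL into dest_dir and return the local paths."""
    os.makedirs(dest_dir, exist_ok=True)
    paths = []
    for i, url in enumerate(urls):
        target = os.path.join(dest_dir, local_name(url, i))
        urllib.request.urlretrieve(url, target)
        paths.append(target)
    return paths


def index_file(solr_base, path, doc_id):
    """POST one local file to the extract handler; returns the HTTP status."""
    query = urlencode({"literal.id": doc_id, "commit": "true"})
    with open(path, "rb") as fh:
        body = fh.read()
    req = urllib.request.Request(
        solr_base.rstrip("/") + "/update/extract?" + query,
        data=body,
        headers={"Content-Type": "application/pdf"},  # assumed: PDFs only
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

A driver would call `read_urls` on the URL file, pass the result to `download_all`, and then call `index_file` for each returned path with an id of its choosing.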