
Re: Bulk word document indexing

2013-03-05 Thread Dmitry Kan

Probably the bulk indexing feature is not implemented for Tika processing, but you can easily write a script yourself: loop over the Word files in a directory and send each one to the extract handler, e.g.: curl "http://localhost:8983/solr/update/extract?literal.id=doc5&defaultField=text" --data-binary @tutorial.html -H 'Co
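The loop described above could look roughly like the following bash sketch. It rests on assumptions that are not in the thread: the Solr 4-era URLs below, file names used as document ids (assumed to be URL-safe), multipart upload with `-F` as shown on the ExtractingRequestHandler wiki page instead of the `--data-binary` form above, and a hypothetical `DRY_RUN=1` switch that prints the commands instead of sending them.

```shell
#!/usr/bin/env bash
# Sketch: send every Word file in one directory to Solr Cell (Tika) via the
# ExtractingRequestHandler, then commit once. URLs, the id scheme and the
# DRY_RUN switch are assumptions for illustration, not from the thread.

SOLR_EXTRACT_URL="${SOLR_EXTRACT_URL:-http://localhost:8983/solr/update/extract}"
SOLR_UPDATE_URL="${SOLR_UPDATE_URL:-http://localhost:8983/solr/update}"

# index_dir DIR: the (...) body runs in a subshell, so the shopt changes
# and any early exit stay local to this call.
index_dir() (
  dir="$1"
  shopt -s nullglob nocaseglob   # no match -> empty list; .DOC matches too
  for f in "$dir"/*.doc "$dir"/*.docx; do
    id="$(basename "$f")"        # hypothetical: file name as unique id
    if [ "${DRY_RUN:-0}" = "1" ]; then
      printf 'curl %s?literal.id=%s -F myfile=@%s\n' "$SOLR_EXTRACT_URL" "$id" "$f"
    else
      # -f makes curl fail on HTTP errors, so a bad file stops the loop
      curl -sf "$SOLR_EXTRACT_URL?literal.id=$id" -F "myfile=@$f" >/dev/null || exit 1
    fi
  done
  # One commit after the whole batch instead of commit=true on every file
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf 'curl %s?commit=true\n' "$SOLR_UPDATE_URL"
  else
    curl -sf "$SOLR_UPDATE_URL?commit=true" >/dev/null
  fi
)
```

Usage: source the file, then run `DRY_RUN=1 index_dir /path/to/word/files` to check which requests would be sent, and drop `DRY_RUN=1` to actually index. Committing once at the end avoids opening a new searcher for every document.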


2013-03-05 Thread Dmitry Kan

Hello, look into Tika. It can handle these MS Word file formats: http://tika.apache.org/1.3/formats.html#Microsoft_Office_document_formats Solr wiki: http://wiki.apache.org/solr/ExtractingRequestHandler I don't have a link to a tutorial with example schemas. Dmitry On Tue, Mar 5, 2013 at