I have a production implementation of this that has been running for the last four months without any issues. The only maintenance is a weekly restart of all components.
http://blog.cloudera.com/blog/2015/10/how-to-index-scanned-pdfs-at-scale-using-fewer-than-50-lines-of-code/

Best,
Ravion

On Tue, Oct 30, 2018, 1:00 PM Erick Erickson <erickerick...@gmail.com> wrote:

> All of the above work, but for robust production situations you'll
> want to consider a SolrJ client, see:
> https://lucidworks.com/2012/02/14/indexing-with-solrj/. That blog
> combines indexing from a DB and using Tika, but those are independent.
>
> Best,
> Erick
> On Tue, Oct 30, 2018 at 12:21 AM Kamuela Lau <kamuela....@gmail.com> wrote:
> >
> > Hi there,
> >
> > Here are a couple of ways I'm aware of:
> >
> > 1. Extract-handler / post tool
> > You can use the curl command with the extract handler or bin/post to upload
> > a single document.
> > Reference:
> > https://lucene.apache.org/solr/guide/7_5/uploading-data-with-solr-cell-using-apache-tika.html
> >
> > 2. DataImportHandler
> > This could be used for, say, uploading multiple documents with Tika.
> > Reference:
> > https://lucene.apache.org/solr/guide/7_5/uploading-structured-data-store-data-with-the-data-import-handler.html#the-tikaentityprocessor
> >
> > You should also be able to do it via the admin page, so long as you define
> > and modify the extract handler in solrconfig.xml.
> > Reference:
> > https://lucene.apache.org/solr/guide/7_5/documents-screen.html#file-upload
> >
> > Hope this helps!
> >
> > On Tue, Oct 30, 2018 at 3:40 PM adiyaksa kevin <adiyaksake...@gmail.com> wrote:
> >
> > > Hello there, let me introduce myself. My name is Mohammad Kevin Putra (you
> > > can call me Kevin), from Indonesia. I am a beginner backend developer. I
> > > use Linux Mint, Apache SOLR 7.5.0 and Apache TIKA 1.91.0.
> > >
> > > I have a small problem with how to index a PDF file via Apache TIKA. I
> > > understand how SOLR and TIKA each work, but I don't know how the two are
> > > integrated.
> > > The last thing I know is that TIKA can extract the PDF file I upload and
> > > parse it into data/metadata automatically, and I just have to copy and
> > > paste it into the "Documents" tab of the SOLR core.
> > > My questions are:
> > > 1. Can I upload a PDF file to SOLR via TIKA in GUI mode, or only in
> > > CLI mode? If only in CLI mode, can you please explain it to me?
> > > 2. Is it possible to show a text result in the "Query" tab?
> > >
> > > The background to my question is that I want to index PDFs on my local
> > > system by uploading them to SOLR with something like drag and drop (is
> > > that possible?). Then, when I type something in the search box, the
> > > result should look like this:
> > > (Title of doc)
> > > blablablabla (yellow stabilo result) blablabla.
> > > The blablabla text is a couple of sentences. That's all I need.
> > > Sorry for my bad English.
> > > Thanks for reading and replying; it will be very helpful to me.
> > > Thanks a lot
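
For reference, option 1 in Kamuela's reply (sending a single PDF to the extract handler) can be sketched in Python roughly as follows. This is only a sketch under assumptions, not anything posted in the thread: it assumes Solr is reachable at http://localhost:8983, that the core is called "mycore" (a hypothetical name), and that the /update/extract handler is enabled in solrconfig.xml. The helper names build_extract_url and post_pdf are made up for illustration. The literal.id and commit parameters are the ones used in the Solr Cell examples in the reference guide.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_extract_url(base_url: str, core: str, doc_id: str, commit: bool = True) -> str:
    """Build the URL of Solr's extracting request handler for one document.

    literal.id sets the unique key of the indexed document;
    commit=true makes it searchable immediately.
    """
    params = {"literal.id": doc_id, "commit": "true" if commit else "false"}
    return f"{base_url.rstrip('/')}/solr/{core}/update/extract?{urlencode(params)}"


def post_pdf(url: str, pdf_path: str) -> str:
    """POST the raw PDF bytes to the extract handler and return Solr's response body."""
    with open(pdf_path, "rb") as fh:
        body = fh.read()
    request = Request(
        url,
        data=body,
        headers={"Content-Type": "application/pdf"},
        method="POST",
    )
    with urlopen(request) as response:
        return response.read().decode("utf-8")
```

With a running Solr instance, something like post_pdf(build_extract_url("http://localhost:8983", "mycore", "doc1"), "sample.pdf") would send the file; "sample.pdf" and "doc1" are placeholders. For larger volumes, the SolrJ approach Erick mentions is likely the more robust choice.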