Re: Indexing PDF file in Apache SOLR via Apache TIKA

R Nair Tue, 30 Oct 2018 10:28:40 -0700

I have done a production implementation of this. It has been running for
the last four months without any issue; we just restart all components
once a week.


http://blog.cloudera.com/blog/2015/10/how-to-index-scanned-pdfs-at-scale-using-fewer-than-50-lines-of-code/


Best, Ravion

On Tue, Oct 30, 2018, 1:00 PM Erick Erickson <erickerick...@gmail.com>
wrote:

> All of the above work, but for robust production situations you'll
> want to consider a SolrJ client, see:
> https://lucidworks.com/2012/02/14/indexing-with-solrj/. That blog
> combines indexing from a DB and using Tika, but those are independent.
>
> Best,
> Erick
> On Tue, Oct 30, 2018 at 12:21 AM Kamuela Lau <kamuela....@gmail.com>
> wrote:
> >
> > Hi there,
> >
> > Here are a couple of ways I'm aware of:
> >
> > 1. Extract-handler / post tool
> > You can use the curl command with the extract handler or bin/post to
> > upload a single document.
> > Reference:
> > https://lucene.apache.org/solr/guide/7_5/uploading-data-with-solr-cell-using-apache-tika.html
> >
> > 2. DataImportHandler
> > This could be used for, say, uploading multiple documents with Tika.
> > Reference:
> > https://lucene.apache.org/solr/guide/7_5/uploading-structured-data-store-data-with-the-data-import-handler.html#the-tikaentityprocessor
> >
> > You should also be able to do it via the admin page, so long as you
> > define and modify the extract handler in solrconfig.xml.
> > Reference:
> > https://lucene.apache.org/solr/guide/7_5/documents-screen.html#file-upload
> >
> > Hope this helps!
> >
> > On Tue, Oct 30, 2018 at 3:40 PM adiyaksa kevin <adiyaksake...@gmail.com>
> > wrote:
> >
> > > Hello there, let me introduce myself. My name is Mohammad Kevin Putra
> > > (you can call me Kevin), from Indonesia. I am a beginner backend
> > > developer. I use Linux Mint, Apache SOLR 7.5.0, and Apache TIKA 1.91.0.
> > >
> > > I have a small problem with how to upload a PDF file via Apache TIKA.
> > > I understand how SOLR and TIKA each work, but I don't know how the two
> > > are integrated.
> > > As far as I know, TIKA can extract the PDF file I upload and parse it
> > > into data/metadata automatically, and then I just have to copy and
> > > paste the result into the "Documents" tab of the SOLR core.
> > > My questions are:
> > > 1. Can I upload a PDF file to SOLR via TIKA in GUI mode, or is it only
> > > possible in CLI mode? If it is CLI only, could you please explain how?
> > > 2. Is it possible to show a text result in the "Query" tab?
> > >
> > > The background to my question: I want to index PDFs on my local
> > > system by simply uploading them to SOLR with something like drag and
> > > drop (is that possible?). Then, when I type something in the search
> > > box, the result should look like this:
> > > (Title of doc)
> > > blablablabla (result highlighted in yellow) blablabla.
> > > The blablabla text would be a couple of sentences. That's all I need.
> > > Sorry for my bad English.
> > > Thanks for reading and replying; it will be very helpful to me.
> > > Thanks a lot
> > >
>
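
For reference, the extract-handler route mentioned in the quoted reply can
be sketched roughly as follows. This is only an illustration, not a tested
production setup: it assumes the extract handler is enabled at
/update/extract in solrconfig.xml, and the base URL, the core name "mycore",
the document id "doc1", and the helper name build_extract_request are all
made up for the example. The code only builds the request; it does not send
it.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_extract_request(solr_base, core, doc_id, pdf_bytes):
    """Build (but do not send) a POST to Solr's extract handler.

    solr_base, core and doc_id are hypothetical values for illustration.
    """
    # literal.id sets the unique key of the new document;
    # commit=true makes it searchable right after indexing.
    params = urlencode({"literal.id": doc_id, "commit": "true"})
    url = f"{solr_base}/{core}/update/extract?{params}"
    return Request(
        url,
        data=pdf_bytes,
        headers={"Content-Type": "application/pdf"},
        method="POST",
    )


# Placeholder bytes stand in for the contents of a real PDF file.
req = build_extract_request(
    "http://localhost:8983/solr", "mycore", "doc1", b"%PDF-1.4 placeholder"
)
print(req.get_method(), req.full_url)
```

To actually send it, the request object could be passed to
urllib.request.urlopen(req) while Solr is running, with pdf_bytes read from
a real file opened in binary mode.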

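The highlighted snippets asked about in the original question are normally
requested with Solr's highlighting parameters on the query side. A minimal
sketch, assuming the extracted text ends up in a stored field called
"content" (the field name, core name, and the helper build_highlight_query
are assumptions that depend on your own schema and setup):

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_highlight_query(solr_base, core, text, field):
    """Build a /select URL that asks Solr for highlighted snippets.

    solr_base, core and field are hypothetical values for illustration.
    """
    params = urlencode({
        "q": text,           # what the user typed in the search box
        "df": field,         # default field to search in
        "hl": "true",        # turn highlighting on
        "hl.fl": field,      # field to take the snippets from
        "hl.snippets": "2",  # up to two snippets per document
    })
    return f"{solr_base}/{core}/select?{params}"


url = build_highlight_query(
    "http://localhost:8983/solr", "mycore", "solar panel", "content"
)
print(url)
```

The same parameters can be typed into the "Query" tab of the admin page.
Solr returns the matching fragments in a separate "highlighting" section of
the response, which a front end can then style (for example in yellow).
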