Hi Charlie,

Thanks for your suggestion, but I will have thousands of these files
coming from different sources. It would become very tedious if I have to
first convert each of them to CSV and then submit it line by line.
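(If I understood the suggestion correctly, the per-file script would be
something like this minimal Python sketch, assuming a local Solr core
named "contacts" and the standard JSON update endpoint; the core name and
CSV path are placeholders:)

import csv
import json
import urllib.request

# Placeholder core name and CSV path, for illustration only.
SOLR_URL = "http://localhost:8983/solr/contacts/update?commit=true"

with open("sample.csv", newline="", encoding="utf-8") as f:
    # The CSV header row supplies the field names, so each row
    # becomes one Solr document, submitted over HTTP one at a time.
    for row in csv.DictReader(f):
        request = urllib.request.Request(
            SOLR_URL,
            data=json.dumps([row]).encode("utf-8"),
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(request) as response:
            response.read()

That works for one file, but repeating the conversion and the script run
across thousands of files is exactly the overhead I would like to avoid.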

I was hoping there might be a simpler way to achieve this using DIH,
which I thought could be configured to read and ingest MS Excel (xlsx)
files directly.

I am not sure what the configuration file would look like.
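From the documentation, I imagine the data-config.xml might look roughly
like the sketch below: a FileListEntityProcessor scanning a directory for
.xlsx files, with a nested TikaEntityProcessor extracting their content
(the baseDir, entity names, and target field name are placeholders, and I
am not certain the mapping is right):

<dataConfig>
  <dataSource type="BinFileDataSource" name="bin"/>
  <document>
    <!-- Placeholder directory; picks up every .xlsx file under it -->
    <entity name="files" processor="FileListEntityProcessor"
            baseDir="/data/excel" fileName=".*\.xlsx"
            rootEntity="false" dataSource="null">
      <!-- Tika parses each spreadsheet; column "text" is the extracted body -->
      <entity name="tika" processor="TikaEntityProcessor"
              url="${files.fileAbsolutePath}" dataSource="bin" format="text">
        <field column="text" name="content"/>
      </entity>
    </entity>
  </document>
</dataConfig>

My worry is that Tika appears to extract the whole sheet as a single
block of text, so I do not see how the individual columns would become
separate Solr fields, which is what I actually need.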

Any pointers are welcome. Thanks!

On Fri, 26 Jul, 2019, 1:56 PM Charlie Hull, <char...@flax.co.uk> wrote:

> Convert the Excel file to a CSV and then write a teeny script to go
> through it line by line and submit to Solr over HTTP? Tika would
> probably work but it's a lot of heavy lifting for what seems to me like
> a simple problem.
>
> Cheers
>
> Charlie
>
> On 26/07/2019 09:19, Vipul Bahuguna wrote:
> > Hi Guys - can anyone suggest how to achieve this?
> > I have understood how to insert JSON documents. So one alternative that
> > comes to mind is that I can convert the rows in my Excel file to JSON
> > format, with the header row of my Excel file becoming the JSON keys
> > (corresponding to the fields I have defined in my managed-schema.xml),
> > and each cell in the Excel file becoming the value of the corresponding
> > field, as in the example below.
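> > For example, a single row might become something like this (the field
> > names below are placeholders for whatever is defined in
> > managed-schema.xml):
> >
> > {
> >   "first_name": "Jane",
> >   "last_name": "Doe",
> >   "phone": "555-0100",
> >   "website_link": "https://example.com"
> > }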
> >
> > However, I am sure there must be a better way to directly ingest the
> > Excel file and achieve the same. I was trying to read about DIH and
> > Apache Tika, but I am not sure how the configuration works.
> >
> > My sample Excel file has 4 columns, namely:
> > 1. First Name
> > 2. Last Name
> > 3. Phone
> > 4. Website Link
> >
> > I want to index these fields into Solr so that all these columns become
> > my Solr schema fields and I can later search based on these fields.
> >
> > Any suggestions, please?
> >
> > thanks!
> >
>
> --
> Charlie Hull
> Flax - Open Source Enterprise Search
>
> tel/fax: +44 (0)8700 118334
> mobile:  +44 (0)7767 825828
> web: www.flax.co.uk
>
>
