Sorry, by "paging through the results outside of Solr," I meant writing a
test harness that doesn't use Solr and timing how long it takes to get
through all the results.
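
Something along these lines is roughly what I have in mind. This is only a sketch under assumptions: it uses Python's built-in sqlite3 module with an in-memory table as a stand-in for the Oracle view, and the helper names time_paging and make_demo_db are made up for illustration. Against the real database you would pass in a connection from whatever DB-API driver you use for Oracle instead.

```python
import sqlite3
import time


def time_paging(conn, query, batch_size=10000):
    """Fetch every row of `query` in batches and time each batch.

    Returns (total_rows, list_of_seconds_per_batch).
    """
    cur = conn.cursor()
    cur.execute(query)
    total_rows = 0
    batch_times = []
    while True:
        start = time.perf_counter()
        rows = cur.fetchmany(batch_size)
        elapsed = time.perf_counter() - start
        if not rows:
            break  # result set exhausted
        total_rows += len(rows)
        batch_times.append(elapsed)
    return total_rows, batch_times


def make_demo_db(n_rows):
    """Build an in-memory SQLite table standing in for the view (assumption)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE V_SOLR_IMPORT4PRODUCT_SEARCH ("
        "PRODUCTS_PK INTEGER, PRODUCTS_CODE TEXT, "
        "PRODUCTS_EAN TEXT, PRODUCTSLP_NAME TEXT)"
    )
    conn.executemany(
        "INSERT INTO V_SOLR_IMPORT4PRODUCT_SEARCH VALUES (?, ?, ?, ?)",
        ((i, "code%d" % i, "ean%d" % i, "name%d" % i) for i in range(n_rows)),
    )
    return conn


if __name__ == "__main__":
    conn = make_demo_db(25000)
    query = (
        "SELECT PRODUCTS_PK, PRODUCTS_CODE, PRODUCTS_EAN, PRODUCTSLP_NAME "
        "FROM V_SOLR_IMPORT4PRODUCT_SEARCH"
    )
    total, times = time_paging(conn, query, batch_size=10000)
    for i, seconds in enumerate(times):
        print("batch %d: %.4f s" % (i, seconds))
    print("total rows:", total)
```

If the per-batch times climb as you get deeper into the result set, the slowdown is probably on the database side and not in Solr.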

I agree with Shawn that you might need to do some JVM tuning to get things
going quicker. You might want to try to monitor your Solr instance with
something like VisualVM to see if it's garbage collecting too much. Also, I
agree with Walter that this might be a pretty decent rate for a single
thread to be importing documents into a single instance, and you might need
to investigate other options to parallelize it if you want it to go faster.
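
As one possible way to parallelize, you could split the import query into disjoint slices by primary key and feed each slice through its own import. The sketch below only builds the slice queries. It assumes the base query has no WHERE clause yet and that the keys are non-negative integers, and the function name partition_queries is made up for illustration. How each slice gets into Solr is left open.

```python
def partition_queries(base_query, key_column, n_partitions):
    """Return one query per partition, splitting rows by key modulo n.

    Assumes base_query has no WHERE clause and keys are non-negative integers.
    """
    if n_partitions < 1:
        raise ValueError("n_partitions must be at least 1")
    return [
        "%s WHERE MOD(%s, %d) = %d" % (base_query, key_column, n_partitions, i)
        for i in range(n_partitions)
    ]


if __name__ == "__main__":
    base = (
        "SELECT PRODUCTS_PK, PRODUCTS_CODE, PRODUCTS_EAN, PRODUCTSLP_NAME "
        "FROM V_SOLR_IMPORT4PRODUCT_SEARCH"
    )
    for q in partition_queries(base, "PRODUCTS_PK", 4):
        print(q)
```

Every row matches exactly one slice, so the slices can be imported independently without duplicating documents.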

Michael Della Bitta

Applications Developer

o: +1 646 532 3062  | c: +1 917 477 7906

appinions inc.

“The Science of Influence Marketing”

18 East 41st Street

New York, NY 10017

t: @appinions <https://twitter.com/Appinions> | g+:
plus.google.com/appinions
w: appinions.com <http://www.appinions.com/>


On Mon, Jun 10, 2013 at 3:05 AM, Sebastian Steinfeld <
sebastian.steinf...@mgm-tp.com> wrote:

> Hi Michael,
>
> The database I am using is Oracle. That's right, I am selecting from a
> view.
> What do you mean by selecting from outside of Solr? I thought the
> batchSize would do the pagination?
>
> The load on the database server does not increase during the import. It
> seems that the database is doing nothing.
>
> Thanks,
> Sebastian
>
>
>
> -----Original Message-----
> From: Michael Della Bitta [mailto:michael.della.bi...@appinions.com]
> Sent: Thursday, June 6, 2013 18:29
> To: solr-user@lucene.apache.org
> Subject: Re: Solr indexing slows down
>
> Hi Sebastian,
>
> What database are you using? How much RAM is available on your machine? It
> looks like you're selecting from a view... Have you tried paging through
> the view outside of Solr? Does that slow down as well? Do you notice any
> increased load on the Solr box or the database server?
>
>
>
> Michael Della Bitta
>
>
> On Thu, Jun 6, 2013 at 6:13 AM, Sebastian Steinfeld <
> sebastian.steinf...@mgm-tp.com> wrote:
>
> > Hi,
> >
> > I am new to Solr and we want to use it to speed up our product search.
> > It is working really nicely, but I think I have a problem with the
> > indexing.
> > It slows down after a few minutes.
> >
> > I am using the DataImportHandler to import the products from the
> > database.
> > I start the import by executing the following HTTP request:
> > /dataimport?command=full-import&clean=true&commit=true
> >
> > I guess these are the important parts of my configuration:
> >
> > schema.xml:
> > ----------------------------------------------
> > <fields>
> >    <field name="pk"        type="long"         indexed="true" stored="true" required="true" />
> >    <field name="code"      type="string"       indexed="true" stored="true" required="true" />
> >    <field name="ean"       type="string"       indexed="true" stored="false" />
> >    <field name="name"      type="lowercase"    indexed="true" stored="false" />
> >    <field name="text"      type="text_general" indexed="true" stored="false" multiValued="true" />
> >    <field name="_version_" type="long"         indexed="true" stored="true" />
> > </fields>
> > ....
> > <fieldType name="lowercase" class="solr.TextField" positionIncrementGap="100">
> >    <analyzer>
> >       <tokenizer class="solr.KeywordTokenizerFactory" />
> >       <filter class="solr.LowerCaseFilterFactory" />
> >    </analyzer>
> > </fieldType>
> > ----------------------------------------------
> >
> > solrconfig.xml:
> > ----------------------------------------------
> >   <requestHandler name="/dataimport"
> > class="org.apache.solr.handler.dataimport.DataImportHandler">
> >     <lst name="defaults">
> >         <str name="config">dataimport-handler.xml</str>
> >     </lst>
> >   </requestHandler>
> > ----------------------------------------------
> >
> > dataimport-handler.xml:
> > ----------------------------------------------
> > <dataConfig>
> >     <dataSource name="local" driver="*************"
> >                 url="*************"
> >                 user="*************"
> >                 password="*************"
> >                 />
> >     <document>
> >         <entity name="product" pk="PRODUCTS_PK" dataSource="local"
> >                 query="SELECT PRODUCTS_PK, PRODUCTS_CODE, PRODUCTS_EAN, PRODUCTSLP_NAME FROM V_SOLR_IMPORT4PRODUCT_SEARCH">
> >             <field column="PRODUCTS_PK"     name="pk" />
> >             <field column="PRODUCTS_CODE"   name="code" />
> >             <field column="PRODUCTS_EAN"    name="ean" />
> >             <field column="PRODUCTSLP_NAME" name="name" />
> >         </entity>
> >     </document>
> > </dataConfig>
> > ----------------------------------------------
> >
> > The number of documents I want to index is 8 million. The first 1.6
> > million are indexed in 2 minutes, but completing the import takes nearly
> > 2 hours.
> > The size of the index on the hard drive is 610 MB.
> > I started the Solr server with 2 GB of memory.
> >
> >
> > I read that the duration of indexing might be connected to the batch
> > size, so I increased the batchSize in the dataSource to 10,000, but
> > this didn't make any difference.
> > I also tried to disable the autocommit, which is configured in
> > solrconfig.xml. I disabled it by commenting it out, but this also didn't
> > make any difference.
> >
> > It would be really nice if one of you could help me with this problem.
> >
> > Thank you very much,
> > Sebastian
> >
> >
>
