You haven't really told us _how_ you are indexing, so some of these comments may be irrelevant...

At 600M documents, you'll almost certainly have to shard your index. It sounds like you're doing the sharding yourself in Lucene by keeping separate Lucene indexes per date range. As you point out, this makes things more difficult for your users. SolrCloud handles sharding for you and is very likely preferable.

bq: I know that SolrCloud can solve the search problem when the index data is big, but it's even slower in indexing than Solr.

This is not necessarily true. In fact, SolrCloud should be much faster _if_ you index from a SolrJ program using CloudSolrClient. Under the covers, Solr routes each document to the correct shard based on a hash of its <uniqueKey>. CloudSolrClient does that routing on the client side, so if you have 10 shards and index a batch of, say, 1,000 documents, 10 groups of 100 docs will be sent out in parallel, one to each shard.
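Here's a minimal sketch of that in SolrJ. Treat it as a sketch only: the ZooKeeper addresses, the collection name "mycollection", and the field names are placeholders for your setup, and the builder API varies a bit between SolrJ versions (this form matches the 6.x line).

import java.util.ArrayList;
import java.util.List;

import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class CloudIndexer {
  public static void main(String[] args) throws Exception {
    // Point the client at the ZooKeeper ensemble, not at a single Solr node;
    // it reads the cluster state from ZK and finds the shard leaders itself.
    try (CloudSolrClient client = new CloudSolrClient.Builder()
            .withZkHost("zk1:2181,zk2:2181,zk3:2181")
            .build()) {
      client.setDefaultCollection("mycollection");

      List<SolrInputDocument> docs = new ArrayList<>();
      for (int i = 0; i < 1_000_000; i++) {            // stand-in for your JDBC loop
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", Integer.toString(i));       // the <uniqueKey> field
        doc.addField("title_t", "document " + i);
        docs.add(doc);

        // Send in batches; CloudSolrClient splits each batch by shard and
        // sends the sub-batches to the shard leaders in parallel.
        if (docs.size() >= 1000) {
          client.add(docs);  // comment this one line out to measure raw DB throughput
          docs.clear();
        }
      }
      if (!docs.isEmpty()) {
        client.add(docs);
      }
      client.commit();       // or rely on autoCommit in solrconfig.xml
    }
  }
}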
Here's an example of a fuller SolrJ program (just a Java program using the Solr client libraries): https://lucidworks.com/2012/02/14/indexing-with-solrj/. Note that this code is rather old, so it uses StreamingUpdateSolrServer where you should use CloudSolrClient. It also processes structured documents using Tika, but you can remove those bits of the code.

One technique I use with SolrJ is to comment out the single line that sends the docs to Solr (client.add(docs) in the sketch above, _server.add(docs) in that example). That tells me whether the bottleneck is getting the data from the database or indexing it to Solr. Often the bottleneck is getting the data, but with 600M documents that may not be the case.

Once your cluster is set up, you might then be able to fire up several indexing clients at once. This assumes that you can partition getting the data from your database. Say you are indexing 10 years' worth of data: fire up 10 clients, each of which indexes only 1 year's worth, along the lines of the sketch below.
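This is only a hedged sketch of the partitioning idea; the JDBC URL, the "documents" table, and the "created" column are hypothetical stand-ins for your actual Oracle schema.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

// Start one JVM per partition, e.g. "java YearIndexer 2012" ... "java YearIndexer 2016",
// so several clients pull from the database and feed SolrCloud in parallel.
public class YearIndexer {
  public static void main(String[] args) throws Exception {
    int year = Integer.parseInt(args[0]);

    String sql = "SELECT id, title, body FROM documents "
               + "WHERE created >= ? AND created < ?";   // assumes a 'created' DATE column
    try (Connection conn = DriverManager.getConnection(
             "jdbc:oracle:thin:@//dbhost:1521/ORCL", "user", "password");
         PreparedStatement ps = conn.prepareStatement(sql)) {
      ps.setDate(1, java.sql.Date.valueOf(year + "-01-01"));
      ps.setDate(2, java.sql.Date.valueOf((year + 1) + "-01-01"));
      try (ResultSet rs = ps.executeQuery()) {
        while (rs.next()) {
          // Build a SolrInputDocument from the row and batch it through
          // CloudSolrClient exactly as in the previous sketch.
        }
      }
    }
  }
}

Because every client writes into the same SolrCloud collection, there is nothing to merge afterwards: CloudSolrClient routes each document to the right shard no matter which client sent it.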
Hope that helps,
Erick

On Tue, Mar 21, 2017 at 7:08 AM, Q&Q <793555...@qq.com> wrote:
> Dear Sir/Madam,
>
> I am Li Wei, from China, and I'm writing to you for your help. Here is the
> problem I encountered:
>
> There is a scheduled task in our project that runs at night and uses Lucene
> to build an index over data from an Oracle database. It worked fine at the
> beginning; however, as the index files grew bigger, indexing got slower, and
> when there is a lot of data to index the task can no longer finish
> overnight.
>
> To address this, we took the following measure: we store the index data in
> different directories according to the time the data was inserted into the
> database. This eases the indexing problem somewhat. However, when searching,
> the user has to specify the year the data was created in order to search the
> corresponding directory, which is a bad experience for the users.
>
> We then learned that Solr is good at indexing data from a database, so we
> decided to adopt Solr in our project. But as the index data gets bigger, it
> also takes Solr more and more time to finish the indexing task. I know that
> SolrCloud can solve the search problem when the index data is big, but it's
> even slower at indexing than Solr.
>
> So I am writing to you for help. Is there any solution for Solr to handle
> this kind of problem? There are more than six hundred million records in the
> database right now, and data will be added to the database every day. Is it
> true that if we don't set the uniqueKey property in the config.xml file,
> then the problem will be avoided? If so, there's another problem: without
> the uniqueKey property the index data can only be added to, but can't be
> updated. Could you please give me some solutions for these problems?
>
> I sincerely look forward to your reply. Thank you very much for your time!
>
> Best regards,
> Li Wei