Re: OOM at MultiSegmentReader.norms

2009-03-28 Thread Michael McCandless
Still, 1024M ought to be enough to load one field's norms (how many
fields have norms?).  If you do things requiring FieldCache that'll
also consume RAM.
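For a rough sense of scale (assuming Lucene's usual layout of one byte of
norm data per document, per field with norms): 100 million documents x 1
byte is roughly 100 MB of heap for each normed field, loaded the first time
that field is used for scoring.  Three or four such fields would already
claim about a third of a 1024M heap before the FieldCache even enters the
picture.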

It's also possible you're hitting this bug (false OOME) in Sun's JRE:

  http://issues.apache.org/jira/browse/LUCENE-1566

Feel free to go vote for it!

Mike

On Fri, Mar 27, 2009 at 10:11 PM, Otis Gospodnetic
 wrote:
>
> That's a tiny heap.  Part of it is used for indexing, too.  And the fact that 
> your heap is so small shows you are not really making use of that nice 
> ramBufferSizeMB setting. :)
>
> Also, use omitNorms="true" for fields that don't need norms (if their types 
> don't already do that).
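For illustration, a schema.xml fragment along those lines (the field name and
type are invented, not taken from the original poster's schema):

  <!-- searched but never relevance-ranked, so norms can be dropped -->
  <field name="description_t" type="text" indexed="true" stored="true"
         omitNorms="true"/>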
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
>
>
> - Original Message 
>> From: vivek sar 
>> To: solr-user@lucene.apache.org
>> Sent: Friday, March 27, 2009 6:15:59 PM
>> Subject: OOM at MultiSegmentReader.norms
>>
>> Hi,
>>
>>    I have an index of size 50G (around 100 million documents) and growing -
>> around 2000 records (1 rec = 500 bytes) are being written every second
>> continuously. If I make any search on this index I get OOM. I'm using
>> default cache settings (512,512,256) in the solrconfig.xml. The search
>> was made from the admin interface (returning 10 rows) with no sorting,
>> faceting or highlighting. Max heap size is 1024m.
>>
>> Mar 27, 2009 9:13:41 PM org.apache.solr.common.SolrException log
>> SEVERE: java.lang.OutOfMemoryError: Java heap space
>>         at org.apache.lucene.index.MultiSegmentReader.norms(MultiSegmentReader.java:335)
>>         at org.apache.lucene.search.TermQuery$TermWeight.scorer(TermQuery.java:69)
>>         at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:132)
>>         at org.apache.lucene.search.Searcher.search(Searcher.java:126)
>>         at org.apache.lucene.search.Searcher.search(Searcher.java:105)
>>         at org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:966)
>>         at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:838)
>>         at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:269)
>>         at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:160)
>>         at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:169)
>>         at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
>>         at org.apache.solr.core.SolrCore.execute(SolrCore.java:1204)
>>         at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)
>>         at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232)
>>         at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
>>         at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
>>         at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
>>         at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
>>         at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
>>         at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
>>         at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
>>
>> What could be the problem?
>>
>> Thanks,
>> -vivek
>
>


RE: [solr-user] Upgrade from 1.2 to 1.3 gives 3x slowdown

2009-03-28 Thread Grant Ingersoll

Hey Fergus,

Finally got a chance to run your scripts, etc. per the thread:
http://www.lucidimagination.com/search/document/5c3de15a4e61095c/upgrade_from_1_2_to_1_3_gives_3x_slowdown_script#8324a98d8840c623

I can reproduce your slowdown.

One oddity with rev 643465 is:

On the old version, there is an exception during startup:
Mar 28, 2009 10:44:31 AM org.apache.solr.common.SolrException log
SEVERE: java.lang.NullPointerException
        at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:129)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:125)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:953)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:968)
        at org.apache.solr.core.QuerySenderListener.newSearcher(QuerySenderListener.java:50)
        at org.apache.solr.core.SolrCore$3.call(SolrCore.java:797)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:885)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
        at java.lang.Thread.run(Thread.java:637)

I see two things in CHANGES.txt that might apply, but I'm not sure:
1. I think commons-csv was upgraded
2. The CSV loader stuff was refactored to share common code

I'm still investigating.

-Grant


Issue with Unique Key

2009-03-28 Thread prerna07


Issue: Unique Key not working.

We defined the unique key in schema.xml:

<uniqueKey>IndexId_s</uniqueKey>

Still, multiple documents are getting created with the same IndexId_s.

Please suggest if we need to add/change something in schema.xml.

Thanks,
prerna
-- 
View this message in context: 
http://www.nabble.com/Issue-with-Unique-Key-tp22759124p22759124.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Issue with Unique Key

2009-03-28 Thread Shalin Shekhar Mangar
As a best practice, use a unique key field which is not tokenized, e.g. a
string type.
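For illustration, a schema.xml sketch of that advice, reusing the field name
from the original post (everything else is assumed):

  <field name="IndexId_s" type="string" indexed="true" stored="true"
         required="true"/>

  <uniqueKey>IndexId_s</uniqueKey>

With a string type the whole value is indexed as a single term, so Solr's
overwrite-by-unique-key logic matches on the exact id rather than on
individual tokens produced by an analyzer.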

On Sat, Mar 28, 2009 at 10:42 PM, prerna07  wrote:

>
>
> Issue: Unique Key not working.
>
> We defined the unique key in schema.xml:
>
> <uniqueKey>IndexId_s</uniqueKey>
>
> Still, multiple documents are getting created with the same IndexId_s.
>
> Please suggest if we need to add/change something in schema.xml.
>
> Thanks,
> prerna
> --
> View this message in context:
> http://www.nabble.com/Issue-with-Unique-Key-tp22759124p22759124.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>


-- 
Regards,
Shalin Shekhar Mangar.


Re: How to optimize Index Process?

2009-03-28 Thread vivek sar
Thanks Otis. This is very useful. I'll try all your suggestions and
post my findings (and improvements).

Thanks,
-vivek

On Fri, Mar 27, 2009 at 7:08 PM, Otis Gospodnetic
 wrote:
>
> Hi,
>
> Answers inlined.
>
>
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
> - Original Message 
>>   We have a distributed Solr system (2-3 boxes with each running 2
>> instances of Solr and each Solr instance can write to multiple cores).
>
> Is this really optimal?  How many CPU cores do your boxes have vs. the number 
> of Solr cores?
>
>> Our use case is high index volume - we can get up to 100 million
>> records (1 record = 500 bytes) per day, but very low query traffic
>> (only administrators may need to search for data - once an hour or
>> so). So, we need very fast index time. Here are the things I'm trying
>> to find out in order to optimize our index process,
>
> It's starting to sound like you might be able to batch your data and use 
> http://wiki.apache.org/solr/UpdateCSV -- it's the fastest indexing method, I 
> believe.
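As a sketch (URL, file name and commit policy are assumptions; the wiki page
above documents the real parameters), a CSV batch could be posted with:

  curl 'http://localhost:8983/solr/update/csv?commit=true' \
       --data-binary @batch.csv \
       -H 'Content-type: text/plain; charset=utf-8'

where the first line of batch.csv lists the field names.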
>
>> 1) What's the optimum index size? I've noticed as the index size grows
>> the indexing time starts increasing. In our test less than 10G index
>> size we could index over 2K/sec, but as it grows over 20G the index
>> rate drops to 1400/sec and keeps dropping as index size grows. I'm
>> trying to see whether we can partition (create new SolrCore) after
>> 10G.
>
> That's likely due to Lucene's segment merging. You can make mergeFactor 
> bigger to make segment merging less frequent, but don't make it too high or 
> you'll run into open file descriptor limits (which you could raise, of 
> course).
>
>>      - related question, is there a way to find the SolrCore size (any
>> web service for that?) - based on that information I can create a new
>> core and freeze the one which has reached 10G.
>
> You can see the number of docs in an index via Admin Statistics page (the 
> response is actually XML, look at the source)
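On a default install that page is at http://localhost:8983/solr/admin/stats.jsp
(numDocs and maxDoc show up under the searcher entry).  Index size on disk is
not part of those statistics as far as I can tell, so a "freeze the core at
10G" policy would still need a check of the data directory itself.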
>
>> 2) In our test, we noticed that after a few hours (after 8 hours of
>> indexing) there is a period (3-4 hours period) where the indexing is
>> very-very slow (like 500 records/sec) and after that period indexing
>> returns back to normal rate (1500/sec). Does Solr run any optimize
>> command on its own? How can we find that out?  I'm not issuing any
>> optimize command - should I be doing that after certain time?
>
> No, it doesn't run optimize on its own.  It could be running auto-commit, but 
> you should comment that out anyway.  Try doing a thread dump to see what's 
> going on, and watch the system with top and vmstat.
> No, you shouldn't optimize until you are completely done.
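Two common ways to get that thread dump on a Sun JVM (the process id is
whatever ps reports for the Tomcat/Solr JVM):

  jstack <pid>     # prints the dump to stdout
  kill -3 <pid>    # sends SIGQUIT; with Tomcat the dump lands in catalina.out

A few dumps taken some seconds apart during the slow window usually make it
obvious whether the indexing threads are waiting on a segment merge.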
>
>> 3) Every time I add new documents (10K at once) to the index I see
>> searcher closing and then re-opening/re-warming (in Catalina.out)
>> after commit is done. I'm not sure if this is an expensive operation.
>> Since, our search volume is very low can I configure Solr to not do
>> this? Would it make indexing any faster?
>
> Are you running the commit command after every 10K docs?  No need to do that 
> if you don't need your searcher to see the changes immediately.
>
>> Mar 26, 2009 11:59:45 PM org.apache.solr.search.SolrIndexSearcher close
>> INFO: Closing searc...@33d9337c main
>> Mar 26, 2009 11:59:52 PM org.apache.solr.update.DirectUpdateHandler2 commit
>> INFO: start commit(optimize=false,waitFlush=false,waitSearcher=true)
>> Mar 26, 2009 11:59:52 PM org.apache.solr.search.SolrIndexSearcher
>> INFO: Opening searc...@46ba6905 main
>> Mar 26, 2009 11:59:52 PM org.apache.solr.search.SolrIndexSearcher warm
>> INFO: autowarming searc...@46ba6905 main from searc...@5c5ffecd main
>>
>> 4) Anything else (any other configuration in Solr - I'm currently
>> using all default settings in the solrconfig.xml and default handlers)
>> that could help optimize my indexing process?
>
> Increase ramBufferSizeMB as much as you can afford.
> Comment out maxBufferedDocs, it's deprecated.
> Increase mergeFactor slightly.
> Consider the CSV approach.
> Index with multiple threads (match the number of CPU cores).
> If you are using Solrj, use the Streaming version of SolrServer.
> Give the JVM more memory (you'll need it if you increase ramBufferSizeMB)
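A sketch of where those indexing knobs live in solrconfig.xml (the values are
purely illustrative, not recommendations):

  <indexDefaults>
    <ramBufferSizeMB>256</ramBufferSizeMB>  <!-- flush by RAM used, not doc count -->
    <mergeFactor>20</mergeFactor>           <!-- merge less often; watch open files -->
    <!-- <maxBufferedDocs>1000</maxBufferedDocs>  deprecated; leave commented out -->
  </indexDefaults>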
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
>


Re: spellcheck.onlyMorePopular

2009-03-28 Thread David Smiley @MITRE.org

I know your issue has already been addressed, but you may want to consider
making "gran" a synonym for "grand" and then analyzing it as such.
~ David Smiley
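If you went the synonym route, a sketch of the pieces involved (file name and
placement are assumptions; the filter goes in the analyzer of whichever
fieldType backs the searched field):

  # synonyms.txt
  gran => grand

  <!-- schema.xml, inside the fieldType's analyzer -->
  <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
          ignoreCase="true"/>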


Marcus Stratmann wrote:
> 
> Hello,
> 
> I have another question concerning the spell checking mechanism.
> Setting onlyMorePopular=true and using the parameters
> 
> spellcheck=true&spellcheck.q=gran&q=gran&spellcheck.onlyMorePopular=true
> 
> I get the result
> 
> <lst name="spellcheck">
>   <lst name="suggestions">
>     <lst name="gran">
>       <int name="numFound">1</int>
>       <int name="startOffset">0</int>
>       <int name="endOffset">4</int>
>       <int name="origFreq">13</int>
>       <arr name="suggestion">
>         <lst>
>           <int name="freq">32</int>
>           <str name="word">grand</str>
>         </lst>
>       </arr>
>     </lst>
>     <bool name="correctlySpelled">true</bool>
>   </lst>
> </lst>
> 
> which is okay.
> But when I turn off onlyMorePopular
> 
> spellcheck=true&spellcheck.q=gran&q=gran&spellcheck.onlyMorePopular=false
> 
> the output is
> 
> <lst name="spellcheck">
>   <lst name="suggestions"/>
> </lst>
> 
> I was expecting to get *more* results when I turn off onlyMorePopular:
> all of the suggestions returned with onlyMorePopular ("grand") plus some
> more. Instead I get no spell check results at all. Why is that?
> 
> Thanks,
> Marcus
> 
> 

-- 
View this message in context: 
http://www.nabble.com/spellcheck.onlyMorePopular-tp21975735p22761717.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: optimization advice?

2009-03-28 Thread Otis Gospodnetic

OK, how about this trick then.  Do you really need the full string for sorting?
Could you get by (cheat) with sorting only on the first N characters?  If so, you
could create a separate field for that (copyField will come in handy) and that
should consume a little less memory.
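One way to build such a field against a 1.3-style schema (type and field names
are invented; the pattern keeps only the first 10 characters):

  <fieldType name="string_prefix" class="solr.TextField" omitNorms="true"
             sortMissingLast="true">
    <analyzer>
      <tokenizer class="solr.KeywordTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.PatternReplaceFilterFactory"
              pattern="^(.{0,10}).*$" replacement="$1" replace="first"/>
    </analyzer>
  </fieldType>

  <field name="name_sort" type="string_prefix" indexed="true" stored="false"/>
  <copyField source="name" dest="name_sort"/>

Sorting on name_sort still yields one term per document, just a shorter one,
so the FieldCache entry Lucene builds for sorting shrinks accordingly.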


Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
> From: Steve Conover 
> To: solr-user@lucene.apache.org
> Sent: Saturday, March 28, 2009 1:13:04 AM
> Subject: Re: optimization advice?
> 
> String ;-) - we only allow sorting on string fields.
> 
> On Fri, Mar 27, 2009 at 9:21 PM, Otis Gospodnetic
> wrote:
> >
> > Steve,
> >
> > A field named "name" sounds like a free text field.  What is its type,
> > string or text?  Fields you sort by should not be tokenized and should be
> > indexed.  I have a hunch your name field is tokenized.
> >
> >
> > Otis
> > --
> > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> >
> >
> >
> > - Original Message 
> >> From: Steve Conover 
> >> To: solr-user@lucene.apache.org
> >> Sent: Friday, March 27, 2009 11:59:52 PM
> >> Subject: Re: optimization advice?
> >>
> >> We sort by default on "name", which varies quite a bit (we're never
> >> going to make sorting by field go away).
> >>
> >> The thing is solr has been pretty amazing across 1 million records.
> >> Now that we've doubled the size of the dataset things are definitely
> >> slower in a nonlinear way...I'm wondering what factors are involved
> >> here.
> >>
> >> -Steve
> >>
> >> On Fri, Mar 27, 2009 at 6:58 PM, Otis Gospodnetic
> >> wrote:
> >> >
> >> > OK, we are a step closer.  Sorting makes things slower.  What field(s) do you
> >> > sort on, what are their types, and if there is a date in there, are the dates
> >> > very granular, and if they are, do you really need them to be that precise?
> >> >
> >> >
> >> > Otis
> >> > --
> >> > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> >> >
> >> >
> >> >
> >> > - Original Message 
> >> >> From: Steve Conover
> >> >> To: solr-user@lucene.apache.org
> >> >> Sent: Friday, March 27, 2009 1:51:14 PM
> >> >> Subject: Re: optimization advice?
> >> >>
> >> >> > Steve,
> >> >> >
> >> >> > Maybe you can tell us about:
> >> >>
> >> >> sure
> >> >>
> >> >> > - your hardware
> >> >>
> >> >> 2.5GB RAM, pretty modern virtual servers
> >> >>
> >> >> > - query rate
> >> >>
> >> >> Let's say a few queries per second max... < 4
> >> >>
> >> >> And in general the challenge is to get latency on any given query down
> >> >> to something very low - we don't have to worry about a huge amount of
> >> >> load at the moment.
> >> >>
> >> >> > - document cache and query cache settings
> >> >>
> >> >>
> >> >> class="solr.LRUCache"
> >> >> size="512"
> >> >> initialSize="512"
> >> >> autowarmCount="256"/>
> >> >>
> >> >>
> >> >> class="solr.LRUCache"
> >> >> size="512"
> >> >> initialSize="512"
> >> >> autowarmCount="0"/>
> >> >>
> >> >> > - your current response times
> >> >>
> >> >> This depends on the query.  For queries that involve a total record
> >> >> count of < 1 million, we often see < 10ms response times, up to
> >> 400-500ms in the worst case.  When we do a page one, sorted query on our
> >> >> full record set of 2 million+ records, response times can get up into
> >> >> 2+ seconds.
> >> >>
> >> >> > - any pain points, any slow query patterns
> >> >>
> >> >> Something that can't be emphasized enough is that we can't predict
> >> >> what records people will want.  Almost every query is aimed at a
> >> >> different set of records.
> >> >>
> >> >> -Steve
> >> >
> >> >
> >
> >



Re: optimization advice?

2009-03-28 Thread Steve Conover
Otis,

That's an interesting suggestion.  I'm curious about the thought
process behind it though - we currently don't have memory problems,
and in fact our max memory setting is below where it could be.

Does your suggestion imply that something could be gained by throwing
more memory at the problem?  If so, could you explain a little bit
about why?

Regards,
Steve

On Sat, Mar 28, 2009 at 6:31 PM, Otis Gospodnetic
 wrote:
>
> OK, how about this trick then.  Do you really need the full string for
> sorting?  Could you get by (cheat) with sorting only on the first N characters?
> If so, you could create a separate field for that (copyField will come in handy)
> and that should consume a little less memory.
>
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
>
>
> - Original Message 
>> From: Steve Conover 
>> To: solr-user@lucene.apache.org
>> Sent: Saturday, March 28, 2009 1:13:04 AM
>> Subject: Re: optimization advice?
>>
>> String ;-) - we only allow sorting on string fields.
>>
>> On Fri, Mar 27, 2009 at 9:21 PM, Otis Gospodnetic
>> wrote:
>> >
>> > Steve,
>> >
>> > A field named "name" sounds like a free text field.  What is its type,
>> > string or text?  Fields you sort by should not be tokenized and should be
>> > indexed.  I have a hunch your name field is tokenized.
>> >
>> >
>> > Otis
>> > --
>> > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>> >
>> >
>> >
>> > - Original Message 
>> >> From: Steve Conover
>> >> To: solr-user@lucene.apache.org
>> >> Sent: Friday, March 27, 2009 11:59:52 PM
>> >> Subject: Re: optimization advice?
>> >>
>> >> We sort by default on "name", which varies quite a bit (we're never
>> >> going to make sorting by field go away).
>> >>
>> >> The thing is solr has been pretty amazing across 1 million records.
>> >> Now that we've doubled the size of the dataset things are definitely
>> >> slower in a nonlinear way...I'm wondering what factors are involved
>> >> here.
>> >>
>> >> -Steve
>> >>
>> >> On Fri, Mar 27, 2009 at 6:58 PM, Otis Gospodnetic
>> >> wrote:
>> >> >
>> >> > OK, we are a step closer.  Sorting makes things slower.  What field(s) do you
>> >> > sort on, what are their types, and if there is a date in there, are the dates
>> >> > very granular, and if they are, do you really need them to be that precise?
>> >> >
>> >> >
>> >> > Otis
>> >> > --
>> >> > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>> >> >
>> >> >
>> >> >
>> >> > - Original Message 
>> >> >> From: Steve Conover
>> >> >> To: solr-user@lucene.apache.org
>> >> >> Sent: Friday, March 27, 2009 1:51:14 PM
>> >> >> Subject: Re: optimization advice?
>> >> >>
>> >> >> > Steve,
>> >> >> >
>> >> >> > Maybe you can tell us about:
>> >> >>
>> >> >> sure
>> >> >>
>> >> >> > - your hardware
>> >> >>
>> >> >> 2.5GB RAM, pretty modern virtual servers
>> >> >>
>> >> >> > - query rate
>> >> >>
>> >> >> Let's say a few queries per second max... < 4
>> >> >>
>> >> >> And in general the challenge is to get latency on any given query down
>> >> >> to something very low - we don't have to worry about a huge amount of
>> >> >> load at the moment.
>> >> >>
>> >> >> > - document cache and query cache settings
>> >> >>
>> >> >>
>> >> >>         class="solr.LRUCache"
>> >> >>         size="512"
>> >> >>         initialSize="512"
>> >> >>         autowarmCount="256"/>
>> >> >>
>> >> >>
>> >> >>         class="solr.LRUCache"
>> >> >>         size="512"
>> >> >>         initialSize="512"
>> >> >>         autowarmCount="0"/>
>> >> >>
>> >> >> > - your current response times
>> >> >>
>> >> >> This depends on the query.  For queries that involve a total record
>> >> >> count of < 1 million, we often see < 10ms response times, up to
>> >> >> 400-500ms in the worst case.  When we do a page one, sorted query on our
>> >> >> full record set of 2 million+ records, response times can get up into
>> >> >> 2+ seconds.
>> >> >>
>> >> >> > - any pain points, any slow query patterns
>> >> >>
>> >> >> Something that can't be emphasized enough is that we can't predict
>> >> >> what records people will want.  Almost every query is aimed at a
>> >> >> different set of records.
>> >> >>
>> >> >> -Steve
>> >> >
>> >> >
>> >
>> >
>
>


Re: How to optimize Index Process?

2009-03-28 Thread Noble Paul നോബിള്‍ नोब्ळ्
On Sat, Mar 28, 2009 at 7:38 AM, Otis Gospodnetic
 wrote:
>
> Hi,
>
> Answers inlined.
>
>
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
> - Original Message 
>>   We have a distributed Solr system (2-3 boxes with each running 2
>> instances of Solr and each Solr instance can write to multiple cores).
>
> Is this really optimal?  How many CPU cores do your boxes have vs. the number 
> of Solr cores?
>
>> Our use case is high index volume - we can get up to 100 million
>> records (1 record = 500 bytes) per day, but very low query traffic
>> (only administrators may need to search for data - once an hour or
>> so). So, we need very fast index time. Here are the things I'm trying
>> to find out in order to optimize our index process,
>
> It's starting to sound like you might be able to batch your data and use 
> http://wiki.apache.org/solr/UpdateCSV -- it's the fastest indexing method, I 
> believe.

Does CSV work with multivalued fields?
If not, using SolrJ with the BinaryRequestWriter is quite fast.
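A minimal SolrJ sketch of that setup (assumes a 1.4-era SolrJ where
BinaryRequestWriter is available; URL and field names are placeholders):

  import org.apache.solr.client.solrj.impl.BinaryRequestWriter;
  import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
  import org.apache.solr.common.SolrInputDocument;

  public class BinaryIndexSketch {
    public static void main(String[] args) throws Exception {
      CommonsHttpSolrServer server =
          new CommonsHttpSolrServer("http://localhost:8983/solr");
      server.setRequestWriter(new BinaryRequestWriter()); // javabin on the wire instead of XML

      SolrInputDocument doc = new SolrInputDocument();
      doc.addField("id", "doc-1");
      doc.addField("tags", "a");  // assumes "tags" is declared multiValued in schema.xml
      doc.addField("tags", "b");
      server.add(doc);
      server.commit();
    }
  }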
>
>> 1) What's the optimum index size? I've noticed as the index size grows
>> the indexing time starts increasing. In our test less than 10G index
>> size we could index over 2K/sec, but as it grows over 20G the index
>> rate drops to 1400/sec and keeps dropping as index size grows. I'm
>> trying to see whether we can partition (create new SolrCore) after
>> 10G.
>
> That's likely due to Lucene's segment merging. You can make mergeFactor 
> bigger to make segment merging less frequent, but don't make it too high or 
> you'll run into open file descriptor limits (which you could raise, of 
> course).
>
>>      - related question, is there a way to find the SolrCore size (any
>> web service for that?) - based on that information I can create a new
>> core and freeze the one which has reached 10G.
>
> You can see the number of docs in an index via Admin Statistics page (the 
> response is actually XML, look at the source)
>
>> 2) In our test, we noticed that after a few hours (after 8 hours of
>> indexing) there is a period (3-4 hours period) where the indexing is
>> very-very slow (like 500 records/sec) and after that period indexing
>> returns back to normal rate (1500/sec). Does Solr run any optimize
>> command on its own? How can we find that out?  I'm not issuing any
>> optimize command - should I be doing that after certain time?
>
> No, it doesn't run optimize on its own.  It could be running auto-commit, but 
> you should comment that out anyway.  Try doing a thread dump to see what's 
> going on, and watch the system with top and vmstat.
> No, you shouldn't optimize until you are completely done.
>
>> 3) Every time I add new documents (10K at once) to the index I see
>> searcher closing and then re-opening/re-warming (in Catalina.out)
>> after commit is done. I'm not sure if this is an expensive operation.
>> Since, our search volume is very low can I configure Solr to not do
>> this? Would it make indexing any faster?
>
> Are you running the commit command after every 10K docs?  No need to do that 
> if you don't need your searcher to see the changes immediately.
>
>> Mar 26, 2009 11:59:45 PM org.apache.solr.search.SolrIndexSearcher close
>> INFO: Closing searc...@33d9337c main
>> Mar 26, 2009 11:59:52 PM org.apache.solr.update.DirectUpdateHandler2 commit
>> INFO: start commit(optimize=false,waitFlush=false,waitSearcher=true)
>> Mar 26, 2009 11:59:52 PM org.apache.solr.search.SolrIndexSearcher
>> INFO: Opening searc...@46ba6905 main
>> Mar 26, 2009 11:59:52 PM org.apache.solr.search.SolrIndexSearcher warm
>> INFO: autowarming searc...@46ba6905 main from searc...@5c5ffecd main
>>
>> 4) Anything else (any other configuration in Solr - I'm currently
>> using all default settings in the solrconfig.xml and default handlers)
>> that could help optimize my indexing process?
>
> Increase ramBufferSizeMB as much as you can afford.
> Comment out maxBufferedDocs, it's deprecated.
> Increase mergeFactor slightly.
> Consider the CSV approach.
> Index with multiple threads (match the number of CPU cores).
> If you are using Solrj, use the Streaming version of SolrServer.
> Give the JVM more memory (you'll need it if you increase ramBufferSizeMB)
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
>



-- 
--Noble Paul