Hi Nagendra,

Thanks a lot. I will start working on NRT today. Meanwhile, the old settings (increased maxWarmingSearchers on the master) have not given me any trouble so far,
but NRT will be more suitable for us. I will work on it, analyze the performance, and share the results with you.

Thanks,
Naveen

2011/8/17 Nagendra Nagarajayya <nnagaraja...@transaxtions.com>

> Naveen:
>
> See below:
>
>> *NRT with Apache Solr 3.3 and RankingAlgorithm does need a commit for a document to become searchable*. Any document that you add through update becomes immediately searchable. So no need to commit from within your update client code. Since there is no commit, the cache does not have to be cleared or the old searchers closed or new searchers opened, and warmed (error that you are facing).
>>
>> The link you mentioned is clearly what we wanted. But the real thing is that you wrote "RA does need a commit for a document to become searchable" (please take a look at the bold sentence).
>
> Yes, as said earlier, you do not need a commit. A document becomes searchable as soon as you add it. Below is an example of adding a document with curl (this is from the wiki at http://solr-ra.tgels.com/wiki/en/Near_Real_Time_Search_ver_3.x):
>
> curl "http://localhost:8983/solr/update/csv?stream.file=/tmp/x1.csv&encapsulator=%1f"
>
> There is no commit included. The contents of the document become immediately searchable.
>
>> In future, for more load, can it cater to master/slave (replication) etc. to scale and perform better? If yes, we would like to go for NRT; the performance described in the article is acceptable. We were expecting the same real-time performance for a single user.
>
> There are no changes to the master/slave (replication) process. So whatever replication setup you have currently will work as before, and if you enable replication later, it should still work as it does without NRT.
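A minimal Python sketch of the request in Nagendra's curl example, assuming the default Jetty address `localhost:8983` used there. `csv_update_url` and `SOLR_BASE` are illustrative names, not part of Solr or RankingAlgorithm. What it shows is that the URL carries only `stream.file` and `encapsulator`, with no `commit` parameter. Per the thread, that is enough for the documents to become searchable with Solr 3.3 and RankingAlgorithm; on stock Solr 3.x the same request would still need a commit before the documents appear in search results. Note that `stream.file` is read from the Solr server's own filesystem and, as far as I know, requires remote streaming to be enabled in solrconfig.xml.

```python
from urllib.parse import urlencode

# Default Jetty address used in the curl example (an assumption for other setups).
SOLR_BASE = "http://localhost:8983/solr"


def csv_update_url(csv_path: str, base: str = SOLR_BASE) -> str:
    """Build the same CSV update request as the curl example.

    No commit parameter is added: per the thread, Solr 3.3 with
    RankingAlgorithm makes the added documents searchable without one.
    """
    params = {
        # stream.file names a file on the Solr server's filesystem, not the client's.
        "stream.file": csv_path,
        # U+001F (unit separator) as the field encapsulator, sent as %1F.
        "encapsulator": "\x1f",
    }
    # safe="/" keeps the path readable instead of encoding "/" as %2F.
    return f"{base}/update/csv?{urlencode(params, safe='/')}"


url = csv_update_url("/tmp/x1.csv")
print(url)  # http://localhost:8983/solr/update/csv?stream.file=/tmp/x1.csv&encapsulator=%1F
# Passing url to urllib.request.urlopen() (or to curl) would send the update.
```

Solr decodes `%1F` and `%1f` identically, so the upper-case escape produced by `urlencode` is equivalent to the one in the curl command.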
>
>> What about multiple users? Should we wait 1-2 seconds before calling the curl request to make Solr perform better, or will it internally handle multiple requests (multithreaded, etc.)?
>
> Again, for updating documents you do not have to change your current process or code. Everything remains the same, except that if you were including a commit, you no longer include it in your update statements. There is no change to the existing update process, so internally it will not queue or multi-thread updates. It works as in existing Solr functionality; there are no changes to the existing setup.
>
> Regarding performance, in the wiki paper every update through curl adds (streams) 500 documents, so you could take this approach. (This was a number I chose arbitrarily to test the performance, but it seems to work well.)
>
>> What doc size (10,000 docs) would allow the JVM to perform better? Have you done any benchmarking of NRT with multiple threads and multiple users, and any JVM tuning for Solr server performance? Any kind of performance analysis would help us decide quickly whether to switch over to NRT.
>
> The performance discussed in the wiki paper uses the MBArtists index, which is one of the example indexes in the book Solr 1.4 Enterprise Search Server. You can build this index if you have the book, or download the contents from musicbrainz.org. Each doc is maybe about 100 bytes and has about 7 fields. With Wikipedia's XML dump, commenting out the skipdoc field in dataconfig.xml (dataimport handler) so that redirects are included, the update performance is about 15,000 docs/sec (100 million docs); with skipdoc enabled (does not skip redirects), the performance is about 1,350 docs/sec (about 11 million docs), with the time spent mostly on converting/validating XML rather than on the actual update.
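As a rough sanity check, the throughput figures above translate into wall-clock time as follows, assuming a constant rate (the thread only gives averages). The helper names are illustrative. For contrast, the sketch also converts Naveen's own figure from further down the thread (almost 3 minutes for 15,000 docs per user with commitWithin of 10 ms) into docs/sec; the setups differ in document size, hardware, and concurrency, so this is only an order-of-magnitude comparison.

```python
def hours_to_index(num_docs: int, docs_per_sec: float) -> float:
    """Wall-clock hours to index num_docs at a steady docs_per_sec."""
    return num_docs / docs_per_sec / 3600


def docs_per_sec(num_docs: int, seconds: float) -> float:
    """Average throughput given a document count and elapsed seconds."""
    return num_docs / seconds


# Wikipedia figures quoted in the thread (dataimport handler, RankingAlgorithm NRT).
print(f"{hours_to_index(100_000_000, 15_000):.2f} h")  # 1.85 h, redirects included
print(f"{hours_to_index(11_000_000, 1_350):.2f} h")    # 2.26 h, skipdoc enabled
# Naveen's figure with commitWithin=10 ms: 15,000 docs in about 3 minutes.
print(f"{docs_per_sec(15_000, 3 * 60):.0f} docs/sec")  # 83 docs/sec
```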
>
> Documents in Wikipedia can be quite big, averaging at least about 2,500-5,000 bytes each.
>
> I would suggest that you download NRT with Apache Solr 3.3 and RankingAlgorithm, give it a try, and get a feel for it, as this is the best way to see how your config works with it.
>
>> Questions regarding switching over to NRT:
>>
>> 1. Should we upgrade to Solr 4.x?
>>
>> 2. Any benchmarking (10,000 docs/sec)? The question here is more specifically about the details of an individual doc (fields, number of fields, field size, parameters affecting performance with or without faceting).
>
> Please see the MBArtists index as discussed above.
>
>> 3. What about multiple users?
>>
>> A user in real time might have a large number of docs, around 0.1 million. How should we break this up and analyze which approach is better (though it is our task to do)? Still, any kind of breakdown will help us. Imagine a user's inbox.
>
> You may be able to stream the documents in sets, as in the example in the wiki, which streams 500 documents at a time. The wiki paper has an example of a document that was used; you could copy/paste that to try it out.
>
>> 4. JVM tuning and performance results in a multithreaded environment.
>>
>> 5. Machine details (RAM, CPU, and settings from the Solr perspective).
>
> Default Solr settings with the shipped Jetty container. The startup script used is available when you download Solr 3.3 with RankingAlgorithm. It sets -Xmx to 2 GB and uses the default collector with parallel collection enabled for the young generation. The system is x86_64 Linux (2.6 kernel), 2 cores (2.5 GHz), and uses internal disks for indexing.
>
> My suggestion would be to download a version of Solr 3.3 with RankingAlgorithm and give it a try to see if any changes are needed from your existing setup.
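The "stream 500 documents at a time" suggestion can be sketched on the client side as follows, assuming each user's documents are available as dicts of string fields. `batched`, `to_csv`, the `id`/`subject` fields, and the 1,200-message inbox are hypothetical; the CSV is quoted with U+001F so that it matches `encapsulator=%1f` in the curl example. Each payload could then be written to a file and referenced through `stream.file`, one update request per batch.

```python
import csv
import io
from itertools import islice
from typing import Dict, Iterable, Iterator, List

# Batch size used in the wiki test; the thread says it was picked arbitrarily.
BATCH_SIZE = 500
# Same encapsulator as encapsulator=%1f in the curl example (U+001F).
ENCAPSULATOR = "\x1f"


def batched(docs: Iterable[Dict[str, str]], size: int = BATCH_SIZE) -> Iterator[List[Dict[str, str]]]:
    """Yield consecutive lists of at most `size` documents."""
    it = iter(docs)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def to_csv(batch: List[Dict[str, str]], fields: List[str]) -> str:
    """Render one batch as CSV with a header row, quoting fields with ENCAPSULATOR."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fields, quotechar=ENCAPSULATOR, lineterminator="\n")
    writer.writeheader()
    writer.writerows(batch)
    return buf.getvalue()


# Hypothetical inbox of 1,200 messages for one user.
inbox = [{"id": str(i), "subject": f"message {i}"} for i in range(1200)]
payloads = [to_csv(b, ["id", "subject"]) for b in batched(inbox)]
print([p.count("\n") - 1 for p in payloads])  # rows per request: [500, 500, 200]
```

Fields only get wrapped in the encapsulator when they need it (for example, a subject containing a comma), which is the default minimal-quoting behavior of Python's `csv` module.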
>
> Regards,
>
> - Nagendra Nagarajayya
> http://solr-ra.tgels.org
> http://rankingalgorithm.tgels.org
>
>> Hoping that you are getting my point. We want to benchmark the performance. If you can involve me in your group, that would be great.
>>
>> Thanks,
>> Naveen
>>
>> 2011/8/15 Nagendra Nagarajayya <nnagaraja...@transaxtions.com>
>>
>>> Bill:
>>>
>>> I did look at Mark's performance tests. They look very interesting.
>>>
>>> Here is the Apache Solr 3.3 with RankingAlgorithm NRT performance:
>>> http://solr-ra.tgels.com/wiki/en/Near_Real_Time_Search_ver_3.x
>>>
>>> Regards,
>>>
>>> - Nagendra Nagarajayya
>>> http://solr-ra.tgels.org
>>> http://rankingalgorithm.tgels.org
>>>
>>> On 8/14/2011 7:47 PM, Bill Bell wrote:
>>>
>>>> I understand.
>>>>
>>>> Have you looked at Mark's patch? From his performance tests, it looks pretty good.
>>>>
>>>> When would RA work better?
>>>>
>>>> Bill
>>>>
>>>> On 8/14/11 8:40 PM, "Nagendra Nagarajayya" <nnagaraja...@transaxtions.com> wrote:
>>>>
>>>>> Bill:
>>>>>
>>>>> The technical details of the NRT implementation in Apache Solr with RankingAlgorithm (Solr-RA) are available here:
>>>>>
>>>>> http://solr-ra.tgels.com/papers/NRT_Solr_RankingAlgorithm.pdf
>>>>>
>>>>> (There are some changes for Solr 3.x, but for the most part it is as above.)
>>>>>
>>>>> Support for the 4.0 trunk should happen sometime soon.
>>>>>
>>>>> Regards,
>>>>>
>>>>> - Nagendra Nagarajayya
>>>>> http://solr-ra.tgels.org
>>>>> http://rankingalgorithm.tgels.org
>>>>>
>>>>> On 8/14/2011 7:11 PM, Bill Bell wrote:
>>>>>
>>>>>> OK,
>>>>>>
>>>>>> I'll ask about the elephant in the room...
>>>>>>
>>>>>> What is the difference between the new UpdateHandler from Mark and Solr-RA?
>>>>>>
>>>>>> The UpdateHandler works with 4.0; does Solr-RA work with the 4.0 trunk?
>>>>>>
>>>>>> Pros/cons?
>>>>>>
>>>>>> On 8/14/11 8:10 PM, "Nagendra Nagarajayya" <nnagaraja...@transaxtions.com> wrote:
>>>>>>
>>>>>>> Naveen:
>>>>>>>
>>>>>>> NRT with Apache Solr 3.3 and RankingAlgorithm does need a commit for a document to become searchable. Any document that you add through update becomes immediately searchable. So no need to commit from within your update client code.
>>>>>>> Since there is no commit, the cache does not have to be cleared, the old searchers closed, or new searchers opened and warmed (the error that you are facing).
>>>>>>>
>>>>>>> Regards,
>>>>>>>
>>>>>>> - Nagendra Nagarajayya
>>>>>>> http://solr-ra.tgels.org
>>>>>>> http://rankingalgorithm.tgels.org
>>>>>>>
>>>>>>> On 8/14/2011 10:37 AM, Naveen Gupta wrote:
>>>>>>>
>>>>>>>> Hi Mark/Erick/Nagendra,
>>>>>>>>
>>>>>>>> I was not very confident about NRT at the time we started the project, almost a year ago. I will definitely try NRT and see the performance.
>>>>>>>>
>>>>>>>> The current requirement was working fine while we were using commitWithin of 10 milliseconds in the XML document we were posting to Solr.
>>>>>>>>
>>>>>>>> But because of that, we were getting very poor performance (almost 3 minutes for 15,000 docs) per user. There are many parallel users committing to our Solr.
>>>>>>>>
>>>>>>>> So we removed commitWithin, and performance became much better.
>>>>>>>>
>>>>>>>> But now we are getting the maxWarmingSearchers error, because we commit separately with a curl request once the entire doc set has been submitted for indexing.
>>>>>>>>
>>>>>>>> The question here is: what is the difference between commitWithin and commit (apart from the fact that a commit takes memory, processing, and additional hardware)?
>>>>>>>>
>>>>>>>> We want the documents to be visible as soon as possible, since we apply many business rules and different filters on top of the results (older indexes as well as new ones).
>>>>>>>>
>>>>>>>> Up to 5 minutes is fine for us.
>>>>>>>> But beyond that, we need to think about other optimizations.
>>>>>>>>
>>>>>>>> We will definitely try NRT. But please tell me what other options we can apply in order to optimize.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Naveen
>>>>>>>>
>>>>>>>> On Sun, Aug 14, 2011 at 9:42 PM, Erick Erickson <erickerick...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Ah, thanks, Mark... I must have been looking at the wrong JIRAs.
>>>>>>>>>
>>>>>>>>> Erick
>>>>>>>>>
>>>>>>>>> On Sun, Aug 14, 2011 at 10:02 AM, Mark Miller <markrmil...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> On Aug 14, 2011, at 9:03 AM, Erick Erickson wrote:
>>>>>>>>>>
>>>>>>>>>>> You either have to go to near real time (NRT), which is under development, but not committed to trunk yet
>>>>>>>>>>
>>>>>>>>>> NRT support is committed to trunk.
>>>>>>>>>>
>>>>>>>>>> - Mark Miller
>>>>>>>>>> lucidimagination.com
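Naveen's question about commitWithin versus a separate commit is easier to see in the XML messages themselves. The sketch below builds Solr XML update messages; `add_message`, `COMMIT_MESSAGE`, and the `id` field are illustrative names. With `commitWithin` on `<add>`, Solr is asked to commit on its own within that many milliseconds; with a plain `<add>`, nothing becomes visible (on stock Solr) until a separate `<commit/>` is posted. Either way, each commit opens and warms a new searcher, so many users committing every few milliseconds can overlap and hit the maxWarmingSearchers limit. A larger window, such as the 5 minutes Naveen says is acceptable, is one commonly suggested way to reduce the number of commits; my understanding is that Solr 3.x lets adds arriving within the window share one scheduled commit, but that is an assumption worth checking against your version's update handler.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional

# Separate commit message, like the follow-up curl request described in the thread.
COMMIT_MESSAGE = "<commit/>"


def add_message(docs: List[Dict[str, str]], commit_within_ms: Optional[int] = None) -> str:
    """Build a Solr XML <add> message, optionally with a commitWithin deadline in ms."""
    add = ET.Element("add")
    if commit_within_ms is not None:
        # Ask Solr to commit by itself within this many milliseconds.
        add.set("commitWithin", str(commit_within_ms))
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            ET.SubElement(doc_el, "field", name=name).text = value
    return ET.tostring(add, encoding="unicode")


docs = [{"id": "msg-1"}]
# What the thread describes: a near-immediate deadline on every request.
print(add_message(docs, commit_within_ms=10))
# A deadline matching the "up to 5 minutes is fine" tolerance.
print(add_message(docs, commit_within_ms=5 * 60 * 1000))
# A plain add, to be followed later by a separate explicit commit.
print(add_message(docs), COMMIT_MESSAGE)
```

Both kinds of message would be posted to the `/update` handler as XML; the only difference between the first two is when Solr decides to commit.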