Re: Migration from Solr 1.2 to Solr 1.4
Hi,

I recently ran across the same issues; I'm updating my Solr 1.4 to the latest nightly build (to get the ZooKeeper functionality). I've copied the solr_home dir - but with no success. (The config files were not accepted on the new build, due to a version mismatch.) Then I copied only the index data & used a fresh copy of conf/solrconfig.xml (which I adapted to reflect my old solrconfig.xml settings) - I could still use the old schema. That way I ended up with a working copy of the new Solr version. But the copied index gave some trouble: 'Format version is not supported'. I guess I have to rebuild the index.

For your case: maybe you could replicate the index by using the replication handler built into solrconfig.xml. You could set your old system up as master and your new one as slave & hope the slave gets updated. (Note: I guess this will not work - the replication handler in solrconfig.xml was only introduced in Solr 1.4 and is not present in Solr 1.3.)

2011/2/17 Chris Hostetter

> : > if you don't have any custom components, you can probably just use
> : > your entire solr home dir as is -- just change the solr.war. (you can't
> : > just copy the data dir though, you need to use the same configs)
> : >
> : > test it out, and note the "Upgrading" notes in the CHANGES.txt for the
> : > 1.3, 1.4, and 1.4.1 releases for "gotchas" that you might want to watch
> : > out for.
>
> : Thank you for your reply, I've tried to copy the data and configuration
> : directory without success :
> : SEVERE: Could not start SOLR. Check solr/home property
> : java.lang.RuntimeException: org.apache.lucene.index.CorruptIndexException:
> : Unknown format version: -10
>
> Hmmm... ok, i'm not sure why that would happen. According to the
> CHANGES.txt, Solr 1.2 used Lucene 2.1 and Solr 1.4.1 used 2.9.3 -- so
> Solr 1.4 should have been able to read an index created by Solr 1.2.
>
> You *could* try upgrading first from 1.2 to 1.3, run an optimize command,
> and then try upgrading from 1.3 to 1.4 -- but i can't make any assertions
> that that will work better, since going straight from 1.2 to 1.4 should
> have worked the same way.
>
> When in doubt: reindex.
>
> -Hoss
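For reference, a minimal sketch of the master/slave setup described above (Solr 1.4 ReplicationHandler in solrconfig.xml; the master URL and poll interval here are placeholders):

    <!-- on the master -->
    <requestHandler name="/replication" class="solr.ReplicationHandler">
      <lst name="master">
        <str name="replicateAfter">commit</str>
        <str name="confFiles">schema.xml,stopwords.txt</str>
      </lst>
    </requestHandler>

    <!-- on the slave -->
    <requestHandler name="/replication" class="solr.ReplicationHandler">
      <lst name="slave">
        <str name="masterUrl">http://old-master:8983/solr/replication</str>
        <str name="pollInterval">00:00:60</str>
      </lst>
    </requestHandler>

As noted in the message, this only helps if both ends run Solr 1.4 or later, since the Java-based replication handler does not exist in 1.2/1.3.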
Re: Separating Index Reader and Writer
Push again.

Regards

Em wrote:
>
> Just wanted to push that topic.
>
> Regards
>
> Em wrote:
>>
>> Hi Peter,
>>
>> I must jump into this discussion: From a logical point of view, what you are
>> saying only makes sense if both instances do not run on the same machine,
>> or at least not on the same drive.
>>
>> When both run on the same machine and the same drive, the overall memory
>> used should be equal, plus I do not understand why this setup should
>> affect cache warming etc., since the process of rewarming should be the
>> same.
>>
>> Well, my knowledge about the internals is not very deep. But from a purely
>> logical point of view - to me - the same thing is happening as if I did it
>> in a single Solr instance. So what is the difference, what am I overlooking?
>>
>> Another thing: While W is committing and writing to the index, is there
>> any inconsistency in R, or isn't there any, because W is writing a new
>> segment and so for R nothing is different until the commit finishes?
>> Are there problems during optimizing an index?
>>
>> How do you inform R about the finished commit?
>>
>> Thank you for your explanation, it's a really interesting topic!
>>
>> Regards,
>> Em
>>
>> Peter Sturge-2 wrote:
>>>
>>> Hi,
>>>
>>> We use this scenario in production where we have one write-only Solr
>>> instance and one read-only, pointing to the same data.
>>> We do this so we can optimize caching/etc. for each instance for
>>> write/read. The main performance gain is in cache warming and
>>> associated parameters.
>>> For your index W, it's worth turning off cache warming altogether, so
>>> commits aren't slowed down by warming.
>>>
>>> Peter
>>>
>>> On Sun, Feb 6, 2011 at 3:25 PM, Isan Fulia wrote:
>>>> Hi all,
>>>> I have set up two indexes, one for reading (R) and the other for
>>>> writing (W). Index R refers to the same data dir as W (defined in
>>>> solrconfig via <dataDir>). To make sure the R index sees the indexed
>>>> documents of W, I am firing an empty commit on R.
>>>> With this, I am getting a performance improvement compared to using
>>>> the same index for reading and writing. Can anyone help me understand
>>>> why this performance improvement is taking place even though both
>>>> indexes are pointing to the same data directory?
>>>> --
>>>> Thanks & Regards,
>>>> Isan Fulia.
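On the last question in this thread: the empty commit Isan mentions is one way to make R reopen its searcher after W has committed - a sketch, assuming the read-only instance listens on port 8984:

    curl 'http://localhost:8984/solr/update' -H 'Content-Type: text/xml' \
         --data-binary '<commit/>'

The write-side commit makes the new segments durable; the empty commit on R just forces it to open a new searcher against the shared data dir.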
My Plan to Scale Solr
Dear all,

I started to learn how to use Solr three months ago. My experience is still limited.

Now I crawl Web pages with my crawler and send the data to a single Solr server. It runs fine.

Since the number of potential users is large, I have decided to scale Solr. After configuring replication, a single index can be replicated to multiple servers.

I think shards are also required. I intend to split the index according to the data categories and priorities. After that, I will use the above replication techniques and get high performance. The remaining work is not so difficult.

I noticed some new terms, such as SolrCloud, Katta and ZooKeeper. According to my current understanding, it seems that I can ignore them. Am I right? What benefits would I get from using them?

Thanks so much!
LB
Re: Replication and newSearcher registerd > poll interval
Hi,

Keeping the thread alive - any thoughts on only doing replication if there is no warming currently going on?

Cheers,
Dan

On Thu, Feb 10, 2011 at 11:09 AM, dan sutton wrote:
> Hi,
>
> If the replication window is too small to allow a new searcher to warm
> and close the current searcher before the new one needs to be in
> place, then the slaves continuously have a high load, and potentially
> an OOM error. We've noticed this in our environment, where we have
> several facets on large multivalued fields.
>
> I was wondering what the list thought about modifying the replication
> process to skip polls (though warning in the logs) when there is a
> searcher in the process of warming? Else, as in our case, it brings the
> slave to its knees; the workaround was to extend the poll interval,
> though that's not ideal.
>
> Cheers,
> Dan
fine tuning the solr search
Hi,
I would love to know how to do this with Solr.
Say a user inputs "Account manager files". I wish that Solr would prioritize the documents it finds as follows:
1) documents containing "account manager files" get a greater score
2) then documents with "account manager" come next
3) then documents with "account"
before the other words are used to match documents in the search.

Right now I think it works differently, as it finds documents with "account" and puts them in first position, or documents with the word "manager" in second position, or so.

thanks
--
Mambe Churchill Nanje
237 33011349,
AfroVisioN Founder, President, CEO
http://www.afrovisiongroup.com | http://mambenanje.blogspot.com
skypeID: mambenanje
www.twitter.com/mambenanje
Re: Replication and newSearcher registerd > poll interval
If you set maxWarmingSearchers to 1, then you cannot issue an overlapping commit. Slaves won't poll for a new index version while replication is in progress.

It works well in my environment, where there is a high update/commit frequency - about a thousand documents per minute. The system even behaves well with a thousand updates per second and a commit per minute, with a poll interval of 2 seconds.

On Thursday 17 February 2011 11:54:32 dan sutton wrote:
> Hi,
>
> Keeping the thread alive - any thoughts on only doing replication if
> there is no warming currently going on?
>
> Cheers,
> Dan
>
> On Thu, Feb 10, 2011 at 11:09 AM, dan sutton wrote:
> > Hi,
> >
> > If the replication window is too small to allow a new searcher to warm
> > and close the current searcher before the new one needs to be in
> > place, then the slaves continuously have a high load, and potentially
> > an OOM error. We've noticed this in our environment, where we have
> > several facets on large multivalued fields.
> >
> > I was wondering what the list thought about modifying the replication
> > process to skip polls (though warning in the logs) when there is a
> > searcher in the process of warming? Else, as in our case, it brings the
> > slave to its knees; the workaround was to extend the poll interval,
> > though that's not ideal.
> >
> > Cheers,
> > Dan

--
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350
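A sketch of the two settings involved, both in the slave's solrconfig.xml (the master URL is a placeholder):

    <!-- refuse to warm more than one searcher at a time -->
    <maxWarmingSearchers>1</maxWarmingSearchers>

    <requestHandler name="/replication" class="solr.ReplicationHandler">
      <lst name="slave">
        <str name="masterUrl">http://master:8983/solr/replication</str>
        <str name="pollInterval">00:00:02</str>
      </lst>
    </requestHandler>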
Re: My Plan to Scale Solr
Hi Bing Li,

On Thursday 17 February 2011 10:32:11 Bing Li wrote:
> Dear all,
>
> I started to learn how to use Solr three months ago. My experience is
> still limited.
>
> Now I crawl Web pages with my crawler and send the data to a single Solr
> server. It runs fine.
>
> Since the number of potential users is large, I have decided to scale Solr.
> After configuring replication, a single index can be replicated to
> multiple servers.
>
> I think shards are also required. I intend to split the index according to
> the data categories and priorities. After that, I will use the above
> replication techniques and get high performance. The remaining work is not
> so difficult.

It's better to use a consistent hashing algorithm to decide which server takes which documents if you want good relevancy. Using a modulo with the number of servers will return the shard per document. If you have integers as the unique key, then just a modulo will suffice.

> I noticed some new terms, such as SolrCloud, Katta and ZooKeeper.
> According to my current understanding, it seems that I can ignore them.
> Am I right? What benefits would I get from using them?

SolrCloud comes with ZooKeeper. It's designed to provide a fail-over cluster and more useful features. I haven't tried Katta.

> Thanks so much!
> LB

--
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350
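A sketch of the modulo routing described above (Java; the variable names and shard count are just examples):

    // pick a shard for each document before posting it;
    // with an integer unique key the modulo alone is enough
    int numShards = 2;
    int shard = (int) (docId % numShards);                            // integer ids
    int shardForString = Math.abs(uniqueKey.hashCode()) % numShards;  // string ids

    // query side: one node fans the request out across all shards, e.g.
    // http://host1:8983/solr/select?q=...&shards=host1:8983/solr,host2:8983/solr

Routing must stay stable so the same key always lands on the same shard; adding servers later changes the modulo result, which is why consistent hashing gets mentioned for clusters that grow.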
Re: My Plan to Scale Solr
Hi,

I'm currently looking at SolrCloud. I've managed to set up a scalable cluster with ZooKeeper. (See the examples in http://wiki.apache.org/solr/SolrCloud for a quick understanding.) This way, all the different shards / replicas are stored in a centralised configuration.

Moreover, ZooKeeper provides out-of-the-box load balancing. So, let's say you have 2 different shards and each is replicated 2 times. Your ZooKeeper config will look like this:

    \config
     ...
     /live_nodes (v=6 children=4)
       lP_Port:7500_solr (ephemeral v=0)
       lP_Port:7574_solr (ephemeral v=0)
       lP_Port:8900_solr (ephemeral v=0)
       lP_Port:8983_solr (ephemeral v=0)
     /collections (v=20 children=1)
       collection1 (v=0 children=1) "configName=myconf"
         shards (v=0 children=2)
           shard1 (v=0 children=3)
             lP_Port:8983_solr_ (v=4) "node_name=lP_Port:8983_solr url=http://lP_Port:8983/solr/"
             lP_Port:7574_solr_ (v=1) "node_name=lP_Port:7574_solr url=http://lP_Port:7574/solr/"
             lP_Port:8900_solr_ (v=1) "node_name=lP_Port:8900_solr url=http://lP_Port:8900/solr/"
           shard2 (v=0 children=2)
             lP_Port:7500_solr_ (v=0) "node_name=lP_Port:7500_solr url=http://lP_Port:7500/solr/"
             lP_Port:7574_solr_ (v=1) "node_name=lP_Port:7574_solr url=http://lP_Port:7574/solr/"

--> This setup can be realised with 1 ZooKeeper module - the other Solr machines just need to know the IP_Port where the ZooKeeper is active & that's it.
--> So no configuration / installation is needed to quickly realise a scalable / load-balanced cluster.

Disclaimer: ZooKeeper is a relatively new feature - I'm not sure if it will work out in a real production environment which has a tight SLA pending. But definitely keep your eyes on this stuff - it will mature quickly!

Stijn Vanhoorelbeke
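Roughly how the wiki examples bring such a cluster up (flags are from the SolrCloud wiki of that era; ports match the tree above, paths are the example's):

    # first node: embedded ZooKeeper, bootstraps the config into ZK
    cd example
    java -Dbootstrap_confdir=./solr/conf -Dcollection.configName=myconf \
         -DzkRun -jar start.jar

    # additional nodes: just point at the running ZooKeeper
    cd example2
    java -Djetty.port=7574 -DzkHost=localhost:9983 -jar start.jar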
Re: Validate Query Syntax of Solr Request Before Sending
Uh, how about the LuceneQParser? It does some checks and can return appropriate error messages.

On Thursday 17 February 2011 06:44:16 csj wrote:
> Hi,
>
> I wonder if it is possible to let the user build up a Solr query and have
> it validated by some Java API before sending it to Solr.
>
> Is there a parser that could help with that? I would like to help the user
> build a valid query as she types, by showing messages like "The query is
> not valid" or perhaps even more advanced: "The parentheses are not
> balanced".
>
> Maybe one day it would also be possible to analyse the semantics of the
> query, like: "This query has a built-in inconsistency because the two
> dates you have specified require documents to be before AND after these
> dates". But this is far future...
>
> Regards,
>
> Christian Sonne Jensen

--
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350
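If the client is Java, a minimal client-side syntax check is also possible with Lucene's own QueryParser - a sketch (the analyzer and default field here are assumptions, and it won't reproduce Solr's schema-aware behaviour exactly):

    import org.apache.lucene.analysis.WhitespaceAnalyzer;
    import org.apache.lucene.queryParser.ParseException;
    import org.apache.lucene.queryParser.QueryParser;
    import org.apache.lucene.util.Version;

    public static String validate(String userInput) {
        try {
            // same parser family Solr's LuceneQParser uses
            new QueryParser(Version.LUCENE_29, "text", new WhitespaceAnalyzer())
                .parse(userInput);
            return null; // parses fine
        } catch (ParseException e) {
            return e.getMessage(); // e.g. "Cannot parse '...'" - show to the user
        }
    }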
is solr dynamic calculation??
Hi All,
I have a question: does Solr calculate the scores of result documents dynamically at query time, or are they pre-calculated and then served?

For example: if a query is made with q=solr on my index and I get a result of 25 documents, what is it calculating? I am very keen to know how it calculates the score and orders the results.

Regards,
satya
Re: fine tuning the solr search
Have a read: http://lucene.apache.org/java/2_9_1/scoring.html

On Thursday 17 February 2011 12:50:08 Churchill Nanje Mambe wrote:
> Hi,
> I would love to know how to do this with Solr.
> Say a user inputs "Account manager files". I wish that Solr would
> prioritize the documents it finds as follows:
> 1) documents containing "account manager files" get a greater score
> 2) then documents with "account manager" come next
> 3) then documents with "account"
> before the other words are used to match documents in the search.
>
> Right now I think it works differently, as it finds documents with
> "account" and puts them in first position, or documents with the word
> "manager" in second position, or so.
>
> thanks

--
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350
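For the concrete ranking asked about, one common approach is to boost the longer phrases explicitly - a sketch in Lucene query syntax (the boost values are just examples, and the default search field is assumed):

    q="account manager files"^100 OR "account manager"^10 OR (account manager files)

Or, with the dismax handler, the pf (phrase fields) parameter boosts documents where all the query terms appear together as a phrase:

    q=account manager files&defType=dismax&qf=text&pf=text^10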
Re: is solr dynamic calculation??
Both. You should also read about scoring:
http://lucene.apache.org/java/2_9_1/scoring.html

On Thursday 17 February 2011 13:39:05 satya swaroop wrote:
> Hi All,
> I have a question: does Solr calculate the scores of result documents
> dynamically at query time, or are they pre-calculated and then served?
>
> For example: if a query is made with q=solr on my index and I get a result
> of 25 documents, what is it calculating? I am very keen to know how it
> calculates the score and orders the results.
>
> Regards,
> satya

--
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350
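The short version, from the scoring docs linked above - Lucene's practical scoring function, computed per matching document at query time:

    score(q,d) = coord(q,d) * queryNorm(q)
                 * sum over terms t in q of:
                     tf(t in d) * idf(t)^2 * t.getBoost() * norm(t,d)

The per-document inputs (term frequencies, norms, index-time boosts) are pre-computed at index time and stored in the index; only their combination happens dynamically, which is what keeps query-time scoring fast.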
Re: How to use XML parser in DIH for a database?
If you're using a DIH for SQL Server you can configure it however you want. Here is a snippet of my code. Note the driver, which you need to grab from sourcenet.

>
> <dataSource driver="oracle.jdbc.driver.OracleDriver"
>             url="jdbc:oracle:thin:@localhost:1521:xe"
>             user="user"
>             password="password"
>             name="ds"/>
>
> <entity name="clobxml" dataSource="ds" transformer="ClobTransformer"
>         query="...">
>   <field column="SUPPLIER_APPROVALS" clob="true"/>
>   <entity processor="XPathEntityProcessor" forEach="/suppliers/supplier"
>           dataField="clobxml.SUPPLIER_APPROVALS" onError="continue">
>     <field column="..." xpath="..."/>
>   </entity>
> </entity>
>
> -
> Thanx:
> Grijesh
> http://lucidimagination.com
Re: How to use XML parser in DIH for a database?
I was also gonna say: why even worry about using XPath when you can write a SQL query to get your data out? That's what I did, and it seems much simpler and cuts out a step.

Adam

Sent from my iPhone

On Feb 16, 2011, at 10:21 PM, Bill Bell wrote:

> Does anyone have an example of using this with a SQL Server varchar or XML
> field?
>
> ??
>
> <entity processor="XPathEntityProcessor"
>         forEach="/the/record/xpath" url="${y.xml_name}">
> ...
> </entity>
>
> On 2/16/11 2:17 AM, "Stefan Matheis" wrote:
>
>> What about using
>> http://wiki.apache.org/solr/DataImportHandler#XPathEntityProcessor ?
>>
>> On Wed, Feb 16, 2011 at 10:08 AM, Bill Bell wrote:
>>> I am using DIH.
>>>
>>> I am trying to take a column in a SQL Server database that returns an
>>> XML string and use XPath to get data out of it.
>>>
>>> I noticed that XPath works with external files; how do I get it to work
>>> with a database?
>>>
>>> I need something like "//insur[5][@name='Blue Cross']"
>>>
>>> Thanks.
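For the archives: the documented way to run XPathEntityProcessor over a database column (rather than a file or URL) is a FieldReaderDataSource - a sketch, with table and field names as placeholders:

    <dataConfig>
      <dataSource name="db" driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
                  url="jdbc:sqlserver://localhost;databaseName=mydb"
                  user="user" password="password"/>
      <dataSource name="fld" type="FieldReaderDataSource"/>
      <document>
        <entity name="rec" dataSource="db" query="select id, xml_col from mytable">
          <!-- parse the XML held in the parent entity's xml_col field -->
          <entity name="x" dataSource="fld" processor="XPathEntityProcessor"
                  dataField="rec.xml_col" forEach="/insurances/insur">
            <field column="insur_name" xpath="/insurances/insur/@name"/>
          </entity>
        </entity>
      </document>
    </dataConfig>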
Building queries for SolR
Hi,

I'm porting/upgrading a project from Lucene to Solr.

In Lucene, I was using the user-provided Lucene query string, and I completed it to implement access restrictions, based on fields saved in the index:

    Query userQ = ...     // query from user
    Query restrictQ = ... // implement restrictions
    BooleanQuery finalQ = new BooleanQuery();
    finalQ.add(userQ, BooleanClause.Occur.MUST);
    finalQ.add(restrictQ, BooleanClause.Occur.MUST);

Searching on 'finalQ' gives only the results which can be shown to the user.

Since Solr doesn't let you programmatically build queries, how can I do something equivalent?

Do we always have to build strings to make a query in Solr? Is there really no equivalent to the Lucene API for building a query, using BooleanQuery, TermQuery, ...?

Thanks for your help!
Re: Building queries for SolR
Vincent,

Look at Solr's fq (filter query) capability. You'll likely want to put your restricting query in an fq parameter from your search client. If your restricting query is a simple TermQuery, have a look at the various built-in query parsers in Solr. On trunk you can do this: &fq={!term f=restriction_field}value, or in other versions look at the raw or field query parsers. Or you can use Lucene query syntax in an fq parameter and make more sophisticated expressions (at the risk of query parsing exceptions, of course).

	Erik

On Feb 17, 2011, at 09:09 , Vincent Cautaerts wrote:

> Hi,
>
> I'm porting/upgrading a project from Lucene to Solr.
>
> In Lucene, I was using the user-provided Lucene query string, and I
> completed it to implement access restrictions, based on fields saved in
> the index:
>
>     Query userQ = ...     // query from user
>     Query restrictQ = ... // implement restrictions
>     BooleanQuery finalQ = new BooleanQuery();
>     finalQ.add(userQ, BooleanClause.Occur.MUST);
>     finalQ.add(restrictQ, BooleanClause.Occur.MUST);
>
> Searching on 'finalQ' gives only the results which can be shown to the user.
>
> Since Solr doesn't let you programmatically build queries, how can I do
> something equivalent?
>
> Do we always have to build strings to make a query in Solr? Is there really
> no equivalent to the Lucene API for building a query, using BooleanQuery,
> TermQuery, ...?
>
> Thanks for your help!
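A sketch of what that looks like on the wire (the field name and values are examples):

    http://localhost:8983/solr/select?q=user+query&fq={!term f=restriction_field}value

The user's raw query stays in q, and the restriction becomes a separately cached filter in fq - the equivalent of the second MUST clause in the Lucene code above.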
Re: last item in results page is always the same
Thanks, going to update now. This is a system that is currently deployed. Should I just update to 1.4.1, or should I go straight to 3.0? Does 1.4 => 3.0 require reindexing?

On Wed, Feb 16, 2011 at 5:37 PM, Yonik Seeley wrote:
> On Wed, Feb 16, 2011 at 5:08 PM, Paul wrote:
>> Is this a known solr bug or is there something subtle going on?
>
> Yes, I think it's the following bug, fixed in 1.4.1:
>
> * SOLR-1777: fieldTypes with sortMissingLast=true or sortMissingFirst=true
>   can result in incorrectly sorted results.
>
> -Yonik
> http://lucidimagination.com
Re: last item in results page is always the same
It's fixed in 1.4.1. https://issues.apache.org/jira/browse/SOLR-1777

On Thursday 17 February 2011 16:04:18 Paul wrote:
> Thanks, going to update now. This is a system that is currently
> deployed. Should I just update to 1.4.1, or should I go straight to
> 3.0? Does 1.4 => 3.0 require reindexing?
>
> On Wed, Feb 16, 2011 at 5:37 PM, Yonik Seeley wrote:
> > On Wed, Feb 16, 2011 at 5:08 PM, Paul wrote:
> >> Is this a known solr bug or is there something subtle going on?
> >
> > Yes, I think it's the following bug, fixed in 1.4.1:
> >
> > * SOLR-1777: fieldTypes with sortMissingLast=true or
> >   sortMissingFirst=true can result in incorrectly sorted results.
> >
> > -Yonik
> > http://lucidimagination.com

--
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350
Re: last item in results page is always the same
Paul - go with 1.4.1 in this case. Keep tabs on the upcoming 3.1 release (of both Lucene and Solr) and consider that in a month or so.

	Erik

On Feb 17, 2011, at 10:04 , Paul wrote:

> Thanks, going to update now. This is a system that is currently
> deployed. Should I just update to 1.4.1, or should I go straight to
> 3.0? Does 1.4 => 3.0 require reindexing?
>
> On Wed, Feb 16, 2011 at 5:37 PM, Yonik Seeley wrote:
>> On Wed, Feb 16, 2011 at 5:08 PM, Paul wrote:
>>> Is this a known solr bug or is there something subtle going on?
>>
>> Yes, I think it's the following bug, fixed in 1.4.1:
>>
>> * SOLR-1777: fieldTypes with sortMissingLast=true or sortMissingFirst=true
>>   can result in incorrectly sorted results.
>>
>> -Yonik
>> http://lucidimagination.com
Re: last item in results page is always the same
On Thu, Feb 17, 2011 at 10:04 AM, Paul wrote:
> Thanks, going to update now. This is a system that is currently
> deployed. Should I just update to 1.4.1, or should I go straight to
> 3.0? Does 1.4 => 3.0 require reindexing?

There is no 3.0 - that release happened before the Lucene/Solr merge, hence there is no corresponding Solr version. We're working on the 3.1 release now (hopefully out within a month).

-Yonik
http://lucidimagination.com
RE: Solr multi cores or not
We have 3 applications and they need different relevancy models, synonyms, stop words, etc.

App A - content size ~ 20 GB - MySQL and Drupal based app
App B - # of documents ~ 400K; index size ~ 25 GB - primarily a portal with links to different applications; data sources include crawled pages and DB sources
App C - PeopleSoft based application - underlying Oracle DB - content size ~ 10 GB

Query load:
App A - approx 60k hits/week
App B - approx 1 million hits/week
App C - approx 250k hits/week

Frequency of updates:
App A - near real time indexing - every 20 minutes
App B - every 2 hours
App C - daily

All applications need personalization based on application-specific business rules. Yes, we must enforce security, and the clients are under our control.

The reason our server (virtual machine) was configured that way is that when we first installed, we were told to throw a lot of memory at Solr. App A runs on our production server and it hardly taxes the machine - our CPUs are at less than 4% and our memory is hardly troubled.

Our business need now is that all three apps want to use Solr for their search needs, with the ability to share indexes. I need to not only separate the indexes, but also selectively query across the applications.

-----Original Message-----
From: Markus Jelsma [mailto:markus.jel...@openindex.io]
Sent: Wednesday, February 16, 2011 6:25 PM
To: solr-user@lucene.apache.org
Cc: Thumuluri, Sai
Subject: Re: Solr multi cores or not

Hi,

That depends (as usual) on your scenario. Let me ask some questions:

1. what is the sum of documents for your applications?
2. what is the expected load in queries/minute?
3. what is the update frequency in documents/minute, and how many documents per commit?
4. how many different applications do you have?
5. are the query demands for the business the same (or very similar) for all applications?
6. can you easily upgrade hardware or demand more machines?
7. must you enforce security between applications, and are the clients not under your control?

I'm puzzled though: you have so much memory but so little CPU. What about the disks? Size? Spinning or SSD?

Cheers,

> Hi,
>
> I have a need to index multiple applications using Solr, and I also have
> the need to share indexes or run a search query across these application
> indexes. Is Solr multi-core the way to go? My server config is
> 2 virtual CPUs @ 1.8 GHz and about 32GB of memory. What is the
> recommendation?
>
> Thanks,
> Sai Thumuluri
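Multi-core does fit this shape of requirements - a sketch of a solr.xml for it (core names and paths are placeholders):

    <solr persistent="true">
      <cores adminPath="/admin/cores">
        <core name="appA" instanceDir="appA"/>
        <core name="appB" instanceDir="appB"/>
        <core name="appC" instanceDir="appC"/>
      </cores>
    </solr>

Each core gets its own schema, synonyms and stop words, and a cross-application query can be run with the shards parameter, e.g.

    http://host:8983/solr/appA/select?q=...&shards=host:8983/solr/appA,host:8983/solr/appB

Caveat: distributed search across cores assumes the schemas are compatible, at least in the unique key and the queried fields.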
Re: My Plan to Scale Solr
What's an 'LSA'?

Dennis Gearon

Signature Warning
It is always a good idea to learn from your own mistakes. It is usually a better idea to learn from others' mistakes, so you do not have to make them yourself.
from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'

EARTH has a Right To Life,
otherwise we all die.

From: Stijn Vanhoorelbeke
To: solr-user@lucene.apache.org; bing...@asu.edu
Sent: Thu, February 17, 2011 4:28:13 AM
Subject: Re: My Plan to Scale Solr

Hi,

I'm currently looking at SolrCloud. I've managed to set up a scalable cluster with ZooKeeper. (See the examples in http://wiki.apache.org/solr/SolrCloud for a quick understanding.) This way, all the different shards / replicas are stored in a centralised configuration.

Moreover, ZooKeeper provides out-of-the-box load balancing. So, let's say you have 2 different shards and each is replicated 2 times. Your ZooKeeper config will look like this:

    \config
     ...
     /live_nodes (v=6 children=4)
       lP_Port:7500_solr (ephemeral v=0)
       lP_Port:7574_solr (ephemeral v=0)
       lP_Port:8900_solr (ephemeral v=0)
       lP_Port:8983_solr (ephemeral v=0)
     /collections (v=20 children=1)
       collection1 (v=0 children=1) "configName=myconf"
         shards (v=0 children=2)
           shard1 (v=0 children=3)
             lP_Port:8983_solr_ (v=4) "node_name=lP_Port:8983_solr url=http://lP_Port:8983/solr/"
             lP_Port:7574_solr_ (v=1) "node_name=lP_Port:7574_solr url=http://lP_Port:7574/solr/"
             lP_Port:8900_solr_ (v=1) "node_name=lP_Port:8900_solr url=http://lP_Port:8900/solr/"
           shard2 (v=0 children=2)
             lP_Port:7500_solr_ (v=0) "node_name=lP_Port:7500_solr url=http://lP_Port:7500/solr/"
             lP_Port:7574_solr_ (v=1) "node_name=lP_Port:7574_solr url=http://lP_Port:7574/solr/"

--> This setup can be realised with 1 ZooKeeper module - the other Solr machines just need to know the IP_Port where the ZooKeeper is active & that's it.
--> So no configuration / installation is needed to quickly realise a scalable / load-balanced cluster.

Disclaimer: ZooKeeper is a relatively new feature - I'm not sure if it will work out in a real production environment which has a tight SLA pending. But definitely keep your eyes on this stuff - it will mature quickly!

Stijn Vanhoorelbeke
Re: My Plan to Scale Solr
http://lmgtfy.com/?q=SLA

wunder

On Feb 17, 2011, at 11:04 AM, Dennis Gearon wrote:

> What's an 'LSA'?
>
> Dennis Gearon
>
> Signature Warning
> It is always a good idea to learn from your own mistakes. It is usually a
> better idea to learn from others' mistakes, so you do not have to make
> them yourself.
> from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'
>
> EARTH has a Right To Life,
> otherwise we all die.
>
> From: Stijn Vanhoorelbeke
> To: solr-user@lucene.apache.org; bing...@asu.edu
> Sent: Thu, February 17, 2011 4:28:13 AM
> Subject: Re: My Plan to Scale Solr
>
> Hi,
>
> I'm currently looking at SolrCloud. I've managed to set up a scalable
> cluster with ZooKeeper. (See the examples in
> http://wiki.apache.org/solr/SolrCloud for a quick understanding.)
> This way, all the different shards / replicas are stored in a centralised
> configuration.
>
> Moreover, ZooKeeper provides out-of-the-box load balancing.
> So, let's say you have 2 different shards and each is replicated 2 times.
> Your ZooKeeper config will look like this:
>
> \config
>  ...
>  /live_nodes (v=6 children=4)
>    lP_Port:7500_solr (ephemeral v=0)
>    lP_Port:7574_solr (ephemeral v=0)
>    lP_Port:8900_solr (ephemeral v=0)
>    lP_Port:8983_solr (ephemeral v=0)
>  /collections (v=20 children=1)
>    collection1 (v=0 children=1) "configName=myconf"
>      shards (v=0 children=2)
>        shard1 (v=0 children=3)
>          lP_Port:8983_solr_ (v=4) "node_name=lP_Port:8983_solr url=http://lP_Port:8983/solr/"
>          lP_Port:7574_solr_ (v=1) "node_name=lP_Port:7574_solr url=http://lP_Port:7574/solr/"
>          lP_Port:8900_solr_ (v=1) "node_name=lP_Port:8900_solr url=http://lP_Port:8900/solr/"
>        shard2 (v=0 children=2)
>          lP_Port:7500_solr_ (v=0) "node_name=lP_Port:7500_solr url=http://lP_Port:7500/solr/"
>          lP_Port:7574_solr_ (v=1) "node_name=lP_Port:7574_solr url=http://lP_Port:7574/solr/"
>
> --> This setup can be realised with 1 ZooKeeper module - the other Solr
> machines just need to know the IP_Port where the ZooKeeper is active &
> that's it.
> --> So no configuration / installation is needed to quickly realise a
> scalable / load-balanced cluster.
>
> Disclaimer: ZooKeeper is a relatively new feature - I'm not sure if it
> will work out in a real production environment which has a tight SLA
> pending. But definitely keep your eyes on this stuff - it will mature
> quickly!
>
> Stijn Vanhoorelbeke

--
Walter Underwood
Venture ASM, Troop 14, Palo Alto
GET or POST for large queries?
We are running into some issues with large queries. Initially, they were ostensibly header buffer overruns, because increasing Jetty's headerBufferSize value to 65536 resolved them. This seems like a kludge, but it does solve the problem for 95% of our users.

However, we do have queries that are physically larger than that, and for which increasing the headerBufferSize to 65536 does not work. This is due to security requirements: security descriptors are baked into the index, and then potentially thousands of them (depending on the user context) are passed in with each query. These excessive queries are only a problem for approximately 5% of users, who are highly entitled, but the number of security descriptors is likely to increase, and we won't have a workaround for this security policy any time soon.

After a lot of Googling, it seems to me that it's common to increase the headerBufferSize, but I don't see any other strategies. Is it possible/feasible to switch to using POST for querying?

Thanks!
Re: Question About Highlighting
> I had a requirement to implement phrase proximity like ["a b c" w/5 "d e f"].
> For this I have implemented a custom query parser plugin in which I make
> use of nested span queries to fulfill this requirement. Now it looks like
> documents are filtered correctly, but there is an issue with highlighting:
> it also highlights the terms that occur alone (not in a phrase). Can
> somebody suggest a fix for this issue?

Appending &hl.usePhraseHighlighter=true should work.
Re: GET or POST for large queries?
Yes, you may use POST to make search requests to Solr.

	Erik

On Feb 17, 2011, at 14:27 , mrw wrote:

> We are running into some issues with large queries. Initially, they were
> ostensibly header buffer overruns, because increasing Jetty's
> headerBufferSize value to 65536 resolved them. This seems like a kludge,
> but it does solve the problem for 95% of our users.
>
> However, we do have queries that are physically larger than that, and for
> which increasing the headerBufferSize to 65536 does not work. This is due
> to security requirements: security descriptors are baked into the index,
> and then potentially thousands of them (depending on the user context) are
> passed in with each query. These excessive queries are only a problem for
> approximately 5% of users, who are highly entitled, but the number of
> security descriptors is likely to increase, and we won't have a workaround
> for this security policy any time soon.
>
> After a lot of Googling, it seems to me that it's common to increase the
> headerBufferSize, but I don't see any other strategies. Is it
> possible/feasible to switch to using POST for querying?
>
> Thanks!
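A sketch of the switch - Solr accepts the same form-encoded parameters in a POST body (host and parameters here are examples):

    curl 'http://localhost:8983/solr/select' \
         --data 'q=*:*&wt=json' \
         --data-urlencode 'fq=acl:(desc1 OR desc2 OR desc3)'

The body is not subject to Jetty's headerBufferSize, though the container's POST size limit (Jetty's maxFormContentSize) may still apply for very large requests.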
Re: GET or POST for large queries?
Yeah, I tried switching to POST. It seems to handle the size, but apparently Solr has a limit on the number of boolean comparisons -- I'm now getting "too many boolean clauses" errors emanating from org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:108).

:)

Thanks for responding.

Erik Hatcher-4 wrote:
>
> Yes, you may use POST to make search requests to Solr.
>
> 	Erik
Re: GET or POST for large queries?
Yes, I think it's 1024 by default. I think you can raise it in your config, but your performance may suffer.

Best would be to try and find a better way to do what you want without using thousands of clauses. This might require some custom Java plugins to Solr, though.

On 2/17/2011 3:52 PM, mrw wrote:
> Yeah, I tried switching to POST. It seems to handle the size, but
> apparently Solr has a limit on the number of boolean comparisons -- I'm
> now getting "too many boolean clauses" errors emanating from
> org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:108).
>
> :)
>
> Thanks for responding.
>
> Erik Hatcher-4 wrote:
>> Yes, you may use POST to make search requests to Solr.
>>
>> 	Erik
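The limit being hit is this solrconfig.xml setting (1024 is the shipped default; the raised value here is just an example):

    <maxBooleanClauses>4096</maxBooleanClauses>

Note that this maps onto a single global Lucene setting (BooleanQuery's max clause count), so with multiple cores the last core loaded effectively sets it for all of them.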
Re: GET or POST for large queries?
Probably you could do it, and solving a problem in business supersedes 'rightness' concerns, much to the dismay of geeks and 'those who like rightness and say the word "Neemph!"'.

The non-rightness about this is that POST, PUT and DELETE are assumed to make changes to the URL's backend, while GET is assumed NOT to make changes. So if your POST does not make a change . . . it breaks convention. But if it solves the problem . . . :-)

Another way would be to GET with a 'query file' location, and then have the server fetch that query and execute it.

Boy!!! I'd love to see one of your queries!!! You must have a few ANDs/ORs in them :-)

Dennis Gearon

Signature Warning
It is always a good idea to learn from your own mistakes. It is usually a better idea to learn from others' mistakes, so you do not have to make them yourself.
from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'

EARTH has a Right To Life,
otherwise we all die.

From: mrw
To: solr-user@lucene.apache.org
Sent: Thu, February 17, 2011 11:27:06 AM
Subject: GET or POST for large queries?

We are running into some issues with large queries. Initially, they were ostensibly header buffer overruns, because increasing Jetty's headerBufferSize value to 65536 resolved them. This seems like a kludge, but it does solve the problem for 95% of our users.

However, we do have queries that are physically larger than that, and for which increasing the headerBufferSize to 65536 does not work. This is due to security requirements: security descriptors are baked into the index, and then potentially thousands of them (depending on the user context) are passed in with each query. These excessive queries are only a problem for approximately 5% of users, who are highly entitled, but the number of security descriptors is likely to increase, and we won't have a workaround for this security policy any time soon.

After a lot of Googling, it seems to me that it's common to increase the headerBufferSize, but I don't see any other strategies. Is it possible/feasible to switch to using POST for querying?

Thanks!
solr.KeepWordsFilterFactory confusion
I have a Solr index where certain facet fields should only contain one or more items from a limited list of values. To enforce this restriction at index time I have been looking at using a KeepWordFilterFactory. It seems it ought to work as I have it implemented, and actually seems to work when tested through the admin analysis page, but when I index a document with that filter in place, values that ought to be filtered out aren't. (I am running the Solr 1.4 release.)

I've added a new field type in schema.xml:

    <fieldType name="format_facet" class="solr.StrField" sortMissingLast="true" omitNorms="true">
      <analyzer>
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <filter class="solr.KeepWordFilterFactory" words="format_facet.txt" ignoreCase="false"/>
      </analyzer>
    </fieldType>

placed a file format_facet.txt in the conf directory containing:

    Book
    Online
    Microform
    Journal/Magazine
    Musical Score
    Musical Recording
    Thesis/Dissertation
    Video
    Streaming Video
    Software/Multimedia
    Photographs
    Cassette

referenced this new field type with a field declaration in schema.xml:

    <field name="format_facet" type="format_facet" stored="true" multiValued="true"/>

I also have this dynamic field, but it seems irrelevant:

    <dynamicField name="*_facet" type="string" stored="true" multiValued="true" omitNorms="true"/>

restarted the Jetty server running the Solr server, and submitted a Solr add document containing:

    format_facet=format_facet(1.0)={[Video, Streaming Video, Online, Gooberhead, Book of the Month]}

Of these values only Video, Streaming Video and Online ought to end up in the index; however, all five values end up as format_facet values for the Solr item in question:

    Video
    Streaming Video
    Online
    Gooberhead
    Book of the Month

I must be missing something fairly basic, since this doesn't seem especially complicated.

Thanks in advance for any assistance,

-Bob Haschart
Re: solr.KeepWordsFilterFactory confusion
> I've added a new field type in schema.xml:
>
>     <fieldType name="format_facet" class="solr.StrField" sortMissingLast="true" omitNorms="true">
>       <analyzer>
>         <tokenizer class="solr.KeywordTokenizerFactory"/>
>         <filter class="solr.KeepWordFilterFactory" words="format_facet.txt" ignoreCase="false"/>
>       </analyzer>
>     </fieldType>

class="solr.StrField" should be class="solr.TextField"
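That is, a corrected sketch of the type:

    <fieldType name="format_facet" class="solr.TextField" sortMissingLast="true" omitNorms="true">
      <analyzer>
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <filter class="solr.KeepWordFilterFactory" words="format_facet.txt" ignoreCase="false"/>
      </analyzer>
    </fieldType>

StrField indexes the raw value and never runs the analyzer chain at index time, while the admin analysis page runs the chain regardless of field class - which is why the filter appeared to work there but had no effect on indexed documents.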
Re: SolrCloud - Example C not working
FYI, this should be fixed in the (very) latest trunk.

-Yonik
http://lucidimagination.com
Date Math
The SolrQuerySyntax wiki page refers to DateMathParser for examples. When I tried "-1DAY", I got:

    org.apache.lucene.queryParser.ParseException: Cannot parse 'last_modified:-1DAY':
    Encountered " "-" "- "" at line 1, column 14.
    Was expecting one of:
        "(" ...
        "*" ...
        <QUOTED> ...
        <TERM> ...
        <PREFIXTERM> ...
        <WILDTERM> ...
        "[" ...
        "{" ...
        <NUMBER> ...

Are these not supported as a shortcut for "NOW-1DAY"? I'm using Solr 1.4.
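For what it's worth, date math needs an anchor (NOW or a timestamp) to apply the offset to, so the usual forms look like this (field name taken from the message above):

    last_modified:[NOW-1DAY TO NOW]       the last 24 hours
    last_modified:[* TO NOW/DAY-1DAY]     everything before yesterday, rounded to day

A bare "-1DAY" has nothing to subtract from, which is what the parser is rejecting.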
Index Design Question
We are indexing documents with several associated fields for search and display, some of which may change with a much higher frequency than the document content. As per my understanding, we have to resubmit the entire gamut of fields with every update. If the reindexing of the documents becomes a performance bottleneck, what choices of design alternatives are there within Solr? Thanks in advance for your contributions.
Re: TermVector query using Solr Tutorial
: I am searching the keyword 25, in the field :
:
: 30" TFT active matrix LCD, 2560 x 1600, .25mm
: dot pitch, 700:1 contrast
:
: I want to know the character position of the matched keyword in the
: corresponding field.
:
: usb or cabl is not what I want.

your search is getting a match on the features field, but the termvectors
being returned are from the "includes" field, which you can see based on
the output you mentioned in your previous message...

> ...
> 3007WFP
> ...
> 1 ...
> ...

...by the looks of things, the "includes" field is the only field with
termVectors enabled in your schema.xml (which is consistent with the trunk,
3x, and Solr 1.4 example schemas). if you want termVectors for the
"features" field, you need to specify termVectors="true" on the "features"
field.

-Hoss
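i.e., something like this in schema.xml - a sketch, with the positions/offsets flags included since character positions were the goal:

    <field name="features" type="text" indexed="true" stored="true"
           termVectors="true" termPositions="true" termOffsets="true"/>

After reindexing, the TermVectorComponent can return tv.positions and tv.offsets for that field.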
Re: Index Design Question
Some options to reduce the performance implications:

Replication... index your documents in one Solr instance, and query on a different one. That way, users of the query side will not be as adversely impacted by frequent changes, and you have better control over when change occurs.

Separate search from display... one mistake I see a lot is people putting everything into Solr. Solr is optimized for search, therefore it sometimes makes sense to only put into a Solr index those fields you are searching against. In some architectures this leaves a large amount of data that can be stored somewhere else: an RDBMS, a file system, a third-party host, whatever. You search on Solr, and use some identifier to get the rest of the data from somewhere else. That way, only changes to searchable fields need to be indexed; the rest just need to be stored. It could minimize the impact on your Solr documents.

Multi-threading... usually any performance bottleneck is on the sending side, not the Solr side. Solr handles multiple data inputs gracefully. Be very aware of how many commits you are doing, and what kind of warming queries you have in place. Those are the biggest performance issues from what I've seen.

Having 2 Solr instances, one optimized for indexing (the master) and one optimized for querying (the slave), with replication, would help minimize the problem.
Re: is solr dynamic calculation??
Hi Markus,

As far as I've gone through the scoring of Solr, the scoring is done during searching, using the boost values which were given during indexing. I have a question now: if I search for the keyword java, then

1) if the term "java" matches 50,000 documents in the index, does Solr calculate the score value for each and every document, then filter and sort them, and serve the results?

If it does the dynamic calculation for each and every document, that would take a long time - so how does Solr keep it fast? Am I right, or if I have anything wrong, please tell me.

Regards,
satya
Re: My Plan to Scale Solr
Or even better, search with 'LSA'.

On Thu, Feb 17, 2011 at 9:22 AM, Walter Underwood wrote:
> http://lmgtfy.com/?q=SLA
>
> wunder
>
> On Feb 17, 2011, at 11:04 AM, Dennis Gearon wrote:
>
>> What's an 'LSA'?
>>
>> Dennis Gearon
>>
>> Signature Warning
>> It is always a good idea to learn from your own mistakes. It is usually a
>> better idea to learn from others' mistakes, so you do not have to make
>> them yourself.
>> from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'
>>
>> EARTH has a Right To Life,
>> otherwise we all die.
>>
>> From: Stijn Vanhoorelbeke
>> To: solr-user@lucene.apache.org; bing...@asu.edu
>> Sent: Thu, February 17, 2011 4:28:13 AM
>> Subject: Re: My Plan to Scale Solr
>>
>> Hi,
>>
>> I'm currently looking at SolrCloud. I've managed to set up a scalable
>> cluster with ZooKeeper. (See the examples in
>> http://wiki.apache.org/solr/SolrCloud for a quick understanding.)
>> This way, all the different shards / replicas are stored in a centralised
>> configuration.
>>
>> Moreover, ZooKeeper provides out-of-the-box load balancing.
>> So, let's say you have 2 different shards and each is replicated 2 times.
>> Your ZooKeeper config will look like this:
>>
>> \config
>>  ...
>>  /live_nodes (v=6 children=4)
>>    lP_Port:7500_solr (ephemeral v=0)
>>    lP_Port:7574_solr (ephemeral v=0)
>>    lP_Port:8900_solr (ephemeral v=0)
>>    lP_Port:8983_solr (ephemeral v=0)
>>  /collections (v=20 children=1)
>>    collection1 (v=0 children=1) "configName=myconf"
>>      shards (v=0 children=2)
>>        shard1 (v=0 children=3)
>>          lP_Port:8983_solr_ (v=4) "node_name=lP_Port:8983_solr url=http://lP_Port:8983/solr/"
>>          lP_Port:7574_solr_ (v=1) "node_name=lP_Port:7574_solr url=http://lP_Port:7574/solr/"
>>          lP_Port:8900_solr_ (v=1) "node_name=lP_Port:8900_solr url=http://lP_Port:8900/solr/"
>>        shard2 (v=0 children=2)
>>          lP_Port:7500_solr_ (v=0) "node_name=lP_Port:7500_solr url=http://lP_Port:7500/solr/"
>>          lP_Port:7574_solr_ (v=1) "node_name=lP_Port:7574_solr url=http://lP_Port:7574/solr/"
>>
>> --> This setup can be realised with 1 ZooKeeper module - the other Solr
>> machines just need to know the IP_Port where the ZooKeeper is active &
>> that's it.
>> --> So no configuration / installation is needed to quickly realise a
>> scalable / load-balanced cluster.
>>
>> Disclaimer: ZooKeeper is a relatively new feature - I'm not sure if it
>> will work out in a real production environment which has a tight SLA
>> pending. But definitely keep your eyes on this stuff - it will mature
>> quickly!
>>
>> Stijn Vanhoorelbeke

--
Lance Norskog
goks...@gmail.com
Re: My Plan to Scale Solr
It's just a joke?

-
Thanx:
Grijesh
http://lucidimagination.com