Rolling Deploys and SolrCloud
Does anybody have any experience with "rolling deployments" and SolrCloud? We have a production environment where we deploy new software and config simultaneously to individual servers in a rolling manner. At any point during the deployment there may be N boxes with the old software/config and M boxes with the new software/config. Eventually N = 0 (no more old software/config) and M = 100% (all boxes have the new software/config). This is very convenient because one knows that the config the new software requires is present on a given box, while the old software on other boxes still sees its old config. We can maintain 100% uptime for the service using this technique.

If I'm understanding the role that ZooKeeper plays in SolrCloud, this no longer works. If config lives in ZooKeeper, then it's all or nothing: all old config or all new config. If this is true, it presents a bunch of new challenges for deploying software. So, to ask a concrete question: is it possible to not use ZooKeeper for config distribution, i.e. keep the config local to each shard?

Mike Schultz

--
View this message in context: http://lucene.472066.n3.nabble.com/Rolling-Deploys-and-SolrCloud-tp4026212.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Rolling Deploys and SolrCloud
Ok, that makes sense and it's probably workable, but it's still more awkward than having code and configuration deployed together to individual machines. For example, a deploy of new software/config requires that we 1) first upload the config to ZooKeeper, then 2) deploy the new software to the nodes. What about the span of time between 1) and 2)? If a box bounces during this window it will come up with the wrong config. Or what if 2) goes awry and some boxes succeed while others fail? It could be very complicated to recover from that.

Another use case: I may want to push new software/config to a single box for a smoke test before rolling to all nodes in production (some might call this testing in production, but it's just real-world safety).

I guess at the end of the day, what I don't understand is: given that I need to roll new software bits to individual nodes anyway, what good does keeping config in ZooKeeper do for me? Why not just keep the config with the software and roll both at the same time?
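One mitigation worth sketching (not an official workflow) is to never overwrite a config set in place: upload the new config under a versioned name, and relink the collection only once every node runs the new software. The config names, paths, and collection name below are hypothetical; the zkcli.sh upconfig/linkconfig commands themselves do exist in Solr 4.x's cloud-scripts.

```text
# 1) Upload the new config under a NEW name; "myconf-v1" stays intact,
#    so a box that bounces mid-deploy still finds the config it expects.
zkcli.sh -zkhost localhost:9983 -cmd upconfig \
    -confdir /deploy/v2/conf -confname myconf-v2

# 2) After all nodes run the new software, point the collection at it.
zkcli.sh -zkhost localhost:9983 -cmd linkconfig \
    -collection mycollection -confname myconf-v2
```

This narrows, but does not eliminate, the window where software and config versions can disagree.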
Solr 4.0 doesn't send qt parameter to shards
Can someone explain the logic of not sending the qt parameter down to the shards? I see from here that qt is handled as a special case for Result Grouping: http://lucidworks.lucidimagination.com/display/solr/Result+Grouping where there is a special shards.qt parameter. In 3.x, solrconfig.xml supports defining a list of SearchComponents on a handler-by-handler basis. This flexibility goes away if qt isn't passed down, or am I missing something? I'm using a single endpoint (http://localhost:8983/solr/select) and modifying query processing by varying only the query parameters.
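For concreteness, a distributed request of the kind being described would pass the handler twice; handler name and hosts below are invented for illustration, with shards.qt being the parameter the grouping page documents for forwarding to shards:

```text
http://localhost:8983/solr/select
    ?q=foo
    &qt=/myHandler           <- used by the aggregating node
    &shards=host1:8983/solr,host2:8983/solr
    &shards.qt=/myHandler    <- forwarded to each shard request
```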
Re: n values in one fieldType
It depends on what kind of behavior you're looking for. If for your queries the order of the 6 integer values doesn't matter, you could use a multiValued integer field; then you could query with ORed or ANDed integer values over that field. If the order matters but you always query on the "set" of 6 values, then you can turn your six integers into a GUID, or simply hex-encode them into a single-valued string field. Another possibility is to hex-encode the integers, separate them with whitespace, and whitespace-tokenize. Then you get a mixture of the two above, but you can also specify some locality constraints, e.g. using phrase queries, etc. The answer really depends on the types of queries you need to be able to respond to.

M
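In schema.xml terms, the first two options might look roughly like this; the field names are made up for illustration:

```xml
<!-- Option 1: order-insensitive, each value individually queryable -->
<field name="codes" type="int" indexed="true" stored="true"
       multiValued="true"/>

<!-- Option 2: order-sensitive, queried only as a whole set; the
     application hex-encodes the six ints into one string key -->
<field name="codes_key" type="string" indexed="true" stored="true"/>
```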
Adding replacement node to 4.x cluster with down node
I have a 4.x cluster that is 10 wide and 4 deep. One of the nodes of a shard went down. I provisioned a replacement node and introduced it into the cluster, but it ended up on a random shard, not the shard with the downed node. Is there a maintenance step I need to perform before introducing a node? I assume that if I had somehow removed the downed node from the ZooKeeper database, so that I no longer saw the grayed-out representation of it in the cloud display, the new node would have ended up in the right place...
Re: Anyone else see this error when running unit tests?
Yes. Just today, actually. I had some unit tests based on AbstractSolrTestCase which worked in 4.0, but in 4.1 they would fail intermittently with that error message. The key to this behavior is found by looking at the code in the Lucene class TestRuleSetupAndRestoreClassEnv. I don't understand it completely, but there are a number of random code paths through there. The following helped me get around the problem, at least in the short term:

    @org.apache.lucene.util.LuceneTestCase.SuppressCodecs({"Lucene3x","Lucene40"})
    public class CoreLevelTest extends AbstractSolrTestCase {

I also need to call this inside my setUp() method; in 4.0 this wasn't required:

    initCore("solrconfig.xml", "schema.xml", "/tmp/my-solr-home");
Re: Adding replacement node to 4.x cluster with down node
Just to clarify: I want to be able to replace the down node with a host with a different name. If I were repairing that particular machine and putting it back, there would be no problem, but I don't have control over the name of my replacement machine.
Solr4x: Separate Indexer and Query Instances for Performance
Solr 3.x had a master/slave architecture, which meant that indexing did not happen in the same process as querying; in fact, normally not even on the same machine. The query side only needed to copy down snapshots of the new index files and commit them. Great isolation for maximum query performance and indexing performance. Now in Solr 4.x this is gone. Does anyone have any answers or tuning approaches to address this? We have a high-query-load, high-indexing-load environment. I see TP99 query latency go from under 100 ms to 4-10 seconds during indexing. Even TP90 hits 2 seconds. Looking at GC in VisualVM, I see a pretty sawtooth turn into a scraggly forest when indexing happens and the eden space gets burned through.

It seems like one approach is to have the shard leaders replicate (a la 3.x) to their replicas instead of sending them the document stream. I know the replicas do that when they get "too far behind", so this would simply mean always doing that at some given interval. This would make it possible to put only replicas into a query load balancer. In the event of a leader failure a replica would be promoted and you'd have to deal with it, but it'd be no worse than what is now steady-state in standard 4.x.

Another approach might be to have separate Solr instances point to the same index directory: one instance used for indexing and tuned for that, the other tuned for querying. It's not like having the operations on separate machines as in 3.x, but it would still be better isolation than standard 4.x. Would this at least work in theory, if say the query instance started up a new IndexSearcher when necessary?

Any insight, advice or experience on this appreciated.

Mike
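For reference, the 3.x-style isolation being described was configured through the ReplicationHandler in solrconfig.xml, roughly as below; the master host name and the 3-minute poll interval are placeholders:

```xml
<!-- On the indexing master -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">commit</str>
    <str name="confFiles">schema.xml,stopwords.txt</str>
  </lst>
</requestHandler>

<!-- On each query slave -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://indexer-host:8983/solr/replication</str>
    <str name="pollInterval">00:03:00</str>
  </lst>
</requestHandler>
```

The query slaves never see the indexing write load; they only open the copied segment files.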
Index vs. Query Time Aware Filters
We have very long schema files for each of our language-dependent query shards. One thing that is doubling the configuration length of our main text-processing field definition is that we have to repeat the exact same filter chain for the query-time version EXCEPT with a queryMode=true parameter. Is there a way for a filter to figure out whether it's the index-time or the query-time version?

A similar wish would be for the filter to be able to figure out the name of the field currently being indexed. This would allow a filter to set a parameter at runtime based on the field name, instead of boilerplate-copying the same filter chain definition in schema.xml EXCEPT for one parameter. The motivation is again to reduce errors and increase readability of the schema file.
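To illustrate the duplication in schema.xml terms (the custom filter class and its queryMode parameter are stand-ins for our own code, not stock Solr filters):

```xml
<fieldType name="text_lang" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- ~30 shared filter parameters elided -->
    <filter class="com.example.OurCustomFilterFactory" queryMode="false"/>
  </analyzer>
  <analyzer type="query">
    <!-- the identical chain repeated, differing only in one attribute -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="com.example.OurCustomFilterFactory" queryMode="true"/>
  </analyzer>
</fieldType>
```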
Re: Index vs. Query Time Aware Filters
I should have explained that the queryMode parameter is for our own custom filter. So the result is that we have 8 filters in our field definition. All the filter parameters (30 or so) of the query-time and index-time chains are identical EXCEPT for our one custom filter, which needs to know whether it's in query-time or index-time mode. If we could determine inside our custom code whether we're indexing or querying, then we could omit the query-time definition entirely, save about 50 lines of configuration, and be much less error-prone.

One possible solution would be if we could get at the SolrCore from within a filter. Then at init time we could iterate through the filter chains and detect when we find a factory == this. (I've done this in other places where it's useful, e.g. to know the name of a ValueSourceParser.)
queryResultCache not checked with fieldCollapsing
I have an index with field collapsing configured on SomeField (grouping enabled, with a couple of the boolean group options set to true). When I run dismax queries I see there are no lookups in the queryResultCache. If I remove the field collapsing, lookups happen. I can't find any mention of this anywhere, or think of a reason why this should disable caching. I've tried playing with the group.cache.percent parameter, but that doesn't seem to play a role here. Anybody know what's going on?

Mike
Custom Hit Collector
As far as I can tell, using field collapsing prevents the queryResultCache from being checked. It's important for our application to have both. There are threads on incorporating custom hit collectors, which seems like it could be a way to implement the simplified collapsing I need (just deduping based on a FieldCache value) while still consulting the queryResultCache. Does anyone know the state of being able to incorporate a custom hit collector in, say, 4.0? Or, probably better, how to get caching to work with field collapsing?

Mike
Re: Understanding SOLR search results
Can you include the entire text for only the titolo field? "1.0 = tf(termFreq(titolo:trent)=1)" means the index contains one hit for 'trent' in that field, for that doc.

Mike
Re: The way to customize ranking?
You can use CustomScoreQuery to combine a scalar field value (e.g. the amount of the paid placement) with the textual relevancy. You can combine things any way you want, e.g. finalScore = textualScore + 1000.0 * scalarValue, or whatever makes sense. It sounds like you want some kind of step function, where if there is any scalar value it overwhelms the score. This could do that for you.
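A minimal sketch of just the combination formula (the class name and the 1000.0 boost are illustrative; in a real CustomScoreQuery this logic would live in the custom-score callback, not a standalone class):

```java
// Sketch: combine textual relevancy with a scalar "paid placement" value.
// Acts as a step function: any nonzero scalar dominates the textual score.
public class ScoreCombiner {
    // Assumed constant, chosen to be far larger than any plain text score.
    static final float BOOST = 1000.0f;

    public static float combine(float textualScore, float scalarValue) {
        // Paid docs sort above all organic docs; ties broken by text score.
        return textualScore + BOOST * scalarValue;
    }

    public static void main(String[] args) {
        System.out.println(combine(0.8f, 0.0f)); // organic doc: text only
        System.out.println(combine(0.2f, 1.0f)); // paid doc outranks it
    }
}
```

A paid document with a weak text match still lands above every organic result, which is the step-function behavior described above.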
FilterCache Memory consumption high
I've looked through documentation and postings and expect that a single filterCache entry should be approximately maxDoc/8 bytes. Our frequently updated index (replication every 3 minutes) has maxDoc ~= 23 million, so I'm figuring about 3 MB per entry. With cache size = 512 I expect something like 1.5 GB of RAM, but with the server in steady state after half an hour, it is 7 GB larger than without the cache. I could understand maybe a 2x difference, given the warming searcher, but 4x I don't understand. I do have maxWarmingSearchers = 2, but have never seen 2 searchers simultaneously being warmed. Ideas, anybody?
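The back-of-the-envelope math above, written out as a sketch (sizes are the ones from this post; real cache entries also carry per-object overhead beyond the raw bitset):

```java
// Estimate filterCache memory: one bitset entry is ~maxDoc/8 bytes
// (one bit per document in the index).
public class FilterCacheEstimate {
    public static long bytesPerEntry(long maxDoc) {
        return maxDoc / 8;
    }

    public static void main(String[] args) {
        long maxDoc = 23_000_000L;
        long entry = bytesPerEntry(maxDoc);   // 2,875,000 bytes, ~2.9 MB
        long cache = entry * 512;             // ~1.47 GB for 512 entries
        System.out.println("per entry: " + entry + " bytes");
        System.out.println("full cache: " + cache + " bytes");
    }
}
```

That yields roughly the 3 MB/entry and 1.5 GB totals quoted above, which is why the observed 7 GB delta is surprising.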
Memory Cost of group.cache.percent parameter
Does anyone have a clear understanding of how group caching achieves its performance improvements, memory-wise? "Percent" means percent of maxDoc, so it's a function of that, but is it a function of that *per* item in the cache (like the filterCache) or altogether? The speed improvement looks pretty dramatic for our maxDoc = 25M index, but it would be helpful to understand what the costs are.

Mike