Rolling Deploys and SolrCloud

2012-12-11 Thread Mike Schultz
Does anybody have any experience with "rolling deployments" and SolrCloud?

We have a production environment where we deploy new software and config
simultaneously to individual servers in a rolling manner.  At any point
during the deployment, there may be N boxes with old software/config and M
boxes with new software/config.  Eventually N=0 (no more old
software/config) and M=100% (all boxes have new software/config).  This is
very convenient because one knows that the config that the new software
requires is present, but it is not present for old software on other boxes. 
We can maintain 100% uptime for the service using this technique.

If I'm understanding the role that ZooKeeper plays in SolrCloud correctly,
this no longer works.  If config lives in ZooKeeper, then it's all or
nothing: all old config or all new.  If this is true, it presents a bunch of
new challenges for deploying software.

So to ask a concrete question: is it possible to not use ZooKeeper for config
distribution, i.e. to keep the config local to each shard?

Mike Schultz



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Rolling-Deploys-and-SolrCloud-tp4026212.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Rolling Deploys and SolrCloud

2012-12-12 Thread Mike Schultz
Ok, that makes sense and it's probably workable, but it's still more awkward
than having code and configuration deployed together to individual machines.

For example, a deploy of new software/config means we need to 1) first upload
the config to ZooKeeper, then 2) deploy the new software to the nodes.

What about the span of time between 1) and 2)?  If a box bounces during this
time it will come up with the wrong config.  Or what if 2) goes awry and
some boxes succeed while others fail?  It could be very complicated to
recover from that.

Another use case: I may want to push new software/config to a single box for
a smoke test before rolling it out to all nodes in production (some might
call this testing in production, but it's just real-world safety).

I guess at the end of the day, what I don't understand is: given that I need
to roll new software bits to individual nodes anyway, what good does keeping
config in ZooKeeper do for me?  Why not just keep the config with the
software and roll them at the same time?





Solr 4.0 doesn't send qt parameter to shards

2013-01-18 Thread Mike Schultz
Can someone explain the logic of not sending the qt parameter down to the
shards?

I see from here that qt is handled as a special case for Result Grouping:
http://lucidworks.lucidimagination.com/display/solr/Result+Grouping
where there is a special shards.qt parameter.

In 3.x, solrconfig.xml supports defining a list of SearchComponents on a
handler-by-handler basis.  This flexibility goes away if qt isn't passed
down, or am I missing something?

I'm using the default select handler (http://localhost:8983/solr/select)
and modifying query processing by varying only the query parameters.





Re: n values in one fieldType

2013-01-18 Thread Mike Schultz
It depends on what kind of behavior you're looking for.

If for your queries the order of the 6 integer values doesn't matter, you
could use a multiValued integer field, something like:

<field name="myValues" type="int" indexed="true" stored="true" multiValued="true"/>

Then you could query with ORed or ANDed integer values over that field.

If the order matters but you always query on the "set" of 6 values, then you
could turn your six integers into a GUID, or simply hex-encode them into a
singleValued string field.

Another possibility is to hex-encode the integers, separate them with
whitespace, and whitespace-tokenize.  Then you get a mixture of the two
above, but you can also specify some locality constraints, e.g. using phrase
queries, etc.
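To make the hex-encoding idea concrete, here's a small sketch (the field
value layout and fixed token width are illustrative, not anything
Solr-specific; it assumes non-negative integers):

```python
def encode_ints(values, width=8):
    """Hex-encode each integer to a fixed width and join with spaces,
    so a whitespace tokenizer yields one token per value, in order."""
    return " ".join(format(v, "0%dx" % width) for v in values)

def decode_ints(text):
    """Recover the original integers from the encoded field value."""
    return [int(tok, 16) for tok in text.split()]

# Index-time value for a document with six integers:
field_value = encode_ints([10, 255, 3, 3, 42, 7])
# "0000000a 000000ff 00000003 00000003 0000002a 00000007"
```

A phrase query over two adjacent tokens then expresses a locality
constraint (value i immediately followed by value i+1).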

The answer really depends on the types of queries you need to be able to
respond to.

M





Adding replacement node to 4.x cluster with down node

2013-02-04 Thread Mike Schultz
I have a 4.x cluster that is 10 shards wide and 4 replicas deep.  One of the
nodes of a shard went down.  I provisioned a replacement node and introduced
it into the cluster, but it ended up on a random shard, not the shard with
the downed node.

Is there a maintenance step I need to perform before introducing a node?  I
assume that if I had somehow removed the downed node from the ZooKeeper
database, so that I no longer saw the grayed-out representation of it in the
cloud display, the new node would have ended up in the right place...





Re: Anyone else see this error when running unit tests?

2013-02-04 Thread Mike Schultz
Yes.  Just today actually.  I had some unit tests based on
AbstractSolrTestCase which worked in 4.0, but in 4.1 they would fail
intermittently with that error message.  The key to this behavior is found
by looking at the code in the Lucene class TestRuleSetupAndRestoreClassEnv.
I don't understand it completely, but there are a number of random code paths
through there.  The following helped me get around the problem, at least in
the short term.

@org.apache.lucene.util.LuceneTestCase.SuppressCodecs({"Lucene3x","Lucene40"})
public class CoreLevelTest extends AbstractSolrTestCase {

I also needed to call this inside my setUp() method; in 4.0 this wasn't
required:
initCore("solrconfig.xml", "schema.xml", "/tmp/my-solr-home");





Re: Adding replacement node to 4.x cluster with down node

2013-02-05 Thread Mike Schultz
Just to clarify: I want to be able to replace the down node with a host that
has a different name.  If I were repairing and replacing that particular
machine, there would be no problem.  But I don't have control over the name
of my replacement machine.





Solr4x: Separate Indexer and Query Instances for Performance

2013-03-05 Thread Mike Schultz
Solr 3.x had a master/slave architecture, which meant that indexing did not
happen in the same process as querying, in fact normally not even on the
same machine.  The slave only needed to copy down snapshots of the new
index files and commit them.  Great isolation for maximum query performance
and indexing performance.  In Solr 4.x this is gone.  Does anyone have any
answers or tuning approaches to address this?

We have a high-query-load, high-indexing-load environment.  I see TP99 query
latency go from under 100 ms to 4-10 seconds during indexing.  Even TP90
hits 2 seconds.  Looking at GC in VisualVM, I see a pretty sawtooth pattern
turn into a scraggly forest when indexing happens and the eden space gets
burned through.

It seems like one approach is to have the shard leaders replicate (a la 3.x)
to their replicas instead of sending them the document stream.  I know the
replicas do that when they get "too far behind", so this would simply mean
always doing that at some given interval.  This would make it possible to
put only replicas into a query load balancer.  In the event of a leader
failure, a replica would be promoted and you'd have to deal with it, but
it'd be no worse than what is now steady-state in standard 4.x.

Another approach might be to have separate Solr instances point to the same
index directory.  One instance would be used for indexing and tuned for
that, the other tuned for querying.  It's not like having the operations on
separate machines as in 3.x, but it still would be better isolation than
standard 4.x.  Would this at least work in theory, if say the query instance
started up a new IndexSearcher when necessary?

Any insight, advice or experience on this appreciated.

Mike





Index vs. Query Time Aware Filters

2011-05-31 Thread Mike Schultz
We have very long schema files for each of our language-dependent query
shards.  One thing that is doubling the configuration length of our main
text-processing field definition is that we have to repeat the exact same
filter chain for the query-time version EXCEPT with a queryMode=true
parameter.

Is there a way for a filter to figure out whether it's the index-time vs.
query-time version?

A similar wish would be for the filter to be able to figure out the name of
the field currently being indexed.  This would allow a filter to set a
parameter at runtime based on field name, instead of boilerplate-copying the
same filter chain definition in schema.xml EXCEPT for one parameter.  The
motivation is again to reduce errors and increase the readability of the
schema file.





Re: Index vs. Query Time Aware Filters

2011-06-01 Thread Mike Schultz
I should have explained that the queryMode parameter is for our own custom
filter.  So the result is that we have 8 filters in our field definition.
All the filter parameters (30 or so) of the query-time and index-time chains
are identical EXCEPT for our one custom filter, which needs to know whether
it's in query-time or index-time mode.  If we could determine inside our
custom code whether we're indexing or querying, then we could omit the
query-time definition entirely, save about 50 lines of configuration, and be
much less error prone.

One possible solution would be if we could get at the SolrCore from within a
filter.  Then at init time we could iterate through the filter chains and
determine which chain we're in when we find a factory == this.  (I've done
this in other places where it's useful to know the name of a
ValueSourceParser, for example.)



queryResultCache not checked with fieldCollapsing

2012-07-13 Thread Mike Schultz
I have an index with field collapsing defined like this:

<lst name="defaults">
  <str name="group.field">SomeField</str>
  <str name="group">true</str>
  <str name="group.main">true</str>
</lst>

When I run dismax queries I see there are no lookups in the
queryResultCache.  If I remove the field collapsing, lookups happen.  I
can't find any mention of this anywhere, or think of a reason why this
should disable caching.  I've tried playing with the group.cache.percent
parameter, but that doesn't seem to play a role here.

Anybody know what's going on here?

Mike



Custom Hit Collector

2012-07-14 Thread Mike Schultz
As far as I can tell, using field collapsing prevents the queryResultCache
from being checked.  It's important for our application to have both.  There
are threads on incorporating custom hit collectors, which seems like it
could be a way to implement the simplified collapsing I need (just deduping
based on a fieldCache value) while still consulting the queryResultCache.

Does anyone know the state of being able to incorporate a custom hit
collector, say, in 4.0?  Or, probably better, how to get caching to work
with field collapsing?

Mike



Re: Understanding SOLR search results

2012-08-27 Thread Mike Schultz
Can you include the entire text of just the titolo field?

1.0 = tf(termFreq(titolo:trent)=1) means the index contains one hit for
'trent' in that field, for that doc.

Mike





Re: The way to customize ranking?

2012-08-27 Thread Mike Schultz
You can use CustomScoreQuery to combine a scalar field value (e.g. the
amount of the paid placement) together with the textual relevancy.  You can
combine things any way you want, e.g.

finalScore = textualScore + 1000.0 * scalarValue

Or whatever makes sense.  It sounds like you want some kind of step
function, where if there is any scalar value, it overwhelms the score.
This could do that for you.
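As a sketch of what that combination does (the 1000.0 weight is arbitrary;
pick it larger than any achievable textual score, and the example numbers
are made up):

```python
def combined_score(textual_score, scalar_value, weight=1000.0):
    """Combine textual relevancy with a scalar field value.  With a large
    enough weight, any nonzero scalar acts like a step function: every
    boosted doc outranks every unboosted one, and boosted docs are still
    ordered among themselves by their textual score."""
    return textual_score + weight * scalar_value

organic = combined_score(7.5, 0.0)  # scores on text alone: 7.5
paid = combined_score(2.0, 1.0)     # 1002.0, above any organic result
```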






FilterCache Memory consumption high

2012-09-17 Thread Mike Schultz
I've looked through documentation and postings and expect that a single
filterCache entry should be approximately maxDoc/8 bytes.

Our frequently updated index (replication every 3 minutes) has maxDoc ~= 23
million.

So I'm figuring about 3MB per entry.  With cacheSize=512 I expect something
like 1.5GB of RAM, but with the server in steady state after half an hour,
it is 7GB larger than without the cache.

I can understand maybe a 2x difference, given the warming searcher, but 4x I
don't understand.

I do have maxWarmingSearchers=2, but have never seen 2 searchers
simultaneously being warmed.
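For reference, the arithmetic behind that expectation (one bit per document
per cached filter bitset):

```python
max_doc = 23 * 1000 * 1000      # ~23 million documents
bytes_per_entry = max_doc // 8  # one bit per doc in a filter bitset
cache_size = 512
expected_bytes = bytes_per_entry * cache_size

print(bytes_per_entry)          # 2875000 bytes, ~2.9 MB per entry
print(expected_bytes / 2**30)   # ~1.37 GiB for a full cache
```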

Ideas anybody?







Memory Cost of group.cache.percent parameter

2012-10-10 Thread Mike Schultz
Does anyone have a clear understanding of how group caching achieves its
performance improvements, memory-wise?  Percent means percent of maxDoc, so
it's a function of that, but is it a function of that *per* item in the
cache (like the filterCache), or altogether?  The speed improvement looks
pretty dramatic for our maxDoc=25M index, but it would be helpful to
understand what the costs are.

Mike


