Filter caching
Re-reading the documentation, it seems that Solr caches the results of the fq parameter, not lower-level field constraints. This would imply that breaking a single complex boolean filter into multiple conjunctive fq parameters would improve the odds of cache hits. Is this correct?

fq=(a:foo OR b:bar) AND c:bah

vs.

fq=(a:foo OR b:bar)&fq=c:bah

Thanks,
-Jess

--
Sent from my Android phone with K-9 Mail. Please excuse my brevity.
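For illustration, a minimal SolrJ sketch contrasting the two forms above (the *:* main query and the class name are placeholders, not from the thread):

    import org.apache.solr.client.solrj.SolrQuery;

    public class FilterCacheSketch {
        public static void main(String[] args) {
            // Single compound filter: one filterCache entry keyed on the whole expression.
            SolrQuery combined = new SolrQuery("*:*");
            combined.addFilterQuery("(a:foo OR b:bar) AND c:bah");

            // Split into conjunctive filters: each fq is cached and reused independently,
            // so a later query that shares only c:bah can still hit the filterCache.
            SolrQuery split = new SolrQuery("*:*");
            split.addFilterQuery("(a:foo OR b:bar)");
            split.addFilterQuery("c:bah");

            System.out.println(combined);
            System.out.println(split);
        }
    }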
Delete by query with soft commit
It appears that UpdateRequest.setCommitWithin is not honored when executing a delete-by-query against SolrCloud (SolrJ 4.6). However, setting the hard commit parameter functions as expected. Is this a known bug?

Thanks,
-Jess
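A sketch of the kind of call being described, against the SolrJ 4.x API (the ZooKeeper address, collection name, and delete query are placeholders, not from the thread):

    import org.apache.solr.client.solrj.impl.CloudSolrServer;
    import org.apache.solr.client.solrj.request.UpdateRequest;

    public class DeleteCommitWithinSketch {
        public static void main(String[] args) throws Exception {
            CloudSolrServer server = new CloudSolrServer("localhost:2181");
            server.setDefaultCollection("collection1");

            // Delete-by-query asking for a commit within 10 seconds; the report in
            // the thread is that this window is not honored for deletes, while an
            // explicit hard commit works as expected.
            UpdateRequest req = new UpdateRequest();
            req.deleteByQuery("type:stale");
            req.setCommitWithin(10000);
            req.process(server);

            server.shutdown();
        }
    }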
solrconfig.xml carrot2 params
Would someone help me out with the syntax for setting Tokenizer.documentFields in the ClusteringComponent engine definition in solrconfig.xml? Carrot2 is expecting a Collection of Strings. There's no schema definition for this XML file and a big TODO on the Wiki wrt init params. Every permutation I have tried results in an error stating: Cannot set java.util.Collection field ... to java.lang.String.

--
Sent from my Android phone with K-9 Mail. Please excuse my brevity.
Listing collection fields
I'd like to get the complete field list for a collection, including dynamic fields. Is issuing a Luke request still the recommended way for retrieving this data?

--
Sent from my Android phone with K-9 Mail. Please excuse my brevity.
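For reference, a minimal SolrJ sketch of a Luke request that lists the fields actually present in the index, including names created by dynamic-field patterns (the core URL and class name are placeholders):

    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.client.solrj.request.LukeRequest;
    import org.apache.solr.client.solrj.response.LukeResponse;

    public class ListIndexFieldsSketch {
        public static void main(String[] args) throws Exception {
            HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");

            LukeRequest luke = new LukeRequest();
            luke.setNumTerms(0);                 // skip per-field top-term stats
            LukeResponse rsp = luke.process(server);

            // Field names as they exist in the index, dynamic fields included.
            for (String field : rsp.getFieldInfo().keySet()) {
                System.out.println(field);
            }

            server.shutdown();
        }
    }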
Re: Listing collection fields
Thanks. I have an Xtext DSL doing some config and code generation downstream of the data ingestion. It probably wouldn't be that hard to generate a solrconfig.xml, but for now I just want to build in some runtime reconciliation to aid in dynamic query generation. It sounds like Luke is still the best approach.

Regards,
-Jess

Shalin Shekhar Mangar wrote:
>You can use the ListFields method in the new Schema API:
>
>https://cwiki.apache.org/confluence/display/solr/Schema+API#SchemaAPI-ListFields
>
>Note that this will return all configured fields but it doesn't tell
>you the actual dynamic field names in the index. I don't know if we
>have anything better than a luke request for that yet.
>
>On Tue, Nov 19, 2013 at 5:56 AM, youknow...@heroicefforts.net wrote:
>> I'd like to get the complete field list for a collection, including
>> dynamic fields. Is issuing a Luke request still the recommended way
>> for retrieving this data?
>>
>> --
>> Sent from my Android phone with K-9 Mail. Please excuse my brevity.
>
>--
>Regards,
>Shalin Shekhar Mangar.

--
Sent from my Android phone with K-9 Mail. Please excuse my brevity.
Re: Advice on an architecture with a lot of cores
"On the other hand, it [sic] most of the cores are idle most of the time, the 1 core/customer setup would be give better utilization of the hardware." This is an important point. I've seen performance go to hell when 10M, 100M, and 1B cloud collections were consolidated in a hardware constrained environment. The data belonged to the same customer and there were good reason for this approach. In our case, we were able to reduce our queries by n-1 (where n is the number of collections consolidated), but the overall query was slower; many seconds vs subsecond. You won't have that option, but maybe you are in a better place wrt hardware. The newer cloud routing may also play an important role here (maybe someone else could speak to that). As you alluded earlier, the query generation must be altered to generate a fq security clause (operator precedence is important here). If search performance is a vital part of your company's service offering, then it's definitely worth the money to collect representative queries and test on alternate hardware before committing your production environment. Cheers, -Jess On October 7, 2014 8:56:46 AM EST, Manoj Bharadwaj wrote: >Hi Toke, > >Thank you for your insights. > > >> Why do you want to collapse the cores? >> > >Most of the cores are small and a few big ones make up the bulk. Our >thinking was that it would be as easy to just have one core. Monitoring >becomes easy as well (we are using a monitoring tool in which there is >a >limit on the number of endpoints that can be monitored, and we are >considering other monitoring solutions including Sematext). > >Regards >Manoj -- Sent from my mobile. Please excuse my brevity.