RE: Question on multi-threaded faceting

2014-08-02 Thread Toke Eskildsen
Vamsee Yarlagadda [vam...@cloudera.com] wrote: "I filed https://issues.apache.org/jira/browse/SOLR-6314 to track this issue going forward. Any ideas around this problem?" Apparently the distributed faceting handling collapsed the duplicate fields, which the non-distributed handling did not. I guess your test cas

Re: SolrCloud Scale Struggle

2014-08-02 Thread Bill Bell
Seems like way overkill. Are you using /get at all? If you need the docs available right away, why? How about after 30 seconds? How many docs do you get added per second during peak? Even Google has a delay when you do Adwords. One idea is to have an empty core that you insert into and then shard i

Re: SolrCloud Scale Struggle

2014-08-02 Thread Bill Bell
Autocorrect not good. Corrected below. Bill Bell Sent from mobile > On Aug 2, 2014, at 11:11 AM, Bill Bell wrote: > > Seems way overkill. Are you using /get at all ? If you need the docs avail > right away - why ? How about after 30 seconds ? How many docs do you get > added per second duri

Re: SolrCloud Scale Struggle

2014-08-02 Thread anand.mahajan
Thank you everyone for your responses. I increased the hard commit interval to 10 mins and autoSoftCommit to 10 secs. (I won't really need a real-time get; I tweaked the app code to cache the doc and use the app-side cached version instead of fetching it from Solr.) Will watch it for a day or two and clock the th

Re: SolrCloud Scale Struggle

2014-08-02 Thread anand.mahajan
Thanks Shawn. I'm using 2-level composite ID routing right now. These are all used-car listings, and every search query includes car year and make in its criteria, hence it made sense to have Year+Make as level 1 of the composite ID. Beyond that, the second-level composite ID is based o
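A minimal Java sketch of how such two-level keys are shaped for Solr's compositeId router, which uses `!` to separate the routing levels from the unique document ID. The `yearMake` formatting (joining with `_`) and the second-level value `dealer42` are hypothetical, since the message is cut off before saying what level 2 is based on.

```java
// Sketch: build a two-level compositeId key of the form "level1!level2!docId".
// The "_" join in yearMake and any second-level value are hypothetical choices.
final class CompositeIds {
    private CompositeIds() {}

    /** Level-1 routing prefix from car year and make, e.g. (2010, "Land Rover") -> "2010_LandRover". */
    static String yearMake(int year, String make) {
        return year + "_" + make.replaceAll("\\s+", "");
    }

    /** Joins the two routing levels and the unique doc id with '!', the compositeId separator. */
    static String twoLevel(String level1, String level2, String docId) {
        for (String part : new String[] {level1, level2, docId}) {
            if (part.indexOf('!') >= 0) {
                // A stray '!' would shift the routing levels, so reject it up front.
                throw new IllegalArgumentException("'!' is reserved as the route separator: " + part);
            }
        }
        return level1 + "!" + level2 + "!" + docId;
    }
}
```

For example, `CompositeIds.twoLevel(CompositeIds.yearMake(2010, "Land Rover"), "dealer42", "listing-123")` yields `2010_LandRover!dealer42!listing-123`, so all listings sharing a year and make are routed by the same level-1 prefix.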

Re: Query on Facet

2014-08-02 Thread Umesh Prasad
You can use pivot faceting. https://wiki.apache.org/solr/SimpleFacetParameters#Pivot_.28ie_Decision_Tree.29_Faceting There is no index-time work required, and you can nest the facets at search time as per your needs. PS: It won't work with SolrCloud / a sharded index. SOLR-2894 is in progress if
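A minimal sketch of the request parameters for a nested pivot facet, assuming hypothetical `year` and `make` fields. `facet.pivot` takes a comma-separated field list whose order sets the nesting, and `rows=0` skips returning documents when only the counts are wanted.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Sketch: build the query string for a pivot (decision tree) facet request.
final class PivotFacetParams {
    private PivotFacetParams() {}

    static String build(String query, String... pivotFields) {
        Map<String, String> params = new LinkedHashMap<>(); // keeps parameter order stable
        params.put("q", query);
        params.put("rows", "0");                                  // only facet counts are needed
        params.put("facet", "true");
        params.put("facet.pivot", String.join(",", pivotFields)); // nesting follows field order
        StringJoiner qs = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            qs.add(e.getKey() + "=" + encode(e.getValue()));
        }
        return qs.toString();
    }

    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }
}
```

`PivotFacetParams.build("*:*", "year", "make")` produces `q=*%3A*&rows=0&facet=true&facet.pivot=year%2Cmake`; swapping the field order would nest years under makes instead.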

Re: Solr gives the same fieldnorm for two different-size fields

2014-08-02 Thread Umesh Prasad
What you really need is a covering-type match. I feel your use case fits this ordering: Score(exact match in order) > Score(exact match without order) > Score(non-exact match). Example query: a b c. Example docs: d1: a b c; d2: a c b; d3: c a b; d4: a b c d; d5: a b c d e
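One possible reading of that ordering, as a Java sketch that assigns a tier from whitespace-split tokens: 2 for an exact match in order, 1 for the same terms in another order, 0 otherwise. The tokenization and tier values are assumptions; inside Solr such tiers could be mapped to boosts rather than computed by a standalone function like this.

```java
import java.util.Arrays;

// Sketch: classify a document against a query into the three tiers above.
final class CoveringMatch {
    private CoveringMatch() {}

    /** 2 = exact match in order, 1 = exact match in another order, 0 = non-exact match. */
    static int tier(String query, String doc) {
        String[] q = query.trim().split("\\s+");
        String[] d = doc.trim().split("\\s+");
        if (Arrays.equals(q, d)) {
            return 2; // same terms, same order
        }
        String[] qs = q.clone();
        String[] ds = d.clone();
        Arrays.sort(qs);
        Arrays.sort(ds);
        return Arrays.equals(qs, ds) ? 1 : 0; // same terms in a different order, else non-exact
    }
}
```

With the example query `a b c`, d1 lands in tier 2, d2 and d3 in tier 1, and d4 and d5 in tier 0.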

Re: Searching words with spaces for word without spaces in solr

2014-08-02 Thread Umesh Prasad
I would suggest breaking the problem into smaller parts: 1. Identify variations (say, compound words) offline, where you can combine multiple sources to ensure much better quality. 2. Expand the user query at search time using your sources, so the query will become icecream OR (ice cream) (wi
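A minimal sketch of step 2, assuming step 1 produced a hypothetical map from each compound to its split form (for example `icecream` to `ice cream`). It rewrites either spelling in the user's query into the `icecream OR (ice cream)` form from the message and leaves other terms untouched.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Sketch: expand compound words in a user query using an offline-built map.
final class CompoundExpander {
    // compound -> split form, e.g. "icecream" -> "ice cream" (built offline, step 1)
    private final Map<String, String> compounds;

    CompoundExpander(Map<String, String> compounds) {
        this.compounds = compounds;
    }

    /** Step 2: rewrite the query so that either spelling matches. */
    String expand(String userQuery) {
        String[] t = userQuery.trim().toLowerCase(Locale.ROOT).split("\\s+");
        List<String> out = new ArrayList<>();
        for (int i = 0; i < t.length; i++) {
            String joined = i + 1 < t.length ? t[i] + t[i + 1] : null;
            if (joined != null && (t[i] + " " + t[i + 1]).equals(compounds.get(joined))) {
                out.add(clause(joined)); // user typed the split form, e.g. "ice cream"
                i++;                     // both tokens consumed
            } else if (compounds.containsKey(t[i])) {
                out.add(clause(t[i]));   // user typed the compound, e.g. "icecream"
            } else {
                out.add(t[i]);
            }
        }
        return String.join(" ", out);
    }

    private String clause(String compound) {
        return "(" + compound + " OR (" + compounds.get(compound) + "))";
    }
}
```

Both `icecream` and `Ice Cream` expand to `(icecream OR (ice cream))`; keeping the variation list offline means it can be curated from several sources without touching the index.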

Re: Bloom filter

2014-08-02 Thread Umesh Prasad
+1 to Guava's BloomFilter implementation. You can actually hook into the UpdateProcessor chain and put the logic for updating and checking the Bloom filter there. We had a somewhat similar use case. We were using DIH, and it was possible that the same Solr input document (meaning the same content) would be coming l
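The message recommends Guava's `BloomFilter`; the sketch below is instead a hand-rolled, stdlib-only stand-in (a `BitSet` plus double hashing over an MD5 digest of the content) that shows the check-then-add logic an update processor could run per document. The class name and sizing are hypothetical, and a hit only means "possibly seen before", because Bloom filters allow false positives but never false negatives.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.BitSet;

// Sketch: minimal Bloom filter for spotting repeated document content (stand-in for Guava's).
final class ContentBloomFilter {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    ContentBloomFilter(int numBits, int numHashes) {
        this.bits = new BitSet(numBits);
        this.numBits = numBits;
        this.numHashes = numHashes;
    }

    /** Returns true if the content was possibly seen before, then records it. */
    boolean checkAndAdd(String content) {
        long[] h = md5Halves(content);
        boolean maybeSeen = true;
        for (int i = 0; i < numHashes; i++) {
            // Double hashing: the i-th bit index is h1 + i * h2 (mod numBits).
            int idx = (int) Math.floorMod(h[0] + i * h[1], (long) numBits);
            if (!bits.get(idx)) {
                maybeSeen = false; // at least one bit was unset, so definitely new
                bits.set(idx);
            }
        }
        return maybeSeen;
    }

    /** Splits the 128-bit MD5 digest into two 64-bit hash values. */
    private static long[] md5Halves(String s) {
        try {
            byte[] d = MessageDigest.getInstance("MD5").digest(s.getBytes(StandardCharsets.UTF_8));
            long a = 0;
            long b = 0;
            for (int i = 0; i < 8; i++) {
                a = (a << 8) | (d[i] & 0xff);
                b = (b << 8) | (d[i + 8] & 0xff);
            }
            return new long[] {a, b};
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 must be available on every Java platform", e);
        }
    }
}
```

On a "possibly seen" result, a processor could fall back to an exact lookup before dropping the document, so a false positive never silently loses data.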

Re: Shuffle results a little

2014-08-02 Thread Umesh Prasad
What you are looking for is a distribution of search results. One way would be a two-phase search. Phase 1: Search (with rows=0, no scoring, no grouping): 1. Find the groups (unique combinations) using pivot facets (won't work in a distributed environment yet). 2. Transform those groups into group.queries. Pha
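A sketch of step 2 under stated assumptions: hypothetical `brand` and `color` fields, with the value combinations already read out of the phase-1 pivot facet response. Each returned string would be sent as its own `group.query` parameter alongside `group=true`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringJoiner;

// Sketch: turn pivot-facet value combinations into group.query strings.
final class GroupQueryBuilder {
    private GroupQueryBuilder() {}

    /** One query per combination, e.g. {"acme", "red"} -> brand:"acme" AND color:"red". */
    static List<String> groupQueries(String[] fields, List<String[]> combos) {
        List<String> out = new ArrayList<>();
        for (String[] combo : combos) {
            if (combo.length != fields.length) {
                throw new IllegalArgumentException("each combination needs one value per field");
            }
            StringJoiner clause = new StringJoiner(" AND ");
            for (int i = 0; i < fields.length; i++) {
                // Quote the value and escape backslashes and quotes so multi-word values stay intact.
                String escaped = combo[i].replace("\\", "\\\\").replace("\"", "\\\"");
                clause.add(fields[i] + ":\"" + escaped + "\"");
            }
            out.add(clause.toString());
        }
        return out;
    }
}
```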

Re: To warm the whole cache of Solr other than the only autowarmcount

2014-08-02 Thread Umesh Prasad
@Eric: As you said, each use case is different. We actually autowarm our caches to 80%, and we have a 99% hit ratio on the filter cache. For the query cache, hit ratios are around 25%, but given that a cache hit saves us about 10X, we strive to increase the cache hit ratio. @Yang: You can't do a direct copy of va
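A rough worked estimate of why even a 25% query-cache hit ratio pays off, assuming a miss costs C, a hit costs roughly C/10 (one reading of "saves us about 10X"), h is the hit ratio, and cache maintenance overhead is ignored:

```latex
\begin{aligned}
\mathbb{E}[T] &= h \cdot \frac{C}{10} + (1 - h)\,C = C\,(1 - 0.9\,h) \\
h = 0.25 &\;\Rightarrow\; \mathbb{E}[T] = 0.775\,C \\
h = 0.35 &\;\Rightarrow\; \mathbb{E}[T] = 0.685\,C
\end{aligned}
```

Under these assumptions, each extra 10 points of hit ratio cuts the average query cost by about 0.09 C.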

Re: Implementing custom analyzer for multi-language stemming

2014-08-02 Thread Umesh Prasad
Also, take a look at the Lucid Revolution talk "Typed Index" https://www.youtube.com/watch?v=X93DaRfi790 (published on 25 Nov 2013), presented by Christoph Goller, Chief Scientist, IntraFind Software AG: If you want to search in a multilingual environment with high-quality language-specific word-no

Re: Identify specific document insert error inside a solrj batch request

2014-08-02 Thread Umesh Prasad
Solr schema over REST: https://wiki.apache.org/solr/SchemaRESTAPI https://cwiki.apache.org/confluence/display/solr/Schema+API You can use that to get the required fields and validate them on the client side. On 31 July 2014 14:32, Liram Vardi wrote: > Hi Jack, > Thank you for your reply. > This
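A minimal sketch of that client-side check, assuming the set of required field names has already been read from the Schema API's field definitions. Plain `Map`s stand in for `SolrInputDocument`s to keep it self-contained; the returned indices point at the documents in the batch that would be rejected for a missing required field.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Sketch: find documents in a batch that lack a required field, before sending the batch.
final class RequiredFieldCheck {
    private RequiredFieldCheck() {}

    /** Returns the batch indices of documents missing at least one required field. */
    static List<Integer> invalidDocs(List<? extends Map<String, ?>> batch, Set<String> requiredFields) {
        List<Integer> bad = new ArrayList<>();
        for (int i = 0; i < batch.size(); i++) {
            Map<String, ?> doc = batch.get(i);
            for (String field : requiredFields) {
                if (doc.get(field) == null) { // absent (or null) value for a required field
                    bad.add(i);
                    break;                    // one missing field is enough to flag the doc
                }
            }
        }
        return bad;
    }
}
```

Invalid documents can then be logged or sent individually while the rest of the batch goes through, instead of one bad document failing the whole request.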

Re: SolrCloud Scale Struggle

2014-08-02 Thread Shawn Heisey
On 8/2/2014 2:46 PM, anand.mahajan wrote: > Also, since there are already 18 JVMs per machine - How do I go about > merging these existing cores under just 1 JVM? Would it be that I'd need to > create 1 Solr instance with 18 cores inside and then migrate data from these > separate JVMs into the new