collection aliasing,solrctl

2016-01-20 Thread vidya
Hi, I am using Solr with the Cloudera distribution to index data from HDFS, and I am using the "solrctl" utility for my deployment. Now I want to create a collection alias. How can I create a collection alias from the command line? From Google I got: " /admin/collections?action=CREATE "
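For readers landing on this question: the Solr Collections API has a CREATEALIAS action that takes a `name` and a comma-separated `collections` list. The sketch below only builds the request URL. The host, port, alias name, and collection name are placeholders, not values from the original post, and whether `solrctl` wraps this call is not stated in the thread.

```python
from urllib.parse import urlencode


def create_alias_url(base_url, alias, collections):
    """Build a Collections API CREATEALIAS request URL.

    base_url, alias and collections are illustrative inputs; substitute
    the values for your own cluster.
    """
    params = {
        "action": "CREATEALIAS",
        "name": alias,
        # The API expects one comma-separated list of collection names.
        "collections": ",".join(collections),
    }
    return f"{base_url.rstrip('/')}/admin/collections?{urlencode(params)}"


# Hypothetical host and names, for illustration only.
url = create_alias_url("http://localhost:8983/solr", "logs_current", ["logs_2016_01"])
print(url)
```

The resulting URL can be requested with any HTTP client (a browser or curl, for example) against a node of the SolrCloud cluster.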

Re: Scaling SolrCloud

2016-01-20 Thread Erick Erickson
bq: 3 are too risky, you lose one you lose quorum Typo? You need to lose two. On Wed, Jan 20, 2016 at 6:25 AM, Yago Riveiro wrote: > Our Zookeeper cluster is an ensemble of 5 machines, is a good starting point, > 3 are too risky, you lose one you lose quorum and with 7 sync cost increase. > >
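The arithmetic behind this correction can be sketched as follows. ZooKeeper needs a strict majority of the ensemble to be up, so the number of failures an ensemble survives is the ensemble size minus the majority size. The function names are illustrative only.

```python
def quorum_size(ensemble_size):
    """Smallest strict majority of the ensemble."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size):
    """Number of nodes that can fail while a majority is still up."""
    return ensemble_size - quorum_size(ensemble_size)


for n in (3, 5, 7):
    print(f"ensemble={n} quorum={quorum_size(n)} tolerates={tolerated_failures(n)}")
```

A 3-node ensemble therefore keeps quorum after losing one node and loses it only when a second node fails; a 5-node ensemble survives two failures.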

Re: Using Solr's spatial functionality for astronomical catalog

2016-01-20 Thread david.w.smi...@gmail.com
Hello Colin, If the spatial field you use is the SpatialRecursivePrefixTreeFieldType one (RPT for short) with geo="true" then the circle shape (i.e. point-radius filter) implied by the geofilt Solr QParser is on a sphere. That is, it uses the "great circle" distance computed using the Haversine f
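As a rough illustration of the great-circle distance mentioned here, the following is a plain Haversine computation on a unit sphere, returning the angular separation in degrees. It is a standalone sketch, not Solr's or Spatial4j's code, and treating longitude/latitude as right ascension/declination is an assumption made for the astronomical use case.

```python
import math


def haversine_degrees(lon1, lat1, lon2, lat2):
    """Angular separation in degrees between two points on a sphere.

    Inputs are in degrees: longitude (or right ascension) and
    latitude (or declination).
    """
    lon1, lat1, lon2, lat2 = map(math.radians, (lon1, lat1, lon2, lat2))
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(a))))


print(haversine_degrees(0.0, 0.0, 90.0, 0.0))
```

Multiplying the angle in radians by a sphere radius gives a distance; for a sky catalog the angle itself is usually the quantity of interest.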

Re: schemaless vs schema based core

2016-01-20 Thread Erick Erickson
I would really avoid schemaless in _any_ situation where I know the schema ahead of time. bq: But in my case, I am planning to use solrj (so, no spelling mistakes) Oh, I'm quite sure there'll be some kind of mistake sometime ;) I know of at least one situation where a programming mistake in So

Re: FieldCache

2016-01-20 Thread Yonik Seeley
On Thu, Jan 14, 2016 at 2:43 PM, Lewin Joy (TMS) wrote: > Thanks for the reply. > But, the grouping on multivalued is working for me even with multiple data in > the multivalued field. > I also tested this on the tutorial collection from the later solr version > 5.3.1 , which works as well. Old

Re: Solrcloud getting warning "missed update"

2016-01-20 Thread Mugeesh Husain
Hello, I am sharing an image of the warning; please take a look. Does anyone have an idea what causes the warning above? -- View this message in context: http://lucene.472066.n3.nabble.com/Solrcloud-getting-warning-missed-update-tp4251556p4252110.html

Re: solr score threashold

2016-01-20 Thread Doug Turnbull
What problem are you trying to solve? If you're trying to cut out "bad" results, I might suggest explicitly using filters that eliminate undesirable search items in terms that are meaningful to how your users evaluate relevance. For example, let's say your users only want items that have at least

Re: Returning all documents in a collection

2016-01-20 Thread Joel Bernstein
The limitations of the /export handler should already be documented. Lots of documentation still to do for Solr 6 around Streaming Expressions and some left to do on SQL. The SQL interface in Solr 6 can also select and sort entire result sets as it's built on top of the Streaming API. Joel Bernste

Re: solr score threashold

2016-01-20 Thread Walter Underwood
The ScoresAsPercentages page is not really instructions for how to normalize scores. It is an explanation of why a score threshold does not do what you want. Don’t use thresholds. If you want thresholds, you will need a search engine with a probabilistic model, like Verity K2. Those generally gi

Re: schemaless vs schema based core

2016-01-20 Thread Shawn Heisey
On 1/20/2016 10:17 AM, Prateek Jain J wrote: What all I could gather from various blogs is, defining schema stops developers from accidentally adding fields to solr. But in my case, I am planning to use solrj (so, no spelling mistakes). My point is: 1. Is there any advantage like performa

schemaless vs schema based core

2016-01-20 Thread Prateek Jain J
Hi, I have just started to play around with Solr's capabilities. I have a basic question (I couldn't get a clear answer by searching the internet). I am working on an application which has very basic query requirements, like searching on uniqueID or date range (no faceting or NLP), and I have all the in

Re: FileBased Spellcheck on Solr cloud

2016-01-20 Thread Binoy Dalal
One thing you could do is index your entire spell check file into Lucene as string values. That way your index will be available across the cloud and you can build your dictionary from the indexed field. This will however mean that every time you change the spellcheck file, you will need to do reind

Re: Solr UninvertingReader getNumericDocValues doesn't seem to work for fields that are not stored or indexed

2016-01-20 Thread Yonik Seeley
On Wed, Jan 20, 2016 at 10:19 AM, plbarrios wrote: > Joel, > > Thank you for the reply! > > This approach solved my problem. > > Now should I be concerned about the 32 bits that are lost in converting the > long to an int? Also, is this the intended approach when using > NumericDocValues? If the

Re: Returning all documents in a collection

2016-01-20 Thread Jack Krupansky
It would be nice to have an explicit section in the doc on the topic of "Dealing with Large Result Sets" to point people to the various approaches (paging, caching, export, streaming expressions) and how to select the best one for a given use case. (And Joel is going to promise to update the doc

Re: Solr UninvertingReader getNumericDocValues doesn't seem to work for fields that are not stored or indexed

2016-01-20 Thread plbarrios
Joel, Thank you for the reply! This approach solved my problem. Now should I be concerned about the 32 bits that are lost in converting the long to an int? Also, is this the intended approach when using NumericDocValues? -- View this message in context: http://lucene.472066.n3.nabble.com/Sol
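On the question about the lost 32 bits: assuming the conversion is a Java-style narrowing cast from `long` to `int` (the snippet does not show the actual code), the cast keeps only the low 32 bits, so it is lossless exactly when the stored value already fits in the signed 32-bit range. The sketch below emulates that cast in Python; the function names are illustrative.

```python
INT_MIN = -(1 << 31)
INT_MAX = (1 << 31) - 1


def java_int_cast(value):
    """Emulate Java's (int) cast of a long: keep the low 32 bits, signed."""
    low = value & 0xFFFFFFFF
    return low - (1 << 32) if low >= (1 << 31) else low


def fits_in_int(value):
    """True when narrowing the value to 32 bits changes nothing."""
    return INT_MIN <= value <= INT_MAX


for v in (42, -1, INT_MAX, INT_MAX + 1, (1 << 32) + 5):
    print(v, java_int_cast(v), fits_in_int(v))
```

If the field only ever holds values in the 32-bit range, nothing is lost by the cast; values outside that range wrap around.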

FileBased Spellcheck on Solr cloud

2016-01-20 Thread Riyaz
Hi, *Environment: * Solr-4.10.4 tomcat6 Solr Cloud - 6 shards and 6 replicas with external zookeeper ensemble We are configuring the file-based spellcheck component on Solr Cloud. The source file for dictionary generation has 5 million text entries. Since the solr configurations(including spellings

Re: Solr trying to auto-update schema.xml

2016-01-20 Thread Bob Lawson
Thanks, I was using an invalid field type. All is good now. Thanks Hoss and Eric. You guys are the best! On Tue, Jan 19, 2016 at 6:47 PM, Chris Hostetter wrote: > > : Thanks, very helpful. I think I'm on the right track now, but when I do > a > : post now and my UpdateRequestProcessor extens

Re: Returning all documents in a collection

2016-01-20 Thread Joel Bernstein
CloudSolrStream is available in Solr 5. The "search" streaming expression can be used, or CloudSolrStream can be used directly. https://cwiki.apache.org/confluence/display/solr/Streaming+Expressions The export handler does not export stored fields though. It only exports fields using DocValues cac

Re: Scaling SolrCloud

2016-01-20 Thread Yago Riveiro
Our ZooKeeper cluster is an ensemble of 5 machines, which is a good starting point: 3 are too risky (you lose one, you lose quorum), and with 7 the sync cost increases. The ZK cluster is on machines without IO and with rotational HDDs (do not use SSDs to gain IO performance; ZooKeeper is optimized for spinning disks).

Re: Returning all documents in a collection

2016-01-20 Thread Salman Ansari
Thanks Emir, Susheel and Jack for your responses. Just to update, I am using Solr Cloud plus I want to get the data completely without pagination or cursor (I mean in one shot). Is there a way to do this in Solr? Regards, Salman On Wed, Jan 20, 2016 at 4:49 PM, Jack Krupansky wrote: > Yes, Expo

Re: Rolling upgrade to 5.4 from 5.0 - "bug" caused by leader changes - is there a workaround?

2016-01-20 Thread Michael Joyner
Unfortunately, it really couldn't wait. I did a rolling upgrade to the 5.4.1RC2 then downgraded everything to 5.4.0 and so far everything seems fine. Couldn't take the cluster down. On 01/19/2016 05:03 PM, Anshum Gupta wrote: If you can wait, I'd suggest to be on the bug fix release. It shou

Re: Returning all documents in a collection

2016-01-20 Thread Jack Krupansky
Yes, Exporting Result Sets is the preferred and recommended technique for returning all documents in a collection, or even simply for queries that select a large number of documents, all of which are to be returned. It uses efficient streaming rather than paging. But... this great feature current

Re: Returning all documents in a collection

2016-01-20 Thread Susheel Kumar
Hello Salman, Please check out the export functionality https://cwiki.apache.org/confluence/display/solr/Exporting+Result+Sets Thanks, Susheel On Wed, Jan 20, 2016 at 6:57 AM, Emir Arnautovic < emir.arnauto...@sematext.com> wrote: > Hi Salman, > You should use cursors in order to avoid "deep pag
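A request to the export handler can be sketched as below. The handler expects both a `sort` and an `fl` parameter, and, as noted elsewhere in this thread, it only returns fields backed by DocValues. The host, collection name, and field names are placeholders, not values from the thread.

```python
from urllib.parse import urlencode


def export_url(base_url, collection, fields, sort, query="*:*"):
    """Build an /export request URL for a collection.

    Both sort and fl are mandatory for the export handler, and the
    listed fields need DocValues enabled in the schema.
    """
    params = {"q": query, "sort": sort, "fl": ",".join(fields)}
    return f"{base_url.rstrip('/')}/{collection}/export?{urlencode(params)}"


# Hypothetical collection and fields, for illustration only.
url = export_url("http://localhost:8983/solr", "catalog", ["id", "ra"], "id asc")
print(url)
```

The response is streamed in sorted order, which is what lets the handler return very large result sets without the cost of deep paging.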

Re: Scaling SolrCloud

2016-01-20 Thread Troy Edwards
Thank you for sharing your experiences/ideas. Yago, since you have 8 billion documents over 500 collections, can you share what/how you do index maintenance (e.g. add field)? And how are you loading data into the index? Any experiences around how the ZooKeeper ensemble behaves with so many collections?

Re: solr score threashold

2016-01-20 Thread Emir Arnautovic
Hi Sara, You can use a function query and frange to achieve what you need, but note that scores are not normalized, meaning a score of 8 does not mean it is a good match; it is just the best match. There are examples online of how to normalize scores (e.g. http://wiki.apache.org/lucene-java/ScoresAsPercentages). Other approach
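One common way to express the frange idea is a filter query that applies a lower bound to the score of the main query, written as `{!frange l=N}query($q)`. The sketch below only assembles the request parameters; the query text, field list, and cutoff value are illustrative, and the caveat above still applies: a fixed cutoff on unnormalized scores rarely means the same thing from one query to the next.

```python
from urllib.parse import urlencode


def score_floor_params(user_query, min_score):
    """Request parameters that keep only documents scoring >= min_score."""
    return {
        "q": user_query,
        # frange filters on the value of a function; query($q) evaluates
        # to the score of the main query for each document.
        "fq": "{!frange l=%s}query($q)" % min_score,
        "fl": "id,score",
    }


params = score_floor_params("title:solr", 4)
print(urlencode(params))
```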

solr score threashold

2016-01-20 Thread sara hajili
Hi all, I want to know about the Solr search relevancy score threshold. Can I change it? I mean, imagine that when searching I get this result: doc1 score=8, doc2 score=6.4, doc3 score=6, doc8 score=5.5, doc5 score=2. I want to change the Solr score threshold, so that I set the threshold to, for example, >4 and

Re: Returning all documents in a collection

2016-01-20 Thread Emir Arnautovic
Hi Salman, You should use cursors in order to avoid "deep paging issues". Take a look at https://cwiki.apache.org/confluence/display/solr/Pagination+of+Results. Regards, Emir -- Monitoring * Alerting * Anomaly Detection * Centralized Log Management Solr & Elasticsearch Support * http://semate
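The cursor approach can be sketched as a loop: start with `cursorMark=*`, sort on a clause that includes the uniqueKey field, and keep requesting pages with the returned `nextCursorMark` until it stops changing. In the sketch below, `fetch_page` stands in for whatever HTTP client issues the request, and `make_fake_solr` is a purely hypothetical in-memory stub so the loop can run without a server; `id` as the uniqueKey is also an assumption.

```python
def fetch_all(fetch_page, rows=100, sort="id asc"):
    """Collect every document by following cursorMark until it repeats."""
    cursor = "*"
    docs = []
    while True:
        params = {"q": "*:*", "rows": rows, "sort": sort, "cursorMark": cursor}
        response = fetch_page(params)
        docs.extend(response["response"]["docs"])
        next_cursor = response["nextCursorMark"]
        if next_cursor == cursor:
            # An unchanged cursor means there are no more results.
            break
        cursor = next_cursor
    return docs


def make_fake_solr(all_ids):
    """Hypothetical stub that mimics the cursor contract in memory."""
    def fetch_page(params):
        start = 0 if params["cursorMark"] == "*" else int(params["cursorMark"])
        page = all_ids[start:start + params["rows"]]
        return {
            "response": {"docs": [{"id": i} for i in page]},
            "nextCursorMark": str(start + len(page)),
        }
    return fetch_page


docs = fetch_all(make_fake_solr(["a", "b", "c", "d", "e"]), rows=2)
print([d["id"] for d in docs])
```

Each page costs about the same regardless of how far into the result set it is, which is the point of cursors compared with large `start` offsets.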

Returning all documents in a collection

2016-01-20 Thread Salman Ansari
Hi, I am looking for a way to return all documents from a collection. Currently, I am restricted to specifying the number of rows using Solr.NET but I am looking for a better approach to actually return all documents. If I specify a huge number such as 1M, the processing takes a long time. Any fe

Re: Position increment in WordDelimiterFilter.

2016-01-20 Thread Alessandro Benedetti
On 19 January 2016 at 05:41, Modassar Ather wrote: > Thanks Shawn for your explanation. > > Everything else about the analysis looks > correct to me, and the positions you see are needed for a phrase query > to work correctly. > > Here the "WiFi device" will not be searched as there is a gap in b

Re: ramBufferSizeMB and maxIndexingThreads

2016-01-20 Thread Emir Arnautovic
Kind of obvious/logical, but I have seen some people forget that it is per core: if a single node hosts multiple shards, each will take 100MB. Emir -- Monitoring * Alerting * Anomaly Detection * Centralized Log Management Solr & Elasticsearch Support * http://sematext.com/ On 20.01.2016 07:02, Sha