Re: OR query strange results

2018-11-15 Thread Danilo Tomasoni
Thank you for your reply, Erick. I've thought about the terms query, but it doesn't support phrase search AFAIK. I want to query for nearby words like "Mycobacterium tuberculosis", and I would also like to use the tilde syntax "Mycobacterium tuberculosis"~2. Does a parser exist for that, so t

Re: How to use multiple data drives?

2018-11-15 Thread Jan Høydahl
You can create a new collection that you explicitly place on drive E and split your data in that way. Otherwise we normally advise to buy bigger drives or to use an OS tool to create a logical drive spanning several physical ones so that Solr sees it as one. -- Jan Høydahl, search solution arch

Question about elevations

2018-11-15 Thread Andrew Luong
Hi, I have a quick question about elevations. If I have a query with rows=10 and over 10 elevateIds, will Solr only look up the elevateIds and not perform the normal search? Thanks, Andrew

Re: Extracting important multi term phrases from the text

2018-11-15 Thread Alexandre Rafalovitch
I think the underscore actually comes from the Shingles (parameter fillerToken). Have you tried setting it to empty string? Regards, Alex. On Thu, 15 Nov 2018 at 17:16, Pratik Patel wrote: > > Hi Markus, > > Thanks for the reply. I tried using ShingleFilter and it seems to > be working. Howeve

Re: Extracting important multi term phrases from the text

2018-11-15 Thread Walter Underwood
+1 for not using stopwords. I haven’t used them since 1996. When I was at Netflix, I collected some movie titles that were 100% stopwords. https://observer.wunderwood.org/2007/05/31/do-all-stopword-queries-matter/ wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my

RE: Extracting important multi term phrases from the text

2018-11-15 Thread Markus Jelsma
Hello Pratik, How about not using StopFilter at all? We got rid of it a long time ago, and only use it in very specific circumstances. LUCENE-4065 is not going to be fixed any time soon. Removing StopFilter will introduce noise, but you could work around it with SKG. Please let us know if it w

Re: Extracting important multi term phrases from the text

2018-11-15 Thread Pratik Patel
Hi Markus, Thanks for the reply. I tried using ShingleFilter and it seems to be working. However, I am hitting an issue when it is used with StopWordFilter. StopWordFilter leaves an underscore "_" for removed words, and it kind of screws up the data in the index. I tried setting enablePositionIncremen

RE: Cassandra Solr Integration, what driver to use?

2018-11-15 Thread Liu, Daphne
I use this fa jar for Solr 6.6.5 https://github.com/adejanovski/cassandra-jdbc-wrapper Kind regards, Daphne Liu BI Architect • Big Data - Matrix SCM CEVA Logistics / 10751 Deerwood Park Blvd, Suite 200, Jacksonville, FL 32256 USA / www.cevalogistics.com T 904.5641192/ F 904.928.1525 / daphne..

Cassandra Solr Integration, what driver to use?

2018-11-15 Thread Ka Mok
I'm trying to do some data integration between a Cassandra 3.11.3 database and Solr 7.5. I've spent the past 2 days looking for the right driver, and haven't found a single one other than some product offered by Datastax. Is there really no way to use the default DataImportHandler? In the Solr Admin

RE: Extracting important multi term phrases from the text

2018-11-15 Thread Markus Jelsma
Hello Pratik, We would use ShingleFilter for this indeed. If you only want bigrams/shingles, don't forget to disable outputUnigrams and set both shingle size limits to 2. Regards, Markus -Original message- > From:Pratik Patel > Sent: Thursday 15th November 2018 17:00 > To: solr-user@luc
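
The advice above (ShingleFilter with unigrams disabled and both size limits set to 2) could be expressed as a Schema API payload roughly like this. It is only a sketch: the field type name `text_bigrams`, the choice of tokenizer, and the lowercase filter are assumptions for illustration and do not come from the thread. Only `outputUnigrams`, `minShingleSize`, and `maxShingleSize` correspond to what the message describes.

```python
import json


def bigram_field_type(name="text_bigrams"):
    """Build an add-field-type command that emits only two-word shingles.

    The name, tokenizer, and lowercase filter are illustrative assumptions.
    """
    return {
        "add-field-type": {
            "name": name,
            "class": "solr.TextField",
            "analyzer": {
                "tokenizer": {"class": "solr.StandardTokenizerFactory"},
                "filters": [
                    {"class": "solr.LowerCaseFilterFactory"},
                    {
                        "class": "solr.ShingleFilterFactory",
                        # Both limits set to 2, so only bigrams are produced.
                        "minShingleSize": "2",
                        "maxShingleSize": "2",
                        # Suppress the single-word tokens.
                        "outputUnigrams": "false",
                    },
                ],
            },
        }
    }


payload = json.dumps(bigram_field_type(), indent=2)
print(payload)
```

The printed JSON is the kind of body one would POST to a collection's schema endpoint; the sketch stops at building it.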

DistributedIDF cache choices

2018-11-15 Thread Walter Underwood
Is there any reason to use anything other than LRUStatsCache or LocalStatsCache? It seems like the LRU implementation would be the fastest of the global IDF implementations. Also, any experience with the slowdown due to global IDF? I know that could be done without an additional call. And I kno

Re: How to use multiple data drives?

2018-11-15 Thread Alexandre Rafalovitch
You can configure where your data directory is in core.properties: https://lucene.apache.org/solr/guide/7_5/defining-core-properties.html#defining-core-properties-files Or probably via API. Regards, Alex. On Thu, 15 Nov 2018 at 12:45, John Milton wrote: > > Hi Solr Team, > > I have installed
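
As a rough illustration of the core.properties approach mentioned above, the sketch below generates the file's contents for a core whose index lives on another drive. The core name `mycore` and the path `E:/solr-data/mycore` are hypothetical; `name` and `dataDir` are the property keys described in the linked reference guide page.

```python
def core_properties(core_name, data_dir):
    """Return core.properties text pointing the core at a custom data directory."""
    lines = [
        "name=%s" % core_name,
        # Forward slashes avoid backslash-escaping issues in properties files.
        "dataDir=%s" % data_dir,
    ]
    return "\n".join(lines) + "\n"


# Hypothetical core name and drive path, for illustration only.
text = core_properties("mycore", "E:/solr-data/mycore")
print(text)
```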

How to use multiple data drives?

2018-11-15 Thread John Milton
Hi Solr Team, I have installed Solr on Windows, on my C drive, and I use the D drive as the data directory. If the D drive's disk space is almost full, can I use other drives to store data? I mean, if the current data directory is getting full, I need to use multiple drives as a data

Solr cloud, solr nodes on 2 datacenters

2018-11-15 Thread ilango dhandapani
I am using an external ZooKeeper ensemble (3.4.6) and SolrCloud (5.3). I have 2 shards for a collection in SolrCloud and 2 datacenters, DC1 and DC2. Each shard has 2 nodes, 1 in DC1 and another in DC2. We have added the DC1 nodes to the solr-DC1 load balancer and the DC2 nodes to the solr-DC2 load balancer. The applic

Re: Exporting results and schema design

2018-11-15 Thread Erick Erickson
NP, having something in the manual is A Good Thing, but it's very, very easy to not find a paragraph in a 1,000+ page doc! Oh, and I recommend downloading the PDF version of the Solr ref guide for your version of solr for locally-searchable reference FWIW. Best, Erick On Thu, Nov 15, 2018 at 1:26

Re: querying on field of type string doesn't work as expected

2018-11-15 Thread Erick Erickson
Well, there's little likelihood that Solr will be changed this way. If your field were a text-based field that had a lowercase filter as part of its analysis chain, then what would you expect from searching for "Some Text"~3? An exact match ignoring the slop? Or searching "Some Text" (against a t

Re: OR query strange results

2018-11-15 Thread Erick Erickson
You're using edismax, which has the "mm" parameter that you can think of as a sliding scale between pure OR and pure AND. What happens if you set it to zero? As for maxBooleanClauses, the easiest/fastest way around that would be to use an "fq" clause and the TermsQueryParser. Best, Erick On Thu, N
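
The two suggestions above (setting "mm" to zero and moving a long term list into an "fq" handled by the terms query parser) might look like this as request parameters. This is a sketch under assumptions: the field name `gene_id`, the sample terms, and the query text are made up for illustration. As the reply in this thread notes, the terms parser does not cover phrase or proximity queries.

```python
from urllib.parse import urlencode


def build_params(user_query, field, terms, mm="0"):
    """Build edismax parameters with a terms-parser filter query.

    The field name and terms passed in are illustrative assumptions.
    """
    return {
        "defType": "edismax",
        "q": user_query,
        # mm=0 sets no minimum number of matching clauses (pure OR).
        "mm": mm,
        # The terms parser takes a comma-separated list and is not
        # subject to the maxBooleanClauses limit.
        "fq": "{!terms f=%s}%s" % (field, ",".join(terms)),
    }


params = build_params("tuberculosis", "gene_id", ["a1", "b2", "c3"])
query_string = urlencode(params)
print(query_string)
```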

Extracting important multi term phrases from the text

2018-11-15 Thread Pratik Patel
Hello Everyone, The standard way of tokenizing in Solr divides the text on whitespace. Is there a way by which we can index multi-term phrases like "Machine Learning" instead of "Machine", "Learning"? Is it possible to create a specific field type for such phrases which has its own inde

OR query strange results

2018-11-15 Thread Danilo Tomasoni
Hello all, I'm performing some queries with a big list of terms in OR on our solr instance, and this odd situation happened - A. query with N alternatives returns ~130.000 documents - B. query with N-3 alternatives returns ~ 6.000.000 documents N is relatively small in this case, but in g

Re: 3 Solr instances different ports

2018-11-15 Thread David Hastings
To add to the concerns above, running on the same machine, using the same disk, is going to be really detrimental to performance, but for a prototype it's fine. On Wed, Nov 14, 2018 at 4:10 PM Shawn Heisey wrote: > On 11/14/2018 7:58 AM, cristian.tiu...@gmail.com wrote: > > I want to have 3 differ

Re: querying on field of type string doesn't work as expected

2018-11-15 Thread e_briere
Try comparing strings char by char. White spaces are sometimes unprintable characters. Eric. Original message From: Angel Todorov Date: 2018-11-15 04:06 (GMT-05:00) To: solr-user@lucene.apache.org Subject: Re: querying on field of type st

Re: Median in Solr json facet api

2018-11-15 Thread Anil
Thanks Toke and Joel. On Wed, 14 Nov 2018 at 19:47, Joel Bernstein wrote: > The JSON facet API uses the t-digest approach to estimate the percentiles. > > You can also use Solr Math Expressions to take a random sample from a field > and estimate the median from the sample. Here is the Streaming
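
For reference, an estimated median through the JSON facet API can be requested with the `percentile` aggregation at the 50th percentile. The sketch below only builds the request body. The field name `price` is a hypothetical example, and as the quoted message says, the result is a t-digest estimate rather than an exact median.

```python
import json


def median_facet(field, query="*:*"):
    """Build a JSON request body asking for the estimated median of a field."""
    return {
        "query": query,
        # No documents are needed in the response, only the aggregation.
        "limit": 0,
        "facet": {"median": "percentile(%s,50)" % field},
    }


# "price" is a hypothetical numeric field used for illustration.
body = json.dumps(median_facet("price"))
print(body)
```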

Re: Exporting results and schema design

2018-11-15 Thread Dwane Hall
Thanks Erick, that's great advice as always; it's very much appreciated. I've never seen an example of that pattern used before (stored=false,indexed=false,useDocValuesAsStored=true) on any of the fantastic Solr blogs I've read, and I've read a lot of them many times (all of your excellent Lucidwor
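
The pattern mentioned above (stored=false, indexed=false, useDocValuesAsStored=true) could be declared through the Schema API along these lines. This is a sketch with assumptions: the field name `export_id` and the `string` type are hypothetical, and `docValues` is set to true explicitly here because returning values from docValues requires the field to have them.

```python
import json


def export_only_field(name, field_type="string"):
    """Build an add-field command for a field served purely from docValues.

    The field name and type are illustrative assumptions.
    """
    return {
        "add-field": {
            "name": name,
            "type": field_type,
            "indexed": False,   # not searchable
            "stored": False,    # no stored copy kept
            "docValues": True,  # assumption: enabled explicitly
            "useDocValuesAsStored": True,  # values still returned in results
        }
    }


command = json.dumps(export_only_field("export_id"), indent=2)
print(command)
```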

Re: querying on field of type string doesn't work as expected

2018-11-15 Thread Angel Todorov
Hi Erick, Thanks, but neither of those seems to work (neither "Some\ Text" nor Some\ Text). Also, assuming I may have many fields with different types, I don't think it is a very good design to leave it up to the application developer to use different encoding based on what the underlying SOLR

Re: Delete by query in SOLR 6.3

2018-11-15 Thread Emir Arnautović
Hi Rakesh, Since Solr has to maintain eventual consistency of all replicas, it has to block updates while DBQ is running. Here is a blog post with a high-level explanation of the issue: http://www.od-bits.com/2018/03/dbq-or-delete-by-query.html