query rewriting

2017-03-05 Thread Hendrik Haddorp
Hi, I would like to dynamically modify a query, for example by replacing a field name with a different one. Given how complex the query parsing is, it looks error prone to duplicate that, so I would like to work on the Lucene Query object model instead. The subclasses of Query look relative

sort by function with cursor based result fetching

2017-03-05 Thread Dmitry Kan
Hi, Solr: 4.10.2 We've noticed a potential bug with fetching results over cursor when sorting by a function on dynamic date fields. Filed as: https://issues.apache.org/jira/browse/SOLR-10231 Is there an obvious reason for sorting by function not to work with cursors? Could this have been fixed
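
For context, cursor-based fetching in Solr starts with cursorMark=* and then feeds each response's nextCursorMark back into the next request until the value stops changing. A minimal sketch of that loop follows; `fetch_page` and the canned pages are hypothetical stand-ins for a real HTTP call (not part of the original report), and the page layout is simplified to two keys.

```python
from typing import Callable, Dict, Iterator


def iterate_cursor(fetch_page: Callable[[str], Dict], start: str = "*") -> Iterator[Dict]:
    """Yield every document by following nextCursorMark until it repeats.

    fetch_page is a hypothetical callable: it takes a cursorMark string and
    returns a dict with "docs" (a list) and "nextCursorMark" (a string).
    """
    cursor = start
    while True:
        page = fetch_page(cursor)
        for doc in page["docs"]:
            yield doc
        next_cursor = page["nextCursorMark"]
        if next_cursor == cursor:
            # An unchanged cursor mark signals that the result set is exhausted.
            break
        cursor = next_cursor


# Canned pages standing in for a Solr server (illustrative values only).
_PAGES = {
    "*": {"docs": [{"id": "1"}, {"id": "2"}], "nextCursorMark": "AoE1"},
    "AoE1": {"docs": [{"id": "3"}], "nextCursorMark": "AoE2"},
    "AoE2": {"docs": [], "nextCursorMark": "AoE2"},
}


def fake_fetch(cursor_mark: str) -> Dict:
    return _PAGES[cursor_mark]


ids = [doc["id"] for doc in iterate_cursor(fake_fetch)]
```

In a real request the sort clause also has to include the uniqueKey field as a tie-breaker, which is why the sort definition matters for the issue described above.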

Re: FieldName as case insenstive

2017-03-05 Thread Mikhail Khludnev
Hello, Preeti. Field names are case sensitive. You probably need to extend the default query parser for case insensitivity, or check something about aliases in eDisMax, iirc. On Mon, Mar 6, 2017 at 9:31 AM, Preeti Bhat wrote: > Hi All, > > Did anyone get a chance to look at this? > > > Thanks and Reg

RE: FieldName as case insenstive

2017-03-05 Thread Preeti Bhat
Hi All, Did anyone get a chance to look at this? Thanks and Regards, Preeti Bhat From: Preeti Bhat Sent: Friday, March 03, 2017 2:47 PM To: solr-user Subject: FieldName as case insenstive Hi All, I have a field named "CompanyName" in one of my collection. When I try to search CompanyName:xyz

Re: Use Solr Suggest to autocomplete words and suggest co-occurences

2017-03-05 Thread Mikhail Khludnev
Hello, Georg! Have you seen http://blog.mikemccandless.com/2014/01/finding-long-tail-suggestions-using.html ? On Sun, Mar 5, 2017 at 11:43 PM, Georg Sorst wrote: > Hi all, > > is there a way to get the suggester to autocomplete words and suggest > co-occurences instead of suggesting complete fie

Re: Use Solr Suggest to autocomplete words and suggest co-occurences

2017-03-05 Thread Joel Bernstein
The significantTerms streaming expression could be useful as a co-occurrence based suggester. This is coming in Solr 6.5 but could easily be backported to earlier releases. This blog describes how it works: http://joelsolr.blogspot.com/2017/02/anomaly-detection-in-solr-65.html Joel Bernstein http://

Use Solr Suggest to autocomplete words and suggest co-occurences

2017-03-05 Thread Georg Sorst
Hi all, is there a way to get the suggester to autocomplete words and suggest co-occurences instead of suggesting complete field values? The behavior I'm looking for is quite similar to Google, only based on index values not actual queries. Let's say there are two items in the index: 1. "Adid

Re: maxwarmingSearchers and memory leak

2017-03-05 Thread SOLR4189
1) We've actually got 60 to 80 GB of index on the machine (in the image below you can see that the size of the index on the machine is 82GB, because the whole index is in the path /opt/solr): 2) Our commits run: autoSoftCommit - every 15 minutes and autoHard

Re: I want to contribute custom made NLP based solr filters but dont know how.

2017-03-05 Thread Joel Bernstein
I believe StanfordCore is licensed under the GPL which means it will be incompatible with the Apache License. Would it be possible to port to a different NLP library? Joel Bernstein http://joelsolr.blogspot.com/ On Sun, Mar 5, 2017 at 12:14 PM, Erick Erickson wrote: > Well, you've taken the fir

Re: Setting up to index multiple datastores

2017-03-05 Thread Erick Erickson
bq: Is each shard/replica/core in fact a separate instance? No. I'm defining "instance" here as a JVM running Solr. And be careful here, a "shard" is made up of one or more "replicas". Those replicas may or may not be distributed amongst separate JVMs/machines. Each replica of a given shard has t

Re: I want to contribute custom made NLP based solr filters but dont know how.

2017-03-05 Thread Erick Erickson
Well, you've taken the first step ;). Start by going here: https://issues.apache.org/jira/browse/SOLR/ and creating a logon and a JIRA. NOTE: Before you go to the trouble of creating a patch, it's perfectly OK to do a high-level overview of the approach you used and see what the feedback is. It'l

Re: Use case for the Shingle Filter

2017-03-05 Thread Ryan Josal
I thought new versions of solr didn't split on whitespace at the query parser anymore, so this should work? That being said, I think I remember it having a problem coming after a synonym filter. IIRC, if your input is "Foo Bar" and you have a synonym "foo <=> baz" you would get foobaz bazbar inst

Re: What is the bottleneck for an optimise operation? / solve the disk space and time issues by specifying multiple segments to optimize

2017-03-05 Thread Caruana, Matthew
Hi Rick, We already do this with 30 eight-core machines running seven jobs each, working off a shared queue. See https://github.com/ICIJ/extract which has been in production for almost two years. Originally developed in order to OCR almost ten million PDFs and TIFFs from the Panama Papers. Mat

Re: What is the bottleneck for an optimise operation? / solve the disk space and time issues by specifying multiple segments to optimize

2017-03-05 Thread Rick Leir
Hi Matthew OCR is something which can be parallelized outside of Solr/Tika. Do one OCR task per core, and you can have all cores running at 100%. Write the OCR output to a staging area in the filesystem. cheers -- Rick On 2017-03-03 03:00 AM, Caruana, Matthew wrote: This is the current con

I want to contribute custom made NLP based solr filters but dont know how.

2017-03-05 Thread Avtar Singh Mehra
Hello everyone, I have developed a project called WiseOwl, which is basically a fact-based question answering system that can be accessed at: https://github.com/asmehra95/wiseowl In the process of making the project work I have developed pluggable solr filters optimised for solr 6.3.0. I would like

RE: Use case for the Shingle Filter

2017-03-05 Thread Markus Jelsma
Hello - we use it for text classification and online near-duplicate document detection/filtering. Using shingles means you want to consider order in the text. It is analogous to using bigrams and trigrams when doing language detection; you cannot distinguish between Danish and Norwegian solely o
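
To illustrate why shingles capture word order, here is a small sketch that builds word shingles and compares two texts by the overlap of their shingle sets. The function names and parameters are hypothetical and chosen for illustration; this is not Solr's ShingleFilter, and it emits only the shingles themselves, not the single tokens.

```python
from typing import List, Set


def shingles(tokens: List[str], min_size: int = 2, max_size: int = 2, sep: str = " ") -> List[str]:
    """Return word n-grams of every size from min_size to max_size, in text order."""
    out = []
    for i in range(len(tokens)):
        for n in range(min_size, max_size + 1):
            if i + n <= len(tokens):
                out.append(sep.join(tokens[i:i + n]))
    return out


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two sets; two empty sets are treated as identical."""
    union = a | b
    if not union:
        return 1.0
    return len(a & b) / len(union)


original = "the quick brown fox".split()
reordered = "fox brown quick the".split()

# Same words, different order: the single-word sets match exactly,
# but the bigram shingle sets have nothing in common.
word_similarity = jaccard(set(original), set(reordered))
shingle_similarity = jaccard(set(shingles(original)), set(shingles(reordered)))
```

A near-duplicate filter along these lines would treat two documents as duplicates when the shingle similarity exceeds some threshold; the threshold itself is an application choice and is not given in the thread.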

Re: Data Import Handler, also "Real Time" index updates

2017-03-05 Thread Damien Kamerman
You could configure the DataImportHandler not to delete at the start (either do a delta or set the preImportDeleteQuery), and set a postImportDeleteQuery if required. On Saturday, 4 March 2017, Alexandre Rafalovitch wrote: > Commit is index global. So if you have overlapping timelines and commit

Re: Setting up to index multiple datastores

2017-03-05 Thread Daniel Miller
On 3/4/2017 12:00 PM, Shawn Heisey wrote: On 3/3/2017 11:28 PM, Daniel Miller wrote: What I think I want is create a single collection, with a shard/replica/core per user. Or maybe I'm wanting a separate collection per user - which would again mean a single shard/replica/core. But it seems lik