Re: &fq degrades qtime in a 20million doc collection

2016-01-15 Thread Anria B.
Hi Yonik, we definitely didn't overlook q=* being a wildcard scan; we just had so many systemic problems to focus on that I neglected to thank Shawn for that particular piece of useful information. I must admit, I seriously never knew this. Ever since q=* was allowed I was so happy that it never

Re: state.json base_url has internal IP of ec2 instance set instead of 'public DNS' entry in some cases

2016-01-15 Thread Brendan Grainger
Hi Hoss, Thanks for your help. Going over the install page again I realized I had originally not adjusted the value of SOLR_HOST and it had started up using the default internal IP. I changed that to the public DNS and restarted solr. However in /live_nodes I then had 2 values: one for the publ

Re: state.json base_url has internal IP of ec2 instance set instead of 'public DNS' entry in some cases

2016-01-15 Thread Brendan Grainger
Hi Hoss, Thanks for the reply. I installed the service using the install script. I double checked it and it looks like it installs solr.in.sh as /etc/defaults/solr.in.sh. It actually looks like if it is in /var the install script moves it into /etc/defaults (unless I’m reading this wrong): http

Re: state.json base_url has internal IP of ec2 instance set instead of 'public DNS' entry in some cases

2016-01-15 Thread Chris Hostetter
: What I’m finding is that now and then base_url for the replica in : state.json is set to the internal IP of the AWS node. i.e.: : : "base_url":"http://10.29.XXX.XX:8983/solr", : : On other attempts it’s set to the public DNS name of the node: : : "base_url":"http://ec2_host:8983/solr", : :

Re: &fq degrades qtime in a 20million doc collection

2016-01-15 Thread Yonik Seeley
On Wed, Jan 13, 2016 at 7:01 PM, Shawn Heisey wrote: [...] >> 2. q=*&fq=someField:SomeVal ---> takes 2.5 seconds >> 3.q=someField:SomeVal --> 300ms [...] >> >> have any of you encountered such a thing? >> that FQ degrades query time by so much? > A value of * for your query will be sl
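The difference discussed in this thread can be sketched as two request URLs. This is only an illustration: the host, port, and collection name (`localhost:8983`, `collection1`) and the helper class are assumptions, while `someField:SomeVal` is the placeholder used in the thread. The thread describes `q=*` as a wildcard scan; `q=*:*` is Solr's match-all-documents syntax, so it is the usual main query to pair with an `fq` filter. Exact parser behaviour may vary between Solr versions.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper, not part of Solr or SolrJ: it only assembles URLs.
class MatchAllQueryExample {

    // Builds a /select URL with a main query (q) and one filter query (fq).
    static String selectUrl(String baseUrl, String q, String fq) {
        return baseUrl + "/select?q=" + encode(q) + "&fq=" + encode(fq);
    }

    // URL-encodes a single parameter value as UTF-8.
    static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a compliant JVM.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String base = "http://localhost:8983/solr/collection1"; // assumed location

        // Reported as slow in the thread: "*" treated as a wildcard scan.
        System.out.println(selectUrl(base, "*", "someField:SomeVal"));

        // Match-all main query plus the same filter.
        System.out.println(selectUrl(base, "*:*", "someField:SomeVal"));
    }
}
```

Comparing QTime for the two requests on the same index is a quick way to check whether the wildcard form is the source of the slowdown.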

Re: &fq degrades qtime in a 20million doc collection

2016-01-15 Thread Toke Eskildsen
Anria B. wrote: > Thanks Toke for this. It gave us a ton to think about, and it really helps > supporting the notion of several smaller indexes over one very large one, > where we can rather distribute a few JVM processes with less size each, than > have one massive one that is according to this

Re: Solr Block join not working after parent update

2016-01-15 Thread Jack Krupansky
Read the note at the bottom of the doc page: "One limitation of indexing nested documents is that the whole block of parent-children documents must be updated together whenever any changes are required. In other words, even if a single child document or the parent document is changed, the whole blo

Re: &fq degrades qtime in a 20million doc collection

2016-01-15 Thread Anria B.
Thanks Toke for this. It gave us a ton to think about, and it really helps supporting the notion of several smaller indexes over one very large one, where we can rather distribute a few JVM processes with less size each, than have one massive one that is according to this, less efficient. Toke

Re: Query results change

2016-01-15 Thread Brian Narsi
Data is indexed using Data Import Handler with clean=true, commit=true and optimize=true. After that there are no updates or deletes. The setup is SolrCloud with 2 shards and 2 replicas each. If the data and query have not changed, one expects to see the same results on repeated searches; so it is

Re: collapse filter query

2016-01-15 Thread Joel Bernstein
The bug only occurs if you collapse on a numeric field. If you can re-index the field into a String field it should work fine. You can also use grouping with facets. Depending on your use case this might be your best choice: https://cwiki.apache.org/confluence/display/solr/Result+Grouping Joel Ber

Re: Issue with stemming and lemmatizing

2016-01-15 Thread Jack Krupansky
Yes, you can do all of that, but... Solr is more of a toolkit than a packaged solution, so you will have to plug together all the pieces yourself. There are a variety of stemmers in Solr and any number of techniques for how to index and query using the stemmed and unstemmed variants of words.

Re: collapse filter query

2016-01-15 Thread sara hajili
Thanks Joel. I wanted to get distinct results from Solr, so I found two approaches: collapse filter and facet. More Like This doesn't support facets, and as you said, Solr 5.3 has a bug in the collapse filter. If I don't want to migrate to Solr 5.4, is there any other approach to get distinct values that I can use in Solr

Issue with stemming and lemmatizing

2016-01-15 Thread sara hajili
I want to write my own text tokenizer, and my question is how Solr handles stemming or lemmatizing. Does Solr store both the lemmatized token and the original token together? I mean, suppose at index time Solr lemmatizes creation to create, and at query time the user wants to search for exactly creation, not c

state.json base_url has internal IP of ec2 instance set instead of 'public DNS' entry in some cases

2016-01-15 Thread Brendan Grainger
Hi, I am creating a new collection using the following get request: http://ec2_host:8983/solr/admin/collections?action=CREATE&name=collection_name_1&collection.configName=oem/conf&numShards=1 What I’m finding is that now and then base_url for the replica in state.json is set to the internal IP

Boost query vs function query in edismax query

2016-01-15 Thread sara hajili
Hi all, as I understand it, both of them affect relevance scoring, but you have more influence over relevance scoring when using a boost query. Is that true? I would like to understand more about the difference between the two, and to know the best situation for using each. Thanks.

Solr relevancy scoring issue

2016-01-15 Thread sara hajili
Hi all. I have an issue with Solr scoring. How does Solr scoring behave? I mean, is it linear?

Re: Speculation on Memory needed to efficently run a Solr Instance.

2016-01-15 Thread Toke Eskildsen
Jack Krupansky wrote: > Again to be clear, if you really do need the best/minimal overall query > latency, your best bet is to have sufficient system memory to fully cache > the entire index. If you actually don't need minimal latency, then of > course you can feel free to trade off RAM for lower

Re: SolR 5.3.1 deletes index files

2016-01-15 Thread Daniel Collins
I know Solr used to have issues with indexes on NFS, there was a segments.gen file specifically for issues around that, though that was removed in 5.0. But you say this happens on local disks too, so that would rule NFS out of it. I still think you should look at ensuring your merge policy is turn

Re: Speculation on Memory needed to efficently run a Solr Instance.

2016-01-15 Thread Jack Krupansky
Personally, I'll continue to recommend that the ideal goal is to fully cache the entire Lucene index in system memory, as well as doing a proof of concept implementation to validate actual performance for your actual data. You can do a POC with a small fraction of your full data, like 15% or even 1

Re: Speculation on Memory needed to efficently run a Solr Instance.

2016-01-15 Thread Erick Erickson
And to make matters worse, much worse (actually, better)... See: https://issues.apache.org/jira/browse/SOLR-8220 That ticket (and there will be related ones) is about returning data from DocValues fields rather than from the stored data in some situations. Which means it will soon (I hope) be ent

Re: Pro and cons of using Solr Cloud vs standard Master Slave Replica

2016-01-15 Thread Jack Krupansky
Yeah, and to the original question, there is no master list of features and how SolrCloud vs. legacy distributed mode compare feature by feature. And until SolrCloud actually does subsume every single (important) feature of legacy distributed mode, Solr probably still needs to continue to support

Re: Query results change

2016-01-15 Thread Erick Erickson
Probably the fact that information from deleted/updated documents is still hanging around in the corpus until merged away. The nub of the issue is that terms in deleted documents (or the replaced doc if you update) still influence tf/idf calculations. If you optimize as Binoy suggests, all of the
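The effect described here can be illustrated with a small calculation. This is a sketch under stated assumptions, not Solr's actual scoring code: it uses the classic Lucene-style formula idf = 1 + ln(maxDoc / (docFreq + 1)), and all counts except the roughly 120,000 documents mentioned in the original question are made-up values. The point is only that two replicas holding the same live documents can compute different idf values when one of them still carries deleted documents in its segments.

```java
// Hypothetical illustration class; names and numbers are assumptions.
class DeletedDocsIdfExample {

    // Classic TF-IDF style inverse document frequency.
    // docFreq: number of documents containing the term (deleted ones included
    //          until they are merged away)
    // maxDoc:  total number of documents in the index, deleted ones included
    static double idf(long docFreq, long maxDoc) {
        return 1.0 + Math.log((double) maxDoc / (docFreq + 1));
    }

    public static void main(String[] args) {
        // Replica A: 120,000 live documents, no deleted documents,
        // term present in 1,000 of them.
        double idfA = idf(1_000, 120_000);

        // Replica B: same live documents, plus 5,000 deleted documents not yet
        // merged away, 200 of which contain the term.
        double idfB = idf(1_200, 125_000);

        System.out.println("replica A idf = " + idfA);
        System.out.println("replica B idf = " + idfB);
    }
}
```

With different idf values, the same query can produce slightly different scores, and therefore a different ordering, depending on which replica answers it.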

RE: Pro and cons of using Solr Cloud vs standard Master Slave Replica

2016-01-15 Thread Davis, Daniel (NIH/NLM) [C]
In the multi-tenant model, SolrCloud shines because the configuration directories need not include any details about the cluster. SolrCloud also shines if the number of documents and/or indexing rate requires sharding. But master-slave with replica configuration is OK if you have just a coupl

Re: SolR 5.3.1 deletes index files

2016-01-15 Thread Moll, Dr. Andreas
Hi, If you look at the files in the ls output in my last post you will see that SolR has deleted the segments_f file. Thus the index can no longer be loaded. I also had other cases in which the data directory of SolR was empty after the SolR shutdown. And yes, it is bad. Best regards Andre

Re: SolR 5.3.1 deletes index files

2016-01-15 Thread Daniel Collins
Can I just clarify something. The title of this thread implies Solr is losing data when it shuts down which would be really bad(!) The core isn't deleting any data, it is performing a merge, so the data exists, just in fewer larger segments instead of all the smaller segments you had before. So

Leader Election Time

2016-01-15 Thread Robert Brown
Hi, I have 2 shards, 1 leader and 1 replica in each. I've just removed a leader from one of the shards but the replica hasn't become a leader yet. How quickly should this normally happen? tickTime=2000 dataDir=/home/rob/zoodata clientPort=2181 initLimit=5 syncLimit=2 Thanks, Rob

Re: Can we create multiple cluster in single Zookeeper instance

2016-01-15 Thread Shawn Heisey
On 1/15/2016 4:14 AM, Mugeesh Husain wrote: > Actually i have a question , if i will use single zookeeper, > > suppose I have a 3 cluster and each of cluster used zookeeper instance(only > one zk). > > how we will manage zk in a way all of cluster will not communicate each > other? This is not t

Re: Classes in solr_home /lib cannot import from solr/dist

2016-01-15 Thread Shawn Heisey
On 1/15/2016 5:36 AM, Callum Lamb wrote: > Good to know Solr already loads them, that removed a bunch of lines from my > solrconfig.xml. > > Having to copy the required jars from dist/ to lib/ isn't ideal but if > that's the only solution then at least I can stop searching for a solution > and figu

Re: Query results change

2016-01-15 Thread Binoy Dalal
You should try debugging such queries to see how exactly they're being executed. That will give you an idea as to why you're seeing the results you see. On Fri, 15 Jan 2016, 19:05 Brian Narsi wrote: > We have an index of 25 fields. Currently number of records in index is > about 120,000. We are

Query results change

2016-01-15 Thread Brian Narsi
We have an index of 25 fields. Currently the number of records in the index is about 120,000. We are using parser: edismax; qf: contains 8 fields; fq: 1 field; mm = 1; qs = 6; pf: containing 3 fields; bf: containing 1 field. We have noticed that sometimes results change between two searches even if ever

Re: Position increment in WordDelimiterFilter.

2016-01-15 Thread Emir Arnautovic
Can you please send us the tokens you get (and their positions) when you analyze *WiFi device*? On 15.01.2016 13:15, Modassar Ather wrote: Are you saying that WiFi Wi-Fi and Wi Fi should not match each other? I am using WhiteSpaceTokenizer in my analysis chain so wi fi becomes two different tokens. Please

Re: Classes in solr_home /lib cannot import from solr/dist

2016-01-15 Thread Callum Lamb
Good to know Solr already loads them, that removed a bunch of lines from my solrconfig.xml. Having to copy the required jars from dist/ to lib/ isn't ideal but if that's the only solution then at least I can stop searching for a solution and figure out how best to deal with this limitation. I ass

Re: Position increment in WordDelimiterFilter.

2016-01-15 Thread Modassar Ather
Are you saying that WiFi Wi-Fi and Wi Fi should not match each other? I am using WhiteSpaceTokenizer in my analysis chain so wi fi becomes two different tokens. Please refer to my examples given in the previous mail about the issues faced. Wi Fi are two terms which will match but what happens if for a co

Re: Issue in custom filter

2016-01-15 Thread Smitha Rajiv
Thanks Ahmet. It worked. As per your suggestion I have changed the code as below: final String term = charTermAttr.toString(); final String convertedTerm = Converter.convert(term); charTermAttr.setEmpty().append(convertedTerm); return true; Now for the input stream "term1 part 2 ass

Re: indexing rich data with solr 5.3

2016-01-15 Thread kostali hassan
thank you Erik for your precious advice. 2016-01-14 17:24 GMT+00:00 Erik Hatcher : > And also, bin/post can be your friend when it comes to troubleshooting or > introspecting Tika parsing via /update/extract. Like this: > > $ bin/post -c test -params "extractOnly=true&wt=ruby&indent=yes" -out ye

Re: Can we create multiple cluster in single Zookeeper instance

2016-01-15 Thread Mugeesh Husain
Thanks Shawn & Anria B. for your opinions. Actually I have a question: if I use a single ZooKeeper, suppose I have 3 clusters and each cluster uses the same ZooKeeper instance (only one ZK). How would we manage ZK so that the clusters do not communicate with each other? If you still need any clarification I

Re: SolR 5.3.1 deletes index files

2016-01-15 Thread Moll, Dr. Andreas
Hi, we still have the problem that SolR deletes index files on closing the application if the index was changed in the meantime from the production application (which has an embedded SolR-Server). The problem also occurs if we use a local file system instead of an NFS. I have changed the loglevel t

RE: Speculation on Memory needed to efficently run a Solr Instance.

2016-01-15 Thread Gian Maria Ricci - aka Alkampfer
Thanks a lot, I'll have a look at Sematext SPM. Actually the index is not static, but the number of new documents will be small and they will probably be indexed during the night, so I'm not expecting too many problems from the merge factor. We can index new documents during the night and then optimize

Re: Issue in custom filter

2016-01-15 Thread Ahmet Arslan
Hi Smitha, please try the following: final String term = charTermAttr.toString(); final String s = Converter.convert(term); // If not changed, don't waste the time adjusting the token. if ((s != null) && !s.equals(term)) charTermAttr.setEmpty().append(s); Ah
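The `Converter.convert` call in this thread refers to the poster's own class, which is not shown in the archive. The following is a minimal stand-in for illustration only, assuming the simplest reading of the requirement from the original question (term1 -> termone, 2 -> two): each ASCII digit is replaced by its English word, one digit at a time. The class name `DigitWordConverter` and the digit-by-digit behaviour (so that 12 would become onetwo) are assumptions, not the poster's implementation.

```java
// Hypothetical stand-in for the thread's Converter class.
class DigitWordConverter {

    private static final String[] WORDS = {
        "zero", "one", "two", "three", "four",
        "five", "six", "seven", "eight", "nine"
    };

    // Replaces every ASCII digit in the term with its English word.
    // Returns null for null input and an equal string when there are no digits.
    static String convert(String term) {
        if (term == null) {
            return null;
        }
        StringBuilder out = new StringBuilder(term.length());
        for (int i = 0; i < term.length(); i++) {
            char c = term.charAt(i);
            if (c >= '0' && c <= '9') {
                out.append(WORDS[c - '0']);
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Apply the conversion token by token, as a token filter would.
        for (String token : "term1 part 2 assignments".split(" ")) {
            System.out.println(token + " -> " + convert(token));
        }
    }
}
```

Inside a token filter, the result would be compared with the original term and written back to the term attribute only when it differs, as suggested in the reply above.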

Re: Speculation on Memory needed to efficently run a Solr Instance.

2016-01-15 Thread Emir Arnautovic
Hi, the OS does not care much about search vs. retrieve, so the amount of RAM needed for file caches would depend on your index usage patterns. If you are not retrieving stored fields much and most/all results are only id+score, then it can be assumed that you can go with less RAM than the actual index si

Issue in custom filter

2016-01-15 Thread Smitha Rajiv
Hi, I have a requirement that, while indexing, if tokens contain numbers, they need to be converted into the corresponding words, e.g.: term1 part 2 assignments -> termone part two assignments. I have created a custom filter with the following code: @Override public boolean incrementToken() throws IO

Speculation on Memory needed to efficently run a Solr Instance.

2016-01-15 Thread Gian Maria Ricci - aka Alkampfer
Hi, When it is time to calculate how much RAM a Solr instance needs to run with good performance, I know that it is some form of art, but I'm looking for a general "formula" to have at least one good starting point. Apart from the RAM devoted to the Java heap, which is strongly dependent on how I con

Re: Position increment in WordDelimiterFilter.

2016-01-15 Thread Emir Arnautovic
Modassar, Are you saying that WiFi Wi-Fi and Wi Fi should not match each other? Why do you use WordDelimiterFilter? Can you give us a few examples where it is useful? Thanks, Emir On 15.01.2016 05:13, Modassar Ather wrote: Thanks for your responses. It seems to me that you don't want to split

Re: Solr Block join not working after parent update

2016-01-15 Thread Mikhail Khludnev
On Thu, Jan 14, 2016 at 10:01 PM, sairamkumar wrote: > This is a show stopper. Kindly suggest solution/alternative. update whole block. -- Sincerely yours Mikhail Khludnev Principal Engineer, Grid Dynamics

RE: Pro and cons of using Solr Cloud vs standard Master Slave Replica

2016-01-15 Thread Gian Maria Ricci - aka Alkampfer
Yes, I checked that Jira some weeks ago and it is the reason why I was saying that there is still no clear procedure to back up SolrCloud in the current latest version. I'm glad that the priority is Major, but until it is closed in an official version, I have to tell customers that there

RE: Monitor backup progress when location parameter is used.

2016-01-15 Thread Gian Maria Ricci - aka Alkampfer
Ok thanks, I also think that it's worth a Jira, because for the restore operation we have a convenient restorestatus command that tells exactly the status of the restore operation. I think that a backupstatus command could be useful. -- Gian Maria Ricci Cell: +39 320 0136949 -Original Mess

Solr Block join not working after parent update

2016-01-15 Thread sairamkumar
Hi, Solr search with child field(s) is not working after an update in the parent field(s). Parent entity has 20 million and child has 30 million records. This is a show stopper. Kindly suggest solution/alternative.