Re: cursorMark and timeAllowed are mutually exclusive?

2015-06-29 Thread Bernd Fehling
Thanks for your explanation. Off the top of your head, are there any other options which prevent getting a cursorMark? Yes, that was also my idea to set up a separate request handler for harvesting without timeAllowed. As Shawn suggested, a short note about this should go into the documentation. R

Some guidance on memory requirements/usage/tuning

2015-06-29 Thread Caroline Hind
Hi, I am very new to SOLR, and would appreciate some guidance if anyone has the time to offer it. We have very recently upgraded from SOLR 4.1 to 5.2.1, and at the same time increased the physical RAM from 24Gb to 96Gb. We run multiple cores on this one server, approximately 20 in total, but

Re: optimize status

2015-06-29 Thread Upayavira
We need to work out why your performance is bad without optimise. What version of Solr are you using? Can you confirm that your config is using the TieredMergePolicy? Upayavira On Tue, Jun 30, 2015, at 04:48 AM, Summer Shire wrote: > Hi Upayavira and Erick, > > There are two things we are talking a

Re: optimize status

2015-06-29 Thread Summer Shire
Hi Upayavira and Erick, There are two things we are talking about here. First: Why am I optimizing? If I don’t, our SEARCH (NOT INDEXING) performance is 100% worse. The problem lies in the number of total segments. We have to have max segments 1 or 2. I have done intensive performance related

RE: optimize status

2015-06-29 Thread Reitzel, Charles
I see what you mean. Many thanks for the details. -Original Message- From: Toke Eskildsen [mailto:t...@statsbiblioteket.dk] Sent: Monday, June 29, 2015 6:36 PM To: solr-user@lucene.apache.org Subject: Re: optimize status Reitzel, Charles wrote: > Question, Toke: in your "immutable"

Re: Correcting text at index time

2015-06-29 Thread Jack Krupansky
The regex replace processor can be used to do this: https://lucene.apache.org/solr/5_2_0/solr-core/org/apache/solr/update/processor/RegexReplaceProcessorFactory.html -- Jack Krupansky On Mon, Jun 29, 2015 at 6:20 PM, Walter Underwood wrote: > Yes, do this in an update request processor before
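
A rough sketch of the kind of substitution discussed in this thread, written in Python purely for illustration. It is not the RegexReplaceProcessorFactory configuration itself (that lives in solrconfig.xml); it only shows the same idea applied to a document's text before it is sent to Solr. The `ABBREVIATIONS` table and the `expand_abbreviations` helper are hypothetical names, and the single "cst." entry is taken from the original question.

```python
import re

# Hypothetical abbreviation table; "cst." -> "customer" is the example from the thread.
ABBREVIATIONS = {"cst.": "customer"}


def expand_abbreviations(text, table=ABBREVIATIONS):
    """Replace each known abbreviation with its expansion (case-insensitive)."""
    # Longest keys first so overlapping abbreviations resolve predictably.
    keys = sorted(table, key=len, reverse=True)
    pattern = re.compile(
        r"(?<!\w)(" + "|".join(re.escape(k) for k in keys) + r")(?!\w)",
        re.IGNORECASE,
    )
    # Look the match up in lower case because the table keys are lower case.
    return pattern.sub(lambda m: table[m.group(1).lower()], text)
```

Doing the replacement before the document reaches the analyzer chain means the stored value changes as well as the indexed terms, which is what the original poster asked for and what the synonym filter alone does not do.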

Re: optimize status

2015-06-29 Thread Toke Eskildsen
Reitzel, Charles wrote: > Question, Toke: in your "immutable" cases, don't the benefits of > optimizing come mostly from eliminating deleted records? Not for us. We have about 1 deleted document for every 1000 or 10.000 standard documents. > Is there any material difference in heap, CPU, etc. b

Re: Reading indexed data from solr 5.1.0 using admin/luke?

2015-06-29 Thread Erick Erickson
You can also use the TermsComponent, that'll read the values from the indexed fields. That gets the raw terms, they aren't grouped. But you don't get the document. Reconstructing the doc from the postings lists is actually quite tedious. The Luke program (not request handler) has a function that do
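
For anyone who wants to try the TermsComponent route, here is a small Python sketch that only builds the request URL. It assumes a `/terms` request handler is configured for the core (as in the sample configs); the host, the core name `mycore`, and the `terms_url` helper are made up for illustration, and `name` is the field from the original question.

```python
from urllib.parse import urlencode


def terms_url(base_url, field, limit=10, prefix=""):
    """Build a TermsComponent request URL for one indexed field."""
    params = {
        "terms": "true",           # enable the component
        "terms.fl": field,         # field whose indexed terms are listed
        "terms.limit": str(limit), # maximum number of terms returned
        "wt": "json",
    }
    if prefix:
        params["terms.prefix"] = prefix  # only terms starting with this prefix
    return base_url.rstrip("/") + "/terms?" + urlencode(params)


# Hypothetical host and core name.
url = terms_url("http://localhost:8983/solr/mycore", "name", limit=20, prefix="red")
```

The response lists raw indexed terms with their document frequencies, so for an analyzed field you see the tokens after analysis, not the documents they came from.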

Re: Questions regarding autosuggest (Solr 5.2.1)

2015-06-29 Thread Erick Erickson
Try not putting it in double quotes? Best, Erick On Mon, Jun 29, 2015 at 12:22 PM, Thomas Michael Engelke wrote: > > > A friend and I are trying to develop some software using Solr in the > background, and with that comes a lot of changes. We're used to older > versions (4.3 and below). We espec

Re: Correcting text at index time

2015-06-29 Thread Walter Underwood
Yes, do this in an update request processor before it gets to the analyzer chain. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) On Jun 29, 2015, at 3:19 PM, Erick Erickson wrote: > Hmmm, very hard to do currently. The _point_ of stored fields is that

Re: Correcting text at index time

2015-06-29 Thread Erick Erickson
Hmmm, very hard to do currently. The _point_ of stored fields is that an exact, verbatim copy of the input is returned in fl lists and this is violating that promise. I suppose some kind of custom update processor could work, but it's really "roll your own" functionality I think. Best, Erick On M

RE: optimize status

2015-06-29 Thread Reitzel, Charles
Question, Toke: in your "immutable" cases, don't the benefits of optimizing come mostly from eliminating deleted records? Is there any material difference in heap, CPU, etc. between 1, 5 or 10 segments? I.e. at how many segments/shard do you see a noticeable performance hit? Also, I'm curious

Re: Reading indexed data from solr 5.1.0 using admin/luke?

2015-06-29 Thread Upayavira
Use the schema browser on the admin UI, and click the "load term info" button. It'll show you the terms in your index. You can also use the analysis tab which will show you how it would tokenise stuff for a specific field. Upayavira On Mon, Jun 29, 2015, at 06:53 PM, Dinesh Naik wrote: > Hi Eric

RE: optimize status

2015-06-29 Thread Reitzel, Charles
Hi Garth, Yes, I'm straying from OP's question (I think Steve is all set). But his question, quite naturally, comes up often and a similar discussion ensues each time. I take your point about shards and segments being different things. I understand that the hash ranges per segment are not k

Re: optimize status

2015-06-29 Thread Upayavira
For the sake of history, somewhere around Solr/Lucene 3.2 a new "MergePolicy" was introduced. The old one merged simply based upon age, or "index generation", meaning the older the segment, the less likely it would get merged, hence needing optimize to clear out deletes from your older segments. T

Re: optimize status

2015-06-29 Thread Toke Eskildsen
Reitzel, Charles wrote: > Is there really a good reason to consolidate down to a single segment? In the scenario spawning this thread it does not seem to be the best choice. Speaking more broadly there are Solr setups out there that deal with immutable data, often tied to a point in time, e.g

RE: optimize status

2015-06-29 Thread Garth Grimm
" Is there really a good reason to consolidate down to a single segment?" Archiving (as one example). Come July 1, the collection for log entries/transactions in June will never be changed, so optimizing is actually a good thing to do. Kind of getting away from OP's question on this, but I don't

solr suggester build issues

2015-06-29 Thread Rajesh Hazari
Solr : 4.9.x , with simple solr cloud on jetty. JDK 1.7 num of replica : 4 , one replica for each shard num of shards : 1 Hi All, I have been facing the issues below with the solr suggester introduced in 4.7.x. Does anyone have a good working solution or buildOnCommit=true property is suggested not to use

Re: optimize status

2015-06-29 Thread Steven White
Thank you guys, this was very helpful. I was always under the impression that the index needs to be optimized periodically to reclaim disk space otherwise the index will just keep on growing and growing (was that the case in Lucene 2.x and prior days?). I agree with Walter, renaming "optimize" to s

RE: Reading indexed data from solr 5.1.0 using admin/luke?

2015-06-29 Thread Dinesh Naik
Hi Eric, By compressed value I meant value of a field after removing special characters. In my example it's "-". Compressed form of red-apple is redapple. I wanted to know if we can see the analyzed version of fields. For example if I use ngram on a field, how do I see the analyzed values in

RE: Jetty Plus for Solr 4.10.4

2015-06-29 Thread Tarala, Magesh
Hi Shawn - Thank you for the quick and detailed response!! Good to hear that Jetty 8 installation with solr for typical uses does not need to be modified. I believe what we have is a "typical" use case. We will be installing solr on 3 nodes in our Hadoop cluster. Will use Hadoop's zookeeper.

Re: cursorMark and timeAllowed are mutually exclusive?

2015-06-29 Thread Chris Hostetter
: > Have nothing found in the ref guides, docs, wiki, examples about this mutually : > exclusive parameters. : > : > Is this a bug or a feature and if it is a feature, where is the sense of this? The problem is that if a timeAllowed exceeded situation pops up, you won't get a nextCursorMark to
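
To make the dependency on nextCursorMark concrete, here is a minimal Python sketch of a cursor-based harvesting loop. It makes no HTTP calls: `fetch_page` is a hypothetical callable that takes the request parameters and returns the parsed JSON response, and `harvest` is an illustrative name, not a Solr or SolrJ API. It assumes the sort includes the uniqueKey field (here `id`), which cursorMark requires.

```python
def harvest(fetch_page, query="*:*", sort="id asc", rows=100):
    """Yield every document by following nextCursorMark until it stops changing."""
    cursor = "*"  # starting value for cursorMark
    while True:
        # No timeAllowed here: Solr rejects it in combination with cursorMark.
        params = {"q": query, "sort": sort, "rows": str(rows), "cursorMark": cursor}
        response = fetch_page(params)
        yield from response["response"]["docs"]
        next_cursor = response["nextCursorMark"]
        if next_cursor == cursor:
            # An unchanged cursor means the result set is exhausted.
            return
        cursor = next_cursor
```

Each request needs the nextCursorMark of the previous response, so a response cut short by timeAllowed would leave the loop with nothing to continue from. That fits the suggestion elsewhere in this thread to use a separate request handler without timeAllowed for harvesting.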

Re: Jetty Plus for Solr 4.10.4

2015-06-29 Thread Shawn Heisey
On 6/29/2015 8:44 AM, Tarala, Magesh wrote: > We are planning to go to production with Solr 4.10.4. Documentation > recommends to use full Jetty package that includes JettyPlus. I'm not able to > find the instructions to do this. Can someone point me in the right direction? I found the official

RE: optimize status

2015-06-29 Thread Reitzel, Charles
Is there really a good reason to consolidate down to a single segment? Any incremental query performance benefit is tiny compared to the loss of manageability. I.e. shouldn't segments _always_ be kept small enough to facilitate re-balancing data across shards? Even in non-cloud instances th

Re: cursorMark and timeAllowed are mutually exclusive?

2015-06-29 Thread Shawn Heisey
On 6/29/2015 9:12 AM, Bernd Fehling wrote: > while just trying cursorMark I got the following search response: > > "error": { > "msg": "Can not search using both cursorMark and timeAllowed", > "code": 400 > } > > > Yes, I'm using timeAllowed which is set in my requestHandler as > invariant

Questions regarding autosuggest (Solr 5.2.1)

2015-06-29 Thread Thomas Michael Engelke
A friend and I are trying to develop some software using Solr in the background, and with that comes a lot of changes. We're used to older versions (4.3 and below). We especially have problems with the autosuggest feature. This is the field definition (schema.xml) for our autosuggest field: .

cursorMark and timeAllowed are mutually exclusive?

2015-06-29 Thread Bernd Fehling
Hi list, while just trying cursorMark I got the following search response: "error": { "msg": "Can not search using both cursorMark and timeAllowed", "code": 400 } Yes, I'm using timeAllowed which is set in my requestHandler as invariant to 6 (60 seconds) as a limit to "killer search

Re: SolrCloud Document Update Problem

2015-06-29 Thread Amit Jha
It was because of the issues Rgds AJ > On Jun 29, 2015, at 6:52 PM, Shalin Shekhar Mangar > wrote: > >> On Mon, Jun 29, 2015 at 4:37 PM, Amit Jha wrote: >> Hi, >> >> I setup a SolrCloud with 2 shards each is having 2 replicas with 3 >> zookeeper ensemble. >> >> We add and update documents f

Jetty Plus for Solr 4.10.4

2015-06-29 Thread Tarala, Magesh
We are planning to go to production with Solr 4.10.4. Documentation recommends to use full Jetty package that includes JettyPlus. I'm not able to find the instructions to do this. Can someone point me in the right direction? Thanks, Magesh

Re: optimize status

2015-06-29 Thread Walter Underwood
“Optimize” is a manual full merge. Solr automatically merges segments as needed. This also expunges deleted documents. We really need to rename “optimize” to “force merge”. Is there a Jira for that? wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) On Jun

Architectural advice & questions on using Solr XML DataImport Handlers (and Nutch) for a Vertical Search engine.

2015-06-29 Thread Arthur Yarwood
Please bear with me here, I'm pretty new to Solr with most of my DB experience being of the relational variety. I'm planning a new project, which I believe Solr (and Nutch) will solve well. Although I've installed Solr 5.2 and Nutch 1.10 (on CentOS) and tinkered about a bit, I'd be grateful for

Re: SolrCloud Document Update Problem

2015-06-29 Thread Shalin Shekhar Mangar
On Mon, Jun 29, 2015 at 4:37 PM, Amit Jha wrote: > Hi, > > I setup a SolrCloud with 2 shards each is having 2 replicas with 3 > zookeeper ensemble. > > We add and update documents from web app. While updating we delete the > document and add same document with updated values with same unique id.

Re: Reading indexed data from solr 5.1.0 using admin/luke?

2015-06-29 Thread Erick Erickson
Not quite sure what you mean by "compressed values". admin/luke doesn't show the results of the compression of the stored values, there's no way I know of to do that. Best, Erick On Mon, Jun 29, 2015 at 8:20 AM, dinesh naik wrote: > Hi all, > > Is there a way to read the indexed data for field o

Re: optimize status

2015-06-29 Thread Erick Erickson
Steven: Yes, but First, here's Mike McCandless' excellent blog on segment merging: http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html I think the third animation is the TieredMergePolicy. In short, yes an optimize will reclaim disk space. But as you update, this is

RE: Correcting text at index time

2015-06-29 Thread hossmaa
Hi Markus Thanks for the reply. I'm already using the Synonyms filter and it is working fine (i.e., when I search for "customer", it also returns documents containing "cst."). What the synonyms filter does not do is to actually replace the word "cst." with "customer" in the document. Just to be c

RE: Correcting text at index time

2015-06-29 Thread Markus Jelsma
Hello - why not just use synonyms or StemmerOverrideFilter? Markus -Original message- > From:hossmaa > Sent: Monday 29th June 2015 14:08 > To: solr-user@lucene.apache.org > Subject: Correcting text at index time > > Hi everyone > > I'm wondering if it's possible in Solr to correct t

Reading indexed data from solr 5.1.0 using admin/luke?

2015-06-29 Thread dinesh naik
Hi all, Is there a way to read the indexed data for field on which the analysis/processing has been done ? I know using admin GUI we can see field wise analysis But how can i get hold on the complete document using admin/luke? or any other way? For example, if i have 2 fields called name and co

set the param [facet.offset] for EVERY [facet.pivot]

2015-06-29 Thread lzqxb
Hi all: I need pagination with facet offset. There are two or more fields in [facet.pivot], but only one value for [facet.offset], eg: facet.offset=10&facet.pivot=field_1,field_2. In this condition, field_2 is 10's offset and then field_1 is 10's offset. But what I want is field_2

Re: optimize status

2015-06-29 Thread Steven White
Hi Upayavira, This is news to me that we should not optimize an index. What about disk space saving, isn't optimization meant to reclaim disk space, or does Solr somehow do that? Where can I read more about this? I'm on Solr 5.1.0 (may switch to 5.2.1) Thanks Steve On Mon, Jun 29, 2015 at 4:16 AM,

Correcting text at index time

2015-06-29 Thread hossmaa
Hi everyone I'm wondering if it's possible in Solr to correct text at indexing time, based on a synonyms-like list. This would be great for expanding undesirable abbreviations (for example, "cst." instead of "customer"). I've been searching the Solr docs and the web quite thoroughly I believe, but

SolrCloud Document Update Problem

2015-06-29 Thread Amit Jha
Hi, I setup a SolrCloud with 2 shards each is having 2 replicas with 3 zookeeper ensemble. We add and update documents from web app. While updating we delete the document and add same document with updated values with same unique id. I am facing a very strange issue that some time 2 documents ha

Re: issue with highlighting in solr 4.10.2

2015-06-29 Thread Dmitry Kan
Hi Erick, The Contents field contains one sentence only and no "watch" exists in it. Plus we use quite large snippet size to surely cover the field. Dmitry On Sat, Jun 27, 2015 at 6:16 PM, Erick Erickson wrote: > Does watch exist in the Contents field somewhere outside the snippet > size you'v

Re: need advice on parent child mulitple category

2015-06-29 Thread Mikhail Khludnev
http://wiki.apache.org/solr/HierarchicalFaceting On Mon, Jun 29, 2015 at 11:27 AM, Darniz wrote: > hello > > any advice please > > > > -- > View this message in context: > http://lucene.472066.n3.nabble.com/need-advice-on-parent-child-mulitple-category-tp4214140p4214602.html > Sent from the Solr

SOLR 5.1.0 DB dataimport handler from orientdb

2015-06-29 Thread Nauman Ramzan
Hi everyone ! I want to import data from orientdb in solr 5.1.0. here is my configurations *data-config.xml* > > > > driver="com.orientechnologies.orient.jdbc.OrientJdbcDriver" >> url="jdbc:orient:remote:localhost/emallates_combine" user="root" >> password="root" batchSize="-1"/> > > >

Re: need advice on parent child mulitple category

2015-06-29 Thread Darniz
hello any advice please -- View this message in context: http://lucene.472066.n3.nabble.com/need-advice-on-parent-child-mulitple-category-tp4214140p4214602.html Sent from the Solr - User mailing list archive at Nabble.com.

Re: optimize status

2015-06-29 Thread Upayavira
I'm afraid I don't understand. You're saying that optimising is causing performance issues? Simple solution: DO NOT OPTIMIZE! Optimisation is very badly named. What it does is squashes all segments in your index into one segment, removing all deleted documents. It is good to get rid of deletes -

Re: optimize status

2015-06-29 Thread Summer Shire
Have to because of performance issues. Just want to know if there is a way to tap into the status. > On Jun 28, 2015, at 11:37 PM, Upayavira wrote: > > Bigger question, why are you optimizing? Since 3.6 or so, it generally > hasn't been required, even, is a bad thing. > > Upayavira > >> On Su