exceeded limit of maxWarmingSearchers = 4 =(

2011-04-12 Thread stockii
Hello. My NRT search is not correctly configured =( I have 2 Solr instances: one "searcher" and one "updater". The updater starts an update of around 3000 documents every minute, and the searcher starts a commit every minute to refresh the index and read the new docs. These are my cache values for an

Re: Solr 3.1 performance compared to 1.4.1

2011-04-12 Thread Marius van Zwijndregt
Hi Lance, Well, I did not actually copy over the whole configuration files; instead I just added the missing configuration (into a fresh copy of the example directory). By the directory implementation, do you mean the readers used by SolrIndexSearcher? These are: reader : SolrIndexReader{this=1cb0

High (io) load and org.mortbay.jetty.EofException

2011-04-12 Thread Marius van Zwijndregt
Hello! Every night within my maintenance window, during high load caused by PostgreSQL (vacuum analyze), I see a few (10-30) messages showing up in the Solr 3.1 logfile. SEVERE: org.mortbay.jetty.EofException at org.mortbay.jetty.HttpGenerator.flush(HttpGenerator.java:791) at org.mortbay

Re: exceeded limit of maxWarmingSearchers = 4 =(

2011-04-12 Thread stockii
I start a commit on the "searcher" core with: .../core/update?commit=true&waitFlush=false - --- System One Server, 12 GB RAM, 2 Solr Instances, 7 Cores, 1 Core with 31 Million Documents other Cores < 100.000 - Solr1 for Search

Berlin Buzzwords - conference schedule released

2011-04-12 Thread Simon Willnauer
Hey folks, The Berlin Buzzwords team recently released the schedule for the conference on high scalability. The conference focuses on the topics search, data analysis and NoSQL. It is to take place on June 6/7th 2011 in Berlin. We are looking forward to two awesome keynote speakers who shaped the

Re: exceeded limit of maxWarmingSearchers = 4 =(

2011-04-12 Thread stockii
My filterCache has a warmupTime of ~6000 ... but my config is like this: LRU Cache(maxSize=3000, initialSize=50, autowarmCount=50 ...) Should I set maxSize to 50 or a similar value?

Re: exceeded limit of maxWarmingSearchers = 4 =(

2011-04-12 Thread stockii
Oooh. My queryResultCache has a warmupTime of 54000 => ~1 minute. Any suggestions?

Re: Decrease warmupTime

2011-04-12 Thread stockii
I'm fighting with the same problem, but with Jetty. Is it necessary in this case to also delete the Jetty work dir?

Re: Indexing Best Practice

2011-04-12 Thread Darx Oman
Hi Lance, thanks for your reply, but I have a question: is this patch committed to trunk?

AbstractSolrTestCase and Solr 3.1.0

2011-04-12 Thread Tommaso Teofili
Hi all, I am porting a series of Solr plugins previously developed for version 1.4.1 to 3.1.0. I've written some integration tests extending the AbstractSolrTestCase [1] utility class, but now it seems that it wasn't included in the solr-core 3.1.0 artifact, as it's in the solr/src/test directory. Was t

function query apply only in the subset of the query

2011-04-12 Thread Marco Martinez
Hi everyone, My situation is as follows: I need to add the value of a field to the score of the docs returned by the query, but not to all the docs. Example: q=car returns 3 docs 1- name=car ford marketValue=1 score=1.3 2- name=car citroen marketValue=2 score=1.3 3- name=car mercedes marketValue

Help with Nested Query

2011-04-12 Thread Hasnain
Hi, I'm trying to do something like this in Solr 1.4.1: fq=category_id:(24 79) However, the values inside the parentheses will be fetched through another query. So far I've tried using _query_, but it doesn't work the way I want it to. Here is what I'm trying: fq=category_id:(_query_:"{!lucene fl=catego

Solrj retry handling - prevent "ProtocolException: Unbuffered entity enclosing request can not be repeated"

2011-04-12 Thread Martin Grotzke
Hi, from time to time we're seeing a "ProtocolException: Unbuffered entity enclosing request can not be repeated." in the logs when sending ~500 docs to solr (the stack trace is at the end of the email). I'm aware that this was discussed before (e.g. [1]) and our solution was already to reduce th

Updates during Optimize

2011-04-12 Thread stockii
Hello. When I start an optimize (which takes more than 4 hours), no updates from DIH are possible. I thought Solr copies the whole index and then runs the optimize on the copy, rather than locking the index and optimizing it in place ... =( Is there any way to do both at the same time?

Re: AbstractSolrTestCase and Solr 3.1.0

2011-04-12 Thread Robert Muir
On Tue, Apr 12, 2011 at 6:44 AM, Tommaso Teofili wrote: > Hi all, > I am porting a previously series of Solr plugins developed for 1.4.1 version > to 3.1.0, I've written some integration tests extending the > AbstractSolrTestCase [1] utility class but now it seems that wasn't included > in the sol

Re: XML not coming through from nabble to Gmail

2011-04-12 Thread Erick Erickson
Chris: Here's the nabble URL: http://lucene.472066.n3.nabble.com/Strip-spaces-and-new-line-characters-from-data-tp2795453p2795453.html The message in the Solr list is from alexei on 8-April. "Strip spaces and newline characters from data". This started happening a couple (?) of weeks ago and I

Re: XML not coming through from nabble to Gmail

2011-04-12 Thread Erick Erickson
FWIW, I see the XML I just sent in Gmail, so I'm guessing the problem is on the nabble side, but I have very little evidence. Erick P.S. It's not a huge deal, getting to the correct message on nabble is just a click away. But it is a bit annoying. On Tue, Apr 12, 2011 at 8:38 AM, Erick Eri

Re: DIH OutOfMemoryError?

2011-04-12 Thread stockii
"Make sure streaming is on." --> How do I check that?

SolrException: Unavailable Service

2011-04-12 Thread Phong Dais
Hi, I did not want to hijack this thread ( http://www.mail-archive.com/solr-user@lucene.apache.org/msg34181.html) but I am experiencing the exact same problem mentioned there. To sum up the issue, I am getting intermittent "Unavailable Service" exceptions during the commit phase of indexing. I know that I

RE: XML not coming through from nabble to Gmail

2011-04-12 Thread Steven A Rowe
I've asked on Nabble if they know of a fix for the problem: http://nabble-support.1.n2.nabble.com/solr-dev-mailing-list-tp6023495p6264955.html Steve > -Original Message- > From: Erick Erickson [mailto:erickerick...@gmail.com] > Sent: Tuesday, April 12, 2011 8:43 AM > To: Chris Hostetter

Re: SolrException: Unavailable Service

2011-04-12 Thread Erick Erickson
If your commit from the client fails, you don't really know the state of your index anyway. All the threads you have sending documents to Solr are adding them to a single internal buffer. Committing flushes that buffer. So if thread 1 gets an error on commit, it will presumably have some documents

Re: SolrException: Unavailable Service

2011-04-12 Thread Erick Erickson
Sorry, fat fingers. Sent that last e-mail inadvertently. Anyway, if I have this correct, I'd recommend going to autocommit and NOT committing from the clients. That's usually the recommended procedure. This is especially true if you have a master/slave setup, because each commit from each client

Searching during postcommit

2011-04-12 Thread Reeza Edah Tally
Hi, I have been trying to perform a search using a CommonsHttpSolrServer when my postCommit event listener is called. I am not able to find the documents just committed; the "post" in postCommit caused me to assume that I would. It seems that the commit only takes effect when all postCommit hav

Re: function query apply only in the subset of the query

2011-04-12 Thread Erik Hatcher
Try using AND (or set q.op): q=car+AND+_val_:marketValue On Apr 12, 2011, at 07:11 , Marco Martinez wrote: > Hi everyone, > > My situation is the next, I need to sum the value of a field to the score to > the docs returned in the query, but not to all the docs, example: > > q=car returns 3
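The suggestion above can be sent as an ordinary `q` parameter. What follows is only a minimal sketch, assuming a Python client: the query string `car AND _val_:marketValue` comes from the thread, while the base URL and the helper name `build_boosted_query` are hypothetical placeholders.

```python
from urllib.parse import urlencode

# Hypothetical Solr select endpoint; adjust host, port, and core to your setup.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def build_boosted_query(term, value_field):
    """Build a select URL that requires `term` and ANDs in a _val_ clause on `value_field`."""
    q = f"{term} AND _val_:{value_field}"
    # urlencode escapes spaces as "+" and ":" as "%3A".
    return f"{SOLR_SELECT_URL}?{urlencode({'q': q})}"


print(build_boosted_query("car", "marketValue"))
```

Because the clause is joined with AND, only documents matching `car` should be returned, which is the point of the reply; whether a custom query parser honours the AND is a separate question.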

Analysing all tokens in a stream

2011-04-12 Thread bjornbear
Hi I would like to build a component that during indexing analyses all tokens in a stream and adds metadata to a new field based on my analysis. I have different tasks that I would like to perform, like basic classification and certain more advanced phrase detections. How would I do this? A normal

Re: AbstractSolrTestCase and Solr 3.1.0

2011-04-12 Thread Tommaso Teofili
Thanks Robert, that was very useful :) Tommaso 2011/4/12 Robert Muir > On Tue, Apr 12, 2011 at 6:44 AM, Tommaso Teofili > wrote: > > Hi all, > > I am porting a previously series of Solr plugins developed for 1.4.1 > version > > to 3.1.0, I've written some integration tests extending the > > Abs

Re: function query apply only in the subset of the query

2011-04-12 Thread Marco Martinez
Thanks, but I tried this and saw that it works in a standard scenario. In my query, however, I use my own query parser, and it seems that it doesn't apply the AND and returns all the docs in the index: My query: _query_:"{!bm25}car" AND _val_:marketValue -> 67000 docs returned Solr query parser car

Re: XML not coming through from nabble to Gmail

2011-04-12 Thread Chris Hostetter
: : Here's the nabble URL: : : http://lucene.472066.n3.nabble.com/Strip-spaces-and-new-line-characters-from-data-tp2795453p2795453.html : : The message in the Solr list is from alexei on 8-April. "Strip spaces and : newline characters from data". And the raw message as received by apache... h

Re: Updates during Optimize

2011-04-12 Thread Shawn Heisey
On 4/12/2011 6:21 AM, stockii wrote: Hello. When is start an optimize (which takes more than 4 hours) no updates from DIH are possible. i thougt solr is copy the hole index and then start an optimize from the copy and not lock the index and optimize this ... =( any way to do both in the same ti

Re: Updates during Optimize

2011-04-12 Thread Jason Rutherglen
You can index and optimize at the same time. The current limitation or pause is when the ram buffer is flushing to disk, however that's changing with the DocumentsWriterPerThread implementation, eg, LUCENE-2324. On Tue, Apr 12, 2011 at 8:34 AM, Shawn Heisey wrote: > On 4/12/2011 6:21 AM, stockii

Re: Fwd: machine tags, copy fields and pattern tokenizers

2011-04-12 Thread straup
I'm not sure it's a 100% solution but the new path hierarchy tokenizer seems promising. I've only played with it a little bit, with a little too much booze and not enough sleep (in the sky), so apologies for the potty-mouth-ness of this blog post. http://www.aaronland.info/weblog/2011/04/02/status/#sky

Solr 1.30 Collection Distribution Search

2011-04-12 Thread Li Tan
I have 1 master and 2 slaves set up with 1.30 collection distribution. My front-end web application queries the master. Do I need to change any code in the web application to query the slaves, or does the master automatically request queries from the slaves? Please help, thanks.

Re: SolrException: Unavailable Service

2011-04-12 Thread Phong Dais
Erick, My setup is not quite the way you described. I have multiple threads indexing simultaneously, but I only have 1 thread doing the commit after all indexing threads finished. I have multiple instances of this running each in their own java vm. I'm ok with throwing out all the docs indexed

Re: Solr 1.30 Collection Distribution Search

2011-04-12 Thread Erick Erickson
Yes. You need to put, say, a load balancer in front of your slaves and distribute the requests to the slaves. Best Erick On Tue, Apr 12, 2011 at 2:20 PM, Li Tan wrote: > I have 1 master, and 2 slaves setup with 1.30 collection distribution. My > frontwed web application does query to the master,
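The reply recommends a real load balancer in front of the slaves. As a rough illustration of the idea only, here is a client-side round-robin sketch in Python; the slave URLs and the helper `make_slave_picker` are hypothetical and not part of Solr or the thread.

```python
from itertools import cycle

# Hypothetical slave base URLs; replace with the hosts of your own slaves.
SLAVES = [
    "http://slave1:8983/solr",
    "http://slave2:8983/solr",
]


def make_slave_picker(slaves):
    """Return a function that hands out slave base URLs in round-robin order."""
    pool = cycle(slaves)
    return lambda: next(pool)


pick_slave = make_slave_picker(SLAVES)
for _ in range(3):
    # Each search goes to the next slave; the master is only used for indexing.
    print(pick_slave() + "/select?q=*:*")
```

A dedicated load balancer would additionally handle health checks and failover, which this sketch deliberately leaves out.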

Re: SolrException: Unavailable Service

2011-04-12 Thread Erick Erickson
See below: On Tue, Apr 12, 2011 at 2:21 PM, Phong Dais wrote: > Erick, > > My setup is not quite the way you described. I have multiple threads > indexing simultaneously, but I only have 1 thread doing the commit after > all > indexing threads finished. I have multiple instances of this runnin

Spellchecking in the Chinese Lanugage

2011-04-12 Thread alexw
Hi, I have been trying to get spellcheck to work in the Chinese language. So far I have not had any luck. Can someone shed some light here, as a general guideline, on what needs to happen? I am using the CJKAnalyzer in the text field type and searching works fine, but spelling does not wor

Re: Indexing Flickr and Panaramio

2011-04-12 Thread Estrada Groups
Did this go to the list? I think I may need to resubscribe... Sent from my iPhone On Apr 12, 2011, at 12:55 AM, Estrada Groups wrote: > Has anyone tried doing this? Got any tips for someone getting started? > > Thanks, > Adam > > Sent from my iPhone

Re: Solr 1.30 Collection Distribution Search

2011-04-12 Thread Li
Thanks Eric, I thought the master does this automatically when you set up collection distribution. I wish there were more documentation for 1.3 collection distribution. Do you know how to show the slave stats on the master admin page, in the distribution tab? Thanks in advance guys. Sent from my iPhone On Ap

Re: Indexing Flickr and Panaramio

2011-04-12 Thread Otis Gospodnetic
It did: http://search-lucene.com/?q=panaramio Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ - Original Message > From: Estrada Groups > To: Estrada Groups > Cc: "solr-user@lucene.apache.org" > Sent: Tue, Apri

Re: Spellchecking in the Chinese Lanugage

2011-04-12 Thread Otis Gospodnetic
Hi, Does spellchecking in Chinese actually make sense? I once asked a native Chinese speaker about that and the person told me it didn't really make sense. Anyhow, with n-grams, I don't think this could technically work even if it made sense for Chinese, could it? Otis Sematext :: http://

Re: Searching during postcommit

2011-04-12 Thread Otis Gospodnetic
If I follow things correctly, I think you should be seeing new documents only after the commit is done and the new index searcher is open and available for search. If you are searching before the new searcher is available, you are probably still hitting the old searcher. Otis Sematext ::

Re: Indexing Flickr and Panaramio

2011-04-12 Thread Péter Király
Hi, I indexed Flickr into Lucene about 3 years ago. There is a Flickr API, which covers almost everything you need (as I remember, not every Flickr feature was implemented in the API at that time; for example, the "collection" was not searchable). You can harvest by user ID or by searching for a topic. You can

Re: function query apply only in the subset of the query

2011-04-12 Thread Yonik Seeley
On Tue, Apr 12, 2011 at 10:25 AM, Marco Martinez wrote: > Thanks but I tried this and I saw that this work in a standard scenario, but > in my query i use a my own query parser and it seems that they dont doing > the AND and returns all the docs in the index: > > My query: > _query_:"{!bm25}car" A

Re: Spellchecking in the Chinese Lanugage

2011-04-12 Thread Luke Lu
It doesn't make sense to spell check individual character-sized words, but it makes a lot of sense for phrases. Due to the pervasive use of pinyin IM, it's very easy to write phrases that are totally wrong in semantics but "sound" correct. n-gram should work if it doesn't mangle the characters. On T

Re: partial optimize does not reduce the segment number to maxNumSegments

2011-04-12 Thread Renee Sun
Hi Hoss, thanks for your response... you are right, I had a typo in my question, but I did use maxSegments, and here is the exact URL I used: curl 'http://localhost:8080/solr/97/update?optimize=true&maxSegments=10&waitFlush=true' I used jconsole and du -sk to monitor each partial optimize, and
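Since the parameter name was the source of the earlier typo, it can help to build the URL programmatically. This is a minimal Python sketch that reproduces the URL quoted above; the function name `partial_optimize_url` is a hypothetical helper, and only the parameters shown in the message (`optimize`, `maxSegments`, `waitFlush`) are used.

```python
from urllib.parse import urlencode


def partial_optimize_url(base_url, max_segments, wait_flush=True):
    """Build an update URL that asks for an optimize down to `max_segments` segments."""
    params = {
        "optimize": "true",
        "maxSegments": max_segments,
        "waitFlush": "true" if wait_flush else "false",
    }
    return f"{base_url}/update?{urlencode(params)}"


# Reproduces the URL from the message above.
print(partial_optimize_url("http://localhost:8080/solr/97", 10))
```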

Re: Spellchecking in the Chinese Lanugage

2011-04-12 Thread alexw
Thanks Otis and Luke. Yes it does make sense to spellcheck phrases in Chinese. Looks like the default Solr spellCheck component is already doing some kind of NGram-ing. When examining the spellCheck index, I did see gram1, gram2, gram3, gram4... The problem is no Chinese terms were indexed into th

Re: partial optimize does not reduce the segment number to maxNumSegments

2011-04-12 Thread Chris Hostetter
: /tmp # ls /xxx/solr/data/32455077/index | wc ---> this is the start point, 150 seg files : 150 150 946 : /tmp # time curl The number of files in the index directory is not the "number of segments". The number of segments is an internal Lucene concept that impacts
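To illustrate why a raw file count overstates the segment count, the following Python sketch groups index files by their segment-name prefix. It is a rough heuristic, not an official Lucene API: it assumes the file-naming pattern of Lucene 3.x-era indexes (per-segment files such as `_0.fdt` or `_0_1.del`, plus `segments_N` and `segments.gen`), and the helper names are hypothetical.

```python
import os
import re
from collections import defaultdict

# Assumed naming: per-segment files start with "_<name>" followed by "." or "_".
SEGMENT_FILE = re.compile(r"^(_[0-9a-z]+)[._]")


def group_files_by_segment(filenames):
    """Map each segment-name prefix (e.g. "_0") to the files that belong to it."""
    segments = defaultdict(list)
    for name in filenames:
        match = SEGMENT_FILE.match(name)
        if match:  # skips segments_N, segments.gen, and lock files
            segments[match.group(1)].append(name)
    return dict(segments)


def count_segments(index_dir):
    """Estimate the number of segments in an index directory from its file names."""
    return len(group_files_by_segment(os.listdir(index_dir)))


sample = ["_0.fdt", "_0.fdx", "_0_1.del", "_1.cfs", "segments_3", "segments.gen"]
print(len(sample), "files,", len(group_files_by_segment(sample)), "segments")
```

Under these assumptions, a non-compound segment contributes several files while a compound one contributes a single `.cfs` file, so the file count alone says little about how many segments an optimize left behind.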

Re: Indexing Flickr and Panaramio

2011-04-12 Thread Estrada Groups
Thanks Peter! I am thinking that I may just use Nutch to do the crawl and index off of these sites. I need to check out the APIs for each to make sure I'm not missing anything related to the geospatial data for each image. Obviously both do the extraction when the images are uploaded so I'm gues

Vetting Our Architecture: 2 Repeaters and Slaves.

2011-04-12 Thread Parker Johnson
I am hoping to get some feedback on the architecture I've been planning for a medium to high volume site. This is my first time working with Solr, so I want to be sure what I'm planning isn't totally weird, unsupported, etc. We've got a pair of F5 load balancers and 4 hosts. 2 of those hosts

Re: Vetting Our Architecture: 2 Repeaters and Slaves.

2011-04-12 Thread Erick Erickson
I think the repeaters are misleading you a bit here. The purpose of a repeater is usually to replicate across a slow network, say in a remote data center, then slaves at that center can get more timely updates. I don't think they add anything to your disaster recovery scenario. So I'll ignore repe

Re: Solr and Permissions

2011-04-12 Thread Liam O'Boyle
ManifoldCF sounds like it might be the right solution, so long as it's not secretly building a filter query in the back end, otherwise it will hit the same limits. In the meantime, I have made a minor improvement to my filter query; it now scans the permitted IDs and attempts to build a filter que

Re: Vetting Our Architecture: 2 Repeaters and Slaves.

2011-04-12 Thread Otis Gospodnetic
Hi Parker, Lovely ASCII art. :) Yes, I think you can simplify this by introducing shared storage (e.g., SAN) that hosts the index to which your active/primary master writes. When your primary master dies, you start your stand-by master that is configured to point to the same index. If there

Re: partial optimize does not reduce the segment number to maxNumSegments

2011-04-12 Thread Renee Sun
OK, I dug more into this and realized the file extensions can vary depending on the schema, right? For instance, we don't have *.tvx, *.tvd, *.tvf (not using term vectors)... and I suspect the file extensions may change with future Lucene releases? Now it seems we can't just count the files using any formul

Is it possible to create a duplicate field ?

2011-04-12 Thread shrinath.m
For example, I am storing email ids of a person. If the person has 3 email ids, I want to store them as email = 'x...@whatever.com' email = 'a...@blah.com' email = 'p...@moreblah.com' How can we do this ? I know someone will come up with "why don't you store it like email1, email2, email3 and

Re: Is it possible to create a duplicate field ?

2011-04-12 Thread William Bell
Just set up your schema with a "string" multivalued field... On Wed, Apr 13, 2011 at 12:47 AM, shrinath.m wrote: > For example, I am storing email ids of a person. If the person has 3 email > ids, I want to store them as > email = 'x...@whatever.com' > email = 'a...@blah.com' > email = 'p...@more
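With a multivalued field, the same field name is simply repeated inside one document. The sketch below, in Python, builds a Solr XML update message with several `email` values; it assumes the schema declares `email` as a multivalued string field as suggested above, and the `id` field, the sample addresses, and the helper name `build_add_xml` are hypothetical.

```python
import xml.etree.ElementTree as ET


def build_add_xml(doc_id, emails):
    """Build a Solr <add> message in which the email field is repeated once per address."""
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    # "id" is an assumed unique-key field name; use whatever your schema defines.
    ET.SubElement(doc, "field", name="id").text = doc_id
    for email in emails:
        ET.SubElement(doc, "field", name="email").text = email
    return ET.tostring(add, encoding="unicode")


print(build_add_xml("person-1", ["first@example.com", "second@example.com"]))
```

The resulting XML can be posted to the update handler like any other document; no `email1`, `email2`, `email3` fields are needed.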