Re: Solr4 distributed IDF

2012-09-03 Thread Erick Erickson
When starting a new discussion on a mailing list, please do not reply to an existing message; instead, start a fresh email. Even if you change the subject line of your email, other mail headers still track which thread you replied to, so your question is "hidden" in that thread and gets less attent

Re: Solr4 distributed IDF

2012-09-03 Thread veena rani
Hi, I have an issue with the # symbol in Solr. I'm trying to search for a string that ends with #, e.g. c#, and it throws an error like: org.apache.lucene.queryparser.classic.ParseException: Cannot parse '(techskill:c': Encountered "" at line 1, column 12. Was expecting one of: ... ... ..
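
The error text stops at '(techskill:c', which is exactly 12 characters, so the query appears to have been cut off at the # before it reached the parser. One plausible cause is an unencoded # in the request URL, since # begins the URL fragment. A minimal sketch of that effect follows; the host, path, and the full query string '(techskill:c#)' are assumptions for illustration, not taken from the original message.

```python
from urllib.parse import urlencode, urlsplit

# Hypothetical request URL with the raw '#' left unencoded (host and path are assumed).
raw_url = "http://localhost:8983/solr/select?q=(techskill:c#)"

# '#' starts the URL fragment, so everything after it is dropped from the query string.
parts = urlsplit(raw_url)
truncated_query = parts.query      # what the server would receive
dropped_fragment = parts.fragment  # what never reaches the server

# Percent-encoding the parameter keeps the '#' (as %23) inside the query value.
encoded_query = urlencode({"q": "(techskill:c#)"})

print(truncated_query)
print(dropped_fragment)
print(encoded_query)
```

Encoding only gets the character to the query parser. Whether the field's analyzer keeps # at index and query time is a separate question that this sketch does not address.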

Re: Solr4 distributed IDF

2012-09-03 Thread Toke Eskildsen
On Fri, 2012-08-31 at 02:25 +0200, Lance Norskog wrote: > The math for "confidence values" in probability theory shows that > distributed DF does not matter after not very many documents. If you > have 10s of thousands of documents in each shard, don't worry. The old advice of distributing the doc

Re: Solr4 distributed IDF

2012-08-30 Thread Eric Wu
Hi Walter, Thank you for your help. I think you are right, the most important issue here is "the most selective terms are rare". So I probably still need to implement distributed IDF to get better results. On Fri, Aug 31, 2012 at 8:36 AM, Walter Underwood wrote: > That is true if you randoml

Re: Solr4 distributed IDF

2012-08-30 Thread Walter Underwood
That is true if you randomly distribute the documents. If they are distributed according to topic, there can be some big anomalies. Also, the DFs for rare terms will have bigger errors. There is some statistical theorem about this, but I can't remember it right now. Thanks to Zipf, most of your
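
To illustrate the point about topic-based distribution, here is a small sketch comparing per-shard IDF with global IDF for one rare term. It assumes the classic Lucene TF-IDF formulation, idf = 1 + ln(numDocs / (docFreq + 1)). The shard names, sizes, and document frequencies are invented for illustration only.

```python
import math

def idf(doc_freq, num_docs):
    """Classic Lucene-style IDF: 1 + ln(numDocs / (docFreq + 1))."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))

# Hypothetical shards split by topic: (documents in shard, documents containing the term).
shards = {
    "shard_a": (10_000, 40),  # the term's topic lives mostly here
    "shard_b": (10_000, 1),   # the term is nearly absent here
}

# Global statistics are the sums over all shards.
total_docs = sum(n for n, _ in shards.values())
total_df = sum(df for _, df in shards.values())
global_idf = idf(total_df, total_docs)

# Local statistics use only what each shard can see.
local_idfs = {name: idf(df, n) for name, (n, df) in shards.items()}

for name, value in local_idfs.items():
    print(f"{name}: local idf = {value:.3f}, global idf = {global_idf:.3f}")
```

With these made-up numbers, the same term is weighted noticeably higher on the shard where it is nearly absent than on the shard where it is common, so scores merged across shards are not directly comparable. With randomly distributed documents the per-shard document frequencies would be close to proportional and the gap would shrink.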

Re: Solr4 distributed IDF

2012-08-30 Thread Eric Wu
Hi Steven and Otis, Thank you! That's very helpful information :) On Fri, Aug 31, 2012 at 4:19 AM, Steven A Rowe wrote: > Hi Ke, > > Have you seen ? > > Steve > > -Original Message- > From: Eric Wu [mailto:eirik...@gmail.com] > Sent:

Re: Solr4 distributed IDF

2012-08-30 Thread Eric Wu
Hi Lance, We may have unbalanced shards; does that matter? And do you know of any post that has the detailed math about this? Thank you very much. On Fri, Aug 31, 2012 at 8:25 AM, Lance Norskog wrote: > The math for "confidence values" in probability theory shows that > distributed DF does not m

Re: Solr4 distributed IDF

2012-08-30 Thread Lance Norskog
The math for "confidence values" in probability theory shows that distributed DF does not matter after not very many documents. If you have 10s of thousands of documents in each shard, don't worry. On Thu, Aug 30, 2012 at 1:19 PM, Steven A Rowe wrote: > Hi Ke, > > Have you seen

RE: Solr4 distributed IDF

2012-08-30 Thread Steven A Rowe
Hi Ke, Have you seen ? Steve -Original Message- From: Eric Wu [mailto:eirik...@gmail.com] Sent: Thursday, August 30, 2012 3:05 AM To: solr-user@lucene.apache.org Subject: Solr4 distributed IDF Hi there, Does there exist any issue ticket

Re: Solr4 distributed IDF

2012-08-30 Thread Otis Gospodnetic
Hi Eric, This will show you some previous discussions, as well as the JIRA issue with oldish patches: http://search-lucene.com/?q=distributed+IDF&fc_project=Solr Otis Performance Monitoring for Solr / ElasticSearch / HBase - http://sematext.com/spm - Original Message - > Fr