Re: anyone use hadoop+solr?

2010-06-24 Thread Otis Gospodnetic
Marc, In Map, purposely ending up with lots of smaller indices/shards at the end of the whole MapReduce job. Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ - Original Message > From: Marc Sturlese > To: solr-u

Re: questions about Solr shards

2010-06-24 Thread Otis Gospodnetic
Hi Babak, 1. Yes, you are reading that correctly. 2. This describes the situation where, for instance, a document with ID=10 is updated between the 2 calls to the Solr instance/shard where that doc ID=10 lives. 3. Yup, orthogonal. You can have a master with multiple cores for sharded and non

Re: solr indexing takes a long time and is not responsive to abort command

2010-06-24 Thread Don Werve
2010/6/25 Ya-Wen Hsu > This situation doesn't happen consistently. When we only ran the > problematic core, the indexing took significantly longer than usual (4 hrs -> 11 hrs). It ran successfully in the end. When we ran indexing for all cores at > the same time, the problematic core never finished i

Re: Synonym configuration

2010-06-24 Thread Koji Sekiguchi
(10/06/25 11:33), xdzgor wrote: Hi, can someone please confirm the following statements about configuration for the synonym filter, or correct me where I'm wrong? "a => b": a search for "a" is changed into a search for "b". "a, b => c": a search for "a" or a search for "b" is changed into a searc

Synonym configuration

2010-06-24 Thread xdzgor
Hi, can someone please confirm the following statements about configuration for the synonym filter, or correct me where I'm wrong? "a => b": a search for "a" is changed into a search for "b". "a, b => c": a search for "a" or a search for "b" is changed into a search for "c" (the same as a=>c and b=>
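To make the explicit-mapping semantics in the question concrete, here is a toy Java model of single-token rules like "a => b" and "a, b => c". It is not Solr's SynonymFilter; the class and method names are hypothetical, and this sketch simply keeps the last rule seen for a given left-hand token.

```java
import java.util.*;

// Toy model of explicit synonym mappings ("lhs1, lhs2 => rhs1, rhs2").
// Hypothetical illustration only, not Solr's SynonymFilter implementation.
class ExplicitSynonyms {
    private final Map<String, List<String>> rules = new HashMap<>();

    // Parse one explicit-mapping line; every left-hand token maps to all right-hand tokens.
    void addRule(String line) {
        String[] sides = line.split("=>");
        if (sides.length != 2) {
            throw new IllegalArgumentException("not an explicit mapping: " + line);
        }
        List<String> targets = new ArrayList<>();
        for (String t : sides[1].split(",")) targets.add(t.trim());
        for (String s : sides[0].split(",")) rules.put(s.trim(), targets);
    }

    // Replace each token that has a rule with its targets; other tokens pass through unchanged.
    List<String> apply(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String tok : tokens) {
            out.addAll(rules.getOrDefault(tok, Collections.singletonList(tok)));
        }
        return out;
    }
}
```

With "a => b" loaded, the query token "a" becomes "b"; with "x, y => c", either "x" or "y" becomes "c", and unmatched tokens are left alone.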

Re: Similarity

2010-06-24 Thread Dave Searle
You could write some client code to translate your query into the following: (foo AND baz) OR (foo OR baz). This seems to work well for me. On 24 Jun 2010, at 21:20, Blargy wrote: > Yonik Seeley-2-2 wrote: >> Depends on the larger context of what you are trying to do. >> Do you still wa
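A minimal sketch of the client-side rewrite Dave describes, assuming the standard Lucene query syntax, where the boolean operators must be written in uppercase (AND/OR). The helper name is hypothetical, and it does not escape query-syntax characters, so real user input would need escaping first.

```java
import java.util.*;

// Hypothetical client-side helper: turns terms [t1, t2, ...] into
// "(t1 AND t2 ...) OR (t1 OR t2 ...)". Docs containing every term match both
// clauses and tend to rank higher; docs containing any single term still match.
class QueryRewriter {
    static String allOrAny(List<String> terms) {
        if (terms.isEmpty()) {
            throw new IllegalArgumentException("no terms");
        }
        String all = String.join(" AND ", terms);
        String any = String.join(" OR ", terms);
        return "(" + all + ") OR (" + any + ")";
    }
}
```

For the terms foo and baz this produces exactly the query from the message above.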

questions about Solr shards

2010-06-24 Thread Babak Farhang
Hi everyone, There are a couple of notes on the limitations of this approach at http://wiki.apache.org/solr/DistributedSearch which I'm having trouble understanding. 1. "When duplicate doc IDs are received, Solr chooses the first doc and discards subsequent ones" "Received" here is from the p
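A toy Java model of the first wiki note quoted above: when two shards return the same unique key, the copy received first is kept and later copies are dropped. Real distributed search also merges by score or sort order; this sketch only tracks arrival order, and the class and method names are hypothetical.

```java
import java.util.*;

// Toy model of "first doc wins" for duplicate unique keys across shards.
// Each doc is represented as {shardName, docId}, listed in the order received.
class ShardMerge {
    static Map<String, String> mergeFirstWins(List<String[]> docsInArrivalOrder) {
        // LinkedHashMap keeps first-seen order; putIfAbsent ignores later duplicates.
        Map<String, String> kept = new LinkedHashMap<>();
        for (String[] doc : docsInArrivalOrder) {
            kept.putIfAbsent(doc[1], doc[0]);
        }
        return kept; // docId -> shard whose copy was kept
    }
}
```

If doc ID=10 arrives from shardA and then from shardB, only shardA's copy survives, which is also why an update landing between the two per-shard calls (point 2 in Otis's reply) can surface a stale version.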

RE: solr indexing takes a long time and is not responsive to abort command

2010-06-24 Thread Ya-Wen Hsu
This situation doesn't happen consistently. When we only ran the problematic core, the indexing took significantly longer than usual (4 hrs -> 11 hrs). It ran successfully in the end. When we ran indexing for all cores at the same time, the problematic core never finished indexing, so we had to

Re: Similarity

2010-06-24 Thread Yonik Seeley
On Thu, Jun 24, 2010 at 4:20 PM, Blargy wrote: > Yonik Seeley-2-2 wrote: >> Depends on the larger context of what you are trying to do. >> Do you still want the idf and length norm relevancy factors? If not, >> use a filter, or boost the particular clause with 0. > I do want the other rel

Re: Similarity

2010-06-24 Thread Blargy
Yonik Seeley-2-2 wrote: > Depends on the larger context of what you are trying to do. > Do you still want the idf and length norm relevancy factors? If not, > use a filter, or boost the particular clause with 0. I do want the other relevancy factors, i.e. boost, phrase-boosting, etc., but I j

Re: performance sorting multivalued field

2010-06-24 Thread Marc Sturlese
Thanks, that's very useful info. However, I can't reproduce the error. I've created an index where all documents have a multivalued date field and each document has at least one value in that field (most of the docs have 2 or 3). So, the number of un-inverted term instances is greater than the

Re: Similarity

2010-06-24 Thread Yonik Seeley
On Thu, Jun 24, 2010 at 3:17 PM, Blargy wrote: > > Can someone explain how I can override the default behavior of the tf > contributing a higher score for documents with repeated words? > > For example: > > Query: "foo" > Doc1: "foo bar" score 1.0 > Doc2: "foo foo bar" score 1.1 > > Doc2 contains

Similarity

2010-06-24 Thread Blargy
Can someone explain how I can override the default behavior of the tf contributing a higher score for documents with repeated words? For example: Query: "foo" Doc1: "foo bar" score 1.0 Doc2: "foo foo bar" score 1.1 Doc2 contains "foo" twice so it is scored higher. How can I override this behavi
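A standalone model of just the tf factor, to show why Doc2 outscores Doc1. In Lucene 2.9/3.x, DefaultSimilarity computes tf(freq) as sqrt(freq), so two occurrences of "foo" give a larger factor than one. A common approach is to subclass DefaultSimilarity, override tf(float freq) to return 1 for any non-zero frequency, and register the subclass with a similarity element in schema.xml (in Solr 1.4 that applies schema-wide). The class below is hypothetical and deliberately avoids the Lucene dependency so it runs on its own.

```java
// Standalone model of the tf scoring factor (no Lucene dependency).
// defaultTf mirrors DefaultSimilarity's sqrt(freq); flatTf is the
// "repetition doesn't matter" variant you would put in a tf(float) override.
class TfModel {
    static float defaultTf(float freq) {
        return (float) Math.sqrt(freq);
    }

    // Any occurrence counts once: 1 if the term is present, 0 otherwise.
    static float flatTf(float freq) {
        return freq > 0 ? 1.0f : 0.0f;
    }
}
```

Note that the other factors (idf, length norm, boosts) are untouched by this change, so "foo bar" and "foo foo bar" may still differ slightly because of length normalization.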

Re: Can query boosting be used with a custom request handlers?

2010-06-24 Thread Chris Hostetter
: > Maybe this helps: : > http://wiki.apache.org/solr/SolrPlugins#QParserPlugin Right ... from the point of view of a custom RequestHandler (or SearchComponent) the key is to follow the model used by QueryComponent and use "QParser.getParser(...)" to deal with parsing query strings. Then all

Re: Some minor Solritas layout tweaks

2010-06-24 Thread Erik Hatcher
Ken - thanks for these improvements! Comments below... On Jun 23, 2010, at 8:24 PM, Ken Krugler wrote: I grabbed the latest & greatest from trunk, and then had to make a few minor layout tweaks. 1. In main.css, the ".query-box input" { height} isn't tall enough (at least on my Mac 10.5/FF

Re: performance sorting multivalued field

2010-06-24 Thread wojtekpia
Chris Hostetter-3 wrote: > Sorting on a multivalued field is defined to have unspecified behavior. It > might fail with an error, or it might fail silently. I learned this the hard way, it failed silently for a long time until it failed with an error: http://lucene.472066.n3.nabble.com/Diffe

Re: Multiple Solr Webapps in Glassfish with JNDI

2010-06-24 Thread Kelly Taylor
Yes, but I don't see that Glassfish has the concept of "context fragments" like Tomcat does... even though under the covers Glassfish is a bit of Tomcat (Catalina). -- View this message in context: http://lucene.472066.n3.nabble.com/Multiple-Solr-Webapps-in-Glassfish-with-JNDI-tp918383p920008.htm

Re: performance sorting multivalued field

2010-06-24 Thread Chris Hostetter
: I just like to play with things. First checked the behavior of sorting on : multiValued field and what I noticed was, let's say you have docs with field Sorting on a multivalued field is defined to have unspecified behavior. It might fail with an error, or it might fail silently. Fundamentally, Solr

Re: dataimport.properties is not updated on delta-import

2010-06-24 Thread Erick Erickson
Is there any chance that the "id" field is, indeed, missing for those documents? Does your schema require ID? I've also seen constraints added to a DB that are not retroactive, so even if there is a constraint requiring ID it's still possible that some items in your DB don't have them. A shot in

Re: MoreLikeThis (mlt) : use the match's maxScore for result score normalization

2010-06-24 Thread MitchK
Chantal, have a look at http://lucene.apache.org/java/3_0_1/api/all/org/apache/lucene/search/similar/MoreLikeThis.html to get an idea of what the MLT score is based on. The problem is that you can't compare scores. The query for the "normal" result-response was maybe something like

Re: MoreLikeThis (mlt) : use the match's maxScore for result score normalization

2010-06-24 Thread Chantal Ackermann
Hi Otis, thank you for this super quick answer. I understand that normalizing and comparing scores is fishy, and I wouldn't want to do it for regular search results. I just thought that in this special case, the maxScore which is returned for the input document to the MoreLikeThis handler -- and

Re: underscore, comma in terms.prefix

2010-06-24 Thread stockii
Okay, thanks. WordDelimiterFilterFactory with the option generateNumberParts="0" caused the trouble ;-)

Re: underscore, comma in terms.prefix

2010-06-24 Thread Otis Gospodnetic
stocki, Solr's Analysis page will tell you what's happening. I can't tell by just looking, though I would first try removing the CommonGramsFF and see if repetition is still happening. Otis

Re: MoreLikeThis (mlt) : use the match's maxScore for result score normalization

2010-06-24 Thread Otis Gospodnetic
Chantal, The short answer is that you can't compare relevancy scores across requests. I think this may be in a FAQ. Check this: http://search-lucene.com/?q=score+compare+absolute+relative&fc_project=Lucene&fc_project=Solr Otis

Re: dataimport.properties is not updated on delta-import

2010-06-24 Thread warb
Hello again! Upon further investigation, it seems that something is amiss with delta-import after all: the delta-import does not actually import anything. (I thought it did when I ran it previously, but I am no longer sure that was the case.) It does complete successfully as seen from the front

MoreLikeThis (mlt) : use the match's maxScore for result score normalization

2010-06-24 Thread Chantal Ackermann
Hi there, consider the following response extract for a MoreLikeThis request: The first result element is the document that was input and for which to return "more like this" results. The second result element contains the results returned by the handler. As they both come with a different ma
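A sketch of the normalization Chantal has in mind: divide each MoreLikeThis score by a reference maxScore so the values fall on a 0..1 scale. The class name is hypothetical, and as Otis and MitchK point out in their replies, this only rescales the numbers within one response; it does not make scores from different requests comparable.

```java
// Hypothetical helper: rescale scores by a reference maximum (score / maxScore).
// This is a pure rescaling; it adds no cross-request comparability.
class ScoreNormalizer {
    static float[] normalize(float[] scores, float maxScore) {
        if (maxScore <= 0) {
            throw new IllegalArgumentException("maxScore must be positive");
        }
        float[] out = new float[scores.length];
        for (int i = 0; i < scores.length; i++) {
            out[i] = scores[i] / maxScore;
        }
        return out;
    }
}
```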

underscore, comma in terms.prefix

2010-06-24 Thread stockii
Hello. This is my filter chain for suggestions with the TermsComponent:

Re: fuzzy query performance

2010-06-24 Thread Peter Karich
Wow! Indeed a lot faster (~an order of magnitude). Hopefully we do not encounter a bug with the trunk :-) So, thanks and congrats for that awesome piece of software! > On Wed, Jun 23, 2010 at 3:34 PM, Peter Karich wrote: >> So, you mean I should try it out here: >> http://svn.apache.org/vie

Re: Solr 1.4 - Image-Highlighting and Payloads

2010-06-24 Thread MitchK
Sebastian, sounds like an exciting project. > We've found the argument "TokenGroup" in method "highlightTerm" > implemented in SimpleHtmlFormatter. "TokenGroup" provides the method > "getPayload()", but the returned value is always "NULL". > No, Token provides this method, not TokenGroup. Bu

Re: anyone use hadoop+solr?

2010-06-24 Thread Marc Sturlese
Hi Otis, just out of curiosity, which strategy do you use? Do you index on the map side or the reduce side? Do you use it to build shards or a single monolithic index? Thanks

Re: fuzzy query performance

2010-06-24 Thread Peter Karich
Thanks, Robert and Otis! Will try it out now. Peter. > Btw. here you can see Robert's presentation on what he did to speed up fuzzy > queries: http://www.slideshare.net/otisg > Otis > >> So, you mean I should try it out he
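For context on what a fuzzy query computes: it matches index terms within a small edit (Levenshtein) distance of the query term. Below is the classic dynamic-programming edit distance in Java; this is the textbook algorithm, not Lucene's code, and the class name is hypothetical. As I understand the thread, the speedup on trunk comes from avoiding running a comparison like this against every term in the index.

```java
// Classic two-row dynamic-programming Levenshtein distance:
// minimum number of single-character insertions, deletions, or substitutions.
class EditDistance {
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        // Distance from the empty prefix of a to each prefix of b.
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // delete all i chars of a's prefix
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            // Reuse arrays: the row just computed becomes the previous row.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }
}
```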

Re: Alphabetic range

2010-06-24 Thread Sophie M.
Hello Otis, this morning, instead of http://localhost:8983/solr/music/select?indent=on&version=2.2&q=ArtistSort:mi*&fq=&start=0&rows=10&fl=ArtistSort&qt=standard&wt=standard&explainOther=&hl.fl= I tried: http://localhost:8983/solr/music/select?indent=on&version=2.2&q=ArtistSort:Mi*&fq=&start=
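The difference between mi* and Mi* comes down to case: as far as I know, Solr 1.4's standard query parser does not run prefix/wildcard terms through the field's analyzer, so the prefix has to match the indexed terms' case exactly. A minimal sketch of building the q parameter accordingly; the helper name and the fieldIsLowercased flag are hypothetical, and you would set the flag to match how ArtistSort is actually indexed. Special query characters in the prefix are not escaped here.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.Locale;

// Hypothetical helper: builds "q=<field>:<prefix>*" with the prefix normalized the
// same way the field was at index time (prefix terms are not analyzed at query time).
class PrefixQueryBuilder {
    static String qParam(String field, String userPrefix, boolean fieldIsLowercased) {
        String prefix = fieldIsLowercased ? userPrefix.toLowerCase(Locale.ROOT) : userPrefix;
        try {
            // URLEncoder encodes ':' as %3A and leaves '*' unchanged.
            return "q=" + URLEncoder.encode(field + ":" + prefix + "*", "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }
}
```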

Re: Field missing when use distributed search + dismax

2010-06-24 Thread Scott Zhang
I specifically set it to fl=id,type. No luck. I believe something goes wrong when Solr merges the results. On Thu, Jun 24, 2010 at 12:41 PM, Otis Gospodnetic < otis_gospodne...@yahoo.com> wrote: > Make sure you list it in ...&fl=ID,type or set it in the defaults section > of your hand

SOLR-236 Patch

2010-06-24 Thread Amdebirhan, Samson, VF-Group
Hi, trying to apply the SOLR-236 patch to trunk, I get the following. Can anyone help me understand what I am missing? svn checkout http://svn.a