RE: single word Vs multiple word search

2007-12-04 Thread Dilip.TS
Hi, this is in continuation of my previous mail. I am using SolrInputDocument to perform the index operation. So my question is: if a field to be indexed contains multiple values, does SolrInputDocument index each word of that field, or does it do so for the set of words

single word Vs multiple word search

2007-12-04 Thread Dilip.TS
Hi, consider this scenario: I have indexed a document with field1 having the value "Test solr search" (multiple words). When I perform the keyword search "Test solr search" I do get results, whereas when I search for "Test" I don't get any results. Any quick input
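A common cause of this symptom is the field type: an untokenized type (Solr's "string") stores the whole value as a single term, so only an exact full-value query hits, while a tokenized "text" type indexes every word. A minimal sketch (plain Python, not Solr; the whitespace-plus-lowercase analyzer here is an assumed typical chain, not necessarily the poster's schema):

```python
# Sketch only: simulating why "Test" matches a tokenized "text"-type
# field but not an untokenized "string"-type field.

def tokenize(value):
    # whitespace tokenizer + lowercase filter, as a typical Solr text field uses
    return [t.lower() for t in value.split()]

indexed_value = "Test solr search"

def string_field_match(query):
    # untokenized: the whole stored value is one term; only exact matches hit
    return query == indexed_value

def text_field_match(query):
    # tokenized: every word is its own term, so single-word queries match
    terms = set(tokenize(indexed_value))
    return all(tok in terms for tok in tokenize(query))

print(string_field_match("Test solr search"))  # True
print(string_field_match("Test"))              # False
print(text_field_match("Test"))                # True
```

If single-word matching is wanted, the fix is usually to index the field with a tokenized field type in schema.xml.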

Re: Tomcat6 env-entry

2007-12-04 Thread Yousef Ourabi
Tomcat unpacks the jar into the webapps directory based on the context name anyway... What was the original thinking behind not having solr/home set in web.xml? It seems like an easier way to deal with this. I would imagine most people are more familiar with setting params in web.xml than

Re: Tomcat6 env-entry

2007-12-04 Thread Chris Hostetter
: It works excellently in Tomcat 6. The toughest thing I had to deal with is : discovering that the environment variable in web.xml for solr/home is : essential. If you skip that step, it won't come up. no, there's no reason why you should need to edit the web.xml file ... the solr/home property

Re: 1.2 commit script chokes on 1.2 response format

2007-12-04 Thread Chris Hostetter
: It's a trivial fix, and it seems like it's already been done in trunk: : : http://svn.apache.org/viewvc/lucene/solr/trunk/src/scripts/commit?r1=543259&r2=555612&view=patch : : The change has not been applied to 1.2. It might be nice if it were. i'm not sure what you mean by "applied to 1

Re: Distribution without SSH?

2007-12-04 Thread Chris Hostetter
: I recently set up Solr with distribution on a couple of servers. I just : learned that our network policies do not permit us to use SSH with : passphraseless keys, and the snappuller script uses SSH to examine the master : Solr instance's state before it pulls the newest index via rsync. you ma

Re: LowerCaseFilterFactory and spellchecker

2007-12-04 Thread Chris Hostetter
: It does make some sense, but I'm not sure that it should be blindly analyzed : without adding logic to handle certain cases (like the QueryParser does). : What happens if the analyzer produces two tokens? The spellchecker has to : deal with this appropriately. Spell checkers should be able to

RE: SOLR sorting - question

2007-12-04 Thread Kasi Sankaralingam
Thanks a ton, that worked.

solr + maven?

2007-12-04 Thread Ryan McKinley
Is anyone managing solr projects with maven? I see: https://issues.apache.org/jira/browse/SOLR-19 but that is >1 year old If someone has a current pom.xml, can you post it on SOLR-19? I just started messing with maven, so I don't really know what I am doing yet. thanks ryan

Re: SOLR sorting - question

2007-12-04 Thread Ryan McKinley
Kasi Sankaralingam wrote: Do I need to select the fields in the query that I am trying to sort on?, for example if I want sort on update date then do I need to select that field? I don't think so... are you getting an error? I run queries like: /select?q=*:*&fl=name&sort=added desc without p
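As Ryan's example suggests, `sort` is just another request parameter and is independent of `fl`: a field can be sorted on without being returned. A hedged sketch of building such a request URL (host, port, and field names are hypothetical):

```python
from urllib.parse import urlencode

# fl selects what comes back; sort is evaluated separately, so 'added'
# need not appear in fl at all.
params = {"q": "*:*", "fl": "name", "sort": "added desc"}
url = "http://localhost:8983/solr/select?" + urlencode(params)
print(url)
```

`urlencode` renders the space in `added desc` as `+`, which Solr's parameter parsing accepts.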

Re: SOLR sorting - question

2007-12-04 Thread climbingrose
I don't think you have to. Just try the query on the REST interface and you will know. On Dec 5, 2007 9:56 AM, Kasi Sankaralingam <[EMAIL PROTECTED]> wrote: > Do I need to select the fields in the query that I am trying to sort on?, > for example if I want sort on update date then do I need to se

SOLR sorting - question

2007-12-04 Thread Kasi Sankaralingam
Do I need to select the fields in the query that I am trying to sort on? For example, if I want to sort on update date, do I need to select that field? Thanks,

Re: synonyms

2007-12-04 Thread Laurent Gilles
Hi, I had to work with this kind of side effect regarding multiword synonyms. We installed Solr on a project that uses synonyms extensively, a big list that could sometimes produce wrong matches like the one noticed by Anuvenk, for instance > dui => drunk driving defense > or > dui,drunk

Re: out of heap space, every day

2007-12-04 Thread Charles Hornberger
It seems to me that another way to write the formula -- borrowing Python syntax -- is: 4 * numDocs + 38 * len(uniqueTerms) + 2 * sum([len(t) for t in uniqueTerms]) That's 4 bytes per document, plus 38 bytes per term, plus 2 bytes * the sum of the lengths of the terms. (Numbers taken from http://m
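The formula in the mail above transcribes directly into runnable form (the constants 4, 38, and 2 are the per-document, per-term-overhead, and per-character figures quoted from the referenced thread, not independently verified here):

```python
def sort_cache_bytes(num_docs, unique_terms):
    """Rough memory estimate for a Lucene string sort cache:
    4 bytes/doc for the ord array, ~38 bytes of String object overhead
    per unique term, plus 2 bytes per character (Java chars are UTF-16)."""
    return (4 * num_docs
            + 38 * len(unique_terms)
            + 2 * sum(len(t) for t in unique_terms))

# e.g. 1M docs with 100k unique 10-character terms:
estimate = sort_cache_bytes(1_000_000, ["x" * 10] * 100_000)
print(estimate)            # 9,800,000 bytes, roughly 9.3 MB
```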

Re: Cache use

2007-12-04 Thread Matthew Phillips
Thanks for the suggestion, Dennis. I decided to implement this as you described on my collection of about 400,000 documents, but I did not get the results I expected. Prior to putting the indexes on a tmpfs, I did a bit of benchmarking and found that it usually takes a little under two seconds

Re: out of heap space, every day

2007-12-04 Thread Charles Hornberger
> See Lucene's FieldCache.StringIndex To understand just what's getting stored for each string field, you may also want to look at the createValue() method of the inner Cache object instantiated as stringsIndexCache in FieldCacheImpl.java (line 399 in HEAD): http://svn.apache.org/viewvc/lucene/ja

Re: out of heap space, every day

2007-12-04 Thread Yonik Seeley
On Dec 4, 2007 3:11 PM, Norskog, Lance <[EMAIL PROTECTED]> wrote: > "String[nTerms()]": Does this mean that you compare the first term, then > the second, etc.? Otherwise I don't understand how to compare multiple > terms in two records. Lucene sorting only supports a single term per document for

RE: out of heap space, every day

2007-12-04 Thread Norskog, Lance
"String[nTerms()]": Does this mean that you compare the first term, then the second, etc.? Otherwise I don't understand how to compare multiple terms in two records. Lance -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Yonik Seeley Sent: Tuesday, Decemb

Re: out of heap space, every day

2007-12-04 Thread Brian Whitman
int[maxDoc()] + String[nTerms()] + size_of_all_unique_terms. Then double that to allow for a warming searcher. This is great, but can you help me parse this? Assume 8M docs and I'm sorting on an int field that is unix time (seconds since epoch). For the purposes of the experiment assume every
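A hedged back-of-envelope for Brian's scenario using Yonik's expression, assuming every document carries a distinct 10-character seconds-since-epoch string and ~38 bytes of per-String overhead (the figure quoted elsewhere in this thread); by this estimate the sort cache alone would account for well under 2.5 GB:

```python
max_doc = 8_000_000
n_terms = 8_000_000            # worst case: every timestamp unique
term_chars = 10                # "1196812800"-style epoch-seconds strings

ord_array   = 4 * max_doc                  # int[maxDoc()]
string_refs = 38 * n_terms                 # String[nTerms()] overhead (assumed ~38 B each)
term_text   = 2 * term_chars * n_terms     # UTF-16 chars of the unique terms

total = ord_array + string_refs + term_text
print(total / 2**20)           # ~473 MB for one searcher
print(2 * total / 2**20)       # ~946 MB with a warming searcher
```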

RE: out of heap space, every day

2007-12-04 Thread Norskog, Lance
Thanks! I've seen a few formulae like this go by over the months. Can someone please make a wiki page for memory and processing estimation with locality properties? Or is there a Lucene page we can use? Lance

RE: How to delete records that don't contain a field?

2007-12-04 Thread Norskog, Lance
Oops, I should explain. *:* means all records. This trick puts a positive query in front of your negative query, and that allows it to work. Lance

Re: Cache use

2007-12-04 Thread evgeniy . strokin
Thanks, this is a very interesting idea. But my index folder is about 30 GB, and the most RAM I could get is probably 16 GB. The rest could be in swap, but I think that would kill the whole idea. Maybe it would be useful to put just some of the files from the index folder in RAM? If that is possible at all...

Re: Cache use

2007-12-04 Thread evgeniy . strokin
Any suggestions are helpful to me, even general ones. Here is the info from my index: How big is the index on disk (the most important files are .frq, and .prx if you do phrase queries)? - Total index folder size is 30.7 GB - .frq is 12.2 GB - .prx is 6 GB How big and what exactly is a record in

Re: Cache use

2007-12-04 Thread Mike Klaas
On 4-Dec-07, at 8:43 AM, Evgeniy Strokin wrote: Hello,... we have 110M records index under Solr. Some queries takes a while, but we need sub-second results. I guess the only solution is cache (something else?)... We use standard LRUCache. In docs it says (as far as I understood) that it lo

Re: SOLR 1.3 trunk error

2007-12-04 Thread Matthew Runo
Wow. So I feel stupid. Sorry to waste your time =p --Matthew On Dec 4, 2007, at 10:36 AM, Ryan McKinley wrote: did you try 'ant clean' before running 'ant dist'? the method signature for SortSpec changed recently Matthew Runo wrote: Ooops, I get this error when I try to search an index

Re: SOLR 1.3 trunk error

2007-12-04 Thread Ryan McKinley
did you try 'ant clean' before running 'ant dist'? the method signature for SortSpec changed recently Matthew Runo wrote: Ooops, I get this error when I try to search an index with a few documents in it. ie.. http://dev14.zappos.com:8080/solr/select/?q=*%3A*&version=2.2&start=0&rows=10&

Re: out of heap space, every day

2007-12-04 Thread Mike Klaas
On 4-Dec-07, at 8:10 AM, Brian Carmalt wrote: Hello, I am also fighting with heap exhaustion, however during the indexing step. I was able to minimize, but not fix the problem by setting the thread stack size to 64k with "-Xss64k". The minimum size is os specific, but the VM will tell you

Re: SOLR 1.3 trunk error

2007-12-04 Thread Matthew Runo
Ooops, I get this error when I try to search an index with a few documents in it. ie.. http://dev14.zappos.com:8080/solr/select/?q=*%3A*&version=2.2&start=0&rows=10&indent=on caching : true numDocs : 5 maxDoc : 5 readerImpl : MultiReader readerDir : org.apache.lucene.store.FSDirectory@/opt/so

SOLR 1.3 trunk error

2007-12-04 Thread Matthew Runo
Hello! I'm trying to make use of SOLR 1.3, svn trunk, and get the following error. SEVERE: java.lang.NoSuchMethodError: org.apache.solr.search.QParser.getSort(Z)Lorg/apache/solr/search/QueryParsing$SortSpec; at org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent

Re: Cache use

2007-12-04 Thread Yonik Seeley
The first step is to look at what searches are taking too long, and see if there is a way to structure them so they don't take as long. The whole index doesn't have to be in memory to get good search performance, but 100M documents on a single server is big. We are working on distributed search (

Tomcat6 env-entry

2007-12-04 Thread Gary Harris
It works excellently in Tomcat 6. The toughest thing I had to deal with was discovering that the environment variable in web.xml for solr/home is essential. If you skip that step, it won't come up. (env-entry: solr/home, type java.lang.String, value F:\Tomcat-6.0.14\webapps\solr)

Re: Cache use

2007-12-04 Thread Dennis Kubes
One way to do this if you are running on Linux is to create a tmpfs (which is RAM-backed) and then mount the filesystem in RAM. Your index then behaves normally to the application but is essentially served from RAM. This is how we serve the Nutch Lucene indexes on our web search engine (www.visvo

Cache use

2007-12-04 Thread Evgeniy Strokin
Hello... we have a 110M-record index under Solr. Some queries take a while, but we need sub-second results. I guess the only solution is a cache (something else?). We use the standard LRUCache. The docs say (as far as I understood) that it loads a view of the index into memory and next time works with

Re: Faceting mutiple fields with different limits

2007-12-04 Thread Erik Hatcher
On Dec 4, 2007, at 10:37 AM, Wagner,Harry wrote: Anyone know of a problem with faceting on more than 1 field and using a different facet.limit for each field? I'm using a query like: ...facet=true&facet.mincount=1&facet.limit=15&facet.field=fpn&facet.limit=-1&facet.field=ln&facet.limit=
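If per-field limits are the goal, Solr's per-field override syntax (`f.<fieldname>.facet.limit`, where supported by the version in use) avoids the ambiguity of repeating a global `facet.limit`. A hedged sketch of building such a request, reusing the field names `fpn` and `ln` from the query above:

```python
from urllib.parse import urlencode

# Pass a sequence of pairs so facet.field can repeat; per-field overrides
# are scoped by name and never collide with each other.
params = [
    ("q", "*:*"),
    ("facet", "true"),
    ("facet.mincount", "1"),
    ("facet.field", "fpn"),
    ("f.fpn.facet.limit", "15"),   # limit only the fpn facet
    ("facet.field", "ln"),
    ("f.ln.facet.limit", "-1"),    # -1 = unlimited for ln
]
query_string = urlencode(params)
print(query_string)
```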

Re: out of heap space, every day

2007-12-04 Thread Brian Carmalt
Hello, I am also fighting with heap exhaustion, however during the indexing step. I was able to minimize, but not fix, the problem by setting the thread stack size to 64k with "-Xss64k". The minimum size is OS-specific, but the VM will tell you if you set the size too small. You can try it, it

Re: out of heap space, every day

2007-12-04 Thread Yonik Seeley
On Dec 4, 2007 10:59 AM, Brian Whitman <[EMAIL PROTECTED]> wrote: > > > > For faceting and sorting, yes. For normal search, no. > > > > Interesting you mention that, because one of the other changes since > last week besides the index growing is that we added a sort to an > sint field on the queri

Re: out of heap space, every day

2007-12-04 Thread Brian Whitman
For faceting and sorting, yes. For normal search, no. Interesting you mention that, because one of the other changes since last week besides the index growing is that we added a sort to an sint field on the queries. Is it reasonable that a sint sort would require over 2.5GB of heap on

Re: out of heap space, every day

2007-12-04 Thread Yonik Seeley
On Dec 4, 2007 10:46 AM, Brian Whitman <[EMAIL PROTECTED]> wrote: > Are there 'native' memory requirements for solr as a function of > index size? For faceting and sorting, yes. For normal search, no. -Yonik

out of heap space, every day

2007-12-04 Thread Brian Whitman
This may be more of a general Java question than a Solr one, but I'm a bit confused. We have a largish Solr index, about 8M documents; the data dir is about 70 GB. We're getting about 500K new docs a week, as well as about 1 query/second. Recently (when we crossed about the 6M threshold) Resin has

Re: How to delete records that don't contain a field?

2007-12-04 Thread Rob Casson
i'm using this: *:* -[* TO *] which is what lance suggested... works just fine. fyi: https://issues.apache.org/jira/browse/SOLR-381 On Dec 3, 2007 8:09 PM, Norskog, Lance <[EMAIL PROTECTED]> wrote: > Wouldn't this be: *:* AND "negative query"
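The pattern generalizes: prefix the purely negative clause with a match-all query so Solr has something to subtract from. A small sketch that builds such a query string for any field (the field name `price` below is hypothetical, chosen only for illustration):

```python
def missing_field_query(field):
    """Match everything, minus docs that have any value in `field`.
    The leading *:* supplies the positive clause a pure negation needs."""
    return f"*:* -{field}:[* TO *]"

print(missing_field_query("price"))   # *:* -price:[* TO *]
```

The resulting string can be used as a query or as the body of a delete-by-query request.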

Re: Invalid character in search results

2007-12-04 Thread Yonik Seeley
On Dec 4, 2007 5:02 AM, Maciej Szczytowski <[EMAIL PROTECTED]> wrote: > Hi, I use Solr 1.1 application for indexing russian documents. Sometimes > I've got as search results docs with invalid character. > > For example I've indexed "иго" but search returned "и��о". It's strange > because something

Re: Issues using keyword searching and facet search together in a search operation

2007-12-04 Thread Yonik Seeley
On Dec 4, 2007 5:39 AM, Dilip.TS <[EMAIL PROTECTED]> wrote: > When i use both the Keyword search and the facet search together in a same > search operation, > I dont get any results whereas if i perform them seperately, i could get > back the results. add debugQuery=on to your requests (and chan

Field separator for highlighting multi-value fields

2007-12-04 Thread Wagner,Harry
Hi, the default field separator seems to be '.' when highlighting multi-value fields. Can this be overridden in 1.2 to another character? Thanks! harry

RE: Issues using keyword searching and facet search together in a search operation

2007-12-04 Thread Dilip.TS
Hi, consider the following scenario: I need to use keyword search on the fields title and description with the keyword "testing", and search on the fields price, publisher and tag, with publisher and tag selected for facet searching. If the constructed queryStr

Re: Issues using keyword searching and facet search together in a search operation

2007-12-04 Thread Erick Erickson
I can't answer the question, but I *can* guarantee that the people who can will give you *much* better responses if you include some details. Like which analyzers you use, how you submit the query, samples of the two queries that work and the one that doesn't. Imagine you're on the receiving end i

Issues using keyword searching and facet search together in a search operation

2007-12-04 Thread Dilip.TS
Hi, when I use both the keyword search and the facet search together in the same search operation, I don't get any results, whereas if I perform them separately I do get results. Is it a constraint from the SOLR point of view? Thanks in advance. Regards, Dilip TS

Invalid character in search results

2007-12-04 Thread Maciej Szczytowski
Hi, I use a Solr 1.1 application for indexing Russian documents. Sometimes I get docs with invalid characters as search results. For example I indexed "иго" but search returned "и��о". It's strange because something has changed 2 bytes into 6 bytes. иго - D0 B8 D0 B3 D0 BE и��о - D0 B8 EF
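The byte pattern is diagnostic: EF BF BD is the UTF-8 encoding of U+FFFD, the Unicode replacement character. If the two bytes of "г" (D0 B3) are decoded separately, for example across a buffer boundary or by a byte-at-a-time reader, each lone byte is an invalid sequence and becomes one U+FFFD, turning 2 bytes into 6. A sketch reproducing the symptom (where exactly the 1.1 setup splits the bytes is an assumption, not established by the mail):

```python
# "г" is two UTF-8 bytes: D0 B3. Decoded together, it round-trips fine.
good = b"\xd0\xb3".decode("utf-8")
print(good)   # г

# Decoded one byte at a time, each fragment is invalid UTF-8 and is
# substituted with U+FFFD:
broken = (b"\xd0".decode("utf-8", errors="replace")
          + b"\xb3".decode("utf-8", errors="replace"))
print(broken)   # two replacement characters

# Re-encoded, each U+FFFD is the 3-byte sequence EF BF BD: 2 bytes -> 6.
print(broken.encode("utf-8").hex())   # efbfbdefbfbd
```

Checking that every component (container, response writer, client) reads the stream with a character-aware UTF-8 decoder usually resolves this class of corruption.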