Re: JVM OOM when using field collapse component

2009-10-02 Thread Martijn v Groningen
No, I have not encountered an OOM exception yet with the current field collapse patch. How large is your configured JVM heap space (-Xmx)? Field collapsing requires more memory than regular searches. Does Solr run out of memory during the first search(es) or does it run out of memory after a while when

Re: populating synonyms.txt

2009-10-02 Thread Michael Engesgaard
I understand that synonyms are domain-specific, although I could still see a benefit in having standardized synonyms.txt files (a thesaurus) for general use, just like the ones you can download or that are already embedded in word processors like Open Office Writer or MS Word. I can understand that you

Re: "Only one usage of each socket address" error

2009-10-02 Thread Steinar Asbjørnsen
Tried running Solr on Jetty now, and I still get the same error :(. Steinar On 1 Oct 2009, at 16:23, Steinar Asbjørnsen wrote: Hi. This situation is still bugging me. I thought I had it fixed yesterday, but no... Seems like this goes both for deleting and adding, but I'll explain the delete-si

Re: field collapsing sums

2009-10-02 Thread Martijn v Groningen
Well that is odd. How have you configured field collapsing with the dismax request handler? The collapse counts should be X - 1 (if collapse.threshold=1). Martijn 2009/10/1 Joe Calderon : > thx for the reply, i just want the number of dupes in the query > result, but it seems i dont get the correct

Re: best way to get the size of an index

2009-10-02 Thread Grant Ingersoll
On Oct 1, 2009, at 12:18 PM, Phillip Farber wrote: Resuming this discussion in a new thread to focus only on this question: What is the best way to get the size of an index so it does not get too big to be optimized (or to allow a very large segment merge) given space limits? I alrea

yellow pages navigation kind menu. howto take every 100th row from resultset

2009-10-02 Thread Julian Davchev
Hi, Long story short: how can I take every 100th row from a Solr result set? What would the syntax for this be? Long story: Currently I have lots of, say, documents (articles) indexed. They all have a title field with a corresponding value. atitle btitle . *title How do I build menu so I can search

debugQuery different score for same query. dismax

2009-10-02 Thread Julian Davchev
Hi, I ran debug on a query to examine the score, as I was surprised by the results. Here is the diff of the same explain section of two different rows that I found troubling. It looks for "pari" in the ancestorName field, but the first row looks in 241135 records and the second row in just 187821 records

debugQuery rows get different score for same field same value

2009-10-02 Thread Julian Davchev
Hi, I ran debug on a query to examine the score, as I was surprised by the results. Here is the diff of the same explain section of two different rows that I found troubling. It looks for "pari" in the ancestorName field, but the first row looks in 241135 records and the second row in just 187821 records

Re: Keepwords Schema

2009-10-02 Thread Shalin Shekhar Mangar
On Thu, Oct 1, 2009 at 7:37 PM, matrix_psj wrote: > > > An example: > My schema is about web files. Part of the syntax is a text field of authors > that have worked on each file, e.g. > >login.php > 2009-01-01 > alex, brian, carl carlington, dave alpha, eddie, dave > beta > > > When I p

Re: Query filters/analyzers

2009-10-02 Thread Shalin Shekhar Mangar
On Thu, Oct 1, 2009 at 7:59 PM, Claudio Martella wrote: > > About the copyField issue in general: as it copies the content to the > other field, what is the sense to define analyzers for the destination > field? The source is already analyzed so i guess that the RESULT of the > analysis is copied

Re: best way to get the size of an index

2009-10-02 Thread Mark Miller
Phillip Farber wrote: > > Resuming this discussion in a new thread to focus only on this question: > > What is the best way to get the size of an index so it does not get > too big to be optimized (or to allow a very large segment merge) given > space limits? > > I already have the largest 15,000rp

Re: "Only one usage of each socket address" error

2009-10-02 Thread Mauricio Scheffer
Did you try this? http://blogs.msdn.com/dgorti/archive/2005/09/18/470766.aspx Also, please post the full exception stack trace. 2009/10/2 Steinar Asbjørnsen > Tried running solr on jetty now, and I still get the same error:(. > > Stein

Re: Question on modifying solr behavior on indexing xml files..

2009-10-02 Thread Shalin Shekhar Mangar
On Thu, Oct 1, 2009 at 3:10 PM, Thung, Peter C CIV SPAWARSYSCEN-PACIFIC, 56340 wrote: > 1. In my playing around with > sending in an XML document within an XML CDATA tag, > with termVectors="true" > > I noticed the following behavior: > peter > collapses to the term > personpeterperson > inste

conditional sorting

2009-10-02 Thread Bojan Šmid
Hi all, I need to perform sorting of my query hits by different criterion depending on the number of hits. For instance, if there are < 10 hits, sort by date_entered, otherwise, sort by popularity. Does anyone know if there is a way to do that with a single query, or I'll have to send another que

Re: Query filters/analyzers

2009-10-02 Thread Fergus McMenemie
>On Thu, Oct 1, 2009 at 7:59 PM, Claudio Martella > wrote: > >> >> About the copyField issue in general: as it copies the content to the >> other field, what is the sense to define analyzers for the destination >> field? The source is already analyzed so i guess that the RESULT of the >> analysis i

Re: trie fields and sortMissingLast

2009-10-02 Thread Yonik Seeley
On Thu, Oct 1, 2009 at 2:54 PM, Lance Norskog wrote: > Trie fields also do not support faceting. Only those that index multiple tokens per value to speed up range queries. > They also take more ram in > some operations. Should be less memory on average. -Yonik http://www.lucidimagination.com

Re: Query filters/analyzers

2009-10-02 Thread Shalin Shekhar Mangar
On Fri, Oct 2, 2009 at 6:44 PM, Fergus McMenemie wrote: > >The copy is done before analysis. The original text is sent to the > copyField > >which can choose to do analysis differently from the source field. > > > I have been wondering about this as well. The WIKI is not explicit about > what hap
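
The quoted point, that the copy happens before analysis and the destination field then analyzes the original text its own way, can be sketched in Python. This is only an illustration: the field names, the two analyzer functions and the `index_document` helper are hypothetical stand-ins, not Solr APIs.

```python
def analyze_lowercase_words(text):
    """Hypothetical 'text' analyzer: split on whitespace and lowercase."""
    return [token.lower() for token in text.split()]


def analyze_exact(text):
    """Hypothetical 'string' analyzer: keep the whole value as one token."""
    return [text]


# Hypothetical schema: field name -> analyzer used for that field.
ANALYZERS = {"title": analyze_lowercase_words, "title_exact": analyze_exact}
# Hypothetical copyField rules: source field -> destination field.
COPY_FIELDS = {"title": "title_exact"}


def index_document(doc):
    """Copy raw values first, then analyze every field independently."""
    raw = dict(doc)
    for source, dest in COPY_FIELDS.items():
        if source in raw:
            raw[dest] = raw[source]  # the destination receives the original text
    return {field: ANALYZERS[field](value) for field, value in raw.items()}


indexed = index_document({"title": "Solr In Action"})
print(indexed)
```

Under these assumptions the destination field keeps the original casing even though the source field lowercases its tokens, which is why defining a separate analyzer on the destination is meaningful.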

Re: Solr Trunk Heap Space Issues

2009-10-02 Thread Jeff Newburn
Ah yes we do have some warming queries which would look like a search. Did that side change enough to push up the memory limits where we would run out like this? Also, would FastLRU cache make a difference? -- Jeff Newburn Software Engineer, Zappos.com jnewb...@zappos.com - 702-943-7562 > From

Re: Solr Trunk Heap Space Issues

2009-10-02 Thread Yonik Seeley
On Fri, Oct 2, 2009 at 9:54 AM, Jeff Newburn wrote: > Ah yes we do have some warming queries which would look like a search.  Did > that side change enough to push up the memory limits where we would run out > like this? What does the warming request(s) look like, and what are the field types for

Re: Solr Trunk Heap Space Issues

2009-10-02 Thread Mark Miller
Jeff Newburn wrote: > that side change enough to push up the memory limits where we would run out > like this? > Yes - now give us the FieldCache section from the stats section please :) Its not likely gonna do you any good, but it could be good information for us. -- - Mark http://www.luci

Re: Solr Trunk Heap Space Issues

2009-10-02 Thread Yonik Seeley
On Fri, Oct 2, 2009 at 10:02 AM, Mark Miller wrote: > Jeff Newburn wrote: >> that side change enough to push up the memory limits where we would run out >> like this? >> > Yes - now give us the FieldCache section from the stats section please :) And the fieldValueCache section too (used for multi

RE: Solr and Garbage Collection

2009-10-02 Thread siping liu
Hi, I read pretty much all posts on this thread (before and after this one). Looks like the main suggestion from you and others is to keep max heap size (-Xmx) as small as possible (as long as you don't see OOM exception). This brings more questions than answers (for me at least. I'm new to So

Re: Solr and Garbage Collection

2009-10-02 Thread Mark Miller
siping liu wrote: > Hi, > > I read pretty much all posts on this thread (before and after this one). > Looks like the main suggestion from you and others is to keep max heap size > (-Xmx) as small as possible (as long as you don't see OOM exception). This > brings more questions than answers (fo

Re: conditional sorting

2009-10-02 Thread Uri Boness
If the threshold is only 10, why can't you always sort by popularity and if the result set is <10 then resort on the client side based on date_entered? Uri Bojan Šmid wrote: Hi all, I need to perform sorting of my query hits by different criterion depending on the number of hits. For instanc
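
The client-side approach suggested here could look roughly like the following Python sketch. It assumes the hits arrive already sorted by popularity, that each hit is a dict with a `date_entered` value that sorts correctly as a string (ISO dates), and that "sort by date" means newest first; none of these details are given in the thread.

```python
def order_hits(hits, threshold=10):
    """Keep the server's popularity order unless the result set is small.

    Small result sets (fewer than `threshold` hits) are re-sorted locally
    by date_entered, newest first (an assumption, see the lead-in).
    """
    if len(hits) < threshold:
        return sorted(hits, key=lambda hit: hit["date_entered"], reverse=True)
    return hits


# Hypothetical hits, already ordered by popularity as the server returned them.
small = [
    {"id": 1, "date_entered": "2009-01-05", "popularity": 9},
    {"id": 2, "date_entered": "2009-09-30", "popularity": 5},
]
reordered = order_hits(small)
print([hit["id"] for hit in reordered])
```

This only works while the small case fits in one page of results, which is the premise of the suggestion.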

Re: How to access the information from SolrJ

2009-10-02 Thread Paul Tomblin
Nope, that just gets you the number of results returned, not how many there could be. Like I said, if you look at the XML returned, you'll see something like but only 10 returned. getNumFound returns 10 in that case, not 1251. 2009/10/2 Noble Paul നോബിള്‍ नोब्ळ् : > QueryResponse#getResults(

Re: "Only one usage of each socket address" error

2009-10-02 Thread Steinar Asbjørnsen
You're the man, Mauricio! Adding and setting MaxUserPort and TCPTimedWaitDelay in the registry sure helps! Over the weekend I'll look into doing this programmatically. Thanks! Steinar On 2 Oct 2009, at 14:47, Mauricio Scheffer wrote: Did you try this? http://blogs.msdn.com/dgorti/archive/2005/09/

Re: conditional sorting

2009-10-02 Thread Bojan Šmid
I tried to simplify the problem, but the point is that I could have really complex requirements. For instance, "if in the first 5 results none are older than one year, use sort by X, otherwise sort by Y". So, the question is, is there a way to make Solr recognize complex situations and apply diffe

Re: Solr Trunk Heap Space Issues

2009-10-02 Thread Jeff Newburn
The warmers return 11 fields: 3 Strings 2 booleans 2 doubles 2 longs 1 sint (solr.SortableIntField) Let me know if you need the fields actually be searched on. name:  fieldCache   class:  org.apache.solr.search.SolrFieldCacheMBean   version:  1.0   description:  Provides introspection of the Luce

Re: Problem with Wildcard...

2009-10-02 Thread Christian Zambrano
Another thing to remember about wildcard and fuzzy searches is that none of the token filters will be applied. If you are using the LowerCaseFilterFactory at index time, then "RI-MC50034-1" gets converted to "ri-mc50034-1" which is never going to match "RI-MC5000*" Also, I would probably use
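
A small Python sketch of the effect described above, assuming index-time analysis consists of lowercasing only and that the wildcard pattern reaches the index untouched. The pattern `RI-MC500*` is a hypothetical one chosen for illustration, and `fnmatch.fnmatchcase` stands in for the index's case-sensitive wildcard matching.

```python
import fnmatch


def index_token(value):
    """Index-time analysis, assumed here to be lowercasing only."""
    return value.lower()


def wildcard_matches(pattern, indexed_token):
    """Case-sensitive wildcard match; the pattern itself is not analyzed."""
    return fnmatch.fnmatchcase(indexed_token, pattern)


token = index_token("RI-MC50034-1")          # stored as "ri-mc50034-1"
raw_hit = wildcard_matches("RI-MC500*", token)            # pattern left as typed
fixed_hit = wildcard_matches("RI-MC500*".lower(), token)  # lowercased by the client
print(raw_hit, fixed_hit)
```

The usual workaround under these assumptions is for the client to apply the same normalization (here, lowercasing) to the wildcard term before sending the query.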

Re: JVM OOM when using field collapse component

2009-10-02 Thread Joe Calderon
heap space is 4gb set to grow up to 8gb, usage is normally ~1-2gb, seems to happen within a few searches. if its just me ill try to isolate it, it could be some other part of my implementation thx much On Fri, Oct 2, 2009 at 1:18 AM, Martijn v Groningen wrote: > No I have not encountered OOM ex

TermVector term frequencies for tag cloud

2009-10-02 Thread aodhol
Hello, I'm trying to create a tag cloud from a term vector, but the array returned (using JSON wt) is quite complex and takes an inordinate amount of time to process. Is there a better way to retrieve terms and their document TF? The TermVectorComponent allows for retrieval of tf and df though I'm

snapshot creation and distribution

2009-10-02 Thread Robert . Kay
Hello, A couple questions with regard to snapshots and distribution: 1. If two snapshots are created in between a snappull, are the changes from the first snapshot "missed" by the slave, as it only pulls the most recent snapshot? 2. When triggering snapshooter from the "postCommit" hook, does a

Google Side-By-Side UI

2009-10-02 Thread Lance Norskog
http://googleenterprise.blogspot.com/2009/08/compare-enterprise-search-relevance.html This is really cool, and a version for Solr would help in doing relevance experiments. We don't need the "select A or B" feature, just seeing search result sets side-by-side would be great. -- Lance Norskog gok

Re: best way to get the size of an index

2009-10-02 Thread Mark Miller
Mark Miller wrote: > Phillip Farber wrote: > >> Resuming this discussion in a new thread to focus only on this question: >> >> What is the best way to get the size of an index so it does not get >> too big to be optimized (or to allow a very large segment merge) given >> space limits? >> >> I al

Re: Solr Trunk Heap Space Issues

2009-10-02 Thread Jeff Newburn
I reran the test to try to ensure that other cores on the instance didn't have searches against them. This time I get NPE errors just trying to get into the stats after the system hits its limit. -- Jeff Newburn Software Engineer, Zappos.com jnewb...@zappos.com - 702-943-7562 > From: Jeff Newbu

Re: snapshot creation and distribution

2009-10-02 Thread Bill Au
A snapshot is a copy of the index at a particular moment in time. So changes in earlier snapshots are in the latest one as well. Nothing is missed by pulling the latest snapshot. When triggering snapshooter with the postCommit hook, a commit always results in a snapshot being created. Bill On

Re: Google Side-By-Side UI

2009-10-02 Thread Yao Ge
Yes. I think this would be a very helpful tool for tuning search relevancy - you can do a controlled experiment with your target audiences to understand their responses to the parameter changes. We plan to use this feature to benchmark Lucene/SOLR against our in-house commercial search engine - it will

Re: TermVector term frequencies for tag cloud

2009-10-02 Thread Bill Au
Have you considered using facet counts for your tag cloud? Bill On Fri, Oct 2, 2009 at 11:34 AM, wrote: > Hello, > > I'm trying to create a tag cloud from a term vector, but the array > returned (using JSON wt) is quite complex and takes an inordinate > amount of time to process. Is there a bet

Question about PatternReplace filter and automatic Synonym generation

2009-10-02 Thread Prasanna Ranganathan
Does the PatternReplaceFilter have an option where you can keep the original token in addition to the modified token? From what I looked at it does not seem to but I want to confirm the same. Alternatively, is there a filter available which takes in a pattern and produces additional forms of the

Question regarding synonym

2009-10-02 Thread darniz
Hi, I have a question regarding SynonymFilter. I have a one-way mapping defined: austin martin, astonmartin => aston martin. What is baffling me is that if I give the words austin martin at query time, it first goes through whitespace tokenization and generates two words in the analysis page, "austin" and "martin" th

Re: How to access the information from SolrJ

2009-10-02 Thread Shalin Shekhar Mangar
On Fri, Oct 2, 2009 at 8:11 PM, Paul Tomblin wrote: > Nope, that just gets you the number of results returned, not how many > there could be. Like I said, if you look at the XML returned, you'll > see something like > > but only 10 returned. getNumFound returns 10 in that case, not 1251. > >

Re: How to access the information from SolrJ

2009-10-02 Thread Paul Tomblin
On Fri, Oct 2, 2009 at 3:13 PM, Shalin Shekhar Mangar wrote: > On Fri, Oct 2, 2009 at 8:11 PM, Paul Tomblin wrote: > >> Nope, that just gets you the number of results returned, not how many >> there could be.  Like I said, if you look at the XML returned, you'll >> see something like >> >> but o

Re: How to access the information from SolrJ

2009-10-02 Thread Adam Allgaier
We have the same issue as Paul. We currently parse the XML manually to pull out the numFound from the response. Cheers! Adam - Original Message From: Paul Tomblin To: solr-user@lucene.apache.org Sent: Friday, October 2, 2009 2:39:01 PM Subject: Re: How to access the information from
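
For the manual XML route mentioned here, a minimal Python sketch using only the standard library might look like this. The sample document is hand-written and abbreviated; it assumes the usual shape of Solr's XML response, with a `result` element carrying a `numFound` attribute and one `doc` child per returned row.

```python
import xml.etree.ElementTree as ET

# Hand-written, abbreviated sample (two rows shown instead of ten).
SAMPLE = """<response>
  <result name="response" numFound="1251" start="0">
    <doc><str name="id">a</str></doc>
    <doc><str name="id">b</str></doc>
  </result>
</response>"""


def count_hits(xml_text):
    """Return (total matches, rows actually returned) from a response."""
    root = ET.fromstring(xml_text)
    result = root.find("result[@name='response']")
    return int(result.get("numFound")), len(result.findall("doc"))


total, returned = count_hits(SAMPLE)
print(total, returned)
```

The two numbers differ by design: the first is the size of the whole match set, the second is just the current page.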

search by some functionality

2009-10-02 Thread Elaine Li
Hi, My doc has three fields, say field1, field2, field3. My search would be q=field1:string1 && field2:string2. I also need to do some computation and comparison of the string1 and string2 with the contents in field3 and then determine if it is a hit. What can I do to implement this? Thanks. E

Invoke "expungeDeletes" using SolrJ's SolrServer.commit()

2009-10-02 Thread Jibo John
Hello, I know I can invoke expungeDeletes using updatehandler ( curl update - F stream.body=' ' ), however, I was wondering if it is possible to invoke it using SolrJ. It looks like, currently, there are no SolrServer.commit(..) methods that I can use for this purpose. Any input will be

Advantages of different Servlet Containers

2009-10-02 Thread Simon Wistow
I know that the Solr FAQ says "Users should decide for themselves which Servlet Container they consider the easiest/best for their use cases based on their needs/experience. For high traffic scenarios, investing time for tuning the servlet container can often make a big difference." but is th

Specifying "all except field" in field list?

2009-10-02 Thread Paul Rosen
Hi, Is there a way to request all fields in an object EXCEPT a particular one? In other words, the following pseudo code is what I'd like to express: req = Solr::Request::Standard.new(:start => page*size, :rows => size, :query => my_query, :field_list => [ ALL EXCEPT 'text' ]) Is there a way

Re: Advantages of different Servlet Containers

2009-10-02 Thread Lajos
Just go for Tomcat. For all its problems, and I should know having used it since it was originally JavaWebServer, it is perfectly capable of handling high-end production environments provided you tune it correctly. We use it with our customized Solr 1.3 version without any problems. Lajos S

RE: Advantages of different Servlet Containers

2009-10-02 Thread Walter Underwood
Netflix uses Tomcat throughout and they tail the log to figure out whether it has started, except they look for a message from Solr to see whether Solr is ready to go to work. wunder -Original Message- From: Lajos [mailto:la...@protulae.com] Sent: Friday, October 02, 2009 1:35 PM To: s

RE: Question regarding synonym

2009-10-02 Thread Ensdorf Ken
> Hi > i have a question regarding synonymfilter > i have a one way mapping defined > austin martin, astonmartin => aston martin > ... > > Can anybody please explain if my observation is correct. This is a very > critical aspect for my work. That is correct - the synonym filter can recognize mul

Re: How to access the information from SolrJ

2009-10-02 Thread Shalin Shekhar Mangar
On Sat, Oct 3, 2009 at 1:09 AM, Paul Tomblin wrote: > >> > > Nope. Check again. getNumFound will definitely give you 1251. > > SolrDocumentList#size() will give you 10. > > I don't have to check again. I put this log into my query code: >QueryResponse resp = solrChunkServer.query(que

Re: Invoke "expungeDeletes" using SolrJ's SolrServer.commit()

2009-10-02 Thread Shalin Shekhar Mangar
On Sat, Oct 3, 2009 at 1:35 AM, Jibo John wrote: > Hello, > > I know I can invoke expungeDeletes using updatehandler ( curl update -F > stream.body=' ' ), however, I was wondering > if it is possible to invoke it using SolrJ. > > It looks like, currently, there are no SolrServer.commit(..) metho

Re: How to access the information from SolrJ

2009-10-02 Thread Paul Tomblin
LucidityWorks.com is my client.  The similarity to lucid is purely coincidental - the client didn't even know I was going to choose Solr.  I am using Solr trunk, last updated and compiled a few weeks ago. -- Sent from my Palm Prē Shalin Shekhar Mangar wrote: On Sat, Oct 3, 2009 at 1:09 AM, Paul

Re: Advantages of different Servlet Containers

2009-10-02 Thread Shalin Shekhar Mangar
AOL uses Tomcat for all Solr deployments. Our load balancers use a ping query to put a box back into rotation. On Sat, Oct 3, 2009 at 2:15 AM, Walter Underwood wrote: > Netflix uses Tomcat throughout and they tail the log to figure out whether > it has started, except they look for a message from

Re: best way to get the size of an index

2009-10-02 Thread Phillip Farber
Thanks, Mark. I really appreciate your confirmation. Phil Mark Miller wrote: Phillip Farber wrote: Resuming this discussion in a new thread to focus only on this question: What is the best way to get the size of an index so it does not get too big to be optimized (or to allow a very large seg

Re: Invoke "expungeDeletes" using SolrJ's SolrServer.commit()

2009-10-02 Thread Yonik Seeley
You can always add arbitrary parameters to an update request: UpdateRequest ureq = new UpdateRequest(); ureq.add(doc); ureq.setParam("expungeDeletes","true"); NamedList rsp = server.request(ureq); -Yonik http://www.lucidimagination.com On Fri, Oct 2, 2009 at 4:05 PM, Jibo John

Re: Invoke "expungeDeletes" using SolrJ's SolrServer.commit()

2009-10-02 Thread Jibo John
Created jira issue https://issues.apache.org/jira/browse/SOLR-1487 Thanks, -Jibo On Oct 2, 2009, at 2:17 PM, Shalin Shekhar Mangar wrote: On Sat, Oct 3, 2009 at 1:35 AM, Jibo John wrote: Hello, I know I can invoke expungeDeletes using updatehandler ( curl update -F stream.body=' ' ), ho

Re: conditional sorting

2009-10-02 Thread Lance Norskog
Doing a second search immediately after the first one is consistently under 100 ms for me, usually under 25, on cheap hardware. Even while sorting the results, you should have no problems. If necessary, you could run Solr with the embedded client and do one search right after the other, avoid the

Re: How to access the information from SolrJ

2009-10-02 Thread Paul Tomblin
On Fri, Oct 2, 2009 at 5:04 PM, Shalin Shekhar Mangar wrote: > Can you try this with the Solrj client > in the official 1.3 release or even trunk? I did a svn update to 821188 and that seems to have fixed the problem. (The jar files changed from -1.3.0 to -1.4-dev) I guess it's been longer sinc

RE: Question regarding synonym

2009-10-02 Thread darniz
This is not working when I search documents. I have a document which contains the text aston martin. When I search carDescription:"austin martin" I get a match, but when I don't use double quotes, as in carDescription:austin martin, there is no match. In the analyser, if I give austin martin without quote

Re: Question regarding synonym

2009-10-02 Thread Christian Zambrano
When you use a field qualifier (fieldName:valueToLookFor) it only applies to the word right after the colon. If you look at the debug information you will notice that for the second word it is using the default field. carDescription:austin *text*:martin the following should work: carDescri
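
As an illustration of the scoping rule described above, here is a small Python sketch of two hypothetical client-side helpers that build query strings so that the field qualifier covers every word. The helper names are made up for this example; the quoted-phrase form is the one reported to match elsewhere in this thread.

```python
def qualify_phrase(field, text):
    """Build field:"text" so the qualifier applies to the whole phrase."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'{field}:"{escaped}"'


def qualify_terms(field, text):
    """Repeat the qualifier on each word (terms are not escaped here)."""
    return " ".join(f"{field}:{term}" for term in text.split())


phrase_query = qualify_phrase("carDescription", "austin martin")
terms_query = qualify_terms("carDescription", "austin martin")
print(phrase_query)
print(terms_query)
```

The per-term form only fixes which field each word is searched in; whether a multi-word synonym rule still fires on separately parsed words is a different question and is not settled by this sketch.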

Re: Question regarding synonym

2009-10-02 Thread darniz
Thanks. As I said, it even works with double quotes too, like carDescription:"austin martin". So is the conclusion that in order to map a two-word synonym I always have to enclose it in double quotes, so that it does not split the words? Christian Zambrano wrote: > > When you use a

Re: Specifying "all except field" in field list?

2009-10-02 Thread Lance Norskog
No, there is only "list of fields", star, and score. You can choose to index it and not store it, and then have your application fetch it from the original data store. This is a common system design pattern to avoid storing giant text blobs in the index. http://wiki.apache.org/solr/FieldAliasesAn

Re: Specifying "all except field" in field list?

2009-10-02 Thread Paul Rosen
Thanks, Lance, for the quick reply. Well, unfortunately, we need the highlighting feature on that field, so I think we have to store it. It's not a big deal, it just seemed like something that would be useful and probably be easy to implement, so I figured I just missed it. Alternately, is

Re: Advantages of different Servlet Containers

2009-10-02 Thread Joshua Tuberville
Simon, Have you tried the bin/jetty.sh script that comes with Jetty distributions? It contains the standard start|stop|restart functions. Joshua On Oct 2, 2009, at 1:11 PM, Simon Wistow wrote: > I know that the Solr FAQ says > > "Users should decide for themselves which Servlet Container the

Re: Specifying "all except field" in field list?

2009-10-02 Thread Lance Norskog
Maybe the TermsComponent? You can't ask for facets with a wildcard in the field name. This would do the trick. It's an issue in JIRA, if you want to vote for it. http://issues.apache.org/jira/browse/SOLR-247 http://issues.apache.org/jira/browse/SOLR-1387 On Fri, Oct 2, 2009 at 6:36 PM, Paul Rose