RE: Index-time vs. search-time boosting performance

2010-06-04 Thread Jonathan Rochkind
The SolrRelevancyFAQ does suggest that both index-time and search-time boosting can be used to boost the score of newer documents, but doesn't suggest what reasons/contexts one might choose one vs the other. It only provides an example of search-time boost though, so it doesn't answer the quest

Re: Index-time vs. search-time boosting performance

2010-06-04 Thread Asif Rahman
It seems like it would be far more efficient to calculate the boost factor once and store it rather than calculating it for each request in real-time. Some of our queries match tens of thousands if not hundreds of thousands of documents in a 15GB index. However, I'm not well-versed in lucene inter

Re: Does SolrJ support nested annotated beans?

2010-06-04 Thread Thomas J. Buhr
+1 Good question, my use of Solr would benefit from nested annotated beans as well. Awaiting the reply, Thom On 2010-06-03, at 1:35 PM, Peter Hanning wrote: > > When modeling documents with a lot of fields (hundreds) the bean class used > with SolrJ to interact with the Solr index tends to g

Re: Faceted Search Slows Down as index gets larger

2010-06-04 Thread Yonik Seeley
On Fri, Jun 4, 2010 at 7:33 PM, Andy wrote: > Yonik, > > Just curious why does using enum improve the facet performance. > > Furkan was faceting on a text field with each word being a facet value. I'd > imagine that'd mean there's a large number of facet values. According to the > documentation

Re: Index-time vs. search-time boosting performance

2010-06-04 Thread Jay Hill
I've done a lot of recency boosting to documents, and I'm wondering why you would want to do that at index time. If you are continuously indexing new documents, what was "recent" when it was indexed becomes, over time, "less recent". Are you unsatisfied with your current performance with the boost f
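To illustrate the point about "recent" drifting over time, here is a minimal sketch of how a search-time recency boost behaves. It assumes the reciprocal form recip(x, m, a, b) = a / (m*x + b) applied to a document's age in milliseconds, with the constants m=3.16e-11, a=1, b=1 as they are usually quoted for this kind of boost; the helper names are hypothetical and nothing here is taken from the poster's actual configuration.

```python
# Sketch only: models a reciprocal recency boost evaluated at query time.
# Function names and default constants are illustrative assumptions.
MS_PER_YEAR = 365 * 24 * 60 * 60 * 1000


def recip(x, m, a, b):
    """Reciprocal function: a / (m * x + b)."""
    return a / (m * x + b)


def recency_boost(age_ms, m=3.16e-11, a=1.0, b=1.0):
    """Boost for a document that is age_ms milliseconds old right now."""
    return recip(age_ms, m, a, b)


if __name__ == "__main__":
    # The same document gets a smaller boost as it ages, with no reindexing.
    for years in (0, 1, 2, 5):
        print(years, round(recency_boost(years * MS_PER_YEAR), 3))
```

Because the age is recomputed on every request, the boost decays on its own. A value fixed at index time would stay frozen until the document is reindexed.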

Re: Help with Shingled queries

2010-06-04 Thread Robert Muir
the queryparser first splits on whitespace. so each individual word of your query: short,red,evil,fox gets its own tokenstream, and therefore isn't shingled. On Fri, Jun 4, 2010 at 6:21 PM, Greg Bowyer wrote: > Hi all > > Interesting and by the looks of things very solid project you have here >
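The effect described above can be modelled in a few lines. This is a simplified sketch, not Lucene's ShingleFilter: the `shingles` helper is hypothetical and only emits word bigrams, but it shows why analyzing each whitespace-separated word on its own can never yield a shingle.

```python
# Simplified model of shingling; not the actual Lucene implementation.
def shingles(tokens, size=2):
    """Return word n-grams (shingles) of the given size from a token list."""
    return [" ".join(tokens[i:i + size]) for i in range(len(tokens) - size + 1)]


query = "short red evil fox"

# Index-time view: the analyzer sees the whole phrase as one token stream.
whole_stream = shingles(query.split())

# Query-time view: the parser splits on whitespace first, so the analyzer
# runs once per word and every stream contains a single token.
per_word = [shingles([word]) for word in query.split()]

if __name__ == "__main__":
    print(whole_stream)
    print(per_word)
```

With a single token per stream there is no neighbouring token to pair with, so no bigram is produced for any of the four words.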

Re: Index-time vs. search-time boosting performance

2010-06-04 Thread Asif Rahman
Perhaps I should have been more specific in my initial post. I'm doing date-based boosting on the documents in my index, so as to assign a higher score to more recent documents. Currently I'm using a boost function to achieve this. I'm wondering if there would be a performance improvement if ins

Re: Faceted Search Slows Down as index gets larger

2010-06-04 Thread Andy
Yonik, Just curious why does using enum improve the facet performance. Furkan was faceting on a text field with each word being a facet value. I'd imagine that'd mean there's a large number of facet values. According to the documentation (http://wiki.apache.org/solr/SimpleFacetParameters#facet

Re: Index-time vs. search-time boosting performance

2010-06-04 Thread Erick Erickson
Index time boosting is different than search time boosting, so asking about performance is irrelevant. Paraphrasing Hossman from years ago on the Lucene list (from memory). ...index time boosting is a way of saying this document's title is more important than other documents' titles. Search time

Help with Shingled queries

2010-06-04 Thread Greg Bowyer
Hi all Interesting and by the looks of things very solid project you have here with SOLR, however .. I have an index that contains a large number of "phrases" that I need to search over, each of these phrases is fairly small being on average about 4 words long. The search terms that I am

RE: general debugging techniques?

2010-06-04 Thread Chris Hostetter
: That is still really small for 5MB documents. I think the default solr : document cache is 512 items, so you would need at least 3 GB of memory : if you didn't change that and the cache filled up. that assumes that the extracted text tika extracts from each document is the same size as the o

Re: general debugging techniques?

2010-06-04 Thread Chris Hostetter
: to format the data from my sources. I can read through the catalina : log, but this seems to just log requests; not much info is given about : errors or when the service hangs. Here are some examples: if you are only seeing one log line per request, then you are just looking at the "request"

Re: Range query on long value

2010-06-04 Thread David
On 10-06-04 05:11 PM, Ahmet Arslan wrote: I have an issue with range queries on a long value in our dataset (the dataset is fairly large, but i believe the problem still exists for smaller datasets). When i query the index with a range, as such: id:[1 TO 2000], I get values back that are wel

Re: Range query on long value

2010-06-04 Thread Ahmet Arslan
> I have an issue with range queries on a long value in our > dataset (the dataset is fairly large, but i believe the > problem still exists for smaller datasets).  When i > query the index with a range, as such: id:[1 TO 2000], I get > values back that are well outside that range.  Its as > if th

Index-time vs. search-time boosting performance

2010-06-04 Thread Asif Rahman
Hi, What are the performance ramifications for using a function-based boost at search time (through bf in dismax parser) versus an index-time boost? Currently I'm using boost functions on a 15GB index of ~14mm documents. Our queries generally match many thousands of documents. I'm wondering if I

Re: Need help to install Solr on JBoss

2010-06-04 Thread Juan Pedro
Check the wiki 1. Do I need to copy the entire example folder from my local machine to Solr home on Sun Solaris box? http://wiki.apache.org/solr/SolrJBoss 2. How can I have multiple cores on the Sun Solaris box? http://wiki.apache.org/solr/CoreAdmin Regards Juan www.linebee.com B

Need help with document format

2010-06-04 Thread Moazzam Khan
Hi guys, I have a list of consultants and the users (people who work for the company) are supposed to be able to search for consultants based on the time frame they worked for, for a company. For example, I should be able to search for all consultants who worked for Bear Stearns in the month of j

Range query on long value

2010-06-04 Thread David
Hi, I have an issue with range queries on a long value in our dataset (the dataset is fairly large, but I believe the problem still exists for smaller datasets). When I query the index with a range, as such: id:[1 TO 2000], I get values back that are well outside that range. It's as if the r
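The preview is cut off, so the actual diagnosis is not visible here. One possible cause of this symptom, offered purely as an assumption, is that the values are compared as strings rather than as numbers. The sketch below uses hypothetical helper names and made-up ids to show how a lexicographic range check admits values far outside the numeric range.

```python
# Assumption: illustrates string vs. numeric range semantics only;
# it does not reproduce the poster's schema or data.
def in_string_range(value, low, high):
    """Lexicographic range check, as applies to values compared as strings."""
    return low <= value <= high


def in_numeric_range(value, low, high):
    """Numeric range check on the same values."""
    return int(low) <= int(value) <= int(high)


ids = ["5", "150", "15000", "1999999", "3"]
string_hits = [i for i in ids if in_string_range(i, "1", "2000")]
numeric_hits = [i for i in ids if in_numeric_range(i, "1", "2000")]

if __name__ == "__main__":
    print("string order: ", string_hits)
    print("numeric order:", numeric_hits)
```

Under string ordering, "15000" and "1999999" fall between "1" and "2000", while "5" and "3" do not.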

RE: index growing with updates

2010-06-04 Thread Chris Hostetter
: Ok so I think that Solr (lucene) will only remove deleted/updated : documents from the disk after an optimize or after an 'expungeDeletes' : request. Is there a way to trigger the expunsion (new word) across the : entire index? I tried : deletes are removed when segments are merged -- an opt
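For reference, an expungeDeletes request can also be expressed as an XML update message. The sketch below only builds the message body; it assumes the `expungeDeletes` attribute on `<commit>` as supported by the Solr XML update format, and the `commit_message` helper is hypothetical. Sending it to the update handler is left out.

```python
# Sketch: builds the XML body of a commit message, nothing more.
import xml.etree.ElementTree as ET


def commit_message(expunge_deletes=False):
    """Return a <commit/> update message, optionally asking to expunge deletes."""
    commit = ET.Element("commit")
    if expunge_deletes:
        commit.set("expungeDeletes", "true")
    return ET.tostring(commit, encoding="unicode")


if __name__ == "__main__":
    print(commit_message())
    print(commit_message(expunge_deletes=True))
```

Whether this reclaims space across the entire index still depends on which segments get merged, as the reply above points out.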

Re: TikaEntityProcessor not working?

2010-06-04 Thread Brad Greenlee
You are my hero. I replaced the Tika 0.8 snapshots that were included with Solr with 0.6 and it works now. Thank you! Brad On Jun 3, 2010, at 6:22 AM, David George wrote: > > Which version of Tika do you have? There was a problem introduced somewhere > between Tika 0.6 and Tika 0.7 whereby the

conditional Document Boost

2010-06-04 Thread MitchK
Hello out there, I am searching for a solution for conditional Document Boosting. While analyzing the fields of a document, I want to create a document boost based on some metrics. There are two approaches: First: I preprocess the data. The main problem with this is that I need to take care ab

RE: String Sort Nor Working

2010-06-04 Thread Patrick Wilson
Very informative - thank you! I think it might be useful to have this feature - maybe have an interface for plugins to register an XSD or otherwise declare their expected xml elements and attributes. I'm not sure if there's enough demand for this to justify the time it would take to make this chan

RE: String Sort Nor Working

2010-06-04 Thread Ahmet Arslan
> P.S. Might it be helpful for Solr to complain about invalid > XML during startup? Does it do this and I'm just not > noticing? Chris's explanation about a similar topic: http://search-lucene.com/m/11JWX1hxL4u/

Need help to install Solr on JBoss

2010-06-04 Thread Bondiga, Murali
I installed Solr on my local machine and it works fine with Jetty. I am trying to install on JBoss which is running on a Sun Solaris box and I have the following questions: 1. Do I need to copy the entire example folder from my local machine to Solr home on Sun Solaris box? 2. How can I ha

RE: String Sort Nor Working

2010-06-04 Thread Patrick Wilson
That did it. Thank you =) P.S. Might it be helpful for Solr to complain about invalid XML during startup? Does it do this and I'm just not noticing? -Original Message- From: Ahmet Arslan [mailto:iori...@yahoo.com] Sent: Friday, June 04, 2010 12:18 PM To: solr-user@lucene.apache.org Subje

Re: String Sort Nor Working

2010-06-04 Thread Ahmet Arslan
> > Simple lowercase F is causing this. It should be

Re: Highlighting a field with a certain value

2010-06-04 Thread Koji Sekiguchi
(10/05/25 0:31), n...@frameweld.com wrote: Hello, How am I able to highlight a field that contains a specific value? If I have a field called type, how am I able to highlight the rows whose values contain something like "title"? http://localhost:8983/solr/select?q=title&hl=on&hl.fl=type

RE: index growing with updates

2010-06-04 Thread Nagelberg, Kallin
Ok so I think that Solr (lucene) will only remove deleted/updated documents from the disk after an optimize or after an 'expungeDeletes' request. Is there a way to trigger the expunsion (new word) across the entire index? I tried : final UpdateRequest request = new UpdateRequest() request.setPar

Re: Faceted Search Slows Down as index gets larger

2010-06-04 Thread Furkan Kuru
I am using 1.4 version. I have tried your suggestion, it takes around 25-30 seconds now. Thank you, On Fri, Jun 4, 2010 at 5:54 PM, Yonik Seeley wrote: > Faceting on a full-text field is hard. > What version of Solr are you using? > > If it's 1.4 or later, try setting > facet.method=enum > >

String Sort Nor Working

2010-06-04 Thread Patrick Wilson
All, I am trying to sort on a text field and can't get it to work. I try sorting on "sortTitle" and get no errors, it just doesn't appear to sort. The pertinent parts of my schema: ... lots of filters that do work...

Re: OverlappingFileLockException when using startup

2010-06-04 Thread rabahb
Hi Guys, I'm experiencing the same issue with a single war. I'm using a brand new Solr war built from yesterday's version of the trunk. I've got one master with 2 cores and one slave with a single core. I'm using one core from master as the master of the second core (which is configured as a re

Re: Faceted Search Slows Down as index gets larger

2010-06-04 Thread Yonik Seeley
Faceting on a full-text field is hard. What version of Solr are you using? If it's 1.4 or later, try setting facet.method=enum And to use the filterCache less, try facet.enum.cache.minDf=100 -Yonik http://www.lucidimagination.com On Fri, Jun 4, 2010 at 10:31 AM, Furkan Kuru wrote: > Hello, > >
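As a rough sketch, the two parameters suggested above can be added to a facet request like this. The host and path follow the localhost URL seen elsewhere on this list; the field name `text`, the `q` value and `rows=0` are assumptions chosen for illustration.

```python
# Sketch: builds a facet request URL; field name and query are placeholders.
from urllib.parse import urlencode

params = [
    ("q", "*:*"),
    ("rows", "0"),                       # only facet counts are wanted
    ("facet", "true"),
    ("facet.field", "text"),             # hypothetical full-text field
    ("facet.method", "enum"),            # enumerate terms instead of un-inverting
    ("facet.enum.cache.minDf", "100"),   # skip the filterCache for rare terms
]
url = "http://localhost:8983/solr/select?" + urlencode(params)

if __name__ == "__main__":
    print(url)
```

Raising `facet.enum.cache.minDf` trades some speed for less filterCache use, so the right value depends on how many distinct terms the field holds.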

Faceted Search Slows Down as index gets larger

2010-06-04 Thread Furkan Kuru
Hello, I have been dealing with real-time data. As the number of total indexed documents gets larger (now 5 M), a faceted search on a text field limited by the creation time, which we use to find the most used word in all these text fields, slows down. query string: created_time:[NOW-1HOUR

Re: MultiValue Exclusion

2010-06-04 Thread Geert-Jan Brits
I guess the following works. A. similar to your option 2, but using the filtercache fq=-item_id:001 -item_id:002 B. similar to your option 3, but using the filtercache fq=-users_excluded_field: the advantage being that the filter is cached independently from the rest of the query so it can be re
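Option A above can be sketched as a small helper that turns a user's muted ids into one negative filter query. The helper name `mute_filter` and the `q` value are hypothetical; the `item_id` field and the ids come from the example in the message.

```python
# Sketch: builds the fq value for option A; q is a placeholder.
from urllib.parse import urlencode


def mute_filter(muted_ids, field="item_id"):
    """Build a filter query that excludes every muted item id."""
    return " ".join(f"-{field}:{item_id}" for item_id in muted_ids)


fq = mute_filter(["001", "002"])
query_string = urlencode([("q", "news"), ("fq", fq)])

if __name__ == "__main__":
    print(fq)
    print(query_string)
```

Since the filter is passed as `fq` rather than folded into `q`, it can be cached separately and reused for every search by the same user until the mute list changes.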

MultiValue Exclusion

2010-06-04 Thread homerlex
How would you model this? We have a table of news items that people can view in their news stream and comment on. Users have the ability to "mute" items so they never see them in their feed or search results. From what I can see there are a couple ways to accomplish this. 1 - Post process the

Re: exclude docs with null field

2010-06-04 Thread Geert-Jan Brits
Additionally, I should have mentioned that you can instead do: fq=field_3:[* TO *], which uses the filtercache. The method presented by Chris will probably outperform the above method but only on the first request, from then on the filtercache takes over. From a performance standpoint it's probab

Re: exclude docs with null field

2010-06-04 Thread bluestar
nice one! thanks. > >> i could be wrong but it seems this >> way has a performance hit? >> >> or i am missing something? > > Did you read Chris's message in http://search-lucene.com/m/1o5mEk8DjX1/ > He proposes an alternative (more efficient) way other than [* TO *] > > > >

Re: Logs for Java Replication in Solr

2010-06-04 Thread Peter Karich
Hoss, thanks a lot! (We are using tomcat so the logging properties file is fine.) Do you know what the reason for the mentioned exception could be? It seems to me that if this exception occurs, even the replication for that index does not work. If I then remove the data directory + reload + poll

Re: exclude docs with null field

2010-06-04 Thread Ahmet Arslan
> i could be wrong but it seems this > way has a performance hit? > > or i am missing something? Did you read Chris's message in http://search-lucene.com/m/1o5mEk8DjX1/ He proposes an alternative (more efficient) way other than [* TO *]

Re: exclude docs with null field

2010-06-04 Thread bluestar
i could be wrong but it seems this way has a performance hit? or i am missing something? > field1:"new york"+field2:"new york"+field3:[* TO *] > > 2010/6/4 bluestar > >> hi there, >> >> say my search query is "new york", and i am searching field1 and field2 >> for it, how do i specify that i wan

Re: exclude docs with null field

2010-06-04 Thread Geert-Jan Brits
field1:"new york"+field2:"new york"+field3:[* TO *] 2010/6/4 bluestar > hi there, > > say my search query is "new york", and i am searching field1 and field2 > for it, how do i specify that i want to exclude docs where field3 doesn't > exist? > > thanks > >

Re: exclude docs with null field

2010-06-04 Thread Ahmet Arslan
> say my search query is "new york", and i am searching > field1 and field2 > for it, how do i specify that i want to exclude docs where > field3 doesn't > exist? http://search-lucene.com/m/1o5mEk8DjX1/

Multi word synonyms + highlighting

2010-06-04 Thread Xavier Schepler
Hi, Here's a field type using synonyms : synonyms="french-synonyms.txt" ignoreCase="true" expand="true"/> mapping="mapping-ISOLatin1Accent.txt"/> mapping="mapping-ISOLatin1Accent.txt"/> Here are the contents of 'french-synonyms.txt' that I used for testing : PC,parti co