Performance-Issues and raising numbers of "cumulative inserts"

2010-02-15 Thread Bohnsack, Sven
Hey IT-Crowd! I'm dealing with some performance issues while warming up the queryResultCache. Normally it takes about 11 minutes (~700.000 ms), but now it takes 4 million ms and more. All I can see in the solr.log is that the number of cumulative_inserts rises from ~250.000 to ~670.0

Re: Realtime search and facets with very frequent commits

2010-02-15 Thread Janne Majaranta
Hey Dipti, Basically query optimizations + setting cache sizes to a very high level. Other than that, the config is about the same as the out-of-the-box config that comes with the Solr download. I haven't found a magic switch to get very fast query responses + facet counts with the frequency of c

Re: too often delta imports performance effect

2010-02-15 Thread Nick Jenkin
Yes, the old data will show until a commit has been executed. 50 docs isn't many, so you should be fine. -Nick On Mon, Feb 15, 2010 at 11:41 AM, adeelmahmood wrote: > > thank you .. that helps .. actually it's not that many updates .. close to 10 > fields probably and maybe 50 doc updates per

How to retrieve relevance "explain" info in code?

2010-02-15 Thread uwdanny
Hi, I was trying to get the detailed "explain" info in (Java) code using the APIs, see code below: ResponseBuilder rb (from some inherited process function) SolrIndexSearcher searcher = rb.req.getSearcher(); Query query = rb.getQuery(); Explanation epl = searcher.explain(quer

Discovering Slaves

2010-02-15 Thread wojtekpia
Is there a way to 'discover' slaves using ReplicationHandler? I'm writing a quick dashboard, and don't have access to a list of slaves, but would like to show some stats about their health.

getting unexpected statscomponent values

2010-02-15 Thread solr-user
Has anyone encountered the following issue? I wanted to understand the statscomponent better, so I set up a simple test index with a few thousand docs. In my schema I have: - an indexed multivalued sint field (StatsFacetField) that can contain values 0 thru 5 that I want to use as my stats.

Re: regarding ranking

2010-02-15 Thread Ahmet Arslan
> 1) Does Solr (Lucene) consider exact match to be something > more important? I mean if the query is > "description:organisation", then > which one of the following would be returned? > Document A, consisting of > just "description:organisation", where > as Document B consisting "descriptio

Re: persistent cache

2010-02-15 Thread Tom Burton-West
Hi Tim, Due to our performance needs we optimize the index early in the morning and then run the cache-warming queries once we mount the optimized index on our servers. If you are indexing and serving using the same Solr instance, you shouldn't have to re-run the cache warming queries when you a

Re: Custom SearchComponent, only getting numFound back

2010-02-15 Thread cmose
A little more info. Doing some more digging it appears that result.getDocList().size() is returning 0 which would explain why I'm not getting any documents in my result. I'm not quite sure how/why that would be returning 0 while result.getDocList().matches() is returning > 0? It also looks li

Re: SolrJ Beginner (NEED HELP URGENT)

2010-02-15 Thread Erick Erickson
This link will answer many of your questions: http://wiki.apache.org/solr/SolrInstall About JSON: See SOLR-1690 (Jira) at: https://issues.apache.org/jira/browse/SOLR-1690 WARNING: I have no clue what the condition of this patch is, and have never used it. But at least someone else is thinking

Re: Question on Index Replication

2010-02-15 Thread Erick Erickson
Caveats: <1> I don't know either. <2> I think you can just fire off auto-warming queries at each SOLR instance. The main caching is on the server machine as far as SOLR search speed is concerned. But I'd really recommend thinking about just replicating the indexes, disk space is very cheap. Probab

Re: Apache Tika/ Solar Cell

2010-02-15 Thread Erick Erickson
<> Well, It Depends (tm). What do you want to accomplish? Do you want searches to get results from both the database import AND the imported documents? Or are these orthogonal data sets? If they are orthogonal, then putting them in their own core is probably conceptually easiest (there's no require

Re: and DisMaxRequestHandler

2010-02-15 Thread Joe Calderon
No, but you can set a default for the qf parameter with the same value. On 02/15/2010 01:50 AM, Steve Radhouani wrote: Hi there, Can the option be used by the DisMaxRequestHandler? Thanks, -Steve

Custom SearchComponent, only getting numFound back

2010-02-15 Thread cmose
I'm attempting to write a custom SearchComponent that utilizes some custom filters, but I'm obviously missing something key. I extend SearchComponent and override the prepare and process methods and then set the results on the result builder a la: SolrIndexSearcher.QueryCommand cmd = rb.getQueryCo

SolrJ Beginner (NEED HELP URGENT)

2010-02-15 Thread muneeb
Hey All, I have gone through the tutorial and ran the Solrj example code. It worked fine. I now want to implement my own full text search engine for my documents. I am not sure how I should go about doing this, since in the example code I ran start.jar and post.jar. Do I have to run start.jar even

regarding ranking

2010-02-15 Thread Smith G
Hello All, I know that in most cases there is no need to edit the ranking formula and I hope for the same in our case. So to make sure that there is no need, I have the following queries. 1) Does Solr (Lucene) consider exact match to be something more important? I mean if the query

Re: Question on Index Replication

2010-02-15 Thread abhishes
What you say makes perfect sense. However, I can offset the risk of disk I/O and latency by having a good amount of RAM, say 64 GB, and a 64 bit OS. 2 caveats being that 1. I have no clue if J2EE servers can use this much RAM (64 bit OS and JVM). 2. I have no idea how the cache can be auto-warmed. s

Apache Tika/ Solar Cell

2010-02-15 Thread Lee Smith
Hey All, Hope someone can advise me on a way to go. I have my Solr set up and working well. I am using DIH to handle all my data input. Now I need to add content from Word docs, PDFs, metadata etc. and am looking to use Solr Cell. A few questions regarding this. Would it be best to add these to

Lock error when indexing with curl

2010-02-15 Thread nabil rabhi
When posting documents to solr using curl, I get the following error: Posting file File.xml to http://localhost:8983/solr/update/ Error 500 HTTP ERROR: 500 Lock obtain timed out: nativefsl...@./solr/data/index/lucene-bd553072dd77e805bcb4e83a6d8ca389-write.lock: java.io.FileNotFoundException: .

Re: schema design - catch all field question

2010-02-15 Thread adeelmahmood
I am just trying to understand the difference between the two options to know which one to choose .. it sounds like I probably should just merge all data in the content field to maximize search results Erick Erickson wrote: > > The obvious answer is that you won't get any hits for terms > in ti

Re: Force Solr to use special response-rules

2010-02-15 Thread Ahmet Arslan
> Now, I want to create an extra rule: > If the query contains on 9 words, I want to make sure, that > 6 of them have > to occur within a document or else it would not be > returned to the user. > I think you are asking about DisMaxRequestHandler's mm (Minimum 'Should' Match) parameter. http://wi

Re: Question on Index Replication

2010-02-15 Thread Erick Erickson
Sure, you can do that. But you're making a change that kind of defeats the purpose. The underlying Lucene engine can be very disk intensive, and any network latency will adversely affect the search speed. Which is the point of replicating the indexes, to get them local to the SOLR/ Lucene instance

VelocityResponseWriter: Image References

2010-02-15 Thread Chantal Ackermann
Hi all, Google didn't come up with any helpful hits, so I'm wondering whether this is either too simple for me to grok, or I've got some obvious mistake in my code. Problem: Images that I want to load in the velocity templates (including those referenced in CSS/JS files) for the VelocityRespon

Re: schema design - catch all field question

2010-02-15 Thread Erick Erickson
The obvious answer is that you won't get any hits for terms in titles when you search the content field. But that's not very informative. What are you trying to accomplish? That is, what's the high-level issue you're trying to address with a change like that? Best Erick On Sun, Feb 14, 2010 at 9

Re: Filtering a string containing a certain value with fq ?

2010-02-15 Thread Fredouille91
Koji Sekiguchi-2 wrote: > > Fredouille91 wrote: >> Hello, >> >> I have a field (named "countries") containing a list of values separated >> by commas, to which each document belongs. >> This field looks like this: france,germany,italy >> It means that this document is related to France, German

Re: Filtering a string containing a certain value with fq ?

2010-02-15 Thread Koji Sekiguchi
Fredouille91 wrote: Hello, I have a field (named "countries") containing a list of values separated by commas, to which each document belongs. This field looks like this: france,germany,italy It means that this document is related to France, Germany and Italy. If you can have country field
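A minimal sketch of one way to make such a field filterable, assuming the comma-separated string is split into separate values before indexing into a multiValued field (the truncated reply above may or may not be suggesting exactly that). The helper names, the host and the select path are hypothetical; only the "countries" field name and the sample value come from the thread.

```python
from urllib.parse import urlencode


def split_countries(raw):
    """Split a comma-separated string into a list of clean values,
    one per entry of an (assumed) multiValued field."""
    return [c.strip() for c in raw.split(",") if c.strip()]


def build_filter_url(base_url, country):
    """Build a select URL that matches all docs and filters on one country."""
    params = [
        ("q", "*:*"),                    # match everything
        ("fq", "countries:" + country),  # filter query on a single value
    ]
    return base_url + "?" + urlencode(params)


values = split_countries("france,germany,italy")
url = build_filter_url("http://localhost:8983/solr/select", "france")
print(values)
print(url)
```

With one value per field entry, fq=countries:france is an exact term match and does not depend on how the original comma-separated string would have been tokenized.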

Re: Facet search concept problem

2010-02-15 Thread Ranveer Kumar
Hi Eric, Raju, Thanks for the reply. It means I need to index each table's data separately (news, article and blog); currently I am joining the tables and making a single row for all three tables. One other thing I want to know: in this case (if I index the table data separately) some column of the t

Phrase similarity - "more like this" feature for small set of terms

2010-02-15 Thread Xavier Schepler
Hi, there is an indexed field in my Solr's schema, in which one phrase is stored per document. I have to implement a feature that will allow users to have "more like this" results, based on the contents of this field. I think that the Solr's built in "more like this" feature requires too many

Re: Realtime search and facets with very frequent commits

2010-02-15 Thread dipti khullar
Hey Janne, Can you please let me know what other optimizations you are talking about here? In our application we are committing about every 5 mins, but the response time is still very slow and at times there are some connection timeouts as well. Just wanted to confirm if you have done some major

Highlighting and field types

2010-02-15 Thread Jan
Hi all, After analysing the highlighting inconsistency [Highlighting Inconsistency email tree] I was wondering if I should open a jira issue? Can you advise me if that's a sensible thing to do? So the issue is: * A query is done on a certain field (i.e. title) which is unstemmed,

Force Solr to use special response-rules

2010-02-15 Thread MitchK
Hello community, with the help of the sloppy phrase query [1] I can specify that queried words have to occur within a certain number of words. Now, I want to create an extra rule: If the query contains on 9 words, I want to make sure, that 6 of them have to occur within a document or else it would
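A rule of this kind maps onto the mm (Minimum 'Should' Match) parameter of DisMaxRequestHandler mentioned in the reply to this thread. The following is only a sketch of what such a request could look like: the host, select path, qf field name and query text are assumptions for illustration. The conditional value "8<6" is read as "with 1 to 8 clauses all are required, with 9 or more only 6 are required"; a plain mm=6 would be another option.

```python
from urllib.parse import urlencode


def build_dismax_url(base_url, user_query, mm):
    """Build a select URL that runs user_query through dismax with a given mm."""
    params = {
        "defType": "dismax",  # use the DisMax query parser
        "q": user_query,
        "qf": "text",         # hypothetical field to search
        "mm": mm,             # minimum number of clauses that should match
    }
    return base_url + "?" + urlencode(params)


nine_words = "one two three four five six seven eight nine"
url = build_dismax_url("http://localhost:8983/solr/select", nine_words, "8<6")
print(url)
```

The "<" in the mm value is percent-encoded as %3C by urlencode, so the value can be passed as-is from application code.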

RE: persistent cache

2010-02-15 Thread Toke Eskildsen
From: Tim Terlegård [tim.terleg...@gmail.com] > If the index size is more than you can have in RAM, do you recommend > to split the index to several servers so it can all be in RAM? > > I do expect phrase queries. Total index size is 107 GB. *prx files are > total 65GB and *frq files 38GB. It's pro

Updating index: Replacing data directory recommended?

2010-02-15 Thread Peter Karich
Hi solr community! Is it recommended to replace the data directory of a heavily used solr instance? (I am aware of the http queries, but that will be too slow) I need a fast way to push development data to production servers. I tried the following with success even under load of the index: mv da

Re: NullPointerException in ReplicationHandler.postCommit + question about compression

2010-02-15 Thread Shalin Shekhar Mangar
On Sat, Jan 30, 2010 at 5:08 AM, Chris Hostetter wrote: > > : never keep a 0. > : > : It is better to not mention the deletionPolicy at all. The > : defaults are usually fine. > > if setting the "keep" values to 0 results in NPEs we should do one (if not > both) of the following... > > 1) ch

and DisMaxRequestHandler

2010-02-15 Thread Steve Radhouani
Hi there, Can the option be used by the DisMaxRequestHandler? Thanks, -Steve

Re: Facet search concept problem

2010-02-15 Thread NarasimhaRaju
Hi, you should have a new field in your index, say 'type', which will have the values 'news', 'article' and 'blog' for news, article and blog documents respectively. When searching with facets enabled, make use of this 'type' field and you will get what you wanted. Regards, P.N.Raju,
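As a rough sketch of the suggestion above, a faceted request on such a 'type' field could be built like this. The host, select path and query text are assumptions for illustration; facet and facet.field are the standard Solr faceting parameters.

```python
from urllib.parse import urlencode


def build_facet_url(base_url, user_query, facet_field):
    """Build a select URL that returns facet counts for one field."""
    params = [
        ("q", user_query),
        ("facet", "true"),              # turn faceting on
        ("facet.field", facet_field),   # count documents per field value
    ]
    return base_url + "?" + urlencode(params)


url = build_facet_url("http://localhost:8983/solr/select", "solr", "type")
print(url)
```

The response would then carry one count per value of 'type' (news, article, blog) for the documents matching the query.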

Re: persistent cache

2010-02-15 Thread Tim Terlegård
Hi Tom, 1600 warming queries, that's quite a lot. Do you run them every time a document is added to the index? Do you have any tips on warming? If the index size is more than you can have in RAM, do you recommend to split the index to several servers so it can all be in RAM? I do expect phrase qu

Filtering a string containing a certain value with fq ?

2010-02-15 Thread Fredouille91
Hello, I have a field (named "countries") containing a list of values separated by commas, to which each document belongs. This field looks like this: france,germany,italy It means that this document is related to France, Germany and Italy. I'm trying to add a filter to list, for example, all

spellcheck all time

2010-02-15 Thread michaelnazaruk
I have a little problem with spellcheck! I get suggestions all the time, even when the word is correct! I use a dictionary from a file! Here is my configuration: explicit true file false false 1 false query spellcheck