Re: Best way to check Solr index for completeness

2010-09-28 Thread Dennis Gearon
How soon do you need to know? Couldn't you just regenerate the index using some kind of 'nice' factor to not use too much processor/disk/etc? Dennis Gearon

deadlock in solrj?

2010-09-28 Thread Michal Stefanczak
Hello! I'm using SolrJ 1.4.0 with Java 1.6. On two occasions when indexing ~18000 documents we got the following problem (trace from jconsole): Name: pool-1-thread-1 State: WAITING on java.util.concurrent.locks.abstractqueuedsynchronizer$conditionobj...@11 e464a Total blocked: 25 Tota

Re: multiple local indexes

2010-09-28 Thread Brent Palmer
Thanks for your comments, Jonathon. Here is some information that gives a brief overview of the eGranary Platform in order to quickly outline the need for a solution for bringing multiple indexes into one searchable collection. http://www.widernet.org/egranary/info/multipleIndexes Thanks, B

Solr with example Jetty and score problem

2010-09-28 Thread Floyd Wu
Hi there, I have a problem. When I issue a query to a single instance, Solr responds with XML like the following; as you can see, the score is normal: === 0 23 _l_title,score 0 _l_unique_key:12 * true 999 1.9808292 GTest 12 === But wh

Why the query performance is so different for queries?

2010-09-28 Thread newsam
Hi guys, I have posted a thread "The search response time is too long". The SOLR searcher instance is deployed with Tomcat 5.5.21. The index file is 8.2G. The number of docs is 6110745. The DELL server has an Intel(R) Xeon(TM) CPU (4 cores) 3.00GHZ and 6G RAM. In the SOLR back-end, "query=key:*" costs alm

Re: How to tell whether a plugin is loaded?

2010-09-28 Thread Chris Hostetter
: then in method createParser() add the following: : : req.getCore().getInfoRegistry().put(getName(), this); that doesn't seem like a good idea -- createParser will be called every time a string needs to be parsed, you're overwriting the same entry in the infoRegistry over and over and over ag

Re: Best way to check Solr index for completeness

2010-09-28 Thread Erick Erickson
Have you looked at Solr's TermsComponent? Assuming you have a unique key, I think you could use TermsComponent to walk that field for comparing against your database rather than getting all the documents. HTH Erick On Tue, Sep 28, 2010 at 5:11 PM, dshvadskiy wrote: > > That will certainly work fo
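
The comparison suggested above can be sketched as a merge walk over two sorted ID streams. This is only a minimal sketch, not anything from the thread: it assumes the database IDs and the unique-key terms from the index can both be obtained in the same ascending order, and the function name `diff_sorted_ids` is made up for illustration. Fetching the terms from Solr is left out.

```python
def diff_sorted_ids(db_ids, index_ids):
    """Walk two ascending ID sequences in step.

    Returns (missing_from_index, extra_in_index).
    Assumes neither sequence contains None, which is used as the end marker.
    """
    missing, extra = [], []
    db_iter, ix_iter = iter(db_ids), iter(index_ids)
    db_id, ix_id = next(db_iter, None), next(ix_iter, None)
    while db_id is not None or ix_id is not None:
        if ix_id is None or (db_id is not None and db_id < ix_id):
            # The database has an ID the index has not reached: not indexed.
            missing.append(db_id)
            db_id = next(db_iter, None)
        elif db_id is None or ix_id < db_id:
            # The index has an ID the database lacks: stale document.
            extra.append(ix_id)
            ix_id = next(ix_iter, None)
        else:
            # Same ID on both sides: advance both.
            db_id, ix_id = next(db_iter, None), next(ix_iter, None)
    return missing, extra
```

Because both sides are consumed as iterators, neither full ID list has to be held in memory; only the differences are collected.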

Re: Re:The search response time is too loong

2010-09-28 Thread newsam
Thx. I will let you know the latest status. >From: Lance Norskog >Reply-To: solr-user@lucene.apache.org >To: solr-user@lucene.apache.org, newsam >Subject: Re: Re:The search response time is too loong >Date: Tue, 28 Sep 2010 13:34:53 -0700 > >Copy the index. Delete half of the documents. Optimize.

SolrCore / Index Searcher Instances

2010-09-28 Thread entdeveloper
This may seem like a stupid question, but why on the info / stats pages do we see two instances of SolrIndexSearcher? The reason I ask is that we've implemented SOLR-465 to try and serve our index from a RAMDirectory, but it appears that our index is being loaded into memory twice, as our JVM hea

RE: Re: Solr Deduplication and Field Collpasing

2010-09-28 Thread Markus Jelsma
Correction, Java heap size should be RAM buffer size if I'm not too mistaken. -Original message- From: Markus Jelsma Sent: Wed 29-09-2010 01:17 To: solr-user@lucene.apache.org; Subject: RE: Re: Solr Deduplication and Field Collpasing If you can set the digest field for your `non-nutc

RE: Re: Solr Deduplication and Field Collpasing

2010-09-28 Thread Markus Jelsma
If you can set the digest field for your `non-nutch` documents easily, that would be a quicker approach indeed. No need to create a custom update processor or anything like that. But to do so, you would have to reindex the whole bunch again. There is no way to update a document without comp

Re: Solr Deduplication and Field Collpasing

2010-09-28 Thread Nemani, Raj
I have the digest field already in the schema because the index is shared between Nutch docs and others. I do not know if the second approach is the quickest in my case. I can set the digest value to something unique for non-Nutch documents easily (I have an ID field that I can use to populate

RE: multiple local indexes

2010-09-28 Thread Jonathan Rochkind
Honestly, I think just putting everything in the same index is your best bet. Are you sure your "particular needs of your project" can't be served by one combined index? You can certainly still query on just a portion of the index when needed using fq -- you can even create a request handler (
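
To illustrate the fq suggestion above, here is a rough sketch of building a query that searches only one portion of a combined index. The field name `collection`, its values, and the function name are assumptions made up for this example; only the `q` and `fq` request parameters are standard Solr.

```python
from urllib.parse import urlencode


def build_select_query(user_query, collection=None):
    """Build the query string for a /select request.

    `collection` is a hypothetical field that records which original index
    each document came from. Leaving it out searches the whole index.
    """
    params = [("q", user_query)]
    if collection is not None:
        # fq restricts the result set without affecting the score.
        params.append(("fq", "collection:" + collection))
    return "/select?" + urlencode(params)
```

For example, `build_select_query("solar power", "library_a")` gives `/select?q=solar+power&fq=collection%3Alibrary_a`.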

RE: Solr Deduplication and Field Collpasing

2010-09-28 Thread Markus Jelsma
You could create a custom update processor that adds a digest field for newly added documents that do not have the digest field themselves. This way, the documents that are not added by Nutch get a proper non-empty digest field so the deduplication processor won't create the same empty hash and

multiple local indexes

2010-09-28 Thread Brent Palmer
In our application, we need to be able to search across multiple local indexes. We need this not so much for performance reasons, but because of the particular needs of our project. But the indexes, while sharing the same schema, can be very different in terms of size and distribution of docum

Re: Conditional Function Queries

2010-09-28 Thread Jan Høydahl / Cominvent
Ok, I created the issues: IF function: SOLR-2136 AND, OR, NOT: SOLR-2137 -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com On 28. sep. 2010, at 19.36, Yonik Seeley wrote: > On Tue, Sep 28, 2010 at 11:33 AM, Jan Høydahl / Cominvent > wrote: >> Has anyone written any co

Re: Using separate Analyzers for querying and indexing.

2010-09-28 Thread James Norton
Excellent, exactly what I needed. Thanks, James On Sep 28, 2010, at 4:28 PM, Luke Crouch wrote: > Yeah. You can specify two analyzers in the same fieldType: > > > > ... > > > ... > > > > -L > > On Tue, Sep 28, 2010 at 2:31 PM, James Norton wrote: > >> Hello, >> >> I am migrating fro

Solr Deduplication and Field Collpasing

2010-09-28 Thread Nemani, Raj
All, I have set up Nutch to submit the crawl results to the Solr index. I have some duplicates in the documents generated by the Nutch crawl. There is a field 'digest' that Nutch generates that is the same for those documents that are duplicates. While setting up the dedupe processor in the Solr co

Re: Dismax Request handler and Solrconfig.xml

2010-09-28 Thread Luke Crouch
I notice we don't have the default=true, instead we manually specify qt=dismax in our queries. HTH. -L On Tue, Sep 28, 2010 at 4:24 PM, Luke Crouch wrote: > What you have is exactly what I have on 1.4.0: > > > > > dismax > > And it has worked fine. We copied our solrconfig.xml from

Re: Dismax Request handler and Solrconfig.xml

2010-09-28 Thread Luke Crouch
What you have is exactly what I have on 1.4.0: dismax And it has worked fine. We copied our solrconfig.xml from the examples and changed them for our purposes. You might compare your solrconfig.xml to some of the examples. -L On Tue, Sep 28, 2010 at 4:19 PM, Thumuluri, Sai < sai.th

RE: Dismax Request handler and Solrconfig.xml

2010-09-28 Thread Thumuluri, Sai
Can I please get some help here? I am in a tight timeline to get this done - any ideas/suggestions would be greatly appreciated. -Original Message- From: Thumuluri, Sai [mailto:sai.thumul...@verizonwireless.com] Sent: Tuesday, September 28, 2010 12:15 PM To: solr-user@lucene.apache.org S

Re: is EmbeddedSolrServer thread safe ?

2010-09-28 Thread Reuben A Christie
No, it is not the same for EmbeddedSolrServer. We learned it the hard way; I guess you have also learned it by now. From the SolrJ wiki page: http://wiki.apache.org/solr/Solrj#EmbeddedSolrServer "CommonsHttpSolrServer is thread-safe and if you are using the following constructor, you *MUST* re-use t

Re: Best way to check Solr index for completeness

2010-09-28 Thread dshvadskiy
That will certainly work for the most recent updates but I need to compare the entire index. Dmitriy Luke Crouch wrote: > > Is there a 1:1 ratio of db records to solr documents? If so, couldn't you > simply select the most recent updated record from the db and check to make > sure the corresponding sol

Re: Concurrent access to EmbeddedSolrServer

2010-09-28 Thread Reuben A Christie
We learned it the hard way. I wish I had read http://wiki.apache.org/solr/EmbeddedSolr before; it is not thread-safe. You start seeing concurrent modification exceptions within 100 samples when you load it with more than 1 concurrent user ( I ha

Re: Best way to check Solr index for completeness

2010-09-28 Thread Luke Crouch
Is there a 1:1 ratio of db records to solr documents? If so, couldn't you simply select the most recently updated record from the db and check to make sure the corresponding solr doc has the same timestamp? -L On Tue, Sep 28, 2010 at 3:48 PM, Dmitriy Shvadskiy wrote: > Hello, > What would be the b

Best way to check Solr index for completeness

2010-09-28 Thread Dmitriy Shvadskiy
Hello, What would be the best way to check the Solr index against the original system (database) to make sure the index is up to date? I can use Solr fields like Id and timestamp to check against the appropriate fields in the database. Our index currently contains over 2 million documents across several cores. Pulling all

Re: Search Interface

2010-09-28 Thread Lance Norskog
There is already a simple Velocity app. Just hit http://localhost:8983/solr/browse. You can configure some handy parameters to make walkable facets in solrconfig.xml. On Tue, Sep 28, 2010 at 5:23 AM, Antonio Calo' wrote: >  Hi > > You could try to use the Velocity framework to build GUIs in a  qu

Re: Re:The search response time is too loong

2010-09-28 Thread Lance Norskog
Copy the index. Delete half of the documents. Optimize. Copy the index. Delete the other half of the documents. Optimize. 2010/9/28 newsam : > I guess you are correct. We used the default SOLR cache configuration. I will > change the cache configuration. > > BTW, I want to deploy several shards f

Re: Using separate Analyzers for querying and indexing.

2010-09-28 Thread Luke Crouch
Yeah. You can specify two analyzers in the same fieldType: ... ... -L On Tue, Sep 28, 2010 at 2:31 PM, James Norton wrote: > Hello, > > I am migrating from a pure Lucene application to using solr. For legacy > reasons I must support a somewhat obscure query feature: lowercase words in >

Using separate Analyzers for querying and indexing.

2010-09-28 Thread James Norton
Hello, I am migrating from a pure Lucene application to using solr. For legacy reasons I must support a somewhat obscure query feature: lowercase words in the query should match lowercase or uppercase in the index, while uppercase words in the query should only match uppercase words in the ind

SolrException: Bad Request

2010-09-28 Thread Pavel Minchenkov
Hi, I'm getting a rather strange exception after the web server (Tomcat 7.0.2) has been idle for a long time. If I immediately rerun the same request, no errors occur. What may be the problem? All server settings are defaults. Exception: ... at sun.reflect.GeneratedMethodAccessor101.invoke(Unknown Source) at

Re: Conditional Function Queries

2010-09-28 Thread Yonik Seeley
On Tue, Sep 28, 2010 at 11:33 AM, Jan Høydahl / Cominvent wrote: > Has anyone written any conditional functions yet for use in Function Queries? Nope - but it makes sense and has been on my list of things to do for a long time. -Y http://lucenerevolution.org Lucene/Solr Conference, Boston Oct

RE: Dismax Request handler and Solrconfig.xml

2010-09-28 Thread Thumuluri, Sai
I removed default=true from standard request handler -Original Message- From: Luke Crouch [mailto:lcro...@geek.net] Sent: Tuesday, September 28, 2010 12:50 PM To: solr-user@lucene.apache.org Subject: Re: Dismax Request handler and Solrconfig.xml Are you removing the standard default requ

Re: Dismax Request handler and Solrconfig.xml

2010-09-28 Thread Luke Crouch
Are you removing the standard default requestHandler when you do this? Or are you specifying two requestHandlers with default="true"? -L On Tue, Sep 28, 2010 at 11:14 AM, Thumuluri, Sai < sai.thumul...@verizonwireless.com> wrote: > Hi, > > I am using Solr 1.4.1 with Nutch to index some of our

Dismax Request handler and Solrconfig.xml

2010-09-28 Thread Thumuluri, Sai
Hi, I am using Solr 1.4.1 with Nutch to index some of our intranet content. In Solrconfig.xml, default request handler is set to "standard". I am planning to change that to use dismax as the request handler but when I set "default=true" for dismax - Solr does not return any results - I get results

Conditional Function Queries

2010-09-28 Thread Jan Høydahl / Cominvent
Hi, Has anyone written any conditional functions yet for use in Function Queries? I see the use for a function which can run different sub functions depending on the value of a field. Say you have three documents: A: title=Sports car, color=red B: title=Boring car, color=green C: title=Big car

Re: What's the difference between TokenizerFactory, Tokenizer, & Analyzer?

2010-09-28 Thread Ahmet Arslan
> 1) KeywordTokenizerFactory seems to be a "tokenizer > factory" while CJKTokenizer seems to be just a tokenizer. > Are they the same type of things at all? > Could I just replace > > with > class="org.apache.lucene.analysis.cjk.CJKTokenizer"/> > ?? You should use org.apache.solr.analysis.CJK

RE: Need help with spellcheck city name

2010-09-28 Thread Dyer, James
You might want to look at SOLR-2010. This patch works with the "collation" feature, having it test the collations it returns to ensure they'll return hits. So if a user types "san jos" it will know that the combination "san jose" is in the index and "san ojos" is not. James Dyer E-Commerce Sy

Re: Is Solr right for our project?

2010-09-28 Thread Jan Høydahl / Cominvent
Yes, in the latest released version (1.4.1), there is a shards= parameter but the client needs to fill it, i.e. the client needs to know what servers are indexers, searchers, shard masters and shard replicas... The SolrCloud stuff is still not committed and only available as a patch right now.
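
Since the client has to fill in shards= itself in 1.4.1, a client-side helper might look roughly like the sketch below. The host names and the function name are placeholders invented for this example; the comma-separated host:port/path form of the shards value is the only part taken from Solr's distributed search.

```python
from urllib.parse import urlencode


def build_distributed_query(user_query, shard_hosts):
    """Return URL-encoded parameters for a distributed search.

    `shard_hosts` is a list such as ["host1:8983/solr", "host2:8983/solr"]
    (placeholder hosts). The client is responsible for knowing this list.
    """
    params = {"q": user_query, "shards": ",".join(shard_hosts)}
    return urlencode(params)
```

The result is appended to the select URL of whichever server receives the request; that server then queries each listed shard and merges the results.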

RE: Limitations of prohibited clausses in sub-expression - pure negative query

2010-09-28 Thread Patrick Sauts
Maybe the SOLR-80 JIRA issue? As written in the Solr 1.4 book: "pure negative query doesn't work correctly." You have to add 'AND *:*'. Thx From: Patrick Sauts [mailto:patrick.via...@gmail.com] Sent: Tuesday, 28 September 2010 11:53 To: 'solr-user@lucene.apache.org' Subject: Limitations o
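
The 'AND *:*' workaround mentioned above could be applied on the client side along these lines. This is a deliberately naive sketch: the function name is made up, and it only recognises a sub-expression whose first character is '-', so it is not a real query parser.

```python
def guard_pure_negative(clause):
    """Give a purely negative clause something to subtract from.

    "-color:red" becomes "(*:* AND -color:red)"; other clauses are
    returned unchanged apart from surrounding whitespace.
    """
    stripped = clause.strip()
    if stripped.startswith("-"):
        # *:* matches every document, so the negation has a base set.
        return "(*:* AND " + stripped + ")"
    return stripped
```

A sub-expression such as `-color:red` can then be combined safely, for example `"title:car AND " + guard_pure_negative("-color:red")`.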

Re: Limitations of prohibited clausses in sub-expression - pure negative query

2010-09-28 Thread Erick Erickson
Please explain what you want to *do*, your message is so terse it makes it really hard to figure out what you're asking. A couple of example queries would help a lot. Best Erick On Tue, Sep 28, 2010 at 5:53 AM, Patrick Sauts wrote: > I can find the answer but is this problem solved in Solr 1.4.1

Re: Search Interface

2010-09-28 Thread Antonio Calo'
Hi You could try to use the Velocity framework to build GUIs in a quick and efficient manner. Solr comes with a Velocity handler already integrated that could be the best solution in your case: http://wiki.apache.org/solr/VelocityResponseWriter Also take these hints on the same topic: htt

Re: is multi-threads searcher feasible idea to speed up?

2010-09-28 Thread Li Li
Yes, there is a MultiSearcher in Lucene, but its idf across the 2 indexes is not global. Maybe I can modify it, and also the index, like: term1 df=5 doc1 doc3 doc5 term1 df=5 doc2 doc4 2010/9/28 Li Li : > hi all > I want to speed up search time for my application. In a query, the > time is la

Re: is multi-threads searcher feasible idea to speed up?

2010-09-28 Thread Michael McCandless
This is an excellent idea! And, desperately needed. It's high time Lucene can take advantage of concurrency when running a single query. Machines have tons of cores these days! (My dev box has 24!). Note that one simple way to do this is use ParallelMultiSearcher: it uses one thread per segmen

is multi-threads searcher feasible idea to speed up?

2010-09-28 Thread Li Li
Hi all, I want to speed up search time for my application. In a query, the time is largely spent reading posting lists (IO on the frq files), calculating scores, and collecting results (CPU, with a priority queue). IO can hardly be optimized further, or is already partly optimized by NIO. So I want to use multithreads to utili

Limitations of prohibited clausses in sub-expression - pure negative query

2010-09-28 Thread Patrick Sauts
I can't find the answer, but is this problem solved in Solr 1.4.1? Thx for your answers.

Re: Is Solr right for our project?

2010-09-28 Thread Mike Thomsen
Interesting. So what you are saying, though, is that at the moment it is NOT there? On Mon, Sep 27, 2010 at 9:06 PM, Jan Høydahl / Cominvent wrote: > Solr will match this in version 3.1 which is the next major release. > Read this page: http://wiki.apache.org/solr/SolrCloud for feature descriptio

What's the difference between TokenizerFactory, Tokenizer, & Analyzer?

2010-09-28 Thread Andy
Could someone help me to understand the differences between TokenizerFactory, Tokenizer, & Analyzer? Specifically, I'm interested in implementing auto-complete for tags that could contain both English & Chinese. I read this article (http://www.lucidimagination.com/blog/2009/09/08/auto-suggest-f

Re: Re:The search response time is too loong

2010-09-28 Thread newsam
I guess you are correct. We used the default SOLR cache configuration. I will change the cache configuration. BTW, I want to deploy several shards from the existing 8G index file, such as 4G per shard. Is there any tool to generate two shards from one 8G index file? >From: kenf_nc >Reply-To: