Re: Distributed grouping issue

2012-04-02 Thread Martijn v Groningen
I tried to reproduce this. However, matches always returns 4 in my case (when using rows=1 and rows=2). In your case the 2 documents on each core do belong to the same group, right? I did find something else: if I use rows=0, an error occurs. I think we need to investigate this further.

Re: Merging results from two queries

2012-04-02 Thread Karthick Duraisamy Soundararaj
@Eric By threshold, all I mean is the count of the documents returned, and I am not going to play with the score. So if I have to commit my code to svn, what's the best way to go about it? I know I have to discuss my design here, which would take at least a couple of days. But is there special instructions

Re: Tags and Folksonomies

2012-04-02 Thread Chris Hostetter
: Suppose I have content which has title and description. Users can tag content : and search content based on tag, title and description. Tag has more : weightage. : : Any inputs on how indexing and retrieval will work given there is content : and tags using Solr? Has anyone implemented search ba

Re: viewing the terms indexed for a specific document

2012-04-02 Thread Erick Erickson
If you add &explainOther=, see: http://wiki.apache.org/solr/SolrRelevancyFAQ you might get some hints. You can use the TermsComponent to see if the synonyms are getting in the index, but you'll have to have a very restricted input set (like one doc) for that to be helpful for a specific document.

Re: Merging results from two queries

2012-04-02 Thread Erick Erickson
Part of it depends on what you mean by "threshold". If it's just the number of matches, then fine. But if you're talking score here, be very, very careful. Scores are not an absolute measure of anything; they only tell you that "for _this_ query, the docs should be ordered this way". So I'd advise a

Re: pattern error in PatternReplaceCharFilterFactory

2012-04-02 Thread Chris Hostetter
: It seems to be an unrecognisable pattern, this is from the log, last : paragraph says "unknown character block name". The java version is : "1.6.0_31": Did you read the rest of my reply? about testing if java recognizes your block name independent of Solr ... because that error is coming direc

RE: Distributed grouping issue

2012-04-02 Thread Young, Cody
Okay, I've played with this a bit more. Found something interesting: When the groups returned do not include results from a core, then the core is excluded from the count. (I have 1 group, 2 documents per core) Example: http://localhost:8983/solr/core0/select/?q=*:*&shards=localhost:8983/solr/c

Re: Distributed grouping issue

2012-04-02 Thread Martijn v Groningen
> > All documents of a group exist on a single shard, there are no cross-shard > groups. > You only have to partition documents by group when the groupCount and some other features need to be accurate. For the "matches" this is not necessary. The matches are summed up during merging the shared resp

Re: viewing the terms indexed for a specific document

2012-04-02 Thread karthik
A few more details to this thread: when I try the analysis tab from the admin console, I see that the synonym is kicking in and it's matching the text in the document that I am expecting to see as part of the results. However, the actual search is not returning that document. Also I used the termcomp

SolrJ updating indexed documents?

2012-04-02 Thread Mike O'Leary
I am working on a component for indexing documents from a database that contains medical records. The information is organized across several tables and I am supposed to index records for varying sizes of sets of patients for others to do IR experiments with. Each patient record has one or more

Re: Merging results from two queries

2012-04-02 Thread John Chee
Karthick, The solution that I use to this problem is to perform query1 and query2 and boost results matching query1. Then solr takes care of all the deduplication (not necessarily merging) automatically, would this work for your situation? I stole this idea from this slide deck: "Make sure all r

Re: Open deleted index file failing jboss shutdown with Too many open files Error

2012-04-02 Thread Michael McCandless
Hmm, unless the ulimits are low, or the default mergeFactor was changed, or you have many indexes open in a single JVM, or you keep too many IndexReaders open, even in an NRT or frequent commit use case, you should not run out of file descriptors. Frequent commit/reopen should be perfectly fine, a

Re: Open deleted index file failing jboss shutdown with Too many open files Error

2012-04-02 Thread Gopal Patwa
Here is SolrConfig.xml, and I am using Lucene NRT with soft commit and update the index every 5 seconds, soft commit every 1 second and hard commit every 15 minutes > SolrConfig.xml: > > > >false >10 >2147483647 >1 >

RE: Distributed grouping issue

2012-04-02 Thread Young, Cody
In the case of group=false: numFound="26" In the case of group=true: 34000 As a note, the grouped number changes when I hit refresh. It seems to display the count from any single shard. (The top match also changes). I haven't tried this in other versions of solr. All documents of a group

Thanks All, that worked (both via SOLRJ and the admin UI)

2012-04-02 Thread vybe3142
The query in question should be: -- View this message in context: http://lucene.472066.n3.nabble.com/How-do-I-use-localparams-joins-using-SolrJ-and-or-the-Admin-GUI-tp3872088p3877927.html Sent from the Solr - User mailing list archive at Nabble.com.

RE: ExtractingRequestHandler

2012-04-02 Thread spring
> Solr Cell is great for proof-of-concept, but for heavy-duty > applications, > you're offloading all the processing on the Solr server, > which can be a > problem. Good point! Thank you

How to determine memory consumption per core

2012-04-02 Thread Martin Grotzke
Hi, is it possible to determine the memory consumption (heap space) per core in solr trunk (4.0-SNAPSHOT)? I just unloaded a core and saw the difference in memory usage, but it would be nice to have a smoother way of getting the information without core downtime. It would also be interesting, wh

Re: Problems with indexing of huge textfiles (drupal/tika/solr)

2012-04-02 Thread Erick Erickson
And probably 10,000 tokens (words). See maxFieldLength in solrconfig.xml. Best Erick On Mon, Apr 2, 2012 at 8:57 AM, Sandro Feuillet wrote: > Hi, > > We have troubles indexing big text files with Solr. > We extract PDF files with Tika and try to index them with Solr. > But Solr doesn't index the

Re: Solr caching memory consumption Problem

2012-04-02 Thread Shawn Heisey
On 3/31/2012 4:30 AM, Suneel wrote: Hello friends, I am using DIH for Solr indexing. I have 60 million records in SQL which need to be uploaded to Solr. I started caching; it works smoothly and memory consumption is normal, but after some time memory consumption keeps climbing and the process

Re: A little mild abuse of SearchHandler

2012-04-02 Thread Benson Margulies
I've answered my own question, but it left me with a lot of curiosity. Why is the convention to build strings joined with commas (e.g in SolrQuery.addValueToParam) rather than to use the array option? All these params are Map, so why cram multiples into the first slot with commas ?
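To make the two conventions in the question concrete, here is a small sketch of how a multi-valued parameter looks on the wire in each form. It uses only Python's standard library; the parameter name `p` and its values are hypothetical, and whether a given Solr parameter accepts the comma-joined form, the repeated form, or both depends on that parameter.

```python
from urllib.parse import urlencode, parse_qs

# Hypothetical parameter name "p" with two hypothetical values.
comma_joined = urlencode({"p": "a,b"})                   # one value containing a comma
repeated = urlencode({"p": ["a", "b"]}, doseq=True)      # the same key sent twice

print(comma_joined)
print(repeated)

# A generic query-string parser sees these as different shapes:
# one value in the first case, two values in the second.
print(parse_qs(comma_joined))
print(parse_qs(repeated))
```

With the comma-joined form, splitting the value is left to whoever reads the parameter; with the repeated form, the receiver already gets an array of values.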

A little mild abuse of SearchHandler

2012-04-02 Thread Benson Margulies
I've got a prototype of a RequestHandler that embeds, within itself, a SearchHandler. Yes, I read the previous advice to be a query component, but I found it a lot easier to chart my course. I'm having some trouble with sorting. I came up with the following. 'args' is the usual Map. firstpassSort

Problems with indexing of huge textfiles (drupal/tika/solr)

2012-04-02 Thread Sandro Feuillet
Hi, We have troubles indexing big text files with Solr. We extract PDF files with Tika and try to index them with Solr. But Solr doesn't index the entire text. As soon as a certain amount of text is reached Solr stops indexing the rest. We haven't found a setting or parameter which defines the amo

Re: Apache solr not indexing complete pdf file using tikka

2012-04-02 Thread Erick Erickson
You can index 2B tokens, so upping maxFieldLength should have fixed your problem at least as far as Solr is concerned. How many tokens get indexed? I'm not as familiar with Tika, but there may be some kind of parameter there (although I don't remember this coming up before)... Did you restart Solr

Re: Virtual Memory very high

2012-04-02 Thread Michael McCandless
Are you seeing a real problem here, besides just being alarmed by the big numbers from top? Consumption of virtual memory by itself is basically harmless, as long as you're not running up against any of the OS limits (and, you're running a 64 bit JVM). This is just "top" telling you that you've m

Re: Virtual Memory very high

2012-04-02 Thread Erick Erickson
Why do you care about virtual memory? It's, after all, virtual. You can allocate as much as you want. For instance, MMapDirectory maps a load of virtual memory, but that has little relation to how much physical memory is being used. Consider looking at your app with something like JConsole and seei

Re: default operation for a field

2012-04-02 Thread Alexander Aristov
Ok. got it. thanks Best Regards Alexander Aristov On 2 April 2012 16:37, Erick Erickson wrote: > You can't set the default operator for a single field. This implies > you're using edismax? If that's the case, your app layer can > massage the query to something like > term1 term2 term3 field_x:

Re: default operation for a field

2012-04-02 Thread Erick Erickson
You can't set the default operator for a single field. This implies you're using edismax? If that's the case, your app layer can massage the query to something like term1 term2 term3 field_x:(term1 AND term2 AND term3). In which case field_x probably should not be in your qf parameter. Best Erick
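A minimal sketch of the app-layer rewrite described above, written in Python. The function name is hypothetical, and it assumes the terms are plain words that need no query-syntax escaping.

```python
def require_all_in_field(terms, field):
    """Return the loose terms followed by an AND-ed clause on one field."""
    if not terms:
        raise ValueError("at least one term is required")
    loose = " ".join(terms)          # matched with the handler's default operator
    strict = " AND ".join(terms)     # every term must match in the given field
    return f"{loose} {field}:({strict})"


query = require_all_in_field(["term1", "term2", "term3"], "field_x")
print(query)
```

The resulting string would be sent as the q parameter; as noted above, field_x would then probably be left out of qf.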

Re: Open deleted index file failing jboss shutdown with Too many open files Error

2012-04-02 Thread Erick Erickson
How often are you committing index updates? This kind of thing can happen if you commit too often. Consider setting commitWithin to something like, say, 5 minutes. Or doing the equivalent with the autoCommit parameters in solrconfig.xml. If that isn't relevant, you need to provide some more details

Re: SolrCloud

2012-04-02 Thread Erick Erickson
No, you don't have to run zookeeper on each replica. Zookeeper is a repository for your system (cluster) information. It knows about each replica, but ZK does not need to run on each shard. You can run one zookeeper instance for your entire cluster, no matter how many shards/replicas you have. He

Re: How do I use localparams/joins using SolrJ and/or the Admin GUI

2012-04-02 Thread Stefan Matheis
On Monday, April 2, 2012 at 2:00 PM, Stefan Matheis wrote: > On Friday, March 30, 2012 at 11:33 PM, vybe3142 wrote: > > When I paste the relevant part of the query into the SOLR admin UI query > > interface, > > {!join+from=join_id+to=id}attributes_AUTHORS.4:4, I fail to retrieve any > > documents

Re: How do I use localparams/joins using SolrJ and/or the Admin GUI

2012-04-02 Thread Stefan Matheis
On Saturday, March 31, 2012 at 6:01 PM, Yonik Seeley wrote: > Shouldn't that be the other way? The admin UI should do any necessary > escaping, so those "+" chars should instead be a spaces? We can, but is this really what you'd expect?
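The escaping question can be illustrated with Python's standard library. This is only a sketch of generic form encoding, not of what the admin UI actually does: the raw local-params query contains spaces, an encoder turns them into "+", and text that already contains "+" gets those characters encoded again as literal plus signs.

```python
from urllib.parse import urlencode, parse_qs

# Raw query as a person would type it: spaces inside the local params.
raw_q = "{!join from=join_id to=id}attributes_AUTHORS.4:4"
encoded_once = urlencode({"q": raw_q})
print(encoded_once)

# Pasting the already-encoded text ("+" instead of spaces) into something
# that encodes again turns each "+" into "%2B", i.e. a literal plus sign.
pasted = "{!join+from=join_id+to=id}attributes_AUTHORS.4:4"
encoded_twice = urlencode({"q": pasted})
print(encoded_twice)

# Decoding shows what the server would receive in each case.
print(parse_qs(encoded_once)["q"][0])
print(parse_qs(encoded_twice)["q"][0])
```

In the first case the server sees the query with spaces; in the second it sees the "+" characters as part of the local-params text.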

Re: How do I use localparams/joins using SolrJ and/or the Admin GUI

2012-04-02 Thread Stefan Matheis
On Friday, March 30, 2012 at 11:33 PM, vybe3142 wrote: > When I paste the relevant part of the query into the SOLR admin UI query > interface, > {!join+from=join_id+to=id}attributes_AUTHORS.4:4, I fail to retrieve any > documents Just go and paste the raw content into the form, then you'll get

Apache solr not indexing complete pdf file using tikka

2012-04-02 Thread Manoj Saini
Hello Guys, I am using Apache Solr 3.3.0 with Tika 1.0. I have PDF files which I am pushing into Solr for content searching. Apache Solr is indexing the PDF files and I can see them in the Apache Solr admin interface for search. But the issue is that Apache Solr is not indexing the whole file content. It is index

Re: Distributed grouping issue

2012-04-02 Thread Martijn v Groningen
The "matches" element in the response should return the number of documents that matched the query and not the number of groups. Did you encounter this issue also with other Solr versions (3.5 or another nightly build)? Martijn On 2 April 2012 09:41, fbrisbart wrote: > Hi, > > when you w

Re: Empty facet counts

2012-04-02 Thread Youri Westerman
Alright, well, I discovered that PHP converts '.' in a variable name to '_', causing my request to contain a variable for a nonexistent facet_field. 2012/3/30 William Bell > Can you also include a /select?q=*:*&wt=xml > > ? > > On Thu, Mar 29, 2012 at 11:47 AM, Erick Erickson > wrote: > > Hmmm, l

Using UIMA in Solr behind a firewall

2012-04-02 Thread kodo
Hi! I'm desperately trying to work out how to configure Solr in order to allow it to make calls to the Alchemy service through the UIMA analysis engines. Is there anybody who has been able to accomplish this? Cheers

Re: Virtual Memory very high

2012-04-02 Thread Suneel
Hello Everyone, I am facing the same problem on a Windows server: during indexing, my memory consumption goes very high. Based on the above discussion, I checked my solrconfig.xml file and found that "directoryFactory" is not configured yet. If I configure directoryFactory, will it help me reduce the c

Re: Solr caching memory consumption Problem

2012-04-02 Thread Suneel
Hello friends, I am using DIH for Solr indexing. I have 60 million records in SQL which need to be uploaded to Solr. I started caching; it works smoothly and memory consumption is normal, but after some time memory consumption keeps climbing and the process reaches more than 6 GB. That is the reason

RE: Distributed grouping issue

2012-04-02 Thread fbrisbart
Hi, when you write "I get xxx results", does it come from 'numFound'? Or do you really display xxx results? When using both field collapsing and sharding, the 'numFound' may be wrong. In that case, think about using the 'shards.rows' parameter with a high value (be careful, it's bad for performance).
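As a rough sketch of the suggestion above, the following Python builds a distributed, grouped request that also sets shards.rows. The host and core names follow the localhost example earlier in the thread but are assumptions, as are the group field name `group_id`, the helper name, and the value 1000.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def grouped_shard_query(base_url, shards, group_field, rows=10, shard_rows=1000):
    """Build a select URL for a grouped query across several shards."""
    params = {
        "q": "*:*",
        "shards": ",".join(shards),     # comma-separated list of shard addresses
        "group": "true",
        "group.field": group_field,
        "rows": rows,
        "shards.rows": shard_rows,      # rows requested from each shard
    }
    return f"{base_url}?{urlencode(params)}"


url = grouped_shard_query(
    "http://localhost:8983/solr/core0/select/",           # assumed host and core
    ["localhost:8983/solr/core0", "localhost:8983/solr/core1"],
    "group_id",                                           # hypothetical field
)
print(url)
```

A larger shards.rows value makes each shard return more rows to the merging node, which is why it costs performance.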

Re: pattern error in PatternReplaceCharFilterFactory

2012-04-02 Thread OliverS
Hi It seems to be an unrecognisable pattern, this is from the log, last paragraph says "unknown character block name". The java version is "1.6.0_31": *** SEVERE: null:org.apache.solr.common.SolrException: Plugin init failure for [schema.xml] fieldType:Plugin init failure for [schema.xml] analyze