Very slow commit. Copy of index?

2011-04-05 Thread stockii
Hello again ;-) After a full-import of 36M docs, my delta-import doesn't work well. When I start my delta-import (which runs very fast on another core), the commit takes very long. I think Solr copies the whole index, commits the new documents into the index and then reduces the index size afterwards.

Re: how to start GarbageCollector

2011-04-05 Thread stockii
Why does Solr copy my complete index somewhere when I start a delta-import? I copied one core, started a full-import of 35 million docs and then started a delta-import for the last hour (~2000 docs). DIH/Solr starts by copying the whole index... why? I think it is copying the index, because my hdd-sp

Search Regression Testing

2011-04-05 Thread Mark Mandel
Hey guys, I'm wondering how people are managing regression testing, in particular with things like text-based search, i.e. if you change how fields are indexed or change boosts in dismax, how do you ensure that critical queries don't start showing bad data? The obvious answer to me was using un
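A common answer to this is a set of "golden" queries. Below is a minimal Java sketch that assumes a hypothetical `search` function, for example a thin wrapper around a SolrJ query, that returns ranked document IDs. The queries, IDs and top-N cutoff are placeholders that you would fill in from your own critical searches.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical "golden query" regression check: every critical query must still
 *  return its expected document IDs somewhere in the top N results. */
class SearchRegressionCheck {

    /** Returns one message per violated expectation; an empty list means all queries passed. */
    static List<String> check(Map<String, List<String>> expectedTopDocs,
                              Function<String, List<String>> search,
                              int topN) {
        List<String> failures = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : expectedTopDocs.entrySet()) {
            List<String> results = search.apply(e.getKey());
            // Only the first topN hits count; shorter result lists are used as-is.
            List<String> top = results.subList(0, Math.min(topN, results.size()));
            for (String id : e.getValue()) {
                if (!top.contains(id)) {
                    failures.add("query '" + e.getKey() + "': expected doc " + id
                            + " in top " + topN);
                }
            }
        }
        return failures;
    }
}
```

You could run this after every schema or dismax boost change and treat a non-empty result as a failing test. Checking "within the top N" rather than exact positions keeps small, harmless rank shuffles from breaking the build.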

Re: FW: Very very large scale Solr Deployment = how to do (Expert Question)?

2011-04-05 Thread Jonathan DeMello
I third that request. Would greatly appreciate taking a look at that diagram! Regards, Jonathan On Wed, Apr 6, 2011 at 9:12 AM, Isan Fulia wrote: > Hi Ephraim/Jen, > > Can u share that diagram with all. It may really help all of us. > Thanks, > Isan Fulia. > > On 6 April 2011 10:15, Tirthankar

Synonym-time Reindexing Issues

2011-04-05 Thread Preston Marshall
Hello all, I am having an issue with Solr and the SynonymFilterFactory. I am using a library to interface with Solr called "sunspot." I realize that is not what this list is for, but I believe this may be an issue with Solr, not the library (plus the lib author doesn't know the answer). I am

Re: FW: Very very large scale Solr Deployment = how to do (Expert Question)?

2011-04-05 Thread Isan Fulia
Hi Ephraim/Jen, Can u share that diagram with all. It may really help all of us. Thanks, Isan Fulia. On 6 April 2011 10:15, Tirthankar Chatterjee wrote: > Hi Jen, > Can you please forward the diagram attachment too that Ephraim sent. :-) > Thanks, > Tirthankar > > -----Original Message----- > Fro

Re: Embedded Solr constructor not returning

2011-04-05 Thread Greg Pendlebury
Hmmm, after being stuck on this for hours, I find the answer myself 15 minutes after asking for help... as usual. :) For anyone interested, and no doubt this will not be a revelation for some, I need the servlet API in my app for it to work, despite it being a command-line app. So adding this to the maven P

Embedded Solr constructor not returning

2011-04-05 Thread Greg Pendlebury
Hi All, I'm hoping this is a reasonably trivial issue, but it's frustrating me to no end. I'm putting together a tiny command line app to write data into an index. It has no web based Solr running against it; the index will be moved at a later time to have a proper server instance start for respon

RE: FW: Very very large scale Solr Deployment = how to do (Expert Question)?

2011-04-05 Thread Tirthankar Chatterjee
Hi Jen, can you please also forward the diagram attachment that Ephraim sent? :-) Thanks, Tirthankar -----Original Message----- From: Jens Mueller [mailto:supidupi...@googlemail.com] Sent: Tuesday, April 05, 2011 10:30 PM To: solr-user@lucene.apache.org Subject: Re: FW: Very very large scale Solr

Re: Script to remove all index.* leftovers

2011-04-05 Thread William Bell
Thank you for pointing out #2. The commitsToKeep is interesting, but I thought each commit would create a segment (before optimizing) and be self-contained in the index.* directory? I would only run this on the slave. Bill On Tue, Apr 5, 2011 at 2:54 PM, Markus Jelsma wrote: > Hi, > > This seem

Re: FW: Very very large scale Solr Deployment = how to do (Expert Question)?

2011-04-05 Thread Jens Mueller
Hello Ephraim, thank you so much for the great document/scaling concept!! First, I think you really should publish this on the Solr wiki. This approach is not documented anywhere there and is not really obvious to newbies, and your document is great and explains it very well! Please allow me to further

Re: Eclipse: Invalid character constant

2011-04-05 Thread Eric Grobler
Hi Stefan, Thanks for the information. I used "Checkout Projects from SVN" inside eclipse which does not have the root build.xml file. What does this "eclipse" build actually do? Thanks & Regards Eric On Tue, Apr 5, 2011 at 11:34 PM, Stefan Matheis < matheis.ste...@googlemail.com> wrote: > Eri

Re: Possible Memory Leaks / Upgrading to a Later Version of Solr or Lucene

2011-04-05 Thread Simon Wistow
On Wed, Apr 06, 2011 at 12:05:57AM +0200, Jan Høydahl said: > Just curious, was there any resolution to this? Not really. We tuned the GC pretty aggressively - we use these options -server -Xmx20G -Xms20G -Xss10M -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+CMSIncrementalMode -XX:+CMSIncrem
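Besides GC logging flags, you can read the same collector counters from inside the JVM through the standard java.lang.management API. The small sketch below was not used in this thread. It could be logged periodically to see whether collection time keeps growing.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

/** Summarises garbage-collector activity using the standard JMX beans. */
class GcSummary {

    /** Returns one "name: N collections, T ms" line per collector in this JVM. */
    static String describe() {
        StringBuilder sb = new StringBuilder();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Both getters return -1 if the value is undefined for a collector.
            sb.append(gc.getName())
              .append(": ").append(gc.getCollectionCount()).append(" collections, ")
              .append(gc.getCollectionTime()).append(" ms\n");
        }
        return sb.toString();
    }

    /** Maximum heap the JVM will try to use; approximately what -Xmx configured. */
    static long maxHeapBytes() {
        return Runtime.getRuntime().maxMemory();
    }
}
```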

Re: Eclipse: Invalid character constant

2011-04-05 Thread Stefan Matheis
Eric, have a look at line #67 in build.xml :) Regards Stefan On 06.04.2011 00:28, Eric Grobler wrote: Hi Robert, Thanks for the fast response! I used https://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_3_1/ but did not find 'ant eclipse'. However setting my project's Resource

Re: Eclipse: Invalid character constant

2011-04-05 Thread Eric Grobler
Hi Robert, Thanks for the fast response! I used https://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_3_1/ but did not find 'ant eclipse'. However, setting my project's Resource encoding to UTF-8 worked. Thanks for your help and have a nice day :-) Regards Ericz On Tue, Apr 5, 2011 at

Re: Eclipse: Invalid character constant

2011-04-05 Thread Robert Muir
In Eclipse you need to set your project's character encoding to UTF-8. If you are checking out the source code from svn, you can run 'ant eclipse' from the top level, and then hit refresh on your project. It will set up your encoding and your classpath. On Tue, Apr 5, 2011 at 6:10 PM, Eric Groble
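The encoding setting matters because a character like 'ä' (U+00E4) takes two bytes in UTF-8. The Lucene sources are UTF-8, hence the advice above. If the compiler reads such a file with a single-byte encoding like ISO-8859-1, those two bytes become two characters. The literal 'ä' then turns into a two-character literal, and javac reports an invalid character constant. The short Java sketch below reproduces the mismatch.

```java
import java.nio.charset.StandardCharsets;

/** Shows why a UTF-8 source file read with a single-byte encoding turns
 *  one-character literals such as U+00E4 into two-character garbage. */
class EncodingMismatch {

    /** Encodes text as UTF-8, then (wrongly) decodes those bytes as ISO-8859-1. */
    static String misread(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }
}
```

For example, 'à' (bytes C3 A0) comes back as 'Ã' followed by a non-breaking space, which looks like a stray space after the character.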

Eclipse: Invalid character constant

2011-04-05 Thread Eric Grobler
Hi Everyone, Some language-specific classes like GermanLightStemmer have "invalid character constant" compiler errors for code like: switch(s[i]) { case 'ä': case 'à': case 'á': in Eclipse with JDK 1.6. How do I get rid of these errors? Thanks & Regards Ericz

Re: Possible Memory Leaks / Upgrading to a Later Version of Solr or Lucene

2011-04-05 Thread Jan Høydahl
Hi, Just curious, was there any resolution to this? -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com On 8. feb. 2011, at 03.40, Markus Jelsma wrote: > Do you have GC logging enabled? Tail -f the log file and you'll see what CMS > is > telling you. Tuning the occupati

Re: Keywords/terms mutual exclusion

2011-04-05 Thread Octavian Covalschi
Yes, you may be right; sorry for the confusion. Our ultimate goal is to collect user-entered data with the least possible interaction from them (users are lazy, you know). So basically users just point out where they found that particular item, and the app's job is to index it and later show it in search r

Re: ConcurrentLRUCache$Stats error

2011-04-05 Thread Markus Jelsma
https://issues.apache.org/jira/browse/SOLR-1797 > I'm using solr 1.4.1 and just noticed a bunch of these errors in the > solr.log file: > > SEVERE: java.util.concurrent.ExecutionException: > java.lang.NoSuchMethodError: > org.apache.solr.common.util.ConcurrentLRUCache$Stats.add(Lorg/apache/solr/c

ConcurrentLRUCache$Stats error

2011-04-05 Thread Paul
I'm using solr 1.4.1 and just noticed a bunch of these errors in the solr.log file: SEVERE: java.util.concurrent.ExecutionException: java.lang.NoSuchMethodError: org.apache.solr.common.util.ConcurrentLRUCache$Stats.add(Lorg/apache/solr/common/util/ConcurrentLRUCache$Stats;)V They appear to happen

what happens to docsPending if stop solr before commit

2011-04-05 Thread Robert Petersen
Hello fellow enthusiastic solr users, I tried to find the answer to this simple question online, but failed. I was wondering about this, what happens to uncommitted docsPending if I stop solr and then restart solr? Are they lost? Are they still there but still uncommitted? Do they get commit

RE: question on solr.ASCIIFoldingFilterFactory

2011-04-05 Thread Nemani, Raj
Thank you so much. I will give this a try. Thanks again everybody for your help Raj -----Original Message----- From: lboutros [mailto:boutr...@gmail.com] Sent: Tuesday, April 05, 2011 2:28 PM To: solr-user@lucene.apache.org Subject: RE: question on solr.ASCIIFoldingFilterFactory this analyzer

Re: Keywords/terms mutual exclusion

2011-04-05 Thread Jonathan Rochkind
I don't completely understand. I think maybe you replaced your domain-specific actualities with another example in an attempt to be more general or not reveal your business, but just made your explanation even more confusing! But. At the point you are indexing, is it possible to know that "sh

Keywords/terms mutual exclusion

2011-04-05 Thread Octavian Covalschi
Hi there, I'm trying to use Solr in one of my projects and I've got a small problem that I can't figure out. Basically our application is collecting data submitted by users. Now the problem is that submitted data may contain some incorrect info, like some keywords that will mess up search results

apache-solr-3.1 slow stats component queries

2011-04-05 Thread Johannes Goll
Hi, thank you for making the new apache-solr-3.1 available. I have installed the version from http://apache.tradebit.com/pub//lucene/solr/3.1.0/ and am running into very slow stats component queries (~ 1 minute) for fetching the computed sum of the stats field url: ?q=*:*&start=0&rows=0&stats=

Re: dismax "boost query" not useful?

2011-04-05 Thread Chris Hostetter
Short answer: the existence is entirely historic. I added bq because i needed it, and then i added bf because the _val_:"..." syntax was annoying. : can't think of a useful case when I want to both *add* a component to : the ultimate score, and for that component to be a non-function query : (

RE: Problems indexing very large set of documents

2011-04-05 Thread Chris Hostetter
: It wasn't just a single file, it was dozens of files all having problems : toward the end just before I killed the process. ... : That is by no means all the errors, that is just a sample of a few. : You can see they all threw HTTP 500 errors. What is strange is, nearly : every file

Re: Script to remove all index.* leftovers

2011-04-05 Thread Markus Jelsma
Hi, This seems alright as it leaves the current index in place, doesn't mess with the spellchecker and leaves the properties alone. But there are two problems: 1. it doesn't take into account the commitsToKeep value set in the deletion policy, and 2. it will remove any directory to which a cur

Script to remove all index.* leftovers

2011-04-05 Thread William Bell
There is a bug that leaves old index.* directories in the Solr data directory. Here is a script that will clean it up. I wanted to make sure this is okay, without doing a core reload. Thanks.
#!/bin/bash
DIR="/mnt/servers/solr/data"
LIST=`ls $DIR`
INDEX=`cat $DIR/index.properties | grep index\=
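The script above is cut off, so it is not completed here. Below is a separate, hedged Java sketch of the same idea. It reads the `index=` entry from index.properties (the property the script greps for) and lists every other index.* directory. It only lists candidates and deletes nothing, because it does not handle the concerns Markus raises, commitsToKeep among them.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

/** Lists index.* directories in a Solr data dir that index.properties does not point at.
 *  It only lists them; deciding what is safe to delete is left to the caller. */
class StaleIndexDirs {

    static List<Path> find(Path dataDir) throws IOException {
        Properties props = new Properties();
        try (InputStream in = Files.newInputStream(dataDir.resolve("index.properties"))) {
            props.load(in);
        }
        String current = props.getProperty("index");
        if (current == null) {
            // Without a live index name every index.* dir would look stale; refuse instead.
            throw new IllegalStateException("index.properties has no index= entry");
        }
        List<Path> stale = new ArrayList<>();
        // The glob also matches the index.properties file itself, hence the isDirectory check.
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dataDir, "index.*")) {
            for (Path p : entries) {
                if (Files.isDirectory(p) && !p.getFileName().toString().equals(current)) {
                    stale.add(p);
                }
            }
        }
        return stale;
    }
}
```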

RE: question on solr.ASCIIFoldingFilterFactory

2011-04-05 Thread lboutros
this analyzer seems to work : I used Spanish stemming, put the ASCIIFoldingFilterFactory before the stemming filter and added it in the que

Re: Problems indexing very large set of documents

2011-04-05 Thread Anuj Kumar
Hi Brandon, Sorry, I can't make out much here. The exception gives a Tika error that signifies a parsing issue with the PDF. That's all I can make out. Maybe someone else on this mailing list can help. Sorry. - Anuj On Tue, Apr 5, 2011 at 6:35 PM, Brandon Waterloo < brandon.water...@matrix.msu.edu

RE: question on solr.ASCIIFoldingFilterFactory

2011-04-05 Thread lboutros
Your analyzer contains these two filters: before: So two things: The words you are testing are not English words (are they?), so the stemming will behave strangely. If you really want to remove accents, try putting the ASCIIFoldingFilterFactory before the two others. Ludovic. - Jo

Re: question on solr.ASCIIFoldingFilterFactory

2011-04-05 Thread Markus Jelsma
It's not the ASCII folding filter but the stemmer that's removing some trailing characters. Something you can easily spot on the analysis page. > Here is the field type definition for ‘text’ field which is what I am using > for the indexed fields. Can you guys notice any obvious filter that coul

Re: Why did you trash Wiki page "Troubleshooting HTTP Status 404 - missing core name in path"?

2011-04-05 Thread Gabriele Kahlout
Oh I see. I unfortunately didn't see your earlier email. Thank you! On Tue, Apr 5, 2011 at 6:41 PM, Chris Hostetter wrote: > > : As I had the same problem I went to the wiki looking for the page to > solve > : my problem again, and there under recent changes I found that you had > : trashed it. >

RE: question on solr.ASCIIFoldingFilterFactory

2011-04-05 Thread Nemani, Raj
Here is the field type definition for ‘text’ field which is what I am using for the indexed fields. Can you guys notice any obvious filter that could be the issue? ---

Re: help with Jetty log message

2011-04-05 Thread Kaufman Ng
Looks like you are using openjdk. Can you try using Sun jdk? On Mon, Apr 4, 2011 at 6:53 AM, Upayavira wrote: > This is not Solr crashing, per se, it is your JVM. I personally haven't > generally had much success debugging these kinds of failure - see > whether it happens again, and if it does,

Re: Indexing data with Trade Mark Symbol

2011-04-05 Thread Markus Jelsma
Any word delimiter filter will get rid of that symbol. Use a char pattern replace filter, that should work. > Use admin/analysis.jsp to see which filter is removing it. > Configure a field type appropriate to what you want to index. > > On Mon, Apr 4, 2011 at 9:55 AM, mechravi25 wrote: > > Hi,
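Here is a toy Java model of why a character-level replacement helps. A delimiter-style tokenizer throws away symbols such as ™ (U+2122). Replacing the symbol with text before tokenizing keeps it searchable. The split-on-non-alphanumerics tokenizer and the replacement text " tm" are assumptions for illustration, not Solr's actual analysis chain.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Toy model: a delimiter-style tokenizer drops symbols such as U+2122 (TRADE MARK SIGN),
 *  unless a character-level replacement runs first, as a pattern-replace char filter would. */
class TrademarkChars {

    private static final Pattern TM = Pattern.compile("\u2122");

    /** Crude stand-in for a tokenizer: split on anything that is not a letter or digit. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.split("[^\\p{L}\\p{Nd}]+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    /** Character-level pre-processing: rewrite the symbol into searchable text. */
    static String replaceTrademark(String text) {
        return TM.matcher(text).replaceAll(" tm");
    }
}
```

In Solr the replacement step would be a pattern-replace char filter in the field type. As suggested in this thread, you can check the result on admin/analysis.jsp.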

Re: Indexing data with Trade Mark Symbol

2011-04-05 Thread Ben Davies
Use admin/analysis.jsp to see which filter is removing it. Configure a field type appropriate to what you want to index. On Mon, Apr 4, 2011 at 9:55 AM, mechravi25 wrote: > Hi, > Has anyone indexed the data with Trade Mark symbol??...when i tried to > index, the data appears as below. > > Data:

Problem with qf in solrconfig with an embedded server

2011-04-05 Thread belokys
Hello everyone! I need your help. I have tried to add, via solrconfig.xml, a qf that adds a boost to a field in my queries. I have tested the solution in a Solr server running in standalone mode and it runs perfectly, but when I try to do it on an embedded server, the query doesn't return me nothi

Indexing data with Trade Mark Symbol

2011-04-05 Thread mechravi25
Hi, Has anyone indexed data with the trademark symbol?? When I tried to index it, the data appears as below. Data: 79797 - Siebel Research– AI Fund, 79797 - Siebel Research– AI Fund,l Original Data: 79797 - Siebel Research™ AI Fund, Please help me to resolve this. Regards, Ravi -

Re: Why did you trash Wiki page "Troubleshooting HTTP Status 404 - missing core name in path"?

2011-04-05 Thread Chris Hostetter
: As I had the same problem I went to the wiki looking for the page to solve : my problem again, and there under recent changes I found that you had : trashed it. I'm confused -- the page did not have any troubleshooting suggestions or advice, it was just the details of a specific -- it seemed t

RE: question on solr.ASCIIFoldingFilterFactory

2011-04-05 Thread Steven A Rowe
I added this test method locally to TestASCIIFoldingFilter.java in the Lucene/Solr 3.1.0 source tree, and it passed, so the filter is not the problem (and the Solr factory certainly isn't either - it's just a wrapper) - I second Ludovic's question - you must have other filters configured: pub

Re: question on solr.ASCIIFoldingFilterFactory

2011-04-05 Thread lboutros
Is there any Stemming configured in for this field in your schema configuration file ? Ludovic. 2011/4/5 Nemani, Raj [via Lucene] < ml-node+2780463-48954297-383...@n3.nabble.com> > All, > > I am using solr.ASCIIFoldingFilterFactory to perform accent insensitive > search. One of the words that g

Re: question on solr.ASCIIFoldingFilterFactory

2011-04-05 Thread Ben Davies
I can't remember where I read it, but I think MappingCharFilterFactory is preferred. There is an example in the example schema. From this, I get: org.apache.solr.analysis.MappingCharFilterFactory {mapping=mapping-ISOLatin1Accent.txt} |text|despues| On Tue, Apr 5, 2011 at 5:06 PM, Nemani, Raj

question on solr.ASCIIFoldingFilterFactory

2011-04-05 Thread Nemani, Raj
All, I am using solr.ASCIIFoldingFilterFactory to perform accent-insensitive search. One of the words that got indexed as part of my indexing process is "después". Having used the ASCIIFoldingFilterFactory, I expected that if I searched for the word "despues" I should have the document containing the
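You can sketch what accent folding is expected to do to "después" in Java with Unicode decomposition: split 'é' into 'e' plus a combining accent, then drop the combining marks. This is a stand-in using java.text.Normalizer, not Lucene's ASCIIFoldingFilter, whose mapping table covers more characters.

```java
import java.text.Normalizer;
import java.util.regex.Pattern;

/** Accent-insensitive folding via Unicode decomposition (a stand-in for, not a copy of,
 *  Lucene's ASCIIFoldingFilter): decompose, then drop the combining marks. */
class AccentFolding {

    private static final Pattern COMBINING_MARKS = Pattern.compile("\\p{M}+");

    static String fold(String text) {
        // NFD turns U+00E9 into 'e' followed by U+0301 (combining acute accent).
        String decomposed = Normalizer.normalize(text, Normalizer.Form.NFD);
        return COMBINING_MARKS.matcher(decomposed).replaceAll("");
    }
}
```

Folding has to yield the same form at index and query time. If a stemmer runs first and produces different stems for "después" and "despues", folding afterwards cannot make them match. This fits the replies here that point at the stemmer and suggest moving the folding filter before it.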

Re: Using MLT feature

2011-04-05 Thread Markus Jelsma
If you check the code for TextProfileSignature [1] you'll notice the init method reading params. You can set those params as you did. Reading the Javadoc [2] might help as well. But what's not documented in the Javadoc is how QUANT is computed; it rounds. [1]: http://svn.apache.org/viewvc/lucene/
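For the plan of computing the same signature client-side, here is a simplified Java sketch of a text-profile signature. It counts tokens, derives a quantisation step by rounding maxFreq * quantRate, rounds each count down to that step and drops tokens below it, then hashes the sorted profile with MD5. The parameters are named after the minTokenLen/quantRate settings read in TextProfileSignature's init method. Several details are assumptions: the exact tokenisation, the minimum step of 2 (or 1), and the profile string format. Compare against the linked source before relying on identical output, and the same goes for a C# port.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Simplified text-profile signature (approximation only; not verified to match
 *  Solr's TextProfileSignature byte for byte). */
class ProfileSignature {

    static String signature(String text, int minTokenLen, float quantRate) {
        // 1. Count lower-cased letter/digit tokens that are at least minTokenLen long.
        Map<String, Integer> counts = new HashMap<>();
        for (String tok : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (tok.length() >= minTokenLen) counts.merge(tok, 1, Integer::sum);
        }
        int maxFreq = 0;
        for (int c : counts.values()) maxFreq = Math.max(maxFreq, c);

        // 2. Quantisation step: maxFreq * quantRate, rounded; assumed floor of 2 (or 1).
        int quant = Math.round(maxFreq * quantRate);
        if (quant < 2) quant = maxFreq > 1 ? 2 : 1;

        // 3. Round each count down to a multiple of quant; drop tokens that fall below it.
        List<Map.Entry<String, Integer>> profile = new ArrayList<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            int q = (e.getValue() / quant) * quant;
            if (q >= quant) profile.add(Map.entry(e.getKey(), q));
        }
        // 4. Sort by quantised count (descending), then by token, for a stable order.
        profile.sort((a, b) -> a.getValue().equals(b.getValue())
                ? a.getKey().compareTo(b.getKey())
                : Integer.compare(b.getValue(), a.getValue()));

        // 5. Hash the "token count" lines with MD5 and hex-encode the digest.
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Integer> e : profile) {
            sb.append(e.getKey()).append(' ').append(e.getValue()).append('\n');
        }
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(sb.toString().getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) hex.append(String.format("%02x", b));
            return hex.toString();
        } catch (NoSuchAlgorithmException ex) {
            throw new IllegalStateException("MD5 not available", ex);
        }
    }
}
```

Because rare tokens fall below the quantisation step, two texts that differ only in a word occurring once can end up with the same signature. That is what makes this kind of signature useful for near-duplicate detection.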

RE: Using MLT feature

2011-04-05 Thread Frederico Azeiteiro
Thank you, I'll try to create a C# method that creates the same signature as Solr, and then compare both signatures before indexing the doc. This way I can avoid indexing existing docs. If anyone needs to use this parameter (as this info is not on the wiki), you can add the option 5 On the processor t

Re: Matching on a multi valued field

2011-04-05 Thread Renaud Delbru
Hi, you could try the SIREn plugin [1] which supports multi-valued fields. [1] http://siren.sindice.com -- Renaud Delbru On 29/03/11 21:57, Brian Lamb wrote: Hi all, I have a field set up like this: And I have some records: RECORD1 man's best friend pooch RECORD2 man's worst

RE: Problems indexing very large set of documents

2011-04-05 Thread Brandon Waterloo
It wasn't just a single file, it was dozens of files all having problems toward the end just before I killed the process. IPADDR - - [04/04/2011:17:17:03 +] "POST /solr/update/extract?literal.id=32-130-AFB-84&commit=false HTTP/1.1" 500 4558 IPADDR - - [04/04/2011:17:17:05 +] "POST /

Different Result for the same query depending on using SolrServer or SolrCore ?

2011-04-05 Thread Amel Fraisse
Hello everybody, I am using Solr for indexing and searching. I am using 2 classes for searching documents: In the first one I'm instantiating a SolrServer to search documents as follows : server = new EmbeddedSolrServer(coreContainer, ""); server.add(doc); query.setQuery("id:"+idDoc); server.que

Re: Matching on a multi valued field

2011-04-05 Thread Michael Sokolov
Could you try creating fields dynamically: common_names_1, common_names_2, etc. Keep track of the max number of fields and generate queries listing all the fields? Gross, but it handles all the cases mentioned in the thread (wildcards, phrases, etc). -Mike On 3/29/2011 4:57 PM, Brian Lamb
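Here is a sketch of the query side of that workaround. It assumes a dynamic field pattern such as common_names_* in the schema, and that the indexer records the highest suffix it has used. The helper and its arguments are hypothetical.

```java
import java.util.StringJoiner;

/** Builds a query across hypothetical numbered fields baseName_1..baseName_n, so that a
 *  phrase or wildcard clause has to match within a single original value. */
class NumberedFieldQuery {

    static String build(String baseName, int maxFields, String clause) {
        StringJoiner or = new StringJoiner(" OR ");
        for (int i = 1; i <= maxFields; i++) {
            or.add(baseName + "_" + i + ":(" + clause + ")");
        }
        return or.toString();
    }
}
```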

Re: Very very large scale Solr Deployment = how to do (Expert Question)?

2011-04-05 Thread François Schiettecatte
And if you have control over machine placement, split them across racks so that a power outage on one rack does not take out your search cluster. François On Apr 5, 2011, at 3:19 AM, Ephraim Ofir wrote: > I'm not sure about the scale you're aiming for, but you probably want to > do both shardin
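Here is one simple way to get that property, sketched in Java under the assumption that racks are numbered 0..racks-1: place replica r of shard s on rack (s + r) mod racks. As long as a shard has no more replicas than there are racks, its copies end up on different racks.

```java
import java.util.ArrayList;
import java.util.List;

/** Round-robin placement sketch: replica r of shard s goes to rack (s + r) % racks,
 *  so a shard's replicas land on different racks whenever replicas <= racks. */
class RackPlacement {

    /** Returns placement.get(shard).get(replica) = rack index. */
    static List<List<Integer>> place(int shards, int replicas, int racks) {
        if (racks <= 0) throw new IllegalArgumentException("need at least one rack");
        List<List<Integer>> placement = new ArrayList<>();
        for (int s = 0; s < shards; s++) {
            List<Integer> shardRacks = new ArrayList<>();
            for (int r = 0; r < replicas; r++) {
                shardRacks.add((s + r) % racks);
            }
            placement.add(shardRacks);
        }
        return placement;
    }
}
```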

normalizing the score

2011-04-05 Thread Paul Libbrecht
Hello list, I did not find a wiki page about normalization. All I found was: http://search.lucidimagination.com/search/document/9d06882d97db5c59/a_question_about_solr_score where Hoss suggests to normalize depending on the maxScore. I am not comfortable with that since, at least, I want that a
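Normalising by maxScore, as in the linked thread, amounts to dividing every score in a result set by the highest one. Here is a minimal Java sketch.

```java
/** Normalises raw scores by the maximum score of the same result set,
 *  so the best hit of every query gets 1.0. */
class ScoreNormalizer {

    static double[] byMaxScore(double[] scores) {
        double max = 0;
        for (double s : scores) max = Math.max(max, s);
        double[] out = new double[scores.length];
        if (max <= 0) return out; // no positive score: leave everything at 0
        for (int i = 0; i < scores.length; i++) out[i] = scores[i] / max;
        return out;
    }
}
```

This approach has a built-in property: the top hit of every query gets 1.0, however weak a match it is. Normalised values therefore say how a document ranks within one result set, not how good the match is in absolute terms.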

Re: Using MLT feature

2011-04-05 Thread Markus Jelsma
On Tuesday 05 April 2011 12:19:33 Frederico Azeiteiro wrote: > Sorry, the reply I made yesterday was directed to Markus and not the > list... > > Here's my thoughts on this. At this point I'm a little confused if SOLR > is a good option to find near duplicate docs. > > >> Yes there is, try set

Re: Solrj performance bottleneck

2011-04-05 Thread rahul
Thanks Stefan and Victor! We are using GWT for the front end. We stopped issuing multiple asynchronous queries; now we issue a request, fetch the results, filter them based on what has been typed since the request, and re-trigger the request only if we don't get the expected resu
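You can sketch the strategy described in Java as a small cache. It fetches once, filters locally while the typed text extends the cached prefix, and queries again only when too few results survive. `fetch` stands for a hypothetical server call, and `minResults` is an assumed threshold for "not getting the expected results".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.function.Function;

/** Type-ahead sketch: filter cached results locally while the user keeps typing and
 *  only call the (hypothetical) server fetch again when too few results remain. */
class TypeAheadCache {
    private final Function<String, List<String>> fetch;
    private final int minResults;
    private String cachedPrefix;
    private List<String> cached = new ArrayList<>();

    TypeAheadCache(Function<String, List<String>> fetch, int minResults) {
        this.fetch = fetch;
        this.minResults = minResults;
    }

    List<String> suggest(String input) {
        String needle = input.toLowerCase(Locale.ROOT);
        if (cachedPrefix != null && needle.startsWith(cachedPrefix)) {
            List<String> filtered = new ArrayList<>();
            for (String s : cached) {
                if (s.toLowerCase(Locale.ROOT).startsWith(needle)) filtered.add(s);
            }
            if (filtered.size() >= minResults) return filtered; // no server round trip
        }
        cachedPrefix = needle; // re-trigger the request for the longer prefix
        cached = fetch.apply(needle);
        return cached;
    }
}
```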

RE: Using MLT feature

2011-04-05 Thread Frederico Azeiteiro
Sorry, the reply I made yesterday was directed to Markus and not the list... Here's my thoughts on this. At this point I'm a little confused if SOLR is a good option to find near duplicate docs. >> Yes there is, try set overwriteDupes to true and documents yielding the same signature will be over

Solrj and display which Solr version is used

2011-04-05 Thread Marc SCHNEIDER
Hi, I'm wondering how to find out which version of Solr is currently running using the Solrj library? Thanks, Marc.

Re: Mongo REST interface and full data import

2011-04-05 Thread andrew_s
Hi Stefan, Thanks for the clear explanation. I used XPathEntityProcessor as an example because I didn't find a JSON entity processor. I'll write a script to generate an XML file for data import. Regards, Andrew
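One hedged option for that script is to skip the DataImportHandler: write Solr's XML update format (<add><doc><field name="...">) and post it to the update handler. Below is a Java sketch that turns records into such a message, escaping XML special characters. It assumes the records have already been parsed from the Mongo REST JSON (parsing is not shown), and the field names are hypothetical.

```java
import java.util.List;
import java.util.Map;

/** Turns already-parsed records (field name -> value) into a Solr XML update message. */
class SolrAddXml {

    /** Escapes the characters that are special in XML text and double-quoted attributes. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    static String toAddXml(List<Map<String, String>> records) {
        StringBuilder sb = new StringBuilder("<add>");
        for (Map<String, String> rec : records) {
            sb.append("<doc>");
            for (Map.Entry<String, String> f : rec.entrySet()) {
                sb.append("<field name=\"").append(escape(f.getKey())).append("\">")
                  .append(escape(f.getValue())).append("</field>");
            }
            sb.append("</doc>");
        }
        return sb.append("</add>").toString();
    }
}
```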

Re: Mongo REST interface and full data import

2011-04-05 Thread Stefan Matheis
Andrew, are you really wondering why the XPathEntityProcessor does not work well with a JSON structure?! The links Erick posted state that you can push JSON-structured data to a Solr HTTP interface, but not that the DataImportHandler will work with it. IIRC there is no way for proc

Why did you trash Wiki page "Troubleshooting HTTP Status 404 - missing core name in path"?

2011-04-05 Thread Gabriele Kahlout
Hello, As I had the same problem I went to the wiki looking for the page to solve my problem again, and there under recent changes I found that you had trashed it. I can still solve my problem but why don't you keep it for others to benefit from too? As linked it's a recurring problem for several

RE: Very very large scale Solr Deployment = how to do (Expert Question)?

2011-04-05 Thread Ephraim Ofir
I'm not sure about the scale you're aiming for, but you probably want to do both sharding and replication. There's no central server which would be the bottleneck. The guidelines should probably be something like: 1. Split your index to enough shards so it can keep up with the update rate. 2. Have
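You can turn guideline 1 into back-of-the-envelope arithmetic: shards = ceil(update rate / indexing rate one shard sustains). Here is a Java sketch. The per-shard rate has to be measured on your own hardware, and the numbers in the example are made up.

```java
/** Back-of-the-envelope sizing for guideline 1: enough shards that the combined
 *  indexing capacity keeps up with the incoming update rate. */
class ShardSizing {

    /** ceil(updateRate / perShardRate); both rates in the same unit, e.g. docs per second. */
    static int shardsForUpdateRate(double updateRate, double perShardRate) {
        if (perShardRate <= 0) throw new IllegalArgumentException("perShardRate must be positive");
        return (int) Math.ceil(updateRate / perShardRate);
    }
}
```

For example, an update rate of 2,500 docs/s with shards that each sustain 1,000 docs/s gives 3 shards.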