RE: OutOfMemory GC: GC overhead limit exceeded - Why isn't WeakHashMap getting collected?

2010-12-13 Thread Upayavira
The second commit will bring in all changes, from both syncs. Think of the sync part as a glorified rsync of files on disk. So the files will have been copied to disk, but the in memory index on the slave will not have noticed that those files have changed. The commit is intended to remedy that -

RAM usage issues

2010-12-13 Thread Cameron Hurst
hello all, I am a new user to Solr and I am having a few issues with the setup and wondering if anyone had some suggestions. I am currently running this as just a test environment before I go into production. I am using a tomcat6 environment for my servlet and solr 1.4.1 as the solr build. I set u

Solr Memory Usage

2010-12-13 Thread Cameron Hurst
Hello all, I am a new user to Solr and am currently in a testing phase before I try and take my server into production. For my system I have a tomcat6 servlet running solr 1.4.1. Everything is running currently on my local computer and it is parsing data from a local dump of the production MySQL s

Re: Very high load

2010-12-13 Thread Mark
Changing the subject. It's not related to replication. It only appeared after indexing an extra field which increased our index size from 12g to 20g+ On 12/13/10 7:57 AM, Mark wrote: Markus, My configuration is as follows... ... false 2 ... false 64 10 false true No cache warming

RE: OutOfMemory GC: GC overhead limit exceeded - Why isn't WeakHashMap getting collected?

2010-12-13 Thread Jonathan Rochkind
Sorry, I guess I don't understand the details of replication enough. So slave tries to replicate. It pulls down the new index files. It tries to do a commit but fails. But "the next commit that does succeed will have all the updates." Since it's a slave, it doesn't get any commits of its own.

Re: Userdefined Field type - Faceting

2010-12-13 Thread Yonik Seeley
Perhaps try overriding indexedToReadable() also? -Yonik http://www.lucidimagination.com On Mon, Dec 13, 2010 at 10:00 PM, Viswa S wrote: > > Hello, > > We implemented an IP-Addr field type which internally stored the ips as > hex-ed string (e.g. "192.2.103.29" will be stored as "c002671d"). My

Re: OutOfMemory GC: GC overhead limit exceeded - Why isn't WeakHashMap getting collected?

2010-12-13 Thread Yonik Seeley
On Mon, Dec 13, 2010 at 9:27 PM, Jonathan Rochkind wrote: > Yonik, how will maxWarmingSearchers in this scenario affect replication?  If > a slave is pulling down new indexes so quickly that the warming searchers > would ordinarily pile up, but maxWarmingSearchers is set to 1 what > happens

Re: SpatialTierQueryParserPlugin Loading Error

2010-12-13 Thread Erick Erickson
This page shows you how to apply a patch: http://wiki.apache.org/solr/HowToContribute However, are you aware that this is a patch to the *source* code and then you have to compile it? A simpler approach would be to just grab either the trunk build or a v

SpatialTierQueryParserPlugin Loading Error

2010-12-13 Thread Adam Estrada
All, Can anyone shed some light on this error. I can't seem to get this class to load. I am using the distribution of Solr from Lucid Imagination and the Spatial Plugin from here https://issues.apache.org/jira/browse/SOLR-773. I don't know how to apply a patch but the jar file is in there. What el

Userdefined Field type - Faceting

2010-12-13 Thread Viswa S
Hello, We implemented an IP-Addr field type which internally stored the ips as hex-ed string (e.g. "192.2.103.29" will be stored as "c002671d"). My "toExternal" and "toInternal" methods for appropriate conversion seems to be working well for query results, but however when faceting on this fie
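The hex encoding described in this post can be illustrated with a minimal Python sketch. It only mirrors the conversion from the example ("192.2.103.29" stored as "c002671d"); the function names to_internal and to_external are borrowed from the post for illustration and are not the Solr FieldType methods themselves.

```python
import ipaddress


def to_internal(ip: str) -> str:
    """Convert a dotted-quad IPv4 address to an 8-character lowercase hex string."""
    # IPv4Address validates the input; int() gives the 32-bit value.
    return format(int(ipaddress.IPv4Address(ip)), "08x")


def to_external(hexed: str) -> str:
    """Convert the 8-character hex form back to dotted-quad notation."""
    return str(ipaddress.IPv4Address(int(hexed, 16)))
```

Because the hex form is zero-padded to a fixed width, plain string ordering of the stored values matches numeric ordering of the addresses, which is presumably why such an encoding is attractive for indexing.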

RE: OutOfMemory GC: GC overhead limit exceeded - Why isn't WeakHashMap getting collected?

2010-12-13 Thread Jonathan Rochkind
Wow, really, it's that easy? I could swear there's a wiki page somewhere that suggests otherwise, but I believe Yonik today over a wiki page last edited wherever. But this should be well-publicized, it's a pretty easy solution that will at least give you "as up to date as your Solr can handl

Re: OutOfMemory GC: GC overhead limit exceeded - Why isn't WeakHashMap getting collected?

2010-12-13 Thread Shawn Heisey
On 12/13/2010 3:38 PM, Jonathan Rochkind wrote: But if the problem really is just GC issues and not actually too much RAM being used, try this JVM setting: -XX:+UseConcMarkSweepGC That's what I use on my shards, I've never had any visible problems with memory or garbage collection delays. I have

RE: OutOfMemory GC: GC overhead limit exceeded - Why isn't WeakHashMap getting collected?

2010-12-13 Thread Jonathan Rochkind
ConcMarkSweep probably won't help. Solr 1.4 is not very good at 'near real time' committing. There are some features post-1.4, that I don't know if they are in trunk yet or still just patches, that I have not investigated myself, but google (or JIRA search) for 'near real time'. http://wiki.a

Re: OutOfMemory GC: GC overhead limit exceeded - Why isn't WeakHashMap getting collected?

2010-12-13 Thread Yonik Seeley
On Mon, Dec 13, 2010 at 8:47 PM, John Russell wrote: > Wow, you read my mind.  We are committing very frequently.  We are trying to > get as close to realtime access to the stuff we put in as possible.  Our > current commit time is... ahem every 4 seconds. > > Is that insane? Not necessarily

Re: OutOfMemory GC: GC overhead limit exceeded - Why isn't WeakHashMap getting collected?

2010-12-13 Thread John Russell
Wow, you read my mind. We are committing very frequently. We are trying to get as close to realtime access to the stuff we put in as possible. Our current commit time is... ahem every 4 seconds. Is that insane? I'll try the ConcMarkSweep as well and see if that helps. On Mon, Dec 13, 2010

Re: [pubDate] is not converting correctly

2010-12-13 Thread Adam Estrada
My first submission ;-) https://issues.apache.org/jira/browse/SOLR-2286 Adam On Mon, Dec 13, 2010 at 5:14 PM, Lance Norskog wrote: > Create an account at > https://issues.apache.org/jira/secure/Dashboard.jspa and do 'Create > New Issue' for the

Re: JMX Cache values are wrong

2010-12-13 Thread Chris Hostetter
: I've used three different JMX clients to query ... : beans and they appear to return old cache information. : : As new searchers come online, the newer caches don't appear to be : registered perhaps? : I can see this when I query JMX for the 'description' attribute and : the regenerat

Re: Problem with loading a class

2010-12-13 Thread Chris Hostetter
: Caused by: java.lang.ClassNotFoundException: : solr.StempelPolishStemFilterFactory : : So I tried putting : contrib/analysis-extras/lucene-libs/lucene-stempel-3.1-2010-12-06_10-23-49.jar : in ./lib and ./lucene-libs - same result. The lucene-stempel-*.jar file contains the StempelPolishStemFil

Re: access to environment variables in solrconfig.xml and/or schema.xml?

2010-12-13 Thread Koji Sekiguchi
(10/12/14 4:28), Burton-West, Tom wrote: I see variables used to access java system properties in solrconfig.xml and schema.xml: http://wiki.apache.org/solr/SolrConfigXml#System_property_substitution ${solr.data.dir:} or ${solr.abortOnConfigurationError:true} Is there a way to access environme

Re: SolrEventListeners are instantiated twice

2010-12-13 Thread Chris Hostetter
: SolrEventListener. Even though I only register the listener in the query : section of solrconfig.xml, listening to the firstSearcher event, the : listener is also attached to the UpdateHandler and thus the init-method runs : twice because there are two instances of the class. To eliminate any oth

Re: Separate Lines Like Google

2010-12-13 Thread Koji Sekiguchi
(10/12/14 5:06), Alejandro Delgadillo wrote: Koji, Thank you for helping me with my questions, but I still don't get it how it's done, let's say I search for the term "love" and I get something like this: LoveLove may also refer to: Contents. 1 Film and television. As you can see the second t

Re: OutOfMemory GC: GC overhead limit exceeded - Why isn't WeakHashMap getting collected?

2010-12-13 Thread Jonathan Rochkind
Forgive me if I've said this in this thread already, but I'm beginning to think this is the main 'mysterious' cause of Solr RAM/gc issues. Are you committing very frequently? So frequently that you commit faster than it takes for warming operations on a new Solr index to complete, and you're

Re: How to get all the search results?

2010-12-13 Thread Solr User
Hi Shawn, Yes you did. I tried and did not work so I asked the same question again. Now I understood and tried directly on the Solr admin and I got all the search results. I will implement the same on the website. Thank you so much Shawn. On Mon, Dec 13, 2010 at 5:16 PM, Shawn Heisey wrote:

Re: How to get all the search results?

2010-12-13 Thread Shawn Heisey
On 12/13/2010 9:59 AM, Solr User wrote: Hi, I tried *:* using dismax and I get no results. Is there a way that I can get all the search results using dismax? For dismax, use q= or simply leave the q parameter off the URL entirely. It appears that you need to have q.alt set to *:* for this t
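As a rough sketch of the advice above, the following Python builds a dismax request that omits q and relies on q.alt=*:* to match all documents. The base URL passed in is a placeholder, and passing q.alt and defType on the URL (rather than as handler defaults in solrconfig.xml) is an assumption made for illustration.

```python
from urllib.parse import urlencode


def match_all_dismax_url(base_url: str, rows: int = 10) -> str:
    """Build a dismax query URL that returns all documents via q.alt."""
    params = {
        "defType": "dismax",  # select the dismax query parser
        "q.alt": "*:*",       # fallback query used when q is absent
        "rows": rows,
    }
    # No "q" parameter is sent, so the q.alt fallback applies.
    return f"{base_url}?{urlencode(params)}"
```

For example, match_all_dismax_url("http://localhost:8983/solr/select") produces a URL with the *:* value percent-encoded.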

Re: [pubDate] is not converting correctly

2010-12-13 Thread Lance Norskog
Create an account at https://issues.apache.org/jira/secure/Dashboard.jspa and do 'Create New Issue' for the Solr project. On Mon, Dec 13, 2010 at 2:13 PM, Lance Norskog wrote: > Please file a JIRA requesting this. > > On Mon, Dec 13, 2010 at 6:29 AM, Adam Estrada wrote: >> +1  If I knew enough a

Re: [pubDate] is not converting correctly

2010-12-13 Thread Lance Norskog
Please file a JIRA requesting this. On Mon, Dec 13, 2010 at 6:29 AM, Adam Estrada wrote: > +1  If I knew enough about how to do this in Java I would but I do not > s.What is the correct way to add or suggest enhancements to Solr > core? > > Adam > > On Sun, Dec 12, 2010 at 11:38 PM, Lance

Re: OutOfMemory GC: GC overhead limit exceeded - Why isn't WeakHashMap getting collected?

2010-12-13 Thread John Russell
Thanks for the response. The date types are defined in our schema file like this Which appears to be what you mentioned. Then we use them in fields like this So I think we have the right datatypes for the dates. Most of the other ones are strings. As for the doc we a

Re: Separate Lines Like Google

2010-12-13 Thread Alejandro Delgadillo
Koji, Thank you for helping me with my questions, but I still don't get it how it's done, let's say I search for the term "love" and I get something like this: LoveLove may also refer to: Contents. 1 Film and television. As you can see the second term is from the same document but it is from a d

Re: Is it possible to assign default value for a particular record when using multivalued field type?

2010-12-13 Thread bbarani
Hi, Is there a template transformer which can act on each and every record of multivalued attribute? The issue is that some of the records might have null data in source and I want those data to be replaced with some default value. Also if the value is blank I could just see in the XML. Any id

Re: Strange replication problem

2010-12-13 Thread Xin Li
did you double check http://machine:port/solr/website/admin/replication/ to see the "master" is indeed a master? On Mon, Dec 13, 2010 at 1:01 PM, Ralf Mattes wrote: > On Mon, 13 Dec 2010 12:31:27 -0500, Xin Li wrote: > >> " indexversion returned by the indexversion command is 0 while the same >>

access to environment variables in solrconfig.xml and/or schema.xml?

2010-12-13 Thread Burton-West, Tom
I see variables used to access java system properties in solrconfig.xml and schema.xml: http://wiki.apache.org/solr/SolrConfigXml#System_property_substitution ${solr.data.dir:} or ${solr.abortOnConfigurationError:true} Is there a way to access environment variables or does everything have to be
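The ${name:default} syntax quoted above can be sketched in Python as follows. This is only an illustration of how such placeholders resolve, not Solr's implementation: the props dictionary stands in for Java system properties, and raising an error when neither a value nor a default exists is an assumption of this sketch.

```python
import re

# Matches ${name} or ${name:default}; the default may be empty.
_PLACEHOLDER = re.compile(r"\$\{([^:}]+)(?::([^}]*))?\}")


def substitute(text: str, props: dict) -> str:
    """Replace ${name:default} placeholders using props, falling back to the default."""

    def repl(match):
        name, default = match.group(1), match.group(2)
        if name in props:
            return props[name]
        if default is not None:
            return default
        raise KeyError(f"no value or default for {name}")

    return _PLACEHOLDER.sub(repl, text)
```

With an empty props dictionary, "${solr.data.dir:}" resolves to an empty string and "${solr.abortOnConfigurationError:true}" resolves to "true".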

Re: Geospatial search w/polygon bounding box?

2010-12-13 Thread Erick Erickson
It really doesn't look like it. As part of some other work I'm doing I ran across this: https://issues.apache.org/jira/browse/SOLR-2155 which seems to speak (on cursory glance) at the polygon-bounding-box issue. But as you see it hasn't been committe

Re: Indexing pdf files - question.

2010-12-13 Thread Wodek Siebor
The sample /docs/tutorial.pdf does not require OCR. -- View this message in context: http://lucene.472066.n3.nabble.com/Indexing-pdf-files-question-tp2079505p2080307.html Sent from the Solr - User mailing list archive at Nabble.com.

Re: Strange replication problem

2010-12-13 Thread Ralf Mattes
On Mon, 13 Dec 2010 12:31:27 -0500, Xin Li wrote: > " indexversion returned by the indexversion command is 0 while the same > information from the details command is 292192351652 ..." > > This only happens to a Slave machine. For a Master machine, indexversion > returns the same number as details

Re: Strange replication problem

2010-12-13 Thread Xin Li
" indexversion returned by the indexversion command is 0 while the same information from the details command is 292192351652 ..." This only happens to a Slave machine. For a Master machine, indexversion returns the same number as details command. On Mon, Dec 13, 2010 at 11:06 AM, Ralf Mattes

Re: How to get all the search results?

2010-12-13 Thread Erick Erickson
Can we see the results with &debugQuery=on? As well as the entire http string you use? Also, are you sure you've put documents in your index and committed afterwards? Best Erick On Mon, Dec 13, 2010 at 11:59 AM, Solr User wrote: > Hi, > > I tried *:* using dismax and I get no results. > > Is t

Re: How to get all the search results?

2010-12-13 Thread Solr User
Hi, I tried *:* using dismax and I get no results. Is there a way that I can get all the search results using dismax? Thanks, Murali On Mon, Dec 6, 2010 at 11:17 AM, Savvas-Andreas Moysidis < savvas.andreas.moysi...@googlemail.com> wrote: > Hello, > > shouldn't that query syntax be *:* ? > > R

Re: Indexing pdf files - question.

2010-12-13 Thread Adam Estrada
Hi, I use the following command to post PDF files. $ curl "http://localhost:8983/solr/update/extract?stream.file=C:\temp\document.docx&stream.contentType=application/msword&literal.id=esc.doc&commit=true" $ curl "http://localhost:8983/solr/update/extract?stream.file=C:\temp\features.pdf&stream

Re: full text search in multiple fields

2010-12-13 Thread PeterKerk
whoops :) It was directed at iorixxx, in the first post before me

Strange replication problem

2010-12-13 Thread Ralf Mattes
Hello list, I'm trying to set up a replicating solr system (one master, one slave) here. Everything _looks_ o.k. but replication fails. A little debugging shows the following: r...@slave:~# curl 'http://master:8180/solr/website/replication?command=indexversion&wt=json' && echo '' {"responseH

Indexing pdf files - question.

2010-12-13 Thread Siebor, Wlodek [USA]
Hi, Can somebody please send me a command for indexing, with ExtractingRequestHandler, a sample PDF file available in the /docs directory. I have lucidworks solr installed on linux, with standard schema.xml and solrconfig.xml files (unchanged). I want to pass as the unique id the name of the file.

Re: Taxonomy and Faceting

2010-12-13 Thread Tommaso Teofili
With the SOLR-2129 patch you enable an Apache UIMA [1] pipeline to enrich documents being indexed. The base pipeline provided with the patch uses the following blocks (see OverridingParamsExtServicesAE.xml): AggregateSentenceAE OpenCalaisAnnotator TextKeywordExtractionAED

Re: Very high load after replicating

2010-12-13 Thread Mark
Markus, My configuration is as follows... ... false 2 ... false 64 10 false true No cache warming queries and our machines have 8g of memory in them with about 5120m of ram dedicated to Solr. When our index is around 10-11g in size everything runs smoothly. At around 20g+ it just fall

Re: Newbie: Indexing unrelated MySQL tables

2010-12-13 Thread Erick Erickson
Warning: I haven't tried this, but maybe it's relevant. See: http://wiki.apache.org/solr/DataImportHandler particularly the "multiple datasources" section. I'm thinking here that you have to define a different data source for each separate table you wa

Re: How to implement and a system based on IMAP auth

2010-12-13 Thread Peter Sturge
imap has no intrinsic functionality for logging in as a user then 'impersonating' someone else. What you can do is setup your email server so that your administrator account or similar has access to other users via shared folders (this is supported in imap2 servers - e.g. Exchange). This is done al

Re: Concurrent DIH calls

2010-12-13 Thread Stefan Matheis
I don't know if this is helpful .. but there is http://wiki.apache.org/solr/DataImportHandler#EventListeners which would trigger on 'onImportEnd'

Re: Taxonomy and Faceting

2010-12-13 Thread webdev1977
Based on this: VALID_ALCHEMYAPI_KEY VALID_ALCHEMYAPI_KEY VALID_ALCHEMYAPI_KEY VALID_ALCHEMYAPI_KEY VALID_ALCHEMYAPI_KEY VALID_OPENCALAIS_KEY ...this can't be used unless you use some sort of processing engine? I am playing around with some other open so

Re: Solr replication, HAproxy and data management

2010-12-13 Thread Paolo Castagna
Paolo Castagna wrote: Hi, we are using Solr v1.4.x with multi-cores and a master/slaves configuration. We also use HAProxy [1] to load balance search requests amongst slaves. Finally, we use MapReduce to create new Solr indexes. I'd like to share with you what I am doing when I need to: 1. a

Re: Concurrent DIH calls

2010-12-13 Thread Juan Manuel Alvarez
Thanks for the answer Barani! I was doing the same thing (queuing requests and querying solr status), but I was hoping some flag/configuration would do the trick. I will continue with that approach then! =o) Thanks! Juan M. On Sat, Dec 11, 2010 at 3:50 AM, bbarani wrote: > > Hi, > > As far as I

Re: Separate Lines Like Google

2010-12-13 Thread Koji Sekiguchi
(10/12/13 23:00), Alejandro Delgadillo wrote: Hi everybody, I'm having some troubles trying to figure out how to separate lines in a paragraph from a search result, I'm indexing PDFs but when I search the highlight terms I can not know when the first line ends and the next one begins, Is ther

Re: How to implement and a system based on IMAP auth

2010-12-13 Thread Erick Erickson
I don't see where the MailEntityProcessor really has anything built into it for indexing somebody else's mail, so you're probably going to need to go down the SolrJ route. SolrJ is actually quite easy to use, there are only a very few classes you'll need, so I'd go there The "Usage" section here w

Re: Newbie: Indexing unrelated MySQL tables

2010-12-13 Thread Stefan Matheis
And yes, sorry for the short answer .. http://wiki.apache.org/solr/DataImportHandler#TemplateTransformer would be good for that :)

Re: Newbie: Indexing unrelated MySQL tables

2010-12-13 Thread Stefan Matheis
To avoid overwrites in your case, use a combined id - f.e. $table_$id which results in user_1, job_1 and so on ..
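A minimal Python sketch of the combined-id idea above. The table names and row ids are sample values; in DataImportHandler the same effect would come from a template rather than from code like this.

```python
def combined_id(table: str, row_id: int) -> str:
    """Prefix the row id with its table name so ids are unique across tables."""
    return f"{table}_{row_id}"


# Rows from different tables that share the same numeric id.
rows = [("user", 1), ("job", 1), ("artwork", 1)]
ids = [combined_id(table, row_id) for table, row_id in rows]
```

Here ids holds three distinct values even though every source row has id 1, so documents from one table no longer overwrite those from another.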

How to implement and a system based on IMAP auth

2010-12-13 Thread milomalo2...@libero.it
Hi Guys, I am new to the Solr world and I was trying to figure out how to implement an application which would be able to connect to our business mail server through an IMAP connection (1000 users) and to index the information related to e-mail contents. I tried to use DH- import with the preconfigured i

Newbie: Indexing unrelated MySQL tables

2010-12-13 Thread Jaakko Rajaniemi
Hello, Alright, let's describe the situation. I have a website and the website has a database with at least three tables. - "users" table - (id, firstname, lastname) - "artwork" table - (id, user, name, description) - "jobs" table - (id, company, position, location, description) I want to i

Re: [pubDate] is not converting correctly

2010-12-13 Thread Adam Estrada
+1 If I knew enough about how to do this in Java I would but I do not s.What is the correct way to add or suggest enhancements to Solr core? Adam On Sun, Dec 12, 2010 at 11:38 PM, Lance Norskog wrote: > Nice find! This is Apache 2.0, copyright SUN. > > O Great Apache Elders: Is it kos

Re: Solr on Google App Engine

2010-12-13 Thread Praveen Agrawal
Thanks Dave.. On Mon, Dec 13, 2010 at 4:06 PM, Dave Searle wrote: > EC2 installations are just windows/linux machines, so this would just be a > normal setup. I have a solr server running on a small instance with 1.7gb > ram mounted to an EBS volume of 50gb, seems to run fine. Costs about $115 a

Separate Lines Like Google

2010-12-13 Thread Alejandro Delgadillo
Hi everybody, I'm having some trouble trying to figure out how to separate lines in a paragraph from a search result. I'm indexing PDFs, but when I search the highlighted terms I cannot tell when the first line ends and the next one begins. Is there a way to put a [...] like Google or a Paragrap

Re: Rebuild Spellchecker based on cron expression

2010-12-13 Thread Martin Grotzke
Hi Erick, thanx for your advice! I'll check the options with our client and see how we'll proceed. My spare time right now is already full with other open source stuff, otherwise it'd be fun contributing s.th. to solr! :-) Cheers, Martin On Mon, Dec 13, 2010 at 2:46 PM, Erick Erickson wrote: >

Re: Rebuild Spellchecker based on cron expression

2010-12-13 Thread Martin Grotzke
On Mon, Dec 13, 2010 at 4:01 AM, Erick Erickson wrote: > I'm shooting in the dark here, but according to this: > http://wiki.apache.org/solr/SolrReplication > after the slave pulls the index > down, it issues a commit. So if your > slave is configured t

Re: gotchas, issues with document deletions/replacements/edits

2010-12-13 Thread Erick Erickson
You're right, updates are really deletes/adds. Deleted documents are NOT found in future queries, so that's not a problem. However, the #terms# in a deleted document still affect the relevance calculations. But in most cases you'll never notice this. By that I mean that the term frequency counts a

Re: Rebuild Spellchecker based on cron expression

2010-12-13 Thread Erick Erickson
*** Just wondering what's the reason that this patch receives that little interest. Anything wrong with it? *** Nobody got behind it and pushed I suspect. And since it's been a long time since it was updated, there's no guarantee that it would apply cleanly any more. Or that it will perform as int

Re: Highlighting for non-stored fields

2010-12-13 Thread Alessandro Benedetti
We developed a custom Highlighter to solve this issue. We added a "url" field in the solr schema doc for our domain and when highlighting is called, we access the file, extract the information and send them to the custom highlighter. If you still need some help, I can provide you, our solution in

RE: Solr on Google App Engine

2010-12-13 Thread Dave Searle
EC2 installations are just windows/linux machines, so this would just be a normal setup. I have a solr server running on a small instance with 1.7gb ram mounted to an EBS volume of 50gb, seems to run fine. Costs about $115 a month -Original Message- From: Praveen Agrawal [mailto:pkal...@

When all cores are ready to be used?

2010-12-13 Thread De Stefano, Giovanni, VF-Group
Hello all, I have a component that uses SolrServer(s) in a multicore environment. I would like to cache these solr servers in a map (core name, server). When is the right time to create this map? I tried with a custom ContextListener but it seems that the cores are not ready yet: I have to

Re: Solr on Google App Engine

2010-12-13 Thread Praveen Agrawal
Thanks a lot, Mauricio. Does anyone have any experience on Amazon EC2, or can point me to existing discussions? Appreciate your help. Thanks. Praveen On Thu, Dec 9, 2010 at 6:20 PM, Mauricio Scheffer < mauricioschef...@gmail.com> wrote: > Solr on GAE has been discussed a couple of times, see the

gotchas, issues with document deletions/replacements/edits

2010-12-13 Thread Dennis Gearon
I am about to set up a live edit of database contents that get indexed in a Solr Instance. I seem to remember that edits in the index are actually deletes and replacements? The deleted items don't really disappear, right? What about queries do they affect? Counts? Return results? ? Dennis

Re: Rebuild Spellchecker based on cron expression

2010-12-13 Thread Peter Karich
Building on optimize is not possible as index optimization is done on the master and the slaves don't even run an optimize but only fetch the optimized index. isn't the spellcheck index replicated to the slaves too? -- http://jetwick.com open twitter search