Severe errors in solr configuration

2009-02-04 Thread David Trainor
Hello, I am running Ubuntu 8.10, with Tomcat 6.0.18 installed via the package manager, and I am trying to get Solr 1.3.0 up and running, with no success. I believe I am having the same problem described here: http://www.nabble.com/Severe-errors-in-solr-configuration-td21829562.html When I attemp

Re: Latest on DataImportHandler and Tika?

2009-02-04 Thread Noble Paul നോബിള്‍ नोब्ळ्
We have not taken up anything yet. The idea is to create another contrib that will contain the DIH extensions with external dependencies, such as SOLR-934. TikaEntityProcessor is something we wish to do, but our limited bandwidth has been the problem. On Thu, Feb 5, 2009 at 5:15 AM, Chris Harris wro

Re: instanceDir value is incorrect in multicore environment

2009-02-04 Thread Mark Ferguson
I looked at the core status page and it looks like the problem isn't actually the instanceDir property, but rather dataDir. It's not being appended to instanceDir so its path is relative to cwd. I'm using a patched version of Solr with some of my own custom changes relating to dataDir, so this is
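
A minimal Python sketch of the path resolution these two threads expect, assuming POSIX paths: a relative instanceDir is resolved against solr.home, and a relative dataDir is resolved against the core's instanceDir rather than the process's working directory. The function name, the example paths and the default `data` directory name are illustrative assumptions, not Solr's own CoreContainer code.

```python
import posixpath


def resolve_core_dirs(solr_home, instance_dir, data_dir="data"):
    """Return (instanceDir, dataDir) as absolute paths.

    A relative instanceDir is resolved against solr.home, and a relative
    dataDir against the resolved instanceDir, never against the cwd.
    """
    if not posixpath.isabs(instance_dir):
        instance_dir = posixpath.join(solr_home, instance_dir)
    if not posixpath.isabs(data_dir):
        data_dir = posixpath.join(instance_dir, data_dir)
    return posixpath.normpath(instance_dir), posixpath.normpath(data_dir)


print(resolve_core_dirs("/opt/solr", "core0"))
# ('/opt/solr/core0', '/opt/solr/core0/data')
print(resolve_core_dirs("/opt/solr", "core0", "/var/solr/core0-data"))
# ('/opt/solr/core0', '/var/solr/core0-data')
```

An absolute dataDir is left untouched, so only relative values depend on where instanceDir points.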

instanceDir value is incorrect in multicore environment

2009-02-04 Thread Mark Ferguson
Hello, I have a problem with setting the instanceDir property for the cores in solr.xml. When I set the value to be relative, it sets it as relative to the location from which I started the application, instead of relative to the solr.home property. I am using Tomcat and I am creating a context f

Query on Level of Access to lucene in Solr

2009-02-04 Thread Nick
Hello there, I'm a Solr newbie, but I've used Lucene for some complex IR projects before. Can someone please help me understand the extent to which Solr allows access to Lucene? To elaborate, say I'm considering the use of Solr for all its wonderful properties like scaling,

Re: Highlighting Oddities

2009-02-04 Thread ashokc
This problem went away when I updated to use the latest nightly release (2009-02-04) - ashok ashokc wrote: > > I have seen some of these oddities that Chris is referring to. In my case, > terms that are NOT in the query get highlighted. For example searching for > 'Intel' highlights 'Microsot C

Re: Spell checking not returning "full" terms

2009-02-04 Thread Rupert Fiasco
Awesome! After reading up on the links you sent me I got it all working. Thanks! FYI - I did previously come across one of the links you sent over: http://wiki.apache.org/solr/SpellCheckerRequestHandler But what threw me off is that when I started reading about that yesterday, in the first parag

Maximum Term Frequency and Minimum Document Length

2009-02-04 Thread Jonah Schwartz
We want to configure Solr so that fields are indexed with a maximum term frequency and a minimum document length. If a term appears more than N times in a field, it will be considered to have appeared only N times. If a document's length is under M terms, it will be considered to be exactly M terms. We h
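
The two clamps can be stated precisely in a Python sketch, assuming Lucene's classic DefaultSimilarity formulas (tf = sqrt(freq), lengthNorm = 1/sqrt(numTerms)). In Solr the equivalent would presumably be a custom Similarity subclass registered in schema.xml; the function names here are hypothetical.

```python
import math


def capped_tf(freq, max_freq):
    """DefaultSimilarity-style tf, sqrt(freq), with freq clamped to at most N."""
    return math.sqrt(min(freq, max_freq))


def floored_length_norm(num_terms, min_terms):
    """DefaultSimilarity-style lengthNorm, 1/sqrt(numTerms), with the
    field length clamped to at least M terms."""
    return 1.0 / math.sqrt(max(num_terms, min_terms))


N, M = 10, 50
print(capped_tf(100, N))            # same value as a field with only 10 occurrences
print(floored_length_norm(3, M))    # same value as a 50-term field
print(floored_length_norm(100, M))  # 0.1, longer fields are unaffected
```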

Re: Queued Requests during GC

2009-02-04 Thread Walter Underwood
On 2/4/09 3:44 PM, "Chris Hostetter" wrote: > I don't think the Query class implementations themselves changed in > any way that would have made them larger -- but if you switched from the > standard parser to dismax parser, or started using lots of boost > queries, or started using prefix or wil

Latest on DataImportHandler and Tika?

2009-02-04 Thread Chris Harris
Back in November, Shalin and Grant were discussing integrating DataImportHandler and Tika. Shalin's estimation about the best way to do this was as follows: ** I think the best way would be a TikaEntityProcessor which knows how to handle documents. I guess a typical use-case would be FileListEnti

Re: Queued Requests during GC

2009-02-04 Thread Chris Hostetter
: >> Aha! I bet that the full Query object became a lot more complicated : >> between Solr 1.1 and 1.3. That would explain why we did 4X as much GC : >> after the upgrade. I don't think the Query class implementations themselves changed in any way that would have made them larger -- but if you s

Re: Queued Requests during GC

2009-02-04 Thread Walter Underwood
On 2/4/09 3:17 PM, "Mark Miller" wrote: > Walter Underwood wrote: >> Aha! I bet that the full Query object became a lot more complicated >> between Solr 1.1 and 1.3. That would explain why we did 4X as much GC >> after the upgrade. >> >> Items evicted from cache are tenured, so they contribute t

Re: Highlighting Oddities

2009-02-04 Thread ashokc
I have seen some of these oddities that Chris is referring to. In my case, terms that are NOT in the query get highlighted. For example searching for 'Intel' highlights 'Microsot Corp' as well. I do not have them as synonyms either. Do these filter factories add some extra intelligence to the inde

Re: Queued Requests during GC

2009-02-04 Thread Mark Miller
Walter Underwood wrote: Aha! I bet that the full Query object became a lot more complicated between Solr 1.1 and 1.3. That would explain why we did 4X as much GC after the upgrade. Items evicted from cache are tenured, so they contribute to the full GC. With an HTTP cache in front, there is hard

Re: Queued Requests during GC

2009-02-04 Thread Walter Underwood
Aha! I bet that the full Query object became a lot more complicated between Solr 1.1 and 1.3. That would explain why we did 4X as much GC after the upgrade. Items evicted from cache are tenured, so they contribute to the full GC. With an HTTP cache in front, there is hardly anything left to be cac

Re: Queued Requests during GC

2009-02-04 Thread Yonik Seeley
On Wed, Feb 4, 2009 at 5:52 PM, Walter Underwood wrote: > I have not had the time to pin it down, but I suspect that items > evicted from the query result cache contain a lot of objects. > Are the keys a full parse tree? That could be big. Yes, keys are full Query objects. It would be non-trivial

Re: Queued Requests during GC

2009-02-04 Thread Walter Underwood
On 2/4/09 2:48 PM, "Mark Miller" wrote: > If there are spots in Lucene/Solr that are producing so much garbage > that we can't keep up, perhaps work can be done to address this upon > pinpointing the issues. > > - Mark I have not had the time to pin it down, but I suspect that items evicted fro

Re: Queued Requests during GC

2009-02-04 Thread Mark Miller
Walter Underwood wrote: Also, only use as much heap as you really need. A larger heap means longer GCs. Right. Ideally you want to figure out how to get longer pauses down. There is a lot of fiddling that you can do to improve gc times. On a multiprocessor machine you can parallelize collec

Re: Queued Requests during GC

2009-02-04 Thread Walter Underwood
This is when a load balancer helps. The requests sent around the time that the GC starts will be stuck on that server, but later ones can be sent to other servers. We use a "least connections" load balancing strategy. Each connection represents a request in progress, so this is the same as equaliz
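
A toy Python model of the "least connections" strategy Walter describes (class and backend names are hypothetical). A server paused in GC never releases its in-flight requests, so its count stays high and new requests are routed to the other servers.

```python
class LeastConnectionsBalancer:
    """Route each request to the backend with the fewest in-flight requests."""

    def __init__(self, backends):
        # In-flight request count per backend, in insertion order.
        self.active = {backend: 0 for backend in backends}

    def acquire(self):
        # min() returns the first backend with the lowest count, so ties
        # go to the backend listed first.
        backend = min(self.active, key=self.active.get)
        self.active[backend] += 1
        return backend

    def release(self, backend):
        # Called when a request completes; a server stuck in a GC pause
        # does not get here, so its count stays high.
        self.active[backend] -= 1


lb = LeastConnectionsBalancer(["solr1", "solr2"])
first = lb.acquire()   # "solr1"
second = lb.acquire()  # "solr2"
lb.release(second)     # solr2 finishes; solr1 is still paused in GC
print(lb.acquire())    # solr2
```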

Re: Custom Sorting Algorithm

2009-02-04 Thread Yonik Seeley
On Wed, Feb 4, 2009 at 4:45 PM, wojtekpia wrote: > Ok, so maybe a better question is: should I bother trying to change the > "sorting" algorithm? I'm concerned that with large data sets, sorting > becomes a severe bottleneck (this is an assumption, I haven't profiled > anything to verify). No...

Re: Queued Requests during GC

2009-02-04 Thread Yonik Seeley
On Wed, Feb 4, 2009 at 3:12 PM, Otis Gospodnetic wrote: > I'd be curious if you could reproduce this in Jetty All application threads are blocked... it's going to be the same in Jetty or Tomcat or any other container that's pure Java. There is an OS level listening queue that has a certain d

Re: Custom Sorting Algorithm

2009-02-04 Thread wojtekpia
Ok, so maybe a better question is: should I bother trying to change the "sorting" algorithm? I'm concerned that with large data sets, sorting becomes a severe bottleneck (this is an assumption, I haven't profiled anything to verify). Does it become a severe bottleneck? Do you know if alternate sor

Re: Total count of facets

2009-02-04 Thread Yonik Seeley
On Wed, Feb 4, 2009 at 3:47 PM, Erik Hatcher wrote: > What about using the luke request handler to get the distinct values count? That wouldn't restrict results by the base query and filters. -Yonik

Re: Custom Sorting Algorithm

2009-02-04 Thread Mark Miller
It would not be simple to use a new algorithm. The current implementation takes place at the Lucene level and uses a priority queue. When you ask for the top n results, every matching document is offered to a priority queue of size n, which keeps only the n best. The ordering in the priority queue is the sort. The o
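
The bounded priority queue Mark describes can be sketched in Python with `heapq`; this illustrates the idea, not Lucene's actual collector code. Each hit is compared only against the weakest of the n hits kept so far, so the cost is O(N log n) for N matches rather than a full O(N log N) sort, which suggests why swapping in quicksort or mergesort would not obviously help for top-n queries.

```python
import heapq


def top_n(scored_docs, n):
    """Return the n highest-scoring (doc_id, score) pairs, best first."""
    if n <= 0:
        return []
    heap = []  # min-heap of (score, doc_id); heap[0] is the weakest kept hit
    for doc_id, score in scored_docs:
        if len(heap) < n:
            heapq.heappush(heap, (score, doc_id))
        elif score > heap[0][0]:
            # Evict the weakest kept hit and insert the new one in O(log n).
            heapq.heapreplace(heap, (score, doc_id))
    # Only n items remain, so this final sort is cheap.
    return [(doc_id, score) for score, doc_id in sorted(heap, reverse=True)]


hits = [("d1", 0.5), ("d2", 2.0), ("d3", 1.0), ("d4", 0.1)]
print(top_n(hits, 2))  # [('d2', 2.0), ('d3', 1.0)]
```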

Re: Total count of facets

2009-02-04 Thread Erik Hatcher
What about using the luke request handler to get the distinct values count? Although it is pretty seriously heavy on a big index, so probably not quite workable in your case. Erik On Feb 4, 2009, at 12:54 PM, Yonik Seeley wrote: On Wed, Feb 4, 2009 at 5:42 AM, Bruno Aranda wrote

Re: Custom Sorting Algorithm

2009-02-04 Thread wojtekpia
That's not quite what I meant. I'm not looking for a custom comparator, I'm looking for a custom sorting algorithm. Is there a way to use quick sort or merge sort or... rather than the current algorithm? Also, what is the current algorithm? Otis Gospodnetic wrote: > > > You can use one of the

Re: Queued Requests during GC

2009-02-04 Thread Otis Gospodnetic
Wojtek, I'm not familiar with the details of Tomcat configuration, but this definitely sounds like a container issue, closely related to the JVM. Doing a thread dump for the Java process (the JVM your Tomcat runs in) while the GC is running will show you which threads are blocked and in turn th

Re: Custom Sorting Algorithm

2009-02-04 Thread Otis Gospodnetic
Hi, You can use one of the existing function queries (if they fit your need) or write a custom function query to reorder the results of a query. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message > From: wojtekpia > To: solr-user@lucene.apache.org >

Re: Differences in output of spell checkers

2009-02-04 Thread Grant Ingersoll
On Feb 4, 2009, at 11:02 AM, Marcus Stratmann wrote: Hello, I'm trying to learn how to use the spell checkers of solr (1.3). I found out that FileBasedSpellChecker and IndexBasedSpellChecker produce different outputs. IndexBasedSpellChecker says

Re: exceeded limit of maxWarmingSearchers

2009-02-04 Thread Otis Gospodnetic
Jon, If you can, don't commit on every update; that should help or fully solve your problem. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message > From: Jon Drukman > To: solr-user@lucene.apache.org > Sent: Wednesday, February 4, 2009 1:09:00 PM >
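
A minimal sketch of batching commits in Python, assuming hypothetical `send` and `commit` callables that stand in for whatever client posts documents to Solr's update handler. Each commit opens a new searcher that has to warm, so fewer commits means fewer overlapping warming searchers.

```python
class BatchingIndexer:
    """Buffer documents and commit once per batch instead of once per update.

    `send` and `commit` are hypothetical callables, not a real client API.
    """

    def __init__(self, send, commit, batch_size=1000):
        self.send = send
        self.commit = commit
        self.batch_size = batch_size
        self.pending = []

    def add(self, doc):
        self.pending.append(doc)
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self):
        if self.pending:
            self.send(self.pending)
            self.commit()  # one commit per batch, so one new searcher to warm
            self.pending = []


sent, commits = [], []
indexer = BatchingIndexer(sent.append, lambda: commits.append("commit"), batch_size=3)
for doc_id in range(7):
    indexer.add({"id": doc_id})
indexer.flush()  # commit the final partial batch
print(len(commits))  # 3 commits instead of 7
```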

Re: Queued Requests during GC

2009-02-04 Thread Sridhar Basam
That is the expected behaviour: all application threads are paused during GC (the CMS collector is an exception; it has smaller pauses, and the application threads keep running most of the time). The number of connections that could end up being queued would depend on your acceptCount setting in th
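
A back-of-the-envelope Python model of what Sridhar describes, under simplifying assumptions: a constant arrival rate, and an OS listen backlog equal to Tomcat's acceptCount. What happens to the overflow (refused connections, or dropped SYNs that the client retries) depends on the OS, so the sketch only counts it. The function name and numbers are hypothetical.

```python
def requests_during_pause(arrivals_per_second, pause_seconds, accept_count):
    """Split the requests that arrive during a stop-the-world GC pause into
    those that fit in the listen backlog and those that overflow it."""
    arrivals = int(arrivals_per_second * pause_seconds)
    queued = min(arrivals, accept_count)
    overflow = arrivals - queued
    return queued, overflow


# e.g. 50 requests/s during a 4 s full GC with acceptCount=100
print(requests_during_pause(50, 4, 100))  # (100, 100)
```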

Re: Spell checking not returning "full" terms

2009-02-04 Thread Grant Ingersoll
I'm guessing the field you are checking against is being stemmed. The field you spell check against should have minimal analysis done to it, i.e. tokenization and probably downcasing. See http://wiki.apache.org/solr/SpellCheckComponent and http://wiki.apache.org/solr/SpellCheckerRequestHand
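
To illustrate Grant's guess, here is a Python sketch contrasting the minimal analysis he recommends (tokenize and lowercase) with stemming. `toy_stem` is a hypothetical crude suffix stripper, not the Porter stemmer, but it shows how a stemmed source field can make the spell checker suggest truncated terms such as "diabet" instead of "diabetes".

```python
import re


def minimal_analyze(text):
    """Split on non-word characters and lowercase, nothing more."""
    return [token.lower() for token in re.findall(r"\w+", text)]


def toy_stem(token):
    """Hypothetical crude suffix stripper, NOT the Porter stemmer."""
    for suffix in ("es", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token


terms = minimal_analyze("Diabetes Mellitus")
print(terms)                         # ['diabetes', 'mellitus']
print([toy_stem(t) for t in terms])  # ['diabet', 'mellitu']
```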

Queued Requests during GC

2009-02-04 Thread wojtekpia
During full garbage collection, Solr doesn't acknowledge incoming requests. Any requests that were received during the GC are timestamped the moment GC finishes (at least that's what my logs show). Is there a limit to how many requests can queue up during a full GC? This doesn't seem like a Solr s

Spell checking not returning "full" terms

2009-02-04 Thread Rupert Fiasco
We are using Solr 1.3 and trying to get spell checking functionality. FYI, our index contains a lot of medical terms (which might or might not make a difference as they are not English-y words, if that makes any sense?) If I specify a spellcheck query of "spellcheck.q=diabtes" I get suggestions

Custom Sorting Algorithm

2009-02-04 Thread wojtekpia
Is there an easy way to choose/create an alternate sorting algorithm? I'm frequently dealing with large result sets (a few million results) and I might be able to benefit from domain knowledge in my sort. -- View this message in context: http://www.nabble.com/Custom-Sorting-Algorithm-tp21837721p21837721.ht

Multiple uniqueKey problems

2009-02-04 Thread Bruno Mateus
Hello, I'm facing some problems in generating a compound unique key. I'm indexing some database tables that are not related to each other. In my data-config.xml I have the following Column "alias" and "id" don't exist on

Re: DIH using values from solrconfig.xml inside data-config.xml

2009-02-04 Thread Noble Paul നോബിള്‍ नोब्ळ्
The implementation assumed that most users have XML with a fixed schema. In that case, giving an absolute path is not hard. This helps us deal with a large subset of use cases rather easily. We have not added all the features that are possible with a streaming parser. It is wiser to piggyback
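
Why absolute paths suit a streaming parser can be sketched with Python's `xml.etree.ElementTree.iterparse`: the reader only has to compare the stack of currently open element names with the fixed path. This is an illustration, not DIH's XPathRecordReader, and the function name is hypothetical.

```python
import io
import xml.etree.ElementTree as ET


def stream_absolute_path(xml_bytes, path):
    """Yield the text of every element whose absolute path equals `path`,
    e.g. "/doc/para", without building a full DOM up front."""
    stack = []  # names of the currently open elements
    for event, elem in ET.iterparse(io.BytesIO(xml_bytes), events=("start", "end")):
        if event == "start":
            stack.append(elem.tag)
        else:
            if "/" + "/".join(stack) == path:
                yield (elem.text or "").strip()
            stack.pop()
            elem.clear()  # drop the finished subtree to limit memory use


doc = b"<doc><para>one</para><section><para>two</para></section></doc>"
print(list(stream_absolute_path(doc, "/doc/para")))          # ['one']
print(list(stream_absolute_path(doc, "/doc/section/para")))  # ['two']
```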

Re: exceeded limit of maxWarmingSearchers

2009-02-04 Thread Jon Drukman
Otis Gospodnetic wrote: That should be fine (but apparently isn't), as long as you don't have some very slow machine or if your caches are large and configured to copy a lot of data on commit. This is becoming more and more problematic. We have periods where we get 10 of these exceptio

Re: Severe errors in solr configuration

2009-02-04 Thread Olivier Dobberkau
On 04.02.2009 at 15:50, Anto Binish Kaspar wrote: Yes, I removed it, but I still have the same issue. Any idea what may be the cause of this issue? Have you solved your problem? Olivier -- Olivier Dobberkau The more TYPO3, the more d.k.d d.k.d Internet Service GmbH Kaiserstr. 79 D 60329 Frankfurt/Main Regi

Re: Total count of facets

2009-02-04 Thread Yonik Seeley
On Wed, Feb 4, 2009 at 5:42 AM, Bruno Aranda wrote: > Unfortunately, after some tests listing all the distinct surnames or other > fields is too slow and too memory consuming with our current infrastructure. > Could someone confirm that if I wanted to add this functionality (just count > the total

Differences in output of spell checkers

2009-02-04 Thread Marcus Stratmann
Hello, I'm trying to learn how to use the spell checkers of solr (1.3). I found out that FileBasedSpellChecker and IndexBasedSpellChecker produce different outputs. IndexBasedSpellChecker says 1 0

Highlighting on Prefix-Search Bug/Workaround (Re: query with stemming, prefix and fuzzy?)

2009-02-04 Thread Gert Brinkmann
Mark Miller wrote: >> Currently I think about dropping the stemming and only use >> prefix-search. But as highlighting does not work with a prefix "house*" >> this is a problem for me. The hint to use "house?*" instead does not >> work here. >> > Thats because wildcard queries are also not high

RE: Severe errors in solr configuration

2009-02-04 Thread Anto Binish Kaspar
Yes, I removed it, but I still have the same issue. Any idea what may be the cause of this issue? - Anto Binish Kaspar -----Original Message----- From: Shalin Shekhar Mangar [mailto:shalinman...@gmail.com] Sent: Wednesday, February 04, 2009 7:42 PM To: solr-user@lucene.apache.org Subject: Re: Severe error

Re: Severe errors in solr configuration

2009-02-04 Thread Shalin Shekhar Mangar
According to http://wiki.apache.org/solr/SolrTomcat, the JNDI context should be: Notice that in the snippet you posted, the name was "/solr/home" (with an extra leading '/'); it should be "solr/home". http://wiki.apache.org/solr/SolrTomcat#head-7036378fa48b79c0797cc8230a8aa0965412fb2e On Wed, Feb 4, 2009 at 6:59 PM, An

Re: Boost function

2009-02-04 Thread Erick Erickson
From Hossman: Search-time boosts, as the name implies, factor into the scoring of documents, increasing the score assigned to documents that match on the boosted term, thus tending to score the entire document higher. So these documents tend to be returned earlier in the results when sor
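
A toy Python model of the effect Hossman describes, not Lucene's full scoring formula: each query clause carries a boost, as in the query-parser syntax `solr^2 lucene`, and a document earns the boost of every clause it matches, so matches on the boosted term push documents up the ranking. The documents and terms are hypothetical.

```python
def boosted_score(doc_terms, clauses):
    """Toy scoring: sum the boosts of the query clauses a document matches."""
    return sum(boost for term, boost in clauses if term in doc_terms)


# Roughly the query  solr^2 lucene : a "solr" match counts twice as much.
clauses = [("solr", 2.0), ("lucene", 1.0)]
docs = {"a": {"lucene", "java"}, "b": {"solr", "java"}, "c": {"solr", "lucene"}}
ranking = sorted(docs, key=lambda d: boosted_score(docs[d], clauses), reverse=True)
print(ranking)  # ['c', 'b', 'a']
```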

RE: Severe errors in solr configuration

2009-02-04 Thread Anto Binish Kaspar
Now it's giving a different message: Severe errors in solr configuration. Check your log files for more detailed information on what may be wrong. If you want solr to continue after configuration errors, change: <abortOnConfigurationError>false</abortOnConfigurationError> in null - java

Re: Severe errors in solr configuration

2009-02-04 Thread Olivier Dobberkau
A slash? Olivier Sent from my iPhone On 04.02.2009 at 14:06, Anto Binish Kaspar wrote: I am using a Context file; here is my solr.xml: $ cat /var/lib/tomcat6/conf/Catalina/localhost/solr.xml I changed the ownership of the folder (usr/local/solr/solr-1.3/solr) to tomcat6:tomcat6

Boost function

2009-02-04 Thread Tushar_Gandhi
Hi, I want to know about boosting. What is it used for? How can we implement it? And how will it affect my search results? Thanks, Tushar -- View this message in context: http://www.nabble.com/Boost-function-tp21829651p21829651.html Sent from the Solr - User mailing list archive at Nabble.com

RE: Severe errors in solr configuration

2009-02-04 Thread Anto Binish Kaspar
I am using a Context file; here is my solr.xml: $ cat /var/lib/tomcat6/conf/Catalina/localhost/solr.xml I changed the ownership of the folder (usr/local/solr/solr-1.3/solr) to tomcat6:tomcat6 from root:root. Is there anything I am missing? - Anto Binish Kaspar -----Original Message----- From: Ol

Re: Severe errors in solr configuration

2009-02-04 Thread Olivier Dobberkau
On 04.02.2009 at 13:54, Anto Binish Kaspar wrote: Hi Olivier, thanks for your quick reply. I am using the 1.3 release as a war file. - Anto Binish Kaspar OK. As far as I understand, you need to make sure that your Solr home is set. This needs to be done in Quoting: http://wiki.apache.org/solr/

RE: Severe errors in solr configuration

2009-02-04 Thread Anto Binish Kaspar
Hi Olivier, thanks for your quick reply. I am using the 1.3 release as a war file. - Anto Binish Kaspar -----Original Message----- From: Olivier Dobberkau [mailto:olivier.dobber...@dkd.de] Sent: Wednesday, February 04, 2009 6:20 PM To: solr-user@lucene.apache.org Subject: Re: Severe errors in sol

Re: Severe errors in solr configuration

2009-02-04 Thread Olivier Dobberkau
On 04.02.2009 at 13:33, Anto Binish Kaspar wrote: Hi, I am trying to configure Solr on an Ubuntu server and I am getting the following exception. I am able to get it working on a Windows box. Hi Anto. Have you installed the Solr 1.2 package from Ubuntu? Or the 1.3 release as a war file? Olivier -- Oli

Severe errors in solr configuration

2009-02-04 Thread Anto Binish Kaspar
Hi, I am trying to configure Solr on an Ubuntu server and I am getting the following exception. I am able to get it working on a Windows box. message Severe errors in solr configuration. Check your log files for more detailed information on what may be wrong. If you want solr to continue after configuration

Re: DIH using values from solrconfig.xml inside data-config.xml

2009-02-04 Thread Fergus McMenemie
>: > The solr data field is populated properly. So I guess that bit works. >: > I really wish I could use xpath="//para" > >: The limitation comes from streaming the XML instead of creating a DOM. >: XPathRecordReader is a custom streaming XPath parser implementation and >: streaming is easy only b

Re: Total count of facets

2009-02-04 Thread Bruno Aranda
Unfortunately, after some tests listing all the distinct surnames or other fields is too slow and too memory consuming with our current infrastructure. Could someone confirm that if I wanted to add this functionality (just count the total of different facets) what I should do is to subclass the Sim

Re: DIH, assigning multiple xpaths to the same solr field: solved

2009-02-04 Thread Fergus McMenemie
Thanks Shalin, Using the following appears to work properly! Regards Fergus >On Wed, Feb 4, 2009 at 1:35 AM, Fergus McMenemie wrote: > >> > dataSource="myfilereader" >> processor="XPathEntityProcessor" >> url="${jc.fileAbsolutePath}" >> stream="false" >>

Re: Total count of facets

2009-02-04 Thread Bruno Aranda
Thanks, I will try that, though in my case I am talking about a maximum of 100,000+ distinct surnames/towns per query, and I just need the count, not the whole list. In any case, this brute-force approach is still something I can try, but I wonder how it will behave speed- and memory-wise when there

Re: Total count of facets

2009-02-04 Thread Shalin Shekhar Mangar
On Wed, Feb 4, 2009 at 2:53 PM, Bruno Aranda wrote: > Mmh, thanks for your answer but with that I get the count of names starting > with A*, but I would like to get the count of distinct surnames (or town > names, or any other field that is not the name...) for the people with name > starting wit

Re: Total count of facets

2009-02-04 Thread Bruno Aranda
Mmh, thanks for your answer, but with that I get the count of names starting with A*, whereas I would like to get the count of distinct surnames (or town names, or any other field that is not the name...) for the people with names starting with A*. Is that possible? Thanks! Bruno 2009/2/4 Shalin Shekh

Re: New wiki pages

2009-02-04 Thread Lance Norskog
I've added them to http://wiki.apache.org/solr/FrontPage under "Search and Indexing". I declare open season on them. That is, anyone can edit them for any reason. I'm sure I got some things wrong in memory sizing and sorting. These tips and opinions came from my experience on an index with hundred

Re: Total count of facets

2009-02-04 Thread Shalin Shekhar Mangar
On Wed, Feb 4, 2009 at 2:14 PM, Bruno Aranda wrote: > Maybe I am not clear, but I am not able to find anything on the net. > Basically, if I had in my index millions of names starting with A* I would > like to know how many distinct surnames are present in the resultset > (similar to a distinct S

Re: Total count of facets

2009-02-04 Thread Bruno Aranda
Maybe I was not clear, but I am not able to find anything on the net. Basically, if my index had millions of names starting with A*, I would like to know how many distinct surnames are present in the result set (similar to a SQL DISTINCT query). I will attempt to have a look at the SOLR sources t
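
For reference, the brute-force version of Bruno's question (the equivalent of `SELECT COUNT(DISTINCT surname) FROM people WHERE name LIKE 'A%'`) in plain Python; the function and field names are hypothetical. The set of seen values grows with the number of distinct values, which is why memory becomes the concern at 100,000+ distinct surnames.

```python
def count_distinct(docs, filter_field, prefix, count_field):
    """Count distinct count_field values among docs whose filter_field
    starts with prefix (a brute-force COUNT(DISTINCT ...))."""
    seen = set()  # memory grows with the number of distinct values
    for doc in docs:
        if doc.get(filter_field, "").startswith(prefix):
            value = doc.get(count_field)
            if value is not None:
                seen.add(value)
    return len(seen)


people = [
    {"name": "Alice", "surname": "Smith"},
    {"name": "Anna", "surname": "Smith"},
    {"name": "Adam", "surname": "Jones"},
    {"name": "Bob", "surname": "Brown"},
]
print(count_distinct(people, "name", "A", "surname"))  # 2
```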

Re: DIH using values from solrconfig.xml inside data-config.xml

2009-02-04 Thread Lance Norskog
There are two XML library projects that do streaming XPath reads with full expression evaluation: Nux and dom4j. Nux is from LBL and has a "kinda like BSD" license, and dom4j has a BSD license. http://dom4j.org/dom4j-1.6.1/project-info.html http://acs.lbl.gov/nux/ The licensing probably kills these,

Re: DIH - Example of using $nextUrl and $hasMore

2009-02-04 Thread Noble Paul നോബിള്‍ नोब्ळ्
Currently the initial counter is not set, so the value becomes an empty string: http://subdomain.site.com/boards.rss?page=${blogs.n} becomes http://subdomain.site.com/boards.rss?page= (with no page number). We need to fix this. Unfortunately, the transformer is invoked only after the first chunk is fetched. The best bet
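
A simplified Python stand-in for `${...}` template expansion (not DIH's actual variable resolver; the function name is hypothetical) reproduces the symptom: an unset counter expands to an empty string, and seeding it before the first fetch gives a usable URL.

```python
import re

PLACEHOLDER = re.compile(r"\$\{([^}]+)\}")


def expand(template, variables):
    """Replace ${name} placeholders; unset names become an empty string."""
    return PLACEHOLDER.sub(lambda m: str(variables.get(m.group(1), "")), template)


url = "http://subdomain.site.com/boards.rss?page=${blogs.n}"
print(expand(url, {}))              # http://subdomain.site.com/boards.rss?page=
print(expand(url, {"blogs.n": 1}))  # http://subdomain.site.com/boards.rss?page=1
```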