Hello,
I am running Ubuntu 8.10, with Tomcat 6.0.18 installed via the package
manager, and I am trying to get Solr 1.3.0 up and running, with no success.
I believe I am having the same problem described here:
http://www.nabble.com/Severe-errors-in-solr-configuration-td21829562.html
When I attemp
We have not taken up anything yet. The idea is to create another
contrib that will contain extensions to DIH with external
dependencies, tracked as SOLR-934.
TikaEntityProcessor is something we wish to do, but our limited
bandwidth has been the problem.
On Thu, Feb 5, 2009 at 5:15 AM, Chris Harris wro
I looked at the core status page, and it looks like the problem isn't
actually the instanceDir property but rather dataDir: it isn't being
appended to instanceDir, so its path is resolved relative to the current
working directory.
I'm using a patched version of Solr with some of my own custom changes
relating to dataDir, so this is
Hello,
I have a problem with setting the instanceDir property for the cores in
solr.xml. When I set a relative value, it is resolved relative to the
directory from which I started the application instead of relative to the
solr.home property.
I am using Tomcat and I am creating a context f
Hello there,
I'm a Solr newbie, but I've used Lucene for some complex
IR projects before.
Can someone please help me understand the extent to which Solr allows
access to Lucene?
To elaborate: say I'm considering the use of Solr for all its wonderful
properties like scaling,
This problem went away when I updated to use the latest nightly release
(2009-02-04)
- ashok
ashokc wrote:
>
> I have seen some of these oddities that Chris is referring to. In my case,
> terms that are NOT in the query get highlighted. For example, searching for
> 'Intel' highlights 'Microsoft C
Awesome! After reading up on the links you sent me I got it all working. Thanks!
FYI - I did previously come across one of the links you sent over:
http://wiki.apache.org/solr/SpellCheckerRequestHandler
But what threw me off is that when I started reading about that
yesterday, in the first parag
We want to configure solr so that fields are indexed with a maximum term
frequency and a minimum document length. If a term appears more than N times
in a field it will be considered to have appeared only N times. If a
document length is under M terms, it will be considered to be exactly M terms.
We h
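One way to get both behaviours is a custom Similarity that caps the value fed
to tf() and floors the length used for the norm. A minimal sketch, assuming
Lucene 2.4-era APIs; the class name and the N=5 / M=100 values are placeholders:

import org.apache.lucene.search.DefaultSimilarity;

public class CappedSimilarity extends DefaultSimilarity {
    private static final float MAX_TF = 5f;    // N: higher frequencies count as N
    private static final int MIN_LENGTH = 100; // M: shorter docs count as M terms

    @Override
    public float tf(float freq) {
        return super.tf(Math.min(freq, MAX_TF));
    }

    @Override
    public float lengthNorm(String fieldName, int numTerms) {
        return super.lengthNorm(fieldName, Math.max(numTerms, MIN_LENGTH));
    }
}

Solr 1.3 picks a custom Similarity up via the <similarity class="..."/> element
in schema.xml.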
On 2/4/09 3:44 PM, "Chris Hostetter" wrote:
> I don't think the Query class implementations themselves changed in
> any way that would have made them larger -- but if you switched from the
> standard parser to dismax parser, or started using lots of boost
> queries, or started using prefix or wil
Back in November, Shalin and Grant were discussing integrating
DataImportHandler and Tika. Shalin's estimation about the best way to
do this was as follows:
I think the best way would be a TikaEntityProcessor which knows how to
handle documents. I guess a typical use-case would be
FileListEnti
: >> Aha! I bet that the full Query object became a lot more complicated
: >> between Solr 1.1 and 1.3. That would explain why we did 4X as much GC
: >> after the upgrade.
I don't think the Query class implementations themselves changed in
any way that would have made them larger -- but if you s
On 2/4/09 3:17 PM, "Mark Miller" wrote:
> Walter Underwood wrote:
>> Aha! I bet that the full Query object became a lot more complicated
>> between Solr 1.1 and 1.3. That would explain why we did 4X as much GC
>> after the upgrade.
>>
>> Items evicted from cache are tenured, so they contribute t
I have seen some of these oddities that Chris is referring to. In my case,
terms that are NOT in the query get highlighted. For example, searching for
'Intel' highlights 'Microsoft Corp' as well. I do not have them as synonyms
either. Do these filter factories add some extra intelligence to the inde
Walter Underwood wrote:
Aha! I bet that the full Query object became a lot more complicated
between Solr 1.1 and 1.3. That would explain why we did 4X as much GC
after the upgrade.
Items evicted from cache are tenured, so they contribute to the full GC.
With an HTTP cache in front, there is hard
Aha! I bet that the full Query object became a lot more complicated
between Solr 1.1 and 1.3. That would explain why we did 4X as much GC
after the upgrade.
Items evicted from cache are tenured, so they contribute to the full GC.
With an HTTP cache in front, there is hardly anything left to be
cac
On Wed, Feb 4, 2009 at 5:52 PM, Walter Underwood wrote:
> I have not had the time to pin it down, but I suspect that items
> evicted from the query result cache contain a lot of objects.
> Are the keys a full parse tree? That could be big.
Yes, keys are full Query objects.
It would be non-trivial
On 2/4/09 2:48 PM, "Mark Miller" wrote:
> If there are spots in Lucene/Solr that are producing so much garbage
> that we can't keep up, perhaps work can be done to address this upon
> pinpointing the issues.
>
> - Mark
I have not had the time to pin it down, but I suspect that items
evicted fro
Walter Underwood wrote:
Also, only use as much heap as you really need. A larger heap
means longer GCs.
Right. Ideally you want to figure out how to get the longer pauses down.
There is a lot of fiddling that you can do to improve gc times.
On a multiprocessor machine you can parallelize collec
This is when a load balancer helps. The requests sent around the
time that the GC starts will be stuck on that server, but later
ones can be sent to other servers.
We use a "least connections" load balancing strategy. Each connection
represents a request in progress, so this is the same as equaliz
On Wed, Feb 4, 2009 at 4:45 PM, wojtekpia wrote:
> Ok, so maybe a better question is: should I bother trying to change the
> "sorting" algorithm? I'm concerned that with large data sets, sorting
> becomes a severe bottleneck (this is an assumption, I haven't profiled
> anything to verify).
No...
On Wed, Feb 4, 2009 at 3:12 PM, Otis Gospodnetic
wrote:
> I'd be curious if you could reproduce this in Jetty
All application threads are blocked... it's going to be the same in
Jetty or Tomcat or any other container that's pure Java. There is an
OS level listening queue that has a certain d
Ok, so maybe a better question is: should I bother trying to change the
"sorting" algorithm? I'm concerned that with large data sets, sorting
becomes a severe bottleneck (this is an assumption, I haven't profiled
anything to verify). Does it become a severe bottleneck? Do you know if
alternate sor
On Wed, Feb 4, 2009 at 3:47 PM, Erik Hatcher wrote:
> What about using the luke request handler to get the distinct values count?
That wouldn't restrict results by the base query and filters.
-Yonik
It would not be simple to use a new algorithm. The current
implementation takes place at the Lucene level and uses a priority
queue. When you ask for the top n results, a priority queue of size n is
filled with all of the matching documents. The ordering in the priority
queue is the sort. The o
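For illustration only (this is not Solr's actual code), the bounded-heap
pattern described above looks roughly like this in plain Java:

import java.util.PriorityQueue;

public class TopNExample {
    // Keep only the n largest scores: each candidate is compared against
    // the smallest element in the heap, so the pass is O(numDocs * log n).
    static PriorityQueue<Float> topN(float[] scores, int n) {
        PriorityQueue<Float> heap = new PriorityQueue<Float>(n);
        for (float s : scores) {
            if (heap.size() < n) {
                heap.offer(s);
            } else if (s > heap.peek()) {
                heap.poll();
                heap.offer(s);
            }
        }
        return heap;
    }
}

This is why asking for the top 10 is cheap even over millions of matches: the
heap never grows past n.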
What about using the luke request handler to get the distinct values
count? Although it is pretty seriously heavy on a big index, so
probably not quite workable in your case.
Erik
On Feb 4, 2009, at 12:54 PM, Yonik Seeley wrote:
On Wed, Feb 4, 2009 at 5:42 AM, Bruno Aranda
wrote
That's not quite what I meant. I'm not looking for a custom comparator, I'm
looking for a custom sorting algorithm. Is there a way to use quick sort or
merge sort or... rather than the current algorithm? Also, what is the
current algorithm?
Otis Gospodnetic wrote:
>
>
> You can use one of the
Wojtek,
I'm not familiar with the details of Tomcat configuration, but this definitely
sounds like a container issue, closely related to the JVM.
Doing a thread dump for the Java process (the JVM your Tomcat runs in) while
the GC is running will show you which threads are blocked and in turn th
Hi,
You can use one of the existing function queries (if they fit your need) or
write a custom function query to reorder the results of a query.
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
On Feb 4, 2009, at 11:02 AM, Marcus Stratmann wrote:
Hello,
I'm trying to learn how to use the spell checkers of Solr (1.3). I
found out that FileBasedSpellChecker and IndexBasedSpellChecker
produce different outputs.
IndexBasedSpellChecker says
Jon,
If you can, don't commit on every update and that should help or fully solve
your problem.
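For example, with SolrJ you can buffer documents and issue a single commit at
the end; a sketch assuming the 1.3-era SolrJ client, with a made-up URL, id
field, and batch size:

import java.util.ArrayList;
import java.util.List;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class BatchedUpdates {
    public static void main(String[] args) throws Exception {
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
        List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
        for (int i = 0; i < 10000; i++) {
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", Integer.toString(i));
            batch.add(doc);
            if (batch.size() == 1000) { // flush every 1000 docs
                server.add(batch);
                batch.clear();
            }
        }
        if (!batch.isEmpty()) {
            server.add(batch);
        }
        server.commit(); // one commit at the end, not one per add
    }
}

The <autoCommit> block in solrconfig.xml is another way to bound how often
commits happen.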
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
That is the expected behaviour, all application threads are paused
during GC (CMS collector being an exception, there are smaller pauses
but the application threads continue to mostly run). The number of
connections that could end up being queued would depend on your
acceptCount setting in th
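For what it's worth, acceptCount is essentially the backlog argument of the
underlying server socket: the number of pending connections the OS will queue
while nothing is accepting. A tiny plain-Java illustration (port and backlog
values are arbitrary):

import java.net.ServerSocket;

public class BacklogExample {
    public static void main(String[] args) throws Exception {
        // The second argument is the listen backlog: connections beyond
        // this queue depth are refused while the application isn't
        // accepting (for example, during a stop-the-world GC pause).
        ServerSocket socket = new ServerSocket(8080, 100);
        socket.close();
    }
}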
I'm guessing the field you are checking against is being stemmed. The
field you spell check against should have minimal analysis done to it,
i.e. tokenization and probably downcasing. See http://wiki.apache.org/solr/SpellCheckComponent
and http://wiki.apache.org/solr/SpellCheckerRequestHand
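In Solr this is normally a fieldType in schema.xml; purely for illustration,
the equivalent analysis chain in Java (assuming Lucene 2.4-era APIs, class
name made up) is just a tokenizer plus a lowercase filter:

import java.io.Reader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.LowerCaseFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardTokenizer;

public class SpellingAnalyzer extends Analyzer {
    // Tokenize and downcase only -- no stemming, so the spellchecker
    // dictionary holds surface forms rather than stems.
    public TokenStream tokenStream(String fieldName, Reader reader) {
        return new LowerCaseFilter(new StandardTokenizer(reader));
    }
}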
During full garbage collection, Solr doesn't acknowledge incoming requests.
Any requests that were received during the GC are timestamped the moment GC
finishes (at least that's what my logs show). Is there a limit to how many
requests can queue up during a full GC? This doesn't seem like a Solr
s
We are using Solr 1.3 and trying to get spell checking functionality.
FYI, our index contains a lot of medical terms (which might or might
not make a difference as they are not English-y words, if that makes
any sense?)
If I specify a spellcheck query of "spellcheck.q=diabtes"
I get suggestions
Is there an easy way to choose/create an alternate sorting algorithm? I'm
frequently dealing with large result sets (a few million results) and I
might be able to benefit from domain knowledge in my sort.
Hello,
I'm facing some problems in generating a compound unique key. I'm
indexing some database tables that are not related to each other. In my
data-config.xml I have the following
Column "alias" and "id" don't exist on
The implementation assumed that most users have XML with a
fixed schema. In that case, giving an absolute path is not hard. This
helps us deal with a large subset of use cases rather easily.
We have not added all the features that are possible with a
streaming parser. It is wiser to piggyback
Otis Gospodnetic wrote:
That should be fine (but apparently isn't), as long as you don't have some very
slow machine, or your caches are large and configured to copy a lot of
data on commit.
This is becoming more and more problematic. We have periods where we
get 10 of these exceptio
On 04.02.2009 at 15:50, Anto Binish Kaspar wrote:
Yes, I removed it; I still have the same issue. Any idea what the
cause of this issue may be?
Have you solved your problem?
Olivier
--
Olivier Dobberkau
On Wed, Feb 4, 2009 at 5:42 AM, Bruno Aranda wrote:
> Unfortunately, after some tests listing all the distinct surnames or other
> fields is too slow and too memory consuming with our current infrastructure.
> Could someone confirm that if I wanted to add this functionality (just count
> the total
Hello,
I'm trying to learn how to use the spell checkers of Solr (1.3). I found
out that FileBasedSpellChecker and IndexBasedSpellChecker produce
different outputs.
IndexBasedSpellChecker says
Mark Miller wrote:
>> Currently I think about dropping the stemming and only use
>> prefix-search. But as highlighting does not work with a prefix "house*"
>> this is a problem for me. The hint to use "house?*" instead does not
>> work here.
>>
> That's because wildcard queries are also not high
Yes, I removed it; I still have the same issue. Any idea what the cause of this
issue may be?
- Anto Binish Kaspar
According to http://wiki.apache.org/solr/SolrTomcat, the JNDI context should
be:
<Environment name="solr/home" type="java.lang.String" value="/my/solr/home" override="true" />
Notice that in the snippet you posted, the name was "/solr/home" (an extra
leading '/').
http://wiki.apache.org/solr/SolrTomcat#head-7036378fa48b79c0797cc8230a8aa0965412fb2e
On Wed, Feb 4, 2009 at 6:59 PM, An
From Hossman...
Search time boosts, as the name implies, factor into the scoring of
documents, increasing the score assigned to documents that match on the
boosted term, thus tending to score the entire document higher. So these
documents tend to be returned earlier in the results when sor
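At the Lucene level, a search-time boost is just a multiplier attached to a
query clause. A minimal sketch; the field names and values here are made up:

import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.TermQuery;

public class BoostExample {
    public static BooleanQuery build() {
        BooleanQuery q = new BooleanQuery();
        TermQuery title = new TermQuery(new Term("title", "solr"));
        title.setBoost(4.0f); // title matches count 4x toward the score
        q.add(title, BooleanClause.Occur.SHOULD);
        q.add(new TermQuery(new Term("body", "solr")), BooleanClause.Occur.SHOULD);
        return q;
    }
}

In query syntax the same thing is written title:solr^4 body:solr.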
Now it's giving a different message:
Severe errors in solr configuration. Check your log files for more detailed
information on what may be wrong. If you want solr to continue after
configuration errors, change:
<abortOnConfigurationError>false</abortOnConfigurationError> in null
-
java
A slash?
Olivier
Sent from my iPhone
On 04.02.2009 at 14:06, Anto Binish Kaspar wrote:
I am using a Context file; here is my solr.xml:
$ cat /var/lib/tomcat6/conf/Catalina/localhost/solr.xml
I changed the ownership of the folder (usr/local/solr/solr-1.3/solr)
to tomcat6:tomcat6
Hi,
I want to know about boosting. What is it used for?
How can we implement it, and how will it affect my search results?
Thanks,
Tushar
I am using a Context file; here is my solr.xml:
$ cat /var/lib/tomcat6/conf/Catalina/localhost/solr.xml
I changed the ownership of the folder (usr/local/solr/solr-1.3/solr) to
tomcat6:tomcat6 from root:root.
Am I missing anything?
- Anto Binish Kaspar
On 04.02.2009 at 13:54, Anto Binish Kaspar wrote:
Hi Olivier
Thanks for your quick reply. I am using the 1.3 release as a war file.
- Anto Binish Kaspar
OK.
As far as I understood, you need to make sure that your Solr home is set.
This needs to be done in
Quoting:
http://wiki.apache.org/solr/
Hi Olivier
Thanks for your quick reply. I am using the 1.3 release as a war file.
- Anto Binish Kaspar
On 04.02.2009 at 13:33, Anto Binish Kaspar wrote:
Hi,
I am trying to configure Solr on an Ubuntu server and I am getting the
following exception. I am able to get it working on a Windows box.
Hi Anto.
Have you installed the Solr 1.2 package from Ubuntu?
Or the 1.3 release as a war file?
Olivier
--
Oli
Hi,
I am trying to configure Solr on an Ubuntu server and I am getting the following
exception. I am able to get it working on a Windows box.
message Severe errors in solr configuration. Check your log files for more
detailed information on what may be wrong. If you want solr to continue after
configuration
>: > The solr data field is populated properly. So I guess that bit works.
>: > I really wish I could use xpath="//para"
>
>: The limitation comes from streaming the XML instead of creating a DOM.
>: XPathRecordReader is a custom streaming XPath parser implementation and
>: streaming is easy only b
Unfortunately, after some tests listing all the distinct surnames or other
fields is too slow and too memory consuming with our current infrastructure.
Could someone confirm that if I wanted to add this functionality (just count
the total of different facets) what I should do is to subclass the
Sim
Thanks Shalin,
Using the following appears to work properly!
Regards Fergus
>On Wed, Feb 4, 2009 at 1:35 AM, Fergus McMenemie wrote:
>
>> > dataSource="myfilereader"
>> processor="XPathEntityProcessor"
>> url="${jc.fileAbsolutePath}"
>> stream="false"
>>
Thanks, I will try that though I am talking in my case about 100,000+
distinct surnames/towns maximum per query and I just needed the count and
not the whole list. In any case, this brute-force approach is still
something I can try, but I wonder how this will behave speed- and memory-wise
when there
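For reference, the brute-force count at the Lucene level is one pass over the
term dictionary for the field; a sketch assuming Lucene 2.4-era APIs, with a
made-up index path and field name. Note it counts distinct terms across the
whole index, not within a result set, which is exactly the limitation
discussed above:

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermEnum;

public class DistinctTermCount {
    public static void main(String[] args) throws Exception {
        IndexReader reader = IndexReader.open("/path/to/index");
        TermEnum terms = reader.terms(new Term("surname", ""));
        int count = 0;
        try {
            // Terms are ordered by field, then text: stop on leaving the field.
            while (terms.term() != null && "surname".equals(terms.term().field())) {
                count++;
                if (!terms.next()) break;
            }
        } finally {
            terms.close();
            reader.close();
        }
        System.out.println("distinct surnames: " + count);
    }
}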
On Wed, Feb 4, 2009 at 2:53 PM, Bruno Aranda wrote:
> Mmh, thanks for your answer but with that I get the count of names starting
> with A*, but I would like to get the count of distinct surnames (or town
> names, or any other field that is not the name...) for the people with name
> starting wit
Mmh, thanks for your answer but with that I get the count of names starting
with A*, but I would like to get the count of distinct surnames (or town
names, or any other field that is not the name...) for the people with name
starting with A*. Is that possible?
Thanks!
Bruno
2009/2/4 Shalin Shekh
I've added them to http://wiki.apache.org/solr/FrontPage under "Search and
Indexing". I declare open season on them. That is, anyone can edit them for
any reason. I'm sure I got some things wrong in memory sizing and sorting.
These tips and opinions came from my experience on an index with hundred
On Wed, Feb 4, 2009 at 2:14 PM, Bruno Aranda wrote:
> Maybe I am not clear, but I am not able to find anything on the net.
> Basically, if I had in my index millions of names starting with A* I would
> like to know how many distinct surnames are present in the resultset
> (similar to a distinct S
Maybe I am not clear, but I am not able to find anything on the net.
Basically, if I had in my index millions of names starting with A* I would
like to know how many distinct surnames are present in the resultset
(similar to a distinct SQL query).
I will attempt to have a look at the SOLR sources t
There are two XML library projects that do streaming XPath reads with full
expression evaluation: Nux and dom4j. Nux is from LBL under a "kinda like
BSD" license, and dom4j is BSD-licensed.
http://dom4j.org/dom4j-1.6.1/project-info.html
http://acs.lbl.gov/nux/
The licensing probably kills these,
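For reference, dom4j's streaming mode registers an ElementHandler on a path
and prunes each element once handled, so memory stays flat; a sketch assuming
the dom4j 1.6 API, with a made-up file and path:

import java.io.File;
import org.dom4j.Element;
import org.dom4j.ElementHandler;
import org.dom4j.ElementPath;
import org.dom4j.io.SAXReader;

public class StreamingRead {
    public static void main(String[] args) throws Exception {
        SAXReader reader = new SAXReader();
        reader.addHandler("/rss/channel/item", new ElementHandler() {
            public void onStart(ElementPath path) {
                // called when the element opens; nothing to do here
            }
            public void onEnd(ElementPath path) {
                Element item = path.getCurrent();
                System.out.println(item.elementText("title"));
                item.detach(); // prune so the in-memory tree stays small
            }
        });
        reader.read(new File("feed.xml"));
    }
}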
Currently the initial counter is not set, so the value becomes an empty string:
http://subdomain.site.com/boards.rss?page=${blogs.n}
becomes
http://subdomain.site.com/boards.rss?page=
We need to fix this. Unfortunately, the transformer is invoked only
after the first chunk is fetched.
the best bet