Help: java.lang.OutOfMemoryError: PermGen space

2010-09-20 Thread Markus.Rietzler
For the second time we have hit the error

java.lang.OutOfMemoryError: PermGen space

and Solr stopped responding.

We use the default Jetty installation with JDK 1.6.0_21. After the last
occurrence I tried to set the garbage collector options "right".
These are my settings:

-D64 -server -Xms892m -Xmx2048m -XX:+UseConcMarkSweepGC
-XX:+UseParNewGC -XX:-HeapDumpOnOutOfMemoryError
-XX:+CMSClassUnloadingEnabled -XX:+CMSPermGenSweepingEnabled

As far as I understood, "-XX:+CMSClassUnloadingEnabled
-XX:+CMSPermGenSweepingEnabled" should also clean up the PermGen space.

What can we do?

At the moment Solr is never stopped and runs "all the time". Maybe
we should do a regular (daily) restart; that should work around the
problem. But how can we adjust the garbage collection settings so that
the PermGen space does not run out?


markus


Re: Help: java.lang.OutOfMemoryError: PermGen space

2010-09-20 Thread Peter Karich
see
http://stackoverflow.com/questions/88235/how-to-deal-with-java-lang-outofmemoryerror-permgen-space-error

and the links there. There seems to be no good solution :-/
The only reliable solution is to restart before you run out of PermGen
space (use jvisualvm to monitor it).
You can also increase -XX:MaxPermSize to make the restart interval longer,
and using JRebel or something like that might help too.
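As a sketch only (not a tested recommendation): Markus's existing command line combined with the explicit PermGen cap suggested above might look like this, where the 256m value is an assumption to be tuned while watching jvisualvm:

```
java -D64 -server -Xms892m -Xmx2048m \
     -XX:MaxPermSize=256m \
     -XX:+UseConcMarkSweepGC -XX:+UseParNewGC \
     -XX:+CMSClassUnloadingEnabled -XX:+CMSPermGenSweepingEnabled \
     -jar start.jar
```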

Regards,
Peter.

> the second time we had the error
>
>   java.lang.OutOfMemoryError: PermGen space
>
> [...]


-- 
http://jetwick.com twitter search prototype



Solr UIMA integration

2010-09-20 Thread Tommaso Teofili
Hi all,
I am working on integrating Apache UIMA as an UpdateRequestProcessor for
Apache Solr and I now have a first working snapshot.
I put the code on GoogleCode [1] and you can take a look at the tutorial
[2].

I would be glad to donate it to the Apache Solr project, as I think it could
be a useful module to trigger automatic content extraction while indexing
documents.

At the moment the UIMAUpdateRequestProcessor base implementation can
automatically extract a document's sentences, language, keywords, concepts
and named entities using Apache UIMA's HMMTagger, OpenCalaisAnnotator and
AlchemyAPIAnnotator components (but it can easily be extended).

Any feedback is welcome.
Have a nice day.
Tommaso

[1] : http://code.google.com/p/solr-uima/
[2] : http://code.google.com/p/solr-uima/wiki/5MinutesTutorial


Restrict possible results based on relational information

2010-09-20 Thread Stefan Matheis
Hi List,

this is my first message on this list, so if there's something
missing/incorrect, please let me know :)

the current problem, described briefly and followed by a short example,
is the following:

users can send private messages, and the selection of recipients is done
via auto-complete. Therefore we need to restrict the possible results to
the user's confirmed contacts - but I have absolutely no idea how to do
that :/ Add all confirmed contacts to the index and use them like a kind
of relation? Pass the list of confirmed contacts together with the query?

Let's say we have "John Doe", who creates a new message. Typing "doe"
should suggest "Jane Doe" and "Thomas Doe" - but not "Another Doe", who is
also a user but not one of his confirmed contacts. Maybe we also get "John
Doe" himself as a possible match; that would be okay at first - though if
we could exclude the user himself as well, so much the better.

Every user record has an id and additional fields for firstname and
lastname. Confirmed contacts are, simply put, records with fields
from:user-id and to:user-id, currently with no additional information
about the type of relationship. None of this relationship information is
currently submitted to the Solr index.

If you need more information to answer this not-very-concrete question
(and I'm sure I've missed some relevant info), just ask, please :)

Regards
Stefan


Re: Restrict possible results based on relational information

2010-09-20 Thread Chantal Ackermann
hi Stefan

> users can send privates messages, the selection of recipients is done via
> auto-complete. therefore we need to restrict the possible results based on
> the users confirmed contacts - but i have absolutely no idea how to do that
> :/ Add all confirmed contacts to the index, and use it like a type of
> relation? pass the list of confirmed contacts together with the query?

This does not sound like a search query because:
1. you know the user
2. you know his/her list of confirmed contacts

If both statements are true, the list of confirmed contacts could be made
accessible via a JSON URL call so that you can load it into an
autocomplete dropdown.
Solr need not be involved in this case (but you can of course store the
list of confirmed contacts in a multivalued field per user if you need
it for other searches or faceting).
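For the multivalued-field variant, a user document might look like the following sketch (the field names here are illustrative, not from Stefan's actual schema):

```xml
<add>
  <doc>
    <field name="id">42</field>
    <field name="firstname">John</field>
    <field name="lastname">Doe</field>
    <!-- one value per confirmed contact's user-id -->
    <field name="confirmed_contact">17</field>
    <field name="confirmed_contact">23</field>
  </doc>
</add>
```

An autocomplete query could then filter on the current user's id in confirmed_contact in addition to the name prefix.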

Cheers,
Chantal



Solr Analyzer results before the actual query.

2010-09-20 Thread zackko

Hi to all the Forum from a new subscriber,

I’m working on the Server Side Search solution of the Company when I’m
currently employed with. I have a problem at the moment: When I will submit
a search to Solr I want to see the “Analyzer results”, with all the Filter
applied to it as defined into the types.xml, of the search terms (Query)
submitted to the Analyzer itself. The result of the Analyzer I want to have
displayed BEFORE the actual search will be performed so I can decide at this
point if I can run the proper search or leave the user with no results on
the search performed.
 
The problem is more less described in that issue
https://issues.apache.org/jira/browse/SOLR-261. In summary is that possible
to have the Analyzer results (in code) before running the actual Sorl
search?

I'm quite new to Solr so maybe this issue has been already discussed in
another thread but I'm unable to find it at the moment, so if anybody has a
any clue on how to do that please any suggestion will be more than welcome.
 
Thanks very much in advance for your answer.
 
Best wishes.

-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Analyzer-results-before-the-actual-query-tp1528692p1528692.html
Sent from the Solr - User mailing list archive at Nabble.com.


NGram and word boundaries?

2010-09-20 Thread Harry Hochheiser
I've got a question regarding NGramFilterFactory. It seems to work
very well, but I've had trouble getting it to work with other filters.


Specifically, if I have an index analyzer that uses a
StandardTokenizerFactory to tokenize and follows it up with an
NGramFilterFactory, it does a fine job of handling n-grams, but it
doesn't respect word boundaries: queries will match across whitespace.
Using a modified version of the monitor.xml file from the example, if I
have a field containing the text "Dell Widescreen UltraSharp 3007WFP"
and I provide the search query "en U", it will match.
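A toy model of why the match appears to cross whitespace (this is an illustration of the mechanics, not Solr's actual filter code): both the index and query sides emit n-grams per token, so each query token's grams can match grams that came from *different* indexed words, and with the default OR operator the document still matches.

```java
import java.util.HashSet;
import java.util.Set;

public class NGramSketch {
    // Emit all n-grams of length lo..hi for a single token.
    static Set<String> ngrams(String token, int lo, int hi) {
        Set<String> grams = new HashSet<>();
        for (int n = lo; n <= hi; n++) {
            for (int i = 0; i + n <= token.length(); i++) {
                grams.add(token.substring(i, i + n));
            }
        }
        return grams;
    }

    // Index side: tokenize on whitespace, then n-gram each token.
    static Set<String> indexGrams(String text, int lo, int hi) {
        Set<String> grams = new HashSet<>();
        for (String token : text.toLowerCase().split("\\s+")) {
            grams.addAll(ngrams(token, lo, hi));
        }
        return grams;
    }

    public static void main(String[] args) {
        Set<String> idx = indexGrams("Dell Widescreen UltraSharp 3007WFP", 1, 2);
        // "en" comes from "widescreen" and "u" from "ultrasharp": two
        // different words, yet both query tokens of "en U" find a match.
        System.out.println(idx.contains("en") && idx.contains("u")); // true
    }
}
```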

I'd like to have the NGramFilterFactory match only _within_ words: how
can I go about doing that? I'd like to avoid having to manually
pre-process the query.

I can provide a detailed schema and examples if they'd help.

thanks!
-harry


Re: Solr for statistical data

2010-09-20 Thread Kjetil Ødegaard
On Thu, Sep 16, 2010 at 11:48 AM, Peter Karich  wrote:

> Hi Kjetil,
>
> is this custom component (which performs group-by + calculates stats)
> available somewhere?
> I would like to do something similar. Would you mind sharing it if it
> isn't already available?
>
> The grouping stuff sounds similar to
> https://issues.apache.org/jira/browse/SOLR-236
>
> where you can have mem problems too ;-) or see:
> https://issues.apache.org/jira/browse/SOLR-1682
>
>
Thanks for the links! These patches seem to provide somewhat similar
functionality; I'll investigate whether they're implemented in a similar way too.

We've developed this component for a client, so while I'd like to share it I
can't make any promises. Sorry.


> > Any tips or similar experiences?
>
> you want to decrease memory usage?


Yes. Specifically, I would like to keep the heap at 4 GB. Unfortunately I'm
still seeing some OutOfMemoryErrors so I might have to up the heap size
again.

I guess what I'm really wondering is whether there's a way to keep memory
use down while at the same time not sacrificing the performance of our
queries. The queries have to run through all values of a field in order to
calculate the sum, so it's not enough to cache just a few values.

The code which fetches values from the index calls
FieldCache.DEFAULT.getStringIndex for a field and then indexes into the
result like this:

FieldType fieldType = searcher.getSchema().getFieldType(fieldName);
fieldType.indexedToReadable(stringIndex.lookup[stringIndex.order[documentId]]);

Is there a better way to do this? Thanks.
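To illustrate the ord-based indirection in that snippet (a toy model only, not the real FieldCache API): order[doc] holds each document's term ord and lookup[ord] holds the term, so summing a field has to visit every document, which is why the whole lookup/order arrays end up resident in the heap.

```java
public class StringIndexSketch {
    // Toy model of Lucene's FieldCache StringIndex arrays.
    static final String[] lookup = {null, "10.5", "20.0", "7.25"}; // ord 0 = no value
    static final int[] order = {2, 1, 1, 3, 0};                    // one ord per doc

    // Summing a field must touch every document's ord.
    static double sumField() {
        double sum = 0.0;
        for (int doc = 0; doc < order.length; doc++) {
            String term = lookup[order[doc]];
            if (term != null) sum += Double.parseDouble(term);
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumField()); // 48.25
    }
}
```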


---Kjetil


Re: Searching solr with a two word query

2010-09-20 Thread noel
Here is my raw query:
q=opening+excellent+AND+presentation_id%3A294+AND+type%3Ablob&version=1.3&json.nl=map&rows=10&start=0&wt=xml&hl=true&hl.fl=text&hl.simple.pre=&hl.simple.post=<%2Fspan>&hl.fragsize=0&hl.mergeContiguous=false&debugQuery=on

and here is what I get on the debugQuery:


opening excellent AND presentation_id:294 AND type:blob


opening excellent AND presentation_id:294 AND type:blob


all_text:open +all_text:excel +presentation_id:294 +type:blob


all_text:open +all_text:excel +presentation_id:€#0;Ħ +type:blob



3.1143723 = (MATCH) sum of:
  0.46052343 = (MATCH) weight(all_text:open in 4457), product of:
0.5531408 = queryWeight(all_text:open), product of:
  5.3283896 = idf(docFreq=162, maxDocs=12359)
  0.10381013 = queryNorm
0.8325609 = (MATCH) fieldWeight(all_text:open in 4457), product of:
  1.0 = tf(termFreq(all_text:open)=1)
  5.3283896 = idf(docFreq=162, maxDocs=12359)
  0.15625 = fieldNorm(field=all_text, doc=4457)
  0.74662465 = (MATCH) weight(all_text:excel in 4457), product of:
0.7043054 = queryWeight(all_text:excel), product of:
  6.7845535 = idf(docFreq=37, maxDocs=12359)
  0.10381013 = queryNorm
1.0600865 = (MATCH) fieldWeight(all_text:excel in 4457), product of:
  1.0 = tf(termFreq(all_text:excel)=1)
  6.7845535 = idf(docFreq=37, maxDocs=12359)
  0.15625 = fieldNorm(field=all_text, doc=4457)
  1.7987071 = (MATCH) weight(presentation_id:€#0;Ħ in 4457), product of:
0.43211576 = queryWeight(presentation_id:€#0;Ħ), product of:
  4.1625586 = idf(docFreq=522, maxDocs=12359)
  0.10381013 = queryNorm
4.1625586 = (MATCH) fieldWeight(presentation_id:€#0;Ħ in 4457), product of:
  1.0 = tf(termFreq(presentation_id:€#0;Ħ)=1)
  4.1625586 = idf(docFreq=522, maxDocs=12359)
  1.0 = fieldNorm(field=presentation_id, doc=4457)
  0.108517066 = (MATCH) weight(type:blob in 4457), product of:
0.10613751 = queryWeight(type:blob), product of:
  1.0224196 = idf(docFreq=12084, maxDocs=12359)
  0.10381013 = queryNorm
1.0224196 = (MATCH) fieldWeight(type:blob in 4457), product of:
  1.0 = tf(termFreq(type:blob)=1)
  1.0224196 = idf(docFreq=12084, maxDocs=12359)
  1.0 = fieldNorm(field=type, doc=4457)



2.06395 = (MATCH) product of:
  2.7519336 = (MATCH) sum of:
0.84470934 = (MATCH) weight(all_text:excel in 4911), product of:
  0.7043054 = queryWeight(all_text:excel), product of:
6.7845535 = idf(docFreq=37, maxDocs=12359)
0.10381013 = queryNorm
  1.199351 = (MATCH) fieldWeight(all_text:excel in 4911), product of:
1.4142135 = tf(termFreq(all_text:excel)=2)
6.7845535 = idf(docFreq=37, maxDocs=12359)
0.125 = fieldNorm(field=all_text, doc=4911)
1.7987071 = (MATCH) weight(presentation_id:€#0;Ħ in 4911), product of:
  0.43211576 = queryWeight(presentation_id:€#0;Ħ), product of:
4.1625586 = idf(docFreq=522, maxDocs=12359)
0.10381013 = queryNorm
  4.1625586 = (MATCH) fieldWeight(presentation_id:€#0;Ħ in 4911), product 
of:
1.0 = tf(termFreq(presentation_id:€#0;Ħ)=1)
4.1625586 = idf(docFreq=522, maxDocs=12359)
1.0 = fieldNorm(field=presentation_id, doc=4911)
0.108517066 = (MATCH) weight(type:blob in 4911), product of:
  0.10613751 = queryWeight(type:blob), product of:
1.0224196 = idf(docFreq=12084, maxDocs=12359)
0.10381013 = queryNorm
  1.0224196 = (MATCH) fieldWeight(type:blob in 4911), product of:
1.0 = tf(termFreq(type:blob)=1)
1.0224196 = idf(docFreq=12084, maxDocs=12359)
1.0 = fieldNorm(field=type, doc=4911)
  0.75 = coord(3/4)



1.9903867 = (MATCH) product of:
  2.653849 = (MATCH) sum of:
0.74662465 = (MATCH) weight(all_text:excel in 4468), product of:
  0.7043054 = queryWeight(all_text:excel), product of:
6.7845535 = idf(docFreq=37, maxDocs=12359)
0.10381013 = queryNorm
  1.0600865 = (MATCH) fieldWeight(all_text:excel in 4468), product of:
1.0 = tf(termFreq(all_text:excel)=1)
6.7845535 = idf(docFreq=37, maxDocs=12359)
0.15625 = fieldNorm(field=all_text, doc=4468)
1.7987071 = (MATCH) weight(presentation_id:€#0;Ħ in 4468), product of:
  0.43211576 = queryWeight(presentation_id:€#0;Ħ), product of:
4.1625586 = idf(docFreq=522, maxDocs=12359)
0.10381013 = queryNorm
  4.1625586 = (MATCH) fieldWeight(presentation_id:€#0;Ħ in 4468), product 
of:
1.0 = tf(termFreq(presentation_id:€#0;Ħ)=1)
4.1625586 = idf(docFreq=522, maxDocs=12359)
1.0 = fieldNorm(field=presentation_id, doc=4468)
0.108517066 = (MATCH) weight(type:blob in 4468), product of:
  0.10613751 = queryWeight(type:blob), product of:
1.0224196 = idf(docFreq=12084, maxDocs=12359)
0.10381013 = queryNorm
  1.0224196 = (MATCH) fieldWeight(type:blob in 4468), product of:
1.0 = tf(termFreq(type:blob)=1)
1.0224196 = idf(docFreq=12084, m

SolrCloud new....

2010-09-20 Thread satya swaroop
Hi all,
I am having 4 instances of Solr on 4 systems; each system has a
single instance of Solr. I want the results from all these servers, and I
came to know about SolrCloud. I read about it, worked through the example,
and it worked as described in the wiki.
I am using Solr 1.4 and Apache Tomcat. In order to implement cloud from the
Solr trunk, what procedure should be followed?
1) Should I copy the libraries from cloud to trunk?
2) Should I keep the cloud module on every system?
3) I am not using any cores in Solr; it is a single Solr instance on every
system. Can SolrCloud support that?
4) The example is given with Jetty. Is it done the same way with Tomcat?

Regards,
satya


Re: Solr for statistical data

2010-09-20 Thread Thomas Joiner
I don't know if this thread might help with your problems any, but it might
give some pointers:

http://lucene.472066.n3.nabble.com/Tuning-Solr-caches-with-high-commit-rates-NRT-td1461275.html


--Thomas

On Mon, Sep 20, 2010 at 7:58 AM, Kjetil Ødegaard
wrote:

> On Thu, Sep 16, 2010 at 11:48 AM, Peter Karich  wrote:
> [...]


Re: Calculating distances in Solr using longitude latitude

2010-09-20 Thread PeterKerk

Hi Dennis,

Good suggestion, but I see that most of that is Solr 4.0 functionality,
which has not been released yet.
How can I still use the longitude latitude functionality (LatLonType)?

Thanks!
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Calculating-distances-in-Solr-using-longitude-latitude-tp1524297p1529097.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Searching solr with a two word query

2010-09-20 Thread Erick Erickson
Here's an excellent description of the Lucene query operators and how they
differ from strict
boolean logic: http://www.gossamer-threads.com/lists/lucene/java-user/47928

But the short form (and boy, doesn't the fact that the URL escapes spaces
as '+', which is also a Lucene operator, make looking at these
interesting) is that the first term is essentially a SHOULD clause in a
Lucene BooleanQuery and is matching your docs all by itself.

HTH
Erick

On Mon, Sep 20, 2010 at 8:58 AM,  wrote:

> Here is my raw query:
> [...]

Solr starting problem

2010-09-20 Thread Yavuz Selim YILMAZ
I use Solr on Windows without any problem. I'm trying to run Solr on
Linux (I copied all the files from Windows to Linux), but I get exceptions
when I try to start Solr (java -jar start.jar):

java.lang.ClassNotFoundException: org.mortbay.xml.xmlConfiguration
   at java.net.URLClassLoader.findClass(URLClassLoader.java:378)
   at java.lang.ClassLoader.loadClass(ClassLoader.java:570)
   at java.lang.ClassLoader.loadClass(ClassLoader.java:502)
   at org.mortbay.start.Main.start(Main.java:534)
   at org.mortbay.start.Main.start(Main.java:441)
   at org.mortbay.start.Main.main(Main.java:119)

I checked all the jar files; the problem looks related to Jetty, but I
can't find any solution.

Any ideas?

Thnx.

--

Yavuz Selim YILMAZ


Re: Searching solr with a two word query

2010-09-20 Thread noel
I noticed that my defaultOperator is "OR", and that does have an effect on
what comes up. If I change it to AND, I get an exact match for my query, but
I would like similar matches containing either word as well. Is there
another value I can use? Or maybe I should use another query parser?

Thanks.
- Noel

-Original Message-
From: "Erick Erickson" 
Sent: Monday, September 20, 2010 10:05am
To: solr-user@lucene.apache.org
Subject: Re: Searching solr with a two word query

Here's an excellent description of the Lucene query operators and how they
differ from strict
boolean logic: http://www.gossamer-threads.com/lists/lucene/java-user/47928

But the short
form is that (and boy, doesn't the fact that the URL escaping spaces
as '+', which is also a Lucene operator make looking at these interesting),
is that the
first term is essentially a SHOULD clause in a Lucene BooleanQuery and is
matching your docs all by itself.

HTH
Erick

On Mon, Sep 20, 2010 at 8:58 AM,  wrote:

> Here is my raw query:
> [...]

solr index different type of xml

2010-09-20 Thread yklxmas

Hello guys,

I need to index 5 different kinds of XML files. They share a similar
structure with slight differences between them.

example 1:

  
9780815341291
Essential Cell Biology,Third Edition

Alberts;Bruce
Bray;Dennis


SCABC
SCDEF

  
  

123456789
03_Mutations_Origin_Cancer.mp3
audio/mpeg
Part Three - Mutations and the Origin of 
Cancer
123

1


  


example 2:


9780815341291
Essential Cell Biology,Third Edition

FN:Alberts;Bruce
FN:Bray;Dennis


SCABC
SCGHI





123456789
A subunit 
The portion of a bacterial exotoxin that 
interferes with
normal host cell function. 

10





My dih-config.xml is as below:























I'm not quite familiar with XPath. I can't use a wildcard in an element
name, can I? I tried it and it didn't work.

Many thanks in advance.
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/solr-index-different-type-of-xml-tp1529898p1529898.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr UIMA integration

2010-09-20 Thread Jan Høydahl / Cominvent
Hi Tommaso,

Really cool what you've done. Looking forward to testing it, and I'm sure it's 
a welcome contribution to Solr.
You can easily contribute your code by opening a JIRA issue and attaching a 
patch file.

BTW
Have you considered making the output field names configurable on a
per-instance basis? It could be done as follows:

  concept
  concept
  concept
  ...


--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

On 20. sep. 2010, at 12.35, Tommaso Teofili wrote:

> Hi all,
> I am working on integrating Apache UIMA as un UpdateRequestProcessor for
> Apache Solr and I am now at the first working snapshot.
> [...]



Re: Restrict possible results based on relational information

2010-09-20 Thread Jan Høydahl / Cominvent
Hi,

You could simply create an autocomplete Solr Core with a simple schema 
consisting of id, from, to:
Let the fieldType of "from" be String, and in the fieldType of "to" you can use 
StandardTokenizer, WordDelimiterFilter and EdgeNGramFilter.


  
john@mycompany.com-jane.doe@mycompany.com
john@mycompany.com
Jane Doe (jane@mycompany.com)
  
  
john@mycompany.com-thomas.doe@mycompany.com
john@mycompany.com
Thomas Doe (thomas@mycompany.com)
  
  
peter@mycompany.com-another.doe@mycompany.com
peter@mycompany.com
Another Doe (another@mycompany.com)
  


Now, if your autocomplete query is like this:
wt=json&fl=to&fq=from:"john@mycompany.com"&q={!q.op=AND df=to}do

your response will be a list of valid recipients where the "from" field is
the current user. By using EdgeNGramFilter on the "to" field, you get the
effect of an automatic wildcard search, since "John Doe" will be indexed
as (conceptually) "J Jo Joh John D Do Doe".

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

On 20. sep. 2010, at 12.36, Stefan Matheis wrote:

> Hi List,
> 
> this is my first message on this list, so if there's something
> missing/incorrect, please let me know :)
> 
> the current problem, described in short words followed by an short example,
> is the following one:
> 
> users can send privates messages, the selection of recipients is done via
> auto-complete. therefore we need to restrict the possible results based on
> the users confirmed contacts - but i have absolutely no idea how to do that
> :/ Add all confirmed contacts to the index, and use it like a type of
> relation? pass the list of confirmed contacts together with the query?
> 
> let's say we have "John Doe" which creates a new message. typing "doe"
> should suggest "Jane Doe", "Thomas Doe" - but not "Another Doe", which is
> also a user, but none of his confirmed Contacts. Maybe we get also "John
> Doe" as possible match, but that should be okay in the first place - if we
> could exclude the user himself also, that's of course better.
> 
> every user-record has an id, additional fields for firstname and lastname.
> confirmed contacts are simply explained records with field from:user-id
> to:user-id, actually with no additional information about type of
> relationship or something. but nothing of this relationship-information is
> currently submitted to the solr-index.
> 
> if you need more information to answer this not-very-concrete question (and
> i'm sure, i've missed some relevant info) just ask, please :)
> 
> Regards
> Stefan



Re: Solr UIMA integration

2010-09-20 Thread Dennis Gearon
Looks like a great scraping engine technology :-)
Dennis Gearon

Signature Warning

EARTH has a Right To Life,
  otherwise we all die.

Read 'Hot, Flat, and Crowded'
Laugh at http://www.yert.com/film.php


--- On Mon, 9/20/10, Tommaso Teofili  wrote:

> From: Tommaso Teofili 
> Subject: Solr UIMA integration
> To: solr-user@lucene.apache.org
> Date: Monday, September 20, 2010, 3:35 AM
> Hi all,
> I am working on integrating Apache UIMA as an
> UpdateRequestProcessor for
> Apache Solr and I am now at the first working snapshot.
> I put the code on GoogleCode [1] and you can take a look at
> the tutorial
> [2].
> 
> I would be glad to donate it to the Apache Solr project, as
> I think it could
> be a useful module to trigger automatic content extraction
> while indexing
> documents.
> 
> At the moment the UIMAUpdateRequestProcessor base
> implementation can
> automatically extract document's sentences, language,
> keywords, concepts and
> named entities using Apache UIMA's HMMTagger,
> OpenCalaisAnnotator and
> AlchemyAPIAnnotator components (but it can be easily
> expanded).
> 
> Any feedback is welcome.
> Have a nice day.
> Tommaso
> 
> [1] : http://code.google.com/p/solr-uima/
> [2] : http://code.google.com/p/solr-uima/wiki/5MinutesTutorial
> 


Re: Calculating distances in Solr using longitude latitude

2010-09-20 Thread Dennis Gearon
Hmmm,
 I am about to put an engineer on our search engine requirements with the 
assumption that latitude/longitude is available in the current release of Solr 
(not knowing what that is). 

 I have been partitioning the whole Solr thing to him, except enough info 
for me to understand and interface to his work. So, I don't have that 
answer.

 Can someone else answer him?


Dennis Gearon

Signature Warning

EARTH has a Right To Life,
  otherwise we all die.

Read 'Hot, Flat, and Crowded'
Laugh at http://www.yert.com/film.php


--- On Mon, 9/20/10, PeterKerk  wrote:

> From: PeterKerk 
> Subject: Re: Calculating distances in Solr using longitude latitude
> To: solr-user@lucene.apache.org
> Date: Monday, September 20, 2010, 6:53 AM
> 
> Hi Dennis,
> 
> Good suggestion, but I see that most of that is Solr 4.0
> functionality,
> which has not been released yet.
> How can I still use the longitude latitude functionality
> (LatLonType)?
> 
> Thanks!
> -- 
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Calculating-distances-in-Solr-using-longitude-latitude-tp1524297p1529097.html
> Sent from the Solr - User mailing list archive at
> Nabble.com.
> 


Re: Solr starting problem

2010-09-20 Thread Erick Erickson
Are you trying to implement custom code or is this a stock release?
Because if you're trying to just move a stock release over, it'd be much
simpler to just unpack the distribution (for Linux) on the linux machine
and go. It might be worth doing anyway just to compare the differences
to see what's causing your problem.

But it looks like your problem is in your Jetty configuration. I'm really
guessing that you can't start your Jetty servlet container at all.

HTH
Erick

On Mon, Sep 20, 2010 at 11:19 AM, Yavuz Selim YILMAZ <
yvzslmyilm...@gmail.com> wrote:

> I use solr in windows without any problem, I 'm trying to run solr in
> linux,
> ( copy all files from windows to linux ), but I'm given exceptions when I
> try to start solr (java -jar start.jar)
>
> java.lang.ClassNotFoundException: org.mortbay.xml.xmlConfiguration
>   at java.net.URLClassLoader.findClass(URLClassLoader.java:378)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:570)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:502)
>   at jorg.mortbay.start.Main.start(Main.java:534)
>   at jorg.mortbay.start.Main.start(Main.java:441)
>   at jorg.mortbay.start.Main.Main(Main.java:119)
>
> I checked all the jar files; the problem looks related to Jetty, but I can't
> find any solution.
>
> Any ideas?
>
> Thnx.
>
> --
>
> Yavuz Selim YILMAZ
>


Re: Searching solr with a two word query

2010-09-20 Thread Erick Erickson
I'm missing what you really want out of your query; your
phrase "either word as a single result" just isn't connecting
in my grey matter... Could you give some example inputs and
outputs that demonstrate what you want?

Best
Erick

On Mon, Sep 20, 2010 at 11:41 AM,  wrote:

> I noticed that my defaultOperator is "OR", and that does have an effect on
> what does come up. If I were to change that to AND, it's an exact match to
> my query, but I would like similar matches with either word as a single
> result. Is there another value I can use? Or maybe I should use another
> query parser?
>
> Thanks.
> - Noel
>
> -Original Message-
> From: "Erick Erickson" 
> Sent: Monday, September 20, 2010 10:05am
> To: solr-user@lucene.apache.org
> Subject: Re: Searching solr with a two word query
>
> Here's an excellent description of the Lucene query operators and how they
> differ from strict
> boolean logic:
> http://www.gossamer-threads.com/lists/lucene/java-user/47928
>
> But the
> short
> form is that (and boy, doesn't the fact that the URL escaping spaces
> as '+', which is also a Lucene operator make looking at these interesting),
> is that the
> first term is essentially a SHOULD clause in a Lucene BooleanQuery and is
> matching your docs all by itself.
>
> HTH
> Erick
>
> On Mon, Sep 20, 2010 at 8:58 AM,  wrote:
>
> > Here is my raw query:
> >
> q=opening+excellent+AND+presentation_id%3A294+AND+type%3Ablob&version=1.3&
> > json.nl
> >
> =map&rows=10&start=0&wt=xml&hl=true&hl.fl=text&hl.simple.pre=&hl.simple.post=<%2Fspan>&hl.fragsize=0&hl.mergeContiguous=false&debugQuery=on
> >
> > and here is what I get on the debugQuery:
> > 
> > −
> > 
> > opening excellent AND presentation_id:294 AND type:blob
> > 
> > −
> > 
> > opening excellent AND presentation_id:294 AND type:blob
> > 
> > −
> > 
> > all_text:open +all_text:excel +presentation_id:294 +type:blob
> > 
> > −
> > 
> > all_text:open +all_text:excel +presentation_id:€#0;Ħ +type:blob
> > 
> > −
> > 
> > −
> > 
> >
> > 3.1143723 = (MATCH) sum of:
> >  0.46052343 = (MATCH) weight(all_text:open in 4457), product of:
> >0.5531408 = queryWeight(all_text:open), product of:
> >  5.3283896 = idf(docFreq=162, maxDocs=12359)
> >  0.10381013 = queryNorm
> >0.8325609 = (MATCH) fieldWeight(all_text:open in 4457), product of:
> >  1.0 = tf(termFreq(all_text:open)=1)
> >  5.3283896 = idf(docFreq=162, maxDocs=12359)
> >  0.15625 = fieldNorm(field=all_text, doc=4457)
> >  0.74662465 = (MATCH) weight(all_text:excel in 4457), product of:
> >0.7043054 = queryWeight(all_text:excel), product of:
> >  6.7845535 = idf(docFreq=37, maxDocs=12359)
> >  0.10381013 = queryNorm
> >1.0600865 = (MATCH) fieldWeight(all_text:excel in 4457), product of:
> >  1.0 = tf(termFreq(all_text:excel)=1)
> >  6.7845535 = idf(docFreq=37, maxDocs=12359)
> >  0.15625 = fieldNorm(field=all_text, doc=4457)
> >  1.7987071 = (MATCH) weight(presentation_id:€#0;Ħ in 4457), product of:
> >0.43211576 = queryWeight(presentation_id:€#0;Ħ), product of:
> >  4.1625586 = idf(docFreq=522, maxDocs=12359)
> >  0.10381013 = queryNorm
> >4.1625586 = (MATCH) fieldWeight(presentation_id:€#0;Ħ in 4457),
> product
> > of:
> >  1.0 = tf(termFreq(presentation_id:€#0;Ħ)=1)
> >  4.1625586 = idf(docFreq=522, maxDocs=12359)
> >  1.0 = fieldNorm(field=presentation_id, doc=4457)
> >  0.108517066 = (MATCH) weight(type:blob in 4457), product of:
> >0.10613751 = queryWeight(type:blob), product of:
> >  1.0224196 = idf(docFreq=12084, maxDocs=12359)
> >  0.10381013 = queryNorm
> >1.0224196 = (MATCH) fieldWeight(type:blob in 4457), product of:
> >  1.0 = tf(termFreq(type:blob)=1)
> >  1.0224196 = idf(docFreq=12084, maxDocs=12359)
> >  1.0 = fieldNorm(field=type, doc=4457)
> > 
> > −
> > 
> >
> > 2.06395 = (MATCH) product of:
> >  2.7519336 = (MATCH) sum of:
> >0.84470934 = (MATCH) weight(all_text:excel in 4911), product of:
> >  0.7043054 = queryWeight(all_text:excel), product of:
> >6.7845535 = idf(docFreq=37, maxDocs=12359)
> >0.10381013 = queryNorm
> >  1.199351 = (MATCH) fieldWeight(all_text:excel in 4911), product of:
> >1.4142135 = tf(termFreq(all_text:excel)=2)
> >6.7845535 = idf(docFreq=37, maxDocs=12359)
> >0.125 = fieldNorm(field=all_text, doc=4911)
> >1.7987071 = (MATCH) weight(presentation_id:€#0;Ħ in 4911), product of:
> >  0.43211576 = queryWeight(presentation_id:€#0;Ħ), product of:
> >4.1625586 = idf(docFreq=522, maxDocs=12359)
> >0.10381013 = queryNorm
> >  4.1625586 = (MATCH) fieldWeight(presentation_id:€#0;Ħ in 4911),
> > product of:
> >1.0 = tf(termFreq(presentation_id:€#0;Ħ)=1)
> >4.1625586 = idf(docFreq=522, maxDocs=12359)
> >1.0 = fieldNorm(field=presentation_id, doc=4911)
> >0.108517066 = (MATCH) weight(type:blob in 4911), product 

logging for solr

2010-09-20 Thread Christopher Gross
I'm running an old version of Solr (1.2) on Apache Tomcat 5.5.25.
Right now the logs all go to the catalina.out file, which has been
growing rather large.  I have to shut down the servers periodically to
clear out that logfile because it keeps getting large and giving disk
space warnings.

I've tried looking around for instructions on configuring the logging
for Solr, but I'm not having much luck.  Can someone please point me
in the right direction to set up the logging for Solr?  If I can get
it into rolling logfiles, I can just have a cron job take out the old
ones and not have to restart to do cleanup.

Please don't tell me to upgrade the software -- it is not an option at
this point.  I'm sure that the latest versions have it working better,
but right now I am unable to upgrade Solr or Tomcat to new versions.

Thanks!

-- Chris


Re: Searching solr with a two word query

2010-09-20 Thread noel
Say I had a two-word query, "opening excellent". I would like it to 
return something like:

opening excellent
opening
opening
opening
excellent
excellent
excellent

Instead of:
opening excellent
excellent
excellent
excellent

If I did a search, I would like the first word alone to also show up in the 
results, because currently my results show both words in one result and only 
the second word for the rest of the results. I've done a search on each word by 
itself, and there are results for them.

Thanks.

-Original Message-
From: "Erick Erickson" 
Sent: Monday, September 20, 2010 2:37pm
To: solr-user@lucene.apache.org
Subject: Re: Searching solr with a two word query

I'm missing what you really want out of your query, your
phrase "either word as a single result" just isn't connecting
in my grey matter.. Could you give some example inputs and
outputs that demonstrates what you want?

Best
Erick

On Mon, Sep 20, 2010 at 11:41 AM,  wrote:

> I noticed that my defaultOperator is "OR", and that does have an effect on
> what does come up. If I were to change that to AND, it's an exact match to
> my query, but I would like similar matches with either word as a single
> result. Is there another value I can use? Or maybe I should use another
> query parser?
>
> Thanks.
> - Noel
>
> -Original Message-
> From: "Erick Erickson" 
> Sent: Monday, September 20, 2010 10:05am
> To: solr-user@lucene.apache.org
> Subject: Re: Searching solr with a two word query
>
> Here's an excellent description of the Lucene query operators and how they
> differ from strict
> boolean logic:
> http://www.gossamer-threads.com/lists/lucene/java-user/47928
>
> But the
> short
> form is that (and boy, doesn't the fact that the URL escaping spaces
> as '+', which is also a Lucene operator make looking at these interesting),
> is that the
> first term is essentially a SHOULD clause in a Lucene BooleanQuery and is
> matching your docs all by itself.
>
> HTH
> Erick
>
> On Mon, Sep 20, 2010 at 8:58 AM,  wrote:
>
> > Here is my raw query:
> >
> q=opening+excellent+AND+presentation_id%3A294+AND+type%3Ablob&version=1.3&
> > json.nl
> >
> =map&rows=10&start=0&wt=xml&hl=true&hl.fl=text&hl.simple.pre=&hl.simple.post=<%2Fspan>&hl.fragsize=0&hl.mergeContiguous=false&debugQuery=on
> >
> > and here is what I get on the debugQuery:
> > 
> > −
> > 
> > opening excellent AND presentation_id:294 AND type:blob
> > 
> > −
> > 
> > opening excellent AND presentation_id:294 AND type:blob
> > 
> > −
> > 
> > all_text:open +all_text:excel +presentation_id:294 +type:blob
> > 
> > −
> > 
> > all_text:open +all_text:excel +presentation_id:€#0;Ħ +type:blob
> > 
> > −
> > 
> > −
> > 
> >
> > 3.1143723 = (MATCH) sum of:
> >  0.46052343 = (MATCH) weight(all_text:open in 4457), product of:
> >0.5531408 = queryWeight(all_text:open), product of:
> >  5.3283896 = idf(docFreq=162, maxDocs=12359)
> >  0.10381013 = queryNorm
> >0.8325609 = (MATCH) fieldWeight(all_text:open in 4457), product of:
> >  1.0 = tf(termFreq(all_text:open)=1)
> >  5.3283896 = idf(docFreq=162, maxDocs=12359)
> >  0.15625 = fieldNorm(field=all_text, doc=4457)
> >  0.74662465 = (MATCH) weight(all_text:excel in 4457), product of:
> >0.7043054 = queryWeight(all_text:excel), product of:
> >  6.7845535 = idf(docFreq=37, maxDocs=12359)
> >  0.10381013 = queryNorm
> >1.0600865 = (MATCH) fieldWeight(all_text:excel in 4457), product of:
> >  1.0 = tf(termFreq(all_text:excel)=1)
> >  6.7845535 = idf(docFreq=37, maxDocs=12359)
> >  0.15625 = fieldNorm(field=all_text, doc=4457)
> >  1.7987071 = (MATCH) weight(presentation_id:€#0;Ħ in 4457), product of:
> >0.43211576 = queryWeight(presentation_id:€#0;Ħ), product of:
> >  4.1625586 = idf(docFreq=522, maxDocs=12359)
> >  0.10381013 = queryNorm
> >4.1625586 = (MATCH) fieldWeight(presentation_id:€#0;Ħ in 4457),
> product
> > of:
> >  1.0 = tf(termFreq(presentation_id:€#0;Ħ)=1)
> >  4.1625586 = idf(docFreq=522, maxDocs=12359)
> >  1.0 = fieldNorm(field=presentation_id, doc=4457)
> >  0.108517066 = (MATCH) weight(type:blob in 4457), product of:
> >0.10613751 = queryWeight(type:blob), product of:
> >  1.0224196 = idf(docFreq=12084, maxDocs=12359)
> >  0.10381013 = queryNorm
> >1.0224196 = (MATCH) fieldWeight(type:blob in 4457), product of:
> >  1.0 = tf(termFreq(type:blob)=1)
> >  1.0224196 = idf(docFreq=12084, maxDocs=12359)
> >  1.0 = fieldNorm(field=type, doc=4457)
> > 
> > −
> > 
> >
> > 2.06395 = (MATCH) product of:
> >  2.7519336 = (MATCH) sum of:
> >0.84470934 = (MATCH) weight(all_text:excel in 4911), product of:
> >  0.7043054 = queryWeight(all_text:excel), product of:
> >6.7845535 = idf(docFreq=37, maxDocs=12359)
> >0.10381013 = queryNorm
> >  1.199351 = (MATCH) fieldWeight(all_text:excel in 4911), product of:
> >1.4142135 = tf

RE: Re: Calculating distances in Solr using longitude latitude

2010-09-20 Thread Markus Jelsma
Hi,

 

In the early Solr 1.3 times we had an index with leisure-time objects that 
included geographical coordinates. Based on certain conditions we had to 
display a specific list of nearby objects. We simply implemented some Great 
Circle calculations, such as the distance between points [1], aggregated 
nearby objects, and sent them to our index. The drawback is that for each 
addition to the index you'd have to recalculate all other nearby objects, which 
takes a while. The good thing is that, in production, the system isn't slowed 
down by these calculations, so it's very fast.

 

[1]: http://williams.best.vwh.net/avform.htm#Dist
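
The great-circle distance Markus precomputes boils down to the haversine 
formula. A minimal standalone sketch (class and method names are mine, not 
from any Solr API):

```java
// Haversine great-circle distance between two lat/lon points.
public class GreatCircle {

    static final double EARTH_RADIUS_KM = 6371.0; // mean Earth radius

    // Distance in km; inputs are decimal degrees.
    public static double distanceKm(double lat1, double lon1,
                                    double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                 + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                   * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
    }

    public static void main(String[] args) {
        // Amsterdam -> Paris, roughly 430 km
        System.out.printf("%.1f km%n", distanceKm(52.37, 4.90, 48.86, 2.35));
    }
}
```

At index time you would run this against every existing document to build the 
"nearby objects" list for a new entry, which is exactly the recalculation cost 
described above.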

 

Cheers,
 
-Original message-
From: Dennis Gearon 
Sent: Mon 20-09-2010 19:42
To: solr-user@lucene.apache.org; 
Subject: Re: Calculating distances in Solr using longitude latitude

Hmmm,
    I am about to put a engineer on our search engine requirements with the 
assumption that latitude/longitude is available in the current release of Solr, 
(not knowing what that is). 

    I have been partitioning the whole Solr thing to him,except enough info for 
me to understand and interface to his work. So, I don't have that 
answer.

    Can someone else answer him?


Dennis Gearon

Signature Warning

EARTH has a Right To Life,
 otherwise we all die.

Read 'Hot, Flat, and Crowded'
Laugh at http://www.yert.com/film.php


--- On Mon, 9/20/10, PeterKerk  wrote:

> From: PeterKerk 
> Subject: Re: Calculating distances in Solr using longitude latitude
> To: solr-user@lucene.apache.org
> Date: Monday, September 20, 2010, 6:53 AM
> 
> Hi Dennis,
> 
> Good suggestion, but I see that most of that is Solr 4.0
> functionality,
> which has not been released yet.
> How can I still use the longitude latitude functionality
> (LatLonType)?
> 
> Thanks!
> -- 
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Calculating-distances-in-Solr-using-longitude-latitude-tp1524297p1529097.html
> Sent from the Solr - User mailing list archive at
> Nabble.com.
> 


Re: Solr for statistical data

2010-09-20 Thread Alexander Kanarsky
Set up your JVM to produce the heap dumps in case of OOM and try to
analyze them with a profiler like YourKit. This could give you some
ideas on what takes memory and what potentially could be reduced.
Sometimes the cache settings could be adjusted without significant
performance toll etc. See what on query side and on indexing side
could be downsized. In some cases you might need to modify the Lucene
source code to adjust the internal cache I/O buffer sizes, for
example. But look for low-hanging fruit first. Use a 32-bit JVM if
possible, of course.
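
Getting the heap dump on OOM is just a pair of standard HotSpot flags; the 
heap size and dump path below are placeholders:

```
java -Xmx4g \
     -XX:+HeapDumpOnOutOfMemoryError \
     -XX:HeapDumpPath=/var/tmp/solr-dumps \
     -jar start.jar
```

The resulting .hprof file can then be opened in YourKit, Eclipse MAT, or jhat 
for offline analysis.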

-Alexander


On Mon, Sep 20, 2010 at 5:58 AM, Kjetil Ødegaard
 wrote:
> On Thu, Sep 16, 2010 at 11:48 AM, Peter Karich  wrote:
>
>> Hi Kjetil,
>>
>> is this custom component (which performes groub by + calcs stats)
>> somewhere available?
>> I would like to do something similar. Would you mind to share if it
>> isn't already available?
>>
>> The grouping stuff sounds similar to
>> https://issues.apache.org/jira/browse/SOLR-236
>>
>> where you can have mem problems too ;-) or see:
>> https://issues.apache.org/jira/browse/SOLR-1682
>>
>>
> Thanks for the links! These patches seem to provide somewhat similar
> functionality, I'll investigate if they're implemented in a similar way too.
>
> We've developed this component for a client, so while I'd like to share it I
> can't make any promises. Sorry.
>
>
>> > Any tips or similar experiences?
>>
>> you want to decrease memory usage?
>
>
> Yes. Specifically, I would like to keep the heap at 4 GB. Unfortunately I'm
> still seeing some OutOfMemoryErrors so I might have to up the heap size
> again.
>
> I guess what I'm really wondering is if there's a way to keep memory use
> down, while at the same time not sacrificing the performance of our queries.
> The queries have to run through all values for a field in order to calculate
> the sum, so it's not enough to just cache a few values.
>
> The code which fetches values from the index uses
> FieldCache.DEFAULT.getStringIndex for a field, and then indexes like this:
>
> FieldType fieldType = searcher.getSchema().getFieldType(fieldName);
> fieldType.indexedToReadable(stringIndex.lookup[stringIndex.order[documentId]]);
>
> Is there a better way to do this? Thanks.
>
>
> ---Kjetil
>


Re: logging for solr

2010-09-20 Thread Jak Akdemir
It is quite easy to change the default behaviour. Solr picks up the JVM's
default java.util.logging configuration, which can be set via a startup
parameter or defined externally in
../tomcat/conf/logging.properties.

Simply remove all contents (back them up first) of
../tomcat/conf/logging.properties and write: .level = SEVERE

This changes the root logger from unset to SEVERE. Of course
you can switch it to WARNING or INFO too.

You can observe the changes from http://localhost:8080/solr/admin/logging
or simply the ~/admin/logging page.

Details are here:

http://wiki.apache.org/tomcat/Logging_Tutorial

http://tomcat.apache.org/tomcat-6.0-doc/logging.html
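
To get Chris's rolling logfiles instead of an ever-growing catalina.out, 
JULI's FileHandler (which rotates daily by date suffix) can be pointed at its 
own file. The handler prefix, file prefix, and levels here are only an 
illustration:

```
# ../tomcat/conf/logging.properties -- sketch only
handlers = 1solr.org.apache.juli.FileHandler

# Quieten the root logger
.level = WARNING

# Output goes to logs/solr.YYYY-MM-DD.log, rotated daily
1solr.org.apache.juli.FileHandler.level = INFO
1solr.org.apache.juli.FileHandler.directory = ${catalina.base}/logs
1solr.org.apache.juli.FileHandler.prefix = solr.
```

A cron job can then delete old dated files without any Tomcat restart.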

Jak

On Mon, Sep 20, 2010 at 10:32 PM, Christopher Gross  wrote:
>
> I'm running an old version of Solr (1.2) on Apache Tomcat 5.5.25.
> Right now the logs all go to the catalina.out file, which has been
> growing rather large.  I have to shut down the servers periodically to
> clear out that logfile because it keeps getting large and giving disk
> space warnings.
>
> I've tried looking around for instructions on configuring the logging
> for Solr, but I'm not having much luck.  Can someone please point me
> in the right direction to set up the logging for Solr?  If I can get
> it into rolling logfiles, I can just have a cron job take out the old
> ones and not have to restart to do cleanup.
>
> Please don't tell me to upgrade the software -- it is not an option at
> this point.  I'm sure that the latest versions have it working better,
> but right now I am unable to upgrade Solr or Tomcat to new versions.
>
> Thanks!
>
> -- Chris


Re: logging for solr

2010-09-20 Thread Christopher Gross
Thanks Jak!  That was just what I was looking for!

-- Chris



On Mon, Sep 20, 2010 at 4:25 PM, Jak Akdemir  wrote:
> It is quite easy to modify its default value. Solr is using default
> logging values that started to use in jvm. It can be bound as a start
> parameter or can be externally defined in
> ../tomcat/conf/logging.properties.
>
> Simply it is enough to remove all contents (backup first) in
> ../tomcat/conf/logging.properties and write .level = SEVERE
>
> This change will make root checkbox from unset to severe. Of course
> you can switch it to WARNING or INFO too.
>
> You can observe changes from http://localhost:8080/solr/admin/logging
> or simply ~/admin/logging  pages.
>
> Details are here:
>
> http://wiki.apache.org/tomcat/Logging_Tutorial
>
> http://tomcat.apache.org/tomcat-6.0-doc/logging.html
>
> Jak
>
> On Mon, Sep 20, 2010 at 10:32 PM, Christopher Gross  wrote:
>>
>> I'm running an old version of Solr (1.2) on Apache Tomcat 5.5.25.
>> Right now the logs all go to the catalina.out file, which has been
>> growing rather large.  I have to shut down the servers periodically to
>> clear out that logfile because it keeps getting large and giving disk
>> space warnings.
>>
>> I've tried looking around for instructions on configuring the logging
>> for Solr, but I'm not having much luck.  Can someone please point me
>> in the right direction to set up the logging for Solr?  If I can get
>> it into rolling logfiles, I can just have a cron job take out the old
>> ones and not have to restart to do cleanup.
>>
>> Please don't tell me to upgrade the software -- it is not an option at
>> this point.  I'm sure that the latest versions have it working better,
>> but right now I am unable to upgrade Solr or Tomcat to new versions.
>>
>> Thanks!
>>
>> -- Chris
>


Re: Searching solr with a two word query

2010-09-20 Thread Tom Hill
It will probably be clearer if you don't use the pseudo-boolean
operators, and just use + for required terms.

If you look at your output from debug, you see your query becomes:

    all_text:open +all_text:excel +presentation_id:294 +type:blob

Note that "all_text:open" does not have a + sign, but
"all_text:excel" has one. So "all_text:open" is not required, but
"all_text:excel" is.

I think this is because AND marks both of its operands as required
(which puts the + on "all_text:excel"), but "open" has no explicit
op, so it gets the default OR, which marks that term as optional.

What I would suggest you do is:

   opening excellent +presentation_id:294 +type:blob

which I think is much clearer.

I think you could also do
  opening excellent presentation_id:294 AND type:blob
but I think it's non-obvious how the result will differ from
  opening excellent AND presentation_id:294 AND type:blob
So I wouldn't use either of the last two.


Tom
p.s. Not sure what is going on with the last lines of your debug
output for the query. Is that really what shows up after presentation
ID? I see Euro, hash mark, zero, semi-colon, and "H with stroke"


all_text:open +all_text:excel +presentation_id:€#0;Ħ +type:blob


On Mon, Sep 20, 2010 at 12:46 PM,  wrote:
>
> Say if I had a two word query that was "opening excellent", I would like it 
> to return something like:
>
> opening excellent
> opening
> opening
> opening
> excellent
> excellent
> excellent
>
> Instead of:
> opening excellent
> excellent
> excellent
> excellent
>
> If I did a search, I would like the first word alone to also show up in the 
> results, because currently my results show both words in one result and only 
> the second word for the rest of the results. I've done a search on each word 
> by itself, and there are results for them.
>
> Thanks.
>
> -Original Message-
> From: "Erick Erickson" 
> Sent: Monday, September 20, 2010 2:37pm
> To: solr-user@lucene.apache.org
> Subject: Re: Searching solr with a two word query
>
> I'm missing what you really want out of your query, your
> phrase "either word as a single result" just isn't connecting
> in my grey matter.. Could you give some example inputs and
> outputs that demonstrates what you want?
>
> Best
> Erick
>
> On Mon, Sep 20, 2010 at 11:41 AM,  wrote:
>
> > I noticed that my defaultOperator is "OR", and that does have an effect on
> > what does come up. If I were to change that to AND, it's an exact match to
> > my query, but I would like similar matches with either word as a single
> > result. Is there another value I can use? Or maybe I should use another
> > query parser?
> >
> > Thanks.
> > - Noel
> >
> > -Original Message-
> > From: "Erick Erickson" 
> > Sent: Monday, September 20, 2010 10:05am
> > To: solr-user@lucene.apache.org
> > Subject: Re: Searching solr with a two word query
> >
> > Here's an excellent description of the Lucene query operators and how they
> > differ from strict
> > boolean logic:
> > http://www.gossamer-threads.com/lists/lucene/java-user/47928
> >
> > But the
> > short
> > form is that (and boy, doesn't the fact that the URL escaping spaces
> > as '+', which is also a Lucene operator make looking at these interesting),
> > is that the
> > first term is essentially a SHOULD clause in a Lucene BooleanQuery and is
> > matching your docs all by itself.
> >
> > HTH
> > Erick
> >
> > On Mon, Sep 20, 2010 at 8:58 AM,  wrote:
> >
> > > Here is my raw query:
> > >
> > q=opening+excellent+AND+presentation_id%3A294+AND+type%3Ablob&version=1.3&
> > > json.nl
> > >
> > =map&rows=10&start=0&wt=xml&hl=true&hl.fl=text&hl.simple.pre=&hl.simple.post=<%2Fspan>&hl.fragsize=0&hl.mergeContiguous=false&debugQuery=on
> > >
> > > and here is what I get on the debugQuery:
> > > 
> > > −
> > > 
> > > opening excellent AND presentation_id:294 AND type:blob
> > > 
> > > −
> > > 
> > > opening excellent AND presentation_id:294 AND type:blob
> > > 
> > > −
> > > 
> > > all_text:open +all_text:excel +presentation_id:294 +type:blob
> > > 
> > > −
> > > 
> > > all_text:open +all_text:excel +presentation_id:€#0;Ħ +type:blob
> > > 
> > > −
> > > 
> > > −
> > > 
> > >
> > > 3.1143723 = (MATCH) sum of:
> > >  0.46052343 = (MATCH) weight(all_text:open in 4457), product of:
> > >    0.5531408 = queryWeight(all_text:open), product of:
> > >      5.3283896 = idf(docFreq=162, maxDocs=12359)
> > >      0.10381013 = queryNorm
> > >    0.8325609 = (MATCH) fieldWeight(all_text:open in 4457), product of:
> > >      1.0 = tf(termFreq(all_text:open)=1)
> > >      5.3283896 = idf(docFreq=162, maxDocs=12359)
> > >      0.15625 = fieldNorm(field=all_text, doc=4457)
> > >  0.74662465 = (MATCH) weight(all_text:excel in 4457), product of:
> > >    0.7043054 = queryWeight(all_text:excel), product of:
> > >      6.7845535 = idf(docFreq=37, maxDocs=12359)
> > >      0.10381013 = queryNorm
> > >    1.0600865 = (MATCH) fieldWeight(all_text:excel in 44

RE: Re: Calculating distances in Solr using longitude latitude

2010-09-20 Thread Dennis Gearon
You know, if there were some sort of hexagonal/pentagonal, soccer-ball 
coordinate system for the Earth, all you'd need is an entry's distance to each 
of the 6/5 facets of the cell it was in, the distance between any two facets, 
and the distance from the endpoint to all of its facets. A giant table of 
precomputed distances, or some numbering system of coordinates that 
automatically gave the two facets and the distance between the faces, would be 
even better.

Then just look up the distances and add them.

Still waiting for the coordinate system though :-). If one could get it to 10 
meters resolution, wow.




Dennis Gearon

Signature Warning

EARTH has a Right To Life,
  otherwise we all die.

Read 'Hot, Flat, and Crowded'
Laugh at http://www.yert.com/film.php


--- On Mon, 9/20/10, Markus Jelsma  wrote:

> From: Markus Jelsma 
> Subject: RE: Re: Calculating distances in Solr using longitude latitude
> To: solr-user@lucene.apache.org
> Date: Monday, September 20, 2010, 1:00 PM
> Hi,
> 
>  
> 
> In the early Solr 1.3 times we had an index with
> leisure-time objects that included geographical coordinates.
> Based on certain conditions we had to display a specific
> list of nearby objects. We simply implemented some Great
> Circle calculations such as the distance between points [1]
> and aggregated nearby objects and sent then to our index.
> The drawback is that for each addition to the index, you'd
> have to recalculate all other nearby objects, that takes a
> while. The good thing is, in production, the system isn't
> slowed down by these calculations so it's very fast.
> 
>  
> 
> [1]: http://williams.best.vwh.net/avform.htm#Dist

> 
>  
> 
> Cheers,
>  
> -Original message-
> From: Dennis Gearon 
> Sent: Mon 20-09-2010 19:42
> To: solr-user@lucene.apache.org;
> 
> Subject: Re: Calculating distances in Solr using longitude
> latitude
> 
> Hmmm,
>     I am about to put a engineer on our search engine
> requirements with the assumption that latitude/longitude is
> available in the current release of Solr, (not knowing what
> that is). 
> 
>     I have been partitioning the whole Solr thing to
> him,except enough info for me to understand and interface to
> his work. So, I don't have that answer.
> 
>     Can someone else answer him?
> 
> 
> Dennis Gearon
> 
> Signature Warning
> 
> EARTH has a Right To Life,
>  otherwise we all die.
> 
> Read 'Hot, Flat, and Crowded'
> Laugh at http://www.yert.com/film.php

> 
> 
> --- On Mon, 9/20/10, PeterKerk 
> wrote:
> 
> > From: PeterKerk 
> > Subject: Re: Calculating distances in Solr using
> longitude latitude
> > To: solr-user@lucene.apache.org
> > Date: Monday, September 20, 2010, 6:53 AM
> > 
> > Hi Dennis,
> > 
> > Good suggestion, but I see that most of that is Solr
> 4.0
> > functionality,
> > which has not been released yet.
> > How can I still use the longitude latitude
> functionality
> > (LatLonType)?
> > 
> > Thanks!
> > -- 
> > View this message in context: 
> > http://lucene.472066.n3.nabble.com/Calculating-distances-in-Solr-using-longitude-latitude-tp1524297p1529097.html

> > Sent from the Solr - User mailing list archive at
> > Nabble.com.
> > 
>


Re: Calculating distances in Solr using longitude latitude

2010-09-20 Thread Lance Norskog
There is a third-party add-on for Solr 1.4 called LocalSolr. It has a 
different API than the upcoming SpatialSearch stuff, and will probably 
not live on in future releases.


The LatLonType stuff is definitely only on the trunk, not even 3.x.

PeterKerk wrote:

Hi Dennis,

Good suggestion, but I see that most of that is Solr 4.0 functionality,
which has not been released yet.
How can I still use the longitude latitude functionality (LatLonType)?

Thanks!
   


Re: Solr for statistical data

2010-09-20 Thread Lance Norskog

Does this do what you want?

http://wiki.apache.org/solr/StatsComponent

I can see that "group by" is a possible enhancement to this component.
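As a sketch, a 1.4-style StatsComponent request returns sum/mean/etc. over a numeric field, and `stats.facet` breaks those stats down per facet value, which is the closest built-in thing to a GROUP BY. The field names below (`amount`, `region`) are made up for illustration.

```python
from urllib.parse import urlencode

# Hypothetical StatsComponent request: numeric stats over "amount",
# broken down per distinct value of "region".
params = [
    ("q", "*:*"),
    ("rows", "0"),             # we only want the stats, not the documents
    ("stats", "true"),
    ("stats.field", "amount"),
    ("stats.facet", "region"),
]
query_string = urlencode(params)
print(query_string)
```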

Kjetil Ødegaard wrote:

Hi all,


we're currently using Solr 1.4.0 in a project for statistical data, where we
group and sum a number of "double" values. Probably not what most people use
Solr for, but it seems to be working fine for us :-)


We do have some challenges, especially with memory use, so I thought I'd
check here if anybody has done something similar.


Some details:


- The index is currently around 30 GB and growing. The data is indexed
directly from a database, each row ends up as a document. I think we have
around 100 million documents now, the largest core is about 40 million. The
data is split in different cores for different statistics data.


- Heap size is currently 4 GB. We're currently running all the cores in a
single JVM on WebSphere (WAS) 6.1. We have a couple of GB left for OS disk
cache. Initially we used a 1 GB heap, so we had to split cores in different
shards in order to avoid OutOfMemoryErrors because of the FieldCache (I
think).


- The grouping is done by a custom Solr component which takes parameters
that specify which fields to group by (like in SQL) and sums up values for
the group. This uses the FieldCache for speedy retrieval. We did a PoC on
using Documents instead, but this seemed to go a lot slower. I've done a
memory dump and the combined FieldCache looks to be about 3 GB (taken with a
grain of salt since I'm not sure all the data was cached).
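The grouping step itself is conceptually simple once the per-document values are in memory as flat, parallel arrays, which is the shape the FieldCache hands back. A rough Python sketch of the group-and-sum logic, ignoring the Solr/Lucene APIs entirely:

```python
from collections import defaultdict

def group_and_sum(group_keys, values):
    """Sum `values` per distinct key, like SQL's GROUP BY ... SUM(...).

    group_keys and values are parallel per-document arrays, analogous to
    what the FieldCache returns for a string field and a double field.
    """
    totals = defaultdict(float)
    for key, value in zip(group_keys, values):
        totals[key] += value
    return dict(totals)

# Toy data: five "documents" with a group field and a double field.
keys = ["east", "west", "east", "north", "west"]
vals = [1.5, 2.0, 3.5, 1.0, 4.0]
print(group_and_sum(keys, vals))  # {'east': 5.0, 'west': 6.0, 'north': 1.0}
```

A single pass over the arrays like this is why the FieldCache approach beats fetching stored Documents: no per-document disk reads, just sequential array access.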


I guess this is different from normal Solr searches, since we have to
process all the documents in a core in order to calculate results; we
can't just return the first 10 (or whatever) documents.


Any tips or similar experiences?



---Kjetil



Re: Solr Analyzer results before the actual query.

2010-09-20 Thread Lance Norskog
Yes. Look at the JSP page solr/admin/analysis.jsp . It makes calls to 
Solr that do exactly what you want, using the AnalysisComponent.
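If I remember the page's parameters correctly (`name` for the field, `val` for an index-time value, `qval` for a query-time value; worth double-checking against your Solr version), the same output can be requested over HTTP, e.g.:

```python
from urllib.parse import urlencode

# Hypothetical analysis.jsp call: show how the "title" field's analyzer
# chain transforms an index-time value and a query-time value.
params = [
    ("name", "title"),        # field (or field type) to analyze
    ("val", "Running dogs"),  # index-time text
    ("qval", "run"),          # query-time text
    ("verbose", "on"),        # show output of each tokenizer/filter stage
]
url = "http://localhost:8983/solr/admin/analysis.jsp?" + urlencode(params)
print(url)
```

Fetching that URL and inspecting the result before issuing the real query is one way to get the "analysis first, search second" flow asked about below.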


Lance

zackko wrote:

Hi to all the Forum from a new subscriber,

I’m working on the server-side search solution of the company where I’m
currently employed. I have a problem at the moment: when I submit a
search to Solr, I want to see the “analyzer results” for the search
terms (query), with all the filters applied as defined in types.xml. I
want the analyzer output displayed BEFORE the actual search is
performed, so that at that point I can decide whether to run the proper
search or leave the user with no results.

The problem is more or less described in this issue:
https://issues.apache.org/jira/browse/SOLR-261. In summary: is it
possible to obtain the analyzer results (in code) before running the
actual Solr search?

I'm quite new to Solr, so maybe this issue has already been discussed in
another thread, but I'm unable to find it at the moment. If anybody has
any clue on how to do this, any suggestion will be more than welcome.

Thanks very much in advance for your answer.

Best wishes.



Re: Calculating distances in Solr using longitude latitude

2010-09-20 Thread Dennis Gearon
What's the timeline on that?

For now, we write our own functions and sort by them?
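Until the built-in support lands, a client-side great-circle (haversine) distance is easy enough to compute and sort by. A sketch, with toy coordinates:

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two lat/lon points (haversine)."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

# Sort result rows (lat, lon, id) by distance from a query point.
docs = [(52.37, 4.90, "amsterdam"), (48.86, 2.35, "paris"), (51.51, -0.13, "london")]
origin = (50.85, 4.35)  # brussels
ranked = sorted(docs, key=lambda d: haversine_km(origin[0], origin[1], d[0], d[1]))
print([d[2] for d in ranked])
```

The obvious catch is that this sorts on the client after fetching candidates, so it only works when the candidate set is small enough to pull back in one request.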

Dennis Gearon

Signature Warning

EARTH has a Right To Life,
  otherwise we all die.

Read 'Hot, Flat, and Crowded'
Laugh at http://www.yert.com/film.php


--- On Mon, 9/20/10, Lance Norskog  wrote:

> From: Lance Norskog 
> Subject: Re: Calculating distances in Solr using longitude latitude
> To: solr-user@lucene.apache.org
> Date: Monday, September 20, 2010, 9:40 PM
> There is a third-party add-on for
> Solr 1.4 called LocalSolr. It has a different API than the
> upcoming SpatialSearch stuff, and will probably not live on
> in future releases.
> 
> The LatLonType stuff is definitely only on the trunk, not
> even 3.x.
> 
> PeterKerk wrote:
> > Hi Dennis,
> > 
> > Good suggestion, but I see that most of that is Solr
> 4.0 functionality,
> > which has not been released yet.
> > How can I still use the longitude latitude
> functionality (LatLonType)?
> > 
> > Thanks!
> >