date:20061211

highlighting phrasal hits

2006-12-11 Thread Edward Garrett


hello,

i'm doing phrasal searches, and am not happy with how highlighting is done
by default.

if i search for something, like "w1 w2 w3", then correctly, only fields that
match perfectly will be found. however, when i specify highlighting with
hl=true&hl.fl=myfield, then two things don't work according to (my)
expectations:

1) "w1 w2 w3" is not highlighted as a whole, but rather the pieces are
highlighted. e.g. <em>w1</em> <em>w2</em> <em>w3</em>. really, the whole
thing should be contained within a single <em> element.

2) relatedly, and presumably for the same reason, all instances of "w1",
"w2" and "w3" in myfield are highlighted, even when they don't occur
together.

i can't see any possible reason for things working this way, but perhaps
SOLR is just following lucene here.

any thoughts appreciated,
edward

p.s. haven't actually tested the above against indexed english data, so it's
possible that it's an artifact of the data and analysis procedures i am
using.
--
Edward Garrett

Visiting Fellow (2006-07)
Endangered Languages Academic Programme
School of Oriental and African Studies
London, UK
0207 898 4536

Assistant Professor, Linguistics Program
Eastern Michigan University
612 Pray-Harrold Building
Ypsilanti, MI, USA

New SOLR installation problems

2006-12-11 Thread Andrew Nagy

I installed the 12-8 snapshot of solr on my 64bit RH AS server and 
whenever I go to the admin page I get the following error:


SEVERE: Servlet.service() for servlet jsp threw exception
java.lang.NoClassDefFoundError: Could not initialize class 
org.apache.solr.core.SolrCore


Any ideas as to what is causing this?

Thanks
Andrew

Re: New SOLR installation problems

2006-12-11 Thread Yonik Seeley


On 12/11/06, Andrew Nagy <[EMAIL PROTECTED]> wrote:

I installed the 12-8 snapshot of solr on my 64bit RH AS server and
whenever I go to the admin page I get the following error:

SEVERE: Servlet.service() for servlet jsp threw exception
java.lang.NoClassDefFoundError: Could not initialize class
org.apache.solr.core.SolrCore

Any ideas as to what is causing this?


Look through the logs of whatever servlet container you are using for
the first exception thrown.  It most likely has something to do with
not being able to find the solr config files.

-Yonik
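
A quick way to rule out the missing-config case is to check the directory Solr is expected to read from. The following is a minimal sketch, assuming a Solr home laid out as described in this thread (a conf subdirectory holding solrconfig.xml and schema.xml); the function name and the list of required files are illustrative assumptions, not part of Solr.

```python
from pathlib import Path

# Assumption: these are the config files expected under <solr home>/conf.
# Both file names appear in this digest; the list itself is illustrative.
REQUIRED_CONF_FILES = ("solrconfig.xml", "schema.xml")


def missing_config_files(solr_home):
    """Return the required config files that are absent under solr_home/conf."""
    conf_dir = Path(solr_home) / "conf"
    return [name for name in REQUIRED_CONF_FILES
            if not (conf_dir / name).is_file()]
```

Calling missing_config_files("solr") from the directory the servlet container is started in returns an empty list when both files are present; a non-empty list points at a path problem of the kind discussed here.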

Re: New SOLR installation problems

2006-12-11 Thread Andrew Nagy

Thanks Yonik for the reply. 
I am using tomcat, and there is nothing in the catalina.out file.  The 
access log just reports the same error I see in the browser which is 
reported below.
I am starting tomcat from my solr directory which has the data, bin and 
conf directories as subdirectories.  So the CWD should be correct.  I 
have this same setup on another server that I have been working on with 
no problem.  I'm kinda lost with this one.

Is there a setting in the solrconfig.xml file that I should be looking at?

Andrew

Yonik Seeley wrote:


On 12/11/06, Andrew Nagy <[EMAIL PROTECTED]> wrote:


I installed the 12-8 snapshot of solr on my 64bit RH AS server and
whenever I go to the admin page I get the following error:

SEVERE: Servlet.service() for servlet jsp threw exception
java.lang.NoClassDefFoundError: Could not initialize class
org.apache.solr.core.SolrCore

Any ideas as to what is causing this?



Look through the logs of whatever servlet container you are using for
the first exception thrown.  It most likely has something to do with
not being able to find the solr config files.

-Yonik

Re: New SOLR installation problems

2006-12-11 Thread Andrew Nagy

Never mind, I got it working now.  I had the paths set up incorrectly. 
Dumb++


Andrew


multiple collections

2006-12-11 Thread Andrew Nagy

I was wondering how I might create multiple collections that have 
different field sets under solr.  Would I have to have multiple 
implementations of solr running, or can I have more than one schema.xml 
file per "collection" ?


Thanks
Andrew

Re: Top Searches

2006-12-11 Thread Yonik Seeley


On 12/11/06, sangraal aiken <[EMAIL PROTECTED]> wrote:

I'm looking into creating something to track the top 10 - 20 searches that
run through Solr for a given period.


For offline processing, using log files is the simplest thing... the
code remains separated, you can do historical processing if you keep
the logs, and it doesn't affect live queries.

It depends on how fresh the info needs to be and how it will be used.

-Yonik
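
The offline approach can be sketched in a few lines. This is only an illustration, assuming each logged request line carries its parameters with a URL-encoded q= value; the log format, the regular expression, and the function name are assumptions, so adjust them to whatever your servlet container actually writes.

```python
import re
from collections import Counter
from urllib.parse import unquote_plus

# Assumption: the query appears as "q=<url-encoded text>" preceded by
# whitespace, "?" or "&". This deliberately does not match "fq=".
Q_PARAM = re.compile(r"(?:^|[?&\s])q=([^&\s]+)")


def top_searches(log_lines, n=10):
    """Count normalized query strings and return the n most frequent."""
    counts = Counter()
    for line in log_lines:
        match = Q_PARAM.search(line)
        if match:
            # Decode, trim and lowercase so "Solr" and "solr" count together.
            query = unquote_plus(match.group(1)).strip().lower()
            if query:
                counts[query] += 1
    return counts.most_common(n)
```

Passing an open log file as log_lines works because the function only iterates over lines. Restricting the input to the files for a given day or week gives the top searches for that period.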

Re: Top Searches

2006-12-11 Thread sangraal aiken

That's a great idea, thanks Yonik.

-Sangraal


Re: multiple collections

2006-12-11 Thread Chris Hostetter


: different field sets under solr.  Would I have to have multiple
: implementations of solr running, or can I have more than one schema.xml
: file per "collection" ?

currently the only supported way to do this is to run multiple instances of
the solr.war ... if you look at the various container specific wiki pages,
they each have information on how to run "Multiple Solr Webapps" in a
single container instance...

http://wiki.apache.org/solr/?action=fullsearch&value=Multiple+Solr+Webapps&fullsearch=Text



-Hoss

How to query a parent child relationship returning result set of parents?

2006-12-11 Thread Eric Van Dewoestine


We are currently using solr to index various types of content in our
system, several of which users can comment on.  What we would
like to do is issue a query on the top level content which also
searches the attached comments but only returns unique top level
documents as results, while still maintaining the option to search and
return comments as an alternative type of search for the user.

The simplest example would probably be that of a blog.  The blog could
be indexed as follows:

id: blog_intId
title: blog title
content: blog content

And the associated comments:

id: comment_intId
title: comment title
content: comment content
parentId: blog_intId

Given this type of layout, how would I go about querying and returning
a list of blogs which contain text in either the blog content or any
of the comments' content?

The only solutions I can come up with would be to:
1) aggregate comment content into the blog content index, allowing me
to query directly on the blog.  However we are expecting the site to
generate many comments, along the lines of hundreds and possibly
thousands.  This also has the downside of requiring duplicate content
in the index if we want to still permit users to search on and return
comments.

2) Use facets to get a list of parent items and issue an additional
query (or hit the database) to pull in the parent content.  Again,
this isn't an ideal solution since we would have to page the results
ourselves since solr's facet parameters don't support an offset.  This
possibly negates any optimizations solr may have for paging regular
queries.  Also, it forces us to issue a second round trip to either
solr or the database to get summary content to display in the search
results list.  It also seems like a poor use case for the facet
functionality in general.

3) Plug into the solr code and implement a custom request handler,
HitCollector, or ...?  I've spent some time digging into the solr code
and I don't see any obvious place to plug this type of functionality
in.  A major concern of mine is performance as well, so I want to
ensure that I can get at and modify the results prior to solr loading
any unnecessary content into memory.

Any thoughts on this are very appreciated.  Any kind of kick start,
pointer, or places to dig into would be very helpful.

--
eric
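
For illustration, the "unique top-level documents" requirement can be expressed as a client-side post-processing step over a ranked list of mixed blog and comment hits. This is a minimal sketch and not a Solr feature: it assumes each hit is a dict with an id field and, for comments, the parentId field from the layout above. The function name is hypothetical.

```python
def collapse_to_parents(hits):
    """Map ranked hits (blogs and comments) to a ranked list of unique blog ids.

    A comment contributes its parentId; a blog contributes its own id.
    Each parent is kept at the position of its best-ranked hit.
    """
    seen = set()
    parents = []
    for hit in hits:
        parent_id = hit.get("parentId") or hit["id"]
        if parent_id not in seen:
            seen.add(parent_id)
            parents.append(parent_id)
    return parents
```

Because the collapsing happens after the query runs, a page of N parents may need more than N raw hits, which is the same paging problem raised under option 2.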

Re: highlighting phrasal hits

2006-12-11 Thread Mike Klaas


On 12/11/06, Edward Garrett <[EMAIL PROTECTED]> wrote:

hello,

i'm doing phrasal searches, and am not happy with how highlighting is done
by default.

if i search for something, like "w1 w2 w3", then correctly, only fields that
match perfectly will be found. however, when i specify highlighting with
hl=true&hl.fl=myfield, then two things don't work according to (my)
expectations:

1) "w1 w2 w3" is not highlighted as a whole, but rather the pieces are
highlighted. e.g. <em>w1</em> <em>w2</em> <em>w3</em>. really, the whole
thing should be contained within a single <em> element.

2) relatedly, and presumably for the same reason, all instances of "w1",
"w2" and "w3" in myfield are highlighted, even when they don't occur
together.

i can't see any possible reason for things working this way, but perhaps
SOLR is just following lucene here.


Solr is using Lucene's built-in highlighter, which has the
deficiencies you mention.  There have been improved highlighting
approaches proposed; see
http://issues.apache.org/jira/browse/LUCENE-663 and
http://issues.apache.org/jira/browse/LUCENE-644.

Improving Solr's highlighting is something I am quite interested in
personally.  Unfortunately, this is an extremely busy time for me at
work, and I doubt that I'll have time to work on this in the near
future.

-Mike
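
The difference between the two behaviours can be shown with a toy example. This is a sketch only: it splits on whitespace, ignores analysis entirely, and is not how Lucene's highlighter is implemented. The function names and the <em> wrapper defaults are assumptions made for the illustration.

```python
def highlight_terms(text, phrase, pre="<em>", post="</em>"):
    """Wrap every token equal to any word of the phrase (the behaviour reported above)."""
    words = set(phrase.lower().split())
    return " ".join(
        f"{pre}{tok}{post}" if tok.lower() in words else tok
        for tok in text.split()
    )


def highlight_phrase(text, phrase, pre="<em>", post="</em>"):
    """Wrap only complete, in-order occurrences of the phrase, as one element."""
    tokens = text.split()
    words = phrase.lower().split()
    n = len(words)
    out = []
    i = 0
    while i < len(tokens):
        window = [t.lower() for t in tokens[i:i + n]]
        if n and window == words:
            # The whole phrase matched starting at token i.
            out.append(pre + " ".join(tokens[i:i + n]) + post)
            i += n
        else:
            out.append(tokens[i])
            i += 1
    return " ".join(out)
```

For the text "w1 x w1 w2 w3 y w3" and the phrase "w1 w2 w3", the first function marks all five matching tokens separately, while the second marks only the contiguous run as a single element.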

Re: Suggestion for solr.war

2006-12-11 Thread Chris Hostetter


:   <mime-type>application/xml</mime-type>
:   </mime-mapping>
:
: That causes the MIME type to get set explicitly and Firefox renders
: the page properly, with or without IE Tab.

thanks for the suggestion, i went ahead and added this to the Solr
web.xml, but i used "application/xslt+xml" instead since it seems to work
just as well, and has been in the W3C XSLT 2.0 docs since the May 2003
draft...

http://www.w3.org/TR/xslt20/

...I'm not really an XSLT or mime type expert however, so if anyone knows
a more generally compatible approach we can take please chime in.


-Hoss

try setting useFilterForSortedQuery to false

2006-12-11 Thread Yonik Seeley


People may want to consider changing the useFilterForSortedQuery
option from true to false (or commenting it out) in solrconfig.xml.
I believe it should result in a speedup for the average query that
sorts on something other than score.

Generating and using filters for base queries used to be a win due to:
1) a set of complex queries that were almost never sorted by score,
but the sort changed
2) bugs/slowness in Lucene when dealing with multi-segment indices

The deficiencies in Lucene have been corrected, and the introduction
of fq filter parameters that are cached separately goes a long way
toward mitigating #1.  The downside to this optimization is extra
work when the same base query isn't re-sorted often, and pollution of
the filterCache.  I've changed the default to false for the example
solrconfig.xml.

-Yonik
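
For anyone who wants to script the change rather than edit the file by hand, here is a minimal sketch. It assumes the option appears as a useFilterForSortedQuery element somewhere in solrconfig.xml; the function name is hypothetical, and the standard-library parser used here drops XML comments when it rewrites the document.

```python
import xml.etree.ElementTree as ET


def disable_filter_for_sorted_query(solrconfig_xml):
    """Return the config text with every useFilterForSortedQuery set to false."""
    root = ET.fromstring(solrconfig_xml)
    # iter() walks the whole tree, so the element's exact location does not matter.
    for element in root.iter("useFilterForSortedQuery"):
        element.text = "false"
    return ET.tostring(root, encoding="unicode")
```

Since comments are lost in the round trip, a hand edit is safer for a heavily annotated config; the sketch is mainly useful for checking or patching generated configs.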
