Re: return matched terms / fuzzy or wildcard searches

2007-03-24 Thread Krystian Napiatek

My Solr-Server: http://www.captionsearch.de/solr.html
Every time you make a new search, the latest response file is available here:
http://www.captionsearch.de/response.xml


2007/3/24, Chris Hostetter <[EMAIL PROTECTED]>:



: > Perhaps our use of ConstantScorePrefixQuery by default?
:
: Ah, that would probably explain it!   I had stumbled on this before
: too and went to fix it and saw the rewrite in there and was
: perplexed, but then got distracted by something shiny.

yeah, that makes sense ... a true wildcard query works fine...

http://localhost:8983/solr/select/?q=id:V???B*&fl=id&hl=true&hl.fl=id


To answer your question Krystian: it's supposed to work for you. For
fuzzy queries (like: dna~0.7) and wildcard queries (like: d?a) it
should currently be working fine ... please send us an example Solr URL
that doesn't work if that's not what you are observing.

Only a simple prefix query (like: dn*) doesn't work ... and that seems to
be because of the way we optimize a PrefixQuery into a
ConstantScorePrefixQuery .. a workaround is to always include a "?" in
your query when you want highlighting -- so instead of dn* search for dn?*
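
A minimal client-side sketch of that workaround, not part of Solr itself. The
helper names (highlightable_query, select_url) are hypothetical; only the
select URL and its parameters come from the example above. Note that dn?* is
not strictly equivalent to dn*: "?" matches exactly one character, so a term
that is exactly "dn" no longer matches.

```python
from urllib.parse import urlencode

WILDCARDS = "*?"


def highlightable_query(term: str) -> str:
    """Rewrite a plain prefix term such as 'dn*' into 'dn?*'.

    Terms that already contain another wildcard (d?a, V???B*) or that are
    not prefix queries at all (dna~0.7) are returned unchanged.
    """
    is_plain_prefix = (
        len(term) > 1
        and term.endswith("*")
        and not any(c in WILDCARDS for c in term[:-1])
    )
    return term[:-1] + "?*" if is_plain_prefix else term


def select_url(base: str, field: str, term: str) -> str:
    """Build a select URL with highlighting enabled on `field`."""
    params = {
        "q": f"{field}:{highlightable_query(term)}",
        "fl": field,
        "hl": "true",
        "hl.fl": field,
    }
    return f"{base}/select/?{urlencode(params)}"


if __name__ == "__main__":
    print(select_url("http://localhost:8983/solr", "id", "dn*"))
```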


-Hoss




schema field type doesn't work

2007-03-24 Thread Dimitar Ouzounov

Hi everybody,
I added the following fieldtype in schema.xml :


  
 
 
 
 
 
  

I want to index two types of strings, for example :

12345678
1234-5678

No matter which of the above strings is stored, I'd like to match it by
using either 12345678 or 1234-5678.
Everything is working fine, except for the case when 12345678 is stored and
I try to match it using
1234-5678. I must be doing something wrong, maybe in the schema. Does anyone
have any suggestions?
Any help would be greatly appreciated.


Re: schema field type doesn't work

2007-03-24 Thread Bertrand Delacretaz

On 3/24/07, Dimitar Ouzounov <[EMAIL PROTECTED]> wrote:


...I must be doing something wrong, maybe in the schema. Does anyone
have any suggestions?..


The best way to debug such problems is with the analyzer admin tool:
http://localhost:8983/solr/admin/analysis.jsp

You can try various combinations of analyzers and see what Solr
actually indexes for various values.

HTH,
-Bertrand


Re: schema field type doesn't work

2007-03-24 Thread Yonik Seeley

On 3/24/07, Bertrand Delacretaz <[EMAIL PROTECTED]> wrote:

On 3/24/07, Dimitar Ouzounov <[EMAIL PROTECTED]> wrote:

> ...I must be doing something wrong, maybe in the schema. Does anyone
> have any suggestions?..

The best way to debug such problems is with the analyzer admin tool:
http://localhost:8983/solr/admin/analysis.jsp


Yep...
trying the analysis page, one can see that parts of the numbers (not
just the catenation) are also still being generated, messing up the
query.

So if 123-456 is indexed, and you also want to be able to match parts
of that number (like 123), then you need a query analyzer and an index
analyzer for the field type, and turn off generation of parts for the
query analyzer only.

If you don't want to match parts, then a single analyzer for both
query and indexing will do, but explicitly turn off part generation:
   

-Yonik
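
The effect described above can be illustrated with a toy model. This is a
sketch under stated assumptions, not Solr's actual filter: the function
word_delimiter and its parameters are hypothetical stand-ins for a
word-delimiter style analysis step, and matches() assumes that every token
produced from the query text must be present in the index.

```python
import re


def word_delimiter(text, generate_number_parts=True, catenate_numbers=True):
    """Toy analysis step: split on non-alphanumerics, optionally keep the
    parts and/or the catenation of all parts."""
    parts = [p for p in re.split(r"[^0-9A-Za-z]+", text) if p]
    tokens = []
    # A value with no delimiter is always kept as a single token.
    if generate_number_parts or len(parts) == 1:
        tokens.extend(parts)
    if catenate_numbers and len(parts) > 1:
        tokens.append("".join(parts))
    return tokens


def matches(index_tokens, query_tokens):
    """Simplified matching: all query tokens must have been indexed."""
    indexed = set(index_tokens)
    return bool(query_tokens) and all(t in indexed for t in query_tokens)


if __name__ == "__main__":
    # Parts generated on both sides: the query asks for 1234 and 5678,
    # which were never indexed for the stored value 12345678.
    print(matches(word_delimiter("12345678"), word_delimiter("1234-5678")))

    # Parts turned off on both sides: only the catenation is compared.
    idx = word_delimiter("12345678", generate_number_parts=False)
    qry = word_delimiter("1234-5678", generate_number_parts=False)
    print(matches(idx, qry))

    # Parts kept at index time only: 123 still matches the stored 123-456.
    idx = word_delimiter("123-456")
    qry = word_delimiter("123", generate_number_parts=False)
    print(matches(idx, qry))
```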


Re: schema field type doesn't work

2007-03-24 Thread Dimitar Ouzounov

Thanks a lot! The analyzer admin tool is indeed useful.

On 3/24/07, Yonik Seeley <[EMAIL PROTECTED]> wrote:


On 3/24/07, Bertrand Delacretaz <[EMAIL PROTECTED]> wrote:
> On 3/24/07, Dimitar Ouzounov <[EMAIL PROTECTED]> wrote:
>
> > ...I must be doing something wrong, maybe in the schema. Does anyone
> > have any suggestions?..
>
> The best way to debug such problems is with the analyzer admin tool:
> http://localhost:8983/solr/admin/analysis.jsp

Yep...
trying the analysis page, one can see that parts of the numbers (not
just the catenation) are also still being generated, messing up the
query.

So if 123-456 is indexed, and you also want to be able to match parts
of that number (like 123), then you need a query analyzer and an index
analyzer for the field type, and turn off generation of parts for the
query analyzer only.

If you don't want to match parts, then a single analyzer for both
query and indexing will do, but explicitly turn off part generation:
   

-Yonik



Re: sorting question

2007-03-24 Thread shai deljo

True, but let me ask the question in a different way.
The problem is that when I run the query and order by date, the
most recent results are not relevant enough (in general I find I need
to do work on top of what Solr provides in order to get good
relevancy), so I guess I'm looking more for a threshold to retrieve
results only above a certain score, and I need this threshold to be
adaptive. I.e. it's not about the number of results to retrieve, since I
want as many as possible so I have a better chance of getting the most
recent ones, but more about getting all the results that are relevant
enough.
When I display results sorted by score this is not a problem because
all these results hide on page number X (X is big).

I can think of several hacks (e.g calculating the distribution of
results myself) to do this but was wondering if there is a proper
solution.
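
One such hack could look like the sketch below. It is only an illustration
of an adaptive cutoff, not a Solr feature: the function name, the min_ratio
parameter and the "score"/"date" keys are assumptions about how the client
holds the results, and the cutoff is defined relative to the best score in
the result set.

```python
def relevant_then_recent(docs, min_ratio=0.5):
    """Keep docs scoring at least min_ratio * (best score), newest first.

    Each doc is a dict with a numeric "score" and an ISO 8601 "date"
    string, so dates compare correctly as plain strings.
    """
    if not docs:
        return []
    best = max(d["score"] for d in docs)
    kept = [d for d in docs if d["score"] >= min_ratio * best]
    return sorted(kept, key=lambda d: d["date"], reverse=True)


if __name__ == "__main__":
    results = [
        {"id": "a", "score": 2.0, "date": "2007-01-10T00:00:00Z"},
        {"id": "b", "score": 1.2, "date": "2007-03-20T00:00:00Z"},
        {"id": "c", "score": 0.3, "date": "2007-03-23T00:00:00Z"},
    ]
    print([d["id"] for d in relevant_then_recent(results)])
```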
Thx

On 3/23/07, Chris Hostetter <[EMAIL PROTECTED]> wrote:


: Is there a way (in 1 query) to retrieve the best scoring X results and
: then sort them by another field (date  for example)?

not at the moment.

keep in mind, this is the type of thing that can be done easily on the
client side -- pull back the top X results sorted by score, then sort by
date.
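
A minimal sketch of that client-side step, assuming the results have already
been fetched and parsed into dicts; the function name and the "score"/"date"
keys are hypothetical.

```python
def top_by_score_then_date(docs, x):
    """Take the x best-scoring docs, then order them newest first."""
    best = sorted(docs, key=lambda d: d["score"], reverse=True)[:x]
    # ISO 8601 date strings sort chronologically as plain strings.
    return sorted(best, key=lambda d: d["date"], reverse=True)


if __name__ == "__main__":
    results = [
        {"id": "a", "score": 2.0, "date": "2007-01-10T00:00:00Z"},
        {"id": "b", "score": 1.2, "date": "2007-03-20T00:00:00Z"},
        {"id": "c", "score": 0.3, "date": "2007-03-23T00:00:00Z"},
    ]
    print([d["id"] for d in top_by_score_then_date(results, 2)])
```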



-Hoss




Re: Backup and distributed index/backup management

2007-03-24 Thread al patel

Reposting :)

Hi:


I am a novice with Solr in terms of backup/operations.

We have a single instance of master (solr) working well, I tried the
backup scripts etc and could get things working fine.

My question is, even with backup, solr will still have a single index,
right? We will have a huge amount of data in the index - it is ever increasing.

I want to archive older data - say every 2 weeks and start a new index -
but want the older indices to be searchable.

I can potentially take a snapshot at master at 2 week interval, backup and
restart master with fresh index.

On the slaves, where the actual searches happen, how do I deal with things
- won't there be multiple indices there then?

Does solr handle this - how? Or how do I solve this problem? Open to other
suggestions too.

Best Regards
-al



Re: Backup and distributed index/backup management

2007-03-24 Thread Chris Hostetter

: My question is, even with backup, solr will still have a single index,
: right? We will have huge amount of data in index - it is ever increasing.

if you have older docs you want to retire out of your index, you'll need
to do that manually (delete by query can come in handy)
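
A rough sketch of building such a delete-by-query message, under stated
assumptions: the date field name "timestamp", the update URL and both helper
names are hypothetical, and the range syntax should be checked against your
own schema. The message is POSTed to the update handler and needs a
<commit/> afterwards before the deletions become visible to searches.

```python
from datetime import datetime, timedelta, timezone
from urllib.request import Request, urlopen
import xml.etree.ElementTree as ET


def delete_older_than(now, days, field="timestamp"):
    """Return a <delete><query>...</query></delete> message that targets
    documents whose `field` is older than `days` before `now`."""
    cutoff = (now - timedelta(days=days)).strftime("%Y-%m-%dT%H:%M:%SZ")
    delete = ET.Element("delete")
    ET.SubElement(delete, "query").text = f"{field}:[* TO {cutoff}]"
    return ET.tostring(delete, encoding="unicode")


def post_update(url, xml_message):
    """POST an update message to a Solr update handler (not called here)."""
    request = Request(
        url,
        data=xml_message.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
    )
    with urlopen(request) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    now = datetime(2007, 3, 24, tzinfo=timezone.utc)
    print(delete_older_than(now, 14))
    # To apply it against an assumed local master:
    # post_update("http://localhost:8983/solr/update", delete_older_than(now, 14))
    # post_update("http://localhost:8983/solr/update", "<commit/>")
```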

: I want to archive older data - say every 2 weeks and start a new index - but
: want the older indices to be searchable.
:
: I can potentially take a snapshot at master at 2 week interval, backup and
: restart master with fresh index.

you don't really need to restart the master ... you could pull snapshots
from your master to a slave, and then when you decide that slave is "full"
of old docs you stop pulling snapshots, and delete the old docs from your
master and start replicating to a new slave.

: Does solr handle this - how? Or how do I solve this problem? Open to other
: suggestions too.

what you're describing is fairly outside of what i would consider "normal"
Solr usage .. it seems very special purpose.



-Hoss



Re: Using cocoon to update index

2007-03-24 Thread Chris Hostetter

: Is anyone using cocoon to index data? I'm trying to do this via cincludes
: but I have had no luck. If you are using cocoon, and are POSTing data to
: solr via a pipeline, would you share an example of how you have things

you may want to take a look at the Forrest plugin Thorsten wrote, or the
Cocoon/Solr/Subversion presentation Bertrand gave at the Cocoon
2006 GetTogether...

http://forrest.apache.org/pluginDocs/plugins_0_80/org.apache.forrest.plugin.output.solr/
http://wiki.apache.org/solr/SolrResources
http://wiki.apache.org/cocoon-data/attachments/GT2006Notes/attachments/13-SubversionSolr.pdf




-Hoss



Re: return matched terms / fuzzy or wildcard searches

2007-03-24 Thread Mike Klaas

On 3/23/07, Chris Hostetter <[EMAIL PROTECTED]> wrote:


Only a simple prefix query (like: dn*) doesn't work ... and that seems to
be because of the way we optimize a PrefixQuery into a
ConstantScorePrefixQuery .. a workaround is to always include a "?" in
your query when you want highlighting -- so instead of dn* search for dn?*


Note that you need a recent nightly build for that to work--it
wasn't there for the last release.

-Mike


RE: Using cocoon to update index

2007-03-24 Thread Binkley, Peter

I've blogged a method of doing this using Cocoon's webdav transformer:
http://www.wallandbinkley.com/quaedam/?p=104
 
Peter
 



From: Winona Salesky [mailto:[EMAIL PROTECTED]
Sent: Fri 3/23/2007 12:14 PM
To: solr-user@lucene.apache.org
Subject: Using cocoon to update index



Hi,
Is anyone using cocoon to index data? I'm trying to do this via cincludes
but I have had no luck. If you are using cocoon, and are POSTing data to
solr via a pipeline, would you share an example of how you have things
working.
Thanks for the help,
-Winona

-
Winona Salesky
The University of Vermont Libraries
[EMAIL PROTECTED]