return matched terms / fuzzy or wildcard searches

2007-03-23 Thread Krystian Napiatek

Hi,

is it possible to get a list of all matched terms, when using queries like:
dna~0.7; d?a; dn*;
I need the terms for highlighting them later in the output.

Thank you && greets
Krystian


Re: multiple indexes

2007-03-23 Thread Maarten . De . Vilder
> Why not create a multivalued field that stores the customer perms?
> add has_access:cust1 has_access:cust2, etc to the document at index
> time, and turn this into a filter query at query time?

that is what we are doing at the moment, and i must say, it works very well
and does not slow the server down at all (because of the efficient indexes
that Solr builds)
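
A minimal sketch of the query-time half of this approach, assuming the
request handler in use accepts an fq (filter query) parameter. The field
name has_access comes from the suggestion quoted above; the helper name,
the sample query and the customer id are made up for illustration, and the
base URL is the example-server address used elsewhere on this list.

```python
from urllib.parse import urlencode

# Example-server address; adjust to your own deployment.
SOLR_SELECT = "http://localhost:8983/solr/select"


def build_customer_query(user_query, customer_id):
    """Return a select URL restricted to documents one customer may see.

    Hypothetical helper: customer_id is inserted as-is, so it is assumed
    to contain no characters that are special to the query parser.
    """
    params = [
        ("q", user_query),
        # The filter narrows the result set; the user query is untouched.
        ("fq", "has_access:%s" % customer_id),
    ]
    return SOLR_SELECT + "?" + urlencode(params)


url = build_customer_query("ipod", "cust1")
print(url)
```

Because the restriction travels as a separate parameter, the application
only has to append the customer id of the logged-in user to every request;
the documents themselves carry one has_access value per permitted customer.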





"Mike Klaas" <[EMAIL PROTECTED]> 
wrote on 22/03/2007 19:15 to solr-user@lucene.apache.org
(subject: Re: multiple indexes):






On 3/22/07, Kevin Osborn <[EMAIL PROTECTED]> wrote:
> Here is an issue that I am trying to resolve. We have a large catalog of
> documents, but our customers (several hundred) can only see a subset of
> those documents. And the subsets vary in size greatly. And some of these
> customers will be creating a lot of traffic. Also, there is no way to map
> the subsets to a query. The customer either has access to a document or
> they don't.
>
> Has anybody worked on this issue before? If I use one large index and do
> the filtering in my application, then Solr will be serving a lot of
> useless documents. The counts would also be screwed up for facet queries.
> Is the best solution to extend Solr and do the filtering there?
>
> The other potential solution is to have one index per customer. This
> would require one instance of the servlet per index, correct? It just
> seems like this would require a lot of hardware and complexity
> (configuring the memory of each servlet instance to index size and
> traffic).

Why not create a multivalued field that stores the customer perms?
add has_access:cust1 has_access:cust2, etc to the document at index
time, and turn this into a filter query at query time?

-Mike



Re: return matched terms / fuzzy or wildcard searches

2007-03-23 Thread Erik Hatcher


On Mar 23, 2007, at 5:44 AM, Krystian Napiatek wrote:
> is it possible to get a list of all matched terms, when using
> queries like:
> dna~0.7; d?a; dn*;
> I need the terms for highlighting them later in the output.


Will the built-in highlighting capability help you here?






Re: return matched terms / fuzzy or wildcard searches

2007-03-23 Thread Krystian Napiatek

Yes I do:
...&hl=on&hl.fl=figure&hl.fragsize=0&hl.snippets=200&hl.simple.pre=&hl.simple.post=...

But the response isn't highlighted using fuzzy or wildcard searches...


2007/3/23, Erik Hatcher <[EMAIL PROTECTED]>:



On Mar 23, 2007, at 5:44 AM, Krystian Napiatek wrote:
> is it possible to get a list of all matched terms, when using
> queries like:
> dna~0.7; d?a; dn*;
> I need the terms for highlighting them later in the output.

Will the built-in highlighting capability help you here?

   





Re: SOLR hosting

2007-03-23 Thread Tim Archambault

Is your question inherently asking if someone out there provides a service
that manages the indexes, etc for you and pre-installs and configures the
software?

If NOT, I can tell you that I bought a Linux VPS at Hostmysite.com cheaply
and dedicated 1 virtual domain to my SOLR instance and it worked fairly
easily. I'm no tech expert and got it to run.

Hope that helps.

Tim

On 3/21/07, Michael Kimsal <[EMAIL PROTECTED]> wrote:


Are there any companies that offer hosted SOLR services?

If not, is there any interest in the community in a service like this?


--
Michael Kimsal
http://webdevradio.com



Editing wiki-page "Powerd by Solr"

2007-03-23 Thread Fabio Confalonieri

I have a problem posting an update to the Powered By Solr wiki page.

I would like to add the line:
 * [http://annunci.repubblica.it La Repubblica Newspaper Classifieds] (in
Italian) uses Solr for faceted browsing/filtering through classifieds of one
of the main Italian Newspapers

But I receive this error:
Sorry, can not save page because "annunci.repubblica.it" is not allowed in
this wiki.

I understand "annunci.repubblica.it" is somehow blacklisted, but I cannot
work out why.

Sorry for posting here, I could not find a reference on wiki
posting/editing.

Thank You

Fabio Confalonieri




-- 
View this message in context: 
http://www.nabble.com/Editing-wiki-page-%22Powerd-by-Solr%22-tf3454859.html#a9638264
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Editing wiki-page "Powerd by Solr"

2007-03-23 Thread Tim Archambault

Fabio,

Off topic, but thanks for the link to your newspaper classifieds. I manage
a newspaper website here in Maine, USA and am VERY INTERESTED in using Solr
to power our jobs, etc.

Looking to integrate SOLR with DRUPAL right now.

I'd like to collaborate with you in the future if possible.

Thank you kindly.

Tim

On 3/23/07, Fabio Confalonieri <[EMAIL PROTECTED]> wrote:



I have a problem posting an update to the Powered By Solr wiki page.

I would like to add the line:
* [http://annunci.repubblica.it La Repubblica Newspaper Classifieds] (in
Italian) uses Solr for faceted browsing/filtering through classifieds of
one
of the main Italian Newspapers

But I receive this error:
Sorry, can not save page because "annunci.repubblica.it" is not allowed in
this wiki.

I understand "annunci.repubblica.it" is somehow blacklisted, but I cannot
argue why.

Sorry for posting here, I could not find a reference on wiki
posting/editing.

Thank You

Fabio Confalonieri








Re: Editing wiki-page "Powerd by Solr"

2007-03-23 Thread Chris Hostetter

: But I receive this error:
: Sorry, can not save page because "annunci.repubblica.it" is not allowed in
: this wiki.
:
: I understand "annunci.repubblica.it" is somehow blacklisted, but I cannot
: argue why.

this appears to be a built-in feature of MoinMoin, there is a global
"BadContent" list maintained centrally...

http://moinmoin.wikiwikiweb.de/AntiSpamGlobalSolution
http://moinmaster.wikiwikiweb.de/BadContent

...it doesn't look like there is currently a way to whitelist things.

I've added a link to the main newspaper site instead, and clarified that
the classifieds use Solr.

-Hoss



Re: how to use snappuller

2007-03-23 Thread Chris Hostetter

: i should setup rsyncd.conf, if yes,how to setup and snappuller will be ok.

installing rsync is a little outside the scope of the Solr mailing list --
you'll want to check the documentation for rsync and rsyncd, you'll
probably want to look for info about running rsync over ssh with
passphraseless keys.

Also: i can't think of any good reason to run snappuller as root ... most
of your Solr distribution stuff should be running as a user with very low
privileges, ideally the only things it should have access to are the index
files.



-Hoss



Re: How to assure a permanent index.

2007-03-23 Thread Chris Hostetter

: Where can I find some information about snappulling?

http://wiki.apache.org/solr/CollectionDistribution

-Hoss



Re: return matched terms / fuzzy or wildcard searches

2007-03-23 Thread Chris Hostetter

: But the response isn't highlighted using fuzzy or wildcard searches...

Hmmm... this seems like a bug in the highlighting, using the sample schema
this highlights properly...

http://localhost:8983/solr/select/?q=id%3AVA902B&version=2.2&start=0&rows=10&indent=on&fl=id&hl=true&hl.fl=id

...but this does not...

http://localhost:8983/solr/select/?q=id%3AV*&version=2.2&start=0&rows=10&indent=on&fl=id&hl=true&hl.fl=id

perhaps the Solr highlighting code isn't calling rewrite() before using
the Highlighter?



-Hoss



Re: SOLR hosting

2007-03-23 Thread Michael Kimsal

Thanks.  Perhaps I should have clarified a bit.  I was looking more for the
first option.  And part of what I was asking for was to gauge some interest.
If there are no companies offering that, is there any demand for a service
like that?

On 3/23/07, Tim Archambault <[EMAIL PROTECTED]> wrote:


Is your question inherently asking if someone out there provides a service
that manages the indexes, etc for you and pre-installs and configures the
software?

If NOT, I can tell you that I bought a Linux VPS at Hostmysite.com cheaply
and dedicated 1 virtual domain to my SOLR instance and it worked fairly
easily. I'm no tech expert and got it to run.

Hope that helps.

Tim

On 3/21/07, Michael Kimsal <[EMAIL PROTECTED]> wrote:
>
> Are there any companies that offer hosted SOLR services?
>
> If not, is there any interest in the community in a service like this?
>
>
> --
> Michael Kimsal
> http://webdevradio.com
>





--
Michael Kimsal
http://webdevradio.com


Backup and distributed index/backup management

2007-03-23 Thread al patel

Hi:

I am a novice with Solr in terms of backup/operations.

We have a single master instance of Solr working well. I tried the backup
scripts etc. and could get things working fine.

My question is: even with backup, Solr will still have a single index,
right? We will have a huge amount of data in the index, and it is ever
increasing.

I want to archive older data - say every 2 weeks and start a new index - but
want the older indices to be searchable.

I can potentially take a snapshot at master at 2 week interval, backup and
restart master with fresh index.

On the slaves, where the actual searches happen, how do I deal with things -
won't there be multiple indices there then?

Does solr handle this - how? Or how do I solve this problem? Open to other
suggestions too.

Best Regards
-al


Using cocoon to update index

2007-03-23 Thread Winona Salesky
Hi,
Is anyone using Cocoon to index data? I'm trying to do this via cincludes
but I have had no luck. If you are using Cocoon and are POSTing data to
Solr via a pipeline, would you share an example of how you have things
working?
Thanks for the help,
-Winona

-
Winona Salesky
The University of Vermont Libraries
[EMAIL PROTECTED]



Re: return matched terms / fuzzy or wildcard searches

2007-03-23 Thread Mike Klaas

On 3/23/07, Chris Hostetter <[EMAIL PROTECTED]> wrote:
> : But the response isn't highlighted using fuzzy or wildcard searches...
>
> Hmmm... this seems like a bug in the highlighting, using the sample schema
> this highlights properly...
>
> http://localhost:8983/solr/select/?q=id%3AVA902B&version=2.2&start=0&rows=10&indent=on&fl=id&hl=true&hl.fl=id
>
> ...but this does not...
>
> http://localhost:8983/solr/select/?q=id%3AV*&version=2.2&start=0&rows=10&indent=on&fl=id&hl=true&hl.fl=id
>
> perhaps the Solr highlighting code isn't calling rewrite() before using
> the Highlighter?


It is, in trunk/:

NamedList sumData = HighlightingUtils.doHighlighting(
    results.docList, query.rewrite(req.getSearcher().getReader()),
    req, new String[]{defaultField});

Definitely a bug somewhere.  Does anyone more familiar with lucene see
why the above wouldn't be sufficient?

-Mike


Re: return matched terms / fuzzy or wildcard searches

2007-03-23 Thread Yonik Seeley

On 3/23/07, Mike Klaas <[EMAIL PROTECTED]> wrote:
> On 3/23/07, Chris Hostetter <[EMAIL PROTECTED]> wrote:
>>
>> : But the response isn't highlighted using fuzzy or wildcard searches...
>>
>> Hmmm... this seems like a bug in the highlighting, using the sample schema
>> this highlights properly...
>>
>> http://localhost:8983/solr/select/?q=id%3AVA902B&version=2.2&start=0&rows=10&indent=on&fl=id&hl=true&hl.fl=id
>>
>> ...but this does not...
>>
>> http://localhost:8983/solr/select/?q=id%3AV*&version=2.2&start=0&rows=10&indent=on&fl=id&hl=true&hl.fl=id
>>
>> perhaps the Solr highlighting code isn't calling rewrite() before using
>> the Highlighter?
>
> It is, in trunk/:
>
> NamedList sumData = HighlightingUtils.doHighlighting(
>     results.docList, query.rewrite(req.getSearcher().getReader()),
>     req, new String[]{defaultField});
>
> Definitely a bug somewhere.  Does anyone more familiar with lucene see
> why the above wouldn't be sufficient?


Perhaps our use of ConstantScorePrefixQuery by default?

-Yonik


Re: return matched terms / fuzzy or wildcard searches

2007-03-23 Thread Mike Klaas

On 3/23/07, Yonik Seeley <[EMAIL PROTECTED]> wrote:
> On 3/23/07, Mike Klaas <[EMAIL PROTECTED]> wrote:
>> Definitely a bug somewhere.  Does anyone more familiar with lucene see
>> why the above wouldn't be sufficient?
>
> Perhaps our use of ConstantScorePrefixQuery by default?


tracked here: http://issues.apache.org/jira/browse/SOLR-195

-Mike


Re: return matched terms / fuzzy or wildcard searches

2007-03-23 Thread Erik Hatcher


On Mar 23, 2007, at 3:26 PM, Yonik Seeley wrote:

> On 3/23/07, Mike Klaas <[EMAIL PROTECTED]> wrote:
>> On 3/23/07, Chris Hostetter <[EMAIL PROTECTED]> wrote:
>>>
>>> : But the response isn't highlighted using fuzzy or wildcard searches...
>>>
>>> Hmmm... this seems like a bug in the highlighting, using the sample schema
>>> this highlights properly...
>>>
>>> http://localhost:8983/solr/select/?q=id%3AVA902B&version=2.2&start=0&rows=10&indent=on&fl=id&hl=true&hl.fl=id
>>>
>>> ...but this does not...
>>>
>>> http://localhost:8983/solr/select/?q=id%3AV*&version=2.2&start=0&rows=10&indent=on&fl=id&hl=true&hl.fl=id
>>>
>>> perhaps the Solr highlighting code isn't calling rewrite() before using
>>> the Highlighter?
>>
>> It is, in trunk/:
>>
>> NamedList sumData = HighlightingUtils.doHighlighting(
>>     results.docList, query.rewrite(req.getSearcher().getReader()),
>>     req, new String[]{defaultField});
>>
>> Definitely a bug somewhere.  Does anyone more familiar with lucene see
>> why the above wouldn't be sufficient?
>
> Perhaps our use of ConstantScorePrefixQuery by default?


Ah, that would probably explain it!   I had stumbled on this before  
too and went to fix it and saw the rewrite in there and was  
perplexed, but then got distracted by something shiny.


Erik



sorting question

2007-03-23 Thread shai deljo

Is there a way (in 1 query) to retrieve the best scoring X results and
then sort them by another field (date  for example)?


Re: Setting "Solr Home" via JNDI on Tomcat Bundled with JBoss

2007-03-23 Thread Chris Hostetter

: with JBoss AS 4.0.5 GA.  There is plenty of help on the Solr Wiki about
: setting it up on Tomcat 5.5 Standalone, but no help on Tomcat 5.5 Bundled.

Sorry, i don't really know anything about JBoss.

You might want to start by tackling the JNDI problem separate from Solr
... make a simple little WAR containing a single JSP that echoes the value
of a JNDI variable ... and figure out the necessary JBoss/Tomcat config
to make that work (presumably the JBoss community can help since you'll
have a simple, portable test case that won't involve explaining Solr to
them)

if that works, but the same config for solr doesn't work ... well then
we go back to the drawing board.




-Hoss



Re: return matched terms / fuzzy or wildcard searches

2007-03-23 Thread Chris Hostetter

: > Perhaps our use of ConstantScorePrefixQuery by default?
:
: Ah, that would probably explain it!   I had stumbled on this before
: too and went to fix it and saw the rewrite in there and was
: perplexed, but then got distracted by something shiny.

yeah, that makes sense ... a true wildcard query works fine...

http://localhost:8983/solr/select/?q=id:V???B*&fl=id&hl=true&hl.fl=id


To answer your question Krystian: it's supposed to work for you. For
fuzzy queries (like: dna~0.7) and wildcard queries (like: d?a) it
should currently be working fine ... please send us an example Solr URL
that doesn't work if that's not what you are observing.

Only a simple prefix query (like: dn*) doesn't work ... and that seems to
be because of the way we optimize a PrefixQuery into a
ConstantScorePrefixQuery ... a workaround is to always include a "?" in
your query when you want highlighting -- so instead of dn* search for dn?*
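
If the queries are assembled by an application, the workaround can be
applied on the client before the request is sent. The following is only a
sketch: force_wildcard is a made-up helper name, and it handles a single
bare term, not a full query string with field names or boolean operators.

```python
import re


def force_wildcard(term):
    """Turn a pure prefix term such as 'dn*' into 'dn?*'.

    Terms that already contain '?' or '~', or that have no trailing '*',
    are returned unchanged.
    """
    if re.fullmatch(r"[^*?~\s]+\*", term):
        return term[:-1] + "?*"
    return term


for t in ("dn*", "d?a", "dna~0.7", "V???B*"):
    print(t, "->", force_wildcard(t))
```

Note that the rewritten term is slightly stricter than the original: "?"
stands for exactly one character, so dn?* no longer matches the bare term
"dn" itself.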


-Hoss



Re: sorting question

2007-03-23 Thread Chris Hostetter

: Is there a way (in 1 query) to retrieve the best scoring X results and
: then sort them by another field (date  for example)?

not at the moment.

keep in mind, this is the type of thing that can be done easily on the
client side -- pull back the top X results sorted by score, then sort by
date.
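
A rough sketch of that client-side step, assuming the top X hits have
already been fetched in score order and each carries a date string in the
form shown. The documents, the field name "date" and the helper name are
invented for the example.

```python
from datetime import datetime

# Made-up documents standing in for the top X hits, already ordered by score.
top_hits = [
    {"id": "a", "score": 2.7, "date": "2007-03-01T00:00:00Z"},
    {"id": "b", "score": 2.1, "date": "2007-03-20T00:00:00Z"},
    {"id": "c", "score": 1.9, "date": "2007-03-10T00:00:00Z"},
]


def resort_by_date(docs, newest_first=True):
    """Re-order already-fetched hits by their date field."""
    def parse(doc):
        return datetime.strptime(doc["date"], "%Y-%m-%dT%H:%M:%SZ")
    return sorted(docs, key=parse, reverse=newest_first)


ordered_ids = [d["id"] for d in resort_by_date(top_hits)]
print(ordered_ids)
```

The request itself only needs rows=X with the default score ordering; the
re-sort touches X documents at most, so it costs next to nothing.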



-Hoss



Re: sorting question

2007-03-23 Thread Walter Underwood
You could also promote recent results with a function query term.
I've done that for news sites, where "recency" is an important
part of relevancy.  --wunder
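
One possible shape for such a query, as a sketch only: it assumes a Solr
version whose standard query parser supports the _val_ hook and the recip
and rord functions, a date field named "date", and constants that would
need tuning for a real index. The helper name is made up.

```python
from urllib.parse import urlencode


def recency_boosted_query(user_query, date_field="date"):
    """Append a function query term that favours more recent documents.

    Assumption: the field name and the constants (1, 1000, 1000) are
    illustrative, not recommended values.
    """
    function = "recip(rord(%s),1,1000,1000)" % date_field
    return '%s _val_:"%s"' % (user_query, function)


q = recency_boosted_query("election")
print(q)
print(urlencode({"q": q}))
```

Unlike a re-sort by date, this keeps a single relevance ordering in which
recency is one contribution among others.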

On 3/23/07 4:59 PM, "Chris Hostetter" <[EMAIL PROTECTED]> wrote:

> 
> : Is there a way (in 1 query) to retrieve the best scoring X results and
> : then sort them by another field (date  for example)?
> 
> not at the moment.
> 
> keep in mind, this is the type of thing that can be done easily on the
> client side -- pull back the top X results sorted by score, then sort by
> date.
> 
> -Hoss