Is this solr 1.2 a final version?

2007-06-07 Thread Thierry Collogne

Hello,

I was just downloading solr and noticed that there is a 1.2 version
available. Is this the final 1.2 version?
Is this the version that is to be used?

Thank you,

Thierry


how to crawl when Solr is search engine?

2007-06-07 Thread Manoharam Reddy

I have just begun using Solr. I see that we have to insert documents
by posting XMLs to solr/update

I would like to know how Solr is used as a search engine in
enterprises. How do you crawl your intranet and pass the
information as XML to solr/update? Isn't this going to be slow,
putting all content into the index via HTTP POST requests that
require network sockets to be opened?

Isn't there any direct way to do the same thing without resorting to HTTP?


Re: how to crawl when Solr is search engine?

2007-06-07 Thread Ian Holsman

Hi Manoharam.

We use Nutch to do the crawl, and have used Sami's patch for Nutch
(http://blog.foofactory.fi/2007/02/online-indexing-integrating-nutch-with.html)
to integrate it with Solr. It works quite well for our needs.


If you are concerned about speed, Solr also has a CSV upload
facility that you might be able to use to load the data instead, but
we haven't found HTTP POST speed to be an issue for us.
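Both ingestion paths mentioned here go over HTTP. As a rough sketch of building and posting an XML update message (the Solr URL is the tutorial default; the field names are illustrative, not from this thread):

```python
# Sketch: build a Solr <add> update message and POST it to /solr/update.
# Assumes a Solr instance at the tutorial's default address; field names
# ("id", "title") are illustrative only.
import urllib.request
import xml.etree.ElementTree as ET

def build_update_xml(docs):
    """Build a Solr <add> update message from a list of field dicts."""
    add = ET.Element("add")
    for doc in docs:
        d = ET.SubElement(add, "doc")
        for name, value in doc.items():
            f = ET.SubElement(d, "field", name=name)
            f.text = str(value)
    return ET.tostring(add, encoding="unicode")

def post_update(xml_payload, url="http://localhost:8983/solr/update"):
    """POST an update message to Solr (requires a running instance)."""
    req = urllib.request.Request(
        url, data=xml_payload.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"})
    return urllib.request.urlopen(req).read()

payload = build_update_xml([{"id": "1", "title": "hello"}])
print(payload)
```

The same `post_update` helper would work for the CSV handler by changing the URL and content type, but the exact endpoint depends on your Solr configuration.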


Regards
Ian


Manoharam Reddy wrote:

I have just begun using Solr. I see that we have to insert documents
by posting XMLs to solr/update

I would like to know how Solr is used as a search engine in
enterprises. How do you do the crawling of your intranet and passing
the information as XML to solr/update. Isn't this going to be slow? To
put all content in the index via a HTTP POST request requiring network
sockets to be opened?

Isn't there any direct way to to do the same thing without resorting 
to HTTP?






Re: solr+hadoop = next solr

2007-06-07 Thread Ian Holsman

Yonik Seeley wrote:

On 6/6/07, Jeff Rodenburg <[EMAIL PROTECTED]> wrote:
In terms of the FederatedSearch wiki entry (updated last year), has 
there
been any progress made this year on this topic, at least something 
worthy of

being added or updated to the wiki page?


Priorities shifted, and I dropped it for a while.
I recently started working with a CNET group that may need it, so I
could start working on it again in the next few months.  Don't wait
for me if you have ideas though... I'll try to follow along and chime
in.

-Yonik


Hi Yonik,

We also have a need for federated search where I work, and are hoping
to get going on it in the next week or two.

The team will post to the list when they have something more concrete to 
add.


Re: how to crawl when Solr is search engine?

2007-06-07 Thread Manoharam Reddy

Thanks for your quick response.

This brings me to another question. As far as I know, Nutch can take
care of crawling as well as indexing. So why go through the hassle
of crawling with Nutch and then integrating it into Solr?

Another question I have: Solr provides search results in XML
format; are there any ready-made tools to convert them directly into
web pages for visitors to see?

On 6/7/07, Ian Holsman <[EMAIL PROTECTED]> wrote:

Hi Manoharam.

we use nutch to do the crawl, and have used sami's patch of nutch
(http://blog.foofactory.fi/2007/02/online-indexing-integrating-nutch-with.html
) to have it integrate with Solr. It works quite well for our needs.

If you are concerned with the speed, Solr also has a CSV upload
facility, which you might be able to use to upload the data into solr
that way, but we haven't found the HTTP Post speed to be an issue for us.

Regards
Ian





Re: how to crawl when Solr is search engine?

2007-06-07 Thread Ian Holsman

Manoharam Reddy wrote:

Thanks for your quick response.

This brings me to another question. As far as I know Nutch can take
care of crawling as well as indexing. Then why go through the hassle
of crawling through Nutch and integrating it into Solr?


I found Solr's caching and maintenance easier to use than nutch's. But 
that's just me.




Another question I have, Solr provides the search results in XML
format, any ready made tools to convert them directly to web pages for
visitors to see?


Yep... it's called XSLT. Most modern browsers can do the transform on the
client side.
Otherwise, there are server-side tools (Cocoon, I think, does this) to
do the transform on the server before sending it out.
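The XSLT approach can be sketched concretely. As an unverified example stylesheet (it assumes Solr's standard XML response format and an `id` string field; those names are my assumption, not from this thread):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Render each <doc> in a Solr <response> as an HTML list item -->
  <xsl:template match="/response">
    <html><body>
      <ul>
        <xsl:for-each select="result/doc">
          <li><xsl:value-of select="str[@name='id']"/></li>
        </xsl:for-each>
      </ul>
    </body></html>
  </xsl:template>
</xsl:stylesheet>
```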


--Ian









Re: how to crawl when Solr is search engine?

2007-06-07 Thread Manoharam Reddy

Pardon me if I am taking too much of your time.

It would be really great if you could highlight a few advantages of
Solr's caching and maintenance over Nutch's.

Some musing:
(I have used Nutch before, and one thing I observed was that if I
delete the crawl folder while Nutch is running, users can still search
and obtain proper results. It seems Nutch caches all the indexes in
memory when it starts. I don't understand how that is feasible
when the size of the crawl is on the order of 10 GB, whereas you have
only a few GB of RAM + swap.)

How is Solr caching better than this?

On 6/7/07, Ian Holsman <[EMAIL PROTECTED]> wrote:

Manoharam Reddy wrote:
> Thanks for your quick response.
>
> This brings me to another question. As far as I know Nutch can take
> care of crawling as well as indexing. Then why go through the hassle
> of crawling through Nutch and integrating it into Solr?

I found Solr's caching and maintenance easier to use than nutch's. But
that's just me.

>
> Another question I have, Solr provides the search results in XML
> format, any ready made tools to convert them directly to web pages for
> visitors to see?

yep.. it's called XSLT. most modern browsers can do the transform on the
client side.
otherwise there is some server side tools (cocoon I think does this) to
do the transform on the server before sending it out.

--Ian




Re: Highlight in a response writer, bad practice ?

2007-06-07 Thread Frédéric Glorieux


Simplicity. 


The best answer :o)

 The memory usage for highlight fields in normal responses

is not an issue.
If it becomes an issue for you, then you're roughly taking the right 
approach.


However, rather than write your own response writer to solve your
issue, you might consider
just your own response handler, 


I should, but perhaps not for the same reasons as below.


and insert an Iterable (which will be
written as an array in the response writer).  This way, all response
writers (xml, json, etc) will work.


In my opinion, a KWIC view could be just a response writer, with its
own configuration parameters (like line length), open to multiple
types of queries. The only input needed is a hits object
implementation.

I will try to design it as generically as I can, in case someone
else finds a use for it...



--
Frédéric Glorieux
École nationale des chartes
direction des nouvelles technologies et de l'informatique


highlight and wildcards ?

2007-06-07 Thread Frédéric Glorieux


Hi all,

I'm talking about Solr from Subversion, the Jetty example, and the default
documents, as in the tutorial. I tried to highlight queries with wildcards.
Documents are found as expected, but I don't see the terms highlighted. It
seems to work with fuzzy search, so I wondered whether this is intended
behaviour. Am I wrong?



Tests
=

q=solr
http://localhost:8983/solr/select?indent=on&version=2.2&q=solr&start=0&rows=10&fl=*%2Cscore&qt=standard&wt=standard&explainOther=&hl=on&hl.fl=features


 
   
 Scalability - Efficient Replication to other Solr 
Search Servers

  
 


q=black~ (fuzzy search)
see black and clocked
http://localhost:8983/solr/select?indent=on&version=2.2&q=solr&start=0&rows=10&fl=*%2Cscore&qt=standard&wt=standard&explainOther=&hl=on&hl.fl=features

  

  Printing speed up to 29ppm black, 19ppm color

  
  
  

  NVIDIA GeForce 7800 GTX GPU/VPU clocked at 
486MHz


  
  

 ATI RADEON X1900 GPU/VPU clocked at 650MHz

  




q=a*
http://localhost:8983/solr/select?indent=on&version=2.2&start=0&rows=100&fl=*%2Cscore&qt=standard&wt=standard&explainOther=&hl=on&hl.fl=features&q=a*

  
  
  
  
  
  
  
  
  
  
  
  
  



--
Frédéric Glorieux
École nationale des chartes
direction des nouvelles technologies et de l'informatique


Re: how to crawl when Solr is search engine?

2007-06-07 Thread Bertrand Delacretaz

On 6/7/07, Ian Holsman <[EMAIL PROTECTED]> wrote:


. it's called XSLT. most modern browsers can do the transform on the
client side.
otherwise there is some server side tools (cocoon I think does this) to
do the transform on the server before sending it out


Solr also does server-side XSLT, see
http://wiki.apache.org/solr/XsltResponseWriter

-Bertrand


Logging errors from multiple solr instances

2007-06-07 Thread Walter Lewis
I'm running Solr 1.1 under Tomcat 5.5. On the development machine there
is a modest number of Solr index instances (six).


Currently, the only way to distinguish them in the logs is to compare the
[EMAIL PROTECTED] value, where the someIdentifier part changes each time
Tomcat is restarted (depressingly frequently, given my programming style).


This value isn't written out for commits and queries, nor in a
variety of other cases where distinguishing the activity
posted against the various indexes would be useful.


Is this addressed in 1.2 or is running multiple instances of indexes 
such a Bad Idea that supporting this would be leading a fool further astray?


Walter Lewis


host logging options (was Re: Schema validator/debugger)

2007-06-07 Thread Walter Lewis

Andrew Nagy wrote:

Yonik Seeley wrote:

I dropped your schema.xml directly into the Solr example (using
Jetty), fired it up, and everything works fine!?

Okay, I switched over to Jetty and now I get a different error:
SEVERE: org.apache.solr.core.SolrException: undefined field text

As someone who has used both Jetty and Tomcat in production (and has
come to prefer Tomcat), what are my choices for getting the "undefined field
xxx" error into the catalina log files (or is it stashed somewhere I'm
overlooking?)


Walter Lewis


Re: Logging errors from multiple solr instances

2007-06-07 Thread Clay Webster

Perhaps not the most elegant, but running each index on a
different container & port works pretty well.  And we can tune
the jvm (and of course caches) differently.

--cw


Re: how to crawl when Solr is search engine?

2007-06-07 Thread Walter Underwood
Solr is not designed to be a general enterprise search engine. It is
a back end search server.

If you are going to crawl your intranet, you will need a good crawler
that is easy to manage, and the ability to parse lots of kinds of
documents. Unfortunately, Solr really doesn't have those.

Commercial solutions aren't very expensive, probably less than the
cost of the time it would take you to put together a worse solution
from open source bits.

Look at Ultraseek (www.ultraseek.com), IBM OmniFind, or one of the
Google Search Appliances. Ultraseek and OmniFind are software
products and have eval downloads. I worked on Ultraseek for years
and it is really easy to install and get going.

Why would posting XML be any slower than the initial crawl over
HTTP? It is local, it should be way faster.

wunder

On 6/7/07 12:30 AM, "Manoharam Reddy" <[EMAIL PROTECTED]> wrote:

> I have just begun using Solr. I see that we have to insert documents
> by posting XMLs to solr/update
> 
> I would like to know how Solr is used as a search engine in
> enterprises. How do you do the crawling of your intranet and passing
> the information as XML to solr/update. Isn't this going to be slow? To
> put all content in the index via a HTTP POST request requiring network
> sockets to be opened?
> 
> Isn't there any direct way to to do the same thing without resorting to HTTP?



Re: Wildcards / Binary searches

2007-06-07 Thread Frédéric Glorieux




Sorry to jump on a "side note" of the thread, but the topic touches one
of my needs of the moment.



Side Note: It's my opinion that "type ahead" or "auto complete' style
functionality is best addressed by customized logic (most likely using
specially built fields containing all of the prefixes of the key words up
to N characters as seperate tokens).  


Do you mean something like the following?
w wo wor word
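That reading of the "prefixes as separate tokens" idea can be sketched as a toy tokenizer (my own illustration, not Solr's actual implementation):

```python
# Toy illustration of indexing every prefix of a keyword as its own
# token, so "type ahead" lookups become exact term matches.
def edge_prefixes(word, max_len=20):
    """Return all prefixes of `word`, up to max_len characters."""
    return [word[:i] for i in range(1, min(len(word), max_len) + 1)]

print(edge_prefixes("word"))  # ['w', 'wo', 'wor', 'word']
```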


simple uses of PrefixQueries are
only going to get you so far, particularly under heavy load or in an index
with a large number of unique terms.


For a bibliographic app with Lucene, I implemented suggestions on
different fields (especially "subject" terms, like topic or place) to
populate a form with already-used values. I used the Lucene IndexReader
to get sorted lists of terms very quickly, without duplicate values.




There is a bad drawback to this approach: since "the enumeration is
ordered by Term.compareTo()", the sort order is natively ASCII, with
uppercase before lowercase. I had to patch Lucene's Term.compareTo() for
this project, which is definitely not good practice for index portability. A
duplicate field with an analyser that produces a sortable ASCII version
would be better.
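The "sortable ASCII version" idea amounts to deriving a folded sort key per term. A minimal sketch (my own, not a Lucene analyzer):

```python
# Sketch of a "duplicate field with a sortable ASCII version": fold
# accents and case so term ordering matches human expectations instead
# of raw ASCII order (where uppercase sorts before lowercase).
import unicodedata

def sort_key(term):
    """Lowercase and strip diacritics, e.g. 'École' -> 'ecole'."""
    folded = unicodedata.normalize("NFKD", term)
    ascii_only = folded.encode("ascii", "ignore").decode("ascii")
    return ascii_only.lower()

terms = ["Zebra", "école", "abbey"]
print(sorted(terms, key=sort_key))  # ['abbey', 'école', 'Zebra']
```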


Opinions of the list on this topic would be welcome.

--
Frédéric Glorieux
École nationale des chartes
direction des nouvelles technologies et de l'informatique


Solr 1.2 released

2007-06-07 Thread Yonik Seeley

Solr 1.2 is now available for download!
This is the first release since Solr graduated from the Incubator, and
includes many improvements, including CSV/delimited-text data
loading, time based auto-commit, faster faceting, negative filters,
a spell-check handler, sounds-like word filters, regex text filters,
and more flexible plugins.

Solr releases can be downloaded from
http://www.apache.org/dyn/closer.cgi/lucene/solr/

-Yonik


Re: Is this solr 1.2 a final version?

2007-06-07 Thread Yonik Seeley

On 6/7/07, Thierry Collogne <[EMAIL PROTECTED]> wrote:

I was just downloading solr and noticed that there is a 1.2 version
available. Is this the final 1.2 version?
Is this the version that is to be used?


Yes.  A release is typically available a day before an announcement
because it takes a while for it to propagate to all the mirrors.

-Yonik


RE: highlight and wildcards ?

2007-06-07 Thread Xuesong Luo
Frédéric,
I asked a similar question several days ago; it seems we don't have a
perfect solution when using a prefix wildcard with highlighting. Here is what
Chris said:

in Solr 1.1, highlighting used the info from the raw query to do highlighting,
hence in your query for consult* it would highlight the Consult part of
Consultant even though the prefix query was matching the whole word.  In the
trunk (soon to be Solr 1.2) Mike fixed that so the query is "rewritten" to its
expanded form before highlighting is done ...
this works great for true wildcard queries (ie: cons*t* or cons?lt*) but for
prefix queries Solr has an optimization for prefix queries (ie:
consult*) to reduce the likelihood of Solr crashing if the prefix matches a
lot of terms ... unfortunately this breaks highlighting of prefix queries, and
no one has implemented a solution yet...
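The "rewrite then highlight" step Chris describes can be modeled in a few lines. This is a toy illustration of the idea only, not Solr's code; the term list and markup are my own:

```python
# Toy model of the rewrite step: expand a prefix query against the
# index's term dictionary, then highlight the concrete matching terms.
import re

def expand_prefix(prefix, index_terms):
    """Rewrite 'consult*' into the concrete terms it matches."""
    return [t for t in index_terms if t.startswith(prefix)]

def highlight(text, terms):
    """Wrap each expanded term in <em> tags (case-insensitive)."""
    for t in sorted(terms, key=len, reverse=True):
        text = re.sub(r"\b%s\b" % re.escape(t),
                      lambda m: "<em>%s</em>" % m.group(0),
                      text, flags=re.IGNORECASE)
    return text

index_terms = ["consul", "consultant", "consulting"]
terms = expand_prefix("consult", index_terms)
print(highlight("A consultant was consulting.", terms))
```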







Re: highlight and wildcards ?

2007-06-07 Thread Frédéric Glorieux

Xuesong (?),

Thanks a lot for your answer, and sorry for not having scanned the archives
first. That is a good and understandable reason, but sad for my
project. Prefix queries will be the main activity of my users (they
need to search Latin texts, where domin* matches both "dominus"
and "domino"). So I need to investigate some more.






--
Frédéric Glorieux
École nationale des chartes
direction des nouvelles technologies et de l'informatique


RE: highlight and wildcards ?

2007-06-07 Thread Xuesong Luo
Same in my project. Chris did mention that we can put a ? before the *: so
instead of domin*, you can use domin?*. However, that requires at least one
character after your search string.






Multi-language indexing and searching

2007-06-07 Thread Daniel Alheiros
Hi, 

I'm just starting to use Solr and so far, it has been a very interesting
learning process. I wasn't a Lucene user, so I'm learning a lot about both.

My problem is:
I have to index and search content in several languages.

My scenario is a bit different from other that I've already read in this
forum, as my client is the same to search any language and it could be
accomplished using a field to define language.

My questions are focused on how to keep the benefits of all the
protwords, stopwords, and synonyms in a multilanguage situation.

Should I create new Analyzers that can deal with the "language" field of the
document? What do you recommend?

Regards,
Daniel 


http://www.bbc.co.uk/
This e-mail (and any attachments) is confidential and may contain personal 
views which are not the views of the BBC unless specifically stated.
If you have received it in error, please delete it from your system.
Do not use, copy or disclose the information in any way nor act in reliance on 
it and notify the sender immediately.
Please note that the BBC monitors e-mails sent or received.
Further communication will signify your consent to this.



Re: highlight and wildcards ?

2007-06-07 Thread Walter Underwood
Implementing a stemmer for Latin might be easier for you and for
your users. It will probably provide better results, too.

http://informationr.net/ir/2-1/paper10.html
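The stemming idea can be illustrated with a crude suffix stripper. This is a toy only, nothing like the real stemmer the linked paper describes; the ending list is chosen just to handle the "dominus"/"domino" example from the thread:

```python
# Toy Latin suffix stripper: conflate inflected forms at index and
# query time instead of relying on prefix queries. The ending list is
# illustrative only, not a real Latin stemmer.
SUFFIXES = ["ibus", "orum", "arum", "us", "um", "is", "ae", "o", "a", "i", "e"]

def strip_suffix(word):
    for s in SUFFIXES:  # longest endings first
        if word.endswith(s) and len(word) - len(s) >= 3:
            return word[:-len(s)]
    return word

print(strip_suffix("dominus"), strip_suffix("domino"))  # domin domin
```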

wunder

On 6/7/07 10:36 AM, "Frédéric Glorieux" <[EMAIL PROTECTED]>
wrote:

> Thanks a lot for your answer, sorry to have not scan the archives
> before. This a really good and understandable reason, but sad for my
> project. Prefix queries will be the main activities of my users (they
> need to search latin texts, so that domin* is enough to match "dominus"
> or "domino"). So, I need some more investigations.



Re: Multi-language indexing and searching

2007-06-07 Thread Walter Underwood
I'm not sure what sort of "field" you mean for defining the
language.

If you plan to use a single search UI regardless of language:
we used to do this in Ultraseek, but it doesn't really work.
Queries are too short for reliable language ID (is "die" in
German, English, or Latin?), and language-specific processing
can be pretty different.

We ran into surface words that collided in different languages.
As I remember, "mobile" is a plural noun in Dutch but a verb in
English.

Finally, Solr's linguistic support is OK for English, but not as
good for more heavily inflected languages. For German, you
really need to decompose compound words, something not available
in Solr.

The only semi-successful cross-language search seems to be with
n-gram indexing. That usually produces a larger index and somewhat
slower performance (because of the number of terms), but at least
it works.

wunder

On 6/7/07 10:47 AM, "Daniel Alheiros" <[EMAIL PROTECTED]> wrote:

> I have to index and search content in several languages.
> 
> My scenario is a bit different from other that I've already read in this
> forum, as my client is the same to search any language and it could be
> accomplished using a field to define language.



filter query speed

2007-06-07 Thread Michael Thessel
Hello UG,

I've got a problem with filtered queries. I have an index with about 8
million documents. I save a timestamp (not the time of indexing) for
each document as an integer field. Querying the index is pretty fast.
But when I filter on the timestamp the queries are extremely slow, even
if the unfiltered search is already cached.

schema.xml:
...

...

INFO: /select/ rows=25&start=0&q=((title:(test)+AND+is_starter:true)^8
+OR+pagetext:(test)^6+OR+title_pagetext:(test)^4+);+score+desc&fl=
+score,postid&qt=standard&stylesheet=&version=2.1 0 5

INFO: /select/ rows=25&start=0&fq=dateline:[0+TO
+1181237598]+&q=((title:(test)+AND+is_starter:true)^8+OR
+pagetext:(test)^6+OR+title_pagetext:(test)^4+);+score+desc&fl=
+score,postid&qt=standard&stylesheet=&version=2.1 0 79495

I currently run version:
Solr Specification Version: 1.1.2007.05.24.08.06.21
Solr Implementation Version: nightly - yonik - 2007-05-24 08:06:21
Lucene Specification Version: 2007-05-20_00-04-53
Lucene Implementation Version: build 2007-05-20
Tomcat: 6.0.10


Cheers,

Michael




-- 
Michael Thessel <[EMAIL PROTECTED]>
Gossamer Threads Inc. http://www.gossamer-threads.com/
Tel: (604) 687-5804 Fax: (604) 687-5806



Re: highlight and wildcards ?

2007-06-07 Thread Frédéric Glorieux

Same in my project. Chris does mention we can put a ? before the *, so instead 
of domin*, you can use domin?*, however that requires at least one char 
following your search string.


Right, it works well, and one char is a detail.

With "a?*" I get the documented lucene error
maxClauseCount is set to 1024



I know that some of my users will want to find long lists of words or
phrases with a common prefix, like "ante" for example.


I should evaluate RegexQuery.

--
Frédéric Glorieux
École nationale des chartes
direction des nouvelles technologies et de l'informatique


Re: how to crawl when Solr is search engine?

2007-06-07 Thread Mike Klaas

On 7-Jun-07, at 1:04 AM, Manoharam Reddy wrote:


Some musing:-
(I have used Nutch before and one thing I observed there was that if I
delete the crawl folder when Nutch is running, users can still search
and obtain proper results. It seems Nutch caches all the indexes in
the memory when it starts. I don't understand how is that feasible
when the size of the crawl is in the order of 10 GBs where as you have
a RAM + swap of only a few GBs.)


This is true for Solr as well, because it is an OS feature: if you
delete a file that is held open by a process, it isn't really
deleted yet (check disk usage stats).
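This behavior is easy to demonstrate directly (POSIX semantics; the sketch assumes a Unix-like OS):

```python
# On POSIX systems, unlinking a file that a process still holds open
# does not reclaim it: the open handle keeps reading the old contents,
# which is why searches keep working after the crawl folder is deleted.
import os
import tempfile

fd, path = tempfile.mkstemp()
os.close(fd)
with open(path, "w") as w:
    w.write("index data")

reader = open(path)   # the "search process" holds the file open
os.unlink(path)       # "delete" it, as with Nutch's crawl folder
content = reader.read()
reader.close()
print(content)        # the old contents are still readable
```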



How is Solr caching better than this?


It is unrelated. Solr can cache certain reusable components of
queries (namely, filters), and provides a fully customizable schema
and arbitrary query execution on it.


-Mike


Re: Logging errors from multiple solr instances

2007-06-07 Thread Chris Hostetter

: Is this addressed in 1.2 or is running multiple instances of indexes
: such a Bad Idea that supporting this would be leading a fool further astray?

I still haven't had a chance to try it myself using Tomcat, but here's
what i found the last time someone asked about this...

http://www.nabble.com/Re%3A-separate-log-files-p8396579.html




-Hoss



TextField case sensitivity

2007-06-07 Thread Xuesong Luo
I ran into a problem searching on a TextField. When I pass q=William or
q=WILLiam, Solr is able to find records whose default search field value
is William; however, if I pass q=WilliAm, Solr does not return anything.
I searched the archive; Yonik mentioned that LowerCaseFilterFactory
doesn't work for wildcards because the QueryParser does not invoke
analysis for partial words, which makes sense. But in my case it's a
whole word. Does anyone know why it's not working? Below is my schema info.

Thanks
Xuesong


  


  
  


  




Re: TextField case sensitivity

2007-06-07 Thread Yonik Seeley

On 6/7/07, Xuesong Luo <[EMAIL PROTECTED]> wrote:

I run a problem when searching on a TextField. When I pass q=William or
q=WILLiam, solr is able to find records whose default search field value
is William, however if I pass q=WilliAm, solr did not return any thing.


Sounds like WordDelimiterFilter is still being used for your fieldType.
After you changed the fieldType for "text", did you restart Solr and
re-index your collection?

-Yonik





Re: highlight and wildcards ?

2007-06-07 Thread Chris Hostetter

: With "a?*" I get the documented lucene error
: maxClauseCount is set to 1024

Which is why Solr converts PrefixQueries to ConstantScorePrefixQueries
that don't have that problem -- the trade-off being that they can't be
highlighted, and we're right back where we started.

It's a question of priorities. In developing Solr, we prioritized
consistent stability regardless of query or index characteristics, and
highlighting of PrefixQueries suffered. Working around that decision by
using wildcards may get highlighting working for you, but the stability
issue of maxClauseCount is always going to be there (you can increase
maxClauseCount in the solrconfig, but there's always the chance that a user
will specify a wildcard that results in one more clause than you've
configured).

: I should evaluate RegexQuery.

for the record, i don't think that will help ... RegexQuery works just
like WildcardQuery but with a different syntax -- it rewrites itself to a
BooleanQuery containing all of the terms in the index that match your
regex.


-Hoss



Re: TextField case sensitivity

2007-06-07 Thread Ryan McKinley

have you taken a look the output from the admin/analysis?
http://localhost:8983/solr/admin/analysis.jsp?highlight=on

This lets you see what tokens are generated for index/query.  From your 
description, I'm suspicious that the generated tokens are actually:

 willi am
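Ryan's suspicion can be modeled with a toy split on case transitions. This is my own illustration of what a WordDelimiter-style filter roughly does, not Solr's actual implementation:

```python
# Toy model of splitting a token on lower->upper case transitions and
# then lowercasing, roughly what a WordDelimiter-style filter does.
# It explains why "WilliAm" fails while "William"/"WILLiam" match.
import re

def case_split(token):
    """Split on lower->upper transitions, then lowercase the parts."""
    parts = re.sub(r"(?<=[a-z])(?=[A-Z])", " ", token).split()
    return [p.lower() for p in parts]

print(case_split("WilliAm"))  # ['willi', 'am']  -> no match for "william"
print(case_split("William"))  # ['william']
print(case_split("WILLiam"))  # ['william'] (no lower->upper transition)
```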

Also, if you want the same analyzer for indexing and query, just define one:



 




Xuesong Luo wrote:

I ran into a problem when searching on a TextField. When I pass q=William or
q=WILLiam, solr is able to find records whose default search field value
is William, however if I pass q=WilliAm, solr does not return anything.
I searched on the archive; Yonik mentioned the lowercasefilterfactory
doesn't work for wildcards because the QueryParser does not invoke
analysis for partial words, which makes sense. But in my case, it's a
whole word. Does anyone know why it's not working? Below is my schema info.

Thanks
Xuesong


<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>






Re: filter query speed

2007-06-07 Thread Yonik Seeley

On 6/7/07, Michael Thessel <[EMAIL PROTECTED]> wrote:

I've got a problem with filtered queries. I have an index with about 8
million documents. I save a timestamp (not the time of indexing) for
each document as an integer field. Querying the index is pretty fast.
But when I filter on the timestamp the queries are extremely slow, even
if the unfiltered search is already cached.


Filters are cached independently of queries, but cached queries
consist of the sort *and* any applied filters.


INFO: /select/ rows=25&start=0&q=((title:(test)+AND+is_starter:true)^8
+OR+pagetext:(test)^6+OR+title_pagetext:(test)^4+);+score+desc&fl=
+score,postid&qt=standard&stylesheet=&version=2.1 0 5

INFO: /select/ rows=25&start=0&fq=dateline:[0+TO
+1181237598]+&q=((title:(test)+AND+is_starter:true)^8+OR
+pagetext:(test)^6+OR+title_pagetext:(test)^4+);+score+desc&fl=
+score,postid&qt=standard&stylesheet=&version=2.1 0 79495


I suspect that the endpoint to your dateline filter changes often,
hence caching is doing no good.  Is the endpoint (1181237598) derived
from the current time?
If so, there are some things you can do:
1) make it faster to generate a new filter by limiting the number of
terms in the dateline field (during indexing, always round it to the
nearest day)
2) allow solr to reuse previously generated filters more often by
rounding the dateline endpoint during query time.

You most likely want to do #2, and probably #1 (depending on how often
you commit new changes to the index).

-Yonik
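Option #2 (rounding the endpoint at query time) amounts to snapping the timestamp to a bucket boundary so that successive queries produce the identical filter string; a small sketch (illustrative Python, invented names):

```python
def round_down(timestamp, seconds):
    """Round a Unix timestamp down to a bucket boundary so that
    successive queries produce the identical filter string and
    Solr's filter cache entry can be reused."""
    return timestamp - (timestamp % seconds)

DAY = 24 * 60 * 60

now = 1181237598                      # endpoint taken from "current time"
fq = f"dateline:[0 TO {round_down(now, DAY)}]"
print(fq)                             # dateline:[0 TO 1181174400]
```

Every query issued during the same day now asks for the same filter, so Solr computes it once and serves the rest from cache.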


Re: solr+hadoop = next solr

2007-06-07 Thread Mike Klaas

On 6-Jun-07, at 7:44 PM, Jeff Rodenburg wrote:

I've been exploring distributed search, as of late.  I don't know  
about the
"next solr" but I could certainly see a "distributed solr" grow out  
of such

an expansion.


I've implemented a highly-distributed search engine using Solr (200m  
docs and growing, 60+ servers).   It is not a Solr-based solution in  
the vein of FederatedSearch--it is a higher-level architecture that  
uses Solr as indexing nodes.  I'll note that it is a lot of work and  
would be even more work to develop in the generic extensible  
philosophy that Solr espouses.


It is not really suitable for contribution, unfortunately (being  
written in python and proprietary).


In terms of the FederatedSearch wiki entry (updated last year), has  
there
been any progress made this year on this topic, at least something  
worthy of
being added or updated to the wiki page?  Not to splinter efforts  
here, but
maybe a working group that was focused on that topic could help to  
move

things forward a bit.


I don't believe that absence of organization has been the cause of  
lack of forward progress on this issue, but simply that there has  
been no-one sufficiently interested and committed to prioritizing  
this huge task to work on it.  There is no need to form a working  
group (not when there are only a handful of active committers to  
begin with)--all interested people could just use solr-dev@ for  
discussion.


Solr is an open-source project, so huge features will get implemented  
when there is a person or group of people devoted to leading the  
charge on the issue.  If you're interested in being that person,  
that's great!


-Mike


Re: filter query speed

2007-06-07 Thread Michael Thessel
Hey Yonik,

thanks a lot for your quick reply.

> I suspect that the endpoint to your dateline filter changes often,
> hence caching is doing no good.  Is the endpoint (1181237598) derived
> from the current time?
Yes, it is.

> If so, there are some things you can do:
> 1) make it faster to generate a new filter by limiting the number of
> terms in the dateline field (during indexing, always round it to the
> nearest day)
> 2) allow solr to reuse previously generated filters more often by
> rounding the dateline endpoint during query time.
> 
> You most likely want to do #2, and probably #1 (depending on how often
> you commit new changes to the index).

I will give both of them a try. 

Is there a general speed problem with range searches in solr? It looks a bit
strange to me that a query for a term takes 5 ms while adding a filter to the
same result set takes 80 s.

Cheers,

Michael


-- 
Michael Thessel <[EMAIL PROTECTED]>
Gossamer Threads Inc. http://www.gossamer-threads.com/
Tel: (604) 687-5804 Fax: (604) 687-5806



Re: filter query speed

2007-06-07 Thread Yonik Seeley

On 6/7/07, Michael Thessel <[EMAIL PROTECTED]> wrote:

Is there a general speed problem with range searches in solr? It looks a bit
strange to me that a query for a term takes 5 ms while adding a filter to the
same result set takes 80 s.


It's completely dependent on the number of terms in the range.
The unit of indexing in lucene is the term, so finding docs for a
single term is fast.
There are many terms in a range though.

The algorithm is simply:
for every term in the range: collect the docs for that term

-Yonik
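A toy model of that algorithm makes the cost visible (illustrative Python, not Lucene's actual data structures): the loop runs once per distinct term in the field, regardless of how many documents match.

```python
from collections import defaultdict

# Toy inverted index: term -> set of doc ids.
index = defaultdict(set)
docs = {1: 100, 2: 150, 3: 150, 4: 900}   # doc id -> dateline value
for doc_id, dateline in docs.items():
    index[str(dateline)].add(doc_id)

def range_query(index, lo, hi):
    """For every term in the range, collect the docs for that term.
    Cost grows with the number of distinct terms in the range,
    which is why rounding timestamps to the day helps so much."""
    hits = set()
    for term in index:                 # a real index walks a sorted term dict
        if lo <= int(term) <= hi:
            hits |= index[term]
    return hits

print(sorted(range_query(index, 100, 200)))   # [1, 2, 3]
```

With second-granularity timestamps, 8 million documents can mean millions of distinct terms per range; rounded to days, only a few hundred.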


RE: TextField case sensitivity

2007-06-07 Thread Xuesong Luo
I have WordDelimiterFilter defined in the schema; I didn't include it in
my original email because I thought it didn't matter. It seems it does.
Looks like WilliAm is treated as two words; that's why it
didn't find a match.

Thanks
Xuesong

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Yonik
Seeley
Sent: Thursday, June 07, 2007 11:25 AM
To: solr-user@lucene.apache.org
Subject: Re: TextField case sensitivity

On 6/7/07, Xuesong Luo <[EMAIL PROTECTED]> wrote:
> I ran into a problem when searching on a TextField. When I pass q=William
> or q=WILLiam, solr is able to find records whose default search field
> value is William, however if I pass q=WilliAm, solr did not return
> anything.

Sounds like WordDelimiterFilter is still being used for your fieldType.
After you changed the fieldType for "text", did you restart Solr and
re-index your collection?

-Yonik


> I searched on the archive; Yonik mentioned the lowercasefilterfactory
> doesn't work for wildcards because the QueryParser does not invoke
> analysis for partial words, which makes sense. But in my case, it's a
> whole word. Does anyone know why it's not working? Below is my schema info.
>
> Thanks
> Xuesong
>
> <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
>   <analyzer type="index">
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>   </analyzer>
> </fieldType>




RE: TextField case sensitivity

2007-06-07 Thread Xuesong Luo
Ryan, you are right, that's the problem. WilliAm is treated as two words
by the WordDelimiterFilterFactory.

Thanks
Xuesong

-Original Message-
From: Ryan McKinley [mailto:[EMAIL PROTECTED] 
Sent: Thursday, June 07, 2007 11:30 AM
To: solr-user@lucene.apache.org
Subject: Re: TextField case sensitivity

have you taken a look at the output from the admin/analysis page?
http://localhost:8983/solr/admin/analysis.jsp?highlight=on

This lets you see what tokens are generated for index/query.  From your 
description, I'm suspicious that the generated tokens are actually:
  willi am

Also, if you want the same analyzer for indexing and query, just define
one:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>


Xuesong Luo wrote:
> I ran into a problem when searching on a TextField. When I pass q=William
> or q=WILLiam, solr is able to find records whose default search field
> value is William, however if I pass q=WilliAm, solr did not return
> anything.
> I searched on the archive; Yonik mentioned the lowercasefilterfactory
> doesn't work for wildcards because the QueryParser does not invoke
> analysis for partial words, which makes sense. But in my case, it's a
> whole word. Does anyone know why it's not working? Below is my schema info.
> 
> Thanks
> Xuesong
> 
> <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
>   <analyzer type="index">
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>   </analyzer>
> </fieldType>
> 
> 




Re: solr+hadoop = next solr

2007-06-07 Thread Jeff Rodenburg

Mike - thanks for the comments.  Some responses added below.

On 6/7/07, Mike Klaas <[EMAIL PROTECTED]> wrote:



I've implemented a highly-distributed search engine using Solr (200m
docs and growing, 60+ servers).   It is not a Solr-based solution in
the vein of FederatedSearch--it is a higher-level architecture that
uses Solr as indexing nodes.  I'll note that it is a lot of work and
would be even more work to develop in the generic extensible
philosophy that Solr espouses.



Yeah, we've done the same thing in the .Net world, and it's a tough slog.
We're in the same situation -- making our solution generically extensible is
pretty much a non-starter.


In terms of the FederatedSearch wiki entry (updated last year), has
> there
> been any progress made this year on this topic, at least something
> worthy of
> being added or updated to the wiki page?  Not to splinter efforts
> here, but
> maybe a working group that was focused on that topic could help to
> move
> things forward a bit.

I don't believe that absence of organization has been the cause of
lack of forward progress on this issue, but simply that there has
been no-one sufficiently interested and committed to prioritizing
this huge task to work on it.  There is no need to form a working
group (not when there are only a handful of active committers to
begin with)--all interested people could just use solr-dev@ for
discussion.



That makes sense, just didn't want to bombard the list with the subject if
it was a detractor from the core project, i.e. keep lucene messages on
lucene, solr messages on solr, etc.  The good-community-participant
approach, if you will.

Solr is an open-source project, so huge features will get implemented

when there is a person or group of people devoted to leading the
charge on the issue.  If you're interested in being that person,
that's great!



Glad to jump in, not sure I qualify as such for that, but certainly a big
cheerleader nonetheless.


DisMax request handler doesn't work with stopwords?

2007-06-07 Thread Casey Durfee
 
It appears that if your search terms include stopwords and you use the DisMax 
request handler, you get no results whereas the same search with the standard 
request handler does give you results.  Is this a bug or by design?
 
Thanks,
 
--Casey
 


Re: TextField case sensitivity

2007-06-07 Thread Mike Klaas


On 7-Jun-07, at 1:04 PM, Xuesong Luo wrote:

Ryan, you are right, that's the problem. WilliAM is treated as two  
words

by the WordDelimiterFilterFactory.


I have found this behaviour a little too aggressive for my needs, so I
added an option to disable it.  Patch is here:

http://issues.apache.org/jira/browse/SOLR-257

I'll probably commit it in a day or so, at which point it will be  
part of the Solr nightly build.


-Mike
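For readers following along, the split-on-case-change behaviour being made optional here can be approximated like this (illustrative Python; the real WordDelimiterFilter has many more options, such as splitting on digits and catenating parts):

```python
import re

def split_on_case_change(token):
    """Approximate the WordDelimiterFilter behaviour discussed in this
    thread: split a token on non-alphanumeric characters, and wherever
    a lower-case letter is followed by an upper-case one."""
    parts = re.split(r"[^0-9A-Za-z]+", token)
    out = []
    for part in parts:
        # zero-width split at each lower->upper boundary
        out.extend(p for p in re.split(r"(?<=[a-z])(?=[A-Z])", part) if p)
    return out

print(split_on_case_change("WilliAm"))   # ['Willi', 'Am']
print(split_on_case_change("Wi-Fi"))     # ['Wi', 'Fi']
```

This is exactly why q=WilliAm earlier in the thread produced the tokens "willi am" and matched nothing.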


Re: solr+hadoop = next solr

2007-06-07 Thread Rafael Rossini

Hi, Jeff and Mike.

  Would you mind telling us a little about the architecture of your
solutions? Mike, you said that you implemented a highly-distributed search
engine using Solr as indexing nodes. What does that mean? Did you
implement a master/multi-slave solution for replication, or shard the
whole index for high availability and failover?


On 6/7/07, Jeff Rodenburg <[EMAIL PROTECTED]> wrote:


Mike - thanks for the comments.  Some responses added below.

On 6/7/07, Mike Klaas <[EMAIL PROTECTED]> wrote:
>
>
> I've implemented a highly-distributed search engine using Solr (200m
> docs and growing, 60+ servers).   It is not a Solr-based solution in
> the vein of FederatedSearch--it is a higher-level architecture that
> uses Solr as indexing nodes.  I'll note that it is a lot of work and
> would be even more work to develop in the generic extensible
> philosophy that Solr espouses.


Yeah, we've done the same thing in the .Net world, and it's a tough slog.
We're in the same situation -- making our solution generically extensible
is
pretty much a non-starter.

> In terms of the FederatedSearch wiki entry (updated last year), has
> > there
> > been any progress made this year on this topic, at least something
> > worthy of
> > being added or updated to the wiki page?  Not to splinter efforts
> > here, but
> > maybe a working group that was focused on that topic could help to
> > move
> > things forward a bit.
>
> I don't believe that absence of organization has been the cause of
> lack of forward progress on this issue, but simply that there has
> been no-one sufficiently interested and committed to prioritizing
> this huge task to work on it.  There is no need to form a working
> group (not when there are only a handful of active committers to
> begin with)--all interested people could just use solr-dev@ for
> discussion.


That makes sense, just didn't want to bombard the list with the subject if
it was a detractor from the core project, i.e. keep lucene messages on
lucene, solr messages on solr, etc.  The good-community-participant
approach, if you will.

Solr is an open-source project, so huge features will get implemented
> when there is a person or group of people devoted to leading the
> charge on the issue.  If you're interested in being that person,
> that's great!
>
>
Glad to jump in, not sure I qualify as such for that, but certainly a big
cheerleader nonetheless.



Re: DisMax request handler doesn't work with stopwords?

2007-06-07 Thread Chris Hostetter

: It appears that if your search terms include stopwords and you use the
: DisMax request handler, you get no results whereas the same search with
: the standard request handler does give you results.  Is this a bug or by
: design?

dismax works just fine with stop words ... can you give a specific
example url?  what does the query toString look like when you use
debugQuery?




-Hoss



Re: DisMax request handler doesn't work with stopwords?

2007-06-07 Thread Casey Durfee
Sure thing.  I downloaded the latest version of Solr, started up the example 
server, and indexed the ipod_other.xml file.  The following URLs give a result:
 
http://localhost:8983/solr/select/?q=ipod 
http://localhost:8983/solr/select/?q=the+ipod 
http://localhost:8983/solr/select/?q=ipod&qt=dismax 
The following URL does not:
http://localhost:8983/solr/select/?q=the+ipod&qt=dismax 
 
the toString in the last case is:
 
+(((cat:the^1.4 | id:the^10.0)~0.01 (text:ipod^0.5 | cat:ipod^1.4 | 
features:ipod | name:ipod^1.2 | sku:ipod^1.5 | manu:ipod^1.1 | 
id:ipod^10.0)~0.01)~2) (text:ipod^0.2 | manu:ipod^1.4 | name:ipod^1.5 | 
manu_exact:the ipod^1.9 | features:ipod^1.1)~0.01 
(org.apache.solr.search.function.OrdFieldSource:ord(poplarity))^0.5 
(org.apache.solr.search.function.ReciprocalFloatFunction:1000.0/(1.0*float(rord(price))+1000.0))^0.3
 

>>> Chris Hostetter <[EMAIL PROTECTED]> 6/7/2007 2:12 PM >>>

: It appears that if your search terms include stopwords and you use the
: DisMax request handler, you get no results whereas the same search with
: the standard request handler does give you results.  Is this a bug or by
: design?

dismax works just fine with stop words ... can you give a specific
example url?  what does the query toString look like when you use
debugQuery?




-Hoss



Re: DisMax request handler doesn't work with stopwords?

2007-06-07 Thread Mike Klaas

On 7-Jun-07, at 1:41 PM, Casey Durfee wrote:

It appears that if your search terms include stopwords and you use  
the DisMax request handler, you get no results whereas the same  
search with the standard request handler does give you results.  Is  
this a bug or by design?


There is a subtlety with stopwords and dismax.  Imagine a search  
"what's in python", using a typical analyzer with stopwords for  
fields such as title, inlinks, rawText, but a more restrictive  
analyzer for fields such as url, that have no stopwords.

For the above search using the following weight function

title^1.2 inlinks^1.4 rawText^1.0
produces the following parsed query string

+(
  (
   (rawText:what | inlinks:what^1.4 | title:what^1.2)~0.01
   (rawText:python | inlinks:python^1.4 | title:python^1.2)~0.01
  )~2
 )
 (rawText:"what python"~5 | inlinks:"what python"~5^1.4 |  
title:"what python"~5^1.2)~0.01

while the same query with a weight function of

title^1.2 inlinks^1.4 rawText^1.0 url^1.0
produces this query string

+(
  (
   (rawText:what | url:what | inlinks:what^1.4 | title:what^1.2)~0.01
   (url:in)~0.01
   (rawText:python | url:python | inlinks:python^1.4 |  
title:python^1.2)~0.01

  )~3
 )
 (rawText:"what python"~5 | url:"what in python"~5 | inlinks:"what  
python"~5^1.4 | title:"what python"~5^1.2)~0.01
Note the latter includes a term (url:in)~0.01 on its own. This  
interacts poorly when using a high mm (minimum #clauses match)  
setting with dismax, as it effectively requires 'in' to be in the url  
column, which was probably not the intent of the query.


-Mike
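Mike's point about the lone (url:in) clause can be reduced to a toy model (illustrative Python; dismax's real scoring is far richer): with mm equal to the number of query words, the stopword's clause must match too, so the document is rejected.

```python
def dismax_mm(per_word_clauses, mm):
    """Sketch of dismax's minimum-match logic: a document matches only
    if at least `mm` of the per-word disjunctions match.  A stopword
    that survives analysis in a single field (url here) becomes a
    clause of its own and raises the bar."""
    matched = sum(1 for clause in per_word_clauses if clause)
    return matched >= mm

# Document matches "what" and "python" in rawText, but "in" is a
# stopword everywhere except the url field, where it does not occur.
clauses = [
    {"rawText:what"},    # "what"   -> matches
    set(),               # "in"     -> only url:in was generated; no match
    {"rawText:python"},  # "python" -> matches
]
print(dismax_mm(clauses, mm=3))   # False: the url:in clause sinks the query
print(dismax_mm(clauses, mm=2))   # True
```

This is why "the ipod" returns nothing against the example schema: the id and cat fields keep "the", so the query requires two clauses to match where only one can.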


Re: DisMax request handler doesn't work with stopwords?

2007-06-07 Thread Casey Durfee
Thank you!  That makes sense.
 
--Casey

>>> Mike Klaas <[EMAIL PROTECTED]> 6/7/2007 2:35 PM >>>
On 7-Jun-07, at 1:41 PM, Casey Durfee wrote:

> It appears that if your search terms include stopwords and you use  
> the DisMax request handler, you get no results whereas the same  
> search with the standard request handler does give you results.  Is  
> this a bug or by design?

There is a subtlety with stopwords and dismax.  Imagine a search  
"what's in python", using a typical analyzer with stopwords for  
fields such as title, inlinks, rawText, but a more restrictive  
analyzer for fields such as url, that have no stopwords.
For the above search using the following weight function

title^1.2 inlinks^1.4 rawText^1.0
produces the following parsed query string

+(
   (
(rawText:what | inlinks:what^1.4 | title:what^1.2)~0.01
(rawText:python | inlinks:python^1.4 | title:python^1.2)~0.01
   )~2
  )
  (rawText:"what python"~5 | inlinks:"what python"~5^1.4 |  
title:"what python"~5^1.2)~0.01
while the same query with a weight function of

title^1.2 inlinks^1.4 rawText^1.0 url^1.0
produces this query string

+(
   (
(rawText:what | url:what | inlinks:what^1.4 | title:what^1.2)~0.01
(url:in)~0.01
(rawText:python | url:python | inlinks:python^1.4 |  
title:python^1.2)~0.01
   )~3
  )
  (rawText:"what python"~5 | url:"what in python"~5 | inlinks:"what  
python"~5^1.4 | title:"what python"~5^1.2)~0.01
Note the latter includes a term (url:in)~0.01 on its own. This  
interacts poorly when using a high mm (minimum #clauses match)  
setting with dismax, as it effectively requires 'in' to be in the url  
column, which was probably not the intent of the query.

-Mike


What logging facility should I use in my Solr plugin?

2007-06-07 Thread Teruhiko Kurosaka
I see Solr uses the JDK java.util.logging.Logger.
I should also be using this Logger when I write
a plugin, correct?

I am asking only because I see commons-logging.jar
in apache-solr-1.1.0-incubating/example/ext
What is this for?

-kuro 


Re: What logging facility should I use in my Solr plugin?

2007-06-07 Thread Ryan McKinley

Teruhiko Kurosaka wrote:

I see Solr uses the JDK java.util.logging.Logger.
I should also be using this Logger when I write
a plugin, correct?



You can use whichever logging you like ;)  Solr uses JDK logging.  If
you want to contribute the plugin back to Solr, it will need to use JDK
logging...




I am asking only because I see commons-logging.jar
in apache-solr-1.1.0-incubating/example/ext
What is this for?



That was for the example Jetty logging.  The (brand) new release 1.2
does not include commons-logging.


-kuro 





Re: highlight and wildcards ?

2007-06-07 Thread Frédéric Glorieux

Hoss,

Thanks for all your information and pointers. I know that my problems 
are not mainstream.


ConstantScoreQuery @author yonik
  public void extractTerms(Set terms) {
    // OK to not add any terms when used for MultiSearcher,
    // but may not be OK for highlighting
  }
ConstantScoreRangeQuery @author yonik
ConstantScorePrefixQuery @author yonik

Maybe a kind of ConstantScoreRegexQuery will be part of my solution
for things like "(ante|post).*" (our users are linguists).

Score will be lost, but this is not a problem for these users, who
want to read all matches of a pattern. For a highlighter, I should
investigate your code to see where the regexp could be plugged in,
without losing analysers (which we also need; nothing is simple).


--
Frédéric Glorieux
École nationale des chartes
direction des nouvelles technologies et de l'informatique




: With "a?*" I get the documented lucene error
: maxClauseCount is set to 1024

Which is why Solr converts PrefixQueries to ConstantScorePrefixQueries
that don't have that problem -- the trade-off being that they can't be
highlighted, and we're right back where we started.

It's a question of priorities.  In developing Solr, we prioritized
consistent stability regardless of query or index characteristics, and
highlighting of PrefixQueries suffered.  Working around that decision by
using wildcards may get highlighting working for you, but the stability
issue of the maxClauseCount is always going to be there (you can increase
maxClauseCount in the solrconfig, but there's always the chance that a user
will specify a wildcard that results in one more clause than you've
configured)

: I should evaluate RegexQuery.

for the record, i don't think that will help ... RegexQuery works just
like WildcardQuery but with a different syntax -- it rewrites itself to a
BooleanQuery containing all of the Terms in the index that match your
regex.


-Hoss
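The ConstantScoreRegexQuery idea could work roughly like this (illustrative Python, invented names; the point is that matching terms feed one shared filter instead of a BooleanQuery, so maxClauseCount never applies):

```python
import re

def constant_score_regex(index, pattern):
    """Walk the term dictionary, keep terms matching the regex, and OR
    their doc sets into a single filter -- no per-term scoring, no
    BooleanQuery rewrite, hence no maxClauseCount limit."""
    rx = re.compile(pattern)
    hits = set()
    for term, doc_ids in index.items():
        if rx.fullmatch(term):
            hits |= doc_ids
    return hits

# Toy inverted index: term -> set of doc ids.
index = {
    "antechamber": {1},
    "postwar": {2, 3},
    "chamber": {4},
}
print(sorted(constant_score_regex(index, r"(ante|post).*")))   # [1, 2, 3]
```

As Hoss notes above, the trade-off is the same as for ConstantScorePrefixQuery: all matches score identically, and the highlighter has no terms to work with.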





Re: highlight and wildcards ?

2007-06-07 Thread Mike Klaas

On 7-Jun-07, at 5:27 PM, Frédéric Glorieux wrote:


Hoss,

Thanks for all your information and pointers. I know that my  
problems are not mainstream.


Have you tried commenting out getPrefixQuery in  
solr.search.SolrQueryParser?  It should then revert to a "regular"  
lucene prefix query.


-Mike