RE: Multiple Solr Webapps

2007-12-18 Thread Pierre-Yves LANDRON

Hello,

Thanks for your answer.

Sorry for the redundant posts. I genuinely thought that my posts hadn't been 
sent to the mailing list, because I hadn't received them, but that seems 
attributable to my Hotmail account. I'm really sorry for the inconvenience.

Pierre-Yves Landron

> Date: Mon, 17 Dec 2007 11:19:00 -0500
> From: [EMAIL PROTECTED]
> To: solr-user@lucene.apache.org
> Subject: Re: Multiple Solr Webapps
> 
> Pierre-Yves LANDRON wrote:
> > Hello,
> > 
> > I've got this dumb problem. I've tried to browse the mailing list archive, 
> > but there are way too many messages (btw, is there a way to do a full-text 
> > search of the archives?)... 
> > 
> 
> try:
> http://www.nabble.com/Solr-f14479.html
> 
> > I'm trying to deploy several Solr instances on my Linux server, following 
> > the Solr wiki instructions: I've created TWO context fragment files 
> > (solr1.xml solr2.xml), each one pointing to a different solr directory ( 
> > and ) and the same solr.war ( and ) to have it working fine. I would prefer 
> > to have only one instance of solr.war, as specified in the Solr wiki ( 
> > http://wiki.apache.org/solr/SolrTomcat#head-024d7e11209030f1dbcac9974e55106abae837ac
> >  ).
> > 
> 
> With Resin, I have one .war that is exploded for multiple web-apps. 
> This works fine -- I have not tried with Tomcat.
> 
> ryan
> 


Re: does solr handle hierarchical facets?

2007-12-18 Thread Chris Hostetter

: This approach works (I do a similar thing using solr), but you have to be
: careful as a BooleanQuery.TooManyClauses exception can be thrown depending on
: where you use the wildcard. It should be fine in the case you described however.

You'll never get a TooManyClauses from a prefix or range query in Solr; it 
uses Filters instead of the default query classes.

: > Why not just use the whole path as the unique identifying token for a given
: > node on the hierarchy?   That way, you don't need to map nodes to unique
: > numbers, just use a prefix query.

personally, i don't create a numeric->text mapping just to do this ... i 
already have the numeric mappings in my data, so it's *easier* for me to 
use the numericIds in Solr (one really key reason to do things this way 
is that i can change the name of a topic without reindexing every 
document mapped to that topic)

Term queries should also be a little faster than prefix queries, but i 
won't swear to that.  If you do go the prefix query route, make sure to 
leave on a trailing marker character so you don't run the risk of one 
Topic name being the prefix of a sister topic name...

taxonomy:Place/NorthAmerica/USA/Washington* will match 
Place/NorthAmerica/USA/Washington/Seattle/ and 
Place/NorthAmerica/USA/WashingtonDC/ ... so use 
taxonomy:Place/NorthAmerica/USA/Washington/*
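
The effect of the trailing marker can be checked outside Solr. The following is only a sketch that imitates prefix matching with plain string comparison; the helper name `prefix_matches` and the sample paths are illustrative assumptions, not part of Solr.

```python
def prefix_matches(prefix, paths):
    """Return the paths that a prefix query such as `prefix*` would match."""
    return [p for p in paths if p.startswith(prefix)]


# Sample taxonomy values, each stored with a trailing "/" marker.
paths = [
    "Place/NorthAmerica/USA/Washington/",
    "Place/NorthAmerica/USA/Washington/Seattle/",
    "Place/NorthAmerica/USA/WashingtonDC/",
]

# Without the trailing marker the sister topic WashingtonDC also matches.
loose = prefix_matches("Place/NorthAmerica/USA/Washington", paths)

# With the trailing marker only Washington and its children match.
strict = prefix_matches("Place/NorthAmerica/USA/Washington/", paths)
```

Here `loose` contains all three paths, while `strict` leaves out the WashingtonDC entry.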


-Hoss



solr field types and case sensitivity

2007-12-18 Thread Dryganets Sergey

Can I change the query analyzer for a specific request to Solr?
That is, I want to add an option on my site to choose case-sensitive or
case-insensitive search per request, but I can't find a good solution ...

I think that creating duplicates (index-only fields with different analyzer
configurations) for each field is a bad idea ...

Does anyone know a good solution for this problem?

-- 
View this message in context: 
http://www.nabble.com/solr-field-types-and-case-sensitivity-tp14395912p14395912.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: solr field types and case sensitivity

2007-12-18 Thread Ryan McKinley

Dryganets Sergey wrote:

Can I change the query analyzer for a specific request to Solr?
That is, I want to add an option on my site to choose case-sensitive or
case-insensitive search per request, but I can't find a good solution ...

I think that creating duplicates (index-only fields with different analyzer
configurations) for each field is a bad idea ...



yes, you would index a field twice - once with a LowerCaseFilter and 
once without.  That is a good solution.
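
As a rough illustration of why indexing the field twice works, here is a toy in-memory index, not Solr itself. The field names `title_exact` and `title_lower` and all helper functions are hypothetical; in Solr the two variants would be two fields whose types differ only in the lowercasing step.

```python
def analyze(text, lowercase):
    """Split on whitespace; optionally lowercase, like a lowercasing filter."""
    tokens = text.split()
    return [t.lower() for t in tokens] if lowercase else tokens


def index_document(doc_id, text, index):
    """Index the same text into a case-preserving and a lowercased field."""
    for field, lowercase in (("title_exact", False), ("title_lower", True)):
        for token in analyze(text, lowercase):
            index.setdefault(field, {}).setdefault(token, set()).add(doc_id)


def search(index, term, case_sensitive):
    """Pick the field per request instead of changing the analyzer."""
    field = "title_exact" if case_sensitive else "title_lower"
    token = term if case_sensitive else term.lower()
    return index.get(field, {}).get(token, set())


index = {}
index_document(1, "Apache Solr", index)
index_document(2, "apache tomcat", index)

sensitive_hits = search(index, "Apache", case_sensitive=True)
insensitive_hits = search(index, "Apache", case_sensitive=False)
```

The case-sensitive search finds only document 1, the case-insensitive one finds both. The cost is a larger index, which is the trade-off being discussed.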


ryan


Max. number of Error messages

2007-12-18 Thread Jae Joo
Is there any parameter to set the maximum number of error messages?
The Solr system was killed after a couple of error messages that were caused
by a wrong query.

Thanks,

Jae


Re: Newbie question about Solr use in web applications

2007-12-18 Thread George Everitt


On Dec 14, 2007, at 9:55 AM, Stuart Sierra wrote:


On Dec 13, 2007 9:20 PM, solruser2 <[EMAIL PROTECTED]> wrote:
Let's say I have a database containing people, groups, and projects (these
all have different fields). I want to index these different kinds of objects
with a view to eventually present search results from all three types mashed
together and sorted by relevance. Using separate indices (and thus separate
Solr processes) would make mashing the results together very difficult so
I'm guessing I just add the separate fields to the schema along with an
'object_type' field or equivalent?


That is the approach I would take.  Having three separate indices
would make your searches slower and more complicated.


I agree.
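
A minimal sketch of the single-index approach, assuming each database row is a dict with an `id` key. Only the `object_type` field comes from the question; the prefixed field names and the `to_solr_doc` helper are made up for illustration.

```python
def to_solr_doc(object_type, row):
    """Flatten one database row into a document for a shared index.

    The id is prefixed with the type so ids from different tables cannot
    collide, and every other column is namespaced by the type.
    """
    doc = {"id": "{}-{}".format(object_type, row["id"]), "object_type": object_type}
    for key, value in row.items():
        if key != "id":
            doc["{}_{}".format(object_type, key)] = value
    return doc


person = to_solr_doc("person", {"id": 7, "name": "Ada"})
project = to_solr_doc("project", {"id": 7, "title": "Search revamp"})
```

Both rows have database id 7 but end up as distinct documents (`person-7` and `project-7`), and a query can be restricted to one type by filtering on `object_type`.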




Secondly should I just store the database row id for each object (while
still indexing the field contents) so a query on the index returns a list of
id's that I can then fetch from the database?


It depends. :)  If you want highlighted snippets in your search
results, then you have to store the field contents in the index.  In
some situations you can make your search pages faster by storing all
the critical fields (the ones you want to appear in search results) in
the index, so that you don't have to fetch a dozen records from the
database just to display a list of search results.  On the other hand,
if your database records are small and you don't need highlighting, it
may be faster to only store database ID's in the index.



I agree with this also.  However, I've never seen a case where a
separate database query to retrieve metadata stored in a database is
faster than just storing the necessary fields directly in the search
index and retrieving them with the search results.  I've found it
helpful to think of the full-text index as a very simple, very fast,
very flat database engine.  You may not be able to do outer joins and
correlated subqueries on it, but you can get a list of documents and
titles really fast.



Hope this sheds some light,
-Stuart Sierra
AltLaw.org




Re: retrieve lucene "doc id"

2007-12-18 Thread Otis Gospodnetic
Hi Lance,

You said:
We use the standard (some RFC) text representation of 32 hex characters.
This has the advantage that F* pulls 1/16 of the total index, with a
completely randomized distribution, F**  1/256, etc.  This is very handy
for data analysis and document extraction. 

Could you elaborate on the last sentence?  Maybe give an example of what you 
have in mind?
Are you thinking that this, because of the uniform distribution, lets you easily 
get a subset of documents of predictable size, so that you know a priori 
how large a data set you'll get and work with?  Or something else?

Thanks,
Otis

--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

- Original Message 
From: "Norskog, Lance" <[EMAIL PROTECTED]>
To: solr-user@lucene.apache.org
Sent: Monday, December 17, 2007 2:43:55 PM
Subject: RE: retrieve lucene "doc id"

We are using MD5 to generate our IDs. MD5 produces a 128-bit, well-randomized
number from the content. Accidental collisions between two different data
sets are vanishingly unlikely (collisions have been constructed deliberately
since 2004, but that matters for security rather than for ID generation).

We use the standard (some RFC) text representation of 32 hex characters.
This has the advantage that F* pulls 1/16 of the total index, with a
completely randomized distribution, F**  1/256, etc.  This is very handy
for data analysis and document extraction. 

MD5 creates 128 bits, but if your index is small enough that you are
willing to risk it, you could pick 64 bits and park them in a Java long.
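
To make the ID scheme concrete, here is a small sketch using Python's standard `hashlib`. The helper names and the `document-N` sample content are assumptions for illustration. Note that `hexdigest()` returns lowercase hex, so the "F*" bucket above corresponds to the prefix "f" here.

```python
import hashlib


def md5_id(content: bytes) -> str:
    """32-character hex text form of the 128-bit MD5 digest."""
    return hashlib.md5(content).hexdigest()


def md5_id_as_long(content: bytes) -> int:
    """First 64 bits of the digest as a signed value that fits a Java long."""
    digest = hashlib.md5(content).digest()
    return int.from_bytes(digest[:8], byteorder="big", signed=True)


def bucket_counts(ids, prefix_len=1):
    """Count how many IDs fall under each hex prefix of the given length."""
    counts = {}
    for doc_id in ids:
        prefix = doc_id[:prefix_len]
        counts[prefix] = counts.get(prefix, 0) + 1
    return counts


# 16000 sample documents should spread over the 16 one-character prefixes,
# roughly 1000 per prefix.
ids = [md5_id("document-{}".format(i).encode("utf-8")) for i in range(16000)]
counts = bucket_counts(ids)
```

Truncating to 64 bits raises the collision probability, which is why it is only suggested for indexes small enough to accept that risk.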

-Original Message-
From: Ryan McKinley [mailto:[EMAIL PROTECTED] 
Sent: Monday, December 17, 2007 8:15 AM
To: solr-user@lucene.apache.org
Subject: Re: retrieve lucene "doc id"

Yonik Seeley wrote:
> On Dec 17, 2007 1:40 AM, Ben Incani <[EMAIL PROTECTED]> wrote:
>> I have converted to using the Solr search interface and I am trying
>> to retrieve documents from a list of search results (where previously
>> I had used the doc id directly from the lucene query results) and the
>> solr id I have got currently indexed is unfortunately configured not
>> to be unique!
> 
> Ouch... I'd try to make a unique Id then!
> Or barring that, just try to make the query match exactly the docs you
> want back (don't do the 2 phase thing).
> 

In 1.3-dev, you can use UUIDField to have solr generate a UUID for each
doc.

ryan





Making stemming dynamic at query time

2007-12-18 Thread Kamran Shadkhast

Stemming at both indexing and query time is controlled by the analyzers
configured in schema.xml, but I think it would be great if we could
dynamically control during search whether we want to search with stemming
or not. Any thoughts? Is it feasible?
Thanks,
-Kamran




RE: Issues with postOptimize

2007-12-18 Thread Sunny Bassan
I've set the permissions on the script to execute for all users, and it
does seem like the user who is running Solr has permission to run
the script. I've come to the conclusion that Linux permissions are
annoying, lol. I've also tried setting SELinux to permissive mode and
adding the user to the sudoers file, but this has not fixed the issue.
The only thing that does work is running the script from cron after the
optimize script.
 
Sunny 


RE: retrieve lucene "doc id"

2007-12-18 Thread Norskog, Lance
Exactly.  We have done some projects where we extract records en masse.
With this technique we can make a query that will fetch 3000 +/- 50
records, and walk through them 50 records at a time using the query as a
filter. Works pretty well.

Lance
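
One way to size such a query, sketched under the assumption that IDs are uniformly distributed hex strings: each prefix of k hex characters covers about N / 16^k documents, so several prefixes can be combined to approach a target count. The function names, the `id` field name, and the numbers below are illustrative assumptions, not the queries actually used.

```python
def prefixes_for_target(total_docs, target, prefix_len=3):
    """Choose hex prefixes whose combined expected size is close to `target`.

    Each prefix of `prefix_len` characters is expected to cover about
    total_docs / 16**prefix_len documents.
    """
    buckets = 16 ** prefix_len
    per_bucket = total_docs / buckets
    needed = min(buckets, max(1, round(target / per_bucket)))
    prefixes = [format(i, "0{}x".format(prefix_len)) for i in range(needed)]
    return prefixes, needed * per_bucket


def as_query(prefixes, field="id"):
    """Join the prefixes into one OR query; `id` is a hypothetical field name."""
    return " OR ".join("{}:{}*".format(field, p) for p in prefixes)


prefixes, expected = prefixes_for_target(total_docs=1_000_000, target=3000)
query = as_query(prefixes[:2])
```

With one million documents and three-character prefixes, twelve prefixes are selected and the expected result size is about 2930. The actual count varies around that figure because the distribution is random rather than exact.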






Re: solr field types and case sensitivity

2007-12-18 Thread Dryganets Sergey



ryantxu wrote:
> 
> yes, you would index a field twice - once with a LowerCaseFilter and 
> once without.  That is a good solution.
> 

Hm... 
So I should create n*n indexes, where n is the number of search options ...

Can I copy fields automatically?  

For example, I have a field with name  and a subset of fields with
prefixes or suffixes, so
can I use a regexp to copy fields?

Or maybe I can describe a "copy field policy" for a fieldType (for me this
solution would be better: it takes less effort to add a new search option)




query string gets truncated

2007-12-18 Thread Kasi Sankaralingam
Hi ,

I have a text-type dynamic field. In the Solr admin, when I enter the query 
name_t:are, the query string "are" is not valid. I have posted the debug info 
from the Solr admin below:

  0
  0
  standard
  10
  0
  on
  *,score
  name_t:are
  on
  standard
  2.2

  name_t:are
  name_t:are

The parsed query turns out to be empty. If I add an "a" to the query string, 
as in "area", then the parsed query string is "area".
I turned off Porter stemming and it still does not work (I guess I need to 
re-index after turning off Porter stemming).

Any ideas?

Thanks a lot,
kasi


Re: Making stemming dynamic at query time

2007-12-18 Thread Bertrand Delacretaz
On Dec 18, 2007 9:41 PM, Kamran Shadkhast <[EMAIL PROTECTED]> wrote:

> ...it would be great if we could dynamically control this during
> search if we want to search with stemming or not

The easiest is probably to have two copies of your field, using
<copyField>, one stemmed and one not, and search in one or the other.
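
On the client side, choosing between the two copies can be as simple as switching the field name in the query. The sketch below only builds the request URL; the field names `text_stemmed` and `text_exact`, the base URL, and the `build_query_url` helper are assumptions for illustration.

```python
from urllib.parse import urlencode


def build_query_url(term, stemmed=True,
                    base="http://localhost:8983/solr/select"):
    """Build a select URL that targets the stemmed or the unstemmed copy."""
    field = "text_stemmed" if stemmed else "text_exact"
    return base + "?" + urlencode({"q": "{}:{}".format(field, term)})


stemmed_url = build_query_url("running")
exact_url = build_query_url("running", stemmed=False)
```

The per-request choice is then just a flag in the application, while both analysis chains stay fixed in the schema.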

-Bertrand