How to use Similarity

2007-04-19 Thread James liu

I uncommented Similarity in schema.xml and started Tomcat.

I used the admin GUI to test it, but it seems to have no effect.

Maybe something is wrong. Does anyone know?

--
regards
jl


AW: Leading wildcards

2007-04-19 Thread Burkamp, Christian
Hi there,

Solr does not support leading wildcards, because it uses Lucene's standard 
QueryParser class without changing the defaults. You can easily change this by 
inserting the line

parser.setAllowLeadingWildcards(true);

in QueryParsing.java line 92. (This is after creating a QueryParser instance in 
QueryParsing.parseQuery(...))

and it obviously means that you have to change Solr's source code. It would be 
nice to have an option in the schema to switch leading wildcards on or off per 
field. Leading wildcards really make no sense on richly populated fields, 
because queries tend to result in TooManyClauses exceptions most of the time.

This works for leading wildcards. Unfortunately it does not enable searches 
with leading AND trailing wildcards. (E.g. searching for "*lega*" does not find 
results even if the term "elegance" is in the index; if you put a second 
asterisk at the end and search for "*lega**", the term is found.)
Can anybody explain this, even though it seems to be more of a Lucene
QueryParser issue?
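For reference, the match semantics being asked for are plain infix matching. A shell-glob illustration in Python (this only demonstrates the expected behaviour; it is not Lucene or Solr code):

```python
import fnmatch

# Infix semantics: "*lega*" should match any term containing "lega"
# anywhere, e.g. "elegance" ("e-lega-nce").
print(fnmatch.fnmatch("elegance", "*lega*"))  # True
print(fnmatch.fnmatch("solr", "*lega*"))      # False
```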

-- Christian

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]
Sent: Thursday, 19 April 2007 08:35
To: solr-user@lucene.apache.org
Subject: Leading wildcards


hi,

we have been trying to get the leading wildcards to work.

we have been looking around the Solr website, the Lucene website, wikis,
and the mailing lists etc.,
but we found a lot of contradictory information.

so we have a few questions:
- is the latest version of Lucene capable of handling leading wildcards?
- is the latest version of Solr capable of handling leading wildcards?
- do we need to make adjustments to the Solr source code?
- if we need to adjust the Solr source, what do we need to change?

thanks in advance !
Maarten



Re: Leading wildcards

2007-04-19 Thread Michael Kimsal

I've investigated this recently, and it looks like the latest Lucene dev
version supposedly supports leading and trailing wildcards at the same time.
However, I couldn't get the latest dev Solr to build with the latest dev
Lucene (as of two weeks ago).  A Lucene mailing list thread seemed to
indicate that Lucene as of the last official build supported both leading
and trailing wildcards at the same time, but it then seemed to indicate that
this was still an 'in-development branch only' state.  I can't find that
thread, but that's my understanding of the current situation.  It's bugged
us a little bit, because it's something that we need (to be able to emulate
the previous foo LIKE '%bar%' SQL behaviour we're replacing) but can't offer
our users yet.

On 4/19/07, Burkamp, Christian <[EMAIL PROTECTED]> wrote:


> Solr does not support leading wildcards, because it uses Lucene's standard
> QueryParser class without changing the defaults. You can easily change this
> by inserting the line
>
> parser.setAllowLeadingWildcards(true);
>
> in QueryParsing.java line 92. [...]
> Can anybody explain this though it seems to be more of a lucene
> QueryParser issue?





--
Michael Kimsal
http://webdevradio.com


Re: Facet Browsing

2007-04-19 Thread Jennifer Seaman



> > I can't seem to get things like facet.mincount to work.
> We had the same issue when we used the Solr incubator version.
> Now we are using the trunk version of Solr and the issue was gone.


Where is this "trunk version"? I thought 
apache-solr-1.1.0-incubating.zip was the latest release. Can anyone 
provide a quick tutorial on how to setup facet browsing? After a 
keyword search I just want to allow the user to narrow the results by 
category, then by state, then by city and then by company.


Some sample code would be appreciated.

Thank you.
Jennifer Seaman 



Re: Facet Browsing

2007-04-19 Thread Yonik Seeley

On 4/19/07, Jennifer Seaman <[EMAIL PROTECTED]> wrote:

> > > I can't seem to get things like facet.mincount to work.
> > We had same issue when we used Solr incubator version.
> > Now we are using trunk version of Solr and the issue was gone.
>
> Where is this "trunk version"?


"trunk" is a reference to the source code control system, subversion.
The trunk has the latest source files, and hence represents the latest
(potentially unstable) development version.

See the second link on the wiki
"Download newest Solr nightly build"
http://wiki.apache.org/solr/FrontPage

-Yonik


Re: Snapshooting or replicating recently indexed data

2007-04-19 Thread Yonik Seeley

On 4/19/07, Doss <[EMAIL PROTECTED]> wrote:

It seems the snapshooter takes an exact copy of the indexed data, that is, all 
the contents inside the index directory. How can we take only the recently 
added ones?
...
cp -lr ${data_dir}/index ${temp}
mv ${temp} ${name} ...



I don't quite understand your question, but since hard links are used,
it's more like pointing to the index files instead of copying them.
Rsync is used as a transport to only move the files that were changed
from the master to slaves.
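A minimal illustration of why hard links make snapshots cheap (plain Python mimicking what `cp -lr` does; this is not the snapshooter script itself):

```python
import os
import tempfile

# Create a fake "index" file, then hard-link it into a "snapshot"
# directory, as `cp -lr` does: no data is copied; both names point
# at the same inode.
base = tempfile.mkdtemp()
index_dir = os.path.join(base, "index")
snap_dir = os.path.join(base, "snapshot")
os.makedirs(index_dir)
os.makedirs(snap_dir)

seg = os.path.join(index_dir, "segment_0")
with open(seg, "w") as f:
    f.write("index data")

os.link(seg, os.path.join(snap_dir, "segment_0"))

# Two directory entries, one inode: the link count is now 2.
print(os.stat(seg).st_nlink)  # 2
```

Because Lucene index files are written once and never modified in place, the hard links stay consistent, and rsync then only has to transfer files that are actually new.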

-Yonik


Re: Leading wildcards

2007-04-19 Thread Erik Hatcher


On Apr 19, 2007, at 6:56 AM, Michael Kimsal wrote:

> It's bugged us a little bit, because it's something that we need
> (to be able to emulate the previous foo LIKE '%bar%' SQL behaviour we're
> replacing), but can't offer our users yet.


I have also run into this issue and have intended to fix up Solr to  
allow configuring that switch on QueryParser.  I'll eventually get to  
this, but someone supplying a patch with a test case would get it done  
sooner.


I must, however, caveat discussion of leading wildcards with the  
underlying effect you get.  If you use standard analysis and perform  
a leading wildcard query, you incur a (possibly) dramatic performance  
hit: Lucene has to scan *every* term in the specified field.  In fact,  
on my 3.7M-document index, a fuzzy query, for the very same reason,  
kills the search.  There is also a switch on fuzzy query that needs  
to be configurable through Solr, to adjust the number of leading  
characters that are fixed, to avoid this all-term scanning.


There are techniques that can be used to improve the performance of  
in-string queries like this, at the expense of indexing time and size  
and clever query creation.   One such technique I've used successfully  
is term rotation enumeration (cat => cat$, at$c, t$ca).   This involves  
custom analyzers and query creation.
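A sketch of the rotation idea (illustrative Python; in practice this would live in a custom Lucene analyzer, and the `$` terminator follows Erik's example):

```python
def rotations(term):
    """All rotations of term + '$'; each rotation gets indexed as a term."""
    t = term + "$"
    return [t[i:] + t[:i] for i in range(len(t))]

# cat -> cat$, at$c, t$ca (plus the final rotation $cat)
print(rotations("cat"))  # ['cat$', 'at$c', 't$ca', '$cat']

# The payoff: an infix query *lega* becomes a cheap PREFIX query "lega*"
# against the rotated terms, since some rotation of "elegance$" starts
# with "lega" -- no full term scan needed.
assert any(r.startswith("lega") for r in rotations("elegance"))
```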


Once Solr supports this switch, you may find performance fine with  
leading wildcard queries, but at least be forewarned that there are  
scalability skeletons in this closet.


Erik



Re: Leading wildcards

2007-04-19 Thread Michael Kimsal

Agreed, but in our tests (100M index) it wasn't a performance hit, and much
better (as in it actually worked) than MSSQL  ;)



On 4/19/07, Erik Hatcher <[EMAIL PROTECTED]> wrote:



> I must, however, caveat discussion of leading wildcards with the
> underlying effect you get.  If you use standard analysis and perform
> a leading wildcard query, you incur a (possibly) dramatic hit in
> terms of performance.  Lucene has to scan *every* term in the
> specified field. [...]





--
Michael Kimsal
http://webdevradio.com


Re: Leading wildcards

2007-04-19 Thread Yonik Seeley

On 4/19/07, Erik Hatcher <[EMAIL PROTECTED]> wrote:

> > parser.setAllowLeadingWildcards(true);
>
> I have also run into this issue and have intended to fix up Solr to
> allow configuring that switch on QueryParser.


Any reason that parser.setAllowLeadingWildcards(true) shouldn't be the default?
Does it really need to be configurable?

-Yonik


Re: Facet Browsing

2007-04-19 Thread Erik Hatcher


On Apr 19, 2007, at 9:32 AM, Jennifer Seaman wrote:
Can anyone provide a quick tutorial on how to setup facet browsing?  
After a keyword search I just want to allow the user to narrow the  
results by category, then by state, then by city and then by company.


Some sample code would be appreciated.


At the moment your best bet will be Solr's excellent wiki on faceted  
browsing: http://wiki.apache.org/solr/SimpleFacetParameters


How you build your app to interact with Solr is going to be unique to  
your situation, so an exact example won't be handy, but you can infer  
a lot.   If you've got "string" indexed fields for category, state,  
city, and company for documents in the index, then you'll first make  
a query asking for the category facet back  
(&facet=on&facet.field=category...) and your user interface will keep  
the state of which facet the user is seeing, and you'd change the  
facet field you request as the user drills in.


As you get more into the implementation and tinkering with Solr, you  
might have more specific questions to refine what you're doing, but  
with a little elbow grease and the Solr wiki you'll go far!


Erik



Re: Leading wildcards

2007-04-19 Thread Erik Hatcher


On Apr 19, 2007, at 10:39 AM, Yonik Seeley wrote:

> On 4/19/07, Erik Hatcher <[EMAIL PROTECTED]> wrote:
> > > parser.setAllowLeadingWildcards(true);
> >
> > I have also run into this issue and have intended to fix up Solr to
> > allow configuring that switch on QueryParser.
>
> Any reason that parser.setAllowLeadingWildcards(true) shouldn't be
> the default?


That's fine by me.  But...


> Does it really need to be configurable?


It all depends on how bad of a hit it'd take on Solr.   What's the  
breaking point where the performance of full-term scanning (and  
subsequently faceting, of course) keels over or dies?   FuzzyQuerys  
die on my 3.7M index and not-super-beefy hardware and system setup.


Erik



Re: Leading wildcards

2007-04-19 Thread Michael Kimsal

It still seems like it's only something that would be invoked by a user's
query.

If I queried for *foobar and leading wildcards were not on in the server,
I'd get back nothing, which isn't really correct.  I'd think the
application should tell the user that that syntax isn't supported.

Perhaps I'm simplifying it a bit.  It would certainly help out our comfort
level to have it either be on or configurable by default, rather than
having to maintain a 'patched' version (yes, the patch is only one line,
but it's the principle of the thing).
I suspect this would be the same for others.



On 4/19/07, Erik Hatcher <[EMAIL PROTECTED]> wrote:



> It all depends on how bad of a hit it'd take on Solr.   What's the
> breaking point where the performance of full-term scanning (and
> subsequently faceting, of course) keels over or dies? [...]

Erik





--
Michael Kimsal
http://webdevradio.com


Re: Leading wildcards

2007-04-19 Thread Erik Hatcher


On Apr 19, 2007, at 11:04 AM, Michael Kimsal wrote:
> Perhaps I'm simplifying it a bit.  It would certainly help out our
> comfort level to have it either be on or configurable by default,
> rather than having to maintain a 'patched' version (yes, the patch is
> only one line, but it's the principle of the thing).
> I suspect this would be the same for others.


And here's where your effort could go the extra mile to help  
_yourself_ out as well as the community... instead of the one-line  
change, make it a few more lines and make it a switch from the  
configuration (like the toggle for AND/OR default operator) and even  
better round it out with a test case.  Submit it, lobby for it to be  
reviewed and applied, and step 3... profit!  :)


Erik



Re: Facet Browsing

2007-04-19 Thread Kevin Lewandowski

I recommend you build your query with facet options in raw format and
make sure you're getting back the data you want. Then build it into
your app.

On 4/18/07, Jennifer Seaman <[EMAIL PROTECTED]> wrote:

Does anyone have any sample code (php, perl, etc) how to setup facet
browsing with paging? I can't seem to get things like facet.mincount
to work. Thank you.

Jennifer Seaman





Re: Leading wildcards

2007-04-19 Thread Michael Kimsal

I'm in the middle of looking into that.  For *you* ;) it may seem like a
quick thing to do.  For me, not being an expert at this stuff, it's a
balance between delving in deeply enough to figure out how to do it and
hitting our deadlines.

It's actually on someone else's plate here, but he's backed up with two
other projects first.

It's not that I don't *want* to contribute, but I hardly have enough time
to get the basics done some days.

On 4/19/07, Erik Hatcher <[EMAIL PROTECTED]> wrote:



> And here's where your effort could go the extra mile to help
> _yourself_ out as well as the community... instead of the one-line
> change, make it a few more lines and make it a switch from the
> configuration [...]





--
Michael Kimsal
http://webdevradio.com


Re: Leading wildcards

2007-04-19 Thread Erik Hatcher


On Apr 19, 2007, at 11:37 AM, Michael Kimsal wrote:
> It's not that I don't *want* to contribute, but hardly have enough
> time to get the basics done some days.


You can rest assured that all of us here are in that same boat.  :)

And you can also rest assured that the switch you're asking for will be  
part of Solr in the near future, one way or another.  I just like to  
encourage folks who can hack quick and dirty changes to go a little  
bit further and learn the Solr unit testing framework (currently a  
bit more complex than it needs to be, I'm sure) and what it takes to  
get a change from hack all the way into the core codebase with wiki  
documentation.  It's easier than most folks think to go the extra  
bit, and helping folks learn how to fish is part of our jobs as well  
(and so we can sit back and relax while all the young whippersnappers  
implement our wishes just from us mentioning them! :)


Erik



Multiple indexes?

2007-04-19 Thread Matthew Runo

Hey there-

I was wondering if the following was possible, and, if so, how to set  
it up...

I want to index two different types of data and have them searchable  
from the same interface.

For example: a group of products, with size, color, price, etc.,  
and a group of brands, with brand, genre, brand description, etc.

So the info does overlap some, but a lot of the fields for each  
"type" don't matter to the other. Is there a way to set up two  
different schemas so that both types may be indexed with relative ease?


++
 | Matthew Runo
 | Zappos Development
 | [EMAIL PROTECTED]
 | 702-943-7833
++




Multiple Solr Cores

2007-04-19 Thread Henrib

Following up on a previous thread on the solr-user list, here is a patch that
allows managing multiple cores in the same VM (thus multiple
configs/schemas/indexes).
The SolrCore.core singleton has been changed to a Map; the current singleton
behavior is keyed as 'null' (which is used by SolrInfoRegistry).
All static references to either a Config or a SolrCore have been removed;
this implies that some classes now refer to either a SolrCore or a
SolrConfig (some constructors have been modified accordingly).

I haven't tried to modify anything above the 'jar' level (the scripts, admin
pages & servlet are unaware of the multi-core part).

The two patch files are the src/ and the test/ patches:
http://www.nabble.com/file/7971/solr-test.patch solr-test.patch 
http://www.nabble.com/file/7972/solr-src.patch solr-src.patch 

This being my first attempt at a contribution, I will humbly welcome any
comment.
Regards,
Henri
-- 
View this message in context: 
http://www.nabble.com/Multiple-Solr-Cores-tf3608399.html#a10082201
Sent from the Solr - User mailing list archive at Nabble.com.



Re: acts_as_solr

2007-04-19 Thread solruser

Hi,

Does acts_as_solr now support fancier results such as highlighting?
I see options to use facets but have not yet explored them with the
plugin.

TIA
-amit

Erik Hatcher wrote:
> 
> 
> On Aug 28, 2006, at 10:25 PM, Erik Hatcher wrote:
>> I'd like to commit this to the Solr repository.  Any objections?   
>> Once committed, folks will be able to use "script/plugin  
>> install ..." to install the Ruby side of things, and using a binary  
>> distribution of Solr's example application and a custom solr/conf  
>> directory (just for schema.xml) they'd be up and running quite  
>> quickly.  If ok to commit, what directory should I put things  
>> under?  How about just "ruby"?
> 
> Ok, /client/ruby it is.  I'll get this committed in the next day or so.
> 
> I have to admit that the stuff Seth did with Searchable (linked to  
> from ) is very well done so  
> hopefully he can work with us to perhaps integrate that work into  
> what lives in Solr's repository.  Having the Searchable abstraction  
> is interesting, but it might be a bit limiting in terms of leveraging  
> fancier return values from Solr, like the facets and highlighting -  
> or maybe it's just an unnecessary abstraction for those always  
> working with Solr.  I like it though, and will certainly borrow ideas  
> from it on how to do slick stuff with Ruby.
> 
> While I'm at it, I'd be happy to commit the Java client into /client/ 
> java.  I'll check the status of that contribution when I can.
> 
>   Erik
> 
> 
> 

-- 
View this message in context: 
http://www.nabble.com/acts_as_solr-tf2181162.html#a10082711
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Multiple indexes?

2007-04-19 Thread Cody Caughlan

Why not just store an additional "object_type" field which
differentiates between the actual type of data you are looking for?

So if you're looking for some shoes:

(size:8 AND color:'blue') AND object_type:'shoe'

Or if you're searching on brands

(genre:'skater' AND brand_desc:'skater boy') AND object_type:'brand'
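A tiny helper along these lines (an illustrative sketch; the field name `object_type` is just the convention used above, and the output is Lucene query syntax assembled as a plain string):

```python
def typed_query(user_query, object_type):
    """Wrap a user query so results are restricted to one document type."""
    return f"({user_query}) AND object_type:{object_type}"

print(typed_query("size:8 AND color:blue", "shoe"))
# (size:8 AND color:blue) AND object_type:shoe
```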

I apologize if I misunderstood your question.

/cody

On 4/19/07, Matthew Runo <[EMAIL PROTECTED]> wrote:

> I want to index two different types of data, and have them searchable
> from the same interface. [...] Is there a way to set up two
> different schema so that both types may be indexed with relative ease?





Re: Multiple indexes?

2007-04-19 Thread Henrib

You cannot have more than one Solr core per application (to be precise, per
class loader, since there are a few statics).
One way is thus to have two webapps, when and if the indexes do not have the
same lifetime, have radically different schemas, etc.
However, the common wisdom is that you usually don't really need different
indexes (this was discussed here last week).

If you really are in desperate need of multiple cores, you'll find
(early-state) patches that allow just that in the 'Multiple Solr Cores'
thread...

Cheers
Henri


Matthew Runo wrote:
> I want to index two different types of data, and have them searchable
> from the same interface. [...] Is there a way to set up two
> different schema so that both types may be indexed with relative ease?

-- 
View this message in context: 
http://www.nabble.com/Multiple-indexes--tf3608429.html#a10083580
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Multiple Solr Cores

2007-04-19 Thread mpelzsherman

This sounds like a great idea, and potentially very useful for my company.

Can you explain a bit about how you would configure the various solr/home
paths, and how the different indexes would be accessed by clients?

Thanks!

- Michael
-- 
View this message in context: 
http://www.nabble.com/Multiple-Solr-Cores-tf3608399.html#a10083581
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Facet Browsing

2007-04-19 Thread Mike Klaas

On 4/18/07, Yonik Seeley <[EMAIL PROTECTED]> wrote:

> On 4/18/07, Koji Sekiguchi <[EMAIL PROTECTED]> wrote:
> > > I can't seem to get things like facet.mincount to work.
> >
> > We had same issue when we used Solr incubator version.
> > Now we are using trunk version of Solr and the issue was gone.
>
> Hmmm, good point.
> The wiki is often updated at the same time as the most recent
> development version of Solr.


What if we made a policy of including an "added in version XX" note in the
wiki documentation for features that aren't yet in a release?  The XX could
link to a page that includes a link to the nightly build and CHANGES.txt,
or to the release package for already-released versions.

-Mike


Re: help need on words with special characters

2007-04-19 Thread Mike Klaas

On 4/18/07, Doss <[EMAIL PROTECTED]> wrote:


I am new to Solr (and know nothing of Lucene). My question is: how can I 
protect words with special characters from being tokenized, say for example 
A+, A1+, etc.? When I searched for "group A" I got results with A+ as well 
as A1+ and so on. Is there any special way to index these kinds of words?


You need to change your analyzer to recognize "A+" and "A1+" as tokens.
Normally, special characters like + would not be recognized as parts
of words.

If you have a small number of special terms, you could add some code to
your existing analyzer to recognize them (WordDelimiterFilter, if you are
using the standard text field in the Solr example).  If it is
complicated, you should look into creating your own analyzer.
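A rough sketch of the "protect a small list of special terms" idea (plain Python, not a Lucene analyzer; the protected set and the fallback tokenization rule are hypothetical examples):

```python
import re

# Hypothetical list of terms to keep intact instead of tokenizing.
PROTECTED = {"A+", "A1+"}

def tokenize(text):
    tokens = []
    for raw in text.split():
        if raw in PROTECTED:
            tokens.append(raw)  # emit special terms verbatim
        else:
            # default behavior: lowercase and strip punctuation
            tokens.extend(re.findall(r"[a-z0-9]+", raw.lower()))
    return tokens

print(tokenize("group A+ scored A1+ overall"))
# ['group', 'A+', 'scored', 'A1+', 'overall']
```

With this, "A+" survives as its own token, so a search for "group A" no longer collides with documents that only contain "A+".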

-Mike


Re: Multiple indexes?

2007-04-19 Thread Matthew Runo

Ah hah! This appears to be what I'm interested in doing.

I'll have to read up on object_types.

++
 | Matthew Runo
 | Zappos Development
 | [EMAIL PROTECTED]
 | 702-943-7833
++


On Apr 19, 2007, at 10:04 AM, Cody Caughlan wrote:


> Why not just store an additional "object_type" field which
> differentiates between the actual type of data you are looking for?
> [...]






Re: Multiple indexes?

2007-04-19 Thread Cody Caughlan

If you're doing this in Ruby, there is an "acts_as_solr" plugin for
Rails which takes exactly this approach to store all different kinds
of Model objects in the same index...I just "took" the idea from
there...

/Cody

On 4/19/07, Matthew Runo <[EMAIL PROTECTED]> wrote:

> Ah hah! This appears to be what I'm interested in doing.
>
> I'll have to read up on object_types. [...]


Re: Multiple Solr Cores

2007-04-19 Thread Henrib

There is still only one solr.home instance used to load the various classes,
which is used as the one 'root'.
From there, you can have multiple solrconfig*.xml & schema*.xml files (even
absolute paths); calling new SolrCore(name_of_core, path_to_solrconfig,
path_to_schema) creates a named core that you can refer to.
To refer to a named core, you call SolrCore.getCore(name_of_core) (instead
of SolrCore.getCore()).

From a servlet perspective, it seems that passing the name of the core back
& forth should do the trick (so we can reacquire the correct core). One
missing part is uploading a config & a schema and then starting a core (a
dynamic creation of a core). One thing to note is that a schema needs a
config to be created, and it is certainly wise to use the same config for
schema & core creation.
For the admin servlet, we'd need to implement a way to choose the core we
want to observe.
And the scripts probably also need to have a 'core name' passed down...

I'm still building my knowledge on the subject so my simplistic view might
not be accurate.
Let me know if this helps.
Cheers
Henrib



mpelzsherman wrote:
> 
> This sounds like a great idea, and potentially very useful for my company.
> 
> Can you explain a bit about how you would configure the various solr/home
> paths, and how the different indexes would be accessed by clients?
> 
> Thanks!
> 
> - Michael
> 

-- 
View this message in context: 
http://www.nabble.com/Multiple-Solr-Cores-tf3608399.html#a10084772
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Multiple indexes?

2007-04-19 Thread Matthew Runo

I'll actually be doing this in Perl..

any ideas on perl? heh

++
 | Matthew Runo
 | Zappos Development
 | [EMAIL PROTECTED]
 | 702-943-7833
++


On Apr 19, 2007, at 11:59 AM, Cody Caughlan wrote:


> If you're doing this in Ruby, there is an "acts_as_solr" plugin for
> Rails which takes exactly this approach to store all different kinds
> of Model objects in the same index... I just "took" the idea from
> there... [...]


Filter question...

2007-04-19 Thread escher2k

I have a bunch of fields that I am trying to filter on.

When I try to filter the data across the multiple fields, the results even seem
to include records where the data is not present.

For instance if the filter query contains this -
primary_state:New Delhi OR primary_country:New Delhi OR primary_city:New
Delhi OR secondary_state:New Delhi OR secondary_country:New Delhi

The results retrieved even contain records that have "Rochester, New York". 

Is there a way to retrieve only those records that contain both the words
"New" and "Delhi"?

Thanks.



Re: Multiple indexes?

2007-04-19 Thread Erik Hatcher

Matthew,

All that is meant by "object_types" is an additional stored/indexed  
field in the Solr schema that gets added to every document providing  
context of which type it is (shoes or brands).  Then you can limit  
searches to a particular area by just filtering on type:shoes, for  
example.


Erik

p.s. I could use some new shoes!

On Apr 19, 2007, at 3:17 PM, Matthew Runo wrote:


I'll actually be doing this in Perl..

any ideas on perl? heh

++
 | Matthew Runo
 | Zappos Development
 | [EMAIL PROTECTED]
 | 702-943-7833
++










Re: Filter question...

2007-04-19 Thread Jennifer Seaman


Is there a way to only retrieve those records that contain both the 
words "New" and "Delhi".


I'm just starting with this, but I found you need to do:
primary_state:"New Delhi"

I never used the OR yet!




Querying an index while commiting/optimizing

2007-04-19 Thread Cody Caughlan

This is more of a Lucene question than a Solr one, but... is it
possible to query a Solr(Lucene) index while it is in the middle of
performing a commit/optimize?

Some of the Lucene documentation *implies* that it is possible, but on a
logical level it seems kind of crazy. At the same time, for applications where
it takes 2 days to re-build your index, it seems impractical that your app is
"down" this whole time...

Am I totally crazy?

thanks
/cody


Re: Querying an index while commiting/optimizing

2007-04-19 Thread Yonik Seeley

On 4/19/07, Cody Caughlan <[EMAIL PROTECTED]> wrote:

This is more of a Lucene question than a Solr one, but... is it
possible to query a Solr(Lucene) index while it is in the middle of
performing a commit/optimize?


Yep, no problems.  You can query concurrently with adds, deletes,
commits, and optimizes.

-Yonik


Re: Querying an index while commiting/optimizing

2007-04-19 Thread Cody Caughlan

Ok, cool. In the worst case, are you just not going to get any hits,
even though you "know" the data is in there, because it just hasn't been
indexed yet?

How does that work?

/cody

On 4/19/07, Yonik Seeley <[EMAIL PROTECTED]> wrote:

On 4/19/07, Cody Caughlan <[EMAIL PROTECTED]> wrote:
> This is more of a Lucene question than a Solr one, but... is it
> possible to query a Solr(Lucene) index while it is in the middle of
> performing a commit/optimize?

Yep, no problems.  You can query concurrently with adds, deletes,
commits, and optimizes.

-Yonik



Re: Querying an index while commiting/optimizing

2007-04-19 Thread Yonik Seeley

On 4/19/07, Cody Caughlan <[EMAIL PROTECTED]> wrote:

How does that work?



From the user perspective, you continue to "see" the previous version
of the index that was last committed until the current commit is
completely finished.
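A toy sketch (not Solr's actual internals) of the snapshot semantics described above: searches always read the last committed snapshot, and a commit swaps the new one in atomically.

```python
class ToyIndex:
    """Illustrative only: models 'searchers see the old snapshot until commit'."""

    def __init__(self):
        self.live = ["doc1"]      # snapshot visible to searches
        self.pending = None       # uncommitted changes

    def search(self):
        return list(self.live)    # reads never see uncommitted data

    def add(self, doc):
        base = self.pending if self.pending is not None else list(self.live)
        self.pending = base + [doc]

    def commit(self):
        if self.pending is not None:
            self.live = self.pending   # atomic swap to the new snapshot
            self.pending = None

idx = ToyIndex()
idx.add("doc2")
before = idx.search()   # still ["doc1"]
idx.commit()
after = idx.search()    # ["doc1", "doc2"]
```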

-Yonik


Re: Querying an index while commiting/optimizing

2007-04-19 Thread Cody Caughlan

PERFECT.

Thanks guys/gals.

On 4/19/07, Yonik Seeley <[EMAIL PROTECTED]> wrote:

On 4/19/07, Cody Caughlan <[EMAIL PROTECTED]> wrote:
> How does that work?

From the user perspective, you continue to "see" the previous version
of the index that was last committed until the current commit is
completely finished.

-Yonik



Re: Multiple indexes?

2007-04-19 Thread Matthew Runo
Ah. That makes sense then. I wasn't sure if that was the best way to  
go about things or not. I didn't want to end up with a bunch of  
fields that were not being used all the time, in case it would cause a  
degradation in search quality.


++
 | Matthew Runo
 | Zappos Development
 | [EMAIL PROTECTED]
 | 702-943-7833
++


On Apr 19, 2007, at 12:59 PM, Erik Hatcher wrote:


Matthew,

All that is meant by "object_types" is an additional stored/indexed  
field in the Solr schema that gets added to every document  
providing context of which type it is (shoes or brands).  Then you  
can limit searches to a particular area by just filtering on  
type:shoes, for example.


Erik

p.s. I could use some new shoes!











Re: [acts_as_solr] Few question on usage

2007-04-19 Thread Chris Hostetter

I don't really know a lot about Ruby, but as I understand it there are more
than a few versions of something called "acts_as_solr" floating around
... the first was written by Erik as a proof of concept, and then picked up and
polished a bit by someone else (whose name escapes me).

All of the "serious" Ruby/Solr development I know about is happening as
part of the "Flare" sub-sub project...

http://wiki.apache.org/solr/Flare
http://wiki.apache.org/solr/SolRuby

...most of the people working on it seem to hang out on the
[EMAIL PROTECTED] mailing list.  As I understand it, the "solr-ruby" package
is a low-level Ruby<->Solr API, with Flare being a higher-level
reusable Rails app type thingamajig.  (Can you tell I don't know a lot
about Ruby or Rails? ... I'm winging it.)


: Date: Tue, 17 Apr 2007 10:52:00 -0700
: From: amit rohatgi <[EMAIL PROTECTED]>
: Reply-To: solr-user@lucene.apache.org
: To: solr-user@lucene.apache.org
: Subject: [acts_as_solr] Few question on usage
:
: Hi
:
: Here are few question for solr integrating with ruby
:
: 1. What are other alternatives are available for ruby integration with solr
: other than acts-as_solr plugin.
: 2. acts_as_solr plugin - does it support highlighting feature
: 3. performance benchmark for acts_as_solr plugin available if any
:
:
: -thanks
: dev
:



-Hoss



Re: limiting the rows returned for a query

2007-04-19 Thread Chris Hostetter

: I want to do a simple query to the solr index.  something like
: q=stateid:1 countryid:1
:
: but i'm really only concerned with getting the record above and below a
: certain (dynamic) recordid in the search results.

You have to define what "above and below" means in your context ... do you
mean assuming a sort on some field?  If so, then you could
write a custom FunctionQuery that looks at the value for a given doc and
scores things based on how close their value is.

Alternately, you could have a custom RequestHandler that does the search
and walks the DocList on the server side ... either way you need some
custom Java code.

But I could be missing some easy solution ... it all depends on the
specifics of your problem (can you give a concrete example?)



-Hoss



Re: Requests per second/minute monitor?

2007-04-19 Thread Chris Hostetter

: Is there a good spot to track request rate in Solr? Has anyone
: built a monitor?

I would think it would make more sense to track this in your application
server than to add it to Solr itself.




-Hoss



Re: Leading wildcards

2007-04-19 Thread Chris Hostetter

: > Any reason that parser.setAllowLeadingWildcards(true) shouldn't be
: > the default?

I'm of two minds on this, both of which vote "don't do it".

From a predictability standpoint, I think we should keep the default
behavior the same as the base QueryParser's default behavior as much as
possible.

From a stability standpoint, I would suggest that people should have to go
out of their way to get this behavior, since it does open up the
possibility of a query OOMing Solr extremely easily.

In general: if we are going to change the behavior of existing syntax in
the QP, it should be in ways that make the system more stable (ala:
ConstantScore Range and Prefix queries) and not less.



-Hoss



Re: help need on words with special characters

2007-04-19 Thread Chris Hostetter

: with special characters from tokenizing, sat for example A+, A1+ etc.
: because when i searched for "group A" i am getting results with A+
: aswell as A1+ and so on, is there any special way to index these type of
: words?

All of the tokenization is controlled via the analyzers you configure in
your schema.xml -- you don't have to use any of the stuff in the example
schema; you can change it as much as you want.

If you're starting with an existing schema, and you want to understand
why/how certain things are happening during analysis, the "Analysis"
tool (linked to from the admin screen) makes it easy to decide
which tokenizer/tokenfilter changes to make...

http://localhost:8983/solr/admin/analysis.jsp?highlight=on
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters


-Hoss



Re: Leading wildcards

2007-04-19 Thread Yonik Seeley

On 4/19/07, Chris Hostetter <[EMAIL PROTECTED]> wrote:


: > Any reason that parser.setAllowLeadingWildcards(true) shouldn't be
: > the default?

I'm of two minds on this, both of which vote "don't do it".

From a predictability standpoint, I think we should keep the default
behavior the same as the base QueryParser's default behavior as much as
possible.


For things that return results, yes.  I think that taking away
features isn't a good thing, but adding them can be (basically,
backward compatibility).


From a stability standpoint, I would suggest that people should have to go
out of their way to get this behavior, since it does open up the
possibility of a query OOMing Solr extremely easily.


ConstantScorePrefixQuery is used... there shouldn't be an issue with
memory, just time.


In general: if we are going to change the behavior of existing syntax in
QP, it should be in ways that make the system more stable (ala:
ConstantScore Range and Prefix queries) and not less.


One could argue producing a result rather than throwing an exception
is an improvement.

-Yonik


Re: Leading wildcards

2007-04-19 Thread Yonik Seeley

On 4/19/07, Yonik Seeley <[EMAIL PROTECTED]> wrote:

> From a stability standpoint, I would suggest that people should have to go
> out of their way to get this behavior, since it does open up the
> possibility of a query OOMing Solr extremely easily.

ConstantScorePrefixQuery is used... there shouldn't be an issue with
memory, just time.


Oops, except we aren't always talking about a prefix query.
I know at least some expanding queries automatically limit to the max
number of boolean clauses.  Not sure if all of them do though.

-Yonik


Re: Multiple indexes?

2007-04-19 Thread Chris Hostetter
: So if you're looking for some shoes:
: (size:8 AND color:'blue') AND object_type:'shoe'

: Or if you're searching on brands
: (genre:'skater' AND brand_desc:'skater boy') AND object_type:'brand'

a slight improvement on this: put your object_type restriction in a
filter query (&fq=object_type:foo) and not in your main query ... that way
it won't affect the scoring, and it will be cached uniquely so the work of
identifying the set of all shows will only be done once per commit (and
likewise for brands)
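A minimal sketch of that request shape, assuming a stock Solr instance at localhost (the URL and helper name are illustrative, not from the thread):

```python
import urllib.parse

def solr_select_url(q, fq, base="http://localhost:8983/solr/select"):
    # The type restriction goes in fq: it is cached as a filter per commit
    # and does not affect scoring of the main query.
    return base + "?" + urllib.parse.urlencode({"q": q, "fq": fq})

url = solr_select_url("size:8 AND color:blue", "object_type:shoe")
```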



-Hoss



Re: Filter question...

2007-04-19 Thread escher2k

Thanks Jennifer. But the issue with the quotes would be that it would match
the string exactly and
not find it, if there were other words in between (e.g. New Capital Delhi).






Re: Leading wildcards

2007-04-19 Thread Chris Hostetter

: For things that return results, yes.  I think that taking away
: features isn't a good thing, but adding them can be (basically,
: backward compatibility).

I don't know that this is really adding a feature ... it's changing syntax.
"foo:*bar" has meaning by default in the query parser ... its meaning may
typically result in a query that doesn't match anything, but that's an
expectation people may have based on past use of QueryParser (or reading
of its docs).

My point is just that we shouldn't change any default meaning of
syntax ... adding _val_:"func(foo)" didn't really run any risk of doing
something people didn't expect (unless they have a field named _val_) ...
but people who are used to QueryParser protecting them from foolish users
that type in leading wildcards would be in for a nasty surprise if we
changed the default.



-Hoss



Re: Leading wildcards

2007-04-19 Thread Chris Hostetter

: > ConstantScorePrefixQuery is used... there shouldn't be an issue with
: > memory, just time.
:
: Oops, except we aren't always talking about a prefix query.
: I know at least some expanding queries automatically limit to the max
: number of boolean clauses.  Not sure if all of them do though.

Right ... we're talking about wildcard queries with leading wildcards ...
I can't remember if it uses maxBooleanClauses or not either ... but even
if it does, supporting this behavior (and running this risk) should be
explicitly controlled by the user (just like changing maxBooleanClauses --
if they don't set it in solrconfig.xml, we use whatever the Lucene default
is)



-Hoss



Re: Filter question...

2007-04-19 Thread Mike Klaas

On 4/19/07, escher2k <[EMAIL PROTECTED]> wrote:


Thanks Jennifer. But the issue with the quotes would be that it would match
the string exactly and
not find it, if there were other words in between (e.g. New Capital Delhi).


If you want to restrict a section of a query to a field, use brackets:

city:(...)

thus:

city:(New Delhi) --> city contains 'new' or 'delhi', highest score to
those containing both
city:(+New +Delhi) --> city contains 'new' AND city contains 'delhi'
city:"New Delhi"~1000 --> city contains 'new' within 1000 words of
'delhi', highest score to matches having the words nearby

-Mike


Re: Filter question...

2007-04-19 Thread Chris Hostetter

: not find it, if there were other words in between (e.g. New Capital Delhi).

Then you should use field:"New Delhi"~3 or (+field:New +field:Delhi).  What
you have now is going to match any docs that have "New" in any of the
fields you care about, or Delhi in whatever your default search field is.

Incidentally, your use case seems like it *desperately* cries out for you to
use the dismax handler...
qt=dismax&q=New+Delhi&qf=primary_state^2+primary_country^2+primary_city^3+secondary_state+secondary_country&pf=primary_state^2+primary_country^2+primary_city^3+secondary_state+secondary_country&ps=5&mm=100

http://wiki.apache.org/solr/DisMaxRequestHandler
http://lucene.apache.org/solr/api/org/apache/solr/request/DisMaxRequestHandler.html
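That query string, unpacked parameter by parameter (a sketch; the values are taken from the example URL above):

```python
import urllib.parse

fields = ("primary_state^2 primary_country^2 primary_city^3 "
          "secondary_state secondary_country")
params = {
    "qt": "dismax",    # route the request to the dismax handler
    "q": "New Delhi",
    "qf": fields,      # query fields, with per-field boosts
    "pf": fields,      # phrase fields, boosting docs with the terms close together
    "ps": 5,           # phrase slop used by pf
    "mm": 100,         # minimum-match requirement
}
query_string = urllib.parse.urlencode(params)
```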





-Hoss



Re: Filter question...

2007-04-19 Thread escher2k

Thanks Chris. We are using dismax already :)






Re: Filter question...

2007-04-19 Thread escher2k

Thanks Mike. I just tested it on one field and looks like it works fine.






Re: Leading wildcards

2007-04-19 Thread Yonik Seeley

On 4/19/07, Chris Hostetter <[EMAIL PROTECTED]> wrote:

: For things that return results, yes.  I think that taking away
: features isn't a good thing, but adding them can be (basically,
: backward compatibility).

I don't know that this is really adding a feature ... it's changing syntax.
"foo:*bar" has meaning by default in the query parser ... its meaning may
typically result in a query that doesn't match anything,


I think it's adding syntax, not changing it.
Right now, you get an exception for foo:*bar
So if we allowed it by default, I would call that 100% backward compatible.

The one issue you brought up is memory, and that should be
investigated.   I agree we don't want to make it too easy to blow
things up.

-Yonik


Re: Multiple indexes?

2007-04-19 Thread Ryan McKinley


As this question comes up so often, I put a new page on the wiki:
 http://wiki.apache.org/solr/MultipleIndexes

We should fill in more details and link it to the front page.








Re: Leading wildcards

2007-04-19 Thread Chris Hostetter

: > I don't know that this is really adding a feature ... it's changing syntax.
: > "foo:*bar" has meaning by default in the query parser ... its meaning may
: > typically result in a query that doesn't match anything,
:
: I think it's adding syntax, not changing it.
: Right now, you get an exception for foo:*bar
: So if we allowed it by default, I would call that 100% backward compatible.

Ah ... this is where my poor knowledge of wildcards comes in ... I thought
it treated * as a literal.  Yeah, I see your point now ... I retract that
objection, but reserve the right to stand behind the "let's not make it
easy to crash the box by default" objection.  :)


-Hoss



Re: Question: index performance

2007-04-19 Thread Chris Hostetter

: solr, 100 documents at a time. I was doing a commit after each of those
: but after what Yonik says I will remove it and commit only after each
: batch of 25k.

Do the commit only when you think it's necessary to expose those docs to
your search clients, one of which may be "you" checking on the progress of
your index build.
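A sketch of the batch-then-commit pattern: build one <add> message per batch and commit once afterwards (the field names and batch contents here are made up for illustration):

```python
from xml.sax.saxutils import escape

def add_xml(docs):
    """Build a Solr <add> update message from a list of {field: value} dicts."""
    parts = ["<add>"]
    for doc in docs:
        parts.append("<doc>")
        for name, value in doc.items():
            parts.append('<field name="%s">%s</field>' % (name, escape(str(value))))
        parts.append("</doc>")
    parts.append("</add>")
    return "".join(parts)

batch = [{"id": 1, "title": "red shoe"}]
xml = add_xml(batch)
# POST xml to /solr/update for each batch, then POST "<commit/>" once per
# batch so the whole batch becomes visible to searchers at the same time.
```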

: Q1: I've got autocommit set to 1000 now.. in solrconfig.xml, should i
: disable it in this scenario?

I'm guessing you don't want that if you are doing full builds on a regular
basis.  Its intent is for indexes that are being continuously updated, where
you just want to know that eventually a commit will happen (without ever
needing to call it explicitly).

: Q2: To decide which of those 25k are going to be indexed, we need to do
: a query for each (this is the main reason to optimize before a new DB
: batch is indexed), each of these 25k queries take around 30ms which is
: good enough for us, but i've observed every ~30 queries the time of one
: search goes up to 150ms or even 1200ms. Then it does another ~30, etc. I
: guess there is something happening inside the server regularly that
: causes it. Any clues what it can be and how can i minimize that time?

Are these queries happening simultaneously with the updates? The
autocommitting will cause a newSearcher to be opened, and the first
search on it will have to pay some added cost.

Besides autocommit, there is nothing that happens automatically on a
recurring basis in Solr ... there may be something else running on your box
that is using RAM, which is taking away from the disk page cache, which
causes some searches to need to reread pages (pure speculation).

: Q3: The 25k searches are done without any cumulative effect on
: performance (avg/search is ~30ms from start to end). But if inmmediately
: after start posting documents to the index tomcat peaks CPU. But if i
: stop tomcat, and then post the 25k documents without doing those
: searches they're very quick. Is there any reason why the searches would
: affect tomcat to justify this? Just to clarify, searches are NOT done at
: the same time as indexing.

I'm having trouble understanding your question ... how can you post
documents after stopping Tomcat?



-Hoss



Solr performance warnings

2007-04-19 Thread Michael Thessel

Hello,

in my logs I get from time to time this message:

INFO: PERFORMANCE WARNING: Overlapping onDeckSearchers=2

What does this mean? What can I do to avoid this?


Cheers,

Michael


Re: Multiple Solr Cores

2007-04-19 Thread Chris Hostetter

I'm sorry to say I am *way* behind on my patch reading (and moving into my
new place this weekend, where I have no net access, isn't going to help), so
I can't comment on the technique (or even style) of this patch ... but if
you could do people a favor and open a Jira issue and post it there for
people to review, it would be very much appreciated...

http://wiki.apache.org/solr/HowToContribute
http://issues.apache.org/jira/browse/SOLR

...one quick comment based on something that jumped out at me from your
mail: removing public static methods will make it really hard to apply
this patch in a backwards-compatible way (since many people may have
custom request handlers that rely on those methods) ... but if you're
using a "null" key in maps to refer to the default Core/Config, you should
be able to keep the existing methods around (deprecated) as wrappers for
your new methods, right?


-Hoss



Re: Solr performance warnings

2007-04-19 Thread Erik Hatcher


On Apr 19, 2007, at 7:47 PM, Michael Thessel wrote:

in my logs I get from time to time this message:

INFO: PERFORMANCE WARNING: Overlapping onDeckSearchers=2

What does this mean? What can I do to avoid this?


I think you have issued multiple commits (or optimizes) that hadn't  
fully finished.   Is that the case?


Erik



Re: [acts_as_solr] Few question on usage

2007-04-19 Thread Erik Hatcher

Sorry, I missed the original mail.   Hoss has got it right.

Personally I'd love to see acts_as_solr definitively come into the  
solr-ruby fold.


Regarding your questions:

: 1. What are other alternatives are available for ruby integration  
with solr

: other than acts-as_solr plugin.


acts_as_solr is purely for ActiveRecord (database O/R mapping)  
integration with Solr, such that when you create/update/delete  
records they get taken care of in Solr also.


For pure Ruby access to Solr without a database, use solr-ruby.  The  
0.01 gem is available as "gem install solr-ruby", but if you can I'd  
recommend you tinker with the trunk codebase too.



: 2. acts_as_solr plugin - does it support highlighting feature


This depends on which acts_as_solr you've grabbed.  As Hoss  
mentioned, there are various flavors of it floating around.   I've  
promised to speak about acts_as_solr at RailsConf next month, so I'll  
be working to get that under control even if that means resurrecting  
my initial hack and making it part of solr-ruby and hoping that the  
other implementations floating out there would like to collaborate on  
a definitive version built into the Solr codebase.



: 3. performance benchmark for acts_as_solr plugin available if any


What kind of numbers are you after?  acts_as_solr searches Solr, and  
then will fetch the records from the database to bring back model  
objects, so you have to account for the database access in the  
picture as well as Solr.


Erik







yesterday i download solr, it not support IndexInfo

2007-04-19 Thread James liu

I read about it at http://wiki.apache.org/solr/IndexInfoRequestHandler

--
regards
jl


Re: yesterday i download solr, it not support IndexInfo

2007-04-19 Thread Erik Hatcher


On Apr 19, 2007, at 9:42 PM, James liu wrote:

i read it from http://wiki.apache.org/solr/IndexInfoRequestHandler


Could you elaborate?   What request did you make of Solr?   What was  
its response?   What solrconfig.xml file are you using?   The example  
config file has the indexinfo request handler mapped:  
<http://svn.apache.org/repos/asf/lucene/solr/trunk/example/solr/conf/solrconfig.xml>


Erik



Re: yesterday i download solr, it not support IndexInfo

2007-04-19 Thread Ryan McKinley

James liu wrote:

i read it from http://wiki.apache.org/solr/IndexInfoRequestHandler



The IndexInfoRequestHandler was added after the Solr 1.1 release.  You will need 
to compile the source from:

 http://svn.apache.org/repos/asf/lucene/solr/trunk/

to get the IndexInfo handler.




Re: yesterday i download solr, it not support IndexInfo

2007-04-19 Thread James liu

I know it should be configured in solrconfig.xml,

and I thought it might be in the newest version.

But I didn't find it, so I wondered whether it was a problem with my download.


2007/4/20, Erik Hatcher <[EMAIL PROTECTED]>:



On Apr 19, 2007, at 9:42 PM, James liu wrote:
> i read it from http://wiki.apache.org/solr/IndexInfoRequestHandler

Could you elaborate?   What request did you make of Solr?   What was
its response?   What solrconfig.xml file are you using?   The example
config file has the indexinfo request handler mapped:  

Erik





--
regards
jl


Facet.query

2007-04-19 Thread Ge, Yao \(Y.\)
When multiple facet queries are specified, are they combined with OR or
AND?
-Yao


RE: Facet.query

2007-04-19 Thread Ge, Yao \(Y.\)
Never mind. I should have read the example
(http://wiki.apache.org/solr/SimpleFacetParameters#head-1da3ab3995bc4abcdce8e0f04be7355ba19e9b2c)
first.





Re: Facet.query

2007-04-19 Thread James liu

I think that only concerns the query words, not the facets.






--
regards
jl