Re[2]: Question about search suggestion
I searched and read about the auto-complete feature. Thanks. It looks nice; I think I should try it first.

NM> On Tue, 26 Aug 2008 15:15:21 +0300
NM> Aleksey Gogolev <[EMAIL PROTECTED]> wrote:
>>
>> Hello.
>>
>> I'm new to Solr and I need to make a search suggest (like Google suggestions).
>>
NM> Hi Aleksey,
NM> please search the archives of this list for subjects containing 'autocomplete'
NM> or 'auto-suggest'. That should give you a few ideas and starting points.
NM> best,
NM> B
NM> {Beto|Norberto|Numard} Meijome

--
Aleksey Gogolev
developer, dev.co.ua
mailto:[EMAIL PROTECTED]
Wrong sort by score
Hi,

I have encountered a weird problem in Solr. In one of my queries (dismax, default sorting) I noticed that the results are not sorted by score (according to debugQuery).

The first 150 results are tied (with score 12.806474), and after those there is a bunch of results with a higher score (12.962835).

What could be the cause? I'm overriding the tf function in my similarity class; could that be related?

Thanks,
Yuri
Re: Wrong sort by score
On Wed, Aug 27, 2008 at 9:10 AM, Yuri Jan <[EMAIL PROTECTED]> wrote:
> I have encountered a weird problem in solr.
> In one of my queries (dismax, default sorting) I noticed that the results
> are not sorted by score (according to debugQuery).
>
> The first 150 results are tied (with score 12.806474), and after those,
> there is a bunch of results with higher score (12.962835).
>
> What can be the cause?
> I'm overriding the tf function in my similarity class. Can it be related?

Do the explain scores in the debug section match the normal scores paired with the documents? (Add "score" to the fl parameter to get a score with each document.)

-Yonik
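Yonik's suggestion can be tried straight from the select URL: adding "score" to the fl parameter returns the score with each document, and debugQuery=true adds the explain output to compare against. A minimal sketch of building such a request URL in plain Java (host, core path, and query below are illustrative assumptions, and the space-to-plus substitution is a crude stand-in for full URL encoding):

```java
public class ScoreFlExample {
    // Build a select URL that asks for all stored fields plus the score,
    // along with explain info for comparison.
    static String buildSelectUrl(String base, String q) {
        return base + "/select?q=" + q.replace(" ", "+")
             + "&fl=*,score"        // return each document's score in the results
             + "&debugQuery=true";  // include the explain section in the response
    }

    public static void main(String[] args) {
        System.out.println(buildSelectUrl("http://localhost:8983/solr", "ipod video"));
    }
}
```

Comparing the score values in the result list against the explain section is exactly the check Yonik asks for.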
Re: Wrong sort by score
Actually, no... The scores in fl are 12.806475 and 10.386531 respectively, so the results according to those are sorted correctly. Is it just a problem with debugQuery?

On Wed, Aug 27, 2008 at 9:21 AM, Yonik Seeley <[EMAIL PROTECTED]> wrote:
> Do the explain scores in the debug section match the normal scores
> paired with the documents? (add score to the fl parameter to get a
> score with each document).
>
> -Yonik
Re: SpellCheckComponent bug?
Hmm, sounds like a bug. A test case would be great, but at a minimum file a JIRA. Do those other terms that collate properly have multiple suggestions?

On Aug 25, 2008, at 6:24 PM, Matthew Runo wrote:

Hello folks! I seem to be seeing a bug in the SpellCheckComponent. Search term: Quicksilver. I get two suggestions:

    2    Quicksilver
    220  Quiksilver

...and it's not correctly spelled (false), but the collation is of the first term (Quicksilver), not the one with the highest frequency. This seems to be the opposite of what the docs say collation should do. Other, more popular terms (shoez, runnning, etc.) all seem to collate properly.

I'm hitting Solr via SolrJ and not really doing anything too fancy - using SVN head at the moment. Just wondered if anyone had any ideas. There are no synonyms in this system, so I don't think that could be it. I've rebuilt the search index.

Thanks for your time!

Matthew Runo
Software Engineer, Zappos.com
[EMAIL PROTECTED] - 702-943-7833
Re: SpellCheckComponent bug?
runnning does have multiple suggestions, Cunning and Running - but it properly picks Running. I have not noticed this for any other term, but I have not exhaustively tested others yet.

Thanks for your time!

Matthew Runo
Software Engineer, Zappos.com
[EMAIL PROTECTED] - 702-943-7833

On Aug 27, 2008, at 7:52 AM, Grant Ingersoll wrote:
> Hmm, sounds like a bug. A test case would be great, but at a minimum file
> a JIRA. Do those other terms that collate properly have multiple
> suggestions?
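For reference, the behavior the docs describe - and that Matthew expects - is picking the suggestion with the highest index frequency. A standalone sketch of that selection rule, using the frequencies from the report; this is illustrative only, not the SpellCheckComponent's actual code:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class CollatePick {
    // Return the suggested term with the highest index frequency.
    static String mostFrequent(Map<String, Integer> suggestions) {
        String best = null;
        int bestFreq = -1;
        for (Map.Entry<String, Integer> e : suggestions.entrySet()) {
            if (e.getValue() > bestFreq) {
                best = e.getKey();
                bestFreq = e.getValue();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Frequencies from Matthew's report
        Map<String, Integer> s = new LinkedHashMap<String, Integer>();
        s.put("Quicksilver", 2);
        s.put("Quiksilver", 220);
        System.out.println(mostFrequent(s)); // prints Quiksilver (freq 220)
    }
}
```

Per the report, the component instead returned the first term (Quicksilver, freq 2), which is what makes it look like a bug.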
Re: Wrong sort by score
On Wed, Aug 27, 2008 at 9:38 AM, Yuri Jan <[EMAIL PROTECTED]> wrote:
> Actually, no...
> The scores in fl are 12.806475 and 10.386531 respectively, so the results
> according to those are sorted correctly.
> Is it just a problem with the debugQuery?

Looks like it... I guess the custom similarity isn't being used when explain() is called.
Did you register this custom similarity in the schema?
If so, can you file a JIRA bug for this?

-Yonik
Re: How does Solr search when a field is not specified?
Thanks Otis! :)

On Tue, Aug 26, 2008 at 10:47 PM, Otis Gospodnetic <[EMAIL PROTECTED]> wrote:
> Jake,
>
> Yes, that field would have to be some kind of analyzed field (e.g. text),
> not string, if you wanted that query to match the "Jake is Testing" input.
> There are no built-in Lucene or Solr-specific limits on field lengths.
> There is one parameter, maxFieldLength in Solr's solrconfig.xml, I think,
> which tells Lucene how many tokens to consider for indexing. If you don't
> want that limit, increase that parameter's value to the max.
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
> ----- Original Message -----
>> From: Jake Conk <[EMAIL PROTECTED]>
>> To: solr-user@lucene.apache.org
>> Sent: Tuesday, August 26, 2008 4:38:09 PM
>> Subject: How does Solr search when a field is not specified?
>>
>> Hello,
>>
>> I was wondering how Solr searches when a field is not specified, just a
>> query? Say for example I have the following:
>>
>> ?q="Jake" AND "Test"
>>
>> I have a mixture of integer, string, and text columns. Some indexed,
>> some stored, and some string fields copied to text fields.
>>
>> Say I have a string field with the value "Jake is Testing" which is
>> also copied to a text field. If I did not copyField that string field
>> to a text field, would the above query return no results if the words
>> "Jake" and "Test" are not found anywhere else, since we cannot do
>> full-text searches on string fields?
>>
>> Lastly, is there a limit on how many characters can be in a string or
>> text field?
>>
>> Thanks,
>> - Jake
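Otis's string-vs-text distinction can be sketched in plain Java: a string field indexes the whole value as one verbatim token, while a text field is analyzed into tokens. The analysis below (lowercasing plus whitespace splitting) is a crude stand-in for Solr's real analyzer chains; matching "Test" against "Testing" would additionally need stemming, which a real text analyzer can provide:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class FieldTypeSketch {
    // Rough stand-in for a "text" field analyzer: lowercase + split on whitespace.
    static List<String> analyzeText(String value) {
        return Arrays.asList(value.toLowerCase(Locale.ROOT).split("\\s+"));
    }

    // A "string" field: the entire value is indexed as a single verbatim term.
    static List<String> analyzeString(String value) {
        return Arrays.asList(value);
    }

    public static void main(String[] args) {
        String doc = "Jake is Testing";
        // Query term "jake" does not match the single untokenized string term...
        System.out.println(analyzeString(doc).contains("jake")); // false
        // ...but it does match one of the analyzed text tokens.
        System.out.println(analyzeText(doc).contains("jake"));   // true
    }
}
```

This is why dropping the copyField to a text field makes the query stop matching: the string field holds only the exact value "Jake is Testing".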
Distributed Search Test
Hello,
   I have been performing some simple distributed search tests and don't understand why distributed search seems to work in some circumstances but not others.

In my setup I have compiled the example server using the Solr trunk downloaded on Aug 22nd. I am running a sample server instance on 2 separate hosts (localhost and "fred"). I've added a portion of the sample docs, [a-n]*.xml, to the localhost Solr server, and the other portion, [m-z]*.xml, to host fred.

Assuming that I have set things up correctly, I would expect to receive a non-zero-length SolrDocumentList for any distributed search that matches text in the example docs.

Specifically, when I test the contents of each server separately (using the included TestCase) the tests pass. This confirms that each server has different documents. However, when I do the distributed tests, it seems the tests pass or fail based on the initial URL passed to createNewSolrServer(String url). I realize a real JUnit test should be self-contained, unlike this one.

The JUnit test testDistrbutedSearch() passes, while testDistrbutedSearch2() fails. Why?

My understanding is that each host should send a query to all shards, collate the responses, and return them to the client. Is this true?
Ron

Here is my TestCase:

    package org.apache.solr.client.solrj.ron;

    import junit.framework.TestCase;

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;
    import org.apache.solr.client.solrj.response.SolrPingResponse;
    import org.apache.solr.common.SolrDocumentList;
    import org.apache.solr.common.params.ShardParams;

    public class SolrExampleDistributedTest extends TestCase {

        int port = 8983;
        static final String context = "/solr";

        static String SOLR_SHARD1 = "localhost:8983/solr";
        static String SOLR_SHARD2 = "fred:8983/solr";
        static String SOLR_SHARDS = SOLR_SHARD1 + "," + SOLR_SHARD2;
        static String HTTP_PREFIX = "http://";
        static String SOLR_URL1 = HTTP_PREFIX + SOLR_SHARD1;
        static String SOLR_URL2 = HTTP_PREFIX + SOLR_SHARD2;
        static String QUERY1 = "Samsung";
        static String QUERY2 = "solr";

        @Override
        public void setUp() throws Exception {
            super.setUp();
        }

        public SolrExampleDistributedTest(String name) {
            super(name);
        }

        @Override
        public void tearDown() throws Exception {
            super.tearDown();
        }

        protected SolrServer createNewSolrServer(String url) {
            try {
                CommonsHttpSolrServer s = new CommonsHttpSolrServer(url);
                s.setConnectionTimeout(100); // 1/10th sec
                s.setDefaultMaxConnectionsPerHost(100);
                s.setMaxTotalConnections(100);
                return s;
            } catch (Exception ex) {
                throw new RuntimeException(ex);
            }
        }

        public void testLocalhost() {
            try {
                SolrServer server = createNewSolrServer(SOLR_URL1);

                SolrQuery query = new SolrQuery();
                query.setQuery(QUERY1);
                QueryResponse qr = server.query(query);
                SolrDocumentList sdl = qr.getResults();
                assertTrue(sdl.getNumFound() > 0);

                query = new SolrQuery();
                query.setQuery(QUERY2);
                qr = server.query(query);
                sdl = qr.getResults();
                assertTrue(sdl.getNumFound() == 0);
            } catch (Exception ex) {
                ex.printStackTrace();
                fail();
            }
        }

        public void testRemoteHost() {
            try {
                SolrServer server = createNewSolrServer(SOLR_URL2);

                SolrQuery query = new SolrQuery();
                query.setQuery(QUERY1);
                QueryResponse qr = server.query(query);
                SolrDocumentList sdl = qr.getResults();
                assertTrue(sdl.getNumFound() == 0);

                query = new SolrQuery();
                query.setQuery(QUERY2);
                qr = server.query(query);
                sdl = qr.getResults();
                assertTrue(sdl.getNumFound() > 0);
            } catch (Exception ex) {
                ex.printStackTrace();
                fail();
            }
        }

        public void testDistrbutedSearch() {
            try {
                SolrServer server = createNewSolrServer(SOLR_URL1);

                SolrQuery query = new SolrQuery();
                query.setQuery(QUERY1);
                query.setParam(ShardParams.SHARDS, SOLR_SHARDS);
                QueryResponse qr = server.query(query);
                SolrDocumentList sdl = qr.getResults();
                assertTrue(sdl.getNumFound() > 0);

                SolrQuery query2 = new SolrQuery();
                query2.setQuery(QUERY2);
                query2.setParam(ShardParams.SHARDS, SOLR_SHARDS);
                QueryResponse qr2 = server.query(query2);
                SolrDocumentList sdl2 = qr2.getResults();
                assertTrue(sdl2.getNumFound() > 0);
            } catch (Exception ex) {
                ex.printStackTrace();
                fail();
            }
        }
    }
Re: Sorting and also looking at stored fields
Aha! Yep, that's the problem (not set to store in schema.xml)! Thanks!
Re: Distributed Search Test
It fails because you are using "localhost" as part of a shard name. When you send the request to "fred" it uses the "fred" shard and the "localhost" shard (which is the same as fred!)

-Yonik

On Wed, Aug 27, 2008 at 12:07 PM, Ronald Aubin <[EMAIL PROTECTED]> wrote:
> junit test testDistrbutedSearch() passes, while testDistrbutedSearch2()
> fails. Why?
>
> My understanding is that each host should send a query to all shards and
> collate the responses, and return them to the client. Is this true?
Re: Distributed Search Test
Yonik,
   Thanks for your reply. I'm not sure I understand completely. Do you mean that each Solr server should be given a different shard list, and not a list containing all shards? So in my case:

1) host fred should be given a shard list containing only localhost,
2) localhost should be given a shard list of fred.

I'll give it a try.

Thanks again,
Ron

On Wed, Aug 27, 2008 at 12:21 PM, Yonik Seeley <[EMAIL PROTECTED]> wrote:
> It fails because you are using "localhost" as part of a shard name.
> When you send the request to "fred" it uses the "fred" shard and the
> "localhost" shard (which is the same as fred!)
>
> -Yonik
java.io.FileNotFoundException: no segments* file found
Hi all,

I've had a multicore system running for a while now, and I just cycled the Jetty server and all of a sudden I got this error:

    SEVERE: java.lang.RuntimeException: java.io.FileNotFoundException: no segments* file found in
    org.apache.lucene.store.FSDirectory@/opt/cisearch/ci-content-search/solr/cores/0601_0/data/index: files:
        at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:899)
        at org.apache.solr.core.SolrCore.<init>(SolrCore.java:450)
        at org.apache.solr.core.MultiCore.create(MultiCore.java:255)
        at org.apache.solr.core.MultiCore.load(MultiCore.java:139)
        at org.apache.solr.servlet.SolrDispatchFilter.initMultiCore(SolrDispatchFilter.java:147)
        at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:72)
        at org.mortbay.jetty.servlet.FilterHolder.doStart(FilterHolder.java:99)
        at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
        at org.mortbay.jetty.servlet.ServletHandler.initialize(ServletHandler.java:594)
        at org.mortbay.jetty.servlet.Context.startContext(Context.java:139)
        at org.mortbay.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1218)
        at org.mortbay.jetty.handler.ContextHandler.doStart(ContextHandler.java:500)
        at org.mortbay.jetty.webapp.WebAppContext.doStart(WebAppContext.java:448)
        at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
        at org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:147)
        at org.mortbay.jetty.handler.ContextHandlerCollection.doStart(ContextHandlerCollection.java:161)
        at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
        at org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:147)
        at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
        at org.mortbay.jetty.handler.HandlerWrapper.doStart(HandlerWrapper.java:117)
        at org.mortbay.jetty.Server.doStart(Server.java:210)
        at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
        at org.mortbay.xml.XmlConfiguration.main(XmlConfiguration.java:929)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:585)
        at org.mortbay.start.Main.invokeMain(Main.java:183)
        at org.mortbay.start.Main.start(Main.java:497)
        at org.mortbay.start.Main.main(Main.java:115)
    Caused by: java.io.FileNotFoundException: no segments* file found in
    org.apache.lucene.store.FSDirectory@/opt/cisearch/ci-content-search/solr/cores/0601_0/data/index: files:
        at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:600)
        at org.apache.lucene.index.DirectoryIndexReader.open(DirectoryIndexReader.java:81)
        at org.apache.lucene.index.IndexReader.open(IndexReader.java:209)
        at org.apache.lucene.index.IndexReader.open(IndexReader.java:173)
        at org.apache.solr.search.SolrIndexSearcher.<init>(SolrIndexSearcher.java:94)
        at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:890)
        ... 29 more

Of course, the odd thing is that the segments* file does exist:

    % ls -1 /opt/cisearch/ci-content-search/solr/cores/0601_0/data/index/segments*
    /opt/cisearch/ci-content-search/solr/cores/0601_0/data/index/segments_32i
    /opt/cisearch/ci-content-search/solr/cores/0601_0/data/index/segments.gen

Any ideas on what could cause this? The only thing I can think of off the top of my head is that the core was coming up in the moment between the snapinstaller steps of:

1) /bin/rm -rf ${data_dir}/${index} &&
2) mv -f ${data_dir}/${index}.tmp$$ ${data_dir}/${index}

Any other thoughts / conjectures?

enjoy,
-jeremy
--
Jeremy Hinegardner
[EMAIL PROTECTED]
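If the race between the snapinstaller's rm -rf and mv is indeed the cause, one way to shrink (though not fully close) the window is to swap directories with two fast renames and delete the old copy only afterwards, so there is never a long stretch with no index directory at all. A sketch with illustrative /tmp paths, not the stock snapinstaller script:

```shell
# Demo setup: an "old" index and a freshly pulled "new" index
data_dir=/tmp/solr_data_demo
rm -rf "$data_dir"
mkdir -p "$data_dir/index" "$data_dir/index.tmp"
touch "$data_dir/index.tmp/segments_1"

# Swap via two quick renames instead of rm -rf followed by mv
mv "$data_dir/index" "$data_dir/index.old"   # move old index aside (fast rename)
mv "$data_dir/index.tmp" "$data_dir/index"   # move new index into place (fast rename)
rm -rf "$data_dir/index.old"                 # slow delete happens after the swap

ls "$data_dir/index"
```

There is still a brief instant between the two renames, so this narrows the window Jeremy describes rather than eliminating it.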
Re: Wrong sort by score
It seems like the debug information is using the custom similarity as it should - the bug isn't there. I see the right tf value in the explain information (I modified it to be 1 in my custom similarity). The numbers in the explain seem to add up and make sense. Is it possible that the score itself is wrong (the one that I get from fl)?

Thanks,
Yuri

On Wed, Aug 27, 2008 at 11:44 AM, Yonik Seeley <[EMAIL PROTECTED]> wrote:
> Looks like it... I guess the custom similarity isn't being used when
> explain() is called.
> Did you register this custom similarity in the schema?
> If so, can you file a JIRA bug for this?
>
> -Yonik
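As an aside, the large run of tied scores from the original report is at least consistent with a constant-tf similarity: if tf() always returns 1, documents matching the same query terms score identically no matter how often those terms occur. A toy illustration of that effect (deliberately simplified, not Lucene's actual scoring formula):

```java
public class ConstantTfSketch {
    // Simplified tf-idf product where tf is overridden to a constant 1,
    // so termFreq is deliberately ignored.
    static float score(int termFreq, float idf) {
        float tf = 1.0f;   // overridden tf: every match counts the same
        return tf * idf;
    }

    public static void main(String[] args) {
        // A doc with 1 occurrence and a doc with 50 occurrences tie exactly.
        System.out.println(score(1, 2.5f) == score(50, 2.5f)); // true
    }
}
```

That explains the 150-way tie, but not why debugQuery and fl disagree, which is the open question here.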
Re: Distributed Search Test
Yonik,
   I now understand perfectly. Thanks for your help. All my tests now work.

Ron

On Wed, Aug 27, 2008 at 12:21 PM, Yonik Seeley <[EMAIL PROTECTED]> wrote:
> It fails because you are using "localhost" as part of a shard name.
> When you send the request to "fred" it uses the "fred" shard and the
> "localhost" shard (which is the same as fred!)
>
> -Yonik
Re: Distributed Search Test
On Wed, Aug 27, 2008 at 12:33 PM, Ronald Aubin <[EMAIL PROTECTED]> wrote: > Thanks for your reply. I'm not sure if I understand completely. Do you > mean that each solr server should be given a different shard list and not a > list containing all shards? You could use the same shard list (as long as it doesn't contain localhost), or you could use different ones (as long as localhost was correctly substituted for the host you are talking to). I'd recommend avoiding "localhost" in the shard list unless all of your shards happen to be on the local host. Example: If you have hosta, hostb, then querying hosta with shards=hosta,hostb or shards=localhost,hostb will work (they are equivalent) querying hostb with shards=hosta,hostb or shards=hosta,localhost will work (they are equivalent) BUT querying hostb with shards=localhost,hostb is equivalent to shards=hostb,hostb -Yonik > So in my case: > 1) host fred should be given a shard list containing only locahost, > 2) localhost should be given a shard list of fred > > I'll give it a try. > > Thanks again > > Ron > > On Wed, Aug 27, 2008 at 12:21 PM, Yonik Seeley <[EMAIL PROTECTED]> wrote: > >> It fails because you are using "localhost" as part of a shard name. >> When you send the request to "fred" it uses the "fred" shard and the >> "localhost" shard (which is the same as fred!) >> >> -Yonik >> >> On Wed, Aug 27, 2008 at 12:07 PM, Ronald Aubin <[EMAIL PROTECTED]> >> wrote: >> > Hello, >> >I have been performing some simple distributed search tests and don't >> > understand why distributed search seems to work in some circumstances but >> > not others. >> > >> > In my setup I have compiled the example server using the solr trunk >> > downloaded on Aug 22nd. I am running a sample server instance on 2 >> separate >> > hosts (localhost and "fred"). I've added a portion of the sample docs >> > [a-n]*.xml to the local host solr server, and added the other portion, >> > [m-z]*.xml sample docs to host fred. 
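Yonik's rule can be made concrete with a small helper that substitutes the receiving host for "localhost" in a shards list and reports any shard that would end up queried twice. Purely illustrative plain Java, not Solr's actual shard handling:

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

public class ShardResolve {
    // Resolve "localhost" relative to the host that received the request
    // and return any shards that appear more than once after resolution.
    static List<String> duplicates(String shards, String receivingHost) {
        LinkedHashSet<String> seen = new LinkedHashSet<String>();
        List<String> dups = new ArrayList<String>();
        for (String shard : shards.split(",")) {
            String resolved = shard.replace("localhost", receivingHost);
            if (!seen.add(resolved)) {
                dups.add(resolved); // same shard would be queried twice
            }
        }
        return dups;
    }

    public static void main(String[] args) {
        String shards = "localhost:8983/solr,fred:8983/solr";
        // Querying fred: "localhost" resolves to fred, duplicating that shard.
        System.out.println(duplicates(shards, "fred"));      // [fred:8983/solr]
        // Querying localhost itself: no duplication.
        System.out.println(duplicates(shards, "localhost")); // []
    }
}
```

This mirrors the failing case in the thread: the shards parameter looked fine, but from fred's point of view it named the same shard twice.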
>> > >> > Assuming that I have set up things correctly, I would expect to receive a >> > non-zero length SolrDocumentList for any distributed search that matches >> > syntax in the example docs. >> > >> > Specifically when I test the contents of each server separately ( using >> the >> > included TestCase ) the tests pass. This confirms that each server has >> > different documents. However when I do the distributed tests, it seems >> the >> > tests pass or fail based on the initial URL passed in the >> > createNewSolrServer(String URL). I realize a real junit should be self >> > contained, unlike this one. >> > >> > junit test testDistrbutedSearch() passes, while testDistrbutedSearch2() >> > fails. Why? >> > >> > My understanding is that each host should send a query to all shards and >> > collate the responses, and return them to the client. Is this true? >> > >> > Ron >> > >> > >> > Here is my TestCase; >> > >> > package org.apache.solr.client.solrj.ron; >> > >> > import junit.framework.TestCase; >> > >> > import org.apache.solr.client.solrj.SolrQuery; >> > import org.apache.solr.client.solrj.SolrServer; >> > import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer; >> > import org.apache.solr.client.solrj.response.QueryResponse; >> > import org.apache.solr.client.solrj.response.SolrPingResponse; >> > import org.apache.solr.common.SolrDocumentList; >> > import org.apache.solr.common.params.ShardParams; >> > >> > public class SolrExampleDistributedTest extends TestCase { >> > >> >int port = 8983; >> >static final String context = "/solr"; >> > >> >static String SOLR_SHARD1 = "localhost:8983/solr"; >> >static String SOLR_SHARD2 = "fred:8983/solr"; >> >static String SOLR_SHARDS = SOLR_SHARD1 + "," + SOLR_SHARD2; >> >static String HTTP_PREFIX = "http://"; >> >static String SOLR_URL1 = HTTP_PREFIX + SOLR_SHARD1; >> >static String SOLR_URL2 = HTTP_PREFIX + SOLR_SHARD2; >> >static String QUERY1 = "Samsung"; >> >static String QUERY2 = "solr"; >> 
>@Override >> >public void setUp() throws Exception { >> >super.setUp(); >> > >> >} >> > >> >public SolrExampleDistributedTest(String name) { >> >super(name); >> >} >> > >> >@Override >> >public void tearDown() throws Exception { >> >super.tearDown(); >> >} >> > >> >protected SolrServer createNewSolrServer(String url) { >> >try { >> > >> >CommonsHttpSolrServer s = new CommonsHttpSolrServer(url); >> >s.setConnectionTimeout(100); // 1/10th sec >> >s.setDefaultMaxConnectionsPerHost(100); >> >s.setMaxTotalConnections(100); >> >return s; >> >} catch (Exception ex) { >> >throw new RuntimeException(ex); >> >} >> >} >> > >> >public void testLocalhost() { >> >try { >> >SolrServer server = createNewSolrServer(SOLR_URL1); >> > >> >SolrQuery query = new SolrQuery(); >> >
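Yonik's hosta/hostb rule above can be spelled out as concrete request URLs (the ports and paths here are the example-server defaults; hosta and hostb are placeholders):

```
# querying hosta: both of these work and are equivalent,
# because localhost resolves to hosta here
http://hosta:8983/solr/select?q=video&shards=hosta:8983/solr,hostb:8983/solr
http://hosta:8983/solr/select?q=video&shards=localhost:8983/solr,hostb:8983/solr

# querying hostb with the same localhost-containing list is broken:
# localhost now resolves to hostb, so this is effectively
# shards=hostb,hostb and hosta is never searched
http://hostb:8983/solr/select?q=video&shards=localhost:8983/solr,hostb:8983/solr
```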
Replacing FAST functionality at sesam.no
At sesam.no we want to replace a FAST (fast.no) Query Matching Server with a Solr index. The index we are trying to replace is not a regular index, but specially configured to perform phrase (and sub-phrase) matches against several large lists (like an index with only a 'title' field). I'm not sure of a correct, or logical, name for the behavior we are after, but it is like a combination between Shingles and exact matching. Some examples should explain it well.

Let's say we have the following list:
> one two three
> one two
> two three
> one
> two
> three
> three two
> two one
> one three
> three one

For the query "one two three", we need hits against, and only against:
> one two three
> one two
> two three
> one
> two
> three

For the query "one two", we need hits against, and only against:
> one two
> one
> two

For the query "one three four" (or "four one three"), we need hits against, and only against:
> one three
> one
> three

For the query "one two sesam three", we need hits against, and only against:
> one two
> one
> two
> three

We have been testing out solr with the ShingleFilter for this, but without luck. I am unsure whether the reason is misconfiguration in schema.xml or that the ShingleFilter actually doesn't support this type of behavior. Attached is our current schema.xml (it is different from when I made this post to the solr-dev mailinglist, the shingle "fieldType" is of class "solr.StrField"). Attached are screenshots of solr/admin/analysis.jsp against this configuration. I'd like to know if the ShingleFilter is at all able to do what we want. If it is: how can I configure schema.xml? If not: do there exist any other solutions that we can incorporate into solr which will give us this behavior? If there is no existing solution to this, we will probably end up writing our own methods for it, extending the ShingleFilter, gladly contributing to the solr project =) Thanks for a great product, Glenn-Erik
odd 500 error
Hello - I stumbled across an odd error which my intuition is telling me is a bug. Here is my installation:

Solr Specification Version: 1.2.2008.08.13.13.05.16
Lucene Implementation Version: 2.4-dev 685576 - 2008-08-13 10:55:25

I did the following query today: author:(r*a* AND fisher)

And got the following 500 error:

maxClauseCount is set to 1024

org.apache.lucene.search.BooleanQuery$TooManyClauses: maxClauseCount is set to 1024
    at org.apache.lucene.search.BooleanQuery.add(BooleanQuery.java:165)
    at org.apache.lucene.search.BooleanQuery.add(BooleanQuery.java:156)
    at org.apache.lucene.search.MultiTermQuery.rewrite(MultiTermQuery.java:63)
    at org.apache.lucene.search.WildcardQuery.rewrite(WildcardQuery.java:54)
    at org.apache.lucene.search.BooleanQuery.rewrite(BooleanQuery.java:385)
    at org.apache.lucene.search.IndexSearcher.rewrite(IndexSearcher.java:163)
    at org.apache.lucene.search.Query.weight(Query.java:94)
    at org.apache.lucene.search.Searcher.createWeight(Searcher.java:175)
    at org.apache.lucene.search.Searcher.search(Searcher.java:126)
    at org.apache.lucene.search.Searcher.search(Searcher.java:105)
    at org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:966)
    at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:838)
    at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:269)
    at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:160)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:167)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1156)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:341)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:272)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1088)
    at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:360)
    at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
    at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
    at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:729)
    at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
    at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:206)
    at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
    at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
    at org.mortbay.jetty.Server.handle(Server.java:324)
    at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:505)
    at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:829)
    at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:513)
    at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:211)
    at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:380)
    at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:395)
    at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:488)

Thanks
Andrew
Re: odd 500 error
On Wed, Aug 27, 2008 at 2:21 PM, Andrew Nagy <[EMAIL PROTECTED]> wrote: > Hello - I stumbled across an odd error which my intuition is telling me is a > bug. Unfortunately, wildcard queries can expand to an undefined number of terms. This was the reason ConstantScorePrefixQuery and ConstantScoreRangeQuery were introduced, but I never got around to ConstantScoreWildcardQuery. So this is a known limitation. -Yonik > Here is my installation: > Solr Specification Version: 1.2.2008.08.13.13.05.16 > Lucene Implementation Version: 2.4-dev 685576 - 2008-08-13 10:55:25 > > I did the following query today: > author:(r*a* AND fisher) > > And get the following 500 error: > > maxClauseCount is set to 1024 > > org.apache.lucene.search.BooleanQuery$TooManyClauses: maxClauseCount is set > to 1024 >at org.apache.lucene.search.BooleanQuery.add(BooleanQuery.java:165) >at org.apache.lucene.search.BooleanQuery.add(BooleanQuery.java:156) >at > org.apache.lucene.search.MultiTermQuery.rewrite(MultiTermQuery.java:63) >at > org.apache.lucene.search.WildcardQuery.rewrite(WildcardQuery.java:54) >at org.apache.lucene.search.BooleanQuery.rewrite(BooleanQuery.java:385) >at > org.apache.lucene.search.IndexSearcher.rewrite(IndexSearcher.java:163) >at org.apache.lucene.search.Query.weight(Query.java:94) >at org.apache.lucene.search.Searcher.createWeight(Searcher.java:175) >at org.apache.lucene.search.Searcher.search(Searcher.java:126) >at org.apache.lucene.search.Searcher.search(Searcher.java:105) >at > org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:966) >at > org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:838) >at > org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:269) >at > org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:160) >at > org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:167) >at > 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135) >at org.apache.solr.core.SolrCore.execute(SolrCore.java:1156) >at > org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:341) >at > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:272) >at > org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1088) >at > org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:360) >at > org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216) >at > org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181) >at > org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:729) >at > org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405) >at > org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:206) >at > org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114) >at > org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152) >at org.mortbay.jetty.Server.handle(Server.java:324) >at > org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:505) >at > org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:829) >at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:513) >at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:211) >at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:380) >at > org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:395) >at > org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:488) > > > Thanks > Andrew >
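(For anyone who hits this limit and simply needs more headroom, the ceiling is configurable in solrconfig.xml. A sketch, with the caveat that raising it just lets larger expanded queries through, at a memory/CPU cost:)

```xml
<!-- solrconfig.xml, inside the <query> section.
     This is the BooleanQuery clause ceiling that the expanded
     wildcard query ran into; 1024 is the default. -->
<maxBooleanClauses>4096</maxBooleanClauses>
```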
Re: Replacing FAST functionality at sesam.no
The screenshot didn't make it (some attachments gets stripped) Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message > From: Glenn-Erik Sandbakken <[EMAIL PROTECTED]> > To: solr-user@lucene.apache.org > Sent: Wednesday, August 27, 2008 1:44:53 PM > Subject: Replacing FAST functionality at sesam.no > > At sesam.no we want to replace a FAST (fast.no) Query Matching Server > with a Solr index. > > The index we are trying to replace is not a regular index, but specially > configured to perform phrases (and sub-phrases) matches against several > large lists (like an index with only a 'title' field). > > I'm not sure of a correct, or logical, name for the behavior we are > after, but it is like a combination between Shingles and exact matching. > > Some examples should explain it well. > > Lets say we have the following list: > > one two three > > one two > > two three > > one > > two > > three > > three two > > two one > > one three > > three one > > For the query "one two three", we need hits against, and only against: > > one two three > > one two > > two three > > one > > two > > three > > For the query "one two", we need hits against, and only against: > > one two > > one > > two > > For the query "one three four" (or "four one three"), we need hits > against, and only against: > > one three > > one > > three > > For the query "one two sesam three", we need hits against, and only > against: > > one two > > one > > two > > three > > We have been testing out solr with the ShingleFilter for this, but > without luck. > I am unsure whether the reason is misconfiguration in schema.xml or that > the ShingleFilter actually don't support this type of behavior. > Attached our current schema.xml > (it is different from when I made this post to the solr-dev mailinglist, > the shingle "fieldType" is of class "solr.StrField") > Attached is screenshots of the solr/admin/analysis.jsp against this > configuration. 
> > I'd like to know if the ShingleFilter is at all able to do what we > want. > If it is: How can I configure schema.xml? > If not: does there exist any other solutions that we can incorporate > into solr which will give us this behavior? > > If there is no existing solution to this, we will probably end up > writing our own methods for it, extending the ShingleFilter, gladly > contributing to the solr project =) > > Thanks for a great product, > Glenn-Erik
Beginners question: adding a plugin
Hello, I'm pretty new to Solr, and not a Java expert, and trying to create my own plugin according to the instructions given in http://wiki.apache.org/solr/SolrPlugins. I want to integrate an external stemmer for the Dutch language by creating a new FilterFactory that will invoke the external stemmer for a TokenStream. First thing I want to do is just make sure I can get the plugin running. Here's what I did:

- Take a copy of DutchStemFilterFactory.java, rename it to TestStemFilterFactory, rename the class to TestStemFilterFactory
- Successfully compile the java using javac, and add the .class file to a jar file
- Put the jar file in SOLR_HOME/lib
- Put a line in my analyzer definition in schema.xml
- Restart tomcat

In the Tomcat log, there is an indication that the file is found:

27-Aug-2008 20:58:25 org.apache.solr.core.SolrResourceLoader createClassLoader
INFO: Adding 'file:/D:/Programs/Solr/lib/Test.jar' to Solr classloader

But then I get errors reported by Tomcat further down the log file:

SEVERE: org.apache.solr.common.SolrException: Error loading class 'solr.TestStemFilterFactory'
    at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:256)
    at org.apache.solr.core.SolrResourceLoader.newInstance(SolrResourceLoader.java:261)
    at org.apache.solr.util.plugin.AbstractPluginLoader.create(AbstractPluginLoader.java:83)
    at org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:140)
    <>
Caused by: java.lang.ClassNotFoundException: solr.TestStemFilterFactory
    at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
    <.>

Probably some configuration issue somewhere, but I am in the dark here (as said: not a Java expert...). I've tried to find information in mailing list archives on this, but no luck so far. I'm running the Solr nightly build of 20.08.2008, tomcat 5.5.26 on Windows.
Any help would be much appreciated! Cheers, Jaco.
Re: Replacing FAST functionality at sesam.no
On 27. aug. 2008, at 19.44, Glenn-Erik Sandbakken wrote: At sesam.no we want to replace a FAST (fast.no) Query Matching Server with a Solr index. The index we are trying to replace is not a regular index, but specially configured to perform phrases (and sub-phrases) matches against several large lists (like an index with only a 'title' field). I'm not sure of a correct, or logical, name for the behavior we are after, but it is like a combination between Shingles and exact matching. Some examples should explain it well. In order to do this, you can't use the ShingleFilter during indexing, since a document like "one two three" and a query like "one two four" will match because they have the shingle "one two" in common. You will get what you want, I think, if you don't tokenize during indexing (some normalization will be required if your lists aren't normalized to begin with) and apply the ShingleFilter only to the queries. Svein
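A sketch of Svein's suggestion in schema.xml terms, assuming the ShingleFilter factory available on trunk and list entries of at most three words (the type name and sizes are illustrative):

```xml
<fieldType name="exactPhrase" class="solr.TextField">
  <!-- index side: each list entry becomes a single lowercased token,
       e.g. "one two" is indexed literally as the token "one two" -->
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <!-- query side: "one two three" expands into the shingles
       "one", "two", "three", "one two", "two three", "one two three" -->
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ShingleFilterFactory" maxShingleSize="3" outputUnigrams="true"/>
  </analyzer>
</fieldType>
```

The query-side shingles are space-joined by default, so they line up with the untokenized index terms; list entries longer than maxShingleSize words would need a larger setting.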
Question about autocomplete feature
Hello. I'm trying to implement the autocomplete feature using the snippet posted by Dan. (http://mail-archives.apache.org/mod_mbox/lucene-solr-user/200807.mbox/[EMAIL PROTECTED]) Here is the snippet: ... First I decided to make it work for the solr example. So I pasted the snippet into schema.xml. Then I edited exampledocs/hd.xml, I added the "ac" field to each doc. The value of the "ac" field is a copy of the name field: SP2514N Samsung SpinPoint P12 SP2514N - hard drive - 250 GB - ATA-133 Samsung SpinPoint P12 SP2514N - hard drive - 250 GB - ATA-133 Samsung Electronics Co. Ltd. electronics hard drive 7200RPM, 8MB cache, IDE Ultra ATA-133 NoiseGuard, SilentSeek technology, Fluid Dynamic Bearing (FDB) motor 92 6 true 6H500F0 Maxtor DiamondMax 11 - hard drive - 500 GB - SATA-300 Maxtor DiamondMax 11 - hard drive - 500 GB - SATA-300 Maxtor Corp. electronics hard drive SATA 3.0Gb/s, NCQ 8.5ms seek 16MB cache 350 6 true Then I cleaned the solr index, posted hd.xml and restarted the solr server. But when I'm trying to search for "samsu" (part of the word "samsung") I still get no result. Seems like solr treats the "ac" field like a regular field. What did I do wrong? Thanks in advance. -- Aleksey Gogolev developer, dev.co.ua Aleksey
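(Dan's snippet itself was stripped by the list archiver. For readers, an autocomplete field type in this spirit is usually built around edge n-grams, roughly like the sketch below; the names and sizes are illustrative, not necessarily the stripped snippet:)

```xml
<fieldType name="autocomplete" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- "samsung ..." is indexed as "s", "sa", "sam", ... so a query
         like ac:samsu matches an indexed gram directly -->
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```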
copyField: String vs Text Field
Hello, I was wondering if there was an added advantage in using copyField to copy a string field to a text field? If the field is copied to a text field then why not just make the field a text field and eliminate copying its data? If you are going to use full text searching on that field, which you can't do with string fields, wouldn't it just make sense to keep it a text field since it has the same abilities as a string field and more? ... Or is the reason because string fields have better performance on matching exact strings than text fields? Thanks, - Jake
Re: Beginners question: adding a plugin
Instead of solr.TestStemFilterFactory, put the fully qualified classname for the TestStemFilterFactory, i.e. com.my.great.stemmer.TestStemFilterFactory. The solr.FactoryName notation is just shorthand for org.apache.solr.BlahBlahBlah -Grant On Aug 27, 2008, at 3:27 PM, Jaco wrote: Hello, I'm pretty new to Solr, and not a Java expert, and trying to create my own plug in according to the instructions given in http://wiki.apache.org/solr/SolrPlugins. I want to integrate an external stemmer for the Dutch language by creating a new FilterFactory that will invoke the external stemmer for a TokenStream. First thing I want to do is just make sure I can get the plug in running. Here's what I did: - Take a copy of DutchStemFilterFactory.java, rename it to TestStemFilterFactory, renamed the class to TestStemFilterFactory - Successfully compiled the java using javac, and add the .class file to a jar file - Put the jar file in SOLR_HOME/lib - Put a line in my analyzer definition in schema.xml - Restart tomcat In the Tomcat log, there is an indication that the file is found: 27-Aug-2008 20:58:25 org.apache.solr.core.SolrResourceLoader createClassLoader INFO: Adding 'file:/D:/Programs/Solr/lib/Test.jar' to Solr classloader But then I get errors being reported by Tomcat further down the log file: SEVERE: org.apache.solr.common.SolrException: Error loading class 'solr.TestStemFilterFactory' at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:256) at org.apache.solr.core.SolrResourceLoader.newInstance(SolrResourceLoader.java:261) at org.apache.solr.util.plugin.AbstractPluginLoader.create(AbstractPluginLoader.java:83) at org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:140) <> Caused by: java.lang.ClassNotFoundException: solr.TestStemFilterFactory at java.net.URLClassLoader$1.run(URLClassLoader.java:200) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:188) <.> Probably some configuration issue somewhere, but I am in the dark here (as said: not a Java expert...). I've tried to find information in mailing list archives on this, but no luck so far. I'm Running Solr nightly build of 20.08.2008, tomcat 5.5.26 on Windows. Any help would be much appreciated! Cheers, Jaco. -- Grant Ingersoll http://www.lucidimagination.com Lucene Helpful Hints: http://wiki.apache.org/lucene-java/BasicsOfPerformance http://wiki.apache.org/lucene-java/LuceneFAQ
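Concretely, with the package name from Grant's example (substitute your own), the filter line in the schema.xml analyzer would read:

```xml
<analyzer>
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <!-- fully qualified class name instead of the solr.* shorthand -->
  <filter class="com.my.great.stemmer.TestStemFilterFactory"/>
</analyzer>
```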
Re: copyField: String vs Text Field
Jake, copyField exists to decouple document values (on the update side) from how they are indexed. From the example schema: -Yonik On Wed, Aug 27, 2008 at 4:46 PM, Jake Conk <[EMAIL PROTECTED]> wrote: > Hello, > > I was wondering if there was an added advantage in using > to copy a string field to a text field? > > If the field is copied to a text field then why not just make the > field a text field and eliminate copying its data? > > If you are going to use full text searching on that field which you > cant do with string fields wouldn't it just make sense to keep it a > text field since it has the same abilities as a string field and more? > > ... Or is the reason because string fields have better performance on > matching exact strings than text fields? > > Thanks, > > - Jake >
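(The schema lines Yonik pasted were stripped by the list archiver; the copyField declarations in the example schema look roughly like this:)

```xml
<!-- approximate excerpt from the example schema: several stored
     fields are funneled into one catch-all indexed "text" field -->
<copyField source="name" dest="text"/>
<copyField source="manu" dest="text"/>
<copyField source="features" dest="text"/>
```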
Re: dataimporthandler and mysql connector jar
: Can you please open a JIRA issue for this? However, we may only be able to : fix this after 1.3 because a code freeze has been decided upon, to release : 1.3 asap. "code freeze" may be overstating it ... the point of the freeze is to hold off on new features and other misc refactorings and focus on bug fixes and documentation improvements. This sounds like a bug, and assuming the fix isn't insanely invasive there's no reason not to make bug fixes on the 1.3 branch (and merge with the trunk). -Hoss
Re: copyField: String vs Text Field
Yonik, Thanks for the reply. Does that mean that if I were to edit the data then the field it was copied to will not be updated? I assume it does get deleted if I delete the record right? I understand how it can make searching simpler by copying fields to one but would that really make it faster? How? Thanks, - Jake On Wed, Aug 27, 2008 at 2:22 PM, Yonik Seeley <[EMAIL PROTECTED]> wrote: > Jake, copyField exists to decouple document values (on the update > size) from how they are indexed. > > From the example schema: > > > -Yonik > > On Wed, Aug 27, 2008 at 4:46 PM, Jake Conk <[EMAIL PROTECTED]> wrote: >> Hello, >> >> I was wondering if there was an added advantage in using >> to copy a string field to a text field? >> >> If the field is copied to a text field then why not just make the >> field a text field and eliminate copying its data? >> >> If you are going to use full text searching on that field which you >> cant do with string fields wouldn't it just make sense to keep it a >> text field since it has the same abilities as a string field and more? >> >> ... Or is the reason because string fields have better performance on >> matching exact strings than text fields? >> >> Thanks, >> >> - Jake >> >
Re: Wrong sort by score
: It seems like the debug information is using the custom similarity as it : should - the bug isn't there. : I see in the explain information the right tf value (I modified it to be 1 : in my custom similarity). : The numbers in the explain seem to add up and make sense. : Is it possible that the score itself is wrong (the one that I get from fl)? the score in the doclist is by definition the correct score - the debug info follows a different code path and sometimes that code path isn't in sync with the actual searching/scoring code for different query types (although i was pretty confident that the test i added to Lucene-Java a while back tested this for anything you can see in Solr without getting into crazy contrib Query classes) it would help if you could post:
1) the full debugQuery output from a query where you see this disconnect, showing all the query toString info, and the score explanations
2) the corresponding scores you see in the doclist
3) some more details about how your custom similarity works (can you post the code)
4) info on how you've configured dismax and what request params you are using (the output from using echoParams=all would be good)
-Hoss
Re: copyField: String vs Text Field
On Wed, Aug 27, 2008 at 7:47 PM, Jake Conk <[EMAIL PROTECTED]> wrote: > Thanks for the reply. Does that mean that if I were to edit the data > then the field it was copied to will not be updated? You can't really "edit" a document in Lucene or Solr, really just overwrite an old document with an entirely new version. > I assume it does > get deleted if I delete the record right? I understand how it can make > searching simpler by copying fields to one but would that really make > it faster? How? Searching a single field for a term is faster than searching multiple fields for a term. That's really only one use case though... the other being to have a single stored field that is analyzed multiple different ways. -Yonik > Thanks, > - Jake > > On Wed, Aug 27, 2008 at 2:22 PM, Yonik Seeley <[EMAIL PROTECTED]> wrote: >> Jake, copyField exists to decouple document values (on the update >> size) from how they are indexed. >> >> From the example schema: >> >> >> -Yonik >> >> On Wed, Aug 27, 2008 at 4:46 PM, Jake Conk <[EMAIL PROTECTED]> wrote: >>> Hello, >>> >>> I was wondering if there was an added advantage in using >>> to copy a string field to a text field? >>> >>> If the field is copied to a text field then why not just make the >>> field a text field and eliminate copying its data? >>> >>> If you are going to use full text searching on that field which you >>> cant do with string fields wouldn't it just make sense to keep it a >>> text field since it has the same abilities as a string field and more? >>> >>> ... Or is the reason because string fields have better performance on >>> matching exact strings than text fields? >>> >>> Thanks, >>> >>> - Jake >>> >> >
Re: copyField: String vs Text Field
On 8/27/08 5:54 PM, "Yonik Seeley" <[EMAIL PROTECTED]> wrote: > > That's really only one use case though... the other being to have a > single stored field that is analyzed multiple different ways. We are the other use case. We take a title and put it in three fields: one merely lowercased, one stemmed and stopped, and one phonetic. At query time, we search all three with decreasing weights. An exact match is weighted more than a stemmed and stopped match, and so on. wunder -- Search Guy, Netflix
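A sketch of that layout (all field and type names here are invented for illustration, not Netflix's actual schema):

```xml
<!-- schema.xml: one incoming title value copied into three analyses -->
<field name="title"          type="text_lowercase" indexed="true" stored="true"/>
<field name="title_stemmed"  type="text_stemmed"   indexed="true" stored="false"/>
<field name="title_phonetic" type="text_phonetic"  indexed="true" stored="false"/>
<copyField source="title" dest="title_stemmed"/>
<copyField source="title" dest="title_phonetic"/>
```

A dismax qf along the lines of title^3 title_stemmed^2 title_phonetic^1 then gives the decreasing weights at query time.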
Re: dataimporthandler and mysql connector jar
On Thu, Aug 28, 2008 at 5:11 AM, Chris Hostetter <[EMAIL PROTECTED]>wrote: > "code freeze" may be overstating it ... the point of the freeze is to hold > off on new fatures and other misc refactorings and focus on bug fixes and > documentation improvements. Ah ok. I was under the impression that only blocker bugs should make it there. > > > This sounds like a bug, and assuming the fix isn't insanely invasive > there's no reason not to make bug fixes on the 1.3 branch (and merge with > the trunk). That's great. There are a couple of small bugs which could make it to 1.3 then. -- Regards, Shalin Shekhar Mangar.
Re: copyField: String vs Text Field
Hi Walter, What do you mean when you say you "stemmed and stopped" your title field? Thanks, - Jake On Wed, Aug 27, 2008 at 7:41 PM, Walter Underwood <[EMAIL PROTECTED]> wrote: > On 8/27/08 5:54 PM, "Yonik Seeley" <[EMAIL PROTECTED]> wrote: >> >> That's really only one use case though... the other being to have a >> single stored field that is analyzed multiple different ways. > > We are the other use case. We take a title and put it in three > fields: one merely lowercased, one stemmed and stopped, and one > phonetic. At query time, we search all three with decreasing > weights. An exact match is weighted more than a stemmed and > stopped match, and so on. > > wunder > -- > Search Guy, Netflix > > >