Re: Indexing the same data in many records

2009-01-15 Thread philmccarthy

Hi,

Adding the same document many times is actually the scenario I wanted to
test -- indexing hits from Apache webserver logs along with the source of the
referring page.

My expectation would be that the majority of hits on a given day would
originate from a small number of referrers, so each of these referring pages
would be indexed multiple times. I really wanted to check that this would
scale better than indexing the same number of different documents--your
explanation regarding term distribution explains why this is the case.

Many thanks,
Phil


Otis Gospodnetic wrote:
> 
> Phil,
> 
> Note that adding the same document multiple times and looking at the index
> size is not a very good approach.  You are adding a fixed number of
> distinct terms over and over.  In real-life scenario you will have a much
> greater term distribution, and that will affect index size.
> 
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> 
> 
> 
> - Original Message 
>> From: philmccarthy 
>> To: solr-user@lucene.apache.org
>> Sent: Wednesday, January 14, 2009 7:36:38 PM
>> Subject: Re: Indexing the same data in many records
>> 
>> 
>> Thanks Otis. I tweaked the Solr example app a little and then uploaded a
>> ~55KB document to it a couple of thousand times (changing the ID each
>> time).
>> The solr/data directory was 72MB on disc after adding the document 2000
>> times, so it seems that the index is growing by approximately 36KB for
>> each
>> document. That seems reasonable.
>> 
>> I guess I need to do some research into expected data volumes now, and
>> limits on Lucene index size.
>> 
>> Cheers,
>> Phil
>> 
>> 
>> Otis Gospodnetic wrote:
>> > 
>> > Phil,
>> > 
>> > From what you described so far, I don't see any red flags.  I would pay
>> > attention to reading those timestamps (covered on the Wiki and ML
>> > archives), that's all.
>> > 
>> > 
>> > Otis
>> > --
>> > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>> > 
>> > 
>> > 
>> > - Original Message 
>> >> From: philmccarthy 
>> >> To: solr-user@lucene.apache.org
>> >> Sent: Tuesday, January 13, 2009 8:49:33 PM
>> >> Subject: Indexing the same data in many records
>> >> 
>> >> 
>> >> Hi,
>> >> 
>> >> I'd like to use Solr to index some webserver logs, in order to allow
>> easy
>> >> ad-hoc querying and analysis. Each Solr Document will represent a
>> single
>> >> request to the webserver, with fields for time, request URL, referring
>> >> URL
>> >> etc.
>> >> 
>> >> I'm also planning to fetch the page source of each referring URL, and
>> add
>> >> that as an indexed field in the Solr document. The aim is to allow
>> >> queries
>> >> like "find hits to /xyz.html where the referring page contains the
>> word
>> >> 'foobar'".
>> >> 
>> >> Since hundreds or even thousands of hits may all come from the same
>> >> referring page, would this approach be horribly inefficient? (Note the
>> >> page
>> >> source won't be stored in each Document, just indexed). Am I going to
>> >> dramatically increase the index size if I do this?
>> >> 
>> >> If so, is there a more elegant way to do what I want?
>> >> 
>> >> Many thanks,
>> >> Phil
>> >> 
>> >> 
>> >> 
>> >> -- 
>> >> View this message in context: 
>> >> 
>> http://www.nabble.com/Indexing-the-same-data-in-many-records-tp21448465p21448465.html
>> >> Sent from the Solr - User mailing list archive at Nabble.com.
>> > 
>> > 
>> > 
>> 
>> -- 
>> View this message in context: 
>> http://www.nabble.com/Indexing-the-same-data-in-many-records-tp21448465p21468706.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
> 
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Indexing-the-same-data-in-many-records-tp21448465p21475019.html
Sent from the Solr - User mailing list archive at Nabble.com.



wildcard with capital letters

2009-01-15 Thread pcu

Hello,

I am working on a simple prototype using solr, but I have not figured out how
to configure solr to give me the right results.

for example if I use this field:
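(Peter's field type definition was stripped by the list archive; a plausible
reconstruction, assuming an analyzer that lowercases tokens at index time --
which would explain the results below -- is something like:)

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
<field name="name" type="text" indexed="true" stored="true"/>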
and put 'Koller, Julo' into this field,

then I get these results:

PASSED: search("koller")
PASSED: search("julo")
PASSED: search("KOLLER")
PASSED: search("JULO")
PASSED: search("Koller")
PASSED: search("Julo")
PASSED: search("kolle*")
PASSED: search("jul*")
FAILED: search("KOLLE*")
java.lang.AssertionError: trying found KOLLE* in Koller, Julo expected:<0>
but was:<1>
FAILED: search("JUL*")
java.lang.AssertionError: trying found JUL* in Koller, Julo expected:<0> but
was:<1>
FAILED: search("Kolle*")
java.lang.AssertionError: trying found Kolle* in Koller, Julo expected:<0>
but was:<1>
FAILED: search("Jul*")
java.lang.AssertionError: trying found Jul* in Koller, Julo expected:<0> but
was:<1>

Why does a wildcard with at least one capital letter not work?

thanks in advance

Peter
-- 
View this message in context: 
http://www.nabble.com/wildcard-with-capital-letters-tp21475396p21475396.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Import data from RSS Feed Question

2009-01-15 Thread Shalin Shekhar Mangar
On Thu, Jan 15, 2009 at 5:55 AM, Burt-Prior  wrote:

>
> Everything works and is setup correctly, but when I change the 'url'
> attribute in the entity declaration to a url on my intranet that requires
> basic authentication (username and password),  I get a HTTP 401 error when
> solr attempts to read the rss feed and update the index.
>
> Question: is there a way to specify a username and password for solr to use
> for an HttpDataSource?


No, not right now.

> Any suggestions on how to solve this issue?


HttpDataSource will need to be enhanced. Right now it is a very simple
implementation using UrlConnection. We can probably switch to
commons-httpclient and use its authentication capabilities.
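For reference, the kind of commons-httpclient (3.x) call being alluded to
might look like this -- only a sketch with placeholder URL and credentials,
not the actual DataImportHandler change:

import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.UsernamePasswordCredentials;
import org.apache.commons.httpclient.auth.AuthScope;
import org.apache.commons.httpclient.methods.GetMethod;

public class BasicAuthFetch {
    public static void main(String[] args) throws Exception {
        HttpClient client = new HttpClient();
        // register basic-auth credentials for any host/port (placeholders)
        client.getState().setCredentials(AuthScope.ANY,
                new UsernamePasswordCredentials("user", "secret"));
        GetMethod get = new GetMethod("http://intranet.example.com/feed.xml");
        get.setDoAuthentication(true);
        int status = client.executeMethod(get);
        System.out.println(status);
        System.out.println(get.getResponseBodyAsString());
        get.releaseConnection();
    }
}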


> I've been using Lucene for awhile, but am new to solr.  Solr is fantastic!
>
> Thanks for your help,
> .Burt
> --
> View this message in context:
> http://www.nabble.com/Import-data-from-RSS-Feed-Question-tp21468562p21468562.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>


-- 
Regards,
Shalin Shekhar Mangar.


Re: Single facet on multiple attributes

2009-01-15 Thread Shalin Shekhar Mangar
On Wed, Jan 14, 2009 at 8:14 PM, prerna07  wrote:

>
> Hi,
>
> How can we create single facet on multiple attributes?


Do you mean to combine facets from multiple fields into one output? If yes,
you can create a copyField of all these fields and facet on that.
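For example (field names invented for illustration), in schema.xml:

<field name="allFacets" type="string" indexed="true" stored="false"
       multiValued="true"/>
<copyField source="brand" dest="allFacets"/>
<copyField source="category" dest="allFacets"/>

and then facet with &facet=true&facet.field=allFacets.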


>
>
> Thanks,
> --
> View this message in context:
> http://www.nabble.com/Single-facet-on-multiple-attributes-tp21457259p21457259.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>


-- 
Regards,
Shalin Shekhar Mangar.


Re: wildcard with capital letters

2009-01-15 Thread Shalin Shekhar Mangar
On Thu, Jan 15, 2009 at 4:28 PM, pcu  wrote:

> Why wildcard with at least one capital letters does not work.


Prefix queries are not analysed. So the query you are making is of a
different case than the tokens in the index. Before sending the query to
Solr, you can lowercase it yourself.
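For example (an illustrative sketch only, not from the original reply):

import java.util.Locale;

public class LowercaseWildcard {
    public static void main(String[] args) {
        String userInput = "KOLLE*";
        // wildcard/prefix queries bypass the analyzer, so lowercase client-side
        String q = userInput.toLowerCase(Locale.ENGLISH);
        System.out.println("name:" + q); // prints name:kolle*
    }
}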


>
>
> thanks in advance
>
> Peter
> --
> View this message in context:
> http://www.nabble.com/wildcard-with-capital-letters-tp21475396p21475396.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>


-- 
Regards,
Shalin Shekhar Mangar.


Re: Unwanted clustering of search results after sorting by score

2009-01-15 Thread Axel Tetzlaff

Hi,

I'm working on the problem Max described as well. We did try omitting the
norms, which led to the phenomenon that products with a very extensive
description were more likely to get a higher score, since they contained the
word more often. Due to the many expansions the SynonymFilter performs at
index time, this grew especially ugly. But as you already pointed out, we
should have a deeper look at how the score is assembled...

Nevertheless, the second problem of getting a good mix of shops can be
discussed separately. Say we have 5 products per result page and the 10 best
matches for a search all have the same score. 8 of the products are from one
shop (A), and the other two from two other shops (B, C).

What we often get is (letter indicating a product of this shop)
1.A
2.A
3.A
4.A
5.A
  second result page 
6.A
7.B
8.A
9.C
10.  A 

but what we want to get is something like this:

1.A
2.C
3.B
4.A
5.A
  second result page 
6.A
7.A
8.A
9.A
10.  A 

As you can imagine, there is no uniform distribution of products over shops,
so sorting by a random field does not work out: there are shops with tens of
thousands of products and shops with fewer than 100 products.

So theoretically I would sort by score and then by a magic factor which gets
greater the fewer products of this shop (possibly with that same score) are
already in the search result. Alternatively to a second sorting criterion,
the score itself could be diminished, I guess...

What really bothers me is that this requirement seems to need an extra
iteration over the search result which keeps track of the distribution of
products and shops in the search result.

We're really thankful for any hint on how to tackle this problem,
Axel
-- 
View this message in context: 
http://www.nabble.com/Unwanted-clustering-of-search-results-after-sorting-by-score-tp20977761p21477387.html
Sent from the Solr - User mailing list archive at Nabble.com.



Customizing Solr to handle Leading Wildcard queries

2009-01-15 Thread Jana, Kumar Raja
Hi,

 

Not being able to perform Leading Wildcard queries is a major handicap.
I want to be able to perform searches like *.pdf to fetch all pdf
documents from Solr.

 

I have found quite a few threads on this topic and one of the solutions
was that this feature can be enabled by adding:

parser.setAllowLeadingWildcards(true); at Line 92 in QueryParsing.java

Unfortunately, this did not work or may be I was using a different
parser and I don't know how to configure the parsers to make this work.

 

Can someone please tell me the steps to customize Solr to enable this
feature?

 

Thanks,

Kumar



Re: Customizing Solr to handle Leading Wildcard queries

2009-01-15 Thread Erik Hatcher


On Jan 15, 2009, at 8:23 AM, Jana, Kumar Raja wrote:
Not being able to perform Leading Wildcard queries is a major  
handicap.

I want to be able to perform searches like *.pdf to fetch all pdf
documents from Solr.


For this particular case, I recommend indexing the document type as a  
separate field.  Something like type:pdf (or use a MIME type string).   
Then you can do a very direct and fast query to search or facet by  
document types.


Erik



RE: Customizing Solr to handle Leading Wildcard queries

2009-01-15 Thread Jana, Kumar Raja
Hi Erik,

Thanks for the quick reply.
I want to enable leading wildcard query searches in general. The case
mentioned in the earlier mail is just one of the many instances I use
this feature.

-Kumar




-Original Message-
From: Erik Hatcher [mailto:e...@ehatchersolutions.com] 
Sent: Thursday, January 15, 2009 7:59 PM
To: solr-user@lucene.apache.org
Subject: Re: Customizing Solr to handle Leading Wildcard queries


On Jan 15, 2009, at 8:23 AM, Jana, Kumar Raja wrote:
> Not being able to perform Leading Wildcard queries is a major  
> handicap.
> I want to be able to perform searches like *.pdf to fetch all pdf
> documents from Solr.

For this particular case, I recommend indexing the document type as a  
separate field.  Something like type:pdf (or use a MIME type string).   
Then you can do a very direct and fast query to search or facet by  
document types.

Erik



Is it just me or multicore default is broken? Can't ping

2009-01-15 Thread Julian Davchev
Hi,
I am trying to set up multicore solr. So I just downloaded the default
distribution with jetty... go to example/
and run
java -Dsolr.solr.home=multicore -jar start.jar


All looks smooth, without errors on startup.
I can also open the admin at

http://localhost:8983/solr/core1/admin/


But then trying to ping
http://localhost:8983/solr/core1/admin/ping

I get  error 500 INTERNAL SERVER ERROR


And tons of exceptions in background starting with nullpointer

Anyone have a clue? Is solr stable to be used, or is multicore something
recently added and not to be trusted yet?




Re: Customizing Solr to handle Leading Wildcard queries

2009-01-15 Thread Glen Newton
If we are talking short single term fields (like a file field that has
a single term like "foo.pdf") then do what the DBMS b-tree indexes did
a long time ago: for every field you want a leading wildcard, insert
it in reverse order. So field file:"foo.pdf"  is also stored, indexed
as reverseField:"fdp.oof". Now when someone does a search on
reverseField, like reverseField:*oo.pdf, you reverse the query to be:
fdp.oo*

I believe some of the DBMSs kept a separate reverse b-tree to handle
leading wildcard queries.

And obviously this technique is harder to put in place for arbitrary
sections of text that have to be parsed. But a special parser could be
written to handle this as well.
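A tiny illustration of the rewrite Glen describes (the field names are
invented for the example):

public class ReverseFieldTrick {
    public static void main(String[] args) {
        // index time: store the value both ways
        String file = "foo.pdf";
        String reversed = new StringBuilder(file).reverse().toString();
        System.out.println("file:" + file + "  reverseFile:" + reversed); // reverseFile:fdp.oof

        // query time: a leading wildcard becomes a trailing one on the reversed field
        String userQuery = "*oo.pdf";
        String rewritten = new StringBuilder(userQuery).reverse().toString();
        System.out.println("reverseFile:" + rewritten); // reverseFile:fdp.oo*
    }
}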

-glen
http://zzzoot.blogspot.com/


2009/1/15 Jana, Kumar Raja :
> Hi Erik,
>
> Thanks for the quick reply.
> I want to enable leading wildcard query searches in general. The case
> mentioned in the earlier mail is just one of the many instances I use
> this feature.
>
> -Kumar
>
>
>
>
> -Original Message-
> From: Erik Hatcher [mailto:e...@ehatchersolutions.com]
> Sent: Thursday, January 15, 2009 7:59 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Customizing Solr to handle Leading Wildcard queries
>
>
> On Jan 15, 2009, at 8:23 AM, Jana, Kumar Raja wrote:
>> Not being able to perform Leading Wildcard queries is a major
>> handicap.
>> I want to be able to perform searches like *.pdf to fetch all pdf
>> documents from Solr.
>
> For this particular case, I recommend indexing the document type as a
> separate field.  Something like type:pdf (or use a MIME type string).
> Then you can do a very direct and fast query to search or facet by
> document types.
>
>Erik
>
>



-- 



Re: Is it just me or multicore default is broken? Can't ping

2009-01-15 Thread Otis Gospodnetic
Not sure, I'd have to try it.  But you didn't mention which version of Solr you 
are using.  Nightly build?


Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
> From: Julian Davchev 
> To: solr-user@lucene.apache.org
> Sent: Thursday, January 15, 2009 9:53:37 AM
> Subject: Is it just me or multicore default is broken? Can't ping
> 
> Hi,
> I am trying to setup multicore solr. So I just download default one with
> jetty...goto example/
> and run
> java -Dsolr.solr.home=multicore -jar start.jar
> 
> 
> All looks smooth without errors on startup.
> I can also open the admin at
> 
> http://localhost:8983/solr/core1/admin/
> 
> 
> But then trying to ping
> http://localhost:8983/solr/core1/admin/ping
> 
> I get  error 500 INTERNAL SERVER ERROR
> 
> 
> And tons of exceptions in background starting with nullpointer
> 
> Anyone have a clue? Is solr stable to be used or multicore is something
> recently added and not to be trusted yet?



Re: Customizing Solr to handle Leading Wildcard queries

2009-01-15 Thread Otis Gospodnetic
Hi ramuK,

I believe you can turn that "on" via the Lucene QueryParser, but of course such 
searches will be slo(oo)w.  You can also index reversed tokens (e.g. *kumar --> 
rakum*) or you could index n-grams with begin/end delim characters (e.g. kumar 
-> ^ k u m a r $, *kumar -> "k u m a r $")


Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
> From: "Jana, Kumar Raja" 
> To: solr-user@lucene.apache.org
> Sent: Thursday, January 15, 2009 9:49:24 AM
> Subject: RE: Customizing Solr to handle Leading Wildcard queries
> 
> Hi Erik,
> 
> Thanks for the quick reply.
> I want to enable leading wildcard query searches in general. The case
> mentioned in the earlier mail is just one of the many instances I use
> this feature.
> 
> -Kumar
> 
> 
> 
> 
> -Original Message-
> From: Erik Hatcher [mailto:e...@ehatchersolutions.com] 
> Sent: Thursday, January 15, 2009 7:59 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Customizing Solr to handle Leading Wildcard queries
> 
> 
> On Jan 15, 2009, at 8:23 AM, Jana, Kumar Raja wrote:
> > Not being able to perform Leading Wildcard queries is a major  
> > handicap.
> > I want to be able to perform searches like *.pdf to fetch all pdf
> > documents from Solr.
> 
> For this particular case, I recommend indexing the document type as a  
> separate field.  Something like type:pdf (or use a MIME type string).  
> Then you can do a very direct and fast query to search or facet by  
> document types.
> 
> Erik



Re: Searchable and Non Searchable Fields

2009-01-15 Thread Otis Gospodnetic
Con,

Sure.  You just have to specify the field name when searching:

FirstName:George (and not just: George)


Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
> From: con 
> To: solr-user@lucene.apache.org
> Sent: Thursday, January 15, 2009 12:20:55 AM
> Subject: Re: Searchable and Non Searchable Fields
> 
> 
> Thanks for the reply Otis
> Even if we dont get both George and Georgeon, Can we have only the firstname
> as searchable.
> That is, If I search George, I should get firstname, lastname, and country
> of the first row, and no values from the third row should be returned
> 
> Regards
> Con
> 
> 
> 
> Otis Gospodnetic wrote:
> > 
> > Hi,
> > 
> > Your schema setup looks fine.
> > George is not the same as Georgeon, so 2) won't match a search for
> > FirstName:George
> > 
> > Otis
> > --
> > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> > 
> > 
> > 
> > - Original Message 
> >> From: con 
> >> To: solr-user@lucene.apache.org
> >> Sent: Wednesday, January 14, 2009 1:23:06 AM
> >> Subject: Searchable and Non Searchable Fields
> >> 
> >> 
> >> Hi All
> >> 
> >> I am using dataimporthandler to index values from oracle db.
> >> 
> >> My sample rows are like:
> >> 
> >> 1) FirstName-> George,LastName-> Bush,  Country-> US
> >> 2) FirstName-> Georgeon, LastName-> Washington, Country-> US
> >> 3) FirstName-> Tony,   LastName-> George,   Country-> UK
> >> 4) FirstName-> Gordon,LastName-> Brown,Country-> UK
> >> 5) FirstName-> Vladimer,  LastName-> Putin,  Country-> Russia
> >> 
> >> How can i set only the FirstName field as searchable.
> >> For eg. if I search George, I should get FirstName, LastName and Country
> >> of
> >> first and second rows only, and if I search Bush no value should be
> >> returned.
> >> 
> >> I tried by providing various options for the <field> entries at schema.xml
> >> (the example field definitions were stripped by the archive).
> >> But it is not providing the exact results. 
> >> 
> >> How can I change the field attributes to get this result? Or is there
> >> someother configs for this?
> >> 
> >> Expecting reply
> >> Thanks in advance
> >> con
> >> -- 
> >> View this message in context: 
> >> 
> http://www.nabble.com/Searchable-and-Non-Searchable-Fields-tp21450664p21450664.html
> >> Sent from the Solr - User mailing list archive at Nabble.com.
> > 
> > 
> > 
> 
> -- 
> View this message in context: 
> http://www.nabble.com/Searchable-and-Non-Searchable-Fields-tp21450664p21471595.html
> Sent from the Solr - User mailing list archive at Nabble.com.



Re: Unwanted clustering of search results after sorting by score

2009-01-15 Thread Otis Gospodnetic
Axel,

Others may have better ideas, but the simplest idea that occurs to me right now 
is to really just go over the search results and re-sort them the way you 
described.  However, I don't think this is as scary as it sounds.  You don't 
really have to go through the whole result set - you only need to do this for 
the N hits you are displaying (10 in your example).  All of the data you need 
to access will already be in memory and cached, so this should be cheap, quick, 
and easy.  The magic factor that's inversely proportional to the number of 
products in a shop could be stored in a separate field at index time.

This should be doable with a function query, too.
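As a rough client-side sketch of that re-sort over one page of hits (the
"score" and "shop" field names are invented; this is not from the original
thread):

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ShopDiversifier {
    // Greedily pick the highest-scored remaining hit; on score ties,
    // prefer the shop with the fewest products already emitted.
    public static List<Map<String, Object>> rerank(List<Map<String, Object>> hits) {
        Map<String, Integer> seen = new HashMap<String, Integer>();
        List<Map<String, Object>> pool = new ArrayList<Map<String, Object>>(hits);
        List<Map<String, Object>> out = new ArrayList<Map<String, Object>>();
        while (!pool.isEmpty()) {
            Map<String, Object> best = pool.get(0);
            for (Map<String, Object> h : pool) {
                float diff = (Float) h.get("score") - (Float) best.get("score");
                if (diff > 0 || (diff == 0 && count(seen, h) < count(seen, best))) {
                    best = h;
                }
            }
            pool.remove(best);
            String shop = (String) best.get("shop");
            Integer c = seen.get(shop);
            seen.put(shop, c == null ? 1 : c + 1);
            out.add(best);
        }
        return out;
    }

    private static int count(Map<String, Integer> seen, Map<String, Object> hit) {
        Integer c = seen.get((String) hit.get("shop"));
        return c == null ? 0 : c;
    }
}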


Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
> From: Axel Tetzlaff 
> To: solr-user@lucene.apache.org
> Sent: Thursday, January 15, 2009 8:15:29 AM
> Subject: Re: Unwanted clustering of search results after sorting by score
> 
> 
> Hi,
> 
> I'm working on the problem Max described as well. [...]



Re: place log4j.properties

2009-01-15 Thread Matthew Runo
Have you tried placing it up in /WEB-INF/classes/? I'd think that'd be  
the root of the classpath for solr, and maybe where it's looking for  
the file?


If you figure it out, could you update the wiki?
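For what it's worth, a minimal log4j.properties (a generic example, not taken
from this thread) that makes that WARN go away once the file is found on the
classpath:

log4j.rootLogger=INFO, stdout
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%d{ISO8601} %p [%c] %m%n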

--Matthew

On Jan 14, 2009, at 3:39 AM, Marc Sturlese wrote:



Hey there,
I have changed the log system in the nightly build to log4j  
following this

comment:

http://wiki.apache.org/solr/SolrLogging

Everything is loaded correclty but I am geting this INFO:

log4j:WARN No appenders could be found for logger
(org.apache.solr.servlet.SolrDispatchFilter).
log4j:WARN Please initialize the log4j system properly.

I think the problem is that the wepapp is not finding the  
log4j.properties.

I have tryed placing it in the firs class level:
./WEB-INF/classes/org/apache/solr/servlet/

But doesn't seem to recognize it... Any advice?

Thanks in advance

--
View this message in context: 
http://www.nabble.com/place-log4j.properties-tp21454379p21454379.html
Sent from the Solr - User mailing list archive at Nabble.com.





Help with Solr 1.3 lockups?

2009-01-15 Thread Jerome L Quinn

Hi, all.

I'm running solr 1.3 inside Tomcat 6.0.18.  I'm running a modified query
parser, tokenizer, highlighter, and have a CustomScoreQuery for dates.

After some amount of time, I see solr stop responding to update requests.
When crawling through the logs, I see the following pattern:

Jan 12, 2009 7:27:42 PM org.apache.solr.update.DirectUpdateHandler2 commit
INFO: start commit(optimize=false,waitFlush=false,waitSearcher=true)
Jan 12, 2009 7:28:11 PM org.apache.solr.common.SolrException log
SEVERE: Error during auto-warming of
key:org.apache.solr.search.queryresult...@ce0f92b9:java.lang.OutOfMemoryError
at org.apache.lucene.index.TermBuffer.toTerm(TermBuffer.java:122)
at org.apache.lucene.index.SegmentTermEnum.term
(SegmentTermEnum.java:167)
at org.apache.lucene.index.SegmentMergeInfo.next
(SegmentMergeInfo.java:66)
at org.apache.lucene.index.MultiSegmentReader$MultiTermEnum.next
(MultiSegmentReader.java:492)
at org.apache.lucene.search.FieldCacheImpl$7.createValue
(FieldCacheImpl.java:267)
at org.apache.lucene.search.FieldCacheImpl$Cache.get
(FieldCacheImpl.java:72)
at org.apache.lucene.search.FieldCacheImpl.getInts
(FieldCacheImpl.java:245)
at org.apache.solr.search.function.IntFieldSource.getValues
(IntFieldSource.java:50)
at org.apache.solr.search.function.SimpleFloatFunction.getValues
(SimpleFloatFunction.java:41)
at org.apache.solr.search.function.BoostedQuery$CustomScorer.
(BoostedQuery.java:111)
at org.apache.solr.search.function.BoostedQuery$CustomScorer.
(BoostedQuery.java:97)
at org.apache.solr.search.function.BoostedQuery
$BoostedWeight.scorer(BoostedQuery.java:88)
at org.apache.lucene.search.IndexSearcher.search
(IndexSearcher.java:132)
at org.apache.lucene.search.Searcher.search(Searcher.java:126)
at org.apache.lucene.search.Searcher.search(Searcher.java:105)
at org.apache.solr.search.SolrIndexSearcher.getDocListNC
(SolrIndexSearcher.java:966)
at org.apache.solr.search.SolrIndexSearcher.getDocListC
(SolrIndexSearcher.java:838)
at org.apache.solr.search.SolrIndexSearcher.access$000
(SolrIndexSearcher.java:56)
at org.apache.solr.search.SolrIndexSearcher$2.regenerateItem
(SolrIndexSearcher.java:260)
at org.apache.solr.search.LRUCache.warm(LRUCache.java:194)
at org.apache.solr.search.SolrIndexSearcher.warm
(SolrIndexSearcher.java:1518)
at org.apache.solr.core.SolrCore$3.call(SolrCore.java:1018)
at java.util.concurrent.FutureTask$Sync.innerRun
(FutureTask.java:314)
at java.util.concurrent.FutureTask.run(FutureTask.java:149)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask
(ThreadPoolExecutor.java:896)
at java.util.concurrent.ThreadPoolExecutor$Worker.run
(ThreadPoolExecutor.java:918)
at java.lang.Thread.run(Thread.java:735)

Jan 12, 2009 7:28:11 PM org.apache.tomcat.util.net.JIoEndpoint$Acceptor run
SEVERE: Socket accept failed
Throwable occurred: java.lang.OutOfMemoryError
at java.net.PlainSocketImpl.socketAccept(Native Method)
at java.net.PlainSocketImpl.accept(PlainSocketImpl.java:414)
at java.net.ServerSocket.implAccept(ServerSocket.java:464)
at java.net.ServerSocket.accept(ServerSocket.java:432)
at
org.apache.tomcat.util.net.DefaultServerSocketFactory.acceptSocket
(DefaultServerSocketFactory.java:61)
at org.apache.tomcat.util.net.JIoEndpoint$Acceptor.run
(JIoEndpoint.java:310)
at java.lang.Thread.run(Thread.java:735)

<<< Java dumps core and heap at this point >>>

Jan 12, 2009 7:28:21 PM org.apache.solr.common.SolrException log
SEVERE: org.apache.lucene.store.LockObtainFailedException: Lock obtain
timed out: SingleInstanceLock: write.lock
at org.apache.lucene.store.Lock.obtain(Lock.java:85)
at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:1140)
at org.apache.lucene.index.IndexWriter.(IndexWriter.java:938)
at org.apache.solr.update.SolrIndexWriter.
(SolrIndexWriter.java:116)
at org.apache.solr.update.UpdateHandler.createMainIndexWriter
(UpdateHandler.java:122)
at org.apache.solr.update.DirectUpdateHandler2.openWriter
(DirectUpdateHandler2.java:167)
at org.apache.solr.update.DirectUpdateHandler2.addDoc
(DirectUpdateHandler2.java:221)
at org.apache.solr.update.processor.RunUpdateProcessor.processAdd
(RunUpdateProcessorFactory.java:59)
at org.apache.solr.handler.XmlUpdateRequestHandler.processUpdate
(XmlUpdateRequestHandler.java:196)
at
org.apache.solr.handler.XmlUpdateRequestHandler.handleRequestBody
(XmlUpdateRequestHandler.java:123)
at org.apache.solr.handler.RequestHandlerBase.handleRequest
(RequestHandlerBase.java:131)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1204)
at org.apache.solr.servlet.SolrDispatchFilter.execute
(

Solr/Lucene capabilities--Newbie Question

2009-01-15 Thread kgrogan0321

Hello,
I have been tasked with evaluating a few open source tools for implementing
an Enterprise search in a new project(Solr/Lucene being one of them).  

Can anyone help to answer if Solr/Lucene can: 
1)Handle field/row level security?
2)implement DROOLS rules on a query of multiple records?  If so how does it
work internally and are there any performance hits?
3)Handle multiple data sources?
4)Break up and dispatch queries?


I do aplogize that my question(s) are a little general, as we are only in
the beginning stages of the project.  I appreciate any help or answers
anyone can give :)

Thanks,
Karen

-- 
View this message in context: 
http://www.nabble.com/Solr-Lucene-capabilities--Newbie-Question-tp21484427p21484427.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Import data from RSS Feed Question

2009-01-15 Thread Chris Hostetter
: Everything works and is setup correctly, but when I change the 'url'
: attribute in the entity declaration to a url on my intranet that requires
: basic authentication (username and password),  I get a HTTP 401 error when
: solr attempts to read the rss feed and update the index.
: 
: Question: is there a way to specify a username and password for solr to use
: for an HttpDataSource?

RFC1738 s3.1 specifies that a username:password pair can be included directly 
in URLs -- this has traditionally worked for http URLs when dealing with 
basic authentication, but some clients/servers reject URLs that utilize 
this feature as being "unsafe".

I haven't tried it using DataImportHandler, but i don't believe there's 
anything in the code that would outright reject such URLs...

   http://username:password@hostname.tld/rss-path.xml


http://tools.ietf.org/html/rfc1738#section-3.1


-Hoss



Having no luck with built-in replication and multicore

2009-01-15 Thread Jacob Singh
Hi folks,

Here's what I've got going:

Master Server with the following config:

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">commit</str>
    <str name="confFiles">schema.xml,stopwords.txt,elevate.xml</str>
  </lst>
</requestHandler>

Slave server with the following:

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://mydomain:8080/solr/065f079c24914a4103e2a57178164bbe/replication</str>
    <str name="pollInterval">00:00:20</str>
  </lst>
</requestHandler>

I think there is a bug in the JSP for the admin pages (which I can
post a patch if desired) where the replication link goes to
replication/ and index.jsp doesn't get loaded automatically (at least
on my box).  I managed to get to the dashboard by adding index.jsp,
and it seems that while the slave is polling constantly, it never
receives an update.

I tried the following:

curl 
'http://mydomain.com:8080/solr/065f079c24914a4103e2a57178164bbe/replication?command=snapshoot'


<response>
  <lst name="responseHeader"><int name="status">0</int><int name="QTime">1</int></lst>
  <str name="exception">java.lang.NullPointerException:java.lang.NullPointerException</str>
</response>

The index has about 400 docs in it, and old style replication used to
work just fine on it.

When I run the snappull command from the slave:

curl 
'http://mydomain.com:8080/solr/065f079c24914a4103e2a57178164bbe/replication?command=snappull'


<response>
  <lst name="responseHeader"><int name="status">0</int><int name="QTime">1</int></lst>
  <str name="status">OK</str>
</response>

The replication page also remains unchanged and there are no docs on the slave.

Any ideas?

Thanks,
Jacob






-- 

+1 510 277-0891 (o)
+91  33 7458 (m)

web: http://pajamadesign.com

Skype: pajamadesign
Yahoo: jacobsingh
AIM: jacobsingh
gTalk: jacobsi...@gmail.com


Re: Help with Solr 1.3 lockups?

2009-01-15 Thread Mark Miller
How much RAM are you giving the JVM? Thats running out of memory loading 
a FieldCache, which can be a more memory intensive data structure. It 
pretty much points to the JVM not having enough RAM to do what you want. 
How many fields do you sort on? How many fields do you facet on? How 
much RAM do you have available and how much have you given Solr? How 
many documents are you working with?


As far as rebooting a failed server, the best technique is generally 
external. I would recommend a script/program on another machine that 
hits the Solr instance with a simple query every now and again. If you 
don't get a valid response within a reasonable amount of time, or after 
a reasonable number of tries, fire off alert emails and issue a command 
to that server to reboot the JVM. Or something to that effect.
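Something along these lines, as a bare-bones sketch -- the URL, mail address
and restart command are all placeholders:

#!/bin/sh
# ping Solr; if it fails, alert and bounce the servlet container
URL="http://localhost:8983/solr/admin/ping"
if ! curl -sf --max-time 10 "$URL" > /dev/null 2>&1; then
    echo "solr ping failed on $(hostname) at $(date)" \
        | mail -s "solr down" ops@example.com
    /etc/init.d/tomcat6 restart
fi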


However, you should figure out why you are running out of memory. You 
don't want to use more resources than you have available if you can help it.


- Mark

Jerome L Quinn wrote:

> Hi, all.
>
> I'm running solr 1.3 inside Tomcat 6.0.18.  I'm running a modified query
> parser, tokenizer, highlighter, and have a CustomScoreQuery for dates. [...]

Request a specific document

2009-01-15 Thread roberto
Hello,

Is there any way to request a document (a field) using the doc id?

It would be very nice if I could use a response writer to return
the document in HTML form.

Has someone already done this?

Thanks,

-- 
"Without love, we are birds with broken wings."
Morrie


Re: Request a specifc document

2009-01-15 Thread Erik Hatcher


On Jan 15, 2009, at 3:04 PM, roberto wrote:

Is there any way to request a document (a field) using the doc id?


/select?q=id:<your-doc-id>

Append &fl=field,list to return only the desired fields.


It would be very nice if I could use a response writer to return
the document in HTML form.

Has someone already done this?


A couple of options "out of the box":

  1) the XSLT response writer, with a custom XSL to output HTML will  
work


  2) the new fangled VelocityResponseWriter (in trunk, see wiki for  
instructions)
 where you can supply a velocity template that you can customize  
to generate HTML
 [way more sensible, if you ask me, than XSLT for HTML generation  
like that - but I'm biased :)]


Erik



Re: Having no luck with built-in replication and multicore

2009-01-15 Thread Shalin Shekhar Mangar
What is the output of /replication?command=indexversion on the master?

On Fri, Jan 16, 2009 at 1:27 AM, Jacob Singh  wrote:

> Hi folks,
>
> Here's what I've got going:
>
> Master Server with the following config:
> 
>
>commit
>schema.xml,stopwords.txt,elevate.xml
>
> 
>
> Slave server with the following:
>
> 
>
>
> http://mydomain:8080/solr/065f079c24914a4103e2a57178164bbe/replication
> 
>00:00:20
> 
> 
>
> I think there is a bug in the JSP for the admin pages (which I can
> post a patch if desired) where the replication link goes to
> replication/ and index.jsp doesn't get loaded automatically (at least
> on my box).  I managed to get to the dashboard by adding index.jsp,
> and it seems that while the slave is polling constantly, it never
> receives an update.
>
> I tried the following:
>
> curl '
> http://mydomain.com:8080/solr/065f079c24914a4103e2a57178164bbe/replication?command=snapshoot
> '
> 
> 
> 0 name="QTime">1
> name="exception">java.lang.NullPointerException:java.lang.NullPointerException
> 
>
> The index has about 400 docs in it, and old style replication used to
> work just fine on it.
>
> When I run the snappull command from the slave:
>
> curl '
> http://mydomain.com:8080/solr/065f079c24914a4103e2a57178164bbe/replication?command=snappull
> '
> 
> 
> 0 name="QTime">1OK
> 
>
> The replication page also remains unchanged and there are no docs on the
> slave.
>
> Any ideas?
>
> Thanks,
> Jacob
>
>
>
>
>
>
> --
>
> +1 510 277-0891 (o)
> +91  33 7458 (m)
>
> web: http://pajamadesign.com
>
> Skype: pajamadesign
> Yahoo: jacobsingh
> AIM: jacobsingh
> gTalk: jacobsi...@gmail.com
>



-- 
Regards,
Shalin Shekhar Mangar.


Re: Is it just me or multicore default is broken? Can't ping

2009-01-15 Thread Julian Davchev
Hi,

I am trying with 1.3.0 from
http://apache.cbox.biz/lucene/solr/1.3.0/apache-solr-1.3.0.tgz

which I suppose is the stable release.

Otis Gospodnetic wrote:
> Not sure, I'd have to try it.  But you didn't mention which version of Solr 
> you are using.  Nightly build?
>
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
>
>
> - Original Message 
>   
>> From: Julian Davchev 
>> To: solr-user@lucene.apache.org
>> Sent: Thursday, January 15, 2009 9:53:37 AM
>> Subject: Is it just me or multicore default is broken? Can't ping
>>
>> Hi,
>> I am trying to setup multicore solr. So I just download default one with
>> jetty...goto example/
>> and run
>> java -Dsolr.solr.home=multicore -jar start.jar
>>
>>
>> All looks smooth without errors on startup.
>> I can also open the admin at
>>
>> http://localhost:8983/solr/core1/admin/
>>
>>
>> But then trying to ping
>> http://localhost:8983/solr/core1/admin/ping
>>
>> I get  error 500 INTERNAL SERVER ERROR
>>
>>
>> And tons of exceptions in background starting with nullpointer
>>
>> Anyone have a clue? Is solr stable to be used or multicore is something
>> recently added and not to be trusted yet?
>> 
>
>   



Re: Having no luck with built-in replication and multicore

2009-01-15 Thread Jacob Singh
Hi Shalin,

Thanks for responding!  This used to be a 1.3 index (could that be the issue?)

curl 
'http://mydomain.com:8080/solr/065f079c24914a4103e2a57178164bbe//replication?command=indexversion'





Best,
Jacob


On Jan 15, 2009 3:32pm, Shalin Shekhar Mangar  wrote:
> What is the output of /replication?command=indexversion on the master?


Re: Date Stats support using Solr

2009-01-15 Thread Chris Hostetter

(Still catching up on holiday mail) ...

: I was searching for features in Solr which would give me the maximum and
: minimum values for various numeric and name fields. I found the Stats
: Component (Solr-680) and thanks a ton for that !!! J

: Is there a similar component for date fields too? I played a bit with

I don't think so, but for just getting the min/max date value (not sure 
what other stats would make sense for dates) this would probably be a 
fairly easy patch to StatsComponent if someone wanted to take  a stab at 
it.

the only way i know of to accomplish this now is to do two queries: 
asc/desc sort by date and see what comes back.
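for example, assuming a date field called "timestamp" and the standard
select handler (names are illustrative):

curl 'http://localhost:8983/solr/select?q=*:*&rows=1&fl=timestamp&sort=timestamp+asc'
curl 'http://localhost:8983/solr/select?q=*:*&rows=1&fl=timestamp&sort=timestamp+desc'

(the first returns the minimum date, the second the maximum)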



-Hoss



Re: Highlighting not working

2009-01-15 Thread Chris Hostetter

The problem here seems to be that SolrJ can't parse the XML response 
coming back from your solr server ... can you check your servlet 
container logs and let us know:

1) exactly what URL it says SolrJ is hitting.
2) the response you get when you hit that same url in your browser?

: Caused by: javax.xml.stream.XMLStreamException: ParseError at
: [row,col]:[3,1440]
: Message: requires 'name' attribute: lst
:   at
: 
org.apache.solr.client.solrj.impl.XMLResponseParser.readNamedList(XMLResponseParser.java:284)



-Hoss



Re: understanding queryNorm

2009-01-15 Thread Chris Hostetter
:i wanted to understand how the queryNorm is calculated. i did read 
: similarity documentation of lucene it says it is
...
: what would be default q.getBoost() ?  ( as i am not giving any value 
: specifically any where in solr). t.getBoost() is 1 in my case as i am 

all queries have a boost value, even if you don't specify one they have a 
default -- i believe it's "1" for every stock query, but a custom Impl 
could have an alternate default if it really wanted to.

the easiest way to visualize a lot of this is with debugQuery=true
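for example (URL and query are illustrative):

curl 'http://localhost:8983/solr/select?q=title:solr&debugQuery=true'

the "explain" section of the response then shows each clause's boost and the
queryNorm that was applied.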


-Hoss



Re: place log4j.properties

2009-01-15 Thread Marc Sturlese

>Have you tried placing it up in /WEB-INF/classes/? 
It worked. I was trying to set it in
/WEB-INF/classes/org/apache/solr/servlet, which was wrong.
Thanks!


Matthew Runo wrote:
> 
> Have you tried placing it up in /WEB-INF/classes/? I'd think that'd be  
> the root of the classpath for solr, and maybe where it's looking for  
> the file?
> 
> If you figure it out, could you update the wiki?
> 
> --Matthew
> 
> On Jan 14, 2009, at 3:39 AM, Marc Sturlese wrote:
> 
>>
>> Hey there,
>> I have changed the log system in the nightly build to log4j  
>> following this
>> comment:
>>
>> http://wiki.apache.org/solr/SolrLogging
>>
>> Everything is loaded correclty but I am geting this INFO:
>>
>> log4j:WARN No appenders could be found for logger
>> (org.apache.solr.servlet.SolrDispatchFilter).
>> log4j:WARN Please initialize the log4j system properly.
>>
>> I think the problem is that the wepapp is not finding the  
>> log4j.properties.
>> I have tryed placing it in the firs class level:
>> ./WEB-INF/classes/org/apache/solr/servlet/
>>
>> But doesn't seem to recognize it... Any advice?
>>
>> Thanks in advance
>>
>> -- 
>> View this message in context:
>> http://www.nabble.com/place-log4j.properties-tp21454379p21454379.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
> 
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Re%3A-place-log4j.properties-tp21482994p21487883.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: collectionDistribution vs SolrReplication

2009-01-15 Thread Chris Hostetter

: I would like to know the advantages of moving from:
: a master-slave system using CollectionDistribution with all their .sh
: scripts
: http://wiki.apache.org/solr/CollectionDistribution
: to:
: use SolrReplication and his solrconfig.xml configuration.
: http://wiki.apache.org/solr/SolrReplication

in addition to other comments posted it's important to keep in mind that 
one of the original motivations for the new style of replication was to 
have a 100% java based solution, as a result, it's is the only 
replication approach that works on windows.

(in particular: it has no dependency on being able to delete hardlinks, or 
on running rsync, or on using ssh, or on having external crons, etc..)

I still haven't had a chance to really kick the tires on the java based 
replication, so i have no real experience to base either of these claims 
on, but my hunch is that:
  1) new users will find the java based replication *much* easier to get 
up and running (a lot less moving parts and external processes to deal 
with)
  2) existing users who already have the script based replication working 
for them may find the java based replication less transparent and harder 
to manipulate in tricky ways.

...that second hunch comes from the fact that since the java replication 
is all self contained in solr, and doesn't use all of the various 
external processes (cron, rsync, snapshooter, snappuller, ssh, etc...) 
there are fewer places for people to manipulate the replication when doing 
'atypical' operations ... for example: during a phased rollout of some new 
code/schema, you might disable all replication by shutting down the rsyncd 
port; then disable it for a few slaves by commenting out the snappuller 
cron before turning rsyncd back on ... etc.

these types of tricks are probably unnecessary in 90% of the use cases, 
and people who aren't used to being able to do them probably won't care, 
but if you are used to having that level of control, you might miss them.

(but as i said: i haven't had a chance to try out the java replication at 
all, so for all i know it's just as tweakable and i'm just an idiot.)

-Hoss



Re: Help with Solr 1.3 lockups?

2009-01-15 Thread Stephen Weiss
I've been wondering about this one myself - most of the services we  
have installed work this way, if they crash out for whatever reason  
they restart automatically (Apache, MySQL, even the OS itself).   
Failures are detected and corrected by the load balancers and also in  
some cases by the machine itself (like with kernel panics).   But not  
SOLR, and I'm not quite sure what to do to get it there.  We use Jetty  
but it's the same story.  It's not like it fails out all that often,  
but when it does it will still respond to HTTP requests (because Jetty  
itself is still working), which makes it a lot harder to detect a  
failure... I've tried writing something for nagios but the problem is  
that most responses solr would give to a request vary depending on  
index updates, so it's not like I can just take a checksum and compare  
it - and even then, it would only really alert us to the problem, we'd  
still have to go in and restart everything (personally I don't enjoy  
restarting servers from my blackberry nearly as much as I should).


I'd have to come up with something that can intelligently interpret  
the response and decide if the server's still working properly or not,  
and the processing time on that alone might make it too inefficient to  
run every few seconds, but at least with that we'd be able to tell the  
cluster "don't send anything to this server for now".  Is there some  
really obvious way to track if a particular servlet is still running  
properly (in either Tomcat or Jetty, because if Tomcat has this I'd  
switch) and restart the container if it's not?


Thanks!!

--
Steve

On Jan 15, 2009, at 1:57 PM, Jerome L Quinn wrote:



An even bigger problem is the fact that once Solr is wedged, it  
stays that
way until a human notices and restarts things.  The tomcat stays  
running

and there's no automatic detection that will either restart Solr, or
restart the Tomcat container.

Any suggestions on either front?

Thanks,
Jerry Quinn





Re: Delete / filter / hide query results

2009-01-15 Thread Chris Hostetter

: can't be part of a field or something like this. So let's say that the only
: way to know if a user has access rights is by calling something like
: accessRights(sessionID, docID) where docID is stored in a field.

first tip: 'stored' values are going to be really inefficient to deal 
with on every request; at a bare minimum make sure this field is indexed 
and make all of your custom code access it using the FieldCache.

: I then decided to use a custom SearchComponent called right after index
: querying (before faceting is done) but for what I have read it's not a good
: idea to delete results because they are stored in more than one place and it
: could break the caching system (I suppose that if I delete results for user
: A they will be deleted for user B too if he makes the same query although he
: does have access rights). Anyway I don't really understand where results are
: stored in ResponseBuilder; DocSet / DocList are pretty obscur.

A DocSet is an unordered set of documents -- in the context of a query 
it's the set of all documents matching that query.  A DocList is an 
ordered (sub-)list of documents with some metadata about the whole list -- 
in the context of a query it's the "page" of documents being returned to 
the user; ie: docs 11-20 of 5478.  (this is all pretty well mentioned in 
the docs)

if you want to modify the DocList/DocSet included in query response, it's 
fairly easy to do -- the key is just that you shouldn't modify the 
existing DocSet/DocList objects becuase they are probably stored in the 
cache, but you are free to construct new instances and replace the ones in 
the response ... the FacetComponent will use your replacement DocSet when 
it comes.

note that applying your access control to the DocSet will be easy, because 
it's a complete set of unordered docs; you can remove anything you want.  
but the DocList has a lot more interesting use cases to worry about.  if 
the DocList is 11-20 of 5478 total matches, and you want to remove 2 you 
have to go search for what the next 2 would be to make sure you still 
return 10.  but you also have to worry about whether the original 11-20 
that the QueryComponent generated were right in the first place.  when the 
user made his first request for 1-10, your security component might have 
pulled out 3, but the QueryComponent didn't know that when it picked 
11-20, so you are already off by 3 from where you should be.

this is why post-processing access control tends to be a bad idea (beyond 
just extra goodies like faceting) ... things get a lot cleaner if you 
ensure your access controls get applied at query time.

you should consider implementing your access controls as a new type of 
query and using it as a filter ... with the new ValueSource parser hooks 
you could implement your logic as a "function" that takes a sessionId as 
input and reuse all of the existing query code.


-Hoss



Re: Questions about UUID type

2009-01-15 Thread Chris Hostetter

1) please don't cc both solr-user and solr-dev ... if you are confused 
about how something works, or having problems, please just email 
solr-user.

2) ...

: I'm confused by the UUID type comment. It's said that
...
: However, i found that if i don't specify the field it will report the
: exception
: 
: org.apache.solr.common.SolrException: Document [test] missing required
: field: id

...the javadoc you quoted is correct and accurate for what that method 
does, it behaves as documented when the value passed to it is null, empty 
or "NEW".

but at a higher level you still need to supply a value for any 
required="true" field when indexing a doc ... that value can be null, or 
empty, or "NEW" but you have to provide one.  If you want solr to just 
take care of it for you, then specify a default value in your schema...

   <field name="id" type="uuid" indexed="true" stored="true" default="NEW" />
The current approach allows Solr to be flexible to what the user wants...

* if you specify a default in the schema, solr won't complain if it's not 
in the input and will pass that value on to UUIDField
* if you don't specify a default, you take responsibility for sending 
some type of value for the UUIDField to use
* regardless of where the input comes from UUIDField will do the right 
thing if the input is null, empty, or "NEW"



-Hoss

Re: Solr/Lucene capabilities--Newbie Question

2009-01-15 Thread Grant Ingersoll


On Jan 15, 2009, at 1:53 PM, kgrogan0321 wrote:



Hello,
I have been tasked with evaluating a few open source tools for  
implementing

an Enterprise search in a new project(Solr/Lucene being one of them).

Can anyone help to answer if Solr/Lucene can:
1)Handle field/row level security?


Yes.  This is typically handled with a Filter.
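For example (field name and group values invented), each document can be
indexed with the groups allowed to see it, and every request gets a filter
query appended:

/select?q=quarterly+report&fq=acl:(sales OR managers)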



2)implement DROOLS rules on a query of multiple records?  If so how  
does it

work internally and are there any performance hits?


Not out of the box.  You would probably have to implement your own  
SearchComponent/RequestHandler to do so.  I don't know what would be  
involved here, but it sounds interesting.




3)Handle multiple data sources?


Yes.



4)Break up and dispatch queries?


In what way?  Do you mean for distributed search?  If so, then yes.





I do aplogize that my question(s) are a little general, as we are  
only in

the beginning stages of the project.  I appreciate any help or answers
anyone can give :)


No worries, all good questions.


Re: Dismax query parser with different field classes

2009-01-15 Thread Chris Hostetter

: I have a small problem with using a boost query, which is that I would like
: documents found in the boost query to be returned even if the main query
: does not include those results. So what I am effectively looking for is an
: OR between the dismax query and the boost query, rather than a required main
: query or'd with the boost query. Does anything currently exist which can
: facilitate this?

i don't think so.

I think you would need to write a custom version of QueryComponent to do 
this.

A lot of cool magic (that is as yet still largely undocumented) got added 
when the QParser and QParserPlugin apis were added for doing local params 
and even variable substitution using other request params -- but that 
still all happens at a string level.  i can't think of any way to say that 
you want multiple querys generated by different QParsers to be combined in 
a particular way


-Hoss



Re: Query about NOT (-) operator

2009-01-15 Thread Chris Hostetter

: But below query does not work

: 2.   (NOT(IBA60019_l:1) AND NOT(IBA60019_l:0)) AND
: businessType:wt.doc.WTDocument

boolean queries must have at least one "positive" expression (ie: MUST or 
SHOULD) in order to match.  The Solr query parser tries to help with this: 
if the *outermost* BooleanQuery contains only negated 
clauses, it adds a match-all-docs query (ie: *:*) ... but in your case, 
you have a nested BooleanQuery which contains only negated clauses ... so 
you need to include the match-all-docs query explicitly...

   +(*:* -IBA60019_l:1 -IBA60019_l:0) +businessType:wt.doc.WTDocument


-Hoss



Re: Using Lucene index in Solr

2009-01-15 Thread Chris Hostetter

: My data is stored in a database, I want Solr to look up the data in that
: database using my existing index. At the moment, I have set the 

you seem to be confusing two issues

: <dataDir> element in my solrconfig to point at my existing index, and checked the
: schema on my existing index using Luke but I can't get any results when
: searching in Solr.
: 
: My index was created using hibernate-search. 

...if you have an existing index, and you want to search in Solr, you have 
to create a schema.xml file that tells solr what the fields are that you 
have and what datatypes to treat them as -- in particular what analysers 
to use when querying them.

if hibernate-search built your index, you'll need to look at how it was 
configured to build the index to figure some of this out (I'm not familiar 
with hibernate-search so I can't help you there) ... the 
LukeRequestHandler can help you spot-check the raw index if you need to 
(ie: "oh, look, all of the terms are lowercased, so I guess I would use a 
LowerCaseFilterFactory")

: How can I retrieve my data in Solr, using the existing Lucene index? I think
: I need to set the database connection details somewhere, just not sure
: where. I have set up a dataImport handler, but I don't want that to
: overwrite my exising index.

If you give Solr an existing index, it doesn't care what database it 
was built from -- just what analysis rules were used when it was built.  
The only thing in Solr that cares about databases is the DataImportHandler, 
which you could use to update your index as new data gets added to your 
database if you want -- but first you have to create a schema.xml that 
makes sense for your index.

Alternately: create the schema.xml that you *want* to have, abandon your 
existing index, and use DataImportHandler to build a new index and keep it 
up to date.


-Hoss



Re: Using Solr with an existing Lucene index

2009-01-15 Thread Chris Hostetter

: My first attempt to do this resulted in my Java program throwing a
: CorruptIndex exception. It appears as though Solr has somehow modified my
: index files in some way which causes the Lucene code to see them as corrupt
: (even though I did not, at least intentionally, try to post any documents or
: otherwise update the index through Lucene).

knowing the specifics of the exception would be helpful ... for example, 
if the exception message was "unknown format version: -X", that typically 
just means it was last touched by a newer version of Lucene than the 
one you are trying to read it with ... if the version of Lucene in Solr is 
newer than the one you are using, I can easily imagine this happening just 
from Solr opening and closing an IndexWriter, even if you never use Solr to 
add/commit any docs.

: If so, how? Is it just a matter of changing your data directory to your
: existing index data in the solrconfig.xml, for example:
:   <dataDir>/my/existing/lucene/index/data</dataDir> ?

Solr expects the index to be named "index" inside the data directory -- 
but beyond that you also need to make sure your schema.xml is compatible 
(as mentioned in another thread I just replied to).
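For example, if the Lucene files live in /my/existing/lucene/index,
solrconfig.xml would point at the parent directory (a sketch):

  <dataDir>/my/existing/lucene</dataDir>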



-Hoss



CoreAdmin for replication STATUS

2009-01-15 Thread Jacob Singh
Hi,

How do I find out the status of a slave's index?  I have the following scenario:

1. Boot up the slave.  I give it a core name of boot-$CoreName.

2. I call boot-$CoreName/replication?command=snappull

3. I check back every minute using cron and I want to see if the slave
has actually gotten the data.

4. When it gets the data I call
solr/admin/cores?action=RENAME&core=boot-$CoreName&other=$CoreName.

I do this because the balancer will start hitting the slave before it
has the full index otherwise.  Step 3 is the problem.  I don't have a
reliable way to know it has finished replication, AFAIK.  I see in
?action=STATUS for the CoreAdmin there is a field called "current".
Is this useful for this?  If not, what is recommended?  I could hit
the admin/replication/index.jsp URL and screen-scrape the HTML, but I
imagine there is a better way.

Thanks,
Jacob

-- 

+1 510 277-0891 (o)
+91  33 7458 (m)

web: http://pajamadesign.com

Skype: pajamadesign
Yahoo: jacobsingh
AIM: jacobsingh
gTalk: jacobsi...@gmail.com


Re: delta index produces multiple results?

2009-01-15 Thread Chris Hostetter

: Full indexing is working fine; in schema.xml I implemented a uniqueKey field
: (which is of type 'text').

using "text" as the fieldtype for a uniqueKey is almost never a good idea.  
it could easily explain the behavior you are seeing.

DataImportHandler (and all of hte update handlers) relies on the 
underlying UpdateProcessor to delete docs with identical uniqueKeys when 
you "update" an existing document ... if the uniqueKey field has an 
analyzer that produces multiple tokens (TextField frequently does) then 
the behavior becomes undefined.

Stick with something like StrField or IntField for your uniqueKey field ... or 
if you must use TextField, make sure you are using the KeywordTokenizer.
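For example, the stock example schema's pattern (a sketch; the "id" field
and "string" type are the example's names, not necessarily yours):

  <field name="id" type="string" indexed="true" stored="true" required="true"/>
  <uniqueKey>id</uniqueKey>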

if changing this still causes problems, then we'll need to see your 
schema.xml, your data-config.xml, and the output of doing a search 
where you get some duplications like this, to help figure out what else 
might be going wrong.


-Hoss



RE: Solr FAQ entry about "Dynamically calculated range facet" topic

2009-01-15 Thread Chris Hostetter

: So did anyone put together a FAQ on this subject? I am also interested in
: seeing the different ways to get dynamic faceting to work.

in past discussions, one of the big pre-reqs for doing anything interesting 
was generating stats across the field ... the new StatsComponent can give 
you the min/mean/max/stddev for any field, so you can now make rough 
guesses at some good ranges on the client and then request them.
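For example (a sketch; "price" is an illustrative field, and URL-encoding
is omitted for readability):

  /select?q=*:*&rows=0&stats=true&stats.field=price

...then turn the returned min/max into facet queries:

  /select?q=*:*&rows=0&facet=true&facet.query=price:[0 TO 20]&facet.query=price:[20 TO 75]&facet.query=price:[75 TO 123]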

: In this post, Chris Hostetter dropped a piece of handler code. Is it still
: the right path to take for those generated ranges:
: $0..$20 (3)
: $20..$75 (15)
: $75..$123 (8)
: 
: "Re: Dynamically calculated range facet"
: http://www.mail-archive.com/solr-user@lucene.apache.org/msg04727.html

these days you'd want to do this in a SearchComponent ... 
probably a subclass of FacetComponent ... but the same basic pattern 
still applies.  You're going to have a DocSet to work with, and you can do 
whatever you want to generate your facet metadata.

the really interesting part would be getting 
SearchComponent.distributedProcess to work, because individual shards 
aren't necessarily going to pick the same ranges based on their local 
stats ... I guess you'd make your new Component depend on the 
StatsComponent completing, and then have the coordinator compute the ranges 
and tell the shards what they should be, then aggregate ... that seems 
like it might work.




-Hoss



Re: Date Range query in Solr

2009-01-15 Thread Chris Hostetter

:   I too have a similar question on getting the query results based on
: dateRange. I have both startDate and endDate fields in my schema and if I
: want to get the query results that fall between two date values, e.g. get all
: the docs whose date is between startDate and endDate -- how can I query?

searching the mailing list archive is always a good place to start...

http://www.nabble.com/forum/Search.jtp?forum=14479&local=y&query=startDate+endDate

http://www.nabble.com/Date-Range-Query-%2B-Fields-to16108517.html#a16132427
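The usual pattern for "docs whose [startDate, endDate] interval contains a
given date" is two range clauses, e.g. (a sketch, with the field names from
the question above):

  startDate:[* TO 2009-01-15T00:00:00Z] AND endDate:[2009-01-15T00:00:00Z TO *]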




-Hoss



Re: Does search query return specific result.?

2009-01-15 Thread Chris Hostetter

: I believe the reason is because, Solr returns the document with all of the
: "Tag" field's content.
: 
: Now, the question is: is there a way to make it return only the Tags that match
: the criteria from the same document?

not really ... highlighting with things like the NullFragmenter can 
probably make things like this work, but for an auto-suggest type 
application you're going to want to be fast -- the added processing of 
highlighting is probably not the best way to go.

the thing to remember is that you typically want one "document" for each 
"thing" that you are going to return from a "search" ... for an 
auto-suggest type application, you frequently want one doc per "word" that 
your auto-suggest queries are going to return.

There are alternate approaches however ... the new TermsComponent, for 
example, makes it easy to get direct access to the TermEnum for an 
arbitrary field based on things like a prefix or min/max doc frequency -- 
so a good basic auto-suggest can be implemented that way even with your 
docs as you have them indexed ... but if you want more control over the 
weighting/boosting of what comes back from an arbitrary query, you're going 
to want a special index of "Tags".
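A sketch of the kind of request TermsComponent supports (handler path and
parameter names per its wiki documentation; "Tag" is the field from this
thread):

  /terms?terms.fl=Tag&terms.prefix=ab&terms.limit=10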


-Hoss



Re: EmbeddedSolrServer in Single Core

2009-01-15 Thread qp19

Thanks ryan,

Works like a charm. This is roughly how I ended up doing it:

QP

SolrConfig solrConfig = new SolrConfig(SOLR_HOME, CONFIG_FILENAME, null);
IndexSchema indexSchema = new IndexSchema(solrConfig, SOLR_SCHEMA, null);

CoreContainer container = new CoreContainer(
    new SolrResourceLoader(SolrResourceLoader.locateInstanceDir()));
CoreDescriptor dcore = new CoreDescriptor(container, "",
    solrConfig.getResourceLoader().getInstanceDir());
dcore.setConfigName(solrConfig.getResourceName());
dcore.setSchemaName(indexSchema.getResourceName());
SolrCore core = new SolrCore(null, SOLR_DATA, solrConfig, indexSchema, dcore);
container.register("", core, false);
SolrServer server = new EmbeddedSolrServer(container, "");


ryantxu wrote:
> 
> 
> On Jan 9, 2009, at 8:12 PM, qp19 wrote:
> 
>>
>> Please bear with me. I am new to Solr. I have searched all the existing
>> posts about this and could not find an answer. I wanted to know how to go
>> about creating a SolrServer using EmbeddedSolrServer. I tried to initialize
>> this several ways but was unsuccessful. I do not have multi-core. I am
>> using solrj 1.3. I attempted to use the deprecated methods as mentioned in
>> the SolrJ documentation the following way, but it fails as well with
>> "unable to locate Core".
>>
>>
>> SolrCore core = SolrCore.getSolrCore();
> 
> This function is deprecated and *really* should not be used --  
> especially for embedded solr server.  (the only chance you would have  
> for it to work is if you start up Solr in a web app before calling this)
> 
>>
>>  SolrServer server = new EmbeddedSolrServer( core );
>>
> 
> Core initialization is kind of a mess, but this contains everything  
> you would need:
> 
>    CoreContainer container = new CoreContainer(new
>        SolrResourceLoader(SolrResourceLoader.locateInstanceDir()));
>    CoreDescriptor dcore = new CoreDescriptor(container, coreName,
>        solrConfig.getResourceLoader().getInstanceDir());
>    dcore.setConfigName(solrConfig.getResourceName());
>    dcore.setSchemaName(indexSchema.getResourceName());
>    SolrCore core = new SolrCore(null, dataDirectory, solrConfig,
>        indexSchema, dcore);
>    container.register(coreName, core, false);
> 
> 
> 
>> So far my installation is pretty basic, with Solr running on Tomcat as per
>> instructions in the wiki. My Solr home is outside of the webapps folder,
>> i.e. "c:/tomcat-solr/solr". I am able to connect using
>> CommonsHttpSolrServer("http://localhost:8080/solr") without a problem.
>> The question in a nutshell is, how do I instantiate EmbeddedSolrServer
>> using new EmbeddedSolrServer(CoreContainer coreContainer, String coreName)?
>> Initializing CoreContainer appears to be complicated when compared to
>> SolrCore.getSolrCore() as per the examples. Is there a simpler way to
>> initialize CoreContainer? Is a core (or CoreName) necessary even though I
>> don't use multi-core? Also, is it possible to initialize
>> EmbeddedSolrServer using Spring? Thanks in advance for the help.
>>
> 
> yes, I use this:
> 
>   <bean id="coreContainer" class="org.apache.solr.core.CoreContainer">
>     <constructor-arg value="${dir}"/>
>     <constructor-arg value="${dconfigFile}"/>
>   </bean>
> 
>   <bean id="server1"
>       class="org.apache.solr.client.solrj.embedded.EmbeddedSolrServer">
>     <constructor-arg ref="coreContainer"/>
>     <constructor-arg value="core1"/>
>   </bean>
> 
>   <bean id="server2"
>       class="org.apache.solr.client.solrj.embedded.EmbeddedSolrServer">
>     <constructor-arg ref="coreContainer"/>
>     <constructor-arg value="core2"/>
>   </bean>
> 
> ryan
> 
> 

-- 
View this message in context: 
http://www.nabble.com/EmbeddedSolrServer-in-Single-Core-tp21383525p21490222.html
Sent from the Solr - User mailing list archive at Nabble.com.



New Searcher / Commit / Cache Warming Time

2009-01-15 Thread David Giffin
Hi All,

I have been trying to reduce the CPU load and the time it takes to put a
new snapshot in place on our slave servers. I have tried tweaking many
of the system memory, JVM and cache size settings used by Solr. When
running a commit from the command line I'm seeing roughly 16 seconds
before the commit completes. This is a ~7GB index with no pending
changes, nothing else running, no load:

INFO: {commit=} 0 15771
Jan 15, 2009 11:29:35 PM org.apache.solr.core.SolrCore execute
INFO: [listings] webapp=/solr path=/update params={} status=0 QTime=15771

So I started disabling things. I disabled everything under the
event-listener sections of solrconfig.xml, and commit times went down:

INFO: {commit=} 0 103
Jan 15, 2009 11:35:22 PM org.apache.solr.core.SolrCore execute
INFO: [listings] webapp=/solr path=/update params={} status=0 QTime=103

So I started adding things back in, and found that adding the newSearcher
listener section back was causing the slowdown. When I comment that section
out, commit times go down and the CPU spikes go away. So I tried putting the
newSearcher section back in with no queries to run -- same thing...
times jump up:

INFO: {commit=} 0 16306
Jan 15, 2009 11:49:32 PM org.apache.solr.core.SolrCore execute
INFO: [listings] webapp=/solr path=/update params={} status=0 QTime=16306

Do you know what would be causing "newSearcher" to create such
delays and CPU spikes? Is there any reason not to disable the
"newSearcher" section?

Thanks,
David


Re: Querying Solr Index for date fields

2009-01-15 Thread Chris Hostetter

: You will have to URL-encode the string correctly and supply the date in the
: format Solr expects. Please check this: http://wiki.apache.org/solr/SolrQuerySyntax

beyond that, you may also need to worry about Lucene query syntax escaping 
... the query parser can see the ":" character and think you are searching 
for "mm:ss" in the field "yyyy-mm-ddThh"

this isn't something people typically need to worry about when range 
searching on dates (because the query parser doesn't treat ":" as special in 
a range query), but if you are looking for exact date matches you'll 
probably want to quote the date value...

date_field:"2009-01-09T00:00:00Z"




-Hoss



Re: Missing high-scoring results in 1.3

2009-01-15 Thread Chris Hostetter

Hmmm ... I'm wondering if the Lucene/Solr version changes are a red 
herring here ... at first blush all of these symptoms sound like invalid 
cache hits... 

: I'm seeing a really weird problem with Solr 1.3. The best match for a
: query will not show up with 10 rows, but will show up if I request more,
: sometimes 200, sometimes it takes 1000 rows.

if two queries that are functionally different (and produce different 
results) are mistakenly considered equivalent, you could see this exact 
behavior ... queryA gets cached, queryB results in a false cache hit and 
doesn't include the highest scoring document that it might if it had been 
executed w/o caches.  Increasing the rows param when re-executing queryB 
still results in a cache hit because of the queryResultWindowSize.  
Stopping/starting Solr "fixes" the problem because the caches are 
empty and queryB is one of the first things tried when the server restarts 
(before anyone has a chance to run queryA).

: Here is the relevant part of solrconfig. Note that we have added a
: JaroWinkler fuzzy search, so the dismax specs have extra decoration.

...can you elaborate on your JaroWinkler customizations?  is it possible 
that the Query objects getting generated have hashCode/equals methods 
that aren't aware of your customizations?
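For illustration, a minimal sketch of the contract a custom Query class has
to honor for Solr's caches to behave (the class name and fields here are
hypothetical, not the actual customization from this thread):

  import org.apache.lucene.search.Query;

  // Every field that affects matching must participate in equals() and
  // hashCode(), along with the boost, or two different queries can be
  // treated as the same entry by the queryResultCache.
  public class JaroWinklerQuery extends Query {
    private final String field;
    private final String text;
    private final float minSimilarity;

    public JaroWinklerQuery(String field, String text, float minSimilarity) {
      this.field = field;
      this.text = text;
      this.minSimilarity = minSimilarity;
    }

    @Override
    public String toString(String defaultField) {
      return field + ":" + text + "~jw" + minSimilarity + "^" + getBoost();
    }

    @Override
    public boolean equals(Object o) {
      if (!(o instanceof JaroWinklerQuery)) return false;
      JaroWinklerQuery q = (JaroWinklerQuery) o;
      return getBoost() == q.getBoost()
          && field.equals(q.field)
          && text.equals(q.text)
          && minSimilarity == q.minSimilarity;
    }

    @Override
    public int hashCode() {
      int h = Float.floatToIntBits(getBoost());
      h = 31 * h + field.hashCode();
      h = 31 * h + text.hashCode();
      h = 31 * h + Float.floatToIntBits(minSimilarity);
      return h;
    }
  }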


-Hoss



Re: Missing high-scoring results in 1.3

2009-01-15 Thread Yonik Seeley
On Thu, Jan 15, 2009 at 8:01 PM, Chris Hostetter
 wrote:
> : Here is the relevant part of solrconfig. Note that we have added a
> : JaroWinkler fuzzy search, so the dismax specs have extra decoration.
>
> ...can you elaborate on your JaroWinkler customizations?  is it possible
> that the Query objects getting generated have hashCode/equals methods
> that aren't aware of your customizations?

Nice catch - that could definitely cause this type of problem.
I just went back and re-examined the recent MultiPhraseQuery
equals/hashcode fix and verified that the bug could only result in a
cache miss and not a false hit.

-Yonik


Solrj + hl.usePhraseHighlighter

2009-01-15 Thread Sachit

Hi all,

SolrQuery provides all methods related to highlighting with the Solrj client,
such as setHighlight(), setHighlightFragSize(), etc.
But I didn't find any way to set hl.usePhraseHighlighter = true on the
query.  I can set the same in my solrconfig.xml, but my aim is to set this
attribute to "true" only in the case of an "exact match" search.
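One possible workaround, sketched under the assumption that there really is
no dedicated setter: SolrQuery extends ModifiableSolrParams, so a raw
parameter can still be set directly (the searchTerm value is illustrative):

  import org.apache.solr.client.solrj.SolrQuery;

  String searchTerm = "some exact phrase";  // illustrative
  SolrQuery query = new SolrQuery("wildcard:" + searchTerm);
  query.setHighlight(true);
  // no dedicated setter for this param, so set it as a raw parameter
  query.set("hl.usePhraseHighlighter", true);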

One more question I need to raise is about "highlighting the wildcard".
I went through the mailing list and found out that we can do this by
prefixing the "*" with a "?", but it is not working for me.
In fact, the wildcard "?" instead of "*" itself does not give me any results.

My relevant schema.xml:

   [the field definitions were stripped by the list archive; only a
    copyField destination "all" survives]

Note: I have other fields too, but am specifying only the relevant ones. I am
copyfielding every field except "wildcard" and "highlightFields". While
creating the query, I'm doing: query.setQuery("wildcard: " + searchTerm);.
Also, I'm using the default requestHandler (standard).



My relevant solrconfig.xml:

  <requestHandler name="standard" class="solr.SearchHandler" default="true">
    <lst name="defaults">
      <str name="echoParams">explicit</str>
      <int name="rows">10</int>
      <str name="fl">*</str>
      <str name="version">2.1</str>
      <str name="hl.fl">highlightFields</str>
      <int name="hl.fragsize">300</int>
    </lst>
  </requestHandler>


Please let me know where I'm going wrong. 

Thanks 
Sachit

-- 
View this message in context: 
http://www.nabble.com/Solrj-%2B-hl.usePhraseHighlighter-tp21492830p21492830.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Solrj + hl.usePhraseHighlighter

2009-01-15 Thread Sachit

Sorry I forgot to detail out the field type "wild_card" definition.

   [the wild_card field type definition was stripped by the list archive]


-- 
View this message in context: 
http://www.nabble.com/Solrj-%2B-hl.usePhraseHighlighter-tp21492830p21492981.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Having no luck with built-in replication and multicore

2009-01-15 Thread Shalin Shekhar Mangar
Hi Jacob,

You don't need to call snapshoot on the master. That is only used to create
a backup of the index files.

You are calling snappull on the master. It is only applicable for the
slaves. You don't need to issue these calls yourself at all. The
ReplicationHandler is designed to take care of these.

The master is showing indexversion as 0 because you haven't called commit on
the master yet. Can you call commit and see if replication happens on the
slave?
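For example, a standard way to issue that commit from the command line (the
URL is the core from the messages below):

  curl 'http://mydomain.com:8080/solr/065f079c24914a4103e2a57178164bbe/update' \
       -H 'Content-type:text/xml' --data-binary '<commit/>'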

On Fri, Jan 16, 2009 at 2:24 AM, Jacob Singh  wrote:

> Hi Shalin,
>
> Thanks for responding!  This used to be a 1.3 index (could that be the
> issue?)
>
> curl 'http://mydomain.com:8080/solr/065f079c24914a4103e2a57178164bbe//replication?command=indexversion'
> 
> <response>
>   <lst name="responseHeader"><int name="status">0</int><int name="QTime">0</int></lst>
>   <long name="indexversion">0</long><long name="generation">0</long>
> </response>
>
> Best,
> Jacob
>
>
> On Jan 15, 2009 3:32pm, Shalin Shekhar Mangar 
> wrote:
> > What is the output of /replication?command=indexversion on the master?
> >
> >
> >
> > On Fri, Jan 16, 2009 at 1:27 AM, Jacob Singh <jacobsi...@gmail.com>
> > wrote:
> >
> >
> >
> > > Hi folks,
> > >
> > > Here's what I've got going:
> > >
> > > Master Server with the following config:
> > >
> > >   <lst name="master">
> > >     <str name="replicateAfter">commit</str>
> > >     <str name="confFiles">schema.xml,stopwords.txt,elevate.xml</str>
> > >   </lst>
> > >
> > > Slave server with the following:
> > >
> > >   <lst name="slave">
> > >     <str name="masterUrl">http://mydomain:8080/solr/065f079c24914a4103e2a57178164bbe/replication</str>
> > >     <str name="pollInterval">00:00:20</str>
> > >   </lst>
> > >
> > > I think there is a bug in the JSP for the admin pages (which I can
> > > post a patch if desired) where the replication link goes to
> > > replication/ and index.jsp doesn't get loaded automatically (at least
> > > on my box).  I managed to get to the dashboard by adding index.jsp,
> > > and it seems that while the slave is polling constantly, it never
> > > receives an update.
> > >
> > > I tried the following:
> > >
> > > curl 'http://mydomain.com:8080/solr/065f079c24914a4103e2a57178164bbe/replication?command=snapshoot'
> >
> > >
> > > <response>
> > >   <lst name="responseHeader"><int name="status">0</int><int name="QTime">1</int></lst>
> > >   <str name="exception">java.lang.NullPointerException:java.lang.NullPointerException</str>
> > > </response>
> > >
> >
> > > The index has about 400 docs in it, and old style replication used to
> > > work just fine on it.
> > >
> > > When I run the snappull command from the slave:
> > >
> > > curl 'http://mydomain.com:8080/solr/065f079c24914a4103e2a57178164bbe/replication?command=snappull'
> > >
> > > <response>
> > >   <lst name="responseHeader"><int name="status">0</int><int name="QTime">1</int></lst>
> > >   <str name="status">OK</str>
> > > </response>
> > >
> > > The replication page also remains unchanged and there are no docs on the
> > > slave.
> > >
> > > Any ideas?
> > >
> > > Thanks,
> > > Jacob
> > >
> > > --
> > > +1 510 277-0891 (o)
> > > +91  33 7458 (m)
> > >
> > > web: http://pajamadesign.com
> > >
> > > Skype: pajamadesign
> > > Yahoo: jacobsingh
> > > AIM: jacobsingh
> > > gTalk: jacobsi...@gmail.com
> >
> > --
> > Regards,
> > Shalin Shekhar Mangar.
> >
>



-- 
Regards,
Shalin Shekhar Mangar.


Re: CoreAdmin for replication STATUS

2009-01-15 Thread Akshay
On Fri, Jan 16, 2009 at 4:57 AM, Jacob Singh  wrote:

> Hi,
>
> How do I find out the status of a slave's index?  I have the following
> scenario:
>
> 1. Boot up the slave.  I give it a core name of boot-$CoreName.
>
> 2. I call boot-$CoreName/replication?command=snappull
>
> 3. I check back every minute using cron and I want to see if the slave
> has actually gotten the data.
>
> 4. When it gets the data I call
> solr/admin/cores?action=RENAME&core=boot-$CoreName&other=$CoreName.
>
> I do this because the balancer will start hitting the slave before it
> has the full index otherwise.  Step 3 is the problem.  I don't have a
> reliable way to know it has finished replication AFAIK.  I see in
> ?action=STATUS for the CoreAdmin there is a field called "current".
> Is this useful for this?  If not, what is recommended?  I could hit
> the admin/replication/index.jsp URL and screen-scrape the HTML, but I
> imagine there is a better way.


From the slave you can issue an HTTP command:

  boot-$CoreName/replication?command=details

This returns XML containing a node "isReplicating" with a boolean value,
which will tell you whether replication is in progress or has completed.
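A rough sketch of the kind of cron-side check this enables (host and core
names are illustrative, and the grep string assumes the response node
described above):

  curl -s 'http://slavehost:8080/solr/boot-myCore/replication?command=details' \
    | grep '<str name="isReplicating">'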

Thanks,
> Jacob
>
> --
>
> +1 510 277-0891 (o)
> +91  33 7458 (m)
>
> web: http://pajamadesign.com
>
> Skype: pajamadesign
> Yahoo: jacobsingh
> AIM: jacobsingh
> gTalk: jacobsi...@gmail.com
>



-- 
Regards,
Akshay Ukey.