Documentation for Master-Slave Replication missing for Solr1.3. No mirror site for Solr 1.4 distribution.
Hi, I want to know how to set up a master-slave configuration for Solr 1.3. I can't find documentation on the net; I found some for 1.4 but not for 1.3, and ReplicationHandler is not present in 1.3. Also, I would like to know from where I will get the Solr 1.4 distribution. The Solr site lists mirrors only for the 1.3 dist. Regards, Ninad.
Re: Documentation for Master-Slave Replication missing for Solr1.3. No mirror site for Solr 1.4 distribution.
1.4 is not released yet. you can grab a nightly from here http://people.apache.org/builds/lucene/solr/nightly/ On Fri, Aug 7, 2009 at 12:47 PM, Ninad Raut wrote: > Hi, > I want to know how to setup master-slave configuration for Solr 1.3 . I > can't get documentation on the net. I found one for 1.4 but not for 1.3 . > ReplicationHandler is not present in 1.3. > Also, I would like to know from will I get the Solr 14. distribution. The > Solr Site lists mirrors only for 1.3 dist. > Regards, > Ninad. > -- - Noble Paul | Principal Engineer| AOL | http://aol.com
Re: Documentation for Master-Slave Replication missing for Solr1.3. No mirror site for Solr 1.4 distribution.
On Fri, Aug 7, 2009 at 12:47 PM, Ninad Raut wrote: > Hi, > I want to know how to setup master-slave configuration for Solr 1.3 . I > can't get documentation on the net. I found one for 1.4 but not for 1.3 . > ReplicationHandler is not present in 1.3. > Also, I would like to know from will I get the Solr 14. distribution. The > Solr Site lists mirrors only for 1.3 dist. > Regards, > > Most documentation on the 1.3 script based replication is on the wiki at: http://wiki.apache.org/solr/CollectionDistribution http://wiki.apache.org/solr/SolrCollectionDistributionScripts http://wiki.apache.org/solr/SolrCollectionDistributionStatusStats http://wiki.apache.org/solr/SolrCollectionDistributionOperationsOutline -- Regards, Shalin Shekhar Mangar.
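For reference, the 1.3 script-based replication described on those wiki pages is driven by shell scripts shipped in the distribution's bin/ directory, typically wired into cron on the slaves. A minimal outline (paths are placeholders and the scripts take host/port/data-dir arguments; see the wiki pages above for the exact flags):

```
# master: take a snapshot of the index after each commit
# (often configured as a postCommit event in solrconfig.xml)
solr/bin/snapshooter

# slave (from cron): pull the latest snapshot via rsync,
# then install it and trigger a new searcher
solr/bin/snappuller
solr/bin/snapinstaller
```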
Re: Documentation for Master-Slave Replication missing for Solr1.3. No mirror site for Solr 1.4 distribution.
Hi Noble, can these builds be used in a production environment? Are they stable? We are not going live now, but we will be in a few months. As such, when will 1.4 be officially released? 2009/8/7 Noble Paul നോബിള് नोब्ळ् > 1.4 is not released yet. you can grab a nightly from here > http://people.apache.org/builds/lucene/solr/nightly/ > > On Fri, Aug 7, 2009 at 12:47 PM, Ninad Raut > wrote: > > Hi, > > I want to know how to setup master-slave configuration for Solr 1.3 . I > > can't get documentation on the net. I found one for 1.4 but not for 1.3 . > > ReplicationHandler is not present in 1.3. > > Also, I would like to know from will I get the Solr 14. distribution. The > > Solr Site lists mirrors only for 1.3 dist. > > Regards, > > Ninad. > > > > > > -- > - > Noble Paul | Principal Engineer| AOL | http://aol.com >
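For context, the 1.4 ReplicationHandler being discussed here is configured entirely in solrconfig.xml. A minimal sketch (host name, port, and poll interval are placeholder values):

```xml
<!-- master solrconfig.xml -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <!-- take a replicable snapshot after every commit -->
    <str name="replicateAfter">commit</str>
    <!-- config files to ship alongside the index -->
    <str name="confFiles">schema.xml,stopwords.txt</str>
  </lst>
</requestHandler>

<!-- slave solrconfig.xml -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://master-host:8983/solr/replication</str>
    <!-- hh:mm:ss between polls of the master -->
    <str name="pollInterval">00:00:60</str>
  </lst>
</requestHandler>
```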
CorruptIndexException: Unknown format version
Hi, how can that happen? It is a new index, and it is already corrupt. Did anybody else see something like this?

WARN - 2009-08-07 10:44:54,925 | Solr index directory 'data/solr/index' doesn't exist. Creating new index...
WARN - 2009-08-07 10:44:56,583 | solrconfig.xml uses deprecated , Please update your config to use the ShowFileRequestHandler.
WARN - 2009-08-07 10:44:56,586 | adding ShowFileRequestHandler with hidden files: [XSLT]
ERROR - 2009-08-07 10:44:58,758 | java.lang.RuntimeException: org.apache.lucene.index.CorruptIndexException: Unknown format version: -7
 at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:433)
 at org.apache.solr.core.SolrCore.(SolrCore.java:216)
 at org.apache.solr.core.SolrCore.getSolrCore(SolrCore.java:177)
 at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:69)
 at org.mortbay.jetty.servlet.FilterHolder.doStart(FilterHolder.java:99)

Best regards -- Maximilian Hütter, blue elephant systems GmbH, Wollgrasweg 49, D-70599 Stuttgart. Tel: (+49) 0711 - 45 10 17 578, Fax: (+49) 0711 - 45 10 17 573, e-mail: max.huet...@blue-elephant-systems.com. Sitz: Stuttgart, Amtsgericht Stuttgart, HRB 24106. Geschäftsführer: Joachim Hörnle, Thomas Gentsch, Holger Dietrich
Re: mergeFactor / indexing speed
Woohoo, great news, guys. I merged my child entity into the root entity and changed the custom entity processor to handle the additional columns correctly. And indexing 160k documents now takes 5min instead of 1.5h! (Now I can go on vacation relaxed. :-D ) Conclusion: in my case performance was so bad because of constantly querying a database on a different machine (network traffic + a db query per document). Thanks for all your help! Chantal Avlesh Singh wrote: does DIH call commit periodically, or are things done in one big batch? AFAIK, one big batch. Yes. There is no index available once the full-import has started (unless the searcher has a cache, in which case it still reads from that). There is no data visible (i.e. in the Admin/Luke frontend) until the import has finished correctly.
Re: Language Detection for Analysis?
Otis Gospodnetic wrote: Bradford, If I may: Have a look at http://www.sematext.com/products/language-identifier/index.html And/or http://www.sematext.com/products/multilingual-indexer/index.html .. and a Nutch plugin with similar functionality: http://lucene.apache.org/nutch/apidocs-1.0/org/apache/nutch/analysis/lang/LanguageIdentifier.html -- Best regards, Andrzej Bialecki | Information Retrieval, Semantic Web | Embedded Unix, System Integration | http://www.sigram.com | Contact: info at sigram dot com
Re: Language Detection for Analysis?
Hi, On Fri, Aug 7, 2009 at 12:31 PM, Andrzej Bialecki wrote: > .. and a Nutch plugin with similar functionality: > > http://lucene.apache.org/nutch/apidocs-1.0/org/apache/nutch/analysis/lang/LanguageIdentifier.html See also TIKA-209 [1] where I'm currently integrating the Nutch code to work with Tika. Tika 0.5 will have built-in language detection based on this. [1] https://issues.apache.org/jira/browse/TIKA-209 BR, Jukka Zitting
Help creating schema for indexable document
Hi guys. I am struggling to create a schema with a deterministic content model for a set of documents I want to index. My indexable documents will look something like: 1 code1 code2 mycategory My service will be mission critical and will accept batch imports from a potentially unreliable source. Are there any XML schema gurus who can help me with creating an XSD which will work with my sample document? Thanks in advance for your help, -- Ross -- View this message in context: http://www.nabble.com/Help-creating-schema-for-indexable-document-tp24862700p24862700.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: mergeFactor / indexing speed
On Fri, Aug 7, 2009 at 3:58 PM, Chantal Ackermann < chantal.ackerm...@btelligent.de> wrote: > Juhu, great news, guys. I merged my child entity into the root entity, and > changed the custom entityprocessor to handle the additional columns > correctly. > And - indexing 160k documents now takes 5min instead of 1.5h! > I'm a little late to the party but you may also want to look at CachedSqlEntityProcessor. -- Regards, Shalin Shekhar Mangar.
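For reference, the CachedSqlEntityProcessor Shalin mentions caches the child entity's rows in memory instead of issuing one SQL query per parent row, which addresses exactly the per-document network round trip diagnosed above. A data-config.xml sketch (entity, table, and column names are made up):

```xml
<entity name="item" query="select id, name from item">
  <!-- the child query runs once and is cached;
       rows are joined in memory on item_id = item.id -->
  <entity name="feature" processor="CachedSqlEntityProcessor"
          query="select item_id, description from feature"
          where="item_id=item.id"/>
</entity>
```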
Re: mergeFactor / indexing speed
Thanks for the tip, Shalin. I'm happy with 6 indexes running in parallel and completing in less than 10min right now, but I'll have a look anyway. Shalin Shekhar Mangar wrote: On Fri, Aug 7, 2009 at 3:58 PM, Chantal Ackermann < chantal.ackerm...@btelligent.de> wrote: Juhu, great news, guys. I merged my child entity into the root entity, and changed the custom entityprocessor to handle the additional columns correctly. And - indexing 160k documents now takes 5min instead of 1.5h! I'm a little late to the party but you may also want to look at CachedSqlEntityProcessor. -- Regards, Shalin Shekhar Mangar.
Solr 1.4 in Production Environment-- Is it stable?
Hi, Has anyone used Solr 1.4 in production? There are some really nice features in it like - Directly adding POJOs to Solr - ReplicationHandler etc. Is 1.4 stable enough to be used in production?
Re: solr v1.4 in production?
On Wed, Jul 1, 2009 at 6:17 PM, Ed Summers wrote: > Here at the Library of Congress we've got several production Solr > instances running v1.3. We've been itching to get at what will be v1.4 > and were wondering if anyone else happens to be using it in production > yet. Any information you can provide would be most welcome. > > We're using Solr 1.4 built from r793546 in production along with the new java based replication. -- Regards, Shalin Shekhar Mangar.
Re: Solr 1.4 in Production Environment-- Is it stable?
I know a number of large companies using 1.4-dev. But you could also wait another month or so and get the real 1.4. Otis -- Sematext is hiring -- http://sematext.com/about/jobs.html?mls Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR - Original Message > From: Ninad Raut > To: solr-user@lucene.apache.org > Sent: Friday, August 7, 2009 7:32:17 AM > Subject: Solr 1.4 in Production Environment-- Is it stable? > > Hi, > Has anyone used Solr 1.4 in production? There are some really nice features > in it like > >- Directly adding POJOs to Solr >- ReplicationHandler etc. > > Is 1.4 stable enought to be used in production?
Re: Language Detection for Analysis?
There are several free Language Detection libraries out there, as well as a few commercial ones. I think Karl Wettin has even written one as a plugin for Lucene. Nutch also has one, AIUI. I would just Google "language detection". Also see http://www.lucidimagination.com/search/?q=language+detection, as this has been brought up many times before and I'm sure there are links in the archives. On Aug 6, 2009, at 3:46 PM, Bradford Stephens wrote: Hey there, We're trying to add foreign language support into our new search engine -- languages like Arabic, Farsi, and Urdu (that don't work with standard analyzers). But our data source doesn't tell us which languages we're actually collecting -- we just get blocks of text. Has anyone here worked on language detection so we can figure out what analyzers to use? Are there commercial solutions? Much appreciated! -- http://www.roadtofailure.com -- The Fringes of Scalability, Social Media, and Computer Science - To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org
Re: Item Facet
Thanks Avlesh, but I didn't get it. How would a dynamic field aggregate values at query time? On Thu, Aug 6, 2009 at 11:14 PM, Avlesh Singh wrote: > Dynamic fields might be an answer. If you had a field called "product_*" and > these were populated with the corresponding values during indexing then > faceting on these fields will give you the desired behavior. > > The only catch here is that the product names have to be known upfront. A > wildcard support for field names in facet.fl is still to come in Solr. > Here's the issue - https://issues.apache.org/jira/browse/SOLR-247 > > Cheers > Avlesh > > On Fri, Aug 7, 2009 at 3:33 AM, David Lojudice Sobrinho > wrote: > >> I can't reindex because the aggregated/grouped result should change as >> the query changes... in other words, the result must by dynamic >> >> We've been thinking about a new handler for it something like: >> >> >> /select?q=laptop&rows=0&itemfacet=on&itemfacet.field=product_name,min(price),max(price) >> >> Does it make sense? Something easier ready to use? >> >> >> On Thu, Aug 6, 2009 at 6:05 PM, Ge, Yao (Y.) wrote: >> > If you can reindex, simply rebuild the index with fields replaced by >> > combining existing fields. >> > -Yao >> > >> > -Original Message- >> > From: David Lojudice Sobrinho [mailto:dalss...@gmail.com] >> > Sent: Thursday, August 06, 2009 4:17 PM >> > To: solr-user@lucene.apache.org >> > Subject: Item Facet >> > >> > Hi... >> > >> > Is there any way to group values like shopping.yahoo.com or >> > shopper.cnet.com do? >> > >> > For instance, I have documents like: >> > >> > doc1 - product_name1 - value1 >> > doc2 - product_name1 - value2 >> > doc3 - product_name1 - value3 >> > doc4 - product_name2 - value4 >> > doc5 - product_name2 - value5 >> > doc6 - product_name2 - value6 >> > >> > I'd like to have a result grouping by product name with the value >> > range per product.
Something like: >> > >> > product_name1 - (value1 to value3) >> > product_name2 - (value4 to value6) >> > >> > It is not like the current facet because the information is grouped by >> > item, not the entire result. >> > >> > Any idea? >> > >> > Thanks! >> > >> > David Lojudice Sobrinho >> > >> >> >> >> -- >> __ >> >> David L. S. >> dalss...@gmail.com >> __ >> > -- __ David L. S. dalss...@gmail.com __
Re: Solr 1.4 in Production Environment-- Is it stable?
We also use 1.4, which has been hit with load tests of up to 2000 queries/sec. The biggest thing is to make sure you are using the slaves for that kind of load. Other than that, 1.4 is pretty impressive. -- Jeff Newburn Software Engineer, Zappos.com jnewb...@zappos.com - 702-943-7562 > From: Otis Gospodnetic > Reply-To: > Date: Fri, 7 Aug 2009 05:26:06 -0700 (PDT) > To: > Subject: Re: Solr 1.4 in Production Environment-- Is it stable? > > I know a number of large companies using 1.4-dev. But you could also wait > another month or so and get the real 1.4. > > Otis > -- > Sematext is hiring -- http://sematext.com/about/jobs.html?mls > Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR > > > > - Original Message >> From: Ninad Raut >> To: solr-user@lucene.apache.org >> Sent: Friday, August 7, 2009 7:32:17 AM >> Subject: Solr 1.4 in Production Environment-- Is it stable? >> >> Hi, >> Has anyone used Solr 1.4 in production? There are some really nice features >> in it like >> >>- Directly adding POJOs to Solr >>- ReplicationHandler etc. >> >> Is 1.4 stable enought to be used in production? >
Re: Item Facet
Are your product_name* fields numeric fields (integer or float)? Dals wrote: > > Hi... > > Is there any way to group values like shopping.yahoo.com or > shopper.cnet.com do? > > For instance, I have documents like: > > doc1 - product_name1 - value1 > doc2 - product_name1 - value2 > doc3 - product_name1 - value3 > doc4 - product_name2 - value4 > doc5 - product_name2 - value5 > doc6 - product_name2 - value6 > > I'd like to have a result grouping by product name with the value > range per product. Something like: > > product_name1 - (value1 to value3) > product_name2 - (value4 to value6) > > It is not like the current facet because the information is grouped by > item, not the entire result. > > Any idea? > > Thanks! > > David Lojudice Sobrinho > > -- View this message in context: http://www.nabble.com/Item-Facet-tp24853669p24865535.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: CorruptIndexException: Unknown format version
Wow, that is an interesting one... I bet there is more than one Lucene version kicking around the classpath somehow. Try removing all of the servlet container's working directories. -Yonik http://www.lucidimagination.com On Fri, Aug 7, 2009 at 4:41 AM, Maximilian Hütter wrote: > Hi, > > how can that happen, it is a new index, and it is already corrupt? > > Did anybody else something like this? > > WARN - 2009-08-07 10:44:54,925 | Solr index directory 'data/solr/index' > doesn't exist. Creating new index... > WARN - 2009-08-07 10:44:56,583 | solrconfig.xml uses deprecated > , Please update your config to use the > ShowFileRequestHandler. > WARN - 2009-08-07 10:44:56,586 | adding ShowFileRequestHandler with > hidden files: [XSLT] > ERROR - 2009-08-07 10:44:58,758 | java.lang.RuntimeException: > org.apache.lucene.index.CorruptIndexException: Unknown format version: -7 > at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:433) > at org.apache.solr.core.SolrCore.(SolrCore.java:216) > at org.apache.solr.core.SolrCore.getSolrCore(SolrCore.java:177) > at > org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:69) > at > org.mortbay.jetty.servlet.FilterHolder.doStart(FilterHolder.java:99) > > > Best regards > > > -- > Maximilian Hütter > blue elephant systems GmbH > Wollgrasweg 49 > D-70599 Stuttgart > > Tel : (+49) 0711 - 45 10 17 578 > Fax : (+49) 0711 - 45 10 17 573 > e-mail : max.huet...@blue-elephant-systems.com > Sitz : Stuttgart, Amtsgericht Stuttgart, HRB 24106 > Geschäftsführer: Joachim Hörnle, Thomas Gentsch, Holger Dietrich >
Re: Item Facet
The behavior I'm expecting is something similar to a GROUP BY in a relational database:

SELECT product_name, model, min(price), max(price), count(*) FROM t GROUP BY product_name, model

The current schema: product_name (type: text), model (type: text), price (type: sfloat). On Fri, Aug 7, 2009 at 11:07 AM, Yao Ge wrote: > > Are your product_name* fields numeric fields (integer or float)? > > > Dals wrote: >> >> Hi... >> >> Is there any way to group values like shopping.yahoo.com or >> shopper.cnet.com do? >> >> For instance, I have documents like: >> >> doc1 - product_name1 - value1 >> doc2 - product_name1 - value2 >> doc3 - product_name1 - value3 >> doc4 - product_name2 - value4 >> doc5 - product_name2 - value5 >> doc6 - product_name2 - value6 >> >> I'd like to have a result grouping by product name with the value >> range per product. Something like: >> >> product_name1 - (value1 to value3) >> product_name2 - (value4 to value6) >> >> It is not like the current facet because the information is grouped by >> item, not the entire result. >> >> Any idea? >> >> Thanks! >> >> David Lojudice Sobrinho >> >> > > -- > View this message in context: > http://www.nabble.com/Item-Facet-tp24853669p24865535.html > Sent from the Solr - User mailing list archive at Nabble.com. > > -- __ David L. S. dalss...@gmail.com __
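Solr 1.4's StatsComponent gets part of the way to this GROUP BY: stats.field computes min/max/count for a numeric field, and stats.facet breaks those numbers down per facet value. A request sketch against the schema above:

```
/select?q=*:*&rows=0&stats=true&stats.field=price&stats.facet=product_name
```

One caveat: faceting on a tokenized "text" field produces per-token buckets, so this would normally need product_name copied into an untokenized string field first.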
Is kill -9 safe or not?
I've seen several threads that are one or two years old saying that performing "kill -9" on the java process running Solr either CAN, or CAN NOT corrupt your index. The more recent ones seem to say that it CAN NOT, but before I bake a kill -9 into my control script (which first tries a normal "kill", of course), I'd like to hear the answer straight from the horse's mouth... I'm using Solr 1.4 nightly from about a month ago. Can I kill -9 without fear of having to rebuild my index? Thanks! Michael
Re: Preserving "C++" and other weird tokens
On Thu, Aug 6, 2009 at 11:38 AM, Michael _ wrote: > Hi everyone, > I'm indexing several documents that contain words that the > StandardTokenizer cannot detect as tokens. These are words like > C# > .NET > C++ > which are important for users to be able to search for, but get treated as > "C", "NET", and "C". > > How can I create a list of words that should be understood to be > indivisible tokens? Is my only option somehow stringing together a lot of > PatternTokenizers? I'd love to do something like <tokenizer class="StandardTokenizer" tokenwhitelist=".NET C++ C#" />. > > Thanks in advance! > By the way, in case it wasn't clear: I'm not particularly tied to using the StandardTokenizer. Any tokenizer would be fine, if it did a reasonable job of splitting up the input text while preserving special cases. I'm also not averse to passing in a list of regexes, if I had to, but I'm suspicious that that would be redoing a lot of the work done by the parser inside the Tokenizer. Thanks, Michael
Re: Is kill -9 safe or not?
Kill -9 will not corrupt your index, but you would lose any uncommitted documents. -Yonik http://www.lucidimagination.com On Fri, Aug 7, 2009 at 11:07 AM, Michael _ wrote: > I've seen several threads that are one or two years old saying that > performing "kill -9" on the java process running Solr either CAN, or CAN NOT > corrupt your index. The more recent ones seem to say that it CAN NOT, but > before I bake a kill -9 into my control script (which first tries a normal > "kill", of course), I'd like to hear the answer straight from the horse's > mouth... > I'm using Solr 1.4 nightly from about a month ago. Can I kill -9 without > fear of having to rebuild my index? > > Thanks! > Michael >
Re: Preserving "C++" and other weird tokens
http://search.lucidimagination.com/search/document/2d325f6178afc00a/how_to_search_for_c -Yonik http://www.lucidimagination.com On Thu, Aug 6, 2009 at 11:38 AM, Michael _ wrote: > Hi everyone, > I'm indexing several documents that contain words that the StandardTokenizer > cannot detect as tokens. These are words like > C# > .NET > C++ > which are important for users to be able to search for, but get treated as > "C", "NET", and "C". > > How can I create a list of words that should be understood to be indivisible > tokens? Is my only option somehow stringing together a lot of > PatternTokenizers? I'd love to do something like class="StandardTokenizer" tokenwhitelist=".NET C++ C#" />. > > Thanks in advance! >
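One workaround often suggested for this problem (a sketch, not the only approach; the synonym mappings are illustrative) is to tokenize on whitespace, so "c++" and "c#" survive as single tokens, and then rewrite them to safe tokens with a synonym filter applied at both index and query time:

```xml
<fieldType name="text_code" class="solr.TextField">
  <analyzer>
    <!-- StandardTokenizer would strip the punctuation before the
         synonym filter ever saw it; whitespace keeps "c++" intact -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- synonyms.txt maps the special cases, e.g.:
         c++ => cplusplus
         c# => csharp
         .net => dotnet -->
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="false"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```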
Re: Attempt to query for max id failing with exception
I just tried this sample code... it worked fine for me on trunk. -Yonik http://www.lucidimagination.com On Thu, Aug 6, 2009 at 8:28 PM, Reuben Firmin wrote: > I'm using SolrJ. When I attempt to set up a query to retrieve the maximum id > in the index, I'm getting an exception. > > My setup code is: > > final SolrQuery params = new SolrQuery(); > params.addSortField("id", ORDER.desc); > params.setRows(1); > params.setQuery(queryString); > > final QueryResponse queryResponse = server.query(params); > > This latter line is blowing up with: > > Not Found > > request: > http://solr.xxx.myserver/select?sort=iddesc&rows=1&q=*:*&wt=javabin&version=2.2 > org.apache.solr.common.SolrException > org.apache.solr.client.solrj.impl.CommonsHttpSolrServer#request(343) > org.apache.solr.client.solrj.impl.CommonsHttpSolrServer#request(183) > org.apache.solr.client.solrj.request.QueryRequest#process(90) > org.apache.solr.client.solrj.SolrServer#query(109) > > There are a couple things to note - > > - there is a space between id and desc which looks suspicious, but > swapping changing wt to XML and leaving the URL otherwise the same causes > solr no grief when queried via a browser > - the index is in fact empty - this particular section of code is bulk > loading our documents, and using the max id query to figure out where to > start from. (I can and will try catching the exception and assuming 0, but > ideally I wouldn't get an exception just from doing the query) > > Am I doing this query in the wrong way? > > Thanks > Reuben >
Re: Is kill -9 safe or not?
Yonik, Uncommitted (as in Solr un"commit"ed) or unflushed? Thanks, Otis - Original Message > From: Yonik Seeley > To: solr-user@lucene.apache.org > Sent: Friday, August 7, 2009 11:10:49 AM > Subject: Re: Is kill -9 safe or not? > > Kill -9 will not corrupt your index, but you would lose any > uncommitted documents. > > -Yonik > http://www.lucidimagination.com > > > On Fri, Aug 7, 2009 at 11:07 AM, Michael _wrote: > > I've seen several threads that are one or two years old saying that > > performing "kill -9" on the java process running Solr either CAN, or CAN NOT > > corrupt your index. The more recent ones seem to say that it CAN NOT, but > > before I bake a kill -9 into my control script (which first tries a normal > > "kill", of course), I'd like to hear the answer straight from the horse's > > mouth... > > I'm using Solr 1.4 nightly from about a month ago. Can I kill -9 without > > fear of having to rebuild my index? > > > > Thanks! > > Michael > >
Re: Attempt to query for max id failing with exception
Yep, thanks - this turned out to be a systems configuration error. Our sysadmin hadn't opened up the http port on the server's internal network interface; I could browse to it from outside (i.e. firefox on my machine), but the apache landing page was being returned when CommonsHttpSolrServer tried to get at it. Reuben On Fri, Aug 7, 2009 at 12:03 PM, Yonik Seeley wrote: > I just tried this sample code... it worked fine for me on trunk. > > -Yonik > http://www.lucidimagination.com > > On Thu, Aug 6, 2009 at 8:28 PM, Reuben Firmin wrote: > > I'm using SolrJ. When I attempt to set up a query to retrieve the maximum > id > > in the index, I'm getting an exception. > > > > My setup code is: > > > > final SolrQuery params = new SolrQuery(); > > params.addSortField("id", ORDER.desc); > > params.setRows(1); > > params.setQuery(queryString); > > > > final QueryResponse queryResponse = server.query(params); > > > > This latter line is blowing up with: > > > > Not Found > > > > request: > http://solr.xxx.myserver/select?sort=iddesc&rows=1&q=*:*&wt=javabin&version=2.2 > > org.apache.solr.common.SolrException > > org.apache.solr.client.solrj.impl.CommonsHttpSolrServer#request(343) > > > org.apache.solr.client.solrj.impl.CommonsHttpSolrServer#request(183) > >org.apache.solr.client.solrj.request.QueryRequest#process(90) > >org.apache.solr.client.solrj.SolrServer#query(109) > > > > There are a couple things to note - > > > > - there is a space between id and desc which looks suspicious, but > > swapping changing wt to XML and leaving the URL otherwise the same > causes > > solr no grief when queried via a browser > > - the index is in fact empty - this particular section of code is bulk > > loading our documents, and using the max id query to figure out where > to > > start from. (I can and will try catching the exception and assuming 0, > but > > ideally I wouldn't get an exception just from doing the query) > > > > Am I doing this query in the wrong way? 
> > > > Thanks > > Reuben > > >
Re: Is kill -9 safe or not?
On Fri, Aug 7, 2009 at 12:04 PM, Otis Gospodnetic wrote: > Yonik, > > Uncommitted (as in solr un"commit"ed) on unflushed? Solr uncommitted. Even if the docs hit the disk via a segment flush, they aren't part of the index until the index descriptor (segments_n) is written pointing to that new segment. -Yonik http://www.lucidimagination.com > Thanks, > Otis > > > - Original Message >> From: Yonik Seeley >> To: solr-user@lucene.apache.org >> Sent: Friday, August 7, 2009 11:10:49 AM >> Subject: Re: Is kill -9 safe or not? >> >> Kill -9 will not corrupt your index, but you would lose any >> uncommitted documents. >> >> -Yonik >> http://www.lucidimagination.com >> >> >> On Fri, Aug 7, 2009 at 11:07 AM, Michael _wrote: >> > I've seen several threads that are one or two years old saying that >> > performing "kill -9" on the java process running Solr either CAN, or CAN >> > NOT >> > corrupt your index. The more recent ones seem to say that it CAN NOT, but >> > before I bake a kill -9 into my control script (which first tries a normal >> > "kill", of course), I'd like to hear the answer straight from the horse's >> > mouth... >> > I'm using Solr 1.4 nightly from about a month ago. Can I kill -9 without >> > fear of having to rebuild my index? >> > >> > Thanks! >> > Michael >> > > >
Solr CMS Integration
I've been asked to suggest a framework for managing a website's content and making all that content searchable. I'm comfortable using Solr for search, but I don't know where to start with the content management system. Is anyone using a CMS (open source or commercial) that you've integrated with Solr for search and are happy with? This will be a consumer facing website with a combination of articles, blogs, white papers, etc. Thanks, Wojtek -- View this message in context: http://www.nabble.com/Solr-CMS-Integration-tp24868462p24868462.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Preserving "C++" and other weird tokens
Ach, sorry I didn't find this before posting! - Michael Yonik Seeley-2 wrote: > > http://search.lucidimagination.com/search/document/2d325f6178afc00a/how_to_search_for_c > > -Yonik > http://www.lucidimagination.com > -- View this message in context: http://www.nabble.com/Preserving-%22C%2B%2B%22-and-other-weird-tokens-tp24848968p24868579.html Sent from the Solr - User mailing list archive at Nabble.com.
Question regarding merging Solr indexes
Hello, I have a MultiCore setup with 3 cores. I am trying to merge the indexes of core1 and core2 into core3. I looked at the wiki but I'm somewhat unclear on what needs to happen. This is what I used: http://localhost:9085/solr/core3/admin/?action=mergeindexes&core=core3&indexDir=/solrHome/core1/data/index&indexDir=/solrHome/core2/data/index&commit=true When I hit this I just go to the admin page for core3. Maybe the way I reference the indexes is incorrect? What path goes there anyway? Thanks -- View this message in context: http://www.nabble.com/Question-regarding-merging-Solr-indexes-tp24868670p24868670.html Sent from the Solr - User mailing list archive at Nabble.com.
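One thing worth checking: the mergeindexes action belongs to the CoreAdmin handler, not the per-core admin page, which would explain landing on the admin page instead of triggering the merge. Using the same host, port, and paths as above, the request would normally look like:

```
http://localhost:9085/solr/admin/cores?action=mergeindexes&core=core3&indexDir=/solrHome/core1/data/index&indexDir=/solrHome/core2/data/index
```

followed by a separate commit on core3 so searchers see the merged data. The source cores should not be receiving writes while the merge runs.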
Re: Solr CMS Integration
wojtekpia wrote: Hi Wojtek, > I've been asked to suggest a framework for managing a website's content and > making all that content searchable. I'm comfortable using Solr for search, > but I don't know where to start with the content management system. Is > anyone using a CMS (open source or commercial) that you've integrated with > Solr for search and are happy with? This will be a consumer facing website > with a combination or articles, blogs, white papers, etc. if you're comfortable with PHP you might want to look at Drupal (http://drupal.org/project/apachesolr) which sounds like a good match for your requirements... Regards, Andre
Re: Solr CMS Integration
lucidimagination.com is powered off of Drupal and we index it using Solr (but not the Drupal plugin, as we have non CMS data as well). It has blogs, articles, white papers, mail archives, JIRA tickets, Wiki's etc. On Aug 7, 2009, at 1:01 PM, wojtekpia wrote: I've been asked to suggest a framework for managing a website's content and making all that content searchable. I'm comfortable using Solr for search, but I don't know where to start with the content management system. Is anyone using a CMS (open source or commercial) that you've integrated with Solr for search and are happy with? This will be a consumer facing website with a combination or articles, blogs, white papers, etc. Thanks, Wojtek -- View this message in context: http://www.nabble.com/Solr-CMS-Integration-tp24868462p24868462.html Sent from the Solr - User mailing list archive at Nabble.com. -- Grant Ingersoll http://www.lucidimagination.com/ Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene: http://www.lucidimagination.com/search
localSolr install
Is there any sort of guide to installing and configuring localSolr into an existing Solr implementation? I'm not extremely versed with Java applications, but I've managed to cobble together Jetty and Solr multicore fairly reliably. I've downloaded localLucene 2.0 and localSolr 6.1, and this is where the guesswork starts. Any help is greatly appreciated.
Re: localSolr install
Hi all, I also need the same information. I am planning to set up Solr. I have around 20 to 30 million records, in CSV format. Your help is highly appreciated. Regards, Bhargava S Akula. 2009/8/7 Brian Klippel > Is there any sort of guide to installing and configuring localSolr into > an existing solr implementation? > > > > I'm not extremely versed with java applications, but I've managed to > cobble together jetty and solr multicore fairly reliably. I've > downloaded localLucine 2.0 and localSolr 6.1, and this is where the > guesswork starts. > > > > Any help is greatly appreciated. > > > >
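On the CSV side of that question: Solr's CSV update handler accepts CSV posted directly over HTTP, so records of that volume can be streamed in batches without writing custom indexing code. A sketch (host, port, and file name are assumptions):

```
curl 'http://localhost:8983/solr/update/csv?commit=true' \
     --data-binary @records.csv \
     -H 'Content-type: text/plain; charset=utf-8'
```

For 20-30 million records it is usually better to split the file, post the chunks with commit=false, and issue a single commit at the end.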
Re: Is kill -9 safe or not?
Thanks for the confirmation and reassurance! - Michael Yonik Seeley-2 wrote: > > On Fri, Aug 7, 2009 at 12:04 PM, Otis > Gospodnetic wrote: >> Yonik, >> >> Uncommitted (as in solr un"commit"ed) on unflushed? > > Solr uncommitted. Even if the docs hit the disk via a segment flush, > they aren't part of the index until the index descriptor (segments_n) > is written pointing to that new segment. > > -Yonik > http://www.lucidimagination.com > >> -- View this message in context: http://www.nabble.com/Is-kill--9-safe-or-not--tp24866506p24869260.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr CMS Integration
I would second that and add that you may want to consider acquia.com, as they provide a solid infrastructure to support the Solr instance. On Fri, Aug 7, 2009 at 11:20 AM, Andre Hagenbruch wrote: > -BEGIN PGP SIGNED MESSAGE- > Hash: SHA1 > > wojtekpia wrote: > > Hi Wojtek, > > > I've been asked to suggest a framework for managing a website's content > and > > making all that content searchable. I'm comfortable using Solr for > search, > > but I don't know where to start with the content management system. Is > > anyone using a CMS (open source or commercial) that you've integrated > with > > Solr for search and are happy with? This will be a consumer facing > website > > with a combination of articles, blogs, white papers, etc. > > if you're comfortable with PHP you might want to look at Drupal > (http://drupal.org/project/apachesolr) which sounds like a good match > for your requirements... > > Regards, > > Andre > -BEGIN PGP SIGNATURE- > Version: GnuPG v1.4.9 (Darwin) > Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ > > iEYEARECAAYFAkp8YlQACgkQ3wuzs9k1icVFSACgjRy7AOd+Aney7LDmpWTaIssz > p74AnAn+/5So+qSfpfbXOXShCYZfAppS > =zqHU > -END PGP SIGNATURE- > -- Contact me: 801.850.2953 (cell or sms) facebook: http://www.facebook.com/profile.php?id=534661678 LinkedIn: http://www.linkedin.com/profile?viewProfile=&key=3902213 website:scanalytix.com
Re: Solr CMS Integration
Thanks for the responses. I'll give Drupal a shot. It sounds like it'll do the trick, and if it doesn't then at least I'll know what I'm looking for. Wojtek -- View this message in context: http://www.nabble.com/Solr-CMS-Integration-tp24868462p24870218.html Sent from the Solr - User mailing list archive at Nabble.com.
PhoneticFilterFactory related questions
Hi, I have a schema with three (relevant to this question) fields: title, author, book_content. I found that if PhoneticFilterFactory is used as a filter on book_content, it was bringing back all kinds of unrelated results, so I have it applied only against title and author. Questions -- 1) I have the filter set up on both the index and query analyzers for the fieldType of title/author. When running against an index which had been built without the phonetic filter, phonetic searches still worked. Is there a performance benefit to applying the phonetic filter to the index analyzer as well as the query analyzer, are there other benefits to doing so, or should I not bother? (I.e. should I just apply the filter to the query analyzer?) 2) Title / author matches are generally boosted, which is fine if it's an exact match (i.e. "Shakespeare In Love" or "by William Shakespeare" are more relevant than a book which mentions Shakespeare). However, the phonetic filter put a bit of a spanner in the works - now if I search for "bottling", books with the word "b*a*ttling" in the title show up above books with the non-substituted word in the content. How can I juggle the boosting / field setup to be something like: a) Title/author matches (with exactly matched spelling - stemming etc is fine) b) Content matches (with exactly matched spelling) c) Title/author matches (with phoneme equivalent spelling) Do I need to create separate non-phonetic title/author fields for this, or is there a different way to achieve the same effect? Thanks Reuben
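One common way to get the ranking order (a) > (b) > (c) described above is exactly the separate-field approach: keep title/author as plain text fields, copyField them into phonetic variants, and give the phonetic variants a much lower boost. A schema.xml sketch follows; the type and field names (`text_phonetic`, `title_phon`, etc.) are my own inventions, not anything from the poster's schema.

```xml
<!-- Phonetic variant type: inject="false" indexes only the phonetic
     codes, so this field matches sound-alikes but not exact spellings. -->
<fieldType name="text_phonetic" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PhoneticFilterFactory" encoder="DoubleMetaphone" inject="false"/>
  </analyzer>
</fieldType>

<field name="title"       type="text"          indexed="true" stored="true"/>
<field name="author"      type="text"          indexed="true" stored="true"/>
<field name="title_phon"  type="text_phonetic" indexed="true" stored="false"/>
<field name="author_phon" type="text_phonetic" indexed="true" stored="false"/>

<copyField source="title"  dest="title_phon"/>
<copyField source="author" dest="author_phon"/>
```

With dismax, the boosts then express the desired ordering directly, e.g. `qf=title^10 author^10 book_content^2 title_phon^0.5 author_phon^0.5`: exact title/author matches outrank content matches, and phonetic-only title matches (the "battling"/"bottling" case) land below content matches.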
Solr Security
Has anyone had any experience setting up Solr security? http://wiki.apache.org/solr/SolrSecurity I would like to implement HTTP authentication or path-based authentication. So, in webdefault.xml I set the following: Solr authenticated application /core1/* core1-role BASIC Test Realm What should I put in "url-pattern" and "web-resource-name"? Then I set up realm.properties like this: guest: guest, core1-role Francis
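The XML tags in the snippet above were stripped by the mailing list; from the surviving values, a reconstruction of what such a BASIC-auth constraint usually looks like in Jetty's webdefault.xml (standard servlet security-constraint syntax) would be:

```xml
<security-constraint>
  <web-resource-collection>
    <!-- web-resource-name is just a descriptive label;
         url-pattern (relative to the webapp context) is what gets protected -->
    <web-resource-name>Solr authenticated application</web-resource-name>
    <url-pattern>/core1/*</url-pattern>
  </web-resource-collection>
  <auth-constraint>
    <role-name>core1-role</role-name>
  </auth-constraint>
</security-constraint>

<login-config>
  <auth-method>BASIC</auth-method>
  <realm-name>Test Realm</realm-name>
</login-config>
```

So, to answer the question as posed: `url-pattern` is the path to protect relative to the Solr webapp's context (here `/core1/*` covers everything under that core), and `web-resource-name` is an arbitrary label with no runtime effect. The `guest: guest, core1-role` line in realm.properties then grants user `guest` (password `guest`) the required role.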
Re: Solr CMS Integration
On 07.08.2009 at 19:01, wojtekpia wrote: I've been asked to suggest a framework for managing a website's content and making all that content searchable. I'm comfortable using Solr for search, but I don't know where to start with the content management system. Is anyone using a CMS (open source or commercial) that you've integrated with Solr for search and are happy with? This will be a consumer facing website with a combination of articles, blogs, white papers, etc. Hi Wojtek, Have a look at TYPO3. http://typo3.org/ It is quite powerful. Ingo and I are currently implementing a SOLR extension for it. We currently use it at http://www.be-lufthansa.com/ Contact me if you want an insight. Many greetings, Olivier -- Olivier Dobberkau . . . . . . . . . . . . . . Je TYPO3, desto d.k.d d.k.d Internet Service GmbH Kaiserstrasse 73 D 60329 Frankfurt/Main Fon: +49 (0)69 - 247 52 18 - 0 Fax: +49 (0)69 - 247 52 18 - 99 Mail: olivier.dobber...@dkd.de Web: http://www.dkd.de Registergericht: Amtsgericht Frankfurt am Main Registernummer: HRB 45590 Geschäftsführer: Olivier Dobberkau, Søren Schaffstein, Götz Wegenast Current projects: http://bewegung.taz.de - Launch (Ruby on Rails) http://www.hans-im-glueck.de - Relaunch (TYPO3) http://www.proasyl.de - Relaunch (TYPO3)
Re: Solr CMS Integration
Hello Wojtek, I don't want to discourage all the famous CMSs around, nor Solr uptake, but XWiki is quite a powerful CMS and has a search that is Lucene-based. paul On 07-Aug-09 at 22:42, Olivier Dobberkau wrote: I've been asked to suggest a framework for managing a website's content and making all that content searchable. I'm comfortable using Solr for search, but I don't know where to start with the content management system. Is anyone using a CMS (open source or commercial) that you've integrated with Solr for search and are happy with? This will be a consumer facing website with a combination of articles, blogs, white papers, etc. Have a look at TYPO3. http://typo3.org/ It is quite powerful. Ingo and I are currently implementing a SOLR extension for it. We currently use it at http://www.be-lufthansa.com/ Contact me if you want an insight. smime.p7s Description: S/MIME cryptographic signature
spellcheck component in 1.4 distributed
I am e-mailing to inquire about the status of the spellchecking component in 1.4 (distributed). I saw SOLR-785, but it is unreleased and for 1.5. Any help would be much appreciated. Thanks in advance, Mike
Re: solr v1.4 in production?
Pubget has been using 1.4 for a while now to make the replication easier. http://pubget.com We compiled a while back and are thinking of updating to the latest build to start playing with distributed spell checking. On Fri, Aug 7, 2009 at 7:42 AM, Shalin Shekhar Mangar < shalinman...@gmail.com> wrote: > On Wed, Jul 1, 2009 at 6:17 PM, Ed Summers wrote: > > > Here at the Library of Congress we've got several production Solr > > instances running v1.3. We've been itching to get at what will be v1.4 > > and were wondering if anyone else happens to be using it in production > > yet. Any information you can provide would be most welcome. > > > > > We're using Solr 1.4 built from r793546 in production along with the new > java based replication. > > -- > Regards, > Shalin Shekhar Mangar. > -- Regards, Ian Connor
Can multiple Solr webapps access the same lucene index files?
Hello, I have a question I can't find an answer to in the list. Can multiple Solr webapps (for instance in separate cluster nodes) share the same Lucene index files stored within a shared filesystem? We do this with a custom Lucene search application right now; I'm trying to switch to using Solr and am curious if we can use the same deployment strategy. Mark
MoreLikeThis: How to get quality terms from html from content stream?
I'm using the MoreLikeThisHandler with a content stream to get documents from my index that match content from an html page like this: http://localhost:8080/solr/mlt?stream.url=http://www.sfgate.com/cgi-bin/article.cgi?f=/c/a/2009/08/06/SP5R194Q13.DTL&mlt.fl=body&rows=4&debugQuery=true But, not surprisingly, the query generated is meaningless because a lot of the markup is picked out as terms: body:li body:href body:div body:class body:a body:script body:type body:js body:ul body:text body:javascript body:style body:css body:h body:img body:var body:articl body:ad body:http body:span body:prop Does anyone know a way to transform the html so that the content can be parsed out of the content stream and processed w/o the markup? Or do I need to write my own HTMLParsingMoreLikeThisHandler? If I parse the content out to a plain text file and point the stream.url param to file:///parsedfile.txt it works great. -Jay
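Rather than writing a custom HTMLParsingMoreLikeThisHandler, one option worth trying is to strip markup in the analyzer of the field named in `mlt.fl`: Solr ships an HTML-stripping tokenizer, and MLT analyzes content-stream text with that field's analyzer. A schema.xml sketch (the type name is mine; whether it fully cleans up term selection for `stream.url` content is worth verifying against your Solr version):

```xml
<!-- Strips HTML/XML tags (including script and style content) before
     tokenizing, so markup never becomes index or MLT query terms. -->
<fieldType name="text_html" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.HTMLStripWhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
  </analyzer>
</fieldType>

<field name="body" type="text_html" indexed="true" stored="true"/>
```

As a belt-and-braces measure, adding leftover JavaScript/CSS-ish tokens ("var", "href", "css", and so on) to stopwords.txt catches anything the tag stripping misses.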
How to use key with facet.prefix?
I'm trying to facet multiple times on same field using key. This works fine except when I use prefixes for these facets. What I got so far (and not functional): .. &facet=true &facet.field=category&f.category.facet.prefix=01 &facet.field={!key=subcat}category&f.subcat.facet.prefix=00 This will give me 2 facets in results, one named 'category' and another 'subcat' like expected. But prefix for key 'subcat' is ignored and the other prefix is used for both facets. How do I use key with prefixes or am I barking up the wrong tree here? Thanks!
Re: Can multiple Solr webapps access the same lucene index files?
Yes, they could all point to an index that lives on a NAS or SAN, for example. You'd still have to make sure only one server is writing to the index at a time. Zookeeper can help with coordination of that. Otis -- Sematext is hiring -- http://sematext.com/about/jobs.html?mls Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR - Original Message > From: Mark Diggory > To: solr-user@lucene.apache.org > Sent: Friday, August 7, 2009 8:16:46 PM > Subject: Can multiple Solr webapps access the same lucene index files? > > Hello, > > I have a question I can't find an answer to in the list. Can multiple solr > webapps (for instance in separate cluster nodes) share the same lucene index > files stored within a shared filesystem? We do this with a custom Lucene > search application right now, I'm trying to switch to using solr and am > curious if we can use the same deployment strategy. > > Mark
Re: Question regarding merging Solr indexes
On Fri, Aug 7, 2009 at 10:45 PM, ahammad wrote: > > Hello, > > I have a MultiCore setup with 3 cores. I am trying to merge the indexes of > core1 and core2 into core3. I looked at the wiki but I'm somewhat unclear > on > what needs to happen. > > This is what I used: > > > http://localhost:9085/solr/core3/admin/?action=mergeindexes&core=core3&indexDir=/solrHome/core1/data/index&indexDir=/solrHome/core2/data/index&commit=true > > When I hit this I just go to the admin page for core3. Maybe the way I > reference the indexes is incorrect? What path goes there anyway? > Look at http://wiki.apache.org/solr/MergingSolrIndexes#head-0befd0949a54b6399ff926062279afec62deb9ce -- Regards, Shalin Shekhar Mangar.
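For reference, the likely reason the URL in the question just lands on the admin page is that the CoreAdmin handler lives at the top-level `/solr/admin/cores` path, not under an individual core's admin path. A sketch of the request, reusing the host and index paths from the original mail (check the wiki page for the authoritative parameter syntax):

```
http://localhost:9085/solr/admin/cores?action=mergeindexes&core=core3&indexDir=/solrHome/core1/data/index&indexDir=/solrHome/core2/data/index
```

After the merge, a commit is typically issued against core3 itself (e.g. a `<commit/>` POST to `/solr/core3/update`) so the merged segments become visible to searchers.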
Re: 99.9% uptime requirement
: Subject: 99.9% uptime requirement : In-Reply-To: <4a730d0f.3050...@btelligent.de> http://people.apache.org/~hossman/#threadhijack Thread Hijacking on Mailing Lists When starting a new discussion on a mailing list, please do not reply to an existing message, instead start a fresh email. Even if you change the subject line of your email, other mail headers still track which thread you replied to and your question is "hidden" in that thread and gets less attention. It makes following discussions in the mailing list archives particularly difficult. See Also: http://en.wikipedia.org/wiki/Thread_hijacking -Hoss
Re: solr/home in web.xml relative to web server home
: the environment variable (env-entry) in web.xml to configure the solr/home is : relative to the web server's working directory. I find this unusual as all the : servlet paths are relative to the web applications directory (webapp context, : that is). So, I specified solr/home relative to the web app dir, as well, at : first. the intention is not that the Solr home dir be configured inside the web.xml -- it is *possible* to specify the solr home dir in the web.xml as you describe, but that's really just a fallback for people who really, really want to bake all of the information into the war. solr.war is an application -- when you run the application you specify (at run time) some configuration information. Hardcoding that config information into the war file is akin to compiling a C++ program with all of the config options hardcoded -- you can do it, but it's not very generic, and requires a lot of hacking whenever you want to upgrade. : (In my case, I want to deliver the solr web application including a custom : entity processor, so that is why I want to include the solr war as part of my : release cycle. It is easier to deliver that to the system administration than : to provide them with partial packages they have to install into an already : installed war, imho.) you don't have to "install into an already installed war" to add custom plugins .. you just have to put the jar file for your custom plugins into a "lib" directory inside your solr home dir. This is really no different than something like the Apache HTTPD server. there is the application (the binary httpd / solr.war) there is your configuration (httpd.conf / solr home dir) and there are custom modules you can choose to load (libmod_entityprocessor.so / your-entityprocessor.jar) -Hoss
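Concretely, the fallback Hoss describes is the env-entry block inside solr.war's web.xml; the path value below is a placeholder:

```xml
<env-entry>
  <env-entry-name>solr/home</env-entry-name>
  <env-entry-value>/path/to/your/solr/home</env-entry-value>
  <env-entry-type>java.lang.String</env-entry-type>
</env-entry>
```

The more common alternatives, both of which leave the war untouched, are setting the JNDI entry in the servlet container's own context configuration, or simply passing `-Dsolr.solr.home=/path/to/your/solr/home` on the JVM command line.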
Re: solr indexing on same set of records with different value of unique field, not working fine.
: Sorry, schema.xml file is here in this mail... in the schema.xml file you attached, the uniqueKey field is "evid" you only provided one example of the type of input you are indexing, and in that example... : > 501 ...but in your original email (see below) you said you were using a timestamp field as the uniqueKey, and you didn't understand why reindexing the same 100 docs twice didn't give you 200 docs. that example uniqueKey value isn't a timestamp, so i don't really understand what you're talking about. if you index that doc over and over with the schema.xml you sent, then it's constantly going to replace itself over and over again because the uniqueKey field (evid) is the same (501) every time. : > > : Here, i specified 20 fields in schema.xml file. the unique field i set : > > was, : > > : currentTimeStamp field. : > > : So, when i run the loader program (which loads xml data into solr) it : > > creates : > > : currentTimestamp value...and loads into solr. : > > : : For this situation, : > > : i stopped the loader program, after 100 records indexed into solr. : > > : Then again, i run the loader program for the SAME 100 records to indexed : > > : means, : > > : the solr results 100, rather than 200. : > > : : Because, i set currentTimeStamp field as uniqueField. So i expect the : > > result : > > : as 200, if i run again the same 100 records... : > > : : Any suggestions please... -Hoss
Re: update some index documents after indexing process is done with DIH
: What is confusing me now is that I have to implement my logic in you're certainly in a fuzzy grey area here ... none of this stuff was designed for the kind of thing you're doing. : But in processCommit, having access to the core I can get the IndexReader : but I still don't know how to get the IndexWriter and SolrInputDocuments in you don't get direct access to the IndexWriter ... instead your UpdateProcessor uses the SolrCore to get an UpdateRequestProcessorChain to add (ie: replace) the SolrInputDocuments you made based on what you saw in the original SolrInputDocuments. for a second i was thinking that you'd have to worry about checking some threadlocal variable to keep yourself from going into an infinite loop, but then i remembered that you can configure named UpdateRequestProcessorChains ... so your default Chain can use your custom component, and you can create a simple chain (that bypasses your custom component) for your component to call processAdd()/processCommit() on. -Hoss
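A solrconfig.xml sketch of the two-chain setup Hoss describes; the custom factory class name is hypothetical:

```xml
<!-- Default chain: client updates run through the custom component first. -->
<updateRequestProcessorChain name="default" default="true">
  <processor class="com.example.MyCustomUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>

<!-- Plain chain: the custom component uses this one for its own
     processAdd()/processCommit() calls, so it never re-enters itself. -->
<updateRequestProcessorChain name="internal">
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```

Inside the custom processor, the plain chain is fetched by name from the SolrCore (something like `core.getUpdateProcessingChain("internal")`; check the exact accessor on SolrCore for your Solr version) and its processors are invoked directly for the replacement documents.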
Re: Reasonable number of maxWarming searchers
: Is there a problem if i set maxWarmingSearchers to something like 30 or 40? my personal opinion: anything higher than 3 indicates a serious architecture problem. On a master, doing lots of updates, the "warming" time should be zero, so there shouldn't ever be more than 2 searchers at one time -- 3 is being generous in case you just happen to get some parallel rapid-fire add/commit pairs ... beyond that you're better off just letting any other concurrent commit calls block for the few milliseconds it will take to finish the commit. : Also, how do I disable the cache warming? Is setting autowarmCount's : to 0 enough? yes, but even better: make the cache sizes zero, that way if someone accidentally does query your master, you won't waste ram caching it. -Hoss
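In solrconfig.xml terms, the advice above translates to something like the following for a master that only indexes (element and cache names as in the stock 1.3/1.4 example config; these all live inside the `<query>` section except `maxWarmingSearchers`, which is a direct child of `<config>` in some versions — check your own file's layout):

```xml
<maxWarmingSearchers>2</maxWarmingSearchers>

<!-- Zero-size caches: nothing to warm, no RAM wasted on stray queries. -->
<filterCache      class="solr.LRUCache" size="0" initialSize="0" autowarmCount="0"/>
<queryResultCache class="solr.LRUCache" size="0" initialSize="0" autowarmCount="0"/>
<documentCache    class="solr.LRUCache" size="0" initialSize="0" autowarmCount="0"/>
```

Slaves that serve queries would keep normal cache sizes and nonzero autowarmCount values; it is only the write-only master where warming buys nothing.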