Re: use a solr-built index with lucene?

2010-04-09 Thread Paul Libbrecht
This looks like an interesting avenue for a smooth transition from  
lucene to solr.


Thanks for any more hints you find around.
(E.g., maybe it is not too hard to pre-generate a schema.xml from an
actual index for the field types?)
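Paul's idea can be sketched roughly like this (a hypothetical illustration, not an existing tool; the field-name-to-type mapping is assumed to have already been recovered from the index, e.g. with Luke):

```python
# Sketch: pre-generate the <fields> section of a schema.xml from a
# field-name -> fieldType mapping recovered from an existing Lucene index.
# The mapping and type names below are assumptions for illustration.
import xml.etree.ElementTree as ET

def fields_to_schema_xml(field_types):
    """field_types: dict mapping field name to a Solr fieldType name."""
    fields = ET.Element("fields")
    for name, ftype in sorted(field_types.items()):
        # indexed/stored flags are guesses; a Lucene index does not
        # record everything schema.xml needs, so this is only a start.
        ET.SubElement(fields, "field", name=name, type=ftype,
                      indexed="true", stored="true")
    return ET.tostring(fields, encoding="unicode")

print(fields_to_schema_xml({"id": "string", "title": "text"}))
```

The guessed flags are exactly why this could only ever be a starting point for a hand-edited schema.xml.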


paul


On 9 Apr 2010, at 02:32, Erik Hatcher wrote:


Yes... gotta jive with schema.xml though.

Erik

On Apr 8, 2010, at 7:18 PM, Tommy Chheng wrote:

If I build an index with Solr, is it possible to use the index
folder with Lucene?


--
Tommy Chheng
Programmer and UC Irvine Graduate Student
Twitter @tommychheng
http://tommy.chheng.com







Re: use a solr-built index with lucene?

2010-04-09 Thread Tommy Chheng
I was thinking of the reverse case: from Solr to Lucene. Lucene
doesn't use a schema.xml.


Tommy Chheng
Programmer and UC Irvine Graduate Student
Twitter @tommychheng
http://tommy.chheng.com


On 4/9/10 12:15 AM, Paul Libbrecht wrote:
[quoted message snipped; it appears in full above]


Replication process on Master/Slave slowing down slave read/search performance

2010-04-09 Thread Marcin

Hi guys,

I have noticed that the Master/Slave replication process slows down
slave read/search performance while replication is in progress.



please help
cheers


Re: Replication process on Master/Slave slowing down slave read/search performance

2010-04-09 Thread Marco Martinez
Hi Marcin,

This happens because when you do a replication, all the caches are rebuilt
since the index has changed, so search performance decreases. You can
change your architecture to a multicore one to reduce the impact of the
replication: use two cores, one to do the replication and the other to
search, and when the replication is done, swap the cores so that warm
caches are available all the time.
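As a sketch (host, port, and core names are assumed for illustration), the swap step Marco describes is a single CoreAdmin call:

```python
# Sketch of the CoreAdmin calls behind the two-core pattern: replicate
# into a "standby" core, then SWAP it with the live one so searches
# always hit a warmed core. Host and core names are assumptions.
from urllib.parse import urlencode

def core_admin_url(base, action, **params):
    """Build a Solr CoreAdmin request URL."""
    query = urlencode({"action": action, **params})
    return f"{base}/admin/cores?{query}"

swap = core_admin_url("http://localhost:8983/solr", "SWAP",
                      core="live", other="standby")
print(swap)
# http://localhost:8983/solr/admin/cores?action=SWAP&core=live&other=standby
```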

Regards


Marco Martínez Bautista
http://www.paradigmatecnologico.com
Avenida de Europa, 26. Ática 5. 3ª Planta
28224 Pozuelo de Alarcón
Tel.: 91 352 59 42


2010/4/9 Marcin 

> Hi guys,
>
> I have noticed that Master/Slave replication process is slowing down slave
> read/search performance during replication being done.
>
>
> please help
> cheers
>


Solr giving 500's

2010-04-09 Thread william pink
Hi,

I was seeing this error from Solr this morning

"Severe errors in solr configuration. Check your log files for more detailed information on what may be wrong. If you want solr to continue after configuration errors, change <abortOnConfigurationError>false</abortOnConfigurationError> in solrconfig.xml.

java.lang.RuntimeException: java.io.FileNotFoundException: no segments* file found in org.apache.lucene.store.FSDirectory@/opt/solr/solr/data/index: files:
  at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:433)
  at org.apache.solr.core.SolrCore.<init>(SolrCore.java:216)
  at org.apache.solr.core.SolrCore.getSolrCore(SolrCore.java:177)
  at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:69)
  at org.mortbay.jetty.servlet.FilterHolder.doStart(FilterHolder.java:99)
  at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
  at org.mortbay.jetty.servlet.ServletHandler.initialize(ServletHandler.java:594)
  at org.mortbay.jetty.servlet.Context.startContext(Context.java:139)
  at org.mortbay.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1218)
  at org.mortbay.jetty.handler.ContextHandler.doStart(ContextHandler.java:500)
  at org.mortbay.jetty.webapp.WebAppContext.doStart(WebAppContext.java:448)
  at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
  at org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:147)
  at org.mortbay.jetty.handler.ContextHandlerCollection.doStart(ContextHandlerCollection.java:161)
  at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
  at org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:147)
  at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
  at org.mortbay.jetty.handler.HandlerWrapper.doStart(HandlerWrapper.java:117)
  at org.mortbay.jetty.Server.doStart(Server.java:210)
  at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
  at org.mortbay.xml.XmlConfiguration.main(XmlConfiguration.j"
at http://127.0.0.1:8983/solr in
/var/www/aspire/releases/20100407094800/vendor/plugins/acts_as_solr/lib/acts_as_solr.rb:49:in


I have been playing around with getting logging configured, but it doesn't
seem to output anything to the log file. I used
http://wiki.apache.org/solr/LoggingInDefaultJettySetup as a guide.

I noticed there was nothing in the index folder (at this moment I am not
sure why), but I had a copy of the index from yesterday afternoon, so I
copied that into its place. Now I am seeing the following errors:

RuntimeError (Solr exception 500
"Severe_errors_in_solr_configuration__Check_your_log_files_for_more_detailed_infomation_on_what_may_be_wrong__If_you_want_solr_to_continue_after_configuration_errors_changeabortOnConfigurationErrorfalseabortOnConfigurationErro
__in_solrconfigxml___javalangRuntimeException_javaioFileNotFoundException_no_segments_file_found_in_orgapachelucenestoreFSDirectoryoptsolrsolrdataindex_files__ratpfrq__of8kfdt__rb4afnm__rb4atis__r7pvfnm__rb4dtis__of8ktis__rb4ffdt__r970fnm
__ratpnrm__rb4gfrq__rb4cfrq__rb4efrq__r970nrm__rb4dprx__ratpfdt__rb4gnrm__of8k_2pmdel__r6jlfrq__r7pvtis__lj9hfdt__of8knrm__of8kfnm__r88sfdt__rb4aprx__r6jlfdt__r970_2sdel__r88s_2sdel__rb4dfdx__r6jltii__r7pvfdx__rb4ctii__rb4ffdx__r6jlfnm__ra76frq
__rb4dfdt__rb4dnrm__rb4ffrq__rb4btii__r72ifnm__r7pvtii__r9p3frq__r88stii__lj9htii__rb4ftii__lj9hprx__r72inrm__of8kfrq__r7pvprx__rb4cfdx__rb4afrq__r9p3fdt__rb4gfdt__r72itii__of8ktii__lj9hnrm__rb4g_1del__r6jlnrm__ra76nrm__rb4gprx__r8ontii__r88stis__r72itis
__rb4ffnm__rb4f_1del__rb4afdx__ra76tii__r72iprx__of8kfdx__ratptis__r9p3nrm__r9p3_2gdel__ratpprx__rb4gfnm__ra76fdt__rb4cprx__r8onfrq__r8onnrm__lj9hfrq__r8onprx__rb4a_5del__rb4dtii__r8ontis__rb4e_2del__rb4gfdx__rb4etii__r7pvnrm__rb4fnrm__rb4enrm
__lj9htis__rb4efnm__r9p3fdx__rb4ftis__rb4efdt__r72ifdt__rb4gtii__rb4anrm__r8onfnm__of8kprx__r9p3fnm__r970frq__r7pvfrq__r9p3tii__r88sfrq__rb4cfnm__rb4efdx__ratptii__rb4dfrq__ratpfnm__r88snrm__r72ifdx__rb4btis__rb4fprx__r9p3tis__ra76tis__r88sprx__r
88sfnm__r6jlfdx__r8on_3ddel__lj9hfdx__r970tii__r7pvfdt__ra76fnm__r72ifrq__r970tis__r6jl_nvdel__rb4eprx__r88sfdx__rb4cnrm__r9p3prx__rb4afdt__rb4bfnm__rb4bfdt_segmentsgen__rb4bfdx__r8onfdx__lj9hfnm__rb4bfrq__rb4bnrm__r970fdt__r6jltis__rb4dfnm__
b4gtis__rb4ctis__ra76prx__ratp_13del__r970prx__r7pv_3jdel__rb4bprx__r970fdx__ra76fdx__r6jlprx__ra76_2adel__"
at http://127.0.0.1:8983/solr in
/var/www/webapp/releases/20100407094800/vendor/plugins/acts_as_solr/lib/acts_as_solr.rb:49:in
`execute' while performing search {:query=>"(visible_to_candidates_b:(true)
AND site_id_t:(68)) AND (type_s:Vacancy OR type_s:VacancyLite);opening_at_d
desc", :operator=>nil, :rows=>20, :start=>0, :field_list=>["pk_i",
"score"]})

As you can see, it is a standalone Solr installation, but we use the Ruby
acts_as_solr plugin to interact with the index. I am not really sure what to
do at this point apart from reindexing, which could take a long time.

Any suggestions?

If anyone has any ideas on the logging too that would be great!

Thanks,
Will


Re: Faceting on a multi-valued field by index

2010-04-09 Thread Erik Hatcher
Though if you added a prefix to all your root ids, say a "root<id>"
format, then you could use facet.prefix=root.
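A rough sketch of that convention (the field name and prefix are assumptions taken from the thread, not an existing API):

```python
# Sketch of Erik's suggestion: store the first (root) category id with a
# "root" prefix alongside the plain ids, then restrict faceting to those
# prefixed values with facet.prefix.
def category_field_values(category_ids):
    """First id is the root category; tag it so faceting can find it."""
    return [f"root{category_ids[0]}", *category_ids]

def facet_params(field):
    # With these params Solr only counts facet values starting "root".
    return {"facet": "true", "facet.field": field, "facet.prefix": "root"}

print(category_field_values(["12", "34", "56"]))
# ['root12', '12', '34', '56']
```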


Erik

On Apr 8, 2010, at 10:24 PM, Lance Norskog wrote:


Nope! Lucene is committed to maintaining the order of values added to
a field, but does not have this feature.

On Thu, Apr 8, 2010 at 6:44 PM, Blargy  wrote:


Is there any way to facet on a multi-valued field at a particular
index?

For example, I have a field category_ids which is multi-valued,
containing category ids. The first value in that field is always the
root category, and I would like to be able to facet on just that first
value. Is this possible without explicitly creating a separate field?

Thanks
--
View this message in context: 
http://n3.nabble.com/Faceting-on-a-multi-valued-field-by-index-tp707436p707436.html
Sent from the Solr - User mailing list archive at Nabble.com.





--
Lance Norskog
goks...@gmail.com




Re: use a solr-built index with lucene?

2010-04-09 Thread Erik Hatcher

Oh, sorry, I got the direction backwards in my initial reply.

Yes, of course you can use an index from Solr with Lucene directly.   
It's just a Lucene index.  Just make sure you use the same version of  
Lucene (pull the JARs from solr.war, I'd say).  For example, you can  
open a "Solr index" with Luke.


If you're using a Lucene app against a live Solr index, be careful  
with locking (in short, the default lock setting in solrconfig.xml  
isn't set for sharing the index between two processes).


Erik


On Apr 9, 2010, at 3:18 AM, Tommy Chheng wrote:

I was thinking of the reverse case: from solr to lucene. lucene  
doesn't use a schema.xml


Tommy Chheng
Programmer and UC Irvine Graduate Student
Twitter @tommychheng
http://tommy.chheng.com


On 4/9/10 12:15 AM, Paul Libbrecht wrote:
[earlier messages in the thread quoted in full above; snipped]








Re: Solr giving 500's

2010-04-09 Thread Yonik Seeley
Looks like you're missing one of the index files... segments_
It points to all the other index files.
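A quick way to see what Lucene is complaining about is to list the segments* files in the index directory (a diagnostic sketch, not Solr code):

```python
# Sketch: a healthy Lucene index directory contains a segments.gen file
# plus at least one segments_N file. An index restored from a partial
# copy may be missing segments_N -- the "no segments* file found" symptom.
import os
import re
import tempfile

def find_segments_files(index_dir):
    """Return the segments* files Lucene looks for in an index directory."""
    pat = re.compile(r"^segments(_[0-9a-z]+)?$|^segments\.gen$")
    return sorted(f for f in os.listdir(index_dir) if pat.match(f))

# Demo on an empty directory -- the situation in the error above:
empty = tempfile.mkdtemp()
print(find_segments_files(empty))  # []
```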

-Yonik
Apache Lucene Eurocon 2010
18-21 May 2010 | Prague


On Fri, Apr 9, 2010 at 6:20 AM, william pink  wrote:
> [william's original message quoted in full above; snipped]

Re: Minimum Should Match the other way round

2010-04-09 Thread MitchK

Hoss,

Before I run into some misunderstandings, I want to come back to the topic
first. I will have a look at some classes later to find out whether some
other ideas that are not directly related to this topic (like the
multiword synonyms at query time) will work or not. I'm sorry for being
off-topic.

Chris Hostetter-3 wrote:
> 
> where the analyzer matters is in creating that numeric field at index time 
> ... hence my suggestion of having an analyzer chain that exactly matches 
> the field you are interested in, but ending with a TokenCountingFilter -- 
> it can take care of creating the "numeric-ish" (padded) field value when 
> the docs are indexed.
> 

Okay, as I have understood you mean something like this:







This fieldType should "store" (or let's say index) the number of tokens as
something like "005" for 5 tokens, right?

My problem is that I don't know how to query this field.
I know what you mean with appending the query with "Add +titleLen:[* TO
MAX_LEN]" - but I don't know how to retrieve the MAX_LEN information for a
specific query, since in some cases it depends on which analyzer chain
is used for the tokenLen field.

For example: I think it makes sense to use a WordDelimiterFilter at the end
of my TokenFilter chain.
If my document is something like "The secrets of the iPhone 3G", then I want
to index it as "The secrets of the iPhone 3 G" (3G is going to be indexed as
two tokens).
This means that the document length is increased by one token.

However, maybe I misunderstood your point:
"- Pick MAX_LEN Based On Number Of Query Clauses From Super"
since I thought that the number of query clauses depends on the number of
whitespaces in my query. If I am wrong, and it depends on the result of my
analyzer chain, there is no problem. But I am not sure whether this is the
case or not.
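For what it's worth, the padded-count scheme itself can be sketched like this (titleLen is Hoss's example field name; the token lists stand in for whatever the analyzer chain actually produces):

```python
# Sketch of the padded token-count idea from the thread. The field name
# "titleLen" is Hoss's example; the counting filter itself is hypothetical.
def padded_len(tokens, width=3):
    """Index-time value: token count, zero-padded so range queries sort."""
    return str(len(tokens)).zfill(width)

def max_len_clause(query_tokens, width=3):
    """Query-time clause: only match docs no longer than the query."""
    return f"+titleLen:[* TO {padded_len(query_tokens, width)}]"

# "iPhone 3G" analyzed with a WordDelimiterFilter yields extra tokens,
# which is why the count must come from the same analyzer chain.
print(padded_len(["the", "secrets", "of", "the", "iphone", "3", "g"]))
print(max_len_clause(["iphone", "3", "g"]))
```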

Thank you for help.

- Mitch
-- 
View this message in context: 
http://n3.nabble.com/Minimum-Should-Match-the-other-way-round-tp694867p708264.html
Sent from the Solr - User mailing list archive at Nabble.com.


refreshing synonyms.txt - or other configs

2010-04-09 Thread Markus.Rietzler
 
I am wondering how config files like synonyms.txt or stopwords.txt can
be refreshed without restarting Solr, and perhaps also how changes in
solrconfig.xml or schema.xml can be picked up.

I can use a multicore setup. I just tested this with a "multicore" setup
with only one core (core0); there I can
call /solr/admin/cores?action=RELOAD&core=core0 and changes in
synonyms.txt become active.

I also understand that this should work in a master/slave setup, where
config files under /conf are replicated (at least when doing a commit
or optimize on an index).

But what about a standard setup? Is there a way to do this? We have not
yet decided how we will run our production servers. At the moment we're
developing an enterprise search for our intranet...

markus


RE: index corruption / deployment strategy

2010-04-09 Thread Nagelberg, Kallin
Thanks Erik,

I forwarded your thoughts to management and put in good word for Lucid 
Imagination.

Regards,
Kallin Nagelberg

-Original Message-
From: Erik Hatcher [mailto:erik.hatc...@gmail.com] 
Sent: Thursday, April 08, 2010 2:18 PM
To: solr-user@lucene.apache.org
Subject: Re: index corruption / deployment strategy

Kallin,

It's a very rare report, and practically impossible I'm told, to  
corrupt the index these days thanks to Lucene's improvements over the  
last several releases (ignoring hardware malfunctions).

A single index is the best way to go, in my opinion - though at your  
scale you're probably looking at sharding it and using distributed  
search.  So you'll have multiple physical indexes, one for each shard,  
and a single virtual index in the eyes of your searching clients.

Backups, of course, are sensible, and Solr's replication capabilities  
can help here by requesting them periodically.  You'll be using  
replication anyway to scale to your query volume.

As for hardware scaling considerations, there are variables to
consider, like how faceting, sorting, and querying perform across a
single large index versus sharding.  I'm guessing you'll be best with
at least two shards, though possibly more considering these variables.

Erik
 @ Lucid Imagination

p.s. have your higher-ups give us a call if they'd like to discuss  
their concerns and consider commercial support for your mission  
critical big scale use of Solr :)



On Apr 8, 2010, at 1:33 PM, Nagelberg, Kallin wrote:
> I've been doing work evaluating Solr for use on a high-traffic
> website for some time, and things are looking positive. I have some
> concerns from my higher-ups that I need to address. I have suggested
> that we use a single index in order to keep things simple, but there
> are suggestions to split our documents amongst different indexes.
>
> The primary motivation for this split is a worry about potential
> index corruption, i.e., if we only have one index and it becomes
> corrupt, what do we do? I never considered this to be an issue since
> we would have backups etc., but I think they have had issues with  
> other search technology in the past where one big index resulted in  
> frequent and difficult to recover from corruption. Do you think this  
> is a concern with Solr? If so, what would you suggest to mitigate  
> the risk?
>
> My second question involves general deployment strategy. We will  
> expect about 50 million documents, each on average a few paragraphs,  
> and our website receives maybe 10 million hits a day. Can anyone  
> provide an idea of # of servers, clustering/replication setup etc.  
> that might be appropriate for this scenario? I'm interested to hear  
> what other's experience is with similar situations.
>
> Thanks,
> -Kallin Nagelberg
>



Re: Replication process on Master/Slave slowing down slave read/search performance

2010-04-09 Thread Walter Underwood
You don't need multi-core. Solr already does this automatically. It creates a 
new Searcher and auto-warms the cache.

But, it will still be slow. If you use auto-warming, it uses most of one CPU, 
which slows down queries during warming. Also, warming isn't perfect, so 
queries will be slower after switching to the new Searcher. If you don't use 
warming, the cold cache will make queries slower.

There is no way to get around this. Solr throws away all the caches after 
replication, so there is a performance hit. In the system I ran, it took a few 
minutes to recover, so I staggered the replications 10 minutes apart across the 
search farm.
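Walter's staggering scheme amounts to something like this (the slave count and gap are assumptions; he used 10 minutes):

```python
# Sketch: give each slave a different poll offset so the whole search
# farm never warms cold caches at the same moment after replication.
def stagger_offsets(num_slaves, gap_minutes=10):
    """Minute-of-hour offset at which each slave should poll the master."""
    return [i * gap_minutes % 60 for i in range(num_slaves)]

print(stagger_offsets(4))  # [0, 10, 20, 30]
```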

wunder

On Apr 9, 2010, at 3:00 AM, Marco Martinez wrote:

> [Marco's reply quoted in full above; snipped]







RE: solr.WordDelimiterFilterFactory problem with hyphenated terms?

2010-04-09 Thread Demian Katz
I've given it a try, and it definitely seems to have improved the situation.  
However, there is still one weird case that's clearly related to term 
positions.  If I do this search, it fails:

title:"love customs in eighteenthcentury spain"

...but if I do this search, it succeeds:

title:"love customs in in eighteenthcentury spain"

(note the duplicate "in").

- Demian

> -Original Message-
> From: Erick Erickson [mailto:erickerick...@gmail.com]
> Sent: Thursday, April 08, 2010 11:20 AM
> To: solr-user@lucene.apache.org
> Subject: Re: solr.WordDelimiterFilterFactory problem with hyphenated
> terms?
> 
> I'm not all that familiar with the underlying issues, but of the two
> I'd
> pick moving the WordDelimiterFactory rather than setting increments =
> "false".
> 
> But that's at least partly a guess
> 
> Best
> Erick
> 
> On Thu, Apr 8, 2010 at 11:00 AM, Demian Katz
> wrote:
> 
> > Thanks for looking into this -- I appreciate the help (and feel a
> little
> > better that there seems to be a bug at work here and not just my
> total
> > incomprehension).
> >
> > Sorry for any confusion over the UnicodeNormalizationFactory --
> that's
> > actually a plug-in from the SolrMarc project (
> > http://code.google.com/p/solrmarc/) that slipped into my example.
> Also,
> > as you guessed, my default operator is indeed set to "AND."
> >
> > It sounds to me that, of your two proposed work-arounds, moving the
> > StopFilterFactory after WordDelimiterFactory is the least disruptive.
> I'm
> > guessing that disabling position increments across the board might
> have
> > implications for other types of phrase searches, while filtering
> stopwords
> > later in the chain should be more functionally equivalent, if
> slightly less
> > efficient (potentially more terms to examine).  Would you agree with
> this
> > assessment?  If not, what possible negative side effects am I
> forgetting
> > about?
> >
> > thanks,
> > Demian
> >
> > > -Original Message-
> > > From: Erick Erickson [mailto:erickerick...@gmail.com]
> > > Sent: Wednesday, April 07, 2010 10:04 PM
> > > To: solr-user@lucene.apache.org
> > > Subject: Re: solr.WordDelimiterFilterFactory problem with
> hyphenated
> > > terms?
> > >
> > > Well, for a quick trial using trunk, I had to remove the
> > > UnicodeNormalizationFactory, is that yours?
> > >
> > > But with that removed, I get the results you do, ASSUMING that
> you've
> > > set
> > > your default operator to AND in schema.xml...
> > >
> > > Believe it or not, it all changes and all your queries return a hit
> if
> > > you
> > > do one of two things (I did this in both index and query when
> testing
> > > 'cause
> > > I'm lazy):
> > > 1> move the inclusion of the StopFilterFactory after
> > > WordDelimiterFactory
> > > or
> > > 2> for StopFilterFactory, set enablePositionIncrements="false"
> > >
> > > I think either of these might work in your situation...
> > >
> > > On doing some more investigation, it appears that if a hyphenated
> word
> > > is
> > > immediately after a stopword AND the above is true (stop factory
> > > included
> > > before WordDelimiterFactory and enablePositionIncrements="true"),
> then
> > > the
> > > search fails. I indexed this title:
> > >
> > > Love-customs in eighteenth-century Spain for nineteenth-century
> > >
> > > Searching in solr/admin/form.jsp for:
> > > title:(nineteenth-century)
> > >
> > > fails. But if I remove the "for" from the title, the above query
> works.
> > > Searching for
> > > title:(love-customs)
> > > always works.
> > >
> > > Finally, (and it's *really* time to go to sleep now), just setting
> > > enablePositionIncrements="false" in the "index" portion of the
> schema
> > > also
> > > causes things to work.
> > >
> > > Developer folks:
> > > I didn't see anything in a quick look in SOLR or Lucene JIRAs,
> should I
> > > refine this a bit (really, sleepy time is near) and add a JIRA?
> > >
> > > Best
> > > Erick
> > >
> > > On Wed, Apr 7, 2010 at 10:29 AM, Demian Katz
> > > wrote:
> > >
> > > > Hello.  It has been a few weeks, and I haven't gotten any
> responses.
> > > >  Perhaps my question is too complicated -- maybe a better
> approach is
> > > to try
> > > > to gain enough knowledge to answer it myself.  My gut feeling is
> > > still that
> > > > it's something to do with the way term positions are getting
> handled
> > > by the
> > > > WordDelimiterFilterFactory, but I don't have a good understanding
> of
> > > how
> > > > term positions are calculated or factored into searching.  Can
> anyone
> > > > recommend some good reading to familiarize myself with these
> concepts
> > > in
> > > > better detail?
> > > >
> > > > thanks,
> > > > Demian
> > > >
> > > > From: Demian Katz
> > > > Sent: Tuesday, March 16, 2010 9:47 AM
> > > > To: solr-user@lucene.apache.org
> > > > Subject: solr.WordDelimiterFilterFactory problem with hyphenated
> > > terms?
> > > >
> > > > This is my first post on this list -- apologies if this has been
> > > dis

Re: solr.WordDelimiterFilterFactory problem with hyphenated terms?

2010-04-09 Thread Robert Muir
But this behavior is correct, as you have position increments enabled.
If you want the second query (which has 2 gaps) to match, you need to either
use slop, or disable these increments altogether.
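In Lucene/Solr query syntax, slop is the ~N suffix on a phrase query; a tiny sketch (field and slop value assumed):

```python
# Sketch: add slop so a phrase query tolerates the position gap that a
# removed stopword leaves behind in the indexed token stream.
def with_slop(field, phrase, slop):
    return f'{field}:"{phrase}"~{slop}'

print(with_slop("title", "love customs in eighteenthcentury spain", 2))
# title:"love customs in eighteenthcentury spain"~2
```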

On Fri, Apr 9, 2010 at 11:44 AM, Demian Katz wrote:

> I've given it a try, and it definitely seems to have improved the
> situation.  However, there is still one weird case that's clearly related to
> term positions.  If I do this search, it fails:
>
> title:"love customs in eighteenthcentury spain"
>
> ...but if I do this search, it succeeds:
>
> title:"love customs in in eighteenthcentury spain"
>
> (note the duplicate "in").
>
> - Demian
>
> > [earlier messages in the thread quoted in full above; snipped]

Re: "json.nl=arrarr" does not work with "facet.date"

2010-04-09 Thread fabritw

Apologies for the second post, I noticed the "json.nl=arrarr" does work with
"facet.field" but not with "facet.date"?

Is there a separate parameter required for "facet.date" to make it display
as an array?

Any help is much appreciated, Will


{
 "responseHeader":{
  "status":0,
  "QTime":2,
  "params":{
"facet.date.start":"NOW/YEAR-5YEARS",
"facet":"true",
"indent":"yes",
"facet.limit":"5",
"facet.date":"date",
"json.nl":"arrarr",
"wt":"json",
"rows":"0",
"q":"*:*",
"facet.field":"date",
"facet.date.gap":"+1YEAR",
"facet.date.end":"NOW"}},
 "response":{"numFound":1265,"start":0,"docs":[]
 },
 "facet_counts":{
  "facet_queries":{},
  "facet_fields":{
"date":
[
 ["2010-01-19T00:00:00Z",63],
 ["2010-01-20T00:00:00Z",61],
 ["2010-01-29T00:00:00Z",60],
 ["2010-01-25T00:00:00Z",56],
 ["2010-01-21T00:00:00Z",55]]},
  "facet_dates":{
"date":{
 "2005-01-01T00:00:00Z":0,
 "2006-01-01T00:00:00Z":0,
 "2007-01-01T00:00:00Z":0,
 "2008-01-01T00:00:00Z":0,
 "2009-01-01T00:00:00Z":2,
 "2010-01-01T00:00:00Z":1263,
 "gap":"+1YEAR",
 "end":"2011-01-01T00:00:00Z"
-- 
View this message in context: 
http://n3.nabble.com/json-nl-arrarr-does-not-work-with-facet-date-tp708730p708800.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: "json.nl=arrarr" does not work with "facet.date"

2010-04-09 Thread Yonik Seeley
On Fri, Apr 9, 2010 at 1:04 PM, fabritw  wrote:
>
> Apologies for the second post, I noticed the "json.nl=arrarr" does work with
> "facet.field" but not with "facet.date"?

Hmmm, this is because date faceting uses a SimpleOrderedMap instead of
a NamedList (implying that access-like-a-map is more important than
the order of the elements).

If order is more important here, then it should have been a NamedList.

-Yonik
Apache Lucene Eurocon 2010
18-21 May 2010 | Prague


> Is there a separate parameter required for "facet.date" to make it display
> as an array?
>
> Any help is much appreciated, Will
>
>
> {
>  "responseHeader":{
>  "status":0,
>  "QTime":2,
>  "params":{
>        "facet.date.start":"NOW/YEAR-5YEARS",
>        "facet":"true",
>        "indent":"yes",
>        "facet.limit":"5",
>        "facet.date":"date",
>        "json.nl":"arrarr",
>        "wt":"json",
>        "rows":"0",
>        "q":"*:*",
>        "facet.field":"date",
>        "facet.date.gap":"+1YEAR",
>        "facet.date.end":"NOW"}},
>  "response":{"numFound":1265,"start":0,"docs":[]
>  },
>  "facet_counts":{
>  "facet_queries":{},
>  "facet_fields":{
>        "date":
>        [
>         ["2010-01-19T00:00:00Z",63],
>         ["2010-01-20T00:00:00Z",61],
>         ["2010-01-29T00:00:00Z",60],
>         ["2010-01-25T00:00:00Z",56],
>         ["2010-01-21T00:00:00Z",55]]},
>  "facet_dates":{
>        "date":{
>         "2005-01-01T00:00:00Z":0,
>         "2006-01-01T00:00:00Z":0,
>         "2007-01-01T00:00:00Z":0,
>         "2008-01-01T00:00:00Z":0,
>         "2009-01-01T00:00:00Z":2,
>         "2010-01-01T00:00:00Z":1263,
>         "gap":"+1YEAR",
>         "end":"2011-01-01T00:00:00Z"
> --
> View this message in context: 
> http://n3.nabble.com/json-nl-arrarr-does-not-work-with-facet-date-tp708730p708800.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: "json.nl=arrarr" does not work with "facet.date"

2010-04-09 Thread fabritw


Yonik Seeley wrote:
> 
> If order is more important here, then it should have been a NamedList.
> 

Hi Yonik, thanks for your quick reply!

Unfortunately I cannot use the NamedList, as I need to use the dateField
parameters in my query as well.

I am trying to compile a list of facets, displaying each year and a
corresponding count of matches.

(i.e. "2010-01-01T00:00:00Z":1263, )

I need to parse through this list with javascript so would like to set the
output to an array if possible?

- Will
-- 
View this message in context: 
http://n3.nabble.com/json-nl-arrarr-does-not-work-with-facet-date-tp708730p708877.html
Sent from the Solr - User mailing list archive at Nabble.com.
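
One workaround for Will's use case is to flatten the facet_dates map on the
client before iterating. A sketch (the metadata key names skipped here are the
standard Solr 1.4 ones; extend the list if your response contains others):

```javascript
// Client-side workaround: facet_dates comes back as a plain JSON object
// even with json.nl=arrarr, so flatten it into [date, count] pairs and
// skip the metadata keys Solr mixes into the same object.
function facetDatesToArray(facetDates) {
  var meta = {"gap": 1, "end": 1, "start": 1,
              "before": 1, "after": 1, "between": 1};
  var pairs = [];
  for (var key in facetDates) {
    if (!meta[key]) {
      pairs.push([key, facetDates[key]]);
    }
  }
  return pairs;
}

// With the response shape quoted above:
var sample = {
  "2009-01-01T00:00:00Z": 2,
  "2010-01-01T00:00:00Z": 1263,
  "gap": "+1YEAR",
  "end": "2011-01-01T00:00:00Z"
};
var pairs = facetDatesToArray(sample);  // two [date, count] pairs
```

The resulting array preserves the key order of the response object and can be
looped over like the arrarr form of facet_fields.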


Re: Minimum Should Match the other way round

2010-04-09 Thread MitchK

I have searched for a tutorial in Lucene - instead of Solr itself - and I've
found something on lucenetutorials.com:

    String querystr = args.length > 0 ? args[0] : "lucene";

    // the "title" arg specifies the default field to use
    // when no field is explicitly specified in the query.
    Query q = new QueryParser(Version.LUCENE_CURRENT, "title", analyzer)
        .parse(querystr);


If I am right, then I can call getClauses() or clauses() on the BooleanQuery
object for my target field, and get the number of clauses from the returned
result.
Does this number already reflect the number of clauses (or, what I really
mean, tokens) after the analyzer has worked on them?

It would be really nice to feel certain of that.

Kind regards
- Mitch
-- 
View this message in context: 
http://n3.nabble.com/Minimum-Should-Match-the-other-way-round-tp694867p708945.html
Sent from the Solr - User mailing list archive at Nabble.com.


Questions about Solr

2010-04-09 Thread noel
Hi, I would like to know the answer to the following:

- How am I able to use wildcard searches with Solr? EX: querying Ado with a 
result that would retrieve something like Adolescent.

- Phrase searches with stop words completely ruin the query and find no 
results. How can I query something like "To be or not to be" with stop words 
enabled?

- I use synonyms for certain keywords. However, when I search for a specific 
phrase which does contain synonyms, results with the synonyms rank higher than 
the ones that have the exact term. How can that be fixed?

Thanks,
Noel



Re: Questions about Solr

2010-04-09 Thread Smiley, David W.
If the user query is not going to have wildcards then use NGrams.  I talk about 
the black art of ngrams in my book.  There are multiple ways of configuring it. 
 If the query will have wildcards, Solr comes with a sample schema with a field 
type named "text_rev" (I think that's what it's named) which supports wildcard 
searches such as "ado*".  You could add the wildcard if it's not there.  I've 
done this sort of thing with various boosting to get exact matches scored 
higher.

For doing wildcards in a query string against NGram indexes, you'll have to 
wait till I am granted permission by my employer to open-source this (~2 
months).

~ David Smiley
Author: http://www.packtpub.com/solr-1-4-enterprise-search-server/

On Apr 9, 2010, at 2:42 PM,   wrote:

> - How am I able to use wildcard searches with Solr? EX: querying Ado with a 
> result that would retrieve something like Adolescent.


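For the n-gram route David mentions, one hypothetical schema.xml field type
(a sketch only; the factory names are the stock Solr 1.4 ones, and the gram
sizes are illustrative and should be tuned for your data) could look like:

```xml
<!-- Hypothetical schema.xml sketch: index-time edge n-grams let a query
     for "ado" match "adolescent" without wildcards.  Factory names are
     the stock Solr 1.4 ones; the gram sizes are illustrative. -->
<fieldType name="text_prefix" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- emit prefixes of each token: "ado", "adol", ... "adolescent" -->
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="3" maxGramSize="15"/>
  </analyzer>
  <!-- query side: no n-grams, so "ado" matches the indexed gram directly -->
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

The index grows with the gram count, so a larger minGramSize keeps it smaller
at the cost of shorter prefixes not matching.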
Re: StreamingUpdateSolrServer hangs

2010-04-09 Thread Yonik Seeley
Stephen, were you running stock Solr 1.4, or did you apply any of the
SolrJ patches?
I'm trying to figure out if anyone still has any problems, or if this
was fixed with SOLR-1711:

* SOLR-1711: SolrJ - StreamingUpdateSolrServer had a race condition that
  could halt the streaming of documents. (Attila Babo via yonik)

Also note that people may want this patch if dealing with i18n:

* SOLR-1595: StreamingUpdateSolrServer used the platform default character
  set when streaming updates, rather than using UTF-8 as the HTTP headers
  indicated, leading to an encoding mismatch. (hossman, yonik)


-Yonik
Apache Lucene Eurocon 2010
18-21 May 2010 | Prague



On Fri, Feb 5, 2010 at 3:20 PM, Stephen Meyer  wrote:
> I am trying to use the StreamingUpdateSolrServer to index a bunch of
> bibliographic data and it is hanging up every time I run it. Sometimes it
> hangs after about 100k records (after about 2 minutes), sometimes after 4M
> records (after about 80 minutes) and all different intervals in between. It
> appears to be the same issue described here:
>
> https://issues.apache.org/jira/browse/SOLR-1543
>
> The thread dump (included below) seems to indicate that a lock isn't being
> released because somewhere in the thread chain after adding a
> SolrInputDocument.
>
> Is there some kind of Solr equivalent to closing a session like you do in an
> ORM like Hibernate?
>
> Thanks,
> -Steve
> --
> Stephen Meyer
> Library Application Developer
> UW-Madison Libraries
> 312F Memorial Library
> 728 State St.
> Madison, WI 53706
>
> sme...@library.wisc.edu
> 608-265-2844 (ph)
>
>
> "Just don't let the human factor fail to be a factor at all."
> - Andrew Bird, "Tables and Chairs"
>
> Full thread dump Java HotSpot(TM) Client VM (1.5.0_22-147 mixed mode):
>
> "pool-1-thread-6" prio=5 tid=0x00d26d50 nid=0x1043c00 in Object.wait()
> [0xb0e0d000..0xb0e0dd90]
>        at java.lang.Object.wait(Native Method)
>        - waiting on <0x0bbe29f8> (a
> org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$ConnectionPool)
>        at
> org.apache.commons.httpclient.MultiThreadedHttpConnectionManager.doGetConnection(MultiThreadedHttpConnectionManager.java:518)
>        - locked <0x0bbe29f8> (a
> org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$ConnectionPool)
>        at
> org.apache.commons.httpclient.MultiThreadedHttpConnectionManager.getConnectionWithTimeout(MultiThreadedHttpConnectionManager.java:416)
>        at
> org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:153)
>        at
> org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397)
>        at
> org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323)
>        at
> org.apache.solr.client.solrj.impl.StreamingUpdateSolrServer$Runner.run(StreamingUpdateSolrServer.java:153)
>        at
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:651)
>        at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:676)
>        at java.lang.Thread.run(Thread.java:613)
>
> "pool-1-thread-5" prio=5 tid=0x00d11530 nid=0x1042e00 in Object.wait()
> [0xb0d8c000..0xb0d8cd90]
>        at java.lang.Object.wait(Native Method)
>        - waiting on <0x0bbe29f8> (a
> org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$ConnectionPool)
>        at
> org.apache.commons.httpclient.MultiThreadedHttpConnectionManager.doGetConnection(MultiThreadedHttpConnectionManager.java:518)
>        - locked <0x0bbe29f8> (a
> org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$ConnectionPool)
>        at
> org.apache.commons.httpclient.MultiThreadedHttpConnectionManager.getConnectionWithTimeout(MultiThreadedHttpConnectionManager.java:416)
>        at
> org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:153)
>        at
> org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397)
>        at
> org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323)
>        at
> org.apache.solr.client.solrj.impl.StreamingUpdateSolrServer$Runner.run(StreamingUpdateSolrServer.java:153)
>        at
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:651)
>        at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:676)
>        at java.lang.Thread.run(Thread.java:613)
>
> "MultiThreadedHttpConnectionManager cleanup" daemon prio=5 tid=0x00d13630
> nid=0x10fba00 in Object.wait() [0xb0d0b000..0xb0d0bd90]
>        at java.lang.Object.wait(Native Method)
>        - waiting on <0x0bbb0270> (a java.lang.ref.ReferenceQueue$Lock)
>        at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:120)
>        - locked <0x0bbb0270> (a java.lang.ref.ReferenceQueue$Lock)
>        at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:136)
>        at
> org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$ReferenceQueueThread.run(

Re: use a solr-built index with lucene?

2010-04-09 Thread Lance Norskog
Are the Trie types in Lucene 2.9.2?

Otherwise, be sure to use the old int (or sint?) types in your schema.

On Fri, Apr 9, 2010 at 4:12 AM, Erik Hatcher  wrote:
> Oh, sorry, I got the direction backwards in my initial reply.
>
> Yes, of course you can use an index from Solr with Lucene directly.  It's
> just a Lucene index.  Just make sure you use the same version of Lucene
> (pull the JARs from solr.war, I'd say).  For example, you can open a "Solr
> index" with Luke.
>
> If you're using a Lucene app against a live Solr index, be careful with
> locking (in short, the default lock setting in solrconfig.xml isn't set for
> sharing the index between two processes).
>
>        Erik
>
>
> On Apr 9, 2010, at 3:18 AM, Tommy Chheng wrote:
>
>> I was thinking of the reverse case: from solr to lucene. lucene doesn't
>> use a schema.xml
>>
>> Tommy Chheng
>> Programmer and UC Irvine Graduate Student
>> Twitter @tommychheng
>> http://tommy.chheng.com
>>
>>
>> On 4/9/10 12:15 AM, Paul Libbrecht wrote:
>>>
>>> This looks like an interesting avenue for a smooth transition from lucene
>>> to solr.
>>>
>>> thanks for more hints you find around.
>>> (e.g. maybe it is not too hard to pre-generate a schema.xml from an
>>> actual index for the field-types?)
>>>
>>> paul
>>>
>>>
>>> Le 09-avr.-10 à 02:32, Erik Hatcher a écrit :
>>>
>>>> Yes... gotta jive with schema.xml though.
>>>>
>>>>        Erik
>>>>
>>>> On Apr 8, 2010, at 7:18 PM, Tommy Chheng wrote:
>>>>
>>>>> If i build an index with solr, is it possible to use the index folder
>>>>> with lucene?
>>>>>
>>>>> --
>>>>> Tommy Chheng
>>>>> Programmer and UC Irvine Graduate Student
>>>>> Twitter @tommychheng
>>>>> http://tommy.chheng.com
>>>>>
>>>>
>>>
>
>



-- 
Lance Norskog
goks...@gmail.com


Re: including external files in config by corename

2010-04-09 Thread Shawn Heisey

On 4/8/2010 1:15 PM, Chris Hostetter wrote:

...i suspect you want something like...

   <xi:include href="handlers.xml#xpointer(...)"
       xmlns:xi="http://www.w3.org/2001/XInclude"/>

where handlers.xml looks like...

   <config>
     <requestHandler name="..." class="..." />
     <requestHandler name="..." class="..." />
   </config>


The xpointer you mentioned above didn't work.  I finally found something 
that did, though:

   <xi:include xmlns:xi="http://www.w3.org/2001/XInclude"
       href="/index/solr/config/requestHandlers.xml#xpointer(/*/node())" />

I wouldn't have found this without your help.  A thousand thanks.

Shawn



OOM while indexing with Tika

2010-04-09 Thread Lance Norskog
There is a low-level memory "leak" (really an unfortunate retention)
in Lucene which can cause OOMs when using the Tika tools on large
files like PDF.
A patch will be in the trunk sometime soon.

http://markmail.org/thread/lhr7wodw4ctsekik

https://issues.apache.org/jira/browse/LUCENE-2387

-- 
Lance Norskog
goks...@gmail.com


Solr date "NOW" - format?

2010-04-09 Thread Shawn Heisey
I've been trying to work out how SOLR thinks about dates internally so I 
can boost newer documents.  My post_date field is stored as seconds 
since the epoch, so I think the following is probably what I want.  I 
used 3.17 instead of the 3.16 in all the examples because my own math 
suggests that's a more accurate number:


recip(ms(NOW,product(post_date,1000)),3.17e-11,1,1)

Reading the solr 1.4 book, I am not very clear on how to configure qf, 
bf, and pf in the dismax requestHandler, specifically in regards to 
using the function above in conjunction with the field-based boosts that 
I want to try.  Is there a place I can go to find some better examples, 
and find out what all the other fields in the example config do, such as mm?


Thanks,
Shawn



Re: Solr date "NOW" - format?

2010-04-09 Thread Lance Norskog
The example function scales elapsed time to years, so you're effectively boosting by recency measured in years?

Your dates are stored as UTC 64-bit longs counting the number of
milliseconds since Jan 1, 1970. That's it. They're in milliseconds
whether you supplied them that way or not. So I think the example is
what you want.

Function queries are notoriously slow. Another way to boost by year is
with range queries:
[NOW-6MONTHS TO NOW]^5.0 ,
[NOW-1YEARS TO NOW-6MONTHS]^3.0
[NOW-2YEARS TO NOW-1YEARS]^2.0
[* TO NOW-2YEARS]^1.0

Notice that you get to have a non-linear curve when you select the
ranges by hand.

On Fri, Apr 9, 2010 at 4:32 PM, Shawn Heisey  wrote:
> I've been trying to work out how SOLR thinks about dates internally so I can
> boost newer documents.  My post_date field is stored as seconds since the
> epoch, so I think the following is probably what I want.  I used 3.17
> instead of the 3.16 in all the examples because my own math suggests that's
> a more accurate number:
>
> recip(ms(NOW,product(post_date,1000)),3.17e-11,1,1)
>
> Reading the solr 1.4 book, I am not very clear on how to configure qf, bf,
> and pf in the dismax requestHandler, specifically in regards to using the
> function above in conjunction with the field-based boosts that I want to
> try.  Is there a place I can go to find some better examples, and find out
> what all the other fields in the example config do, such as mm?
>
> Thanks,
> Shawn
>
>



-- 
Lance Norskog
goks...@gmail.com
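
Shawn's qf/bf/pf question upthread can be sketched as a dismax handler like
the one below. The field names and boost values are illustrative assumptions,
not taken from the thread; only the bf function comes from Shawn's message:

```xml
<!-- Hypothetical solrconfig.xml sketch; field names and boost values are
     illustrative assumptions, only the bf function is from the thread. -->
<requestHandler name="dismax" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <!-- qf: the fields to search, each with its own boost -->
    <str name="qf">title^2.0 body^1.0</str>
    <!-- pf: fields boosted when the whole query matches as a phrase -->
    <str name="pf">title^3.0</str>
    <!-- mm: minimum optional clauses that must match; this example means
         "all clauses for short queries, 75% once there are more than 2" -->
    <str name="mm">2&lt;75%</str>
    <!-- bf: additive recency boost; post_date holds seconds since the
         epoch, so multiply by 1000 for the milliseconds ms() expects -->
    <str name="bf">recip(ms(NOW,product(post_date,1000)),3.17e-11,1,1)</str>
    <!-- bq: Lance's hand-tuned alternative; only valid if the field is a
         Solr date type, not a seconds-since-epoch long -->
    <str name="bq">post_date:[NOW-6MONTHS TO NOW]^5.0</str>
  </lst>
</requestHandler>
```

In practice you would pick either the bf function or the bq ranges rather
than stacking both.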


Benchmarking Solr

2010-04-09 Thread Blargy

I am about to deploy Solr into our production environment and I would like to
do some benchmarking to determine how many slaves I will need to set up.
Currently the only way I know how to benchmark is to use Apache Benchmark (ab),
but I would like to be able to send random requests to Solr... not just
one request over and over.

I have a sample data set of 5000 user entered queries and I would like to be
able to use AB to benchmark against all these random queries. Is this
possible?

FYI our current index is ~1.5 gigs with ~5m documents and we will be using
faceting quite extensively. Our average requests per day are ~2m. We will be
running RHEL with about 8-12g ram. Any idea how many slaves might be
required to handle our load?

Thanks
-- 
View this message in context: 
http://n3.nabble.com/Benchmarking-Solr-tp709561p709561.html
Sent from the Solr - User mailing list archive at Nabble.com.
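
ab replays a single URL over and over, so one option is a small driver script
that replays the 5000 logged queries instead. A sketch, assuming one raw query
per line in a queries.txt file and a stock Solr install at localhost:8983
(both assumptions; adjust the URL and handler path for your setup):

```python
import time
import urllib.parse
import urllib.request
from concurrent.futures import ThreadPoolExecutor

# Sketch only: ab replays a single URL, so build one URL per logged user
# query and replay them all instead.  The host, handler path, and file
# name below are assumptions; adjust them for your setup.
SOLR_BASE = "http://localhost:8983/solr/select"

def build_urls(base, queries):
    """Turn raw user queries (one per line) into Solr select URLs."""
    return [base + "?" + urllib.parse.urlencode({"q": q.strip(), "rows": 10})
            for q in queries if q.strip()]

def fetch(url):
    """Issue one request and return its latency in seconds."""
    start = time.time()
    with urllib.request.urlopen(url) as resp:
        resp.read()
    return time.time() - start

def run_benchmark(urls, concurrency=10):
    """Replay all URLs with a thread pool and print throughput/latency."""
    t0 = time.time()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = sorted(pool.map(fetch, urls))
    elapsed = time.time() - t0
    print("%d requests in %.1fs (%.1f req/s)"
          % (len(urls), elapsed, len(urls) / elapsed))
    print("median %.0f ms, 95th pct %.0f ms"
          % (latencies[len(latencies) // 2] * 1000,
             latencies[int(len(latencies) * 0.95)] * 1000))

# To run against your own query log (hypothetical file name):
#   with open("queries.txt") as f:
#       run_benchmark(build_urls(SOLR_BASE, f), concurrency=10)
```

JMeter can also feed parameters from a file if you'd rather not script it;
either way, run the driver from a separate machine so it doesn't compete with
Solr for CPU.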