Re: Solr Fields Multilingue

2014-06-30 Thread benjelloun
Hello again,

I have a document which has text in 3 different languages (Arabic, English,
French).
I get only this result:

"language_s": [
  "en"
]

Thanks for the help,
Best regards,
Anass BENJELLOUN



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Fields-Multilingue-tp4144223p4144708.html
Sent from the Solr - User mailing list archive at Nabble.com.


[ANNOUNCE] Luke 4.9.0 released

2014-06-30 Thread Dmitry Kan
Hello,

Luke 4.9.0 has been released. Download it here:

https://github.com/DmitryKey/luke/releases/tag/4.9.0

The release has been tested against the solr-4.9.0 index.

Most of the changes are in the org.getopt.luke.plugins.FsDirectory.java
class, and thus concern users running Lucene over Hadoop.

Remember to pass the following JVM parameter when starting luke:

java -XX:MaxPermSize=512m -jar luke-with-deps.jar

or alternatively, use luke.bat or luke.sh to launch luke from the command
line.

Enjoy,

Dmitry Kan
-- 
Dmitry Kan
Blog: http://dmitrykan.blogspot.com
Twitter: http://twitter.com/dmitrykan
SemanticAnalyzer: www.semanticanalyzer.info


Tomcat or Jetty to use with solr in production ?

2014-06-30 Thread gurunath
Hi,

I'm confused by the many reviews of Jetty and Tomcat along with Solr 4.7. Is
there any better option for production? I want to know the future complexities
of Tomcat and Jetty, as I want to cluster with huge data on Solr.

Thanks



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Tomcat-or-Jetty-to-use-with-solr-in-production-tp4144712.html
Sent from the Solr - User mailing list archive at Nabble.com.


Integrating solr with Hadoop

2014-06-30 Thread gurunath
Hi,

I want to set up Solr in production. Initially the data set I am using is
small-scale, but the size of the data will grow gradually. I have heard about
making "*Big Data Work for Hadoop and Solr*". Is this a better option for large
data, or is it better to go ahead with a Tomcat or Jetty server with Solr?

Thanks




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Integrating-solr-with-Hadoop-tp4144715.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Search results not as expected.

2014-06-30 Thread Modassar Ather
Thanks for the details Chris.

Regards,
Modassar


On Fri, Jun 27, 2014 at 3:33 AM, Chris Hostetter 
wrote:

>
> : *ab:(system entity) OR ab:authorization* : Number of results returned 2
> : which is not expected.
> : It seems this query makes the previous terms as OR if the next term is
> : introduced by an OR.
>
> in general, that's the way the "boolean" operators like AND/OR work in
> all of the various parser variants that use that syntax...
>
> http://searchhub.org//2011/12/28/why-not-and-or-and-not/
>
> ...if you want only one clause to be required, and one to be optional,
> then you need to use the prefix notation and leave the default q.op=OR
> (ie: by default, clauses are SHOULD -- since there is no prefix operator
> for that)
>
>   +ab:(system entity) ab:authorization
>
> : For the reference mm (Minimum 'Should' match) is set to 100% and parser
> : used is edismax.
>
> in the specific case of edismax, the fact that mm is ignored when
> operators are specified is a long standing issue with much debate as to
> what the "correct" behavior should be...
>
> https://issues.apache.org/jira/browse/SOLR-2649
>
>
> -Hoss
> http://www.lucidworks.com/
>


Re: how to log ngroups

2014-06-30 Thread Aman Tandon
Hi Umesh,

Thanks a lot, this might help me.
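Something like this minimal sketch, I guess (untested; assuming the Solr 4.x
SearchComponent API, with an illustrative class name and log keys):

import java.io.IOException;

import org.apache.solr.common.util.NamedList;
import org.apache.solr.handler.component.ResponseBuilder;
import org.apache.solr.handler.component.SearchComponent;

public class NGroupsLogComponent extends SearchComponent {

  @Override
  public void prepare(ResponseBuilder rb) throws IOException {
    // nothing to prepare
  }

  @Override
  public void process(ResponseBuilder rb) throws IOException {
    // grouped results live under "grouped" in the response
    NamedList grouped = (NamedList) rb.rsp.getValues().get("grouped");
    if (grouped == null) {
      return;
    }
    for (int i = 0; i < grouped.size(); i++) {
      NamedList perField = (NamedList) grouped.getVal(i);
      // addToLog() puts the value on the request log line, next to hits
      rb.rsp.addToLog("ngroups." + grouped.getName(i), perField.get("ngroups"));
    }
  }

  @Override
  public String getDescription() {
    return "Adds ngroups to the request log";
  }

  @Override
  public String getSource() {
    return "";
  }
}

registered as a last-component on the search handler in solrconfig.xml.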

With Regards
Aman Tandon


On Mon, Jun 30, 2014 at 11:34 AM, Umesh Prasad  wrote:

> Hi Aman,
> You can implement and register a last-component which extracts the
> ngroups from the response and adds it to the response.
> You can checkout tutorial about SearchComponent here
> <
> http://sujitpal.blogspot.in/2011/04/custom-solr-search-components-2-dev.html
> >
> ..
>
>
>
>
>
> On 29 June 2014 20:31, Aman Tandon  wrote:
>
> > Any help here?
> >
> > With Regards
> > Aman Tandon
> >
> >
> > On Thu, Jun 26, 2014 at 7:32 PM, Aman Tandon 
> > wrote:
> >
> > > Hi,
> > >
> > > I am grouping my results and also applying the group limit. Is there
> > > any way to log the ngroups as well along with hits?
> > >
> >
>
>
>
> --
> ---
> Thanks & Regards
> Umesh Prasad
>


solr dedup on specific fields

2014-06-30 Thread Ali Nazemian
Hi,
I use Solr 4.8 for indexing web pages that come from Nutch. I know that
Solr's deduplication works on the uniqueKey field, so I set that to the URL
field. Everything is OK, except that after duplicate detection I don't want
Solr to delete all fields of the old document; I want some fields to remain
unchanged. For example, assume I have a Boolean field called "read" with the
value "true" for a specific document. I want all fields of the new document to
overwrite the old ones except the value of this field. Is that possible? How?
Regards.

-- 
A.Nazemian


Re: Tomcat or Jetty to use with solr in production ?

2014-06-30 Thread Ahmet Arslan
Hi,

Solr test cases use embedded Jetty, therefore Jetty is the recommended one.

Ahmet



On Monday, June 30, 2014 12:08 PM, gurunath  wrote:
Hi,

Confused with lot of reviews on Jetty and tomcat along with solr 4.7 ?, Is
there any better option for production. want to know the complexity's with
tomcat and jetty in future, as i want to cluster with huge data on solr.

Thanks



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Tomcat-or-Jetty-to-use-with-solr-in-production-tp4144712.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Tomcat or Jetty to use with solr in production ?

2014-06-30 Thread Otis Gospodnetic
Hi Gurunath,

In 90% of our engagements with various Solr customers we see Jetty, which
we also recommend and use ourselves for Solr + our own services and
products.

Otis
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/



On Mon, Jun 30, 2014 at 5:07 AM, gurunath  wrote:

> Hi,
>
> Confused with lot of reviews on Jetty and tomcat along with solr 4.7 ?, Is
> there any better option for production. want to know the complexity's with
> tomcat and jetty in future, as i want to cluster with huge data on solr.
>
> Thanks
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Tomcat-or-Jetty-to-use-with-solr-in-production-tp4144712.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


How do I use multiple boost functions?

2014-06-30 Thread Bhoomit Vasani
Hello,

I want to boost using multiple functions.

e.g.
{!boost b=recip(geodist(destination,1.293841,103.846487),1,1000,1000)
b="if(exists(query({!v=$b1})),100,0)"
}

when I use above query Solr only considers second function.

-- 
-- 
Thanks & Regards,
Bhoomit Vasani | SE @ Mygola
WE are LIVE !
91-8892949849


Re: How do I use multiple boost functions?

2014-06-30 Thread Jack Krupansky
Do you want them to be additive or multiplicative? Just add or multiply them 
yourself with the "add"/"sum" or "mul"/"product" functions.
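For example, combining the two functions from your query additively in a
single boost (a sketch, untested):

b=sum(recip(geodist(destination,1.293841,103.846487),1,1000,1000),if(exists(query({!v=$b1})),100,0))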


See:
https://cwiki.apache.org/confluence/display/solr/Function+Queries

If you are using the dismax or edismax query parsers you can also use 
separate request parameters for each boost.


See:
https://cwiki.apache.org/confluence/display/solr/The+Extended+DisMax+Query+Parser

-- Jack Krupansky

-Original Message- 
From: Bhoomit Vasani

Sent: Monday, June 30, 2014 7:30 AM
To: solr-user@lucene.apache.org
Subject: How do I use multiple boost functions?

Hello,

I want to boost using multiple functions.

e.g.
{!boost b=recip(geodist(destination,1.293841,103.846487),1,1000,1000)
b="if(exists(query({!v=$b1})),100,0)"
}

when I use above query Solr only considers second function.

--
--
Thanks & Regards,
Bhoomit Vasani | SE @ Mygola
WE are LIVE !
91-8892949849 



Re: How do I use multiple boost functions?

2014-06-30 Thread Ahmet Arslan
Hi,

Use the edismax query parser. The boost parameter can take multiple values.

&boost=recip(geodist(destination,1.293841,103.846487),1,1000,1000)
&boost=if(exists(query({!v=$b1})),100,0)




On Monday, June 30, 2014 2:30 PM, Bhoomit Vasani  wrote:
Hello,

I want to boost using multiple functions.

e.g.
{!boost b=recip(geodist(destination,1.293841,103.846487),1,1000,1000)
b="if(exists(query({!v=$b1})),100,0)"
}

when I use above query Solr only considers second function.

-- 
-- 
Thanks & Regards,
Bhoomit Vasani | SE @ Mygola
WE are LIVE !
91-8892949849



RE: SlowFuzzySearch

2014-06-30 Thread Allison, Timothy B.
I've been away from parsers for a bit, but you should be able to subclass the
parser and override its getFuzzyQuery() (or similar) method fairly easily.

Again, last time I looked, it used the automaton (fast) for <=2 and backed off 
to truly slow for > 2.  Note that transposition is only operational for the 
automaton, not yet for the SlowFuzzyQuery.

Might want to take a look at LUCENE-5205 and SOLR-5410.  Those 
offer a parser that uses SlowFuzzyQuery for exactly your use 
case.

The recommended solution for handling fuzziness > 2 (I think), though, is to 
use character ngrams as in the SpellChecker.
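For instance, a character-ngram field type might look like this (a sketch,
untested; gram sizes are illustrative):

<fieldType name="text_ngram" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.NGramFilterFactory" minGramSize="2" maxGramSize="3"/>
  </analyzer>
</fieldType>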

Best,

   Tim

-Original Message-
From: Michael Tobias [mailto:mich...@tobias.org.uk] 
Sent: Sunday, June 29, 2014 8:17 PM
To: solr-user@lucene.apache.org
Subject: SlowFuzzySearch

Hi guys

I know that Solr now has a fast fuzzy search capability for Levenshtein
distances of up to 2, but I would like to use distances of 3 or 4 (up to half
the word length if possible).

I have been told it is possible to use an older fuzzy search version called 
SlowFuzzyQuery but I am not sure how to use it.  I realise it will be slow(er) 
but my database will be reasonably small and I would like to test out the 
performance to see if it is a feasible option.  Is it still part of the Solr 
code or must I install it separately?

Any examples of its usage? And for distances of 2 or less does it actually 
perform a fast fuzzy search or must I revert to using the ~ syntax for those 
faster fuzzy searches?

All help appreciated.

Michael



RE: Multiterm analysis in complexphrase query

2014-06-30 Thread Allison, Timothy B.
Ahmet, please correct me if I'm wrong, but the ComplexPhraseQueryParser does 
not perform analysis (as you, Michael, point out).  The SpanQueryParser in 
LUCENE-5205 does perform analysis and might meet your needs.  Work on it has 
gone on pause, though, so you'll have to build from the patch or the 
LUCENE-5205 branch.  Let me know if you have any questions.

LUCENE-5470 and LUCENE-5504 would move multiterm analysis farther down and make 
it available to all parsers that use QueryParserBase, including the 
ComplexPhraseQueryParser.

Best,

Tim

-Original Message-
From: Michael Ryan [mailto:mr...@moreover.com] 
Sent: Sunday, June 29, 2014 11:09 AM
To: solr-user@lucene.apache.org
Subject: Multiterm analysis in complexphrase query

I've been using a modified version of the complex phrase query parser patch 
from https://issues.apache.org/jira/browse/SOLR-1604 in Solr 3.6, and I'm 
currently upgrading to 4.9, which has this built-in.

I'm having trouble with using accents in wildcard queries, support for which 
was added in https://issues.apache.org/jira/browse/SOLR-2438. In 3.6, I was 
using a modified version of SolrQueryParser, which simply used 
ComplexPhraseQueryParser in place of QueryParser. In the version of 
ComplexPhraseQParserPlugin in 4.9, it just directly uses 
ComplexPhraseQueryParser, and doesn't go through SolrQueryParser at all. 
SolrQueryParserBase.analyzeIfMultitermTermText() is where the multiterm 
analysis magic happens.

So, my problem is that ComplexPhraseQParserPlugin/ComplexPhraseQueryParser 
doesn't use SolrQueryParserBase, which breaks doing fun things like this:
{!complexPhrase}"barac* óba*a"
And expecting it to match "Barack Obama".

Anyone run into this before, or have a way to get this working?

-Michael


Re: How do I use multiple boost functions?

2014-06-30 Thread Bhoomit Vasani
Thanks, I tried this, but in the end the bf (additive boost) param worked well
for me.

Thanks for the help :)


On Mon, Jun 30, 2014 at 5:14 PM, Ahmet Arslan 
wrote:

> Hi,
>
> Use edismax query parser. boost parameter can take multiple values.
>
> &boost=recip(geodist(destination,1.293841,103.846487),1,1000,1000)
> &boost=if(exists(query({!v=$b1})),100,0)
>
>
>
>
> On Monday, June 30, 2014 2:30 PM, Bhoomit Vasani 
> wrote:
> Hello,
>
> I want to boost using multiple functions.
>
> e.g.
> {!boost b=recip(geodist(destination,1.293841,103.846487),1,1000,1000)
> b="if(exists(query({!v=$b1})),100,0)"
> }
>
> when I use above query Solr only considers second function.
>
> --
> --
> Thanks & Regards,
> Bhoomit Vasani | SE @ Mygola
> WE are LIVE !
> 91-8892949849
>
>


-- 
-- 
Thanks & Regards,
Bhoomit Vasani | SE @ Mygola
WE are LIVE !
91-8892949849


Re: How do I use multiple boost functions?

2014-06-30 Thread Bhoomit Vasani
It turns out that the solution I was looking for is an additive boost.

Thanks for the help :)


On Mon, Jun 30, 2014 at 5:14 PM, Jack Krupansky 
wrote:

> Do you want them to be additive or multiplicative? Just add or multiply
> them yourself with the "add"/"sum" or "mul"/"product" functions.
>
> See:
> https://cwiki.apache.org/confluence/display/solr/Function+Queries
>
> If you are using the dismax or edismax query parsers you can also use
> separate request parameters for each boost.
>
> See:
> https://cwiki.apache.org/confluence/display/solr/The+Extended+DisMax+Query+Parser
>
> -- Jack Krupansky
>
> -Original Message- From: Bhoomit Vasani
> Sent: Monday, June 30, 2014 7:30 AM
> To: solr-user@lucene.apache.org
> Subject: How do I use multiple boost functions?
>
>
> Hello,
>
> I want to boost using multiple functions.
>
> e.g.
> {!boost b=recip(geodist(destination,1.293841,103.846487),1,1000,1000)
> b="if(exists(query({!v=$b1})),100,0)"
> }
>
> when I use above query Solr only considers second function.
>
> --
> --
> Thanks & Regards,
> Bhoomit Vasani | SE @ Mygola
> WE are LIVE !
> 91-8892949849
>



-- 
-- 
Thanks & Regards,
Bhoomit Vasani | SE @ Mygola
WE are LIVE !
91-8892949849


Re: Solr Fields Multilingue

2014-06-30 Thread Erick Erickson
First, please open a new thread rather than reply
to an old one, see http://people.apache.org/~hossman/#threadhijack

Second, you haven't explained what it is you
need to have happen or what you expect.

As far as I know, the language detection code
tries to identify _the_ language and picks one.
I don't think it tries to detect all the languages
in a given document; it just tries to pick the
"best".

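If it helps, the detector is typically wired in as an update processor in
solrconfig.xml, something like this sketch (langid.fl must list your text
fields; the field names here are illustrative):

<processor class="solr.LangDetectLanguageIdentifierUpdateProcessorFactory">
  <str name="langid.fl">text</str>
  <str name="langid.langField">language_s</str>
</processor>
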
Erick

On Mon, Jun 30, 2014 at 12:35 AM, benjelloun  wrote:
> Hello again,
>
> I have a document which have 3 different language text(arabic, english,
> frensh).
> i have just this result:
>
> "language_s": [
>   "en"
> ]
>
> thanks for help,
> Best regards,
> Anass BENJELLOUN
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Solr-Fields-Multilingue-tp4144223p4144708.html
> Sent from the Solr - User mailing list archive at Nabble.com.


solrcloud "indexing completed" event

2014-06-30 Thread Giovanni Bricconi
Hello

I have one application that queries solr; when the index version changes
this application has to redo some tasks.

Since I have more than one solr server, I would like to start these tasks
when all solr nodes are synchronized.

With master/slave configuration the application simply watched
http://myhost:8080/solr/admin/cores?action=STATUS&core=0bis
on each solr node and checked that the commit time msec was equal. When the
time changes and becomes equal on all the nodes the replication is complete
and it is safe to restart the tasks.

Now I would like to switch to a solrcloud configuration, splitting the core
0bis in 3 shards, with 2 replicas for each shard.

After refeeding the collection I tried the same approach calling

http://myhost:8080/solr/admin/cores?action=STATUS&core=0bis_shard3_replica2

for each core of the collection, but to my surprise I found that within the
same shard the version of the index, the number of segments, and even the
commit time msec were different!!

I was thinking that it was possible to check some parameter on each shard's
cores to verify that everything was up to date, but this does not seem to be
true.

Is it possible somehow to capture the "commit done on every core of the
collection" event?

Thank you

Giovanni


Re: Solr Fields Multilingue

2014-06-30 Thread benjelloun
Hello,

OK thanks,
I have another question :)

here is my schema:









Is this correct? Because when I index documents and then search on the field
"AllChamp", no analyzer or filter is applied. Any idea?

Example:

I search for : AllChamp:presenton  --> num result=0
   AllChamp:présenton  --> num result=1

Thanks for the help,
best regards,
Anass BENJELLOUN





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Fields-Multilingue-tp4144223p4144781.html
Sent from the Solr - User mailing list archive at Nabble.com.


unable to start solr instance

2014-06-30 Thread Niklas Langvig
Hello,
We have two Solr instances running on Linux/Tomcat 7.
Both have been working fine; now only one works. The other seems to have
crashed or something.

SolrCore Initialization Failures
* collection1: 
org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: 
Error initializing QueryElevationComponent.

We haven't changed anything in the setup.

Earlier, 4 days ago, I could see this in the logs:

<int name="status">500</int><int name="QTime">0</int>
<str name="msg">java.io.FileNotFoundException:
/opt/solr410/document/collection1/data/tlog/tlog.2494137 (Too many
open files)</str>
org.apache.solr.common.SolrException: java.io.FileNotFoundException:
/opt/solr410/document/collection1/data/tlog/tlog.2494137 (Too many
open files)
 at 
org.apache.solr.update.TransactionLog.<init>(TransactionLog.java:182)
 at 
org.apache.solr.update.TransactionLog.<init>(TransactionLog.java:140)
 at 
org.apache.solr.update.UpdateLog.ensureLog(UpdateLog.java:796)
 at 
org.apache.solr.update.UpdateLog.delete(UpdateLog.java:409)
 at 
org.apache.solr.update.DirectUpdateHandler2.delete(DirectUpdateHandler2.java:284)
 at 
org.apache.solr.update.processor.RunUpdateProcessor.processDelete(RunUpdateProcessorFactory.java:77)
 at 
org.apache.solr.update.processor.UpdateRequestProcessor.processDelete(UpdateRequestProcessor.java:55)
 at 
org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalDelete(DistributedUpdateProcessor.java:460)
 at 
org.apache.solr.update.processor.DistributedUpdateProcessor.versionDelete(DistributedUpdateProcessor.java:1036)
 at 
org.apache.solr.update.processor.DistributedUpdateProcessor.processDelete(DistributedUpdateProcessor.java:721)
 at 
org.apache.solr.update.processor.LogUpdateProcessor.processDelete(LogUpdateProcessorFactory.java:121)
 at 
org.apache.solr.handler.loader.XMLLoader.processDelete(XMLLoader.java:346)
 at 
org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:277)
 at 
org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:173)
 at 
org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
 at 
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
 at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
 at 
org.apache.solr.core.SolrCore.execute(SolrCore.java:1816)
 at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:448)
 at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:269)
 at 
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
 at 
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
 at 
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
 at 
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
 at 
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
 at 
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:103)
 at 
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
 at 
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
 at 
org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:861)
 at 
org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:606)
 at 
org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
 at java.lang.Thread.run(Thread.java:722)
Caused by: java.io.FileNotFoundException: 
/opt/solr410/document/collection1/data/tlog/tlog.2494137 (Too many 
open files)
 at java.io.RandomAccessFile.open(Native Method)
 at 
java.io.RandomAccessFile.<init>(RandomAccessFile.java:233)
 at 
org.apache.solr.update.TransactionLog.<init>(TransactionLog.java:151)
 ... 31 more
<int name="code">500</int>

We tried to restart Solr, and now we get the error: collection1:
org.apache.solr.common.SolrException:org.apache.solr.common.SolrExcep

Re: Solr Fields Multilingue

2014-06-30 Thread Uwe Reh

On 30.06.2014 16:57, benjelloun wrote:

"AllChamp" that don't do analyzer and filter. any idea?

Exemple:
I search for : AllChamp:presenton  --> num result=0
AllChamp:présenton  --> num result=1


Hi Anass,

No analyzer means no modification (no ICU normalization).
"copyField" copies just the raw input, not the processed tokens from the
source field(s). Maybe that's your misconception.
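In other words, with something like this sketch (field names are illustrative,
since your schema didn't come through):

<copyField source="TextFr" dest="AllChamp"/>

the raw text of TextFr is copied into AllChamp, and then AllChamp's own
analyzer (not TextFr's) is applied to it. So AllChamp needs an analyzer that
can handle all the languages you copy into it.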


Uwe



Re: Solr Fields Multilingue

2014-06-30 Thread benjelloun
Hey,

That's true, I know it was that, but any idea how I can resolve it?

Best regards,
Anass BENJELLOUN



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Fields-Multilingue-tp4144223p4144790.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: unable to start solr instance

2014-06-30 Thread Markus Jelsma
(Too many open files)

Try raising the limit from the default (probably 1024) to 4k-16k or so.
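For example, on most Linux systems you can raise the per-user limit in
/etc/security/limits.conf (a sketch; assuming Solr runs under Tomcat as the
user tomcat7, adjust the user name):

tomcat7  soft  nofile  16384
tomcat7  hard  nofile  16384

Then log in again (or restart the service) and verify with ulimit -n.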
 
 
-Original message-
> From:Niklas Langvig 
> Sent: Monday 30th June 2014 17:09
> To: solr-user@lucene.apache.org
> Subject: unable to start solr instance
> 
> Hello,
> We havet o solr instances running on linux/tomcat7
> Both have been working fine, now only 1 works. The other seems to have 
> crashed or something.
> 
> SolrCore Initialization Failures
> * collection1: 
> org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: 
> Error initializing QueryElevationComponent.
> 
> We havn't changed anything in the setup.
> 
> Earlier 4 days ago I could see in the logs
> 
> <int name="status">500</int><int name="QTime">0</int>
> <str name="msg">java.io.FileNotFoundException:
> /opt/solr410/document/collection1/data/tlog/tlog.2494137 (Too
> many open files)</str>
> org.apache.solr.common.SolrException: java.io.FileNotFoundException:
> /opt/solr410/document/collection1/data/tlog/tlog.2494137 (Too
> many open files)
>  at 
> org.apache.solr.update.TransactionLog.<init>(TransactionLog.java:182)
>  at 
> org.apache.solr.update.TransactionLog.<init>(TransactionLog.java:140)
>  at 
> org.apache.solr.update.UpdateLog.ensureLog(UpdateLog.java:796)
>  at 
> org.apache.solr.update.UpdateLog.delete(UpdateLog.java:409)
>  at 
> org.apache.solr.update.DirectUpdateHandler2.delete(DirectUpdateHandler2.java:284)
>  at 
> org.apache.solr.update.processor.RunUpdateProcessor.processDelete(RunUpdateProcessorFactory.java:77)
>  at 
> org.apache.solr.update.processor.UpdateRequestProcessor.processDelete(UpdateRequestProcessor.java:55)
>  at 
> org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalDelete(DistributedUpdateProcessor.java:460)
>  at 
> org.apache.solr.update.processor.DistributedUpdateProcessor.versionDelete(DistributedUpdateProcessor.java:1036)
>  at 
> org.apache.solr.update.processor.DistributedUpdateProcessor.processDelete(DistributedUpdateProcessor.java:721)
>  at 
> org.apache.solr.update.processor.LogUpdateProcessor.processDelete(LogUpdateProcessorFactory.java:121)
>  at 
> org.apache.solr.handler.loader.XMLLoader.processDelete(XMLLoader.java:346)
>  at 
> org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:277)
>  at 
> org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:173)
>  at 
> org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
>  at 
> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
>  at 
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
>  at 
> org.apache.solr.core.SolrCore.execute(SolrCore.java:1816)
>  at 
> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:448)
>  at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:269)
>  at 
> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
>  at 
> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
>  at 
> org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
>  at 
> org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
>  at 
> org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
>  at 
> org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:103)
>  at 
> org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
>  at 
> org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
>  at 
> org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:861)
>  at 
> org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:606)
>  at 
> org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
>  at java.lang.Thread.run(Thread.java:722)
> Caused by: java.io.FileNotFoundException: 
> /opt/solr410/document/collection1/data/tlog/tlog.2494137 (Too 
> many open files)
>   

MultiCollection AddCore fails

2014-06-30 Thread cpalm
Hi,
I have a maintenance use case where there are 2 collections defined, and I
need to do a remove-core on one of the collections and then be able to add
that core/collection back in.

I can successfully remove the core of the secondary collection, but after I
add that core/collection back in, the cloud page shows 2 shards for that
collection, and when it is queried the exception "no servers hosting shards"
comes back.

Below are the steps I took to reproduce this issue on Jetty with Solr 4.8.1.

Is an add-core with multiple collections supported?

The user interface on the admin page of the server indicates that it is.

Thanks,
Chris
---STEPS to replicate this--

1. Copy Example folder to example folder 2

C:\solr-4.8.1\solr-4.8.1\example>java
-Dbootstrap_confdir=./solr/collection1/conf -Dcollection.configName=myconf
-DzkRun -DnumShards=2 -jar start.jar

solr-4.8.1\example2>java -Djetty.port=7574 -DzkHost=localhost:9983 -jar
start.jar

2. Add a collection via the collections API:
http://localhost:8983/solr/admin/collections?action=CREATE&name=realtimecollection&collection.configName=realtimecollection&numShards=1&createNodeSet=10.29.12.109:8983_solr

3.load cloud page
realtimecollection_shard1_replica1:
org.apache.solr.common.cloud.ZooKeeperException:org.apache.solr.common.cloud.ZooKeeperException:
Specified config does not exist in ZooKeeper:realtimecollection

Please check your logs for more information

I think this is because realtimecollection hasn't been upconfig'd yet; after
a server restart this goes away.

4. Restart Jetty.

5. Go to the core admin page, click the realtime collection, verify that it
is present, click unload, and verify on the cloud page that the collection is
unloaded.

6. Go to core admin and try to add the core back in with the following
settings:
name: realtimecollection_shard1_replica1
instanceDir:
C:\solr-4.8.1\solr-4.8.1\example\solr\realtimecollection_shard1_replica1\
dataDir:
C:\solr-4.8.1\solr-4.8.1\example\solr\realtimecollection_shard1_replica1\data\
config: solrconfig.xml
schema: schema.xml
collection: realtimecollection
shard:shard1

Returns success.

7. Query the realtime collection and get back "no servers hosting shard".
The cloud page now shows 2 shards for the realtime collection with active
status, but it can't be queried.





--
View this message in context: 
http://lucene.472066.n3.nabble.com/MultiCollection-AddCore-fails-tp4144795.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr Fields Multilingue

2014-06-30 Thread Erick Erickson
Again, please open a new thread for new questions.
Do _not_ just reply-to then change the subject, it
stays in the same thread anyway.

Best,
Erick

On Mon, Jun 30, 2014 at 7:57 AM, benjelloun  wrote:
> Hello,
>
> Ok thanks,
> i have another question :)
>
> here is my schema:
>
>  required="false" stored="false"/>
>  required="false" multiValued="true"/>
>  required="false" multiValued="true"/> type="text_ar" indexed="true" stored="true" required="false"
> multiValued="true"/>
>
> 
> 
> 
>
> is this correct? because when i index documents then search on this field
> "AllChamp" that don't do analyzer and filter. any idea?
>
> Exemple:
>
> I search for : AllChamp:presenton  --> num result=0
>AllChamp:présenton  --> num result=1
>
> thanks for help,
> best regards,
> Anass BENJELLOUN
>
>
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Solr-Fields-Multilingue-tp4144223p4144781.html
> Sent from the Solr - User mailing list archive at Nabble.com.


NPE when using facets with the MLT handler.

2014-06-30 Thread SafeJava T
I am getting an NPE when using facets with the MLT handler.  I googled for
other npe errors with facets, but this trace looked different from the ones
I found. We are using Solr 4.9-SNAPSHOT.

I have reduced the query to the most basic form I can:

q=id:XXX&mlt.fl=mlt_field&facet=true&facet.field=id
I changed it to facet on id, to ensure that the field was present in all
results.

Any ideas on how to work around this?


java.lang.NullPointerException at
org.apache.solr.search.facet.SimpleFacets.addFacets(SimpleFacets.java:375)
at
org.apache.solr.handler.MoreLikeThisHandler.handleRequestBody(MoreLikeThisHandler.java:211)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1955) at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:769)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:418)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207)
at
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
at
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
at
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
at
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
at
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
at
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
at
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
at
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
at
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
at
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
at org.eclipse.jetty.server.Server.handle(Server.java:368) at
org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
at
org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
at
org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:953)
at
org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:1014)
at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:861) at
org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:240) at
org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
at
org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
at
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
at
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
at java.lang.Thread.run(Thread.java:744)

Thanks,
Tom


CopyField can't copy analyzers and Filters

2014-06-30 Thread benjelloun
here is my schema: 












When I index documents and then search on the field "AllChamp", no analyzer
or filter is applied.
I know that copyField can't copy analyzers and filters, so how do I keep the
analyzer and filters on the field "AllChamp"?

Example:

I search for : AllChamp:presenton  --> num result=0 
   AllChamp:présenton  --> num result=1 

Thanks for the help,
best regards, 
Anass BENJELLOUN 



--
View this message in context: 
http://lucene.472066.n3.nabble.com/CopyField-can-t-copy-analyzers-and-Filters-tp4144803.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr Fields Multilingue

2014-06-30 Thread benjelloun
Hello,

It's OK, I did it :)

Thanks


2014-06-30 17:48 GMT+02:00 Erick Erickson [via Lucene] <
ml-node+s472066n4144798...@n3.nabble.com>:

> Again, please open a new thread for new questions.
> Do _not_ just reply-to then change the subject, it
> stays in the same thread anyway.
>
> Best,
> Erick
>
> On Mon, Jun 30, 2014 at 7:57 AM, benjelloun <[hidden email]
> > wrote:
>
> > Hello,
> >
> > Ok thanks,
> > i have another question :)
> >
> > here is my schema:
> >
> >  indexed="true"
> > required="false" stored="false"/>
> >  > required="false" multiValued="true"/>
> >  > required="false" multiValued="true"/> > type="text_ar" indexed="true" stored="true" required="false"
> > multiValued="true"/>
> >
> > 
> > 
> > 
> >
> > is this correct? because when i index documents then search on this
> field
> > "AllChamp" that don't do analyzer and filter. any idea?
> >
> > Exemple:
> >
> > I search for : AllChamp:presenton  --> num result=0
> >AllChamp:présenton  --> num result=1
> >
> > thanks for help,
> > best regards,
> > Anass BENJELLOUN
> >
> >
> >
> >
> >
> > --
> > View this message in context:
> http://lucene.472066.n3.nabble.com/Solr-Fields-Multilingue-tp4144223p4144781.html
>
> > Sent from the Solr - User mailing list archive at Nabble.com.
>
>




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Fields-Multilingue-tp4144223p4144805.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Integrating solr with Hadoop

2014-06-30 Thread Erick Erickson
Whoa! You're confusing a couple of things I think.

The only real connection Solr <-> Hadoop _may_
be that Solr can have its indexes stored on HDFS.
Well, you can also create map/reduce jobs that
will index the data via M/R and merge them
into a live index in Solr (assuming it's storing its
indexes there).

But this question is very confused:
"Is this a better option for large data or better
to go ahead with tomcat or jetty server with solr."

No matter what, you're still running Solr
in a tomcat or Jetty server. Hadoop has
nothing to do with that. Except, as I mentioned
earlier, the actual index _may_ be stored
on HDFS if you select the right directory
implementation in your solrconfig.xml file.
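Roughly like this, for instance (a sketch; the
HDFS URI is a placeholder):

<directoryFactory name="DirectoryFactory"
                  class="solr.HdfsDirectoryFactory">
  <str name="solr.hdfs.home">hdfs://namenode:8020/solr</str>
</directoryFactory>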

So we need a better statement of what you're
trying to accomplish before anyone can say
much useful here.

Best,
Erick

On Mon, Jun 30, 2014 at 2:19 AM, gurunath  wrote:
> Hi,
>
> I want to setup solr in production, Initially the data set i am using is of
> small scale, the size of data will grow gradually. I have heard about using
> "*Big Data Work for Hadoop and Solr*", Is this a better option for large
> data or better to go ahead with tomcat or jetty server with solr.
>
> Thanks
>
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Integrating-solr-with-Hadoop-tp4144715.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: Tomcat or Jetty to use with solr in production ?

2014-06-30 Thread Erick Erickson
The only thing I would add is that if you _already_
are a tomcat shop and have considerable
expertise running Tomcat, it might just be easier
to stick with what you know.

But if you have a choice, Jetty is where I'd go.

Best,
Erick

On Mon, Jun 30, 2014 at 4:06 AM, Otis Gospodnetic
 wrote:
> Hi Gurunath,
>
> In 90% of our engagements with various Solr customers we see Jetty, which
> we also recommend and use ourselves for Solr + our own services and
> products.
>
> Otis
> --
> Performance Monitoring * Log Analytics * Search Analytics
> Solr & Elasticsearch Support * http://sematext.com/
>
>
>
> On Mon, Jun 30, 2014 at 5:07 AM, gurunath  wrote:
>
>> Hi,
>>
>> Confused with lot of reviews on Jetty and tomcat along with solr 4.7 ?, Is
>> there any better option for production. want to know the complexity's with
>> tomcat and jetty in future, as i want to cluster with huge data on solr.
>>
>> Thanks
>>
>>
>>
>> --
>> View this message in context:
>> http://lucene.472066.n3.nabble.com/Tomcat-or-Jetty-to-use-with-solr-in-production-tp4144712.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>


Re: Integrating solr with Hadoop

2014-06-30 Thread Shawn Heisey
On 6/30/2014 3:19 AM, gurunath wrote:
> I want to setup solr in production, Initially the data set i am using is of
> small scale, the size of data will grow gradually. I have heard about using
> "*Big Data Work for Hadoop and Solr*", Is this a better option for large
> data or better to go ahead with tomcat or jetty server with solr.

Regardless of whether you integrate with hadoop or not, running Solr
requires a servlet container, like jetty (included in the download),
tomcat, or one of the other choices available.  The jetty that's
included in Solr is strongly recommended, unless you already have
extensive experience with another servlet container.

For Solr 5.0, that requirement is expected to disappear -- Solr will
hopefully be an actual application, not a webapp that requires a servlet
container.

Thanks,
Shawn



Re: Integrating solr with Hadoop

2014-06-30 Thread Jay Vyas
Minor clarification:

The storage of indices uses the Hadoop FileSystem API, not HDFS specifically,
so the connection is actually not to HDFS... Solr can distribute indices for
failover/reliability/scaling to any HCFS-compliant filesystem.



> On Jun 30, 2014, at 11:55 AM, Erick Erickson  wrote:
> 
> Whoa! You're confusing a couple of things I think.
> 
> The only real connection Solr <-> Hadoop _may_
> be that Solr can have its indexes stored on HDFS.
> Well, you can also create map/reduce jobs that
> will index the data via M/R and merge them
> into a live index in Solr (assuming it's storing its
> indexes there).
> 
> But this question is very confused:
> "Is this a better option for large data or better
> to go ahead with tomcat or jetty server with solr."
> 
> No matter what, you're still running Solr
> in a tomcat or Jetty server. Hadoop has
> nothing to do with that. Except, as I mentioned
> earlier, the actual index _may_ be stored
> on HDFS if you select the right directory
> implementation in your solrconfig.xml file.
> 
> So we need a better statement of what you're
> trying to accomplish before anyone can say
> much useful here.
> 
> Best,
> Erick
> 
>> On Mon, Jun 30, 2014 at 2:19 AM, gurunath  wrote:
>> Hi,
>> 
>> I want to setup solr in production, Initially the data set i am using is of
>> small scale, the size of data will grow gradually. I have heard about using
>> "*Big Data Work for Hadoop and Solr*", Is this a better option for large
>> data or better to go ahead with tomcat or jetty server with solr.
>> 
>> Thanks
>> 
>> 
>> 
>> 
>> --
>> View this message in context: 
>> http://lucene.472066.n3.nabble.com/Integrating-solr-with-Hadoop-tp4144715.html
>> Sent from the Solr - User mailing list archive at Nabble.com.


Re: How to integrate nlp in solr

2014-06-30 Thread Aman Tandon
Hi Alex,

I was trying to learn from these tutorials:
http://www.slideshare.net/teofili/natural-language-search-in-solr &
https://wiki.apache.org/solr/OpenNLP. The latter explains a bit, but there is
no real demo. E.g., for the query "I want blue color college bags", how would
NLP process it and how would it search? There is no brief explanation of this
out there. I will be thankful if you can help me with it.

With Regards
Aman Tandon


On Mon, Jun 30, 2014 at 6:38 AM, Alexandre Rafalovitch 
wrote:

> On Sun, Jun 29, 2014 at 10:19 PM, Aman Tandon 
> wrote:
> > the appropriate results
> What are those specifically? You need to be a bit more precise about
> what you are trying to achieve. Otherwise, there are too many NLP
> branches and too many approaches.
>
> Regards,
>Alex.
> Personal website: http://www.outerthoughts.com/
> Current project: http://www.solr-start.com/ - Accelerating your Solr
> proficiency
>


ANNOUNCE: Apache Solr Reference Guide for Solr 4.9 available

2014-06-30 Thread Cassandra Targett
The Lucene PMC is pleased to announce the availability of the Apache Solr
Reference Guide for Solr 4.9. The 408 page PDF is the definitive user
manual for Solr 4.9.

The Solr Reference Guide can be downloaded from the Apache mirror network:

https://www.apache.org/dyn/closer.cgi/lucene/solr/ref-guide/

Cassandra


Re: CollapsingQParserPlugin throws Exception when useFilterForSortedQuery=true

2014-06-30 Thread Joel Bernstein
Sure, go ahead and create the ticket. I think there is more we can do here as
well. I suspect we can get the CollapsingQParserPlugin to work with
useFilterForSortedQuery=true if scoring is not needed for the collapse.
I'll take a closer look at this.

Joel Bernstein
Search Engineer at Heliosearch


On Mon, Jun 30, 2014 at 1:43 AM, Umesh Prasad  wrote:

> Hi Joel,
> Thanks a lot for the clarification. An error message would indeed be a
> good thing. Should I open a JIRA item for this?
>
>
>
> On 28 June 2014 19:08, Joel Bernstein  wrote:
>
> > OK, I see the problem. When you use
> > <useFilterForSortedQuery>true</useFilterForSortedQuery>,
> > Solr builds a docSet in a way that seems to be
> > incompatible with the CollapsingQParserPlugin. With
> > <useFilterForSortedQuery>true</useFilterForSortedQuery>,
> > Solr doesn't run the main query again when
> > collecting the DocSet. The getDocSetScore() method is expecting the main
> > query to be present, because the CollapsingQParserPlugin may need the scores
> > generated from the main query, to select the group head.
> >
> > I think trying to make
> > <useFilterForSortedQuery>true</useFilterForSortedQuery>
> > compatible with the CollapsingQParserPlugin is
> > probably not possible. So, a nice error message would be a good thing.
> >
> > Joel Bernstein
> > Search Engineer at Heliosearch
> >
> >
> > On Tue, Jun 24, 2014 at 3:31 AM, Umesh Prasad 
> > wrote:
> >
> > > Hi ,
> > > Found another bug with CollapsingQParserPlugin. Not a critical one.
> > >
> > > It throws an exception when used with
> > >
> > > <useFilterForSortedQuery>true</useFilterForSortedQuery>
> > >
> > > Patch attached (against 4.8.1 but reproducible in other branches also)
> > >
> > >
> > > 518 T11 C0 oasc.SolrCore.execute [collection1] webapp=null path=null
> > >
> >
> params={q=*%3A*&fq=%7B%21collapse+field%3Dgroup_s%7D&defType=edismax&bf=field%28test_ti%29}
> > > hits=2 status=0 QTime=99
> > > 4557 T11 C0 oasc.SolrCore.execute [collection1] webapp=null path=null
> > >
> >
> params={q=*%3A*&fq=%7B%21collapse+field%3Dgroup_s+nullPolicy%3Dexpand+min%3Dtest_tf%7D&defType=edismax&bf=field%28test_ti%29&sort=}
> > > hits=4 status=0 QTime=15
> > > 4587 T11 C0 oasc.SolrException.log ERROR
> > > java.lang.UnsupportedOperationException: Query  does not implement
> > > createWeight
> > > at org.apache.lucene.search.Query.createWeight(Query.java:80)
> > > at
> > >
> >
> org.apache.lucene.search.IndexSearcher.createNormalizedWeight(IndexSearcher.java:684)
> > > at
> > > org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:297)
> > > at
> > >
> >
> org.apache.solr.search.SolrIndexSearcher.getDocSetScore(SolrIndexSearcher.java:879)
> > > at
> > >
> >
> org.apache.solr.search.SolrIndexSearcher.getDocSet(SolrIndexSearcher.java:902)
> > > at
> > >
> >
> org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1381)
> > > at
> > >
> >
> org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:478)
> > > at
> > >
> >
> org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:461)
> > > at
> > >
> >
> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:218)
> > > at
> > >
> >
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
> > > at org.apache.solr.core.SolrCore.execute(SolrCore.java:1952)
> > > at org.apache.solr.util.TestHarness.query(TestHarness.java:295)
> > > at org.apache.solr.util.TestHarness.query(TestHarness.java:278)
> > > at
> > org.apache.solr.SolrTestCaseJ4.assertQ(SolrTestCaseJ4.java:676)
> > > at
> > org.apache.solr.SolrTestCaseJ4.assertQ(SolrTestCaseJ4.java:669)
> > > at
> > >
> >
> org.apache.solr.search.TestCollapseQParserPlugin.testCollapseQueries(TestCollapseQParserPlugin.java:106)
> > > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> > > at
> > >
> >
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> > > at
> > >
> >
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> > > at java.lang.reflect.Method.invoke(Method.java:606)
> > > at
> > >
> >
> com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1618)
> > > at
> > >
> >
> com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:827)
> > > at
> > >
> >
> com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:863)
> > > at
> > >
> >
> com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:877)
> > > at
> > >
> >
> com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
> > > at
> > >
> >
> org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
> > > at
> > >
> >
> org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51)
> > > at
> > >
> >
> org.apache.l

Re: solrcloud "indexing completed" event

2014-06-30 Thread Erick Erickson
The paradigm is different. In SolrCloud when a client sends an indexing
request to any node in the system, when the response comes back all the
nodes (leaders, followers, etc) have _all_ received the update and
processed it. So you don't have to care in the same way.

As far as different segments, versions, and all that: this is entirely
expected, considering the above. Packet->leader, leader->follower. Each of
them is independently indexing the documents; there is no replication. So,
since the two servers started at different times, things like the autocommit
interval can kick in at different times and the indexes diverge in terms of
segment counts, version numbers, whatever. They'll return the same
_documents_, but the underlying index structures will differ.

FWIW,
Erick

On Mon, Jun 30, 2014 at 7:55 AM, Giovanni Bricconi
 wrote:
> Hello
>
> I have one application that queries solr; when the index version changes
> this application has to redo some tasks.
>
> Since I have more than one solr server, I would like to start these tasks
> when all solr nodes are synchronized.
>
> With master/slave configuration the application simply watched
> http://myhost:8080/solr/admin/cores?action=STATUS&core=0bis
> on each solr node and checked that the commit time msec was equal. When the
> time changes and becomes equal on all the nodes the replication is complete
> and it is safe to restart the tasks.
>
> Now I would like to switch to a solrcloud configuration, splitting the core
> 0bis in 3 shards, with 2 replicas for each shard.
>
> After refeeding the collection I tried the same approach calling
>
> http://myhost:8080/solr/admin/cores?action=STATUS&core=0bis_shard3_replica2
>
> for each core of the collection, but with suprise I have found that on the
> same stripe the version of the index, the number of segments and even the
> commit time msec was different!!
>
> I was thinking that it was possible to check some parameter on each
> stripe's core to check that everithing was up to date, but this does not
> seem to be true.
>
> Is it possible somehow to capture the "commit done on every core of the
> collection" event?
>
> Thank you
>
> Giovanni


ExtractingRequestHandler - extracted files caching?

2014-06-30 Thread Gili Nachum
Hello,

I plan to use ExtractingRequestHandler to index binary files' text plus app
metadata (like literal.downloadCount and others) into a single document.
I expect the app metadata to change much more often than the binary file
itself. I would hate to have to extract text from the binary file whenever
I need to re-index the doc because of a metadata change.
Is there a some extraction caching solution for files content? or some
other workaround?

Thanks!


Strategy for removing an active shard from zookeeper

2014-06-30 Thread tomasv
Hello All, 
(I'm a newbie, so if my terminology is incorrect or my concepts are wrong,
please point me in the right direction)(This is the first of several
questions to come)

I've inherited a SOLR 4 cloud installation and we're having some issues with
disk space on one of our shards.

We currently have 64 servers serving a collection. The collection is managed
by a zookeeper instance. There are two servers for each shard (32 replicated
shards).

We have a service that is constantly running and inserting new records into
our collection as we get new data to be indexed.

One of our shards is growing (on disk)  disproportionately  quickly. When
the disk gets full, we start getting 500-series errors from the SOLR system
and our websites start to fail.

Currently, when we start seeing these errors, and IT sees that the disk is
full on this particular server, the folks in IT delete the /data directory
and restart the server (linux based). This has the effect of causing the
shard to reboot and re-load itself from its paired partner.

But I would expect that there is a more elegant way to recover from this
event.

Can anyone point me to a strategy that may be used in an instance such as
this? Should we be taking steps to save the indexed information prior to
restarting the server (more on this in a separate question). Should we be
backing up something (anything) prior to the restart?

(I'm still going through the SOLR wiki; so if the answer is there a link is
appreciated).

Thanks!





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Strategy-for-removing-an-active-shard-from-zookeeper-tp4144892.html
Sent from the Solr - User mailing list archive at Nabble.com.


Indexing non-stored fields

2014-06-30 Thread tomasv
Hello All, (warning: newbie question)

In our schema.xml we have defined many fields such as:


Other fields are defined as this:


Q: If my server is restarted/ rebooted, will I still be able to search for
documents using the "firstname" field? Or will my records need to be
re-indexed before I can search by first name?
It seems that after a re-boot, I can search for the "stored='true'" fields
but not the "stored='false'" fields.

Am I interpreting this correctly? or am I missing something?

Thanks for any help or links! (Still working through the wiki)



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Indexing-non-stored-fields-tp4144893.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Indexing non-stored fields

2014-06-30 Thread Shawn Heisey
> Hello All, (warning: newbie question)
>
> In our schema.xml we have defined many fields such as:
> 
>
> Other fields are defined as this:
> 
>
> Q: If my server is restarted/ rebooted, will I still be able to search for
> documents using the "firstname" field? Or will my records need to be
> re-indexed before I can search by first name?
> It seems that after a re-boot, I can search for the "stored='true'" fields
> but not the "stored='false'" fields.
>
> Am I interpreting this correctly? or am I missing something?

Fields that are not stored simply mean that they will not be returned in
search results. If they are indexed, then you will be able to search on
those fields.

This should be the case before or after a restart.

Thanks,
Shawn





Re: Indexing non-stored fields

2014-06-30 Thread tomasv
Thanks for the quick response.

Follow-up newbie question:
If the fields are not stored, how is the server able to search for them
after a restart? Where does it get the data to be searched?

Example:  "bob" (firstname) is indexed but not stored. After initial
indexing, I query for "firstname:(bob)" and I get my document back. But if
I restart the server, where does the server go to retrieve information that
will allow me to query for "bob" once again? It would seem that "bob" got
stored someplace if I can query on it after a restart.

My untrained mind thinks that searching for "firstname:(bob)" (after a
restart) will fail, but that searching for "recordid:(12345)" (in my
original example) will succeed since it was indexed+stored.

(stored + indexed makes total sense to me; it's the indexed but NOT stored
that I can't get my head around).

Thanks!



On Mon, Jun 30, 2014 at 5:05 PM, Shawn Heisey-4 [via Lucene] <
ml-node+s472066n4144894...@n3.nabble.com> wrote:

> > Hello All, (warning: newbie question)
> >
> > In our schema.xml we have defined many fields such as:
> > 
> >
> > Other fields are defined as this:
> > 
> >
> > Q: If my server is restarted/ rebooted, will I still be able to search
> for
> > documents using the "firstname" field? Or will my records need to be
> > re-indexed before I can search by first name?
> > It seems that after a re-boot, I can search for the "stored='true'"
> fields
> > but not the "stored='false'" fields.
> >
> > Am I interpreting this correctly? or am I missing something?
>
> Fields that are not stored simply mean that they will not be returned in
> search results. If they are indexed, then you will be able to search on
> those fields.
>
> This should be the case before or after a restart.
>
> Thanks,
> Shawn
>
>
>
>
>
> --
>  If you reply to this email, your message will be added to the discussion
> below:
>
> http://lucene.472066.n3.nabble.com/Indexing-non-stored-fields-tp4144893p4144894.html
>  To unsubscribe from Indexing non-stored fields, click here
> 
> .
> NAML
> 
>



-- 
/*---
 * Tomas at Home
 * dadk...@gmail.com
 * -*/




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Indexing-non-stored-fields-tp4144893p4144895.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: CopyField can't copy analyzers and Filters

2014-06-30 Thread Steve McKay
Three fields: AllChamp_ar, AllChamp_fr, AllChamp_en. Then query them with 
dismax.
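For instance, something like this sketch (field names as above; URL-encode the
spaces in qf when sending over HTTP):

q=présenton&defType=dismax&qf=AllChamp_en AllChamp_fr AllChamp_ar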

On Jun 30, 2014, at 11:53 AM, benjelloun  wrote:

> here is my schema: 
> 
>  required="false" stored="false"/>
>  required="false" multiValued="true"/>
> 
>  required="false" multiValued="true"/>
> 
>  required="false" multiValued="true"/>
> 
> 
> 
> 
> 
> when i index documents then search on this field "AllChamp" that don't do
> analyzer and filter.
> I know that CopyField can't copy analyzers and Filters, so how to keep
> analyzer and filter on Field: "AllChamp"?
> 
> Exemple: 
> 
> I search for : AllChamp:presenton  --> num result=0 
>   AllChamp:présenton  --> num result=1 
> 
> thanks for help, 
> best regards, 
> Anass BENJELLOUN 
> 
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/CopyField-can-t-copy-analyzers-and-Filters-tp4144803.html
> Sent from the Solr - User mailing list archive at Nabble.com.



Re: Tomcat or Jetty to use with solr in production ?

2014-06-30 Thread Steve McKay
Seconding this. Solr works fine on Jetty. Solr also works fine on Tomcat. The 
Solr community largely uses Jetty, so most of the resources on the Web are for 
running Solr on Jetty, but if you have a reason to use Tomcat and know what 
you're doing then Tomcat is a fine choice.

On Jun 30, 2014, at 11:58 AM, Erick Erickson  wrote:

> The only thing I would add is that if you _already_
> are a tomcat shop and have considerable
> expertise running Tomcat, it might just be easier
> to stick with what you know.
> 
> But if you have a choice, Jetty is where I'd go.
> 
> Best,
> Erick
> 
> On Mon, Jun 30, 2014 at 4:06 AM, Otis Gospodnetic
>  wrote:
>> Hi Gurunath,
>> 
>> In 90% of our engagements with various Solr customers we see Jetty, which
>> we also recommend and use ourselves for Solr + our own services and
>> products.
>> 
>> Otis
>> --
>> Performance Monitoring * Log Analytics * Search Analytics
>> Solr & Elasticsearch Support * http://sematext.com/
>> 
>> 
>> 
>> On Mon, Jun 30, 2014 at 5:07 AM, gurunath  wrote:
>> 
>>> Hi,
>>> 
>>> Confused with lot of reviews on Jetty and tomcat along with solr 4.7 ?, Is
>>> there any better option for production. want to know the complexity's with
>>> tomcat and jetty in future, as i want to cluster with huge data on solr.
>>> 
>>> Thanks
>>> 
>>> 
>>> 
>>> 



Re: Indexing non-stored fields

2014-06-30 Thread Steve McKay
Stored doesn't mean "stored to disk", more like "stored verbatim". When you 
index a field, Solr analyzes the field value and makes it part of the index. 
The index is persisted to disk when you commit, which is why it sticks around 
after a restart. Searching the index, mapping from search terms to doc ids, is 
very fast. However, the index is very very bad at going in reverse, from doc 
ids to terms. That's where stored fields come in. When you store a field, Solr 
takes the field value and stores the entire value separate from the index. This 
makes it trivial to get the value for a particular doc id, but it's terrible 
for searching.

So the stored attribute and the indexed attribute have different purposes. 
Indexed means you want to be able to search on the value, and stored means you 
want to be able to see the value in search results.
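
To make it concrete, a schema sketch (the type names are assumptions):

    <!-- searchable, but not returned in results -->
    <field name="firstname" type="text_general" indexed="true" stored="false"/>

    <!-- searchable and returned in results -->
    <field name="recordid" type="string" indexed="true" stored="true"/>

A query like q=firstname:bob keeps working before and after a restart,
because the indexed terms are persisted along with the rest of the index.
Solr just can't show you the original firstname value in the results,
because no stored copy of it was kept.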

On Jun 30, 2014, at 8:15 PM, tomasv  wrote:

> Thanks for the quick response.
> 
> Follow-up newbie question:
> If the fields are not stored, how is the server able to search for them
> after a restart? Where does it get the data to be searched?
> 
> Example:  "bob" (firstname) is indexed but not stored. After initial
> indexing, I query for "firstname:(bob)" and I get my document back. But if
> I restart the server, where does the server go to retrieve information that
> will allow me to query for "bob" once again? It would seem that "bob" got
> stored someplace if I can query on it after a restart.
> 
> My untrained mind thinks that searching for "firstname:(bob)" (after a
> restart) will fail, but that searching for "recordid:(12345)" (in my
> original example) will succeed since it was indexed+stored.
> 
> (stored + indexed makes total sense to me; it's the indexed but NOT stored
> that I can't get my head around).
> 
> Thanks!
> 
> 
> 
> On Mon, Jun 30, 2014 at 5:05 PM, Shawn Heisey wrote:
> 
>>> Hello All, (warning: newbie question)
>>> 
>>> In our schema.xml we have defined many fields such as:
>>> <field name="firstname" type="..." indexed="true" stored="false"/>
>>> 
>>> Other fields are defined as this:
>>> <field name="recordid" type="..." indexed="true" stored="true"/>
>>> 
>>> Q: If my server is restarted/ rebooted, will I still be able to search
>> for
>>> documents using the "firstname" field? Or will my records need to be
>>> re-indexed before I can search by first name?
>>> It seems that after a re-boot, I can search for the "stored='true'"
>> fields
>>> but not the "stored='false'" fields.
>>> 
>>> Am I interpreting this correctly? or am I missing something?
>> 
>> Fields that are not stored simply mean that they will not be returned in
>> search results. If they are indexed, then you will be able to search on
>> those fields.
>> 
>> This should be the case before or after a restart.
>> 
>> Thanks,
>> Shawn
> 
> 
> 
> -- 
> /*---
> * Tomas at Home
> * dadk...@gmail.com
> * -*/
> 
> 
> 
> 



Re: Indexing non-stored fields

2014-06-30 Thread tomasv
Thank you Very much for that explanation. Well done!
-tomas
On Jun 30, 2014 5:55 PM, Steve McKay wrote:

> Stored doesn't mean "stored to disk", more like "stored verbatim". When
> you index a field, Solr analyzes the field value and makes it part of the
> index. The index is persisted to disk when you commit, which is why it
> sticks around after a restart. Searching the index, mapping from search
> terms to doc ids, is very fast. However, the index is very very bad at
> going in reverse, from doc ids to terms. That's where stored fields come
> in. When you store a field, Solr takes the field value and stores the
> entire value separate from the index. This makes it trivial to get the
> value for a particular doc id, but it's terrible for searching.
>
> So the stored attribute and the indexed attribute have different purposes.
> Indexed means you want to be able to search on the value, and stored means
> you want to be able to see the value in search results.
>
> On Jun 30, 2014, at 8:15 PM, tomasv wrote:
>
> > Thanks for the quick response.
> >
> > Follow-up newbie question:
> > If the fields are not stored, how is the server able to search for them
> > after a restart? Where does it get the data to be searched?
> >
> > Example:  "bob" (firstname) is indexed but not stored. After initial
> > indexing, I query for "firstname:(bob)" and I get my document back. But
> if
> > I restart the server, where does the server go to retrieve information
> that
> > will allow me to query for "bob" once again? It would seem that "bob"
> got
> > stored someplace if I can query on it after a restart.
> >
> > My untrained mind thinks that searching for "firstname:(bob)" (after a
> > restart) will fail, but that searching for "recordid:(12345)" (in my
> > original example) will succeed since it was indexed+stored.
> >
> > (stored + indexed makes total sense to me; it's the indexed but NOT
> stored
> > that I can't get my head around).
> >
> > Thanks!
> >
> >
> >
> > On Mon, Jun 30, 2014 at 5:05 PM, Shawn Heisey wrote:
> >
> >>> Hello All, (warning: newbie question)
> >>>
> >>> In our schema.xml we have defined many fields such as:
> >>> <field name="firstname" type="..." indexed="true" stored="false"/>
> >>>
> >>> Other fields are defined as this:
> >>> <field name="recordid" type="..." indexed="true" stored="true"/>
> >>>
> >>> Q: If my server is restarted/ rebooted, will I still be able to search
> >> for
> >>> documents using the "firstname" field? Or will my records need to be
> >>> re-indexed before I can search by first name?
> >>> It seems that after a re-boot, I can search for the "stored='true'"
> >> fields
> >>> but not the "stored='false'" fields.
> >>>
> >>> Am I interpreting this correctly? or am I missing something?
> >>
> >> Fields that are not stored simply mean that they will not be returned
> in
> >> search results. If they are indexed, then you will be able to search on
> >> those fields.
> >>
> >> This should be the case before or after a restart.
> >>
> >> Thanks,
> >> Shawn
> >
> >
> >
> > --
> > /*---
> > * Tomas at Home
> > * dadk...@gmail.com
> > * -*/
> >
> >
> >
> >

Re: SolrCloud leaders using more disk space

2014-06-30 Thread Greg Pendlebury
Thanks for the reply Tim.

>> "Can you diff the listings of the index data directories on a leader vs.
replica?"

It was a good tip, and mirrors some stuff we have been exploring in house
as well. The leaders all have additional 'index.' directories on disk,
but we have come to the conclusion that this is a coincidence and not
related to the fact that they are leaders.
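
Presumably these are the index.<timestamp> directories that a recovery
attempt creates alongside the live index (index.properties records which
directory is current), so leftovers from a recovery that never completed
just sit there using disk. A quick way to spot them, paths assumed:

    ls solr/collection1/data/
    # index/  index.20140630123456789/  index.properties  tlog/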

Current theory is that they are the result of an upgrade rehearsal that was
performed before launch where the cluster was split into two on different
versions of Solr and different ZK paths. I suspect that whilst the ops team
where doing the deployment there were a number of server restarts that
triggered leader elections and recovery events that weren't allowed to
complete gracefully, leaving the old data on disk.

The coincidence is simply that the ops team did all their initial practice
stuff on the same 3 hosts, which later became our leaders. I've found a few
similar small issues on hosts 4-6, and none at all on hosts 7-9.

I'm hoping we get a chance to test all this soon, but we need to re-jig our
test systems first, since they don't have any redundancy depth to them
right now.

Ta,
Greg


On 28 June 2014 02:59, Timothy Potter  wrote:

> Hi Greg,
>
> Sorry for the slow response. The general thinking is that you
> shouldn't worry about which nodes host leaders vs. replicas because A)
> that can change, and B) as you say, the additional responsibilities
> for leader nodes is quite minimal (mainly per-doc version management
> and then distributing updates to replicas). The segment merging all
> happens at the Lucene level, which has no knowledge of SolrCloud
> leaders / replicas. Since this is SolrCloud, all nodes pull the config
> from ZooKeeper so should be running the same settings. Can you diff
> the listings of the index data directories on a leader vs. replica?
> Might give us some insights to what files the leader has that the
> replicas don't have.
>
> Cheers,
> Tim
>
> On Tue, Jun 3, 2014 at 8:32 PM, Greg Pendlebury
>  wrote:
> > Hi all,
> >
> > We launched our new production instance of SolrCloud last week and since
> > then have noticed a trend with regards to disk usage. The non-leader
> > replicas all seem to be self-optimizing their index segments as expected,
> > but the leaders have (on average) around 33% more data on disk. My
> > assumption is that leader's are not self-optimising (or not to the same
> > extent)... but it is still early days of course.
> >
> > If it helps, there are 45 JVMs in the cloud, with 15 shards and 3
> replicas
> > per shard. Each non-leader shard is sitting at between 59GB and 87GB on
> > their SSD, but the leaders are between 84GB and 116GB.
> >
> > We have pretty much constant read and write traffic 24x7, with just
> 'slow'
> > periods overnight when write traffic is < 1 document per second and
> > searches are between 1 and 2 per second. Is this light level of traffic
> > still too much for the leaders to self-optimise?
> >
> > I'd also be curious to hear about what others are doing in terms of
> > operating procedures. Before launch we load tested what would happen if we
> > turned off JVMs and forced recovery events. I know that these things all
> > work, just that customers will experience slower search responses whilst
> > they occur. For example, a restore from a leader to a replica under load
> > testing for us takes around 30 minutes and response times drop from
> around
> > 200-300ms average to 1.5s average.
> >
> > Bottleneck appears to be network I/O on the servers. We haven't explored
> > whether this is specific to the servers replicating, or saturation of the
> > of the infrastructure that all the servers share, because...
> >
> > This performance is acceptable for us, but I'm not sure if I'd like to
> > force that event to occur unless required... this is following the line
> of
> > reasoning proposed internally that we should periodically rotate leaders
> by
> > turning them off briefly. We aren't going to do that unless we have a
> > strong reason though. Does anyone try to manipulate production instances
> > that way?
> >
> > Vaguely related to this is leader distribution. We have 9 physical
> servers
> > and 5 JVMs running on each server. By virtue of the deployment procedures
> > the first 3 servers to come online are all running 5 leaders each. Is
> there
> > any merit in 'moving' these around (by reboots)?
> >
> > Our planning up to launch was based on lots of mailing list response we'd
> > seen that indicated leaders had no significant performance difference to
> > normal replicas, and all of our testing has agreed with that. The disk
> size
> > 'issue' (which we aren't worried about... yet. It hasn't been in prod
> long
> > enough to know for certain) may be the only thing we've seen so far.
> >
> > Ta,
> > Greg
>


Re: Strategy for removing an active shard from zookeeper

2014-06-30 Thread Anshum Gupta
You should use the DELETEREPLICA Collections API:
https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-api9

As of the last release, I don't think it deletes the index directory
but I remember there was a JIRA for the same.
For now you could perhaps use this API and follow it up with manually
deleting the directory after that. This should help you maintain the
sanity of the SolrCloud state.
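
For example (collection, shard and replica names assumed):

    http://localhost:8983/solr/admin/collections?action=DELETEREPLICA&collection=mycollection&shard=shard1&replica=core_node3

The replica parameter is the core node name for that replica as it appears
in clusterstate.json.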


On Mon, Jun 30, 2014 at 8:45 PM, tomasv  wrote:
> Hello All,
> (I'm a newbie, so if my terminology is incorrect or my concepts are wrong,
> please point me in the right direction)(This is the first of several
> questions to come)
>
> I've inherited a SOLR 4 cloud installation and we're having some issues with
> disk space on one of our shards.
>
> We currently have 64 servers serving a collection. The collection is managed
> by a zookeeper instance. There are two servers for each shard (32 replicated
> shards).
>
> We have a service that is constantly running and inserting new records into
> our collection as we get new data to be indexed.
>
> One of our shards is growing (on disk) disproportionately quickly. When
> the disk gets full, we start getting 500-series errors from the SOLR system
> and our websites start to fail.
>
> Currently, when we start seeing these errors, and IT sees that the disk is
> full on this particular server, the folks in IT delete the /data directory
> and restart the server (linux based). This has the effect of causing the
> shard to reboot and re-load itself from its paired partner.
>
> But I would expect that there is a more elegant way to recover from this
> event.
>
> Can anyone point me to a strategy that may be used in an instance such as
> this? Should we be taking steps to save the indexed information prior to
> restarting the server (more on this in a separate question). Should we be
> backing up something (anything) prior to the restart?
>
> (I'm still going through the SOLR wiki; so if the answer is there a link is
> appreciated).
>
> Thanks!
>
>
>
>
>



-- 

Anshum Gupta
http://www.anshumgupta.net


Re: ExtractingRequestHandler - extracted files caching?

2014-06-30 Thread Alexandre Rafalovitch
Under the covers, Tika is used. You can use Tika yourself on the
client side and cache its output in a database or a text file. Then,
send that to Solr instead. This puts less load on Solr as well.
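
A minimal sketch of the client-side extraction, assuming Tika's
AutoDetectParser and a made-up file path:

    import java.io.FileInputStream;
    import java.io.InputStream;

    import org.apache.tika.metadata.Metadata;
    import org.apache.tika.parser.AutoDetectParser;
    import org.apache.tika.parser.ParseContext;
    import org.apache.tika.sax.BodyContentHandler;

    public class ExtractOnce {
        public static void main(String[] args) throws Exception {
            AutoDetectParser parser = new AutoDetectParser();
            // -1 disables the default write limit on extracted characters
            BodyContentHandler handler = new BodyContentHandler(-1);
            Metadata metadata = new Metadata();
            try (InputStream in = new FileInputStream("/path/to/binary.pdf")) {
                parser.parse(in, handler, metadata, new ParseContext());
            }
            // Cache this text (e.g. keyed by a file checksum) and re-send it
            // to Solr whenever only the app metadata changes.
            System.out.println(handler.toString());
        }
    }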

Or you can use atomic update, but then all the primary (not copyField)
fields must be stored="true".
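
With that in place, a metadata-only change can be sent as a partial update,
e.g. (uniqueKey and field names assumed):

    <add>
      <doc>
        <field name="id">file-123</field>
        <field name="downloadCount" update="set">42</field>
      </doc>
    </add>

Solr rebuilds the rest of the document from its stored fields, so the
extracted text never has to be re-sent.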

Regards,
   Alex.
Personal website: http://www.outerthoughts.com/
Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency


On Tue, Jul 1, 2014 at 5:55 AM, Gili Nachum  wrote:
> Hello,
>
> I plan to use ExtractingRequestHandler to index binary files text plus app
> metadata (like literal.downloadCount and others) into a single document.
> I expect the app metadata to change much more often than the binary file
> itself. I would hate to have to extract text from the binary file whenever
> I need to re-index the doc because of a metadata change.
> Is there a some extraction caching solution for files content? or some
> other workaround?
>
> Thanks!


Re: ExtractingRequestHandler - extracted files caching?

2014-06-30 Thread Erick Erickson
Here's an example of what Alexandre is
talking about:
http://searchhub.org/2012/02/14/indexing-with-solrj/

It mixes database fetching in with the
Tika processing, but that should be pretty easy
to pull out.
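
The Solr-side piece boils down to something like this (the URL and field
names are assumptions):

    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class SendExtracted {
        public static void main(String[] args) throws Exception {
            HttpSolrServer solr =
                new HttpSolrServer("http://localhost:8983/solr/collection1");
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "file-123");          // hypothetical uniqueKey
            doc.addField("text", "...text extracted by Tika...");
            doc.addField("downloadCount", 42);       // the app metadata
            solr.add(doc);
            solr.commit();
            solr.shutdown();
        }
    }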

Best,
Erick

On Mon, Jun 30, 2014 at 8:21 PM, Alexandre Rafalovitch
 wrote:
> Under the covers, Tika is used. You can use Tika yourself on the
> client side and cache its output in a database or a text file. Then,
> send that to Solr instead. This puts less load on Solr as well.
>
> Or you can use atomic update, but then all the primary (not copyField)
> fields must be stored="true".
>
> Regards,
>Alex.
> Personal website: http://www.outerthoughts.com/
> Current project: http://www.solr-start.com/ - Accelerating your Solr 
> proficiency
>
>
> On Tue, Jul 1, 2014 at 5:55 AM, Gili Nachum  wrote:
>> Hello,
>>
>> I plan to use ExtractingRequestHandler to index binary files text plus app
>> metadata (like literal.downloadCount and others) into a single document.
>> I expect the app metadata to change much more often than the binary file
>> itself. I would hate to have to extract text from the binary file whenever
>> I need to re-index the doc because of a metadata change.
>> Is there a some extraction caching solution for files content? or some
>> other workaround?
>>
>> Thanks!


Re: Integrating solr with Hadoop

2014-06-30 Thread gurunath
Thanks everybody, I was confused before. Now, if I am not wrong, I have to
run Solr in Tomcat or Jetty, and I can use the Hadoop file system (HDFS) to
store the index files, where Solr by default uses the local filesystem
(NTFS etc.). So my question is: can I have the configuration mentioned
below?

1. Solr 4.7 + Tomcat 7 + Apache ZooKeeper + Hadoop.
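
Something like this in solrconfig.xml is what I have in mind, if I read the
docs right (the namenode host and paths below are just placeholders):

    <directoryFactory name="DirectoryFactory" class="solr.HdfsDirectoryFactory">
      <str name="solr.hdfs.home">hdfs://namenode:8020/solr</str>
      <str name="solr.hdfs.confdir">/etc/hadoop/conf</str>
    </directoryFactory>

plus <lockType>hdfs</lockType> in the indexConfig section.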

Thanks
Guru Pai.


