Re: Integrating solr with Hadoop

2014-06-30 Thread gurunath
Thanks everybody, I was confused. Now, if I am not wrong, I have to use Solr with Tomcat or Jetty, and I can use the Hadoop file system to store the index files, where Solr by default uses NTFS etc. So my question is: can I have the configuration mentioned below? 1. Solr 4.7 + Tomcat 7 + Apache zookee

Re: ExtractingRequestHandler - extracted files caching?

2014-06-30 Thread Erick Erickson
Here's an example of what Alexandre is talking about: http://searchhub.org/2012/02/14/indexing-with-solrj/ It mixes database fetching in with the Tika processing, but that should be pretty easy to pull out. Best, Erick On Mon, Jun 30, 2014 at 8:21 PM, Alexandre Rafalovitch wrote: > Under the co

Re: ExtractingRequestHandler - extracted files caching?

2014-06-30 Thread Alexandre Rafalovitch
Under the covers, Tika is used. You can use Tika yourself on the client side and cache its output in a database or text file. Then, send that to Solr instead. This puts less load on Solr as well. Or you can use atomic updates, but then all the primary (not copyField) fields must be stored="true". Re
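A minimal Python sketch of the client-side caching idea described above, under stated assumptions: the `extract` callable stands in for whatever Tika invocation the client uses (it is a placeholder, not a real Tika API), the cache is an in-memory dict rather than a database or text file, and the field names `text` and `downloadCount` are hypothetical.

```python
import hashlib


class ExtractionCache:
    """Caches extracted text keyed by a hash of the binary content."""

    def __init__(self, extract):
        self._extract = extract  # placeholder for the real Tika call
        self._cache = {}         # content hash -> extracted text
        self.misses = 0          # number of times extraction actually ran

    def text_for(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()
        if key not in self._cache:
            self._cache[key] = self._extract(data)
            self.misses += 1
        return self._cache[key]


def build_doc(doc_id, data, metadata, cache):
    """Builds the document to send to Solr from cached text plus fresh metadata."""
    doc = {"id": doc_id, "text": cache.text_for(data)}
    doc.update(metadata)
    return doc


# Stand-in extractor for the sketch: treats the bytes as UTF-8 text.
cache = ExtractionCache(lambda raw: raw.decode("utf-8"))
pdf_bytes = b"quarterly report"

first = build_doc("doc1", pdf_bytes, {"downloadCount": 1}, cache)
second = build_doc("doc1", pdf_bytes, {"downloadCount": 2}, cache)
```

Only the metadata differs between the two documents; the expensive extraction step runs once because the file content, and therefore its hash, did not change.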

Re: Strategy for removing an active shard from zookeeper

2014-06-30 Thread Anshum Gupta
You should use the DELETEREPLICA Collections API: https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-api9 As of the last release, I don't think it deletes the index directory but I remember there was a JIRA for the same. For now you could perhaps use this API and follo
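As a hedged illustration of the Collections API call mentioned above, the following Python sketch only builds the request URL and does not send it. The host, collection, shard, and replica names are hypothetical; the parameter names (`action`, `collection`, `shard`, `replica`) should be checked against the Collections API page for the Solr version in use.

```python
from urllib.parse import urlencode


def delete_replica_url(base_url, collection, shard, replica):
    """Builds a DELETEREPLICA request URL for the Collections API."""
    params = {
        "action": "DELETEREPLICA",
        "collection": collection,
        "shard": shard,
        "replica": replica,
    }
    return f"{base_url}/admin/collections?{urlencode(params)}"


# Hypothetical values for illustration only.
url = delete_replica_url(
    "http://localhost:8983/solr", "collection1", "shard1", "core_node2"
)
print(url)
```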

Re: SolrCloud leaders using more disk space

2014-06-30 Thread Greg Pendlebury
Thanks for the reply Tim. >> "Can you diff the listings of the index data directories on a leader vs. replica?" It was a good tip, and mirrors some stuff we have been exploring in house as well. The leaders all have additional 'index.' directories on disk, but we have come to the conclusion t

Re: Indexing non-stored fields

2014-06-30 Thread tomasv
Thank you Very much for that explanation. Well done! -tomas On Jun 30, 2014 5:55 PM, "Steve McKay-4 [via Lucene]" < ml-node+s472066n4144902...@n3.nabble.com> wrote: > Stored doesn't mean "stored to disk", more like "stored verbatim". When > you index a field, Solr analyzes the field value and make

Re: Indexing non-stored fields

2014-06-30 Thread Steve McKay
Stored doesn't mean "stored to disk", more like "stored verbatim". When you index a field, Solr analyzes the field value and makes it part of the index. The index is persisted to disk when you commit, which is why it sticks around after a restart. Searching the index, mapping from search terms t
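The indexed-versus-stored distinction can be sketched with a toy inverted index in Python. This is an illustration of the concept only, not how Lucene is implemented: the `ToyIndex` class and its whitespace-and-lowercase "analysis" are assumptions made for the example.

```python
from collections import defaultdict


class ToyIndex:
    """Toy model: 'indexed' feeds the search structure, 'stored' keeps the verbatim value."""

    def __init__(self):
        self.postings = defaultdict(set)  # (field, term) -> ids of matching docs
        self.stored = {}                  # doc id -> {field: verbatim value}

    def add(self, doc_id, field, value, indexed=True, stored=False):
        if indexed:
            # Simplistic analysis: lowercase and split on whitespace.
            for term in value.lower().split():
                self.postings[(field, term)].add(doc_id)
        if stored:
            self.stored.setdefault(doc_id, {})[field] = value

    def search(self, field, term):
        return sorted(self.postings.get((field, term.lower()), set()))

    def fetch(self, doc_id):
        return self.stored.get(doc_id, {})


index = ToyIndex()
index.add(1, "id", "1", indexed=True, stored=True)
index.add(1, "firstname", "Bob", indexed=True, stored=False)

hits = index.search("firstname", "bob")  # the doc is findable by firstname
fields = index.fetch(1)                  # but firstname is not returned verbatim
```

In this toy model both structures would be written to disk on commit, which is why a search on a non-stored field still works after a restart; only the verbatim value is unavailable in results.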

Re: Tomcat or Jetty to use with solr in production ?

2014-06-30 Thread Steve McKay
Seconding this. Solr works fine on Jetty. Solr also works fine on Tomcat. The Solr community largely uses Jetty, so most of the resources on the Web are for running Solr on Jetty, but if you have a reason to use Tomcat and know what you're doing then Tomcat is a fine choice. On Jun 30, 2014, at

Re: CopyField can't copy analyzers and Filters

2014-06-30 Thread Steve McKay
Three fields: AllChamp_ar, AllChamp_fr, AllChamp_en. Then query them with dismax. On Jun 30, 2014, at 11:53 AM, benjelloun wrote: > here is my schema: > > required="false" stored="false"/> > required="false" multiValued="true"/> > > required="false" multiValued="true"/> > > required="fa
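A small Python sketch of the suggestion above: query the three per-language fields together through dismax by listing them in `qf`. The field names come from the message; the search term and the idea of building the query string with `urlencode` are assumptions for illustration.

```python
from urllib.parse import urlencode, parse_qs

# Per-language fields from the thread; each can carry its own analyzer chain.
language_fields = ["AllChamp_ar", "AllChamp_fr", "AllChamp_en"]

params = {
    "defType": "dismax",
    "q": "présenton",                 # example search term
    "qf": " ".join(language_fields),  # dismax searches across all listed fields
}
query_string = urlencode(params)
decoded = parse_qs(query_string)
```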

Re: Indexing non-stored fields

2014-06-30 Thread tomasv
Thanks for the quick response. Follow-up newbie question: If the fields are not stored, how is the server able to search for them after a restart? Where does it get the data to be searched? Example: "bob" (firstname) is indexed but not stored. After initial indexing, I query for "firstname:(bob)

Re: Indexing non-stored fields

2014-06-30 Thread Shawn Heisey
> Hello All, (warning: newbie question) > > In our schema.xml we have defined many fields such as: > > > Other fields are defined as this: > > > Q: If my server is restarted/ rebooted, will I still be able to search for > documents using the "firstname" field? Or will my records need to be > re-i

Indexing non-stored fields

2014-06-30 Thread tomasv
Hello All, (warning: newbie question) In our schema.xml we have defined many fields such as: Other fields are defined as this: Q: If my server is restarted/ rebooted, will I still be able to search for documents using the "firstname" field? Or will my records need to be re-indexed before I can

Strategy for removing an active shard from zookeeper

2014-06-30 Thread tomasv
Hello All, (I'm a newbie, so if my terminology is incorrect or my concepts are wrong, please point me in the right direction)(This is the first of several questions to come) I've inherited a SOLR 4 cloud installation and we're having some issues with disk space on one of our shards. We currently

ExtractingRequestHandler - extracted files caching?

2014-06-30 Thread Gili Nachum
Hello, I plan to use ExtractingRequestHandler to index binary files text plus app metadata (like literal.downloadCount and others) into a single document. I expect the app metadata to change much more often than the binary file itself. I would hate to have to extract text from the binary file when

Re: solrcloud "indexing completed" event

2014-06-30 Thread Erick Erickson
The paradigm is different. In SolrCloud when a client sends an indexing request to any node in the system, when the response comes back all the nodes (leaders, followers, etc) have _all_ received the update and processed it. So you don't have to care in the same way. As far as different segments,

Re: CollapsingQParserPlugin throws Exception when useFilterForSortedQuery=true

2014-06-30 Thread Joel Bernstein
Sure, go ahead and create the ticket. I think there is more we can do here as well. I suspect we can get the CollapsingQParserPlugin to work with useFilterForSortedQuery=true if scoring is not needed for the collapse. I'll take a closer look at this. Joel Bernstein Search Engineer at Heliosearch On Mon

ANNOUNCE: Apache Solr Reference Guide for Solr 4.9 available

2014-06-30 Thread Cassandra Targett
The Lucene PMC is pleased to announce the availability of the Apache Solr Reference Guide for Solr 4.9. The 408 page PDF is the definitive user manual for Solr 4.9. The Solr Reference Guide can be downloaded from the Apache mirror network: https://www.apache.org/dyn/closer.cgi/lucene/solr/ref-gui

Re: How to integrate nlp in solr

2014-06-30 Thread Aman Tandon
Hi Alex, I was trying to learn from these tutorials http://www.slideshare.net/teofili/natural-language-search-in-solr & https://wiki.apache.org/solr/OpenNLP: this one explains a bit, but a real demo is not present. e.g. query: I want blue color college bags, then how using nlp it wi

Re: Integrating solr with Hadoop

2014-06-30 Thread Jay Vyas
Minor clarification: The storage of indices uses the Hadoop file system API- not hdfs specifically - so connection is actually not to hdfs ... Solr can distribute indices for failover / reliability/ scaling to any hcfs compliant filesystem. > On Jun 30, 2014, at 11:55 AM, Erick Erickson wro

Re: Integrating solr with Hadoop

2014-06-30 Thread Shawn Heisey
On 6/30/2014 3:19 AM, gurunath wrote: > I want to setup solr in production, Initially the data set i am using is of > small scale, the size of data will grow gradually. I have heard about using > "*Big Data Work for Hadoop and Solr*", Is this a better option for large > data or better to go ahead w

Re: Tomcat or Jetty to use with solr in production ?

2014-06-30 Thread Erick Erickson
The only thing I would add is that if you _already_ are a tomcat shop and have considerable expertise running Tomcat, it might just be easier to stick with what you know. But if you have a choice, Jetty is where I'd go. Best, Erick On Mon, Jun 30, 2014 at 4:06 AM, Otis Gospodnetic wrote: > Hi G

Re: Integrating solr with Hadoop

2014-06-30 Thread Erick Erickson
Whoa! You're confusing a couple of things I think. The only real connection Solr <-> Hadoop _may_ be that Solr can have its indexes stored on HDFS. Well, you can also create map/reduce jobs that will index the data via M/R and merge them into a live index in Solr (assuming it's storing its indexes

Re: Solr Fields Multilingue

2014-06-30 Thread benjelloun
Hello, its ok i did it :) thanks 2014-06-30 17:48 GMT+02:00 Erick Erickson [via Lucene] < ml-node+s472066n4144798...@n3.nabble.com>: > Again, please open a new thread for new questions. > Do _not_ just reply-to then change the subject, it > stays in the same thread anyway. > > Best, > Erick >

CopyField can't copy analyzers and Filters

2014-06-30 Thread benjelloun
Here is my schema: When I index documents and then search on the field "AllChamp", the analyzer and filters are not applied. I know that copyField can't copy analyzers and filters, so how can I keep the analyzer and filters on the field "AllChamp"? Example: I search for: AllChamp:presenton --> num res

NPE when using facets with the MLT handler.

2014-06-30 Thread SafeJava T
I am getting an NPE when using facets with the MLT handler. I googled for other npe errors with facets, but this trace looked different from the ones I found. We are using Solr 4.9-SNAPSHOT. I have reduced the query to the most basic form I can: q=id:XXX&mlt.fl=mlt_field&facet=true&facet.fie

Re: Solr Fields Multilingue

2014-06-30 Thread Erick Erickson
Again, please open a new thread for new questions. Do _not_ just reply-to then change the subject, it stays in the same thread anyway. Best, Erick On Mon, Jun 30, 2014 at 7:57 AM, benjelloun wrote: > Hello, > > Ok thanks, > i have another question :) > > here is my schema: > > required="false"

MultiCollection AddCore fails

2014-06-30 Thread cpalm
Hi, I have a maintenance use case where there are 2 collections defined, and I need to remove a core on one of the collections and then be able to add that core back in. I can successfully remove the core with the 2ndary collection, but after I add that core/collection back in the

RE: unable to start solr instance

2014-06-30 Thread Markus Jelsma
(Too many open files) Try raising the limit from probably 1024 to 4k-16k or so. -Original message- > From:Niklas Langvig > Sent: Monday 30th June 2014 17:09 > To: solr-user@lucene.apache.org > Subject: unable to start solr instance > > Hello, > We have two solr instances running on lin
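A hedged Python sketch for checking the open-file limit referred to above, using the standard `resource` module (Unix only). It reports the limit of the process running the script, so it only approximates the servlet container's limit when run as the same user with the same shell settings; the 4096 threshold is taken from the low end of the range suggested in the message, and `needs_raise` is a hypothetical helper.

```python
import resource


def needs_raise(soft_limit, target=4096):
    """Returns True when the soft limit is finite and below the target."""
    return soft_limit != resource.RLIM_INFINITY and soft_limit < target


# Soft limit is what the process is held to; hard limit is the ceiling it may raise to.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"open files: soft={soft} hard={hard} raise_suggested={needs_raise(soft)}")
```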

Re: Solr Fields Multilingue

2014-06-30 Thread benjelloun
hey, Thats true i know it was that but any idea of how can i resolve that ? best regards Anass BENJELLOUN -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-Fields-Multilingue-tp4144223p4144790.html Sent from the Solr - User mailing list archive at Nabble.com.

Re: Solr Fields Multilingue

2014-06-30 Thread Uwe Reh
On 30.06.2014 16:57, benjelloun wrote: "AllChamp" that don't do analyzer and filter. any idea? Example: I search for : AllChamp:presenton --> num result=0 AllChamp:présenton --> num result=1 Hi Anass, any analyzer means any modification (no ICU-Normalisation). "copyFie

unable to start solr instance

2014-06-30 Thread Niklas Langvig
Hello, We have two Solr instances running on linux/tomcat7. Both have been working fine; now only one works. The other seems to have crashed or something. SolrCore Initialization Failures * collection1: org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: Error initial

Re: Solr Fields Multilingue

2014-06-30 Thread benjelloun
Hello, OK, thanks. I have another question :) Here is my schema: Is this correct? Because when I index documents and then search on the field "AllChamp", the analyzer and filters are not applied. Any idea? Example: I search for: AllChamp:presenton --> num result=0 AllChamp:pr

solrcloud "indexing completed" event

2014-06-30 Thread Giovanni Bricconi
Hello I have one application that queries solr; when the index version changes this application has to redo some tasks. Since I have more than one solr server, I would like to start these tasks when all solr nodes are synchronized. With master/slave configuration the application simply watched h

Re: Solr Fields Multilingue

2014-06-30 Thread Erick Erickson
First, please open a new thread rather than reply to an old one, see http://people.apache.org/~hossman/#threadhijack Second, you haven't explained what it is you need to have happen or what you expect. As far as I know, the language detection code tries to identify _the_ language and picks one, I

Re: How do I use multiple boost functions?

2014-06-30 Thread Bhoomit Vasani
It turns out that the solution I was looking for is additive boost. Thanks for the help :) On Mon, Jun 30, 2014 at 5:14 PM, Jack Krupansky wrote: > Do you want them to be additive or multiplicative? Just add or multiply > them yourself with the "add"/"sum" or "mul"/"product" functions. > > See:

Re: How do I use multiple boost functions?

2014-06-30 Thread Bhoomit Vasani
Thanks, I tried this but finally bf(additive boost) param worked well for me. Thanks for the help :) On Mon, Jun 30, 2014 at 5:14 PM, Ahmet Arslan wrote: > Hi, > > Use edismax query parser. boost parameter can take multiple values. > > &boost=recip(geodist(destination,1.293841,103.846487),1,10

RE: Multiterm analysis in complexphrase query

2014-06-30 Thread Allison, Timothy B.
Ahmet, please correct me if I'm wrong, but the ComplexPhraseQueryParser does not perform analysis (as you, Michael, point out). The SpanQueryParser in LUCENE-5205 does perform analysis and might meet your needs. Work on it has gone on pause, though, so you'll have to build from the patch or th

RE: SlowFuzzySearch

2014-06-30 Thread Allison, Timothy B.
I've been away from parsers for a bit, but you should be able to subclass a getFuzzyQuery() (or similar) call fairly easily. Again, last time I looked, it used the automaton (fast) for <=2 and backed off to truly slow for > 2. Note that transposition is only operational for the automaton, not

Re: How do I use multiple boost functions?

2014-06-30 Thread Ahmet Arslan
Hi, Use edismax query parser. boost parameter can take multiple values. &boost=recip(geodist(destination,1.293841,103.846487),1,1000,1000) &boost=if(exists(query({!v=$b1})),100,0) On Monday, June 30, 2014 2:30 PM, Bhoomit Vasani wrote: Hello, I want to boost using multiple functions. e.g.
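A Python sketch of how the repeated `boost` parameter above might be sent, assuming the request is assembled as a list of key/value pairs so that the key can appear twice. The two boost functions are copied from the message; the main query and the value of `b1` are hypothetical.

```python
from urllib.parse import urlencode, parse_qs

# A list of pairs (not a dict) so that "boost" can be repeated.
params = [
    ("defType", "edismax"),
    ("q", "hotel"),  # hypothetical main query
    ("boost", "recip(geodist(destination,1.293841,103.846487),1,1000,1000)"),
    ("boost", "if(exists(query({!v=$b1})),100,0)"),
    ("b1", "category:featured"),  # hypothetical query referenced via $b1
]
query_string = urlencode(params)
decoded = parse_qs(query_string)
```

Passing a dict here would silently keep only one `boost` entry, which is the same symptom as the single local-params `b=` being overwritten in the original question.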

Re: How do I use multiple boost functions?

2014-06-30 Thread Jack Krupansky
Do you want them to be additive or multiplicative? Just add or multiply them yourself with the "add"/"sum" or "mul"/"product" functions. See: https://cwiki.apache.org/confluence/display/solr/Function+Queries If you are using the dismax or edismax query parsers you can also use separate request
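The "add or multiply them yourself" suggestion can be sketched in Python as plain string composition of a single function query. The helper name `combine` is hypothetical; `sum` and `product` are the function names quoted in the message, and the two inner functions are the ones from the original question.

```python
def combine(op, *functions):
    """Wraps several function-query strings in one sum(...) or product(...) call."""
    if op not in ("sum", "product"):
        raise ValueError(f"unsupported operator: {op}")
    return f"{op}({','.join(functions)})"


distance_boost = "recip(geodist(destination,1.293841,103.846487),1,1000,1000)"
match_boost = "if(exists(query({!v=$b1})),100,0)"

additive = combine("sum", distance_boost, match_boost)
multiplicative = combine("product", distance_boost, match_boost)
```

Either resulting string can then be passed as the single boost function, which sidesteps the problem of only the last `b=` being considered.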

How do I use multiple boost functions?

2014-06-30 Thread Bhoomit Vasani
Hello, I want to boost using multiple functions. e.g. {!boost b=recip(geodist(destination,1.293841,103.846487),1,1000,1000) b="if(exists(query({!v=$b1})),100,0)" } when I use above query Solr only considers second function. -- -- Thanks & Regards, Bhoomit Vasani | SE @ Mygola WE are LIVE

Re: Tomcat or Jetty to use with solr in production ?

2014-06-30 Thread Otis Gospodnetic
Hi Gurunath, In 90% of our engagements with various Solr customers we see Jetty, which we also recommend and use ourselves for Solr + our own services and products. Otis -- Performance Monitoring * Log Analytics * Search Analytics Solr & Elasticsearch Support * http://sematext.com/ On Mon, Jun

Re: Tomcat or Jetty to use with solr in production ?

2014-06-30 Thread Ahmet Arslan
Hi, solr test cases use embedded jetty therefore jetty is the recommended one. Ahmet On Monday, June 30, 2014 12:08 PM, gurunath wrote: Hi, Confused with lot of reviews on Jetty and tomcat along with solr 4.7 ?, Is there any better option for production. want to know the complexity's with to

solr dedup on specific fields

2014-06-30 Thread Ali Nazemian
Hi, I used Solr 4.8 for indexing the web pages that come from Nutch. I know that the Solr deduplication operation works on the uniqueKey field, so I set that to the URL field. Everything is OK, except that after duplicate detection I want Solr not to delete all fields of the old document. I want some fields

Re: how to log ngroups

2014-06-30 Thread Aman Tandon
Hi Umesh, Thanks a lot, this might help me. With Regards Aman Tandon On Mon, Jun 30, 2014 at 11:34 AM, Umesh Prasad wrote: > Hi Aman, > You can implement and register a last-component which extracts the > ngroups from response and adds it to response. > You can checkout tutorial about S

Re: Search results not as expected.

2014-06-30 Thread Modassar Ather
Thanks for the details Chris. Regards, Modassar On Fri, Jun 27, 2014 at 3:33 AM, Chris Hostetter wrote: > > : *ab:(system entity) OR ab:authorization* : Number of results returned 2 > : which is not expected. > : It seems this query makes the previous terms as OR if the next term is > : introd

Integrating solr with Hadoop

2014-06-30 Thread gurunath
Hi, I want to setup solr in production, Initially the data set i am using is of small scale, the size of data will grow gradually. I have heard about using "*Big Data Work for Hadoop and Solr*", Is this a better option for large data or better to go ahead with tomcat or jetty server with solr. Th

Tomcat or Jetty to use with solr in production ?

2014-06-30 Thread gurunath
Hi, I'm confused by the many reviews of Jetty and Tomcat with Solr 4.7. Is there a better option for production? I want to know the complexities with Tomcat and Jetty in the future, as I want to cluster with huge data on Solr. Thanks -- View this message in context: http://lucene.472066.n3.na

[ANNOUNCE] Luke 4.9.0 released

2014-06-30 Thread Dmitry Kan
Hello, Luke 4.9.0 has been released. Download it here: https://github.com/DmitryKey/luke/releases/tag/4.9.0 The release has been tested against the solr-4.9.0 index. Most of the changes are in the org.getopt.luke.plugins.FsDirectory.java class, thus concern Lucene over Hadoop users. Remember t

Re: Solr Fields Multilingue

2014-06-30 Thread benjelloun
Hello again, I have a document which has text in 3 different languages (Arabic, English, French). I get just this result: "language_s": [ "en" ] Thanks for the help, Best regards, Anass BENJELLOUN -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-Fields-Mult