Re: solr cloud does not start with many collections

2015-03-04 Thread Shawn Heisey
On 3/3/2015 9:22 PM, Damien Kamerman wrote: > I've done a similar thing to create the collections. You're going to need > more memory I think. > > OK, so maxThreads limit on jetty could be causing a distributed dead-lock? I don't know what the exact problems would be if maxThreads is reached. It

Re: solr cloud does not start with many collections

2015-03-04 Thread Shawn Heisey
On 3/4/2015 1:02 AM, Shawn Heisey wrote: > Even now, nearly three hours after startup, the Solr log is still > spitting out thousands of lines that look like this, so I don't think I > can call it stable: > > INFO - 2015-03-04 07:35:51.166; > org.apache.solr.common.cloud.ZkStateReader; Updating d

Pattern for extracting text from a rich document and an associated metadata file

2015-03-04 Thread Yavar Husain
What is the best pattern to index the following kind of data: HarryPotter.PDF HarryPotter.txt Avengers.Docx Avengers.txt For each of the above file the meta data lies in the text file having same name as the rich document (as can be seen above). (1) Now the brute force method that I can think o
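The file layout described above (a rich document and a metadata text file sharing the same base name) can be paired up before any extraction or indexing happens. The following is only a minimal sketch of that pairing step in Python. The function name `pair_documents` and the set of rich-document extensions are hypothetical, chosen for illustration; text extraction and the actual Solr indexing call are deliberately left out.

```python
from pathlib import Path

# Assumption for this sketch: which extensions count as "rich" documents.
RICH_EXTENSIONS = {".pdf", ".docx"}


def pair_documents(filenames):
    """Pair each rich document with the .txt metadata file of the same base name.

    Returns a list of (rich_document, metadata_file_or_None) tuples,
    sorted by base name.
    """
    metadata = {}
    rich = {}
    for name in filenames:
        path = Path(name)
        extension = path.suffix.lower()  # compare extensions case-insensitively
        if extension == ".txt":
            metadata[path.stem] = name
        elif extension in RICH_EXTENSIONS:
            rich[path.stem] = name
    # A rich document without a metadata file is paired with None.
    return [(rich[stem], metadata.get(stem)) for stem in sorted(rich)]


pairs = pair_documents(
    ["HarryPotter.PDF", "HarryPotter.txt", "Avengers.Docx", "Avengers.txt"]
)
for rich_document, metadata_file in pairs:
    print(rich_document, "<->", metadata_file)
```

Each pair could then be turned into a single Solr document, with the extracted text in one field and the parsed metadata in others.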

Re: Solr join + Boost in single query

2015-03-04 Thread sraav
Yes Mikhail. Similar to the one you mentioned. The only difference is that, in my case, a union between two cores would work too. -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-join-Boost-in-single-query-tp4190825p4190911.html Sent from the Solr - User mailing list arc

Re: Solr join + Boost in single query

2015-03-04 Thread Mikhail Khludnev
If I get you right, a "union" query can be achieved by SHOULD clauses: q=foo {!scorejoin fromIndex=2nd}bar On Wed, Mar 4, 2015 at 4:16 PM, sraav wrote: > Yes Mikhail. Similar to the one you mentioned. The only difference is that, > in my case a union between two cores would work too.. > > > > --

Re: Access permission

2015-03-04 Thread John Maker
Option #2 is far better. I found this: https://wiki.apache.org/solr/SolrSecurity#Document_Level_Security but this solution requires that I use Manifold CF, which I cannot. Does anyone know how Manifold does it and whether it can be adapted to Solr? Another idea I'm wondering about is what if I create

Why frequency of suggestion is different from indexed frequency in Solr?

2015-03-04 Thread Nitin Solanki
Hi, The frequency of a suggestion is different from the original frequency in the index. Why so? I have applied StandardTokenizer with ShingleFilterFactory on the field.

Re: Pattern for extracting text from a rich document and an associated metadata file

2015-03-04 Thread Ahmet Arslan
Hi Yavar, I would stick with Erik's post : http://lucidworks.com/blog/indexing-with-solrj/ Ahmet On Wednesday, March 4, 2015 12:05 PM, Yavar Husain wrote: What is the best pattern to index the following kind of data: HarryPotter.PDF HarryPotter.txt Avengers.Docx Avengers.txt For each of

Frequency of Suggestion are varying from original Frequency in index

2015-03-04 Thread Nitin Solanki
Hi, I have a term ("who") whose original frequency is 191, but when I get a suggestion of "who" it gives me 90. Why? Example: the *Original Frequency* comes back like: "spellcheck":{ "suggestions":[ "who",{ "numFound":1, "startOffset":1, "endOffset":4,

OUTOFMEMORY

2015-03-04 Thread Rajesh
Hi, I'm using SortedMapBackedCache for my child entities. When I use it, I get an OutOfMemory exception and the records are not indexed. I've increased my heap size to 3GB, but I still get the same result. Is there a way I can configure it to index 1 lakh (100,000) records and clear the cache and then

DIH's TikaEntityProcessor's handling of embedded documents

2015-03-04 Thread Allison, Timothy B.
All, I recently took a look at the source code for TikaEntityProcessor, and I noticed that the code is not configuring the ParseContext to have Tika's AutoDetectParser (or any parser) parse documents recursively. That is, if you have a zip file or any other container document, DIH's TikaEnti

Multivalu field grouping

2015-03-04 Thread Darin Amos
Hi All, I sent an email out earlier but I didn’t get any responses so I thought I would try to reframe the question. I have a problem that I believe multivalued field grouping is the perfect answer for, of course since SOLR doesn’t support multivalued field grouping, I need to find an alterna

Re: DIH's TikaEntityProcessor's handling of embedded documents

2015-03-04 Thread Alexandre Rafalovitch
DIH does not get as much attention as other parts of the system. If you see a clear way to improve it, I'd say go ahead and file the issue. If you can provide the patch which passes the tests and - ideally - includes new tests, this would be even greater. Regards, Alex. Solr Analyzers, To

Recommendations based on MoreLikeThis & user likes/dislikes

2015-03-04 Thread Bryan Bende
Does anyone have experience tracking documents that a user "liked" / "disliked" and then incorporating that into a MoreLikeThis query? The idea would be to exclude any document a user disliked from ever returning as a similar document, and to boost any document a user liked so it shows up higher i

Re: About solr recovery

2015-03-04 Thread Erick Erickson
Hmm, 4.9 is reasonably recent. Do you by chance have the suggester uncommented? The suggester rebuilds whenever the core starts, see solrconfig.xml. But that should happen on _all_ the shards so I rather doubt this is the problem. How big are your transaction logs? Look in the .../data/tlog direct

Re: Solr 4.7.2 mergeFactor

2015-03-04 Thread Chris Hostetter
https://people.apache.org/~hossman/#threadhijack Thread Hijacking on Mailing Lists When starting a new discussion on a mailing list, please do not reply to an existing message, instead start a fresh email. Even if you change the subject line of your email, other mail headers still track which

Re: solr cloud does not start with many collections

2015-03-04 Thread Shawn Heisey
On 3/4/2015 2:09 AM, Shawn Heisey wrote: > I've come to one major conclusion about this whole thing, even before > I reach the magic number of 4000 collections. Thousands of collections > is not at all practical with SolrCloud currently. I've now encountered a new problem. I may have been hasty i

RE: DIH's TikaEntityProcessor's handling of embedded documents

2015-03-04 Thread Allison, Timothy B.
Got it. Thank you. SOLR-7189. -Original Message- From: Alexandre Rafalovitch [mailto:arafa...@gmail.com] Sent: Wednesday, March 04, 2015 12:31 PM To: solr-user Subject: Re: DIH's TikaEntityProcessor's handling of embedded documents DIH does not get as much attention as other parts of

Re: Log numfound, qtime, ...

2015-03-04 Thread Ahmed Adel
Hi, I believe a better approach than Solarium is to create a custom search component that extends the SearchComponent class and overrides the process() method to store the query, QTime, and numFound in a database for further analysis. This approach would cut steps 2 through 6 into one step. Analysis can be done

RE: Log numfound, qtime, ...

2015-03-04 Thread Markus Jelsma
Hello - This patch may be more straightforward https://issues.apache.org/jira/browse/SOLR-4018 -Original message- > From:Ahmed Adel > Sent: Wednesday 4th March 2015 19:39 > To: solr-user@lucene.apache.org > Subject: Re: Log numfound, qtime, ... > > Hi, I believe a better approach than

Re: Recommendations based on MoreLikeThis & user likes/dislikes

2015-03-04 Thread Ahmet Arslan
Hi Bryan, If you have sufficient like/dislike data, I would set up a collaborative filtering / recommendation system. For example : https://mahout.apache.org Then we can view mlt as content based recommendation. Then you can combine results from both systems. Ahmet On Wednesday, March 4, 2

Text analysis which expand the index with many words break subsequent analysis

2015-03-04 Thread fredericbaroz
Hello, My name is Frédéric Baroz. I work as an in-hospital physician in Internal Medicine in Switzerland (I speak French) and as a software engineer. I work in medical informatics and I'm currently doing some research about "semantic search" for in-hospital physicians who are daily confronted with searching me

Re: Text analysis which expand the index with many words break subsequent analysis

2015-03-04 Thread Alexandre Rafalovitch
Have you thought about using copyText with two different processing pipelines? Then you could search both variants with different weights? Regards, Alex. Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter: http://www.solr-start.com/ On 4 March 2015 at 14:18, fredericbaroz

Re: Log numfound, qtime, ...

2015-03-04 Thread Chris Hostetter
: Here's my need : I'd like to log Solr Responses so as to achieve some : business statistics. : I'd like to report, as a daily/weekly/yearly/whateverly basis, the following : KPIs : ... : I think I'll soon get into performance issues, as you guess. : Do you know a better approach ? All of this

solr 4.7.2 mergeFactor/ Merge policy issue

2015-03-04 Thread Summer Shire
Hi All, I am using solr 4.7.2 is there a bug wrt merging the segments down ? I recently added the following to my solrConfig.xml false 100 1000 5 But I do not see any merging of the segments happening. I saw some other people have the same issue but there wasn’t much info

Re: Text analysis which expand the index with many words break subsequent analysis

2015-03-04 Thread fredericbaroz
Thanks a lot for the quick response, and sorry for my English. Do you mean "copyField"? I guess your idea is to index text twice, in 2 different fields, one being very heavily analysed and one almost left as is. If yes, then yes, I thought about it, or rather, I read this was a possibility. It do

How can I break/modify Solr internal synonyms

2015-03-04 Thread vit
I use Solr 4.2. When searching for Siamese I am getting Thailand results, which is a bad user experience for our customers. It happens in KSF (in the Analyzer tool). Looks like it is a built-in mapping. How can I change this kind of mapping? -- View this message in context: http://lucene.472066.n3.nabble.com/How

Re: solr 4.7.2 mergeFactor/ Merge policy issue

2015-03-04 Thread Erick Erickson
I _think_, but don't know for sure, that the merging stuff doesn't get triggered until you commit, it doesn't "just happen". Shot in the dark... Erick On Wed, Mar 4, 2015 at 1:15 PM, Summer Shire wrote: > Hi All, > > I am using solr 4.7.2 is there a bug wrt merging the segments down ? > > I rec

Re: snapinstaller does not start newSearcher

2015-03-04 Thread alxsss
I have used the snapshotter API and modified the snapinstaller script, so that it successfully grabs the snapshot folder and updates the index folder on the slave. However, it fails to open a newSearcher. It simply sends a commit command to the slave, but the hasUncommittedChanges function returns false. That is the reas

Re: java.lang.AbstractMethodError at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:55)

2015-03-04 Thread jj_solr
How did you resolve the problem? Please advise. -- View this message in context: http://lucene.472066.n3.nabble.com/java-lang-AbstractMethodError-at-org-apache-solr-handler-ContentStreamHandlerBase-handleRequestBody--tp3026470p4191065.html Sent from the Solr - User mailing list archive at Nabb

Re: solr 4.7.2 mergeFactor/ Merge policy issue

2015-03-04 Thread Summer Shire
Actually, after every commit a new segment gets created. I don't see them merging down. What else could I do to debug this better? Hasn't anyone else tried to merge their segments down to a specific range :) ? On Wed, Mar 4, 2015 at 3:12 PM, Erick Erickson wrote: > I _think_, but don't know for s

Re: solr cloud does not start with many collections

2015-03-04 Thread Damien Kamerman
I'm running on Solaris x86, I have plenty of memory and no real limits # plimit 15560 15560: /opt1/jdk/bin/java -d64 -server -Xss512k -Xms32G -Xmx32G -XX:MaxMetasp resource current maximum time(seconds) unlimited unlimited file(blocks) unlimited

Re: New leader/replica solution for HDFS

2015-03-04 Thread longsan
I'm happy to hear that. It's a good option for a Solr + HDFS solution. This can avoid many performance issues. -- View this message in context: http://lucene.472066.n3.nabble.com/New-leader-replica-solution-for-HDFS-tp4188735p4191082.html Sent from the Solr - User mailing list archive at Nabble.com

Re: New leader/replica solution for HDFS

2015-03-04 Thread longsan
Our update request load is very heavy, so we hit several performance problems: 1) replicas cannot keep up with the leader's indexing speed after running for a while and have to recover, but recovery is very slow and often fails. 2) data is inconsistent between leader and replica; you get different results when you run the same query twice

Re: solr 4.7.2 mergeFactor/ Merge policy issue

2015-03-04 Thread Shawn Heisey
On 3/4/2015 4:12 PM, Erick Erickson wrote: > I _think_, but don't know for sure, that the merging stuff doesn't get > triggered until you commit, it doesn't "just happen". > > Shot in the dark... I believe that new segments are created when the indexing buffer (ramBufferSizeMB) fills up, even wit

Re: solr cloud does not start with many collections

2015-03-04 Thread Shawn Heisey
On 3/4/2015 5:37 PM, Damien Kamerman wrote: > I'm running on Solaris x86, I have plenty of memory and no real limits > # plimit 15560 > 15560: /opt1/jdk/bin/java -d64 -server -Xss512k -Xms32G -Xmx32G > -XX:MaxMetasp >resource current maximum > time(seconds) unlim

Original frequency is not matching with suggestion frequency in SOLR

2015-03-04 Thread Nitin Solanki
Hello, Sometimes the suggestion frequency differs from the original frequency. The output for *"whs is"* - *(73)*, which is a suggestion of *"who is"*, differs from its actual original frequency *(94)*. *Please* check this link for more explanation - *http://stackoverflow.com/questions/28857915/

Issue while enabling clustering/integrating carrot2 with solr 4.4.0 and tomact under ubuntu

2015-03-04 Thread sthita
1.My solr.xml .. .. 2.My solrconfig.xml changes for carrot2 integrate default org.carrot2.clustering.lingo.LingoClusteringAlgorithm 20 . . . . 3.Copied all the r

Issue while enabling clustering/integrating carrot2 with solr 4.4.0 and tomact under ubuntu

2015-03-04 Thread sthita
1.My solr.xml .. .. 2.My solrconfig.xml changes for carrot2 integrate default org.carrot2.clustering.lingo.LingoClusteringAlgorithm 20 . . . . 3.Copied all the required j

Re: Help needed to understand zookeeper in solrcloud

2015-03-04 Thread Aman Tandon
Thanks svante. What if, in a cluster of 5 ZooKeepers, only 1 ZooKeeper goes down? Can a ZooKeeper election still occur with 4 (an even number of) ZooKeepers alive? With Regards Aman Tandon On Tue, Mar 3, 2015 at 6:35 PM, svante karlsson wrote: > synchronous update of state and a requirement of more t

Re: Help needed to understand zookeeper in solrcloud

2015-03-04 Thread svante karlsson
Yes, as long as it is three (the majority of 5) or more. This is why there is no point in having a 4 node cluster. It would also require 3 nodes for a majority, thus giving it the fault tolerance of a 3 node cluster, but slower and more expensive. 2015-03-05 7:41 GMT+01:00 Aman Tandon : > Thanks
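The majority rule in the reply above can be checked with a little arithmetic. This is a minimal sketch in Python, not ZooKeeper code: the function names `quorum_size` and `tolerated_failures` are made up for illustration, and it assumes a plain majority quorum (more than half of the ensemble must be alive).

```python
def quorum_size(ensemble_size):
    """Smallest number of nodes that forms a strict majority of the ensemble."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size):
    """How many nodes can be down while a majority is still alive."""
    return ensemble_size - quorum_size(ensemble_size)


for nodes in (3, 4, 5):
    print(
        f"{nodes} nodes: quorum {quorum_size(nodes)}, "
        f"tolerates {tolerated_failures(nodes)} failure(s)"
    )
```

Under that assumption, a 4 node ensemble needs 3 nodes for a quorum and tolerates 1 failure, the same as a 3 node ensemble, while 5 nodes tolerate 2. A 5 node ensemble that has lost 1 node still has 4 alive, which is above the quorum of 3.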

Re: Log numfound, qtime, ...

2015-03-04 Thread bengates
Hello everyone, I'll check this ASAP. Thanks for all your answers ! Ben -- View this message in context: http://lucene.472066.n3.nabble.com/Log-numfound-qtime-tp4189561p4191129.html Sent from the Solr - User mailing list archive at Nabble.com.