CollectionAdminResponse and CollectionAdminRequest.List

2016-02-05 Thread Davis, Daniel (NIH/NLM) [C]
So, this makes sense: CollectionAdminResponse response = new CollectionAdminResponse(); CollectionAdminRequest.Reload request = new CollectionAdminRequest.Reload(); request.setCollectionName(collectionName); response.setResponse(client.request(request)); But for listing the collec

Re: Match All terms in indexed field value

2016-02-05 Thread Senthil
Thank You Ahmet for your reply! I tried out the magic that you mentioned and it worked. It also gave me a direction on how function queries can be combined together to address a few other use cases as well. -- View this message in context: http://lucene.472066.n3.nabble.com/Match-All-terms-in-ind

Re: indexing pdf binary stored in mongodb?

2016-02-05 Thread Jack Krupansky
See if they are stored in BSON format using GridFS. If so, you can simply use the mongofiles command to retrieve the PDF into a local file and index that in Solr either using Solr Cell or Tika. See: http://blog.mongodb.org/post/183689081/storing-large-objects-and-files-in-mongodb https://docs.mong

List of file types supported by ExtractingRequestHandler

2016-02-05 Thread Steven White
Hi everyone, Is there a published list of Tika extractors and the file types supported that come with Solr 5.2? For example, I noticed that the ASM JAR ( http://asm.ow2.org/) is not included with Solr. I can examine the JARs under /solr/contrib/extraction/lib/ and try to come up with the list, bu

indexing pdf binary stored in mongodb?

2016-02-05 Thread Arnett, Gabriel
Anyone have any experience indexing pdfs stored in binary form in mongodb? Gabe Arnett, Senior Director, Moody's Analytics

Re: How to assign a hash range via zookeeper?

2016-02-05 Thread Aki Balogh
Thank you Shawn. I will try this and will get back to you if I run into any issues. Aki On Thu, Feb 4, 2016 at 7:10 PM, Shawn Heisey wrote: > On 2/4/2016 2:12 PM, Aki Balogh wrote: > > I found the state.json file and it indeed shows that the range for shard1 > > is null. > > > > In order to fi

Re: Solr+HDFS

2016-02-05 Thread Erick Erickson
bq: I assume this would go along with also increasing autoCommit? Not necessarily, the two have very different consequences if openSearcher is set to false for autoCommit. Essentially all this is doing is flushing the current segments to disk and opening new segments, no autowarming etc. is be

Re: Solr+HDFS

2016-02-05 Thread Joseph Obernberger
Thank you Shawn. Sounds like increasing the autoSoftCommit maxTime would be a good idea. I assume this would go along with also increasing autoCommit? All of our collections (just 2 at the moment) have the same settings. The data directory is in HDFS and is the same data directory for every shar

Re: hitratio vs cumulative_hitratio

2016-02-05 Thread Erick Erickson
hitratio is the ratio for the currently open searcher; it starts over at 0 whenever a hard commit with openSearcher=true or a soft commit happens. Commits open a new searcher and close the current searcher. Cumulative_hitratio is the cumulative stats for all searchers that have been opened and clo
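The difference between the two counters can be illustrated with a toy model. This is a minimal sketch, not Solr's actual cache code: the class `HitRatioSketch` and its method names are hypothetical, and it assumes only what the reply above states, namely that per-searcher statistics restart when a new searcher opens while the cumulative statistics keep accumulating.

```java
// Toy model of per-searcher vs. cumulative cache statistics.
// All names here are hypothetical; this is not Solr's implementation.
public class HitRatioSketch {
    // Counters for the currently open searcher; reset on every new searcher.
    private long lookups;
    private long hits;
    // Counters accumulated across every searcher opened so far.
    private long cumulativeLookups;
    private long cumulativeHits;

    // Record one cache lookup and whether it was a hit.
    public void recordLookup(boolean hit) {
        lookups++;
        cumulativeLookups++;
        if (hit) {
            hits++;
            cumulativeHits++;
        }
    }

    // Models a commit that opens a new searcher: only the
    // per-searcher counters start over.
    public void openNewSearcher() {
        lookups = 0;
        hits = 0;
    }

    public double hitRatio() {
        return lookups == 0 ? 0.0 : (double) hits / lookups;
    }

    public double cumulativeHitRatio() {
        return cumulativeLookups == 0 ? 0.0 : (double) cumulativeHits / cumulativeLookups;
    }

    public static void main(String[] args) {
        HitRatioSketch stats = new HitRatioSketch();
        stats.recordLookup(true);
        stats.recordLookup(true);
        stats.recordLookup(false);
        stats.recordLookup(true);
        System.out.println("before commit: " + stats.hitRatio() + " / " + stats.cumulativeHitRatio());

        stats.openNewSearcher();
        System.out.println("after commit:  " + stats.hitRatio() + " / " + stats.cumulativeHitRatio());
    }
}
```

In this model the per-searcher ratio drops back to 0 right after the simulated commit, while the cumulative ratio is unchanged until new lookups arrive.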

Re: large number of fields

2016-02-05 Thread Jack Krupansky
This doesn't sound like a great use case for Solr - or any other search engine for that matter. I'm not sure what you are really trying to accomplish, but you are trying to put way too many balls in the air to juggle efficiently. You really need to re-conceptualize your problem so that it has far f

Re: large number of fields

2016-02-05 Thread Walter Underwood
I would add a multiValued field for buying_customers. Add the customer ID for each relevant customer to that field. Then use a boost query (“bq”) to boost those. Try that first before using the hit rate. Always try on/off control before going proportional. The simple approach will probably give
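The on/off boost suggested above could be expressed as request parameters along these lines. This is a sketch under assumptions, not a tested Solr setup: it assumes the edismax query parser is used (bq is a DisMax/eDisMax parameter), and the class `BoostQuerySketch`, the customer ID `C42`, and the boost factor `5` are made up for illustration. Only the field name buying_customers comes from the message.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Builds the query-string portion of a Solr select request that boosts
// products bought by one customer. Hypothetical helper, for illustration only.
public class BoostQuerySketch {

    // userQuery: what the user typed; customerId and boost are illustrative inputs.
    public static String buildQuery(String userQuery, String customerId, int boost) {
        // Boost query: documents whose buying_customers field contains this ID.
        String bq = "buying_customers:" + customerId + "^" + boost;
        return "q=" + encode(userQuery)
                + "&defType=edismax"   // assumption: edismax parser, which supports bq
                + "&bq=" + encode(bq);
    }

    private static String encode(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(buildQuery("red shoes", "C42", 5));
    }
}
```

Because bq only adds to the score of matching documents, products the customer never bought still appear in the results; they simply rank lower.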

Re: fq in SolrCloud

2016-02-05 Thread Shawn Heisey
On 2/5/2016 8:56 AM, Keith L wrote: > Caches are stored on the Java heap for each instance of a searcher. The > filter cache would be different per replica, same for the doc cache, and > query cache > > On Fri, Feb 5, 2016 at 8:47 AM Tom Evans wrote: > >> I have a small question about fq in cloud

Re: fq in SolrCloud

2016-02-05 Thread Keith L
Caches are stored on the Java heap for each instance of a searcher. The filter cache would be different per replica, same for the doc cache, and query cache On Fri, Feb 5, 2016 at 8:47 AM Tom Evans wrote: > I have a small question about fq in cloud mode that I couldn't find an > explanation for

Re: Solr+HDFS

2016-02-05 Thread Shawn Heisey
On 2/5/2016 8:11 AM, Joseph Obernberger wrote: > Thank you for the reply Scott - we have the commit settings as: > > 6 > false > > > 15000 > > > Is that 50% disk space rule across the entire HDFS cluster or on an > individual spindle? That autoSoftCommit maxTime is pret

Re: Solr+HDFS

2016-02-05 Thread Joseph Obernberger
I'm wondering if the shutdown time is too short. When we shut down the cluster, could it be that it doesn't have enough time to flush? It only happens some of the time; as to which node, it seems to be random. -Joe On Tue, Feb 2, 2016 at 12:49 PM, Erick Erickson wrote: > Does this happen all th

Re: Solr+HDFS

2016-02-05 Thread Joseph Obernberger
Thank you for the reply Scott - we have the commit settings as: 6 false 15000 Is that 50% disk space rule across the entire HDFS cluster or on an individual spindle? Thank you! -Joe On Tue, Feb 2, 2016 at 12:01 PM, Scott Stults < sstu...@opensourceconnections.com> wr

Re: How to assign a hash range via zookeeper?

2016-02-05 Thread Aki Balogh
Hi Erick, Unfortunately, I didn't set up the cluster. Am trying to maintain it after the fact. Best, Aki On Thu, Feb 4, 2016 at 5:05 PM, Erick Erickson wrote: > Hash ranges should have been assigned automatically when you > created the collection unless you created the collection with the imp

Re: Signals and scoring

2016-02-05 Thread John Blythe
That's why i provide my email :) already spoke w several people at fusion and was hoping to pick your brain in particular. Thanks anyway tho, the info was helpful! best, -- John Blythe On Feb 5, 2016, 8:44 AM -0500, Erik Hatcher, wrote: > John - best to not have non-Solr discussions on the lis

fq in SolrCloud

2016-02-05 Thread Tom Evans
I have a small question about fq in cloud mode that I couldn't find an explanation for in confluence. If I specify a query with an fq, where is that cached, is it just on the nodes/replicas that process that specific query, or will it exist on all replicas? We have a sub type of queries that speci

Re: Signals and scoring

2016-02-05 Thread Erik Hatcher
John - best to not have non-Solr discussions on the list, but feel free to reach out to us at Lucidworks to connect further about it. — Erik Hatcher, Senior Solutions Architect http://www.lucidworks.com > On Feb 5, 2016, at 8:34 AM, John Blythe wrote: > > hey er

Re: Signals and scoring

2016-02-05 Thread John Blythe
hey erick, thanks for this. if it's not against the newsletter's policy, and is alright w you in general, i'd love to have a side discussion about LW/Fusion. j...@curvolabs.com best, -- *John Blythe* Product Manager & Lead Developer 251.605.3071 | j...@curvolabs.com www.curvolabs.com 58 Adams

large number of fields

2016-02-05 Thread Jan Verweij - Experts in search
Hi, We have 50K products stored in Solr. We have 10K customers and each customer buys up to 10K of these products. Now we want to influence the results by adding a field for every customer. So we end up with 10K fields to influence the results on the buying behavior of each customer (personal re

Re: Signals and scoring

2016-02-05 Thread Erik Hatcher
There are several moving pieces to Fusion’s signals processing, so we’d certainly encourage you to try Fusion out yourself and not reinvent these wheels :) But here’s the general gist of how it works: - events/signals/clicks are logged (to a separate Solr collection) - periodically jobs run t

Re: Solr 5: not loading shards from symlinked directories

2016-02-05 Thread Norgorn
Thank you

Re: Solr 5: not loading shards from symlinked directories

2016-02-05 Thread Alan Woodward
This is a known bug, see https://issues.apache.org/jira/browse/SOLR-8548. It will be fixed in 5.5, or in 5.4.2 if we do another bugfix release. Alan Woodward www.flax.co.uk On 5 Feb 2016, at 06:19, Norgorn wrote: > I've tried to upgrade from Solr 4.10.3 to 5.4.1. Solr shards are placed on > d

Signals and scoring

2016-02-05 Thread John Blythe
hi all, i'm trying to find more information online about how to implement something similar to the 'signals' feature found in Fusion. so far i've found one decent article that isn't discussing the Fusion feature specifically. do any of you happen to have some solid resources to point me in the d

Tesseract command-line OCR engine has stopped working

2016-02-05 Thread Zheng Lin Edwin Yeo
Hi, I am indexing EML files (emails) into Solr, and some of those emails have attachments. During the indexing, I encountered this "*Tesseract command-line OCR engine has stopped working*" message that comes from the server. However, I did not see any error with the indexing, and all the EML fil