Hi Shawn,
a big thanks for the long and detailed answer. I am aware of how Linux
uses free RAM for caching and the problems related to the JVM and GC. It
is nice to hear how this correlates to Solr. I'll take some time and
think over it. The facet.method=enum and probably a combination of
Doc
On Thu, 2014-03-06 at 08:17 +0100, Chia-Chun Shih wrote:
>1. Raw data is 35,000 CSV files per day. Each file is about 5 MB.
>2. One collection serves one day. 200-day history data is required.
So once your data are indexed, they will not change? It seems to me that
1 shard/day is a fine ch
Hello Everyone,
Let me first introduce myself: I am Raman, a Masters of CS student. I
am doing a project for my studies which needs the use of Solr. For some
reasons I have to use Solr 4.3.0 for the project.
I am facing an issue with page numbers in the search result. I came across a
workaroun
Hi Roman,
I did a similar project; this is how:
1) Index page by page. Solr documents (the unit of retrieval) will be pages. You can
generate a uniqueKey by concatenating docId and pageNo => doc50_page0 With
this you will have page no information.
2) Later on you can group by document_id with
htt
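The scheme described above can be sketched like this (a minimal Python sketch; the field names other than the uniqueKey pattern are illustrative, not from the thread):

```python
def page_docs(doc_id, pages):
    """Turn one source document's pages into per-page Solr documents.

    The uniqueKey concatenates the document id and page number
    (e.g. doc50_page0), so every page-level hit still carries the
    parent document id for later grouping.
    """
    return [
        {
            "id": f"{doc_id}_page{page_no}",  # uniqueKey: docId + pageNo
            "document_id": doc_id,            # used later to group by document
            "page_no": page_no,
            "content": text,
        }
        for page_no, text in enumerate(pages)
    ]

docs = page_docs("doc50", ["first page text", "second page text"])
```

Each of these dicts can then be sent to Solr as an ordinary document; grouping by `document_id` collapses the pages back into documents at query time.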
Any suggestions?
Zitat von m...@preselect-media.com:
Hello,
I'm using eDisMax to do scoring for my search results.
I have a nested structure of documents. The main (parent) document
with meta data and the child documents with fulltext content. So I
have to join them.
My qf looks like th
Some months ago, I talked to some people at LR about this, but I can't
find my notes.
Imagine a function of some fields that produces a score between 0 and 1.
Imagine that you want to combine this score with relevance over some
more or less complex ordinary query.
What are the options, given the
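One option (a sketch, assuming the 0-1 score is indexed as a field, here hypothetically named quality_score) is Solr's {!boost} query parser, which multiplies a function value into the relevance score of a wrapped query:

```python
def boosted_query(user_query, score_field="quality_score"):
    """Build a {!boost} query that multiplies score_field into the
    relevance score of user_query.

    The wrapped query is passed via the $qq parameter reference so it
    does not need escaping inside the local-params syntax.
    """
    return "{!boost b=" + score_field + " v=$qq}", {"qq": user_query}

q, extra_params = boosted_query("title:solr AND body:cache")
# send q as the q parameter and extra_params as additional request params
```

Other options in the same spirit are the `bf`/`boost` parameters of (e)dismax; which one fits depends on whether the combination should be additive or multiplicative.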
My bad, I think this error was actually a result of using the Solr Admin
utility to query the index and the query I entered included the double
quotes.
However, this left me with a different error that I may post a question
about if I cannot figure it out.
--
View this message in context:
htt
Hi Benson,
http://lucene.apache.org/core/4_7_0/expressions/org/apache/lucene/expressions/Expression.html
https://issues.apache.org/jira/browse/SOLR-5707
That?
Otis
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/
On Thu, Mar 6, 20
Toby Lazar wrote
> Unless Solr is your system of record, aren't you already replicating your
> source data across the WAN? If so, could you load Solr in colo B from
> your colo B data source? You may be duplicating some indexing work, but
> at least your colo B Solr would be more closely in sync
Erick,
That helps so I can focus on the problem areas. Thanks.
On 3/5/14, 6:03 PM, Erick Erickson wrote:
Here's the easiest thing to try to figure out where to
concentrate your energies. Just comment out the
server.add call in your SolrJ program. Well, and any
commits you're doing from Solr
Yeah. I have thought about spitting out JSON and running it against Solr
using parallel HTTP threads separately. Thanks.
On 3/5/14, 6:46 PM, Susheel Kumar wrote:
One more suggestion is to collect/prepare the data in CSV format (a 1-2 million sample,
depending on size) and then import the data directly into
Getting the following error when attempting to run a polygon query from the
Solr Admin utility: "com.spatial4j.core.exception.InvalidShapeException:
incompatible dimension (2) and values (Intersects). Only 0 values
specified",
"code":400
My query is as follows:
q=geoloc:Intersects(POLYGON((-
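For comparison, a well-formed Intersects query needs a closed WKT ring (the first coordinate pair repeated as the last) and, for polygons, the JTS library on the classpath. A sketch with made-up coordinates and the field name from the message:

```python
# Hypothetical coordinates; only the field name (geoloc) is from the thread.
# WKT rings must be closed, i.e. the first point is repeated as the last.
ring = [(-122.5, 37.7), (-122.5, 37.9), (-122.3, 37.9), (-122.3, 37.7)]
ring.append(ring[0])  # close the ring

wkt = "POLYGON((" + ", ".join(f"{lon} {lat}" for lon, lat in ring) + "))"
query = f'geoloc:"Intersects({wkt})"'
# the quoted form avoids the query parser splitting on spaces in the WKT
```

An unclosed ring, or a WKT string mangled by extra quoting, is a common way to end up with an InvalidShapeException like the one above.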
On 3/6/2014 12:17 AM, Chia-Chun Shih wrote:
> I am planning a system for searching TB's of structured data in SolrCloud.
> I need suggestions for handling such huge amount of data in SolrCloud.
> (e.g., number of shards per collection, number of nodes, etc.)
>
> Here are some specs of the system:
hi,
the heap problem is due to memory being full.
You should remove unnecessary data and restart the server once.
On Thursday, 6 March 2014 10:39 AM, Angel Tchorbadjiiski
wrote:
Hi Shawn,
a big thanks for the long and detailed answer. I am aware of how Linux
uses free RAM for caching and the problem
On 3/6/2014 7:54 AM, perdurabo wrote:
> Toby Lazar wrote
>> Unless Solr is your system of record, aren't you already replicating your
>> source data across the WAN? If so, could you load Solr in colo B from
>> your colo B data source? You may be duplicating some indexing work, but
>> at least you
Hi
When restarting a node in SolrCloud, I run into scenarios where both the
replicas for a shard get into a "recovering" state and never come up, causing
the error "No servers hosting this shard". To fix this, I either unload one
core or restart one of the nodes again so that one of them becomes the
Hello Everyone,
I would like to ask your guidance on the following.
I have a single core with 124 GB of index data size. Indexing and reading
are both very slow, as I have 7 GB RAM to support this huge data. Almost 8
million documents.
Hence, we thought of going to SolrCloud so that we can accommo
Well, I think I finally figured out how to get SolrEntityProcessor to work,
but there are still some issues. I had to add a library path to
solrconfig.xml, but the cores are finally coming up and I am now manually
able to run a data import that does seem to index all of the documents on
the remote
I am working with a date range query that is not giving me fast response
times. After modifying the date range construct after reading several forums,
response time now is around 200ms, down from 2-3 secs.
However, I was wondering if there is still some way to improve upon it, as
queries without date rang
Hi,
Since your range query has NOW in it, it won't be cached meaningfully.
http://solr.pl/en/2012/03/05/use-of-cachefalse-and-cost-parameters/
This is untested but can you try this?
&q=UserID:AC10263A-E28B-99F9-0012-AAA42DDD9336
&fq=Status:Booked
&fq=ClientID:4
&fq={!cache=false cost=150}StartDa
Ahmet, I have tried filter queries before to fine-tune query performance.
However, whenever we use filter queries the response time goes up and
remains there. With the above change, the response time was consistently
around 4-5 secs. We are using the default cache settings.
Are there any settings I
Are you using an old version?
- Mark
http://about.me/markrmiller
On Mar 6, 2014, at 11:50 AM, KNitin wrote:
> Hi
>
> When restarting a node in solrcloud, i run into scenarios where both the
> replicas for a shard get into "recovering" state and never come up causing
> the error "No servers ho
Hi,
Did you try with non-cached filter queries before?
Cached filter queries are useful when they are re-used. How often do you commit?
I thought that we could do something if we disable filter query caching and
manipulate their execution order with the cost parameter.
What happens with this :
&q=User
That did the trick Ahmet. The first response was around 200ms, but the
subsequent queries were around 2-5ms.
I tried this
&q=UserID:AC10263A-E28B-99F9-0012-AAA42DDD9336
&fq={!cache=false cost=100}Status:Booked
&fq={!cache=false cost=50}ClientID:4
&fq={!cache=false cost=50}[NOW/DAY TO NOW/DAY+1YE
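The pattern used above can be captured in a small helper (the helper itself is hypothetical; the filter clauses are the ones from the thread):

```python
def fq(clause, cache=True, cost=None):
    """Build a Solr fq value, optionally prefixed with
    {!cache=false cost=N} local params.

    cache=false keeps one-off filters out of the filterCache;
    cost orders non-cached filters so cheap ones run first.
    """
    if cache:
        return clause
    params = "cache=false" + (f" cost={cost}" if cost is not None else "")
    return "{!" + params + "}" + clause

filters = [
    fq("Status:Booked", cache=False, cost=100),
    fq("ClientID:4", cache=False, cost=50),
]
```

Filters with cost >= 100 are additionally treated as post-filters, evaluated only against documents that match everything else, which is why the expensive clause gets the higher cost.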
Hi,
We've had a strange mishap with a solr cloud cluster (version 4.5.1) where
we observed high search latency. The problem appears to develop over
several hours until such point where the entire cluster stopped responding
properly.
After investigation we found that the number of threads (both so
Ok, I think the issue here is that I need to install the JTS library. I will
have that done and try again.
Lee
--
View this message in context:
http://lucene.472066.n3.nabble.com/Polygon-search-returning-InvalidShapeException-incompatible-dimension-2-error-tp4121704p4121796.html
Sent from the
What Erick said. That's a giant Filter Cache. Have a look at these Solr
metrics and note the Filter Cache in the middle:
http://www.flickr.com/photos/otis/8409088080/
Note how small the cache is and how high the hit rate is. Those are stats
for http://search-lucene.com/ and http://search-hadoop
Setting up SolrCloud (horizontal scaling) is definitely a good idea for this
big index, but before going to SolrCloud, are you able to upgrade your single
node to 128 GB of memory (vertical scaling) to see the difference?
Thanks,
Susheel
-Original Message-
From: Priti Solanki [mailto:pri
I am using 4.3.1.
On Thu, Mar 6, 2014 at 11:48 AM, Mark Miller wrote:
> Are you using an old version?
>
> - Mark
>
> http://about.me/markrmiller
>
> On Mar 6, 2014, at 11:50 AM, KNitin wrote:
>
> > Hi
> >
> > When restarting a node in solrcloud, i run into scenarios where both the
> > replicas
Andy,
I don't have a direct answer to your question but I have a question.
On 03/05/2014 07:21 AM, Andy Alexander wrote:
fq=ss_language:ja&q=製品
I am guessing you have a field called ss_language where the language code
of the document is stored,
and you have Solr documents of different language
It sounds like the distributed update deadlock issue.
It’s fixed in 4.6.1 and 4.7.
- Mark
http://about.me/markrmiller
On Mar 6, 2014, at 3:10 PM, Avishai Ish-Shalom wrote:
> Hi,
>
> We've had a strange mishap with a solr cloud cluster (version 4.5.1) where
> we observed high search latency.
: That did the trick Ahmet. The first response was around 200ms, but the
: subsequent queries were around 2-5ms.
Are you really sure you want "cache=false" on all of those filters?
While the "ClientID:4" query may be something that changes significantly
enough in every query to not be useful t
Hi,
We have 5 Solr servers in a Cloud with about 70 cores and 12GB
indexes in total (every core has 2 shards, so it's 6 GB per server).
After upgrade to Solr 4.7 the Solr servers are crashing constantly
(each server about one time per hour). We currently don't have any clue
about the reason.
Hoss,
Thanks for the correction. I missed the /DAY part and thought it was
StartDate:[NOW TO NOW+1YEAR]
Ahmet
On Friday, March 7, 2014 12:33 AM, Chris Hostetter
wrote:
: That did the trick Ahmet. The first response was around 200ms, but the
: subsequent queries were around 2-5ms.
Are
On Mar 6, 2014, at 5:37 PM, Martin de Vries wrote:
> IndexSchema is using 62% of the memory but we don't know if that's a
> problem:
That seems odd. Can you see what objects are taking all the RAM in the
IndexSchema?
- Mark
http://about.me/markrmiller
On 3/6/2014 3:37 PM, Martin de Vries wrote:
> We have 5 Solr servers in a Cloud with about 70 cores and 12GB
> indexes in total (every core has 2 shards, so it's 6 GB per server).
>
> After upgrade to Solr 4.7 the Solr servers are crashing constantly
> (each server about one time per hour). We cur
Hello,
I have a question from a colleague who's managing a 3-node(VMs) SolrCloud
cluster with a separate 3-node Zookeeper ensemble. Periodically the data
center underneath the SolrCloud decides to upgrade the SolrCloud instance
infrastructure in a "rolling upgrade" fashion. So after the 1st i
"* New 'cursorMark' request param for efficient deep paging of sorted
result sets. See http://s.apache.org/cursorpagination";
At the end of the linked doco there is an example that doesn't make sense
to me, because it mentions "sort=timestamp asc" and is then followed by
pseudo code that sorts b
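For reference, the cursorMark protocol the quoted note describes amounts to a loop that feeds each response's nextCursorMark back into the next request until it stops changing. A sketch in which fetch stands in for a real Solr request (made with cursorMark set and a sort that ends in the uniqueKey):

```python
def iterate_cursor(fetch):
    """Deep-page through results via cursorMark.

    fetch(cursor) is a stand-in for a Solr request issued with
    cursorMark=<cursor>; it must return (docs, next_cursor).
    The server guarantees the cursor stops changing when the
    result set is exhausted, which is the termination condition.
    """
    cursor = "*"  # the initial cursorMark is always "*"
    while True:
        docs, next_cursor = fetch(cursor)
        yield from docs
        if next_cursor == cursor:  # unchanged cursor => no more results
            return
        cursor = next_cursor
```

Because the sort must include the uniqueKey as a tiebreaker, the pseudo code in the docs sorting "by id" is just the degenerate case where the uniqueKey is the whole sort.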
Would probably need to see some logs to say much. Need to understand why they
are inoperable.
What version is this?
- Mark
http://about.me/markrmiller
On Mar 6, 2014, at 6:15 PM, Nazik Huq wrote:
> Hello,
>
>
>
> I have a question from a colleague who's managing a 3-node(VMs) SolrCloud
>
That's what I do: pre-create JSONs following the schema and save them in
MongoDB as part of the ETL process. After that, just dump the JSONs
into Solr using batching etc. With this you can do full and incremental
indexing as well.
Thanks,
Kranti K. Parisa
http://www.linkedin.com/in/krantiparisa
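The batching step can be sketched as follows (the batch size is illustrative; each batch would then be POSTed to Solr's /update handler as JSON):

```python
def batches(docs, size=500):
    """Split a list of pre-built JSON documents into fixed-size
    batches for bulk indexing; Solr accepts a JSON array of docs
    per /update request."""
    for i in range(0, len(docs), size):
        yield docs[i:i + size]

# e.g. 1200 documents become batches of 500, 500 and 200
chunks = list(batches(list(range(1200)), size=500))
```

Keeping each request to a few hundred documents avoids oversized POST bodies while still amortizing the per-request overhead.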
: At the end of the linked doco there is an example that doesn't make sense
: to me, because it mentions "sort=timestamp asc" and is then followed by
: pseudo code that sorts by id only. I understand that cursorMark requires
Ok ... 2 things contributing to the confusion.
1) the para that refers
Thank you, that all sounds great. My assumption about documents being
missed was something like this:
A,B,C,D
where they are sorted by timestamp first and ID second. Say the first
'page' of results is 'A,B', and before the second page is requested both
documents B + C receive update events and th
My initial approach was to use the filter cache for static fields. However, when
a filter query is used, every query after the first has the same response
time as the first. For instance, when the cache is enabled in the query under
review, response time shoots up to 4-5 secs and stays there.
We are using defau
The version is 4.6. I am going to ask for the log files and post them.
-Original Message-
From: Mark Miller [mailto:markrmil...@gmail.com]
Sent: Thursday, March 06, 2014 6:33 PM
To: solr-user
Subject: Re: SolrCloud recovery after nodes are rebooted in rapid succession
Would probably need t
I'm using the DataImportHandler to index data from a MySQL DB. It's been
running just fine. I've been using full-imports. I'm now trying to
implement the delta import functionality.
To implement the delta query, you need to be reading the last_index_time
from a properties file to know what new to inde
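For reference, a typical delta setup in data-config.xml looks roughly like this (table and column names are placeholders; ${dih.last_index_time} is the value DIH reads back from dataimport.properties, and ${dih.delta.id} refers to each id returned by deltaQuery):

```xml
<entity name="item"
        query="SELECT id, title FROM item"
        deltaQuery="SELECT id FROM item
                    WHERE last_modified &gt; '${dih.last_index_time}'"
        deltaImportQuery="SELECT id, title FROM item
                          WHERE id = '${dih.delta.id}'">
</entity>
```

deltaQuery only lists the changed ids; deltaImportQuery then re-fetches the full row for each of them, so the two must agree on the primary key column.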
Thanks Susheel,
But this index will keep on growing, and that is my worry, so I will always have to
increase the RAM.
Can you suggest how many nodes one should plan for to support this big index?
Regards,
On Fri, Mar 7, 2014 at 2:50 AM, Susheel Kumar <
susheel.ku...@thedigitalgroup.net> wrote:
> Setting up Sol
I am reading the Apache Solr Reference Guide and it has lines as below:
"Solr caches are associated with a specific instance of an Index
Searcher, a specific view of an index that doesn't change during the
lifetime of that
searcher. As long as that Index Searcher is being used, any items in its
c
That's an under-the-covers implementation detail. Unless you are doing
extensions, you probably don't need to worry.
Where it connects to userland is, for example, the commits.
Until you commit, your records are not visible, even though Solr
already has them. This is because the 'index searcher' does
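To connect this to the update API: new documents become searchable only once a commit opens a new searcher. A sketch (hypothetical host and core name) of building the two common forms of that request:

```python
from urllib.parse import urlencode

# Explicit commit: open a new searcher immediately after this update.
# commitWithin: ask Solr to commit on its own within N milliseconds,
# which batches visibility across many updates.
base = "http://localhost:8983/solr/mycore/update"
explicit_commit = base + "?" + urlencode({"commit": "true"})
commit_within = base + "?" + urlencode({"commitWithin": 10000})
```

This is also why the caches quoted above are per-searcher: each commit that opens a new searcher starts with fresh (or autowarmed) caches.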
On 7 March 2014 08:50, Pritesh Patel wrote:
> I'm using the dataimporthandler to index data from a mysql DB. Been
> running it just fine. I've been using full-imports. I'm now trying
> implement the delta import functionality.
>
> To implement the delta query, you need to be reading the last_inde
Hi
I am installing SolrCloud with 3 external
ZooKeepers (localhost:2181, localhost:2182, localhost:2183) and 2
Tomcats (localhost:8181, localhost:8182), all available on a single
machine (just for getting started).
By Following these links
http://myjeeva.com/solrcloud-cluster-single-collection-deployment