Hi Shawn,
a big thanks for the long and detailed answer. I am aware of how Linux
uses free RAM for caching and the problems related to the JVM and GC. It
is nice to hear how this correlates to Solr. I'll take some time and
think over it. The facet.method=enum and probably a combination of
Doc
On Thu, 2014-03-06 at 08:17 +0100, Chia-Chun Shih wrote:
>1. Raw data is 35,000 CSV files per day. Each file is about 5 MB.
>2. One collection serves one day. 200-day history data is required.
So once your data are indexed, they will not change? It seems to me that
1 shard/day is a fine ch
Hello Everyone,
Let me first introduce myself: I am Raman, a Masters of CS student. I
am doing a project for my studies which needs the use of Solr. For some
reasons I have to use Solr 4.3.0 for the project.
I am facing an issue with page numbers in the search result. I came across a
workaroun
Hi Roman,
I did a similar project; this is how:
1) Index page by page. Solr documents (the unit of retrieval) will be pages. You can
generate a uniqueKey by concatenating docId and pageNo => doc50_page0 With
this you will have page no information.
2) Later on you can group by document_id with
htt
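The scheme described above can be sketched like this (a minimal Python sketch; the field names other than the uniqueKey pattern are illustrative, not from the thread):

```python
def page_docs(doc_id, pages):
    """Turn one source document's pages into per-page Solr documents.

    The uniqueKey concatenates the document id and page number
    (e.g. doc50_page0), so every page-level hit still carries the
    parent document id for later grouping.
    """
    return [
        {
            "id": f"{doc_id}_page{page_no}",  # uniqueKey: docId + pageNo
            "document_id": doc_id,            # used later to group by document
            "page_no": page_no,
            "content": text,
        }
        for page_no, text in enumerate(pages)
    ]

docs = page_docs("doc50", ["first page text", "second page text"])
```

Each of these dicts can then be sent to Solr as an ordinary document; grouping by `document_id` collapses the pages back into documents at query time.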
Any suggestions?
Zitat von m...@preselect-media.com:
Hello,
I'm using eDisMax to do scoring for my search results.
I have a nested structure of documents. The main (parent) document
with meta data and the child documents with fulltext content. So I
have to join them.
My qf looks like th
Some months ago, I talked to some people at LR about this, but I can't
find my notes.
Imagine a function of some fields that produces a score between 0 and 1.
Imagine that you want to combine this score with relevance over some
more or less complex ordinary query.
What are the options, given the
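One option (a sketch, assuming the 0-1 score is indexed as a field, here hypothetically named quality_score) is Solr's {!boost} query parser, which multiplies a function value into the relevance score of a wrapped query:

```python
def boosted_query(user_query, score_field="quality_score"):
    """Build a {!boost} query that multiplies score_field into the
    relevance score of user_query.

    The wrapped query is passed via the $qq parameter reference so it
    does not need escaping inside the local-params syntax.
    """
    return "{!boost b=" + score_field + " v=$qq}", {"qq": user_query}

q, extra_params = boosted_query("title:solr AND body:cache")
# send q as the q parameter and extra_params as additional request params
```

Other options in the same spirit are the `bf`/`boost` parameters of (e)dismax; which one fits depends on whether the combination should be additive or multiplicative.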
My bad, I think this error was actually a result of using the Solr Admin
utility to query the index and the query I entered included the double
quotes.
However, this left me with a different error that I may post a question
about if I cannot figure it out.
--
View this message in context:
htt
Hi Benson,
http://lucene.apache.org/core/4_7_0/expressions/org/apache/lucene/expressions/Expression.html
https://issues.apache.org/jira/browse/SOLR-5707
That?
Otis
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/
On Thu, Mar 6, 20
Toby Lazar wrote
> Unless Solr is your system of record, aren't you already replicating your
> source data across the WAN? If so, could you load Solr in colo B from
> your colo B data source? You may be duplicating some indexing work, but
> at least your colo B Solr would be more closely in sync
Erick,
That helps so I can focus on the problem areas. Thanks.
On 3/5/14, 6:03 PM, Erick Erickson wrote:
Here's the easiest thing to try to figure out where to
concentrate your energies. Just comment out the
server.add call in your SolrJ program. Well, and any
commits you're doing from Solr
Yeah. I have thought about spitting out JSON and running it against Solr
using parallel HTTP threads separately. Thanks.
On 3/5/14, 6:46 PM, Susheel Kumar wrote:
One more suggestion is to collect/prepare the data in CSV format (a 1-2 million sample,
depending on size) and then import the data directly into
Getting the following error when attempting to run a polygon query from the
Solr Admin utility: "com.spatial4j.core.exception.InvalidShapeException:
incompatible dimension (2) and values (Intersects). Only 0 values
specified",
"code":400
My query is as follows:
q=geoloc:Intersects(POLYGON((-
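For comparison, a well-formed Intersects query needs a closed WKT ring (the first coordinate pair repeated as the last) and, for polygons, the JTS library on the classpath. A sketch with made-up coordinates and the field name from the message:

```python
# Hypothetical coordinates; only the field name (geoloc) is from the thread.
# WKT rings must be closed, i.e. the first point is repeated as the last.
ring = [(-122.5, 37.7), (-122.5, 37.9), (-122.3, 37.9), (-122.3, 37.7)]
ring.append(ring[0])  # close the ring

wkt = "POLYGON((" + ", ".join(f"{lon} {lat}" for lon, lat in ring) + "))"
query = f'geoloc:"Intersects({wkt})"'
# the quoted form avoids the query parser splitting on spaces in the WKT
```

An unclosed ring, or a WKT string mangled by extra quoting, is a common way to end up with an InvalidShapeException like the one above.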
On 3/6/2014 12:17 AM, Chia-Chun Shih wrote:
> I am planning a system for searching TB's of structured data in SolrCloud.
> I need suggestions for handling such huge amount of data in SolrCloud.
> (e.g., number of shards per collection, number of nodes, etc.)
>
> Here are some specs of the system:
hi,
the heap problem is due to memory being full.
You should remove unnecessary data and restart the server once.
On Thursday, 6 March 2014 10:39 AM, Angel Tchorbadjiiski
wrote:
Hi Shawn,
a big thanks for the long and detailed answer. I am aware of how Linux
uses free RAM for caching and the problem
On 3/6/2014 7:54 AM, perdurabo wrote:
> Toby Lazar wrote
>> Unless Solr is your system of record, aren't you already replicating your
>> source data across the WAN? If so, could you load Solr in colo B from
>> your colo B data source? You may be duplicating some indexing work, but
>> at least you
Hi
When restarting a node in SolrCloud, I run into scenarios where both the
replicas for a shard get into a "recovering" state and never come up, causing
the error "No servers hosting this shard". To fix this, I either unload one
core or restart one of the nodes again so that one of them becomes the
Hello Everyone,
I would like to ask your guidance on the following.
I have a single core with 124 GB of index data size. Indexing and reading
are both very slow, as I have 7 GB RAM to support this huge data. Almost 8
million documents.
Hence, we thought of going to SolrCloud so that we can accommo
Well, I think I finally figured out how to get SolrEntityProcessor to work,
but there are still some issues. I had to add a library path to
solrconfig.xml, but the cores are finally coming up and I am now manually
able to run a data import that does seem to index all of the documents on
the remote
I am working with a date range query that is not giving me fast response
times. After modifying the date range construct after reading several forums,
response time now is around 200ms, down from 2-3 secs.
However, I was wondering if there is still some way to improve upon it, as
queries without date rang
Hi,
Since your range query has NOW in it, it won't be cached meaningfully.
http://solr.pl/en/2012/03/05/use-of-cachefalse-and-cost-parameters/
This is untested but can you try this?
&q=UserID:AC10263A-E28B-99F9-0012-AAA42DDD9336
&fq=Status:Booked
&fq=ClientID:4
&fq={!cache=false cost=150}StartDa
Ahmet, I have tried filter queries before to fine-tune query performance.
However, whenever we use filter queries the response time goes up and
remains there. With the above change, the response time was consistently
around 4-5 secs. We are using the default cache settings.
Are there any settings I
Are you using an old version?
- Mark
http://about.me/markrmiller
On Mar 6, 2014, at 11:50 AM, KNitin wrote:
> Hi
>
> When restarting a node in solrcloud, i run into scenarios where both the
> replicas for a shard get into "recovering" state and never come up causing
> the error "No servers ho
Hi,
Did you try with non-cached filter queries before?
Cached filter queries are useful when they are re-used. How often do you commit?
I thought that we could do something if we disable filter query caching and
manipulate their execution order with the cost parameter.
What happens with this :
&q=User
That did the trick Ahmet. The first response was around 200ms, but the
subsequent queries were around 2-5ms.
I tried this
&q=UserID:AC10263A-E28B-99F9-0012-AAA42DDD9336
&fq={!cache=false cost=100}Status:Booked
&fq={!cache=false cost=50}ClientID:4
&fq={!cache=false cost=50}[NOW/DAY TO NOW/DAY+1YE
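The pattern used above can be captured in a small helper (the helper itself is hypothetical; the filter clauses are the ones from the thread):

```python
def fq(clause, cache=True, cost=None):
    """Build a Solr fq value, optionally prefixed with
    {!cache=false cost=N} local params.

    cache=false keeps one-off filters out of the filterCache;
    cost orders non-cached filters so cheap ones run first.
    """
    if cache:
        return clause
    params = "cache=false" + (f" cost={cost}" if cost is not None else "")
    return "{!" + params + "}" + clause

filters = [
    fq("Status:Booked", cache=False, cost=100),
    fq("ClientID:4", cache=False, cost=50),
]
```

Filters with cost >= 100 are additionally treated as post-filters, evaluated only against documents that match everything else, which is why the expensive clause gets the higher cost.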
Hi,
We've had a strange mishap with a solr cloud cluster (version 4.5.1) where
we observed high search latency. The problem appears to develop over
several hours until such point where the entire cluster stopped responding
properly.
After investigation we found that the number of threads (both so
Ok, I think the issue here is that I need to install the JTS library. I will
have that done and try again.
Lee
--
View this message in context:
http://lucene.472066.n3.nabble.com/Polygon-search-returning-InvalidShapeException-incompatible-dimension-2-error-tp4121704p4121796.html
Sent from the
What Erick said. That's a giant Filter Cache. Have a look at these Solr
metrics and note the Filter Cache in the middle:
http://www.flickr.com/photos/otis/8409088080/
Note how small the cache is and how high the hit rate is. Those are stats
for http://search-lucene.com/ and http://search-hadoop
Setting up SolrCloud (horizontal scaling) is definitely a good idea for this
big index, but before going to SolrCloud, are you able to upgrade your single
node to 128 GB of memory (vertical scaling) to see the difference?
Thanks,
Susheel
-Original Message-
From: Priti Solanki [mailto:pri
I am using 4.3.1.
On Thu, Mar 6, 2014 at 11:48 AM, Mark Miller wrote:
> Are you using an old version?
>
> - Mark
>
> http://about.me/markrmiller
>
> On Mar 6, 2014, at 11:50 AM, KNitin wrote:
>
> > Hi
> >
> > When restarting a node in solrcloud, i run into scenarios where both the
> > replicas
Andy,
I don't have a direct answer to your question but I have a question.
On 03/05/2014 07:21 AM, Andy Alexander wrote:
fq=ss_language:ja&q=製品
I am guessing you have a field called ss_language where the language code
of the document is stored,
and you have Solr documents of different language
It sounds like the distributed update deadlock issue.
It’s fixed in 4.6.1 and 4.7.
- Mark
http://about.me/markrmiller
On Mar 6, 2014, at 3:10 PM, Avishai Ish-Shalom wrote:
> Hi,
>
> We've had a strange mishap with a solr cloud cluster (version 4.5.1) where
> we observed high search latency.
: That did the trick Ahmet. The first response was around 200ms, but the
: subsequent queries were around 2-5ms.
Are you really sure you want "cache=false" on all of those filters?
While the "ClientID:4" query may be something that changes significantly
enough in every query to not be useful t
Hi,
We have 5 Solr servers in a Cloud with about 70 cores and 12GB
indexes in total (every core has 2 shards, so it's 6 GB per server).
After upgrade to Solr 4.7 the Solr servers are crashing constantly
(each server about one time per hour). We currently don't have any clue
about the reason.
Hoss,
Thanks for the correction. I missed the /DAY part and thought it was
StartDate:[NOW TO NOW+1YEAR]
Ahmet
On Friday, March 7, 2014 12:33 AM, Chris Hostetter
wrote:
: That did the trick Ahmet. The first response was around 200ms, but the
: subsequent queries were around 2-5ms.
Are
On Mar 6, 2014, at 5:37 PM, Martin de Vries wrote:
> IndexSchema is using 62% of the memory but we don't know if that's a
> problem:
That seems odd. Can you see what objects are taking all the RAM in the
IndexSchema?
- Mark
http://about.me/markrmiller
On 3/6/2014 3:37 PM, Martin de Vries wrote:
> We have 5 Solr servers in a Cloud with about 70 cores and 12GB
> indexes in total (every core has 2 shards, so it's 6 GB per server).
>
> After upgrade to Solr 4.7 the Solr servers are crashing constantly
> (each server about one time per hour). We cur
Hello,
I have a question from a colleague who's managing a 3-node(VMs) SolrCloud
cluster with a separate 3-node Zookeeper ensemble. Periodically the data
center underneath the SolrCloud decides to upgrade the SolrCloud instance
infrastructure in a "rolling upgrade" fashion. So after the 1st i
"* New 'cursorMark' request param for efficient deep paging of sorted
result sets. See http://s.apache.org/cursorpagination";
At the end of the linked doco there is an example that doesn't make sense
to me, because it mentions "sort=timestamp asc" and is then followed by
pseudo code that sorts b
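For reference, the cursorMark protocol the quoted note describes amounts to a loop that feeds each response's nextCursorMark back into the next request until it stops changing. A sketch in which fetch stands in for a real Solr request (made with cursorMark set and a sort that ends in the uniqueKey):

```python
def iterate_cursor(fetch):
    """Deep-page through results via cursorMark.

    fetch(cursor) is a stand-in for a Solr request issued with
    cursorMark=<cursor>; it must return (docs, next_cursor).
    The server guarantees the cursor stops changing when the
    result set is exhausted, which is the termination condition.
    """
    cursor = "*"  # the initial cursorMark is always "*"
    while True:
        docs, next_cursor = fetch(cursor)
        yield from docs
        if next_cursor == cursor:  # unchanged cursor => no more results
            return
        cursor = next_cursor
```

Because the sort must include the uniqueKey as a tiebreaker, the pseudo code in the docs sorting "by id" is just the degenerate case where the uniqueKey is the whole sort.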
Would probably need to see some logs to say much. Need to understand why they
are inoperable.
What version is this?
- Mark
http://about.me/markrmiller
On Mar 6, 2014, at 6:15 PM, Nazik Huq wrote:
> Hello,
>
>
>
> I have a question from a colleague who's managing a 3-node(VMs) SolrCloud
>
That's what I do: pre-create JSONs following the schema and save them in
MongoDB as part of the ETL process. After that, just dump the JSONs
into Solr using batching etc. With this you can do full and incremental
indexing as well.
Thanks,
Kranti K. Parisa
http://www.linkedin.com/in/krantiparisa
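The batching step can be sketched as follows (the batch size is illustrative; each batch would then be POSTed to Solr's /update handler as JSON):

```python
def batches(docs, size=500):
    """Split a list of pre-built JSON documents into fixed-size
    batches for bulk indexing; Solr accepts a JSON array of docs
    per /update request."""
    for i in range(0, len(docs), size):
        yield docs[i:i + size]

# e.g. 1200 documents become batches of 500, 500 and 200
chunks = list(batches(list(range(1200)), size=500))
```

Keeping each request to a few hundred documents avoids oversized POST bodies while still amortizing the per-request overhead.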
: At the end of the linked doco there is an example that doesn't make sense
: to me, because it mentions "sort=timestamp asc" and is then followed by
: pseudo code that sorts by id only. I understand that cursorMark requires
Ok ... 2 things contributing to the confusion.
1) the para that refers
Thank you, that all sounds great. My assumption about documents being
missed was something like this:
A,B,C,D
where they are sorted by timestamp first and ID second. Say the first
'page' of results is 'A,B', and before the second page is requested both
documents B + C receive update events and th
My initial approach was to use the filter cache for static fields. However, when
a filter query is used, every query after the first has the same response
time as the first. For instance, when the cache is enabled in the query under
review, response time shoots up to 4-5 secs and stays there.
We are using defau
The version is 4.6. I am going to ask for the log files and post them.
-Original Message-
From: Mark Miller [mailto:markrmil...@gmail.com]
Sent: Thursday, March 06, 2014 6:33 PM
To: solr-user
Subject: Re: SolrCloud recovery after nodes are rebooted in rapid succession
Would probably need t
I'm using the DataImportHandler to index data from a MySQL DB. It's been
running just fine. I've been using full-imports. I'm now trying to
implement the delta import functionality.
To implement the delta query, you need to be reading the last_index_time
from a properties file to know what new to inde
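For reference, a typical delta setup in data-config.xml looks roughly like this (table and column names are placeholders; ${dih.last_index_time} is the value DIH reads back from dataimport.properties, and ${dih.delta.id} refers to each id returned by deltaQuery):

```xml
<entity name="item"
        query="SELECT id, title FROM item"
        deltaQuery="SELECT id FROM item
                    WHERE last_modified &gt; '${dih.last_index_time}'"
        deltaImportQuery="SELECT id, title FROM item
                          WHERE id = '${dih.delta.id}'">
</entity>
```

deltaQuery only lists the changed ids; deltaImportQuery then re-fetches the full row for each of them, so the two must agree on the primary key column.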
Thanks Susheel,
But this index will keep on growing, and that is my worry, so I will always have to
increase the RAM.
Can you suggest how many nodes one should plan for to support this big index?
Regards,
On Fri, Mar 7, 2014 at 2:50 AM, Susheel Kumar <
susheel.ku...@thedigitalgroup.net> wrote:
> Setting up Sol
I am reading the Apache Solr Reference Guide and it has lines as below:
"Solr caches are associated with a specific instance of an Index
Searcher, a specific view of an index that doesn't change during the
lifetime of that
searcher. As long as that Index Searcher is being used, any items in its
c
That's an under-the-covers implementation detail. Unless you are doing
extensions, you probably don't need to worry.
Where it connects to userland is, for example, the commits.
Until you commit, your records are not visible, even though Solr
already has them. This is because the 'index searcher' does
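To connect this to the update API: new documents become searchable only once a commit opens a new searcher. A sketch (hypothetical host and core name) of building the two common forms of that request:

```python
from urllib.parse import urlencode

# Explicit commit: open a new searcher immediately after this update.
# commitWithin: ask Solr to commit on its own within N milliseconds,
# which batches visibility across many updates.
base = "http://localhost:8983/solr/mycore/update"
explicit_commit = base + "?" + urlencode({"commit": "true"})
commit_within = base + "?" + urlencode({"commitWithin": 10000})
```

This is also why the caches quoted above are per-searcher: each commit that opens a new searcher starts with fresh (or autowarmed) caches.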
On 7 March 2014 08:50, Pritesh Patel wrote:
> I'm using the dataimporthandler to index data from a mysql DB. Been
> running it just fine. I've been using full-imports. I'm now trying
> implement the delta import functionality.
>
> To implement the delta query, you need to be reading the last_inde
Hi
I am installing SolrCloud with 3 external
ZooKeepers (localhost:2181, localhost:2182, localhost:2183) and 2
Tomcats (localhost:8181, localhost:8182), all available on a single
machine (just for getting started).
By Following these links
http://myjeeva.com/solrcloud-cluster-single-collection-deployment