On 11/4/2012 11:41 PM, deniz wrote:
Michael Della Bitta-2 wrote
No, RAMDirectory doesn't work for replication. Use MMapDirectory... it
ends up storing the index in RAM and more efficiently so, plus it's
backed by disk.
Just be sure to not set a big heap because MMapDirectory works outside of
heap.
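(As a side note, choosing MMapDirectory is a solrconfig.xml setting; a minimal sketch, assuming stock Solr 4.x configuration around it:)

```xml
<!-- solrconfig.xml: MMapDirectory memory-maps the index so the OS
     page cache, not the Java heap, holds the hot parts of it -->
<directoryFactory name="DirectoryFactory"
                  class="solr.MMapDirectoryFactory"/>
```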
Hello all,
I have a situation with Solr grouping where I want to group my products into
top categories for an ecommerce application. The number of groups here is
less than 10 and the total number of docs in the index is 10 million. Will
Solr grouping be an issue here? We have seen OOM issues when we tried
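A sketch of the kind of grouping request being described, assuming a `category` field holding the top-level category (the core name, field name, and limits here are hypothetical):

```shell
# Build a Solr result-grouping request: group the 10M products
# by their top-level category (fewer than 10 distinct groups).
SOLR="http://localhost:8983/solr/products/select"
PARAMS="q=*:*&group=true&group.field=category&group.limit=5"
echo "${SOLR}?${PARAMS}"
```

With so few distinct groups, adding group.ngroups=true is cheap if the total group count is needed in the response.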
For my tests, I don't t
I would use the Unix "split" command. You can give it a line count.
% split -l 1400 myfile.csv
You can use "wc -l" to count the lines.
wunder
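The same idea can be taken a step further to get exactly two halves by computing the line count first; a sketch (the tiny `myfile.csv` created here stands in for the real 10 GB file):

```shell
# Split a CSV into two roughly equal halves by line count.
printf 'a\nb\nc\nd\n' > myfile.csv    # stand-in for the 10 GB file
total=$(wc -l < myfile.csv)           # count the lines
half=$(( (total + 1) / 2 ))           # round up so at most 2 output files
split -l "$half" myfile.csv part_     # produces part_aa and part_ab
wc -l part_aa part_ab
```

One caveat: if the CSV has a header row, only the first half will carry it; Solr's CSV update handler can be given explicit field names instead of relying on a header line.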
On Nov 4, 2012, at 10:23 PM, Gora Mohanty wrote:
> On 5 November 2012 11:11, mitra wrote:
>
>> Hello all
>>
>> i have a csv file of size 10 gb wh
Hello all,
I have a CSV file of size 10 GB which I have to index using Solr.
My question is how to index the CSV in such a way that
I can get two separate index files, of which one is the index
for the first half of the CSV and the second index is the index for the
second half of th
Thanks Eric for the explanation. It helps me a lot :).
--
View this message in context:
http://lucene.472066.n3.nabble.com/SolrCloud-AutoSharding-In-enterprise-environment-tp4017036p4018194.html
Sent from the Solr - User mailing list archive at Nabble.com.
Depends on what you really need. Index aliases are very handy for having a
sliding last-N-days type of search. Solr doesn't have that yet, but it may
be in JIRA.
Otis
--
Performance Monitoring - http://sematext.com/spm
On Nov 4, 2012 11:34 PM, "Nathan Findley" wrote:
> Otis,
>
> I believe I found th
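For what it's worth, SolrCloud did later grow alias support in the Collections API (CREATEALIAS), which is the usual building block for this kind of sliding window; a sketch with hypothetical daily collection names:

```shell
# Repoint a "last7days" alias at a sliding window of daily
# collections; CREATEALIAS swaps the alias in one step, so
# searches against "last7days" never see a partial switch.
ADMIN="http://localhost:8983/solr/admin/collections"
ACTION="action=CREATEALIAS&name=last7days&collections=day02,day03,day04,day05,day06,day07,day08"
echo "${ADMIN}?${ACTION}"
```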
Otis,
I believe I found the thread which contains a link about elasticsearch
and big data.
http://www.elasticsearch.org/videos/2012/06/05/big-data-search-and-analytics.html
We are dealing with data that is searched using time ranges. Does the
"time" data flow concept work in SOLR? Does it
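Plain time-range search, at least, works out of the box in Solr via date math; a sketch of such a query, assuming a date-typed `timestamp` field (field and core names are assumptions):

```shell
# Range query over a date field; NOW-7DAYS is Solr date math,
# evaluated on the server at query time.
SOLR="http://localhost:8983/solr/collection1/select"
QUERY='timestamp:[NOW-7DAYS TO NOW]'
echo curl -G "$SOLR" --data-urlencode "q=$QUERY"
```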
Hi All,
I was testing my Solr with MMapDirectory, and while indexing I get these
error lines in the log:
10:27:41.003 [commitScheduler-4-thread-1] ERROR
org.apache.solr.update.CommitTracker - auto commit
error...:org.apache.solr.common.SolrException: Error opening new searcher
at org.apac
Thanks Everyone.
As Shawn mentioned, it was a memory issue. I reduced the amount allocated to
Java to 6 GB, and it's been working pretty well.
I am re-indexing one of the SolrCloud. I was having trouble with optimizing
the data when I indexed last time
I am hoping optimizing will not be an iss
Yes. I can guarantee that a force merge will not "massively help". It might not
even measurably help.
wunder
On Nov 4, 2012, at 1:05 PM, Otis Gospodnetic wrote:
> Measure / monitor first :)
> You may not need to optimize at all, especially if your index is always
> being modified.
>
> Otis
> -
Measure / monitor first :)
You may not need to optimize at all, especially if your index is always
being modified.
Otis
On Nov 4, 2012 3:03 PM, "tictacs" wrote:
> Thanks for the reply both and apologies if this is a recurring question.
> From t
Or, don't "optimize" (force merge) at all. Really. This is a manual override
for an automatic process, merging.
I can only think of one case where a forced merge makes sense:
1. All documents are reindexed.
2. Traditional Solr replication is used (not SolrCloud).
3. Replication is manually timed
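For completeness, when a forced merge is warranted it goes through the update handler; a sketch (core name assumed, and per the advice above this is rarely worth running):

```shell
# Force-merge ("optimize") the index down to a single segment.
CORE="http://localhost:8983/solr/collection1"
URL="${CORE}/update?optimize=true&maxSegments=1"
echo curl "$URL"
```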
Thanks for the reply both, and apologies if this is a recurring question.
From the sounds of it I am sure an optimize overnight when app traffic is
low will suffice. This will massively help with server performance I am
sure.
Correct. There was a good thread on this topic on the ElasticSearch ML.
Search for "oversharding" and my name. Same ideas apply to SolrCloud.
Neither server offers automatic rebalancing yet, though ES lets you move
shards around on demand.
Otis
On Sat, Nov 3, 2012 at 4:23 AM, Lance Norskog wrote:
> If any value is in a bogus format, the entire document batch in that HTTP
> request fails. That is the right timestamp format.
> The index may be corrupted somehow. Can you try removing all of the files in
> data/ and trying again?
>
Thank
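The timestamp format Lance refers to is ISO-8601 in UTC with a trailing Z (e.g. 2012-11-04T23:41:00Z); a sketch of producing it from a POSIX shell:

```shell
# Emit the current time in Solr's expected date format:
# yyyy-MM-ddTHH:mm:ssZ, always UTC.
NOW=$(date -u +"%Y-%m-%dT%H:%M:%SZ")
echo "$NOW"
```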
On Fri, Nov 2, 2012 at 4:32 PM, Erick Erickson wrote:
> Well, I'm at my wits end. I tried your field definitions (using the
> exampledocs XML) and they work just fine. As far as if you mess up the date
> on the way in, you should be seeing stack traces in your log files.
>
Please don't go to wit'
Otis,
Thanks, that makes sense. I have one more question: at this point
the only way for future expansion of shard count is by having more than
one shard per machine and then, when things grow, moving each shard to
its own dedicated machine? That is how I understand it from the wiki.
So
Hi Guru,
here is my blog post about this:
http://www.sentric.ch/blog/setting-up-solr-4-0-beta-with-tomcat-and-zookeeper
It's pretty simple, just follow the mentioned steps.
Best regards
Vadim
2012/9/5 bsargurunathan :
> Hi Markus,
>
> Can you please tell me the exact file name in the tomcat folder?
Hi,
By we I meant Sematext, not LW.
You'll have to ask LW about open-sourcing their implementation.
Otis
On Nov 3, 2012 10:43 PM, "SR" wrote:
> Thanks Otis.
> By "we" you mean "Lucid works"?
>
> Is there a chance to get it sometime soon in th
Steve,
It seems to me your task has a lot in common with mine. I will be talking
about several approaches next week:
http://www.apachecon.eu/schedule/presentation/18/
Thanks
On Sun, Nov 4, 2012 at 6:43 AM, SR wrote:
> Thanks Otis.
> By "we" you mean "Lucid works"?
>
> Is there a chance to get it s