love only having one large file system to manage
instead of lots of individual file systems across many machines. HDFS
makes this easy.
-Joe
On 8/2/2019 9:10 AM, lstusr 5u93n4 wrote:
Hi Joe,
We fought with Solr on HDFS for quite some time, and faced similar issues
as you're seeing. (See this thread, for example:
http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201812.mbox/%3cCABd9LjTeacXpy3FFjFBkzMq6vhgu7Ptyh96+w-KC2p=-rqk...@mail.gmail.com%3e
)
the directories, recreate and index the affected collection, while you
work your other issues.
On Aug 1, 2019, at 16:40, Joe Obernberger wrote:
Been using Solr on HDFS for a while now, and I'm seeing an issue with
redundancy/reliability. If a server goes down, when it comes back up,
it will never recover because of the lock files in HDFS. That solr node
needs to be brought down manually, the lock files deleted, and then
brought back up.
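For anyone hitting this, a minimal sketch of that manual cleanup (the
/solr root and the index paths below are assumptions; adjust them to your
HdfsDirectoryFactory settings):

  # stop the stuck node first
  bin/solr stop -p 8983
  # locate and remove the stale lock files left behind in HDFS
  hdfs dfs -ls -R /solr | grep write.lock
  hdfs dfs -rm /solr/mycollection/core_node1/data/index/write.lock
  # then bring the node back up in cloud mode
  bin/solr start -c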
@Shawn Heisey Yeah, deleting the "write.lock" files manually works in the end.
@Walter Underwood Do you have any recent performance evaluations of Solr on
HDFS vs LocalFS?
On Tue, Aug 28, 2018 at 4:10 AM, Shawn Heisey wrote:
On 8/26/2018 7:47 PM, zhenyuan wei wrote:
I found an exception when running Solr on HDFS. The details: Solr was
running on HDFS with document updates running continuously; then, after a
kill -9 of the Solr JVM or a reboot/shutdown of the Linux OS, everything
was restarted.
If you use "kill -9" to stop a Solr instance
Because HDFS doesn't follow the file semantics that Solr expects.
There's quite a bit of background here:
https://issues.apache.org/jira/browse/SOLR-8335
Best,
Erick
On Sun, Aug 26, 2018 at 6:47 PM zhenyuan wei wrote:
Hi all,
I found an exception when running Solr on HDFS. The details: Solr was
running on HDFS with document updates running continuously; then, after a
kill -9 of the Solr JVM or a reboot/shutdown of the Linux OS, everything
was restarted.
The exception appears like:
2018-08-26 22:23:12.529 ERROR
(coreContainerWorkExecutor-2
The only option should be to configure Solr to just have a replication
factor of 1 or HDFS to have no replication. I would go for the middle
ground and configure both to use a factor of 2. This way a single failure
in HDFS or Solr is not a problem, while in the 1/3 or 3/1 option a single
server error would already be a problem.
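A sketch of that middle-ground setup (paths and collection name are made
up; dfs.replication can equally be set cluster-wide in hdfs-site.xml):

  # cap HDFS replication for the existing Solr data directory at 2
  hdfs dfs -setrep -w 2 /solr
  # and create the collection with 2 Solr replicas per shard
  curl "http://localhost:8983/solr/admin/collections?action=CREATE&name=mycoll&numShards=3&replicationFactor=2"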
On 6/7/2018 6:41 AM, Greenhorn Techie wrote:
Hi,
As HDFS has got its own replication mechanism, with an HDFS replication
factor of 3 and a SolrCloud replication factor of 3, does that mean
each document will probably have around 9 copies replicated underneath in
HDFS? If so, is there a way to configure HDFS or Solr such that only three
copies are kept in total?
local file system. But this seems to greatly
depend on what your setup looks like and what actions you perform. We
have now had a pattern with lots of small updates and commits and that
seems to be quite a bit slower. We are about to do performance testing
The reason we switched to HDFS was largely connected to us using Docker
and Marathon/Mesos. With HDFS the data is in a shared file system and
thus it is possible to move the replica to a different instance on a
different host.
regards,
Hendrik
On 22.11.2017 14:59, Greenhorn Techie wrote:
Hi,
Good Afternoon!!
While the discussion around issues related to "Solr on HDFS" is live, I
would like to understand if anyone has done any performance benchmarking
for both Solr indexing and search between HDFS vs local file system.
Also, from experience, what would the community recommend?
I'm also not really an HDFS expert but I believe it is slightly different:
The HDFS data is replicated, let's say 3 times, between the HDFS data
nodes, but to an HDFS client it looks like one directory and the
replication is hidden. Every client should see the same
data. Just like
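That transparency is easy to see from the command line; a small sketch
(the index path is illustrative):

  # clients see one logical copy of each file...
  hdfs dfs -ls /solr/mycoll/core_node1/data/index
  # ...while fsck reveals the per-block replication underneath
  hdfs fsck /solr/mycoll/core_node1/data/index -files -blocks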
bq: in the non-HDFS case that sounds logical but in the HDFS case all
the index data is in the shared HDFS file system
That's not really the point, and it's not quite true. The Solr index is
unique _per replica_. So replica1 points to an HDFS directory (that's
triply replicated to be sure). replica2
Hi Erick,
in the non-HDFS case that sounds logical but in the HDFS case all the
index data is in the shared HDFS file system. Even the transaction logs
should be in there. So the node that once had the replica should not
really have more information than any other node, especially if
legacyCloud is false.
Hendrik:
bq: Not really sure why one replica needs to be up though.
I didn't write the code so I'm guessing a bit, but consider the
situation where you have no replicas for a shard up and add a new one.
Eventually it could become the leader but there would have been no
chance for it to check if its index is current.
Hi,
I had opened SOLR-10092
(https://issues.apache.org/jira/browse/SOLR-10092) for this a while ago.
I was now able to get this feature working with a very small code change.
After a few seconds Solr reassigns the replica to a different Solr
instance as long as one replica is still up. Not really sure why one
replica needs to be up though.
HDFS is like a shared filesystem so every Solr Cloud instance can access
the data using the same path or URL. The clusterstate.json looks like this:
"shards":{"shard1":{
"range":"8000-7fff",
"state":"active",
"replicas":{
"core_node1":{
"core
On 1/19/2017 4:09 AM, Hendrik Haddorp wrote:
> Given that the data is on HDFS it shouldn't matter if any active
> replica is left as the data does not need to get transferred from
> another instance but the new core will just take over the existing
> data. Thus a replication factor of 1 should also work.
Hi,
I'm seeing the same issue on Solr 6.3 using HDFS and a replication
factor of 3, even though I believe a replication factor of 1 should work
the same. When I stop a Solr instance this is detected and Solr actually
wants to create a replica on a different instance. The command for that
does
On 1/13/2017 5:46 PM, Chetas Joshi wrote:
> One of the things I have observed is: if I use the collection API to
> create a replica for that shard, it does not complain about the config
> which has been set to ReplicationFactor=1. If replication factor was
> the issue as suggested by Shawn, shouldn't it complain?
core) representing the replica goes down. The data is
on HDFS (distributed across all the datanodes of the hadoop cluster with 3X
replication). This is the reason why I have kept replicationFactor=1.
As per the link:
https://cwiki.apache.org/confluence/display/solr/Running+Solr+on+HDFS
One benefit to running Solr in HDFS is the ability to automatically add new
replicas when the Overseer notices that a shard has gone down. Because the
"gone" index shards are stored in HDFS, a new core will be created and the
The overseer does not seem to be creating an extra core even though
autoAddReplica=true
(https://cwiki.apache.org/confluence/display/solr/Running+Solr+on+HDFS).
Is this happening because the overseer sees the shard as active as
suggested by the cluster status?
If yes, is "autoAddReplica" not reliable? Should I add a replica for this
shard when such cases arise?
Thanks!
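For what it's worth, autoAddReplicas is set at collection-creation time;
a minimal sketch (collection name and shard count are made up):

  curl "http://localhost:8983/solr/admin/collections?action=CREATE&name=mycoll&numShards=18&replicationFactor=1&autoAddReplicas=true"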
I took another look at the stack trace and I'm pretty sure the issue is
with NULL values in one of the sort fields. The null pointer is occurring
during the comparison of sort values. See line 85 of:
https://github.com/apache/lucene-solr/blob/branch_5_5/solr/solrj/src/java/org/apache/solr/client/so
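Until an upgrade to 6.x is possible, one stopgap is to keep documents
missing the sort field out of the stream entirely; a sketch (collection
and field names are invented, and the field is assumed to have docValues):

  curl -G "http://localhost:8983/solr/mycoll/export" \
       --data-urlencode "q=*:*" \
       --data-urlencode "fq=sort_f:[* TO *]" \
       --data-urlencode "sort=sort_f asc" \
       --data-urlencode "fl=id,sort_f"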
Hi Joel,
I don't have any solr documents that have NULL values for the sort fields I
use in my queries.
Thanks!
On Sun, Dec 18, 2016 at 12:56 PM, Joel Bernstein wrote:
Ok, based on the stack trace I suspect one of your sort fields has NULL
values, which in the 5x branch could produce null pointers if a segment had
no values for a sort field. This is also fixed in the Solr 6x branch.
Joel Bernstein
http://joelsolr.blogspot.com/
On Sat, Dec 17, 2016 at 2:44 PM, C
Here is the stack trace.
java.lang.NullPointerException
        at org.apache.solr.client.solrj.io.comp.FieldComparator$2.compare(FieldComparator.java:85)
        at org.apache.solr.client.solrj.io.comp.FieldComparator.compare(FieldComparator.java:92)
        at org.apache.solr.client.solrj.io.
If you could provide the JSON parse exception stack trace, it might help to
pin down the issue.
On Fri, Dec 16, 2016 at 5:52 PM, Chetas Joshi wrote:
Hi Joel,
The only NON alpha-numeric characters I have in my data are '+' and '/'. I
don't have any backslashes.
If the special characters were the issue, I should get the JSON parsing
exceptions every time irrespective of the index size and irrespective of
the available memory on the machine. That
The Streaming API may have been throwing exceptions because the JSON
special characters were not escaped. This was fixed in Solr 6.0.
Joel Bernstein
http://joelsolr.blogspot.com/
On Fri, Dec 16, 2016 at 4:34 PM, Chetas Joshi wrote:
Hello,
I am running Solr 5.5.0.
It is a solrCloud of 50 nodes and I have the following config for all the
collections.
maxShardsperNode: 1
replicationFactor: 1
I was using the Streaming API to get back results from Solr. It worked fine
for a while until the index data size reached beyond 40 GB per shard.
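For context, a streaming request of the kind described might look like
this against the /stream handler (host, collection, and field names are
placeholders; the expression syntax follows the Solr 6.x docs):

  curl --data-urlencode 'expr=search(mycoll, q="*:*", fl="id,field_a", sort="field_a asc", qt="/export")' \
       "http://localhost:8983/solr/mycoll/stream"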
On 12/16/2016 11:58 AM, Chetas Joshi wrote:
> How different is the index data caching mechanism for the Streaming
> API from the cursor approach?
Solr and Lucene do not handle that caching. Systems external to Solr
(like the OS, or HDFS) handle the caching. The cache effectiveness will
be a combination of many factors.
Thank you everyone. I would add nodes to the SolrCloud and split the shards.
Shawn,
Thank you for explaining why putting index data on the local file system could
be a better idea than using HDFS. I need to find out how HDFS caches the
index files in a resource constrained environment.
I would also
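Re the shard splitting mentioned above: it goes through the Collections
API; a minimal sketch with a made-up collection name:

  curl "http://localhost:8983/solr/admin/collections?action=SPLITSHARD&collection=mycoll&shard=shard1"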
I think 70GB is too huge for a shard.
How much memory does the system have?
In case Solr does not have sufficient memory to load the indexes, it will
use only the amount of memory defined in your Solr caches.
Although you are on HDFS, Solr performance will be really bad if it has to
do disk IO all the time.
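A sketch of the knobs involved (the heap size and slab count are
arbitrary examples; solr.hdfs.blockcache.* are the properties normally
used to cache HDFS blocks off-heap):

  bin/solr start -c -m 16g \
    -Dsolr.hdfs.blockcache.enabled=true \
    -Dsolr.hdfs.blockcache.slab.count=40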
I think the shard index size is huge and should be split.
On Wed, Dec 14, 2016 at 10:58 AM, Chetas Joshi wrote:
Hi everyone,
I am running Solr 5.5.0 on HDFS. It is a solrCloud of 50 nodes and I have
the following config.
maxShardsperNode: 1
replicationFactor: 1
I have been ingesting data into Solr for the last 3 months. With the
increase in data, I am observing an increase in the query time. Currently
the size of the index per shard is around 70 GB.
The core_node name is largely irrelevant; you should have more
descriptive names in the state.json file, like collection1_shard1_replica1.
You happen to see 19 because you have only one replica per shard.
Exactly how are you creating the replica? What version of Solr? If
you're using the "core admin" API
Is this happening because I have set replicationFactor=1?
So even if I manually add a replica for the shard that's down, it will just
create a dataDir but would not copy any of the data into the dataDir?
On Tue, Sep 13, 2016 at 6:07 PM, Chetas Joshi wrote:
Hi,
I just started experimenting with Solr Cloud.
I have a Solr Cloud of 20 nodes. I have one collection with 18 shards
running on 18 different nodes with replication factor=1.
When one of my shards goes down, I create a replica using the Solr UI. On
HDFS I see a core getting added, but the data is not copied into it.
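The UI action corresponds to an ADDREPLICA call; a minimal sketch with
placeholder names:

  curl "http://localhost:8983/solr/admin/collections?action=ADDREPLICA&collection=mycoll&shard=shard1"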
Hi Charles,
See http://search-lucene.com/?q=solr+hdfs and
https://cwiki.apache.org/confluence/display/solr/Running+Solr+on+HDFS
Otis
--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/
On Tue, Jan 6, 2015 at 11:0
I am considering using Solr to extend Hortonworks Data Platform
capabilities to search.
- I found tutorials to index documents into a Solr instance from HDFS, but
I guess this solution would require a Solr cluster distinct from the Hadoop
cluster. Is it possible to have Solr integrated into the Hadoop cluster?
Hi all,
I am new to Solr and HDFS. I am trying to index text content
extracted from binary files like PDF, MS Office, etc., which are stored on
HDFS (single node). So far I have Solr running on HDFS and have created the
core, but I couldn't send the files to Solr for indexing.
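One way to push such files at Solr is the extracting request handler; a
hedged sketch (core name, document id, and paths are invented):

  # pull the file out of HDFS, then post it to Solr's /update/extract
  hdfs dfs -copyToLocal /data/docs/sample.pdf /tmp/sample.pdf
  curl "http://localhost:8983/solr/mycore/update/extract?literal.id=doc1&commit=true" \
       -F "myfile=@/tmp/sample.pdf"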
Hi Joseph,
I believe Nutch can index into Solr/SolrCloud just fine. Sounds like that
is the approach you should take.
Otis
--
Solr & ElasticSearch Support
http://sematext.com/
On Thu, Mar 7, 2013 at 12:10 AM, Joseph Lim wrote:
Hi Amit,
Currently I am designing a Learning Management System based on Hadoop
and HBase. Right now I want to integrate Nutch with Solr as part of the
crawler module, so that users will only be able to search relevant
documents from specific sources. And since crawling and indexing t
Joseph,
Doing what Otis said will do literally what you want, which is copying the
index to HDFS. It's no different than copying it to a different machine,
which btw is what Solr's master/slave replication scheme does.
Alternatively, I think people are starting to set up new Solr instances with
SolrCloud.
Hi Amit,
So you mean that if I just want to get redundancy for Solr on HDFS, the
best way to do it is what Otis suggested, using the following command:
hadoop fs -copyFromLocal <localsrc> URI
Ok, let me try out SolrCloud, as I will need to make sure it works well with
Nutch too.
Thanks for the help.
Why wouldn't SolrCloud help you here? You can set up shards and replicas etc.
to have redundancy, b/c HDFS isn't designed to serve real-time queries as
far as I understand. If you are using HDFS as a backup mechanism, to me
you'd be better served having multiple slaves tethered to a master (in a
non-cloud setup).
Hi Upayavira,
Sure, let me explain. I am setting up Nutch and Solr in a Hadoop environment.
Since I am using HDFS, in the event that there are any crashes on the
localhost (running Solr), I will still have the shards of data stored
in HDFS.
Thank you so much =)
On Thu, Mar 7, 2013 at 1:19 AM, U
What are you actually trying to achieve? If you can share what you are
trying to achieve maybe folks can help you find the right way to do it.
Upayavira
On Wed, Mar 6, 2013, at 02:54 PM, Joseph Lim wrote:
Hello Otis,
Is there any configuration where it will index into HDFS instead?
I tried Crawlzilla and Lily, but I hope to update specific packages such as
Hadoop only or Nutch only when there are updates.
That's why I would prefer to install them separately.
Thanks so much. Looking forward to your reply.
Hello Joseph,
You can certainly put them there, as in:
hadoop fs -copyFromLocal <localsrc> URI
But searching such an index will be slow.
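Concretely, that might look like this (the namenode address and index
paths are illustrative only):

  hadoop fs -mkdir -p hdfs://namenode:8020/solr/backup
  hadoop fs -copyFromLocal /var/solr/data/collection1/data/index \
    hdfs://namenode:8020/solr/backup/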
See also: http://katta.sourceforge.net/
Otis
--
Solr & ElasticSearch Support
http://sematext.com/
On Wed, Mar 6, 2013 at 7:50 AM, Joseph Lim wrote:
Hi,
I would like to know how I can put the indexed Solr shards into HDFS.
Thanks,
Joseph
On Mar 6, 2013 7:28 PM, "Otis Gospodnetic" wrote:
Hi Joseph,
What exactly are you looking to do?
See http://incubator.apache.org/blur/
Otis
--
Solr & ElasticSearch Support
http://sematext.com/
On Wed, Mar 6, 2013 at 2:39 AM, Joseph Lim wrote:
> Hi, I am running the Hadoop distributed file system. How do I put my output of
> the Solr dir into HDFS?