Hi all - I'm testing 6.1.0 on a small two shard setup (two physical
machines) using HDFS for the index. I was indexing away when one of the
shards started throwing this error:
org.apache.solr.common.SolrException: Exception writing document id
COLLECT2587102526510 to the index; possible analysis error
Thank you Markus - they are indeed set to 1024 for the hdfs user. We'll
re-configure limits.conf and try again.
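Probably something like this in /etc/security/limits.conf for the hdfs and
solr users (the exact user names depend on the install), plus a restart of
the affected services:

hdfs  soft  nofile  65536
hdfs  hard  nofile  65536
solr  soft  nofile  65536
solr  hard  nofile  65536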
-Joe
On Tue, Jun 21, 2016 at 10:38 AM, Markus Jelsma
wrote:
> Hello Joseph,
>
> Your datanodes are in a bad state, you probably overwhelmed them when
> indexing. Check your max open files
Anyone ever seen an error like this? We are running with HDFS for the
index. At the time of the error, we were doing a lot of indexing.
Two errors:
java.io.IOException: All datanodes DatanodeInfoWithStorage[
172.16.100.220:50010,DS-4b806395-0661-4a70-a32b-deef82a85359,DISK] are bad.
Aborting...
> > On Sun, Mar 6, 2016 at 7:42 AM, Susheel Kumar
> wrote:
> >
> >> Entity Recognition means you may want to recognize different entities
> >> (name/person, email, location/city/state/country, etc.) in your
> >> tweets/messages with the goal of providing better relevance
016 at 4:19 AM, Charlie Hull wrote:
>
> > On 03/03/2016 19:25, Toke Eskildsen wrote:
> >
> >> Joseph Obernberger wrote:
> >>
Hi All - would it be reasonable to index the Twitter 'firehose' with Solr
Cloud - roughly 500-600 million docs per day indexing each of the fields
(about 180)?
If I were to guess at a sharded setup to handle such data, and keep 2 years'
worth, I would guess about 2500 shards. Is that reasonable?
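As a rough sanity check on that number (assuming ~550 million docs/day and
something like 160 million docs per shard as a comfortable ceiling):

550,000,000 docs/day x 730 days  =~ 400 billion docs
400,000,000,000 / 160,000,000 docs per shard  =~ 2,500 shards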
Is
replicas.
-Joe
On Fri, Feb 5, 2016 at 10:43 AM, Shawn Heisey wrote:
> On 2/5/2016 8:11 AM, Joseph Obernberger wrote:
> > Thank you for the reply Scott - we have the commit settings as:
> >
> > 6
> > false
> >
> >
> > 15000
> >
> >
> > k/r,
> > Scott
> >
> > On Fri, Jan 29, 2016 at 12:40 AM, Joseph Obernberger <
> > joseph.obernber...@gmail.com> wrote:
> >
> >> Hi All - we're using Apache Solr Cloud 5.2.1, with an HDFS system that
> is
com> wrote:
> It seems odd that the tlog files are so large. HDFS aside, is there a
> reason why you're not committing? Also, as far as disk space goes, if you
> dip below 50% free you run the risk that the index segments can't be
> merged.
>
>
> k/r,
> Scott
>
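For reference, a hard autoCommit along these lines (the numbers are only a
starting point, not a recommendation for this particular setup) keeps tlog
growth bounded without opening new searchers, with a soft commit handling
visibility:

<autoCommit>
  <maxTime>60000</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>
<autoSoftCommit>
  <maxTime>15000</maxTime>
</autoSoftCommit>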
Hi All - we're using Apache Solr Cloud 5.2.1, with an HDFS system that is
86% full. Some of the datanodes in the HDFS cluster are closer to
being full than other nodes. We're getting messages about "Error adding
log" from the index process, which I **think** is related to datanodes
being full
Best,
Erick
On Thu, Aug 20, 2015 at 9:23 AM, Joseph Obernberger
wrote:
Hi - we currently have a multi-shard setup running solr cloud without
replication running on top of HDFS. Does it make sense to use replication
when using HDFS? Should we expect to see a performance increase in searches?
Thank you!
-Joe
, Shawn Heisey wrote:
On 7/23/2015 7:51 AM, Joseph Obernberger wrote:
Hi Upayavira - the URL was:
http://server1:9100/solr/MYCOL1/clustering?q=Collection:(COLLECT1008+OR+COLLECT2587)+AND+(amazon+AND+soap)&wt=json&indent=true&clustering=true&rows=1&df=FULL_DOCUMENT&debug
81,
maxDocs=16336337)\n0.3125 = fieldNorm(doc=209834)\n
0.5714286 = coord(4/7)\n"}}}
On 7/22/2015 3:36 PM, Upayavira wrote:
I'd be curious to see the parsed query that you get when adding
debugQuery=true to the URL. I bet that the clustering component is
extracting ter
erm2) AND Field2:(item1 OR item2)
-Joe
On 7/22/2015 3:21 PM, Joseph Obernberger wrote:
Hi - I'm using carrot2 inside of solr cloud and have noticed that
queries that involve parenthesis don't seem to work correctly. For
example if I have:
q=Field1:(term1 OR term2) AND Field2:(item1 OR item2)
The clustering seems to ignore the values in parenthesis. If instead I do:
q=(Field1:ter
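Adding debugQuery=true, as suggested, shows exactly how the parentheses are
being parsed. A request along the lines of the ones above would be (handler,
collection and field names here are just from my setup, and the spaces would
be URL-encoded in practice):

http://server:port/solr/COLLECT1/clustering?q=Field1:(term1 OR term2) AND Field2:(item1 OR item2)&rows=100&df=FULL_DOCUMENT&wt=json&indent=true&clustering=true&debugQuery=true

The "parsedquery" entry in the debug section of the response is the part to
compare against the query as written.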
Hi All - I'm working with the heatmap PNGs generated from solr as
described here:
https://issues.apache.org/jira/browse/SOLR-7005
What would be the coordinate reference system that the generated PNG
uses? Is it possible to load these PNG files into a geospatial tool as
a raster layer like QGIS?
Hi - perhaps you do not have enough geospatial data in your index to
generate a larger image? Try setting the facet.heatmap.gridLevel to
something higher like 4.
I've run queries like:
q=insert whatever
here&wt=json&indent=true&facet=true&facet.heatmap=geo&facet.heatmap.gridLevel=4&facet.heat
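A fuller request, with the field name and bounding box as placeholders,
would look something like:

http://server:port/solr/COLLECT1/select?q=*:*&rows=0&wt=json&facet=true&facet.heatmap=geo&facet.heatmap.geom=["-180 -90" TO "180 90"]&facet.heatmap.gridLevel=4&facet.heatmap.format=png

The response also carries minX/maxX/minY/maxY for the area actually covered,
which is what you would use to georeference the PNG (plain WGS84 lat/lon
degrees for the default lat/lon field types).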
="solr.hdfs.blockcache.direct.memory.allocation">true
16384
true
false
true
64
512
hdfs://nameservice1:8020/solr5
/etc/hadoop/conf.cloudera.hdfs1
-Joe
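For comparison, the stock HdfsDirectoryFactory block from the Solr Reference
Guide looks roughly like this (values are the documented defaults, not a copy
of the settings above):

<directoryFactory name="DirectoryFactory" class="solr.HdfsDirectoryFactory">
  <str name="solr.hdfs.home">hdfs://host:port/solr</str>
  <str name="solr.hdfs.confdir">/etc/hadoop/conf</str>
  <bool name="solr.hdfs.blockcache.enabled">true</bool>
  <int name="solr.hdfs.blockcache.slab.count">1</int>
  <bool name="solr.hdfs.blockcache.direct.memory.allocation">true</bool>
  <int name="solr.hdfs.blockcache.blocksperbank">16384</int>
  <bool name="solr.hdfs.blockcache.read.enabled">true</bool>
  <bool name="solr.hdfs.nrtcachingdirectory.enable">true</bool>
  <int name="solr.hdfs.nrtcachingdirectory.maxmergesizemb">16</int>
  <int name="solr.hdfs.nrtcachingdirectory.maxcachedmb">192</int>
</directoryFactory>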
On 6/5/2015 9:34 AM, Shawn Heisey wrote:
On 6/3/2015 6:39 PM, Joseph Obernberger wrote:
Hi All - I&
3.run(QueuedThreadPool.java:543)
at java.lang.Thread.run(Thread.java:745)
Thanks for any thoughts!
-Joe
On 6/3/2015 8:39 PM, Joseph Obernberger wrote:
Hi All - I've run into a problem where every once in a while one or more
of the shards (27 shard cluster) will lose connection to zookeeper and
report "updates are disabled". In addition to the CLUSTERSTATUS
timeout errors, which don't seem to cause any issue, this one certainly
does as tha
ts are merged the deleted documents will have
all their resources reclaimed, effectively deleting the field from the
old docs. So you could gradually re-index your corpus and get this
stuff out of there.
Best,
Erick
On Sat, May 30, 2015 at 5:18 AM, Joseph Obernberger
wrote:
Thank you Erick.
tting an OOM is a mystery though. But delete field isn't
removing the contents of indexed documents. Showing us the full stack
when you hit an OOM would be helpful.
Best,
Erick
On Fri, May 29, 2015 at 4:58 PM, Joseph Obernberger
wrote:
Thank you Shawn - I'm referring to fields in the schema. With Solr 5,
you can delete fields from the schema.
https://cwiki.apache.org/confluence/display/solr/Schema+API#SchemaAPI-DeleteaField
-Joe
On 5/29/2015 7:30 PM, Shawn Heisey wrote:
On 5/29/2015 5:08 PM, Joseph Obernberger wrote
Hi All - I have a lot of fields to delete, but noticed that once I
started deleting them, I quickly ran out of heap space. Is delete-field
a memory intensive operation? Should I delete one field, wait a while,
then delete the next?
Thank you!
-Joe
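For reference, each delete goes through the Schema API as a call like this
(collection and field names are placeholders):

curl -X POST -H 'Content-type:application/json' \
  'http://localhost:8983/solr/COLLECTION/schema' \
  -d '{ "delete-field" : { "name" : "oldField" } }'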
I'm also getting this error with 5.1.0 and a 27 shard setup.
null:org.apache.solr.common.SolrException: CLUSTERSTATUS the collection
time out:180s
at
org.apache.solr.handler.admin.CollectionsHandler.handleResponse(CollectionsHandler.java:740)
at
org.apache.solr.handler.admin.Collection
Hi All - I've been working with GeoTools to build a heat map based on
location data coming back from Solr Cloud using this nifty feature
where you can facet on location
(https://issues.apache.org/jira/browse/SOLR-7005) and generate a raster.
I've been able to take this data and build a GridCov
Hi - I'm very interested in the new heat map capability of Solr 5.1.0.
Has anyone looked at combining GeoTools' HeatmapProcess method with this
data? I'm trying this now, but I keep getting an empty image from the
GridCoverage2D object.
Any pointers/tips?
Thank you!
-Joe
Great news!
Any tips on how to do an upgrade from 5.0.0 to 5.1.0?
Thank you!
-Joe
On 4/14/2015 2:39 PM, Timothy Potter wrote:
I apologize - Yonik prepared these nice release notes for 5.1 and I
neglected to include them:
Solr 5.1 Release Highlights:
* The new Facet Module, including the JSO
)
at java.lang.Thread.run(Thread.java:745)
9:39:33.595 AM INFO org.apache.zookeeper.server.NIOServerCnxn
Closed socket connection for client /172.16.100.211:59968 which had
sessionid 0x44cabd42bcb4efb
-Joe
On 4/13/2015 11:40 AM, Joseph Obernberger wrote:
I'm getting the following error running a 27 shard setup (27 physical
machines that are part of a Hadoop cluster) on Solr Cloud 5.0.0. HDFS
is used for the index.
null:org.apache.solr.common.SolrException: CLUSTERSTATUS the collection
time out:180s
at
org.apache.solr.handler.admin.Colle
using HDFS. I've seen it take well over a minute to stop.
I'm not sure if the index is going to be missing data, or if it will be
corrupt at this point.
-Joe
On 4/6/2015 1:35 PM, Joseph Obernberger wrote:
Having a couple issues with restarts of a 27 shard cluster using
SolrCloud 5.0.0 and HDFS. I'm getting errors that a lock file exists
and the shard will not start. When I delete the file, that shard starts OK.
On another shard, I'm getting the following message:
538220 [coreLoadExecutor-5-th
3/31/2015 3:13 PM, Joseph Obernberger wrote:
I've tried to replicate the issue starting from new, but so far it
hasn't happened again.
-Joe
On 3/28/2015 2:10 PM, Mark Miller wrote:
Hmm...can you file a JIRA issue with this info?
- Mark
On Fri, Mar 27, 2015 at 6:09 PM Joseph Obernberger
wrote:
I just started up a two shard cluster on two machines using HDFS. When I
started to index documents, the log shows errors like this. They repeat
when I execute searches. All seems well - searches and indexing appear
to be working.
Possibly a configuration issue?
My HDFS config:
true
Hi All - does it make sense to run a Solr shard on a node within a
Hadoop cluster that is not a data node? In that case all the data that
node processes would need to come over the network, but you get the
benefit of more CPU for things like faceting.
Thank you!
-Joe
Great! Thank you!
I had a 4 shard setup - no replicas. Index size was 2.0TBytes stored in
HDFS with each node having approximately 500G of index. I added four
more shards on four other machines as replicas. One thing that happened
was the 4 replicas all ran out of HDFS cache size
(SnapPul
HDFS, though, a single replica (just a
leader) per shard means that you don't have any redundancy if the
motherboard on that server dies even though HDFS has multiple copies
of the _data_.
Best,
Erick
On Wed, Feb 25, 2015 at 12:01 PM, Joseph Obernberger
wrote:
I am also confused on this. Is adding replicas going to increase search
performance? I'm not sure I see the point of any replicas when using
HDFS. Is there one?
Thank you!
-Joe
On 2/25/2015 10:57 AM, Erick Erickson wrote:
bq: And the data sync between leader/replica is always a problem
No
I have a similar use-case. Check out the export capability and using
cursorMark.
-Joe
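The cursorMark loop is roughly: sort on the uniqueKey (it must appear in the
sort), start with cursorMark=*, and feed each response's nextCursorMark back
in until it stops changing. For example (collection name and id field are
placeholders):

curl 'http://localhost:8983/solr/COLLECTION/select?q=*:*&fl=id&sort=id+asc&rows=1000&cursorMark=*&wt=json'
curl 'http://localhost:8983/solr/COLLECTION/select?q=*:*&fl=id&sort=id+asc&rows=1000&cursorMark=<nextCursorMark from previous response>&wt=json'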
On 2/2/2015 8:14 AM, Matteo Grolla wrote:
Hi,
I'm thinking about having an instance of solr (SolrA) with all fields
stored and just id indexed, alongside a normal production instance of
solr
On 1/8/2015 3:16 AM, Toke Eskildsen wrote:
On Wed, 2015-01-07 at 22:26 +0100, Joseph Obernberger wrote:
Thank you Toke - yes - the data is indexed throughout the day. We are
handling very few searches - probably 50 a day; this is an R&D system.
If your searches are in small bundles,
redo our Solr Cloud, we will only run one
shard per box, and supply more HDFS cache.
-Joe
On 1/7/2015 3:50 PM, Toke Eskildsen wrote:
Joseph Obernberger [j...@lovehorsepower.com] wrote:
[HDFS, 9M docs, 2.9TB, 22 shards, 11 bare metal boxes]
A typical query takes about 7 seconds to run, but we al
Kinda late to the party on this very interesting thread, but I'm
wondering if anyone has been using SolrCloud with HDFS at large scales?
We really like this capability since our data is inside of Hadoop and we
can run the Solr shards on the same nodes, and we only need to manage
one pool of storage.
Shard splits can take a long time - the 900 seconds is just the REST
timeout. The split is still taking place.
On Tue, Dec 16, 2014 at 12:43 PM, Trilok Prithvi
wrote:
>
> Sorry... I sent without explaining the situation.
>
> We did splitshard:
>
> solr/admin/collections?action=SPLITSHARD&collect
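One way to sidestep the REST timeout entirely is to run the split
asynchronously and poll for completion, along these lines (collection, shard
and request id are placeholders):

solr/admin/collections?action=SPLITSHARD&collection=COLLECT&shard=shard1&async=split-shard1
solr/admin/collections?action=REQUESTSTATUS&requestid=split-shard1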
Hi Koji - is it possible to execute word2vec on a subset of documents from
Solr? - ie could I run a query, get back the top n results and pass only
those to word2vec?
Will this work with Solr Cloud?
Thank you!
-Joe
On Thu, Nov 20, 2014 at 12:18 PM, Paul Libbrecht wrote:
> As far as I could t
100G shard, the index size goes up by 100G with the
two new shards. Is this correct for HDFS operation?
Thank you!
-Joe
On Mon, Nov 17, 2014 at 7:12 PM, Joseph Obernberger <
joseph.obernber...@gmail.com> wrote:
> Looks like the shard split failed, and only created one additional sha
e create the
> directory ahead of time I don't think.
>
> Best,
> Erick
>
> On Mon, Nov 17, 2014 at 12:17 PM, Joseph Obernberger
> wrote:
Originally I had two shards on two machines - shard1 and shard2.
I did a SHARDSPLIT on shard1.
Now have shard1, shard2, and shard1_0
If I select the core (COLLECT_shard1_0_replica1) and execute a query, I get
all the docs OK, but if I specify &distrib=false, I get 0 documents.
Under HDFS - when/h
If I create the directory manually on the server that I'm splitting:
COLLECT_shard1_0_replica1
Then do the shard split command, it works OK.
-Joe
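Concretely, that workaround amounts to something like the following on the
node hosting shard1 (the instance directory location depends on the solr
home in use):

mkdir -p $SOLR_HOME/COLLECT_shard1_0_replica1

then re-issue:

solr/admin/collections?action=SPLITSHARD&collection=COLLECT&shard=shard1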
I tried to split a shard using HDFS storage, and at first I received this
error:
org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: Error
CREATEing SolrCore 'COLLECT1_shard1_0_replica1': Unable to create core
[COLLECT1_shard1_0_replica1] Caused by: Direct buffer memory
at
I wanted to make a change to the solrconfig.xml file in my 4.10.2 solr
cloud cluster. I modified the file and put it in /tmp/conf - the only
file in that directory.
I then executed:
./zkcli.sh -cmd upconfig -zkhost list_of_hosts -d /tmp/conf -n ConfigName
These ran successfully, and I was able t
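For reference, the full round trip for a config change is roughly (zkhost
list, config name and collection name are placeholders); the RELOAD is what
makes a running collection pick the change up:

./zkcli.sh -cmd upconfig -zkhost zk1:2181,zk2:2181,zk3:2181 -d /tmp/conf -n ConfigName
solr/admin/collections?action=RELOAD&name=CollectionName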
https://github.com/DmitryKey/luke/releases/tag/luke-4.10.1
>
>
> http://dmitrykan.blogspot.fi/2014/09/exporting-lucene-index-to-xml-with-luke.html
>
> It does not have the option to export select fields only, though.
>
> Dmitry
>
> On Thu, Oct 30, 2014 at 12:39 AM, Jo
Hi - I'm trying to use 4.10.1 with /export. I've defined a field as
follows:
I then call:
http://server:port/solr/COLLECT1/export?q=Collection:COLLECT2000&sort=DocumentId
desc&fl=DocumentId
The error I receive is:
java.io.IOException: DocumentId must have DocValues to use this feature.
at
org.a
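The /export handler only reads docValues, so DocumentId (and anything else
used in sort or fl) needs docValues enabled in the schema, plus a reindex
after the change - for example (the field type here is an assumption):

<field name="DocumentId" type="string" indexed="true" stored="true" docValues="true"/>

http://server:port/solr/COLLECT1/export?q=Collection:COLLECT2000&sort=DocumentId+desc&fl=DocumentId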
new configs
> * Reindex data to the new collection.
> * Use collection aliasing to swap the old/new collections.
> (http://www.anshumgupta.net/2013/10/collection-aliasing-in-solrcloud.html)
>
> All this while, you wouldn't really need to shut down the Solr
> cluster/collection
Changing the field type might require you to reindex your
> data.
>
> There's an open JIRA for that one and I think someone would get to it
> sometime in the reasonably near future.
> JIRA: https://issues.apache.org/jira/browse/SOLR-5289
>
> On Wed, Sep 10, 2014 at 8:05 AM
In addition to adding new fields to the schema, is there a way to modify an
existing field? If I created a field called userID as a long, but decided
later that it should be a string?
Thank you!
-Joe
Could you add another field(s) to your application and use that instead of
creating collections/cores? When you execute a search, instead of picking
a core, just search a single large core but add in a field which contains
some core ID.
-Joe
http://www.lovehorsepower.com
On Sun, Aug 31, 2014 at
I'm getting the following error when I'm indexing a large number of
documents (in the millions). I do not see any errors on the two solr cloud
servers, only on the processes that are doing the indexing. The error is
thrown from:
cloudSolrServer.add(solrDoc);
I can't seem to put my finger on the cau
> ideal - I expect there will be server-side facilities to do something
> equivalent.)
> >
> > Steve
> > www.lucidworks.com
> >
> > On Aug 15, 2014, at 11:49 AM, Joseph Obernberger <
> joseph.obernber...@gmail.com> wrote:
> >
> >> Thank you! A
>
> Steve
>
> On Aug 15, 2014, at 11:00 AM, Joseph Obernberger <
> joseph.obernber...@gmail.com> wrote:
>
> > Hi - I've been using Solr Cloud in schema-less mode and am having some
> > issues with 4.8.1 and 4.9.0 when adding lots of new fields. In 4.8.1
Hi - I've been using Solr Cloud in schema-less mode and am having some
issues with 4.8.1 and 4.9.0 when adding lots of new fields. In 4.8.1 I'll
get continuous messages that say:
134567307 [qtp968427990-2492] INFO org.apache.solr.schema.IndexSchema -
Failed to persist managed schema at /configs/