Solr 6.1.0 - Indexing Error

2016-07-14 Thread Joseph Obernberger
Hi all - I'm testing 6.1.0 on a small two-shard setup (two physical machines) using HDFS for the index. I was indexing away when one of the shards started throwing this error: org.apache.solr.common.SolrException: Exception writing document id COLLECT2587102526510 to the index; possible analysis

Re: All Datanodes are Bad

2016-06-22 Thread Joseph Obernberger
Thank you Markus - they are indeed set to 1024 for the hdfs user. We'll re-configure limits.conf and try again. -Joe On Tue, Jun 21, 2016 at 10:38 AM, Markus Jelsma wrote: > Hello Joseph, > > Your datanodes are in a bad state, you probably overwhelmed it when > indexing. Check your max open fi

All Datanodes are Bad

2016-06-20 Thread Joseph Obernberger
Anyone ever seen an error like this? We are running using HDFS for the index. At the time of the error, we are doing a lot of indexing. Two errors: java.io.IOException: All datanodes DatanodeInfoWithStorage[ 172.16.100.220:50010,DS-4b806395-0661-4a70-a32b-deef82a85359,DISK] are bad. Aborting...

Re: Indexing Twitter - Hypothetical

2016-03-08 Thread Joseph Obernberger
> > On Sun, Mar 6, 2016 at 7:42 AM, Susheel Kumar > wrote: > > > >> Entity Recognition means you may want to recognize different entities > >> name/person, email, location/city/state/country etc. in your > >> tweets/messages with goal of providing better rele

Re: Indexing Twitter - Hypothetical

2016-03-04 Thread Joseph Obernberger
016 at 4:19 AM, Charlie Hull wrote: > > > On 03/03/2016 19:25, Toke Eskildsen wrote: > > > >> Joseph Obernberger wrote: > >> > >>> Hi All - would it be reasonable to index the Twitter 'firehose' > >>> with Solr Cloud - roughly 500-6

Indexing Twitter - Hypothetical

2016-03-03 Thread Joseph Obernberger
Hi All - would it be reasonable to index the Twitter 'firehose' with Solr Cloud - roughly 500-600 million docs per day, indexing each of the fields (about 180)? If I were to guess at a sharded setup to handle such data and keep 2 years' worth, I would guess about 2500 shards. Is that reasonable? Is
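The ~2500-shard guess can be sanity-checked with back-of-the-envelope arithmetic. A minimal sketch, assuming an average of 550M docs/day and a target of roughly 160M documents per shard (both figures are my assumptions, not from the thread):

```python
# Rough shard estimate for the Twitter-firehose scenario.
# Assumed inputs: ~550M docs/day average, ~160M docs per shard target.
DOCS_PER_DAY = 550_000_000
RETENTION_DAYS = 2 * 365
TARGET_DOCS_PER_SHARD = 160_000_000

total_docs = DOCS_PER_DAY * RETENTION_DAYS        # ~4e11 documents over 2 years
shards = -(-total_docs // TARGET_DOCS_PER_SHARD)  # ceiling division

print(total_docs)  # 401500000000
print(shards)      # 2510 - in line with the ~2500 guess
```

Changing the per-shard target to 200M docs drops the estimate to about 2008 shards, so the answer is quite sensitive to how many documents a single shard can comfortably hold.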

Re: Solr+HDFS

2016-02-05 Thread Joseph Obernberger
replicas. -Joe On Fri, Feb 5, 2016 at 10:43 AM, Shawn Heisey wrote: > On 2/5/2016 8:11 AM, Joseph Obernberger wrote: > > Thank you for the reply Scott - we have the commit settings as: > > > > 6 > > false > > > > > > 15000 &g

Re: Solr+HDFS

2016-02-05 Thread Joseph Obernberger
erged. > > > > > > k/r, > > Scott > > > > On Fri, Jan 29, 2016 at 12:40 AM, Joseph Obernberger < > > joseph.obernber...@gmail.com> wrote: > > > >> Hi All - we're using Apache Solr Cloud 5.2.1, with an HDFS system that > is &

Re: Solr+HDFS

2016-02-05 Thread Joseph Obernberger
com> wrote: > It seems odd that the tlog files are so large. HDFS aside, is there a > reason why you're not committing? Also, as far as disk space goes, if you > dip below 50% free you run the risk that the index segments can't be > merged. > > > k/r, > Scott >

Solr+HDFS

2016-01-28 Thread Joseph Obernberger
Hi All - we're using Apache Solr Cloud 5.2.1, with an HDFS system that is 86% full. Some of the datanodes in the HDFS cluster are closer to being full than others. We're getting messages about "Error adding log" from the index process, which I **think** is related to datanodes being full

Re: replication and HDFS

2015-08-31 Thread Joseph Obernberger
Best, Erick On Thu, Aug 20, 2015 at 9:23 AM, Joseph Obernberger wrote: Hi - we currently have a multi-shard setup running solr cloud without replication running on top of HDFS. Does it make sense to use replication when using HDFS? Will we expect to see a performance increase in searches? Thank you! -Joe

replication and HDFS

2015-08-20 Thread Joseph Obernberger
Hi - we currently have a multi-shard Solr Cloud setup without replication, running on top of HDFS. Does it make sense to use replication when using HDFS? Should we expect to see a performance increase in searches? Thank you! -Joe

Re: Solr Clustering Issue

2015-07-24 Thread Joseph Obernberger
, Shawn Heisey wrote: On 7/23/2015 7:51 AM, Joseph Obernberger wrote: Hi Upayavira - the URL was: http://server1:9100/solr/MYCOL1/clustering?q=Collection:(COLLECT1008+OR+COLLECT2587)+AND+(amazon+AND+soap)&wt=json&indent=true&clustering=true&rows=1&df=FULL_DOCUMENT&debug

Re: Solr Clustering Issue

2015-07-23 Thread Joseph Obernberger
81, maxDocs=16336337)\n0.3125 = fieldNorm(doc=209834)\n 0.5714286 = coord(4/7)\n"}}} On 7/22/2015 3:36 PM, Upayavira wrote: I'd be curious to see the parsed query that you get when adding debugQuery=true to the URL. I bet that the clustering component is extracting ter

Re: Solr Clustering Issue

2015-07-22 Thread Joseph Obernberger
erm2) AND Field2:(item1 OR item2) -Joe On 7/22/2015 3:21 PM, Joseph Obernberger wrote: Hi - I'm using carrot2 inside of solr cloud and have noticed that queries that involve parenthesis don't seem to work correctly. For example if I have: q=Field1:(term1 OR term2) AND Field2:(item1

Solr Clustering Issue

2015-07-22 Thread Joseph Obernberger
Hi - I'm using carrot2 inside of solr cloud and have noticed that queries that involve parenthesis don't seem to work correctly. For example if I have: q=Field1:(term1 OR term2) AND Field2:(item1 OR item2) The clustering seems to ignore the values in parenthesis. If instead I do: q=(Field1:ter

Solr PNG Coordinate Reference

2015-07-14 Thread Joseph Obernberger
Hi All - I'm working with the heatmap PNGs generated from solr as described here: https://issues.apache.org/jira/browse/SOLR-7005 What would be the coordinate reference system that the generated PNG uses? Is it possible to load these PNG files into a geospatial tool as a raster layer like QGI
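To georeference the heatmap counts (or a PNG rendered from them), each grid cell can be mapped back to a lat/lon box from the extent fields the facet response returns. A minimal sketch, assuming the response's minX/maxX/minY/maxY are plain WGS84 degrees and that row 0 is the northernmost row (worth verifying against your Solr version):

```python
# Map a cell (row, col) of a Solr heatmap facet response to a lat/lon
# bounding box, so the grid can be loaded as a georeferenced raster.
# Assumption: row 0 of counts_ints2D is the northernmost (maxY) row.
def cell_bounds(minX, maxX, minY, maxY, rows, cols, row, col):
    width = (maxX - minX) / cols
    height = (maxY - minY) / rows
    west = minX + col * width
    north = maxY - row * height
    return (west, north - height, west + width, north)  # (W, S, E, N)

# Whole-world extent on a 2x2 grid: cell (0, 0) is the NW quadrant.
print(cell_bounds(-180, 180, -90, 90, 2, 2, 0, 0))  # (-180, 0, 0, 90)
```

With those per-cell bounds, a tool like QGIS can place the raster via a world file or GDAL geotransform built from the same extent values.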

Re: heatmaps

2015-07-02 Thread Joseph Obernberger
Hi - perhaps you do not have enough geospatial data in your index to generate a larger image? Try setting facet.heatmap.gridLevel to something higher, like 4. I've run queries like: q=insert whatever here&wt=json&indent=true&facet=true&facet.heatmap=geo&facet.heatmap.gridLevel=4&facet.heat

Re: Lost connection to Zookeeper

2015-06-05 Thread Joseph Obernberger
="solr.hdfs.blockcache.direct.memory.allocation">true 16384 true false true 64 512 hdfs://nameservice1:8020/solr5 /etc/hadoop/conf.cloudera.hdfs1 -Joe On 6/5/2015 9:34 AM, Shawn Heisey wrote: On 6/3/2015 6:39 PM, Joseph Obernberger wrote: Hi All - I&

Re: Lost connection to Zookeeper

2015-06-05 Thread Joseph Obernberger
3.run(QueuedThreadPool.java:543) at java.lang.Thread.run(Thread.java:745) Thanks for any thoughts! -Joe On 6/3/2015 8:39 PM, Joseph Obernberger wrote: Hi All - I've run into a problem where every-once in a while one or more of the shards (27 shard cluster) will loose connection to zookeeper and repor

Lost connection to Zookeeper

2015-06-03 Thread Joseph Obernberger
Hi All - I've run into a problem where every once in a while one or more of the shards (27-shard cluster) will lose connection to zookeeper and report "updates are disabled". In addition to the CLUSTERSTATUS timeout errors, which don't seem to cause any issue, this one certainly does as tha

Re: Deleting Fields

2015-06-01 Thread Joseph Obernberger
ts are merged the deleted documents will have all their resources reclaimed, effectively deleting the field from the old docs So you could gradually re-index your corpus and get this stuff out of there. Best, Erick On Sat, May 30, 2015 at 5:18 AM, Joseph Obernberger wrote: Thank you Erick.

Re: Deleting Fields

2015-05-30 Thread Joseph Obernberger
tting an OOM is a mystery though. But delete field isn't removing the contents of indexed documents. Showing us the full stack when you hit an OOM would be helpful. Best, Erick On Fri, May 29, 2015 at 4:58 PM, Joseph Obernberger wrote: Thank you Shawn - I'm referring to fields in the s

Re: Deleting Fields

2015-05-29 Thread Joseph Obernberger
Thank you Shawn - I'm referring to fields in the schema. With Solr 5, you can delete fields from the schema. https://cwiki.apache.org/confluence/display/solr/Schema+API#SchemaAPI-DeleteaField -Joe On 5/29/2015 7:30 PM, Shawn Heisey wrote: On 5/29/2015 5:08 PM, Joseph Obernberger wrote

Deleting Fields

2015-05-29 Thread Joseph Obernberger
Hi All - I have a lot of fields to delete, but noticed that once I started deleting them, I quickly ran out of heap space. Is delete-field a memory-intensive operation? Should I delete one field, wait a while, then delete the next? Thank you! -Joe

Re: CLUSTERSTATUS timeout

2015-05-29 Thread Joseph Obernberger
I'm also getting this error with 5.1.0 and a 27 shard setup. null:org.apache.solr.common.SolrException: CLUSTERSTATUS the collection time out:180s at org.apache.solr.handler.admin.CollectionsHandler.handleResponse(CollectionsHandler.java:740) at org.apache.solr.handler.admin.Collection

Solr & Heatmap & Geotools

2015-05-23 Thread Joseph Obernberger
Hi All - I've been working with GeoTools to build a heat map based on location data coming back from Solr Cloud, using this nifty feature where you can facet on location (https://issues.apache.org/jira/browse/SOLR-7005) and generate a raster. I've been able to take this data and build a GridCov

5.1.0 Heatmap + Geotools

2015-05-06 Thread Joseph Obernberger
Hi - I'm very interested in the new heat map capability of Solr 5.1.0. Has anyone looked at combining GeoTools' HeatmapProcess method with this data? I'm trying this now, but I keep getting an empty image from the GridCoverage2D object. Any pointers/tips? Thank you! -Joe

Re: [ANNOUNCE] Apache Solr 5.1.0 released

2015-04-14 Thread Joseph Obernberger
Great news! Any tips on how to do an upgrade from 5.0.0 to 5.1.0? Thank you! -Joe On 4/14/2015 2:39 PM, Timothy Potter wrote: I apologize - Yonik prepared these nice release notes for 5.1 and I neglected to include them: Solr 5.1 Release Highlights: * The new Facet Module, including the JSO

Re: CLUSTERSTATE timeout

2015-04-13 Thread Joseph Obernberger
) at java.lang.Thread.run(Thread.java:745) 9:39:33.595 AM INFO org.apache.zookeeper.server.NIOServerCnxn Closed socket connection for client /172.16.100.211:59968 which had sessionid 0x44cabd42bcb4efb -Joe On 4/13/2015 11:40 AM, Joseph Obernberger wrote: I'm ge

CLUSTERSTATE timeout

2015-04-13 Thread Joseph Obernberger
I'm getting the following error running a 27-shard setup (27 physical machines) on Solr Cloud 5.0.0 that is part of a Hadoop cluster. HDFS is used for the index. null:org.apache.solr.common.SolrException: CLUSTERSTATUS the collection time out:180s at org.apache.solr.handler.admin.Colle

Re: HDFS Locking

2015-04-06 Thread Joseph Obernberger
using HDFS. I've seen it take well over a minute to stop. I'm not sure if the index is going to be missing data, or if it will be corrupt at this point. -Joe On 4/6/2015 1:35 PM, Joseph Obernberger wrote: Having a couple issues with restarts of a 27 shard cluster using SolrCloud

HDFS Locking

2015-04-06 Thread Joseph Obernberger
Having a couple issues with restarts of a 27 shard cluster using SolrCloud 5.0.0 and HDFS. I'm getting errors that a lock file exists and the shard will not start. When I delete the file, that shard starts OK. On another shard, I'm getting the following message: 538220 [coreLoadExecutor-5-th

Re: Solr 5.0.0 and HDFS

2015-04-03 Thread Joseph Obernberger
3/31/2015 3:13 PM, Joseph Obernberger wrote: I've tried to replicate the issue starting from new, but so far it hasn't happened again. -Joe On 3/28/2015 2:10 PM, Mark Miller wrote: Hmm...can you file a JIRA issue with this info? - Mark On Fri, Mar 27, 2015 at 6:09 PM Joseph Ober

Re: Solr 5.0.0 and HDFS

2015-03-31 Thread Joseph Obernberger
I've tried to replicate the issue starting from new, but so far it hasn't happened again. -Joe On 3/28/2015 2:10 PM, Mark Miller wrote: Hmm...can you file a JIRA issue with this info? - Mark On Fri, Mar 27, 2015 at 6:09 PM Joseph Obernberger wrote: I just started up a two sha

Solr 5.0.0 and HDFS

2015-03-27 Thread Joseph Obernberger
I just started up a two shard cluster on two machines using HDFS. When I started to index documents, the log shows errors like this. They repeat when I execute searches. All seems well - searches and indexing appear to be working. Possibly a configuration issue? My HDFS config: true

Solr and HDFS configuration

2015-03-24 Thread Joseph Obernberger
Hi All - does it make sense to run a solr shard on a node within a Hadoop cluster that is not a data node? In that case all the data that node processes would need to come over the network, but you get the benefit of more CPU for things like faceting. Thank you! -Joe

Re: New leader/replica solution for HDFS

2015-02-26 Thread Joseph Obernberger
Great! Thank you! I had a 4 shard setup - no replicas. Index size was 2.0TBytes stored in HDFS with each node having approximately 500G of index. I added four more shards on four other machines as replicas. One thing that happened was the 4 replicas all ran out of HDFS cache size (SnapPul

Re: New leader/replica solution for HDFS

2015-02-25 Thread Joseph Obernberger
HDFS, though, a single replica (just a leader) per shard means that you don't have any redundancy if the motherboard on that server dies even though HDFS has multiple copies of the _data_. Best, Erick On Wed, Feb 25, 2015 at 12:01 PM, Joseph Obernberger wrote: I am also confused on this.

Re: New leader/replica solution for HDFS

2015-02-25 Thread Joseph Obernberger
I am also confused on this. Is adding replicas going to increase search performance? I'm not sure I see the point of any replicas when using HDFS. Is there one? Thank you! -Joe On 2/25/2015 10:57 AM, Erick Erickson wrote: bq: And the data sync between leader/replica is always a problem No

Re: scanning all documents in the collection

2015-02-02 Thread Joseph Obernberger
I have a similar use-case. Check out the export capability and cursorMark. -Joe On 2/2/2015 8:14 AM, Matteo Grolla wrote: Hi, I'm thinking about having an instance of solr (SolrA) with all fields stored and just id indexed in addition with a normal production instance of solr
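The cursorMark approach mentioned above amounts to a simple loop: send the mark, collect docs, repeat until Solr returns the same mark you sent. A minimal sketch of that loop - `fetch(cursor)` stands in for the actual HTTP call to /select with cursorMark and a stable sort (e.g. `sort=id asc`), injected here so the logic can be shown without a live Solr instance:

```python
# cursorMark-style paging over an entire result set.
# fetch(cursor) must return (docs, next_cursor); in real use it would hit
# /select?q=...&sort=id asc&cursorMark=<cursor> and read nextCursorMark.
def scan_all(fetch, start_cursor="*"):
    cursor = start_cursor
    while True:
        docs, next_cursor = fetch(cursor)
        for doc in docs:
            yield doc
        if next_cursor == cursor:  # Solr signals completion by repeating the mark
            break
        cursor = next_cursor

# Fake two-page response set for illustration.
pages = {"*": ([1, 2], "AoE"), "AoE": ([3], "AoE")}
print(list(scan_all(pages.get)))  # [1, 2, 3]
```

Unlike start/rows deep paging, this stays cheap no matter how far into the result set the scan has progressed, which is why it suits "walk the whole collection" use cases.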

Re: How large is your solr index?

2015-01-08 Thread Joseph Obernberger
On 1/8/2015 3:16 AM, Toke Eskildsen wrote: On Wed, 2015-01-07 at 22:26 +0100, Joseph Obernberger wrote: Thank you Toke - yes - the data is indexed throughout the day. We are handling very few searches - probably 50 a day; this is an R&D system. If your searches are in small bundles,

Re: How large is your solr index?

2015-01-07 Thread Joseph Obernberger
redo our Solr Cloud, we will only run one shard per box, and supply more HDFS cache. -Joe On 1/7/2015 3:50 PM, Toke Eskildsen wrote: Joseph Obernberger [j...@lovehorsepower.com] wrote: [HDFS, 9M docs, 2.9TB, 22 shards, 11 bare metal boxes] A typical query takes about 7 seconds to run, but we al

Re: How large is your solr index?

2015-01-07 Thread Joseph Obernberger
Kinda late to the party on this very interesting thread, but I'm wondering if anyone has been using SolrCloud with HDFS at large scales? We really like this capability since our data is inside of Hadoop and we can run the Solr shards on the same nodes, and we only need to manage one pool of st

Re: splitshard the collection time out:900s

2014-12-16 Thread Joseph Obernberger
Shard splits can take a long time - the 900 seconds is just the REST timeout. The split is still taking place. On Tue, Dec 16, 2014 at 12:43 PM, Trilok Prithvi wrote: > > Sorry... I sent without explaining the situation. > > We did splitshard: > > solr/admin/collections?action=SPLITSHARD&collect
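Since the 900s limit is only the HTTP-level timeout, one way around it is to submit the split asynchronously and poll for completion. A sketch of the two Collections API requests involved - host, collection, shard, and request id are placeholders:

```python
# Issue SPLITSHARD with an async id so the HTTP call returns immediately,
# then poll REQUESTSTATUS instead of holding a connection open for the
# duration of the split. All names here are illustrative.
from urllib.parse import urlencode

base = "http://server:8983/solr/admin/collections"

split = base + "?" + urlencode({
    "action": "SPLITSHARD", "collection": "COLLECT1",
    "shard": "shard1", "async": "split-req-1",
})
status = base + "?" + urlencode({
    "action": "REQUESTSTATUS", "requestid": "split-req-1",
})
print(split)
print(status)
```

REQUESTSTATUS can then be polled until the state reaches completed (or failed), with no long-lived request to time out.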

Re: [ANN] word2vec for Lucene

2014-11-20 Thread Joseph Obernberger
Hi Koji - is it possible to execute word2vec on a subset of documents from Solr? - ie could I run a query, get back the top n results and pass only those to word2vec? Will this work with Solr Cloud? Thank you! -Joe On Thu, Nov 20, 2014 at 12:18 PM, Paul Libbrecht wrote: > As far as I could t

Re: More HDFS and Shard Splitting

2014-11-20 Thread Joseph Obernberger
100G shard, the index size goes up by 100G with the two new shards. Is this correct for HDFS operation? Thank you! -Joe On Mon, Nov 17, 2014 at 7:12 PM, Joseph Obernberger < joseph.obernber...@gmail.com> wrote: > Looks like the shard split failed, and only created one additional sha

Re: More HDFS and Shard Splitting

2014-11-17 Thread Joseph Obernberger
e create the > directory ahead of time I don't think. > > Best, > Erick > > On Mon, Nov 17, 2014 at 12:17 PM, Joseph Obernberger > wrote: > > Originally I had two shards on two machines - shard1 and shard2. > > I did a SHARDSPLIT on shard1. > > Now have s

More HDFS and Shard Splitting

2014-11-17 Thread Joseph Obernberger
Originally I had two shards on two machines - shard1 and shard2. I did a SHARDSPLIT on shard1. Now have shard1, shard2, and shard1_0 If I select the core (COLLECT_shard1_0_replica1) and execute a query, I get all the docs OK, but if I specify &distrib=false, I get 0 documents. Under HDFS - when/h

Shard splitting and HDFS

2014-11-17 Thread Joseph Obernberger
If I create the directory manually on the server that I'm splitting: COLLECT_shard1_0_replica1 Then do the shard split command, it works OK. -Joe

Shard splitting and HDFS

2014-11-17 Thread Joseph Obernberger
I tried to split a shard using HDFS storage, and at first I received this error: org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: Error CREATEing SolrCore 'COLLECT1_shard1_0_replica1': Unable to create core [COLLECT1_shard1_0_replica1] Caused by: Direct buffer memory at

Updating solrconfig.xml with zookeeper & HDFS

2014-11-13 Thread Joseph Obernberger
I wanted to make a change to the solrconfig.xml file in my 4.10.2 solr cloud cluster. I modified the file and put it in /tmp/conf - the only file in that directory. I then executed: ./zkcli.sh -cmd upconfig -zkhost list_of_hosts -d /tmp/conf -n ConfigName These ran successfully, and I was able t

Re: Exporting Error in 4.10.1

2014-10-30 Thread Joseph Obernberger
https://github.com/DmitryKey/luke/releases/tag/luke-4.10.1 > > > http://dmitrykan.blogspot.fi/2014/09/exporting-lucene-index-to-xml-with-luke.html > > It does not have the option to export select fields only, though. > > Dmitry > > On Thu, Oct 30, 2014 at 12:39 AM, Jo

Exporting Error in 4.10.1

2014-10-29 Thread Joseph Obernberger
Hi - I'm trying to use 4.10.1 with /export. I've defined a field as follows: I then call: http://server:port/solr/COLLECT1/export?q=Collection:COLLECT2000&sort=DocumentId desc&fl=DocumentId The error I receive is: java.io.IOException: DocumentId must have DocValues to use this feature. at org.a

Re: Modify Schema - Schema API

2014-09-10 Thread Joseph Obernberger
new configs > * Reindex data to the new collection. > * Use collection aliasing to swap the old/new collections. > (http://www.anshumgupta.net/2013/10/collection-aliasing-in-solrcloud.html) > > All this while, you wouldn't really need to shut down the Solr > cluster/collection

Re: Modify Schema - Schema API

2014-09-10 Thread Joseph Obernberger
g the field type might require you to reindex your > data. > > There's an open JIRA for that one and I think someone would get to it > sometime in the reasonably near future. > JIRA: https://issues.apache.org/jira/browse/SOLR-5289 > > On Wed, Sep 10, 2014 at 8:05 AM

Modify Schema - Schema API

2014-09-10 Thread Joseph Obernberger
In addition to adding new fields to the schema, is there a way to modify an existing field? If I created a field called userID as a long, but decided later that it should be a string? Thank you! -Joe
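The Schema API's replace-field command addresses exactly this. A minimal sketch of the payload for the userID long-to-string change described above - the stored flag is an assumption for illustration, and note that changing a field's type generally requires reindexing existing documents:

```python
# Schema API payload redefining an existing field. POST this JSON to
# /solr/<collection>/schema on a managed-schema collection, then reindex
# so existing documents pick up the new type.
import json

payload = {
    "replace-field": {
        "name": "userID",
        "type": "string",  # was a long; string going forward
        "stored": True,    # assumed; keep whatever your field had
    }
}
body = json.dumps(payload)
print(body)
```

The collection-alias approach quoted in the replies (new collection, reindex, swap the alias) remains the safer route when the type change affects how existing data was indexed.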

Re: Scaling to large Number of Collections

2014-08-31 Thread Joseph Obernberger
Could you add another field(s) to your application and use that instead of creating collections/cores? When you execute a search, instead of picking a core, just search a single large core but add in a field which contains some core ID. -Joe http://www.lovehorsepower.com On Sun, Aug 31, 2014 at
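The suggestion above boils down to a discriminator field: stamp every document with an ID at index time and filter on it at query time. A minimal sketch under those assumptions - the field name core_id and tenant names are made up for illustration:

```python
# "One big collection + discriminator field" instead of a core per tenant.
def tenant_doc(doc, tenant):
    out = dict(doc)
    out["core_id"] = tenant  # discriminator stored with every document
    return out

def tenant_params(q, tenant):
    # fq keeps the tenant filter cacheable and out of relevance scoring
    return {"q": q, "fq": 'core_id:"%s"' % tenant}

print(tenant_params("apache solr", "tenantA"))
```

Because the filter goes in fq, Solr's filter cache makes repeated per-tenant queries cheap, and the cluster manages one collection instead of thousands.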

Indexing Error IOException

2014-08-29 Thread Joseph Obernberger
I'm getting the following error when I'm indexing a large number of documents (in the millions). I do not see any errors on the two solr cloud servers only on the processes that are doing the indexing. The error is thrown from: cloudSolrServer.add(solrDoc); I can't seem to put my finger on the cau

Re: Managed Schema

2014-08-15 Thread Joseph Obernberger
> ideal - I expect there will be server-side facilities to do something > equivalent.) > > Steve > > www.lucidworks.com > > > > On Aug 15, 2014, at 11:49 AM, Joseph Obernberger < > joseph.obernber...@gmail.com> wrote: > > > >> Thank you! A

Re: Managed Schema

2014-08-15 Thread Joseph Obernberger
> > Steve > > On Aug 15, 2014, at 11:00 AM, Joseph Obernberger < > joseph.obernber...@gmail.com> wrote: > > > Hi - I've been using Solr Cloud in schema-less mode and am having some > > issues with 4.8.1 and 4.9.0 when adding lots of new fields. In 4.8.1 &

Managed Schema

2014-08-15 Thread Joseph Obernberger
Hi - I've been using Solr Cloud in schema-less mode and am having some issues with 4.8.1 and 4.9.0 when adding lots of new fields. In 4.8.1 I'll get continuous messages that say: 134567307 [qtp968427990-2492] INFO org.apache.solr.schema.IndexSchema - Failed to persist managed schema at /configs/