Re: DataImportHandler OutOfMemory Mysql

2017-04-02 Thread Shawn Heisey
On 4/1/2017 4:17 PM, marotosg wrote:
> I am trying to load a big table into Solr using DataImportHandler and Mysql. 
> I am getting OutOfMemory error because Solr is trying to load the full
> table. I have been reading different posts and tried batchSize="-1". 
> https://wiki.apache.org/solr/DataImportHandlerFaq
>
> Do you have any idea what could be the issue?
> Completely lost here.
>
> Solr 6.4.1
> mysql-connector-java-5.1.41-bin.jar
>
> data-config:
>
> <dataSource driver="com.mysql.jdbc.Driver"
>     url="jdbc:mysql://188.68.190.85:3306/jobsdb"
>     user="suer"
>     password="passowrd"/>
>
> <entity name="..."
>     pk="id"
>     batchSize="-1"

Setting batchSize to -1 is the proper solution, but you've got it in the
wrong place.  It goes on dataSource, not on entity.

https://wiki.apache.org/solr/DataImportHandlerFaq#I.27m_using_DataImportHandler_with_a_MySQL_database._My_table_is_huge_and_DataImportHandler_is_going_out_of_memory._Why_does_DataImportHandler_bring_everything_to_memory.3F

When batchSize is -1, DIH executes setFetchSize(Integer.MIN_VALUE) on
the JDBC statement.  This causes the MySQL JDBC driver to stream the
results instead of buffering them.
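
Adapted to your data-config, the dataSource would look roughly like this (a
sketch; keep your real URL and credentials, and drop batchSize from the
entity):

<dataSource type="JdbcDataSource"
    driver="com.mysql.jdbc.Driver"
    url="jdbc:mysql://188.68.190.85:3306/jobsdb"
    batchSize="-1"
    user="..." password="..."/>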

You should upgrade to 6.4.2 or 6.5.0.  6.4.0 and 6.4.1 have a serious
performance bug.

https://issues.apache.org/jira/browse/SOLR-10130

You may also want to edit the maxMergeCount setting in the
mergeScheduler config and set it to at least 6.  I ran into a problem with
the database disconnecting while importing millions of rows with DIH
from MySQL; this was the solution.  See this thread:

http://lucene.472066.n3.nabble.com/Closed-connection-issue-while-doing-dataimport-td4327116.html
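
The important piece of indexConfig in solrconfig.xml looks something like
this (a sketch; exact values depend on your hardware, and the full config is
in that thread):

<indexConfig>
  <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler">
    <int name="maxThreadCount">1</int>
    <int name="maxMergeCount">6</int>
  </mergeScheduler>
</indexConfig>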

Thanks,
Shawn



Index upgrade time and disk space

2017-04-02 Thread Putul S
Hi all,
I am migrating a Solr 4 index to Solr 5. The upgrade tool/script works well,
but I ran out of disk space upgrading a 4 GB index. The server had at least
8 GB free at the time. On production, the index is about 200 GB.

How much disk space is needed for indexing? Also, how long does it take to
upgrade a large index? It took about a minute to upgrade less than half a GB
of index.

Thanks in advance.

-- putul


Re: Index upgrade time and disk space

2017-04-02 Thread Shawn Heisey
On 4/2/2017 8:16 AM, Putul S wrote:
> I am migrating a Solr 4 index to Solr 5. The upgrade tool/script works well,
> but I ran out of disk space upgrading a 4 GB index. The server had at least
> 8 GB free at the time. On production, the index is about 200 GB.
>
> How much disk space is needed for indexing? Also, how long does it take to
> upgrade a large index? It took about a minute to upgrade less than half a GB
> of index.

You've asked questions that have no generic answer.  Answering them
requires a lot of very specific information about your index and the
data it contains, and even if that information is provided, the answers
will only be guesses.  The only way to find out for sure is to try it.

https://lucidworks.com/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/

Nobody can tell you how much disk space is needed for indexing.  That
will depend on how your schema is configured and how much data you
index.  Small changes can increase or decrease the disk space required.

Upgrading an index runs an operation that Lucene calls "forceMerge" on
the index.  Solr calls this procedure "optimize".  Exactly how fast the
optimize proceeds will depend on the precise contents of the index,
which will depend on the schema and exactly what data has been indexed.
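
If you are using the standalone Lucene upgrader rather than reindexing from
scratch, the invocation is roughly this (a sketch; the exact jar names depend
on the 5.x version you install, and the index path is yours):

java -cp lucene-core-5.5.4.jar:lucene-backward-codecs-5.5.4.jar \
  org.apache.lucene.index.IndexUpgrader -verbose /path/to/index/dir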

I have some 50GB indexes that take about two hours to optimize (on
systems with very fast disks), which means that it would take about two
hours to upgrade.  Somebody else who has a 50GB index might take a very
different amount of time to optimize, because the contents of their
index are likely to be different than the contents of mine, and their
hardware probably has different capabilities.

An upgrade or an optimize should only require enough disk space to store
the full index again.  It may double in size, then shrink back down to
about the same size, unless there are deleted documents, in which case
the new index will be smaller than the original.

General recommendations for Lucene and Solr are to have FREE disk space
equivalent to *double* the size of all your index data.  This is because
in certain situations when reindexing the bulk of your data and
optimizing the index, it can triple in size temporarily.  In most
situations, the increase will only be double, but the recommendation is
that you have the disk space to handle triple.
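
Applied to the 200 GB production index you mentioned (rough numbers, assuming
the index stays near that size):

  recommended free space   =  2 x 200 GB  =  about 400 GB
  worst-case peak on disk  =  3 x 200 GB  =  about 600 GB (original plus temporary copies)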

Thanks,
Shawn



Re: Index upgrade time and disk space

2017-04-02 Thread sputul
Thanks, Shawn, for getting back with a detailed explanation. I will run tests
upfront with a large index and enough space, and see if a fast disk is needed.
- Putul





Re: Fieldtype json supported in SOLR 5.4.0 or 5.4.1

2017-04-02 Thread Abhijit Pawar
Hi Edwin,

I had to replace the <> value with <> in my data-source-config file and was
able to index the data.

However, with the IP address it is still not working.
In solrconfig.xml I am not seeing anywhere to mention schema.xml or
managed-schema.xml.

Meanwhile, going back to my original question - what fieldtype to use for a
JSON field in mongoDB?

Currently, for the field "pricing" in my data-source-config file, I am using
String Array as the datatype:

data-source-config file:

managed-schema.xml (for SOLR 6.0) / schema.xml (for older versions of SOLR):

However, it just shows an empty array when the data is indexed. Any idea how
to define a field of type JSON?

Best Regards,


Abhijit Pawar
Office : +1 (469) 287 2005 x 110



On Sat, Apr 1, 2017 at 10:12 AM, Zheng Lin Edwin Yeo 
wrote:

> Did you upgrade your solrconfig.xml to the Solr 6.0 version too?
> There are some differences in the Solr 6.0 version which require a setting to
> determine whether to use the managed-schema or the classic schema (the
> physical schema.xml file).
>
> Regards,
> Edwin
>
> On 1 April 2017 at 01:27, Abhijit Pawar 
> wrote:
>
> > Hi Rick,
> >
> > I tried installing SOLR 6.0, since SOLR 6.0 has managed-schema, and tried
> > to index the data from mongoDB:
> >
> >
> > <dataConfig>
> >   <dataSource name="mongod"
> >       driver="com.mongodb.jdbc.MongoDriver"
> >       url="mongodb://<IP-Address>:27017/<>"/>
> >   <document>
> >     <entity name="products"
> >         dataSource="mongod"
> >         transformer="TemplateTransformer,ProdsCatsFieldTransformer"
> >         onError="continue"
> >         pk="uuid"
> >         query="SELECT
> >           orgidStr,idStr,name,code,description,price,images,
> >           categoriesStr,enddate_solar,begin_date_solar,status_solar,
> >           current_stock_solar,retprice_solar,distprice_solar,
> >           listprice_solar,mfgprice_solar,out_of_stock_solar,
> >           hide_product_solar,saleprice_solar,metakey_solar,sales_enabled,
> >           new_product,has_sku,configurable,rating,updatedAt,comparable,hide_price
> >           FROM products"
> >         deltaImportQuery="SELECT
> >           orgidStr,idStr,name,code,description,price,images,
> >           categoriesStr,enddate_solar,begin_date_solar,status_solar,
> >           current_stock_solar,retprice_solar,distprice_solar,
> >           listprice_solar,mfgprice_solar,out_of_stock_solar,
> >           hide_product_solar,saleprice_solar,metakey_solar,sales_enabled,
> >           new_product,has_sku,configurable,rating,updatedAt,comparable,hide_price
> >           FROM products WHERE orgidStr = '${dataimporter.request.orgid}'
> >           AND idStr = '${dataimporter.delta.idStr}'"
> >         deltaQuery="SELECT idStr FROM products WHERE idStr =
> >           '${dataimporter.request.prodidStr}' AND orgidStr =
> >           '${dataimporter.request.orgid}'">
> >       <field ... template="org-${products.orgidStr}-prod-${products.idStr}"/>
> >     </entity>
> >   </document>
> > </dataConfig>
> >
> >
> > This is the error I get:
> >
> > getNext() failed for query 'SELECT
> > orgidStr,idStr,name,code,description,price,images,
> > categoriesStr,enddate_solar,begin_date_solar,status_solar,
> > current_stock_solar,retprice_solar,distprice_solar,
> > listprice_solar,mfgprice_solar,out_of_stock_solar,
> > hide_product_solar,saleprice_solar,metakey_solar,sales_enabled,
> > new_product,has_sku,configurable,rating,updatedAt,comparable,hide_price
> > FROM products'
> >
> > com.mongodb.MongoException$Network: can't call something : /<>:27017/<>
> >
> > Caused by: java.io.IOException: couldn't connect to [/<>:27017]
> > bc:java.net.SocketTimeoutException: connect timed out
> >
> > Has anyone else gone through this kind of issue?
> >
> >
> >
> >
> > On Tue, Mar 28, 2017 at 6:20 PM, Rick Leir  wrote:
> >
> > > Abhijit
> > > In Mongo you probably have one JSON record per document. You can post
> > > that JSON record to Solr, and the JSON fields get indexed. The github
> > > project you mention does just that. If you use the Solr managed schema
> > > then Solr will automatically define fields based on what it receives.
> > > Otherwise you will need to carefully design a schema.xml.
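> > > For example, something along these lines (a rough sketch, assuming a
> > > core named "products" and the stock /update/json/docs handler):
> > >
> > > curl 'http://localhost:8983/solr/products/update/json/docs?commit=true' \
> > >   -H 'Content-Type: application/json' \
> > >   -d '{"id":"1", "name":"widget", "price":9.99}'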
> > > Cheers -- Rick
> > >
> > > On March 28, 2017 6:08:40 PM EDT, Abhijit Pawar <
> > > abhijit.ibizs...@gmail.com> wrote:
> > > >Hello All,
> > > >
> > > >I am working on a requirement to index field of type JSON (in mongoDB
> > > >collection) in SOLR 5.4.0.
> > > >
> > > >I am using mongo-jdbc-dih which I found on GitHub :
> > > >
> > > >https://github.com/hrishik/solr-mongodb-dih
> > > >
> > > >However I could not find a fieldtype on Apache SOLR wiki page which
> > > >would
> > > >support JSON datatype in mongoDB.
> > > >
> > > >Can someone please recommend a way to include datatype / fieldtype in
> > > >SOLR
> > > >schema to support or index JSON data field from mongoDB.
> > > >Thanks.
> > > >
> > > >Regards,
> > > >
> > > >Abhijit
> > >
> > > --
> > > Sent from my Android device with K-9 Mail. Please excuse my brevity.
> >
>


Re: Closed connection issue while doing dataimport

2017-04-02 Thread santosh sidnal
Thanks Shawn, that was a good point to consider, but we had a problem with the
holdability param in data-config.xml and the Oracle 12c version of the DB and
client.

By removing the holdability parameter we were able to solve the issue.

On 28 March 2017 at 18:19, Shawn Heisey  wrote:

> On 3/27/2017 7:13 PM, santosh sidnal wrote:
> > I am facing a closed connection issue while doing a dataimport. Any
> > solution to this? The stack trace is as below:
> >
> >
> > [3/27/17 8:54:41:399 CDT] 00b4 OracleDataSto >  findMappingClass for :
> > Entry
> > java.sql.SQLRecoverableException: Closed Connection
>
> Does the import appear to work correctly at first, then stop before it's
> done with this exception after a few hours?  If so, then I think you may
> be running into a situation where the merge scheduler has multiple
> merges scheduled and stops the incoming indexing thread until the
> largest merge is done.  If the indexing thread is stopped for long
> enough, the JDBC connection to the database times out and closes, and when
> the indexing thread finally starts back up, the dataimporter finds that it
> cannot read from the database any more.
>
> If this is what's happening, then the solution is to allow the merge
> scheduler to schedule more merges simultaneously.  Here's the
> indexConfig that I use in solrconfig.xml:
>
> <indexConfig>
>   <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
>     <int name="maxMergeAtOnce">35</int>
>     <int name="segmentsPerTier">35</int>
>     <int name="maxMergeAtOnceExplicit">105</int>
>   </mergePolicy>
>   <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler">
>     <int name="maxThreadCount">1</int>
>     <int name="maxMergeCount">6</int>
>   </mergeScheduler>
> </indexConfig>
>
> You don't need the mergePolicy part of this config to solve this issue.
> Those are the settings that I use, but they greatly increase the number
> of files that can be in an index, which may require OS tuning for max
> open files.
>
> The mergeScheduler is the important part, and specifically
> maxMergeCount.  These settings are for standard spinning-magnetic
> disks.  I increase the maxMergeCount to 6, so more merges can be
> scheduled without halting the indexing thread.  The maxThreadCount value
> should be set to 1 if you're using standard disks.  If you're using SSD,
> then you can bump it a little bit, because SSD can easily handle
> multiple threads randomly writing to the disk.  I don't know what values
> are appropriate for SSD, but I would probably start with 2-4.
>
> Thanks,
> Shawn
>
>


-- 
Regards,
Santosh Sidnal