Re: Slower indexing speed in Solr 8.0.0

2019-04-03 Thread Zheng Lin Edwin Yeo
Hi David, Yes, I do have this field "_root_" in the schema. However, I don't think I have used the field, and there is no difference in the indexing speed after I removed the field. Regards, Edwin On Wed, 3 Apr 2019 at 22:57, David Smiley wrote: > Hi Edwin, > > I

Re: Slower indexing speed in Solr 8.0.0

2019-04-03 Thread David Smiley
t who knows. ~ David Smiley Apache Lucene/Solr Search Developer http://www.linkedin.com/in/davidwsmiley On Tue, Apr 2, 2019 at 10:17 PM Zheng Lin Edwin Yeo wrote: > Hi, > > I am setting up the latest Solr 8.0.0, and I am re-indexing the data from > scratch in Solr 8.0.0 > >

Re: Slower indexing speed in Solr 8.0.0

2019-04-03 Thread David Smiley
What/where is this benchmark? I recall once Ishan was working with a volunteer to set up something like what Lucene has, but sadly it was not successful. On Wed, Apr 3, 2019 at 6:04 AM Đạt Cao Mạnh wrote: > Hi guys, > > I'm seeing the same problems with Shalin's nightly indexing benchmark. This > happen

Re: Slower indexing speed in Solr 8.0.0

2019-04-03 Thread Toke Eskildsen
On Wed, 2019-04-03 at 18:04 +0800, Zheng Lin Edwin Yeo wrote: > I have tried to set all the docValues in my schema.xml to false and > do the indexing again. > There isn't any difference with the indexing speed as compared to > when we have enabled the docValues. Thank you for sp

Re: Slower indexing speed in Solr 8.0.0

2019-04-03 Thread Zheng Lin Edwin Yeo
Hi Toke, I have tried to set all the docValues in my schema.xml to false and do the indexing again. There isn't any difference with the indexing speed as compared to when we have enabled the docValues. Seems like the cause of the regression might be somewhere else? Regards, Edwin On Wed,

Re: Slower indexing speed in Solr 8.0.0

2019-04-03 Thread Đạt Cao Mạnh
Hi guys, I'm seeing the same problems with Shalin's nightly indexing benchmark. This happened around this period: git log --before=2018-12-07 --after=2018-11-21 On Wed, Apr 3, 2019 at 8:45 AM Toke Eskildsen wrote: > On Wed, 2019-04-03 at 15:24 +0800, Zheng Lin Edwin Yeo wrote: > > Yes, I am using Do

Re: Slower indexing speed in Solr 8.0.0

2019-04-03 Thread Toke Eskildsen
On Wed, 2019-04-03 at 15:24 +0800, Zheng Lin Edwin Yeo wrote: > Yes, I am using DocValues for most of my fields. So that's a culprit. Thank you. > Currently we can't share the test data yet as some of the records are > sensitive. Do you have any data from CSV file that you can test? Not really.

Re: Slower indexing speed in Solr 8.0.0

2019-04-03 Thread Zheng Lin Edwin Yeo
Yes, I am using DocValues for most of my fields. I am using dynamicField, in which I have suffixed the field names with things like _s, _i, etc. in the CSV file. Currently we can't share the test data yet as some of the recor

Re: Slower indexing speed in Solr 8.0.0

2019-04-02 Thread Toke Eskildsen
On Wed, 2019-04-03 at 10:17 +0800, Zheng Lin Edwin Yeo wrote: > What could be the reason that causes the indexing to be slower in > Solr 8.0.0? As Aroop states there can be multiple explanations. One of them is the change to how DocValues are handled in 8.0.0. The indexing impact should be tiny, b

Re: Slower indexing speed in Solr 8.0.0

2019-04-02 Thread Zheng Lin Edwin Yeo
code from the SimplePostTool. I have already tried it more than 10 times, and every time I tried, the indexing speed in 8.0 was at least 40% slower than in 7.7.1. Regards, Edwin On Wed, 3 Apr 2019 at 11:19, Aroop Ganguly wrote: > Indexing speeds are a function of a lot of va

Re: Slower indexing speed in Solr 8.0.0

2019-04-02 Thread Aroop Ganguly
>> Hi, >> >> I am setting up the latest Solr 8.0.0, and I am re-indexing the data from >> scratch in Solr 8.0.0 >> >> However, I found that the indexing speed is slower in Solr 8.0.0, as >> compared to the earlier version like Solr 7.7.1. I have not

Re: Slower indexing speed in Solr 8.0.0

2019-04-02 Thread Zheng Lin Edwin Yeo
atch in Solr 8.0.0 > > However, I found that the indexing speed is slower in Solr 8.0.0, as > compared to the earlier version like Solr 7.7.1. I have not changed the > schema.xml and solrconfig.xml yet, just did a change of the > luceneMatchVersion in solrconfig.xml to 8.0.0 >

Slower indexing speed in Solr 8.0.0

2019-04-02 Thread Zheng Lin Edwin Yeo
Hi, I am setting up the latest Solr 8.0.0, and I am re-indexing the data from scratch in Solr 8.0.0 However, I found that the indexing speed is slower in Solr 8.0.0, as compared to the earlier version like Solr 7.7.1. I have not changed the schema.xml and solrconfig.xml yet, just did a change of

Re: Improve indexing speed?

2019-01-01 Thread Shawn Heisey
On 1/1/2019 8:59 AM, John Milton wrote: My document contains 65 fields. All the fields need to be indexed. But indexing 100 documents takes 10 seconds. I am using Solr 7.5 (2 cloud instances), with 50 shards. The best way to achieve fast indexing in Solr is to index multiple items
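
Shawn's reply is cut off after "index multiple items", but batching documents and sending several batches concurrently is a common way to act on that kind of advice. The Python sketch below assumes a hypothetical `send` callable that posts one batch; `make_solr_sender` shows one possible implementation that POSTs a JSON array of documents to a collection's `/update` handler (the base URL is a placeholder). It is an illustration, not the method Shawn spelled out.

```python
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, Iterable, Iterator, List

Doc = Dict[str, Any]


def batches(docs: Iterable[Doc], size: int) -> Iterator[List[Doc]]:
    """Yield lists of at most `size` documents."""
    batch: List[Doc] = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def index_concurrently(docs: Iterable[Doc], send: Callable[[List[Doc]], None],
                       batch_size: int = 1000, workers: int = 4) -> int:
    """Send batches from a thread pool; return the number of batches sent."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(send, b) for b in batches(docs, batch_size)]
        for f in futures:
            f.result()  # re-raise any exception from a worker thread
    return len(futures)


def make_solr_sender(base_url: str) -> Callable[[List[Doc]], None]:
    """Hypothetical sender; base_url is a placeholder such as
    'http://localhost:8983/solr/mycollection'."""
    def send(batch: List[Doc]) -> None:
        req = urllib.request.Request(
            base_url + "/update",
            data=json.dumps(batch).encode("utf-8"),
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        with urllib.request.urlopen(req) as resp:  # raises on HTTP errors
            resp.read()
    return send
```

This version submits every batch up front, so a very large input would need a bounded queue instead.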

Re: Improve indexing speed?

2019-01-01 Thread Hendrik Haddorp
Java heap space 15 GB. How can I improve indexing speed? Note: All the fields contain a maximum of 20 characters. Field type is text general, case-insensitive. Thanks, John Milton

Re: Improve indexing speed?

2019-01-01 Thread Erick Erickson
> It's running on Windows OS and it has 32 GB RAM. Java heap space 15 GB. > How can I improve indexing speed? > Note: > All the fields contain a maximum of 20 characters. Field type is text > general, case-insensitive. > > Thanks, > John Milton

Improve indexing speed?

2019-01-01 Thread John Milton
Hi to all, My document contains 65 fields. All the fields need to be indexed. But indexing 100 documents takes 10 seconds. I am using Solr 7.5 (2 cloud instances), with 50 shards. It's running on Windows OS and it has 32 GB RAM. Java heap space 15 GB. How can I improve indexing

Re: When the number of collections exceeds one thousand, the construction of indexing speed drops sharply

2018-03-01 Thread 苗海泉
Thank you for your advice on GC tools; what would you suggest? 2018-02-28 23:57 GMT+08:00 Shawn Heisey : > On 2/28/2018 2:53 AM, 苗海泉 wrote: > >> Thanks for your detailed advice. The monitoring product you are talking about >> is good, but our Solr system is running on a private network and seems

Re: When the number of collections exceeds one thousand, the construction of indexing speed drops sharply

2018-02-28 Thread Shawn Heisey
On 2/28/2018 2:53 AM, 苗海泉 wrote: Thanks for your detailed advice. The monitoring product you are talking about is good, but our Solr system runs on a private network, so it seems we cannot use it at all, and we have found no standalone downloadable application for analyzing GC logs. For analyzing GC logs

Re: When the number of collections exceeds one thousand, the construction of indexing speed drops sharply

2018-02-28 Thread Emir Arnautović
If you are after only visualising GC, there are several tools that you can download or upload logs to visualise. If you would like to monitor all host/solr/jvm, Sematext’s SPM also comes in on-premises version, where you install and host your own monitoring infrastructure: https://sematext.com

Re: When the number of collections exceeds one thousand, the construction of indexing speed drops sharply

2018-02-28 Thread 苗海泉
Thanks for your detailed advice. The monitoring product you are talking about is good, but our Solr system runs on a private network, so it seems we cannot use it at all, and we have found no standalone downloadable application for analyzing GC logs. 2018-02-28 16:57 GMT+08:00 Emir Arnautović : > Hi, > I w

Re: When the number of collections exceeds one thousand, the construction of indexing speed drops sharply

2018-02-28 Thread Emir Arnautović
Hi, I would start with the following: 1. have dedicated nodes for the ZK ensemble - those do not have to be powerful nodes (maybe 2-4 cores and 8GB RAM) 2. reduce the heap size to a value below the limit at which the JVM can still use compressed oops - 31GB should be a safe size 3. shard collections across all nodes 4. increase roll

Re: When the number of collections exceeds one thousand, the construction of indexing speed drops sharply

2018-02-27 Thread 苗海泉
Thank you. I looked at the memory footprint: I set 75% recovery, and memory occupancy is at about 76%. Also, our ZooKeeper is not on a dedicated server; perhaps that causes the instability. What else do you recommend I check? 2018-02-27 22:37 GMT+08:00 Emir Arnautović : > This does not s

Re: When the number of collections exceeds one thousand, the construction of indexing speed drops sharply

2018-02-27 Thread Emir Arnautović
This does not show much: only that your heap is around 75% (24-25GB). I was thinking that you should compare metrics (heap/GC as well) when running without issues and when running with issues, and see if something can be concluded. About instability: Do you run ZK on dedicated nodes? Emir --

Re: When the number of collections exceeds one thousand, the construction of indexing speed drops sharply

2018-02-27 Thread 苗海泉
Thank you. We originally had 49 shards on 49 nodes, but later found that in this case Solr often disconnected from ZooKeeper; too many nodes in ZooKeeper caused Solr instability, so we reduced it to 25. If performance cannot keep up later, we will need to increase it again. Very slow when solr and zookeeper not found an

Re: When the number of collections exceeds one thousand, the construction of indexing speed drops sharply

2018-02-27 Thread Emir Arnautović
Ah, so there are ~560 shards per node and not all nodes are indexing at the same time. Why is that? You can have better throughput if indexing on all nodes. If happy with shard size, you can create new collection with 49 shards every 2h and have everything the same and index on all nodes. Back
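
The per-node shard count Emir mentions is plain arithmetic. A minimal sketch, assuming replicas are spread evenly across nodes; the figure of 1,100 collections is an assumption, because the thread only says "more than a thousand":

```python
def cores_per_node(collections: int, shards_per_collection: int,
                   replicas_per_shard: int, nodes: int) -> float:
    """Average number of shard replicas (Solr cores) hosted on each node,
    assuming they are distributed evenly across all nodes."""
    return collections * shards_per_collection * replicas_per_shard / nodes


# Assumed 1,100 collections, 25 shards each, 1 replica, 49 nodes
print(round(cores_per_node(1100, 25, 1, 49)))  # about 561
```

With 984 collections (where the slowdown was first noticed) the same formula gives about 502 cores per node.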

Re: When the number of collections exceeds one thousand, the construction of indexing speed drops sharply

2018-02-27 Thread 苗海泉
Thanks for your reply again. I think there may be a misunderstanding: we have 49 Solr nodes, each collection has 25 shards, each shard has only one replica (there is no extra copy), and I reduced some of the caches. If you need the metric data, I can look it up and tell you, i

Re: When the number of collections exceeds one thousand, the construction of indexing speed drops sharply

2018-02-27 Thread Emir Arnautović
Hi, It is hard to tell without looking more into your metrics. It seems to me that you are reaching the limits of your cluster. I would double-check whether memory is the issue. If I got it right, you have ~1120 shards per node. It takes some heap just to keep them open. If you have some caches enabled an

Re: When the number of collections exceeds one thousand, the construction of indexing speed drops sharply

2018-02-27 Thread 苗海泉
In addition, we found that the rate was normal when the number of collections was kept below 936, and it got slower and slower at 984. Therefore, we could only temporarily delete older collections, but now we need more online collections, and having no good solution has puzzled us for a long

Re: When the number of collections exceeds one thousand, the construction of indexing speed drops sharply

2018-02-27 Thread 苗海泉
Thank you for the reply. Each collection has 25 shards with one replica; each Solr node has about 5 TB on disk. GC has been checked and modified as follows: SOLR_JAVA_MEM="-Xms32768m -Xmx32768m " GC_TUNE=" \ -XX:+UseG1GC \ -XX:+PerfDisableSharedMem \ -XX:+ParallelRefProcEnabled \ -XX:G1HeapRegionSize=8m \ -XX:MaxGCPaus

Re: When the number of collections exceeds one thousand, the construction of indexing speed drops sharply

2018-02-27 Thread Emir Arnautović
Hi, To get a more complete picture, can you tell us how many shards/replicas you have per collection? Also, what is the index size on disk? Did you check GC? BTW, using a 32GB heap prevents you from using compressed oops, so it actually leaves you with less usable memory than a 31GB heap would. Thanks, Emir -- Monitoring - Log
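
Emir's point about compressed oops can be checked mechanically against the SOLR_JAVA_MEM value quoted in the thread. The helper below is a sketch under stated assumptions: it treats 32 GiB as the cutoff (the real limit is slightly lower and depends on the JVM, which is why 31GB is suggested as safe), recognizes only the k/m/g size suffixes, and takes the last -Xmx flag as the effective one. To see what a particular JVM actually does, `java -Xmx32g -XX:+PrintFlagsFinal -version` prints the final value of `UseCompressedOops`.

```python
import re
from typing import Optional

# Assumed cutoff: at or above 32 GiB the JVM cannot use compressed oops;
# the practical limit is a little lower, so this check is optimistic.
COMPRESSED_OOPS_LIMIT_BYTES = 32 * 1024**3

_UNITS = {"": 1, "k": 1024, "m": 1024**2, "g": 1024**3}


def max_heap_bytes(java_opts: str) -> Optional[int]:
    """Return the last -Xmx value in java_opts, in bytes, or None if absent."""
    matches = re.findall(r"-Xmx(\d+)([kKmMgG]?)", java_opts)
    if not matches:
        return None
    number, unit = matches[-1]
    return int(number) * _UNITS[unit.lower()]


def likely_loses_compressed_oops(java_opts: str) -> bool:
    heap = max_heap_bytes(java_opts)
    return heap is not None and heap >= COMPRESSED_OOPS_LIMIT_BYTES


print(likely_loses_compressed_oops("-Xms32768m -Xmx32768m "))  # True
print(likely_loses_compressed_oops("-Xms31g -Xmx31g"))         # False
```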

When the number of collections exceeds one thousand, the construction of indexing speed drops sharply

2018-02-27 Thread 苗海泉
I encountered a serious problem while using Solr. We use Solr version 6.0; our daily data volume is about 500 billion documents, we create a collection every hour, there are more than a thousand online collections, and we have 49 Solr nodes. If there are fewer than 800 collections, the speed is

Re: Slow indexing speed when collection size is large

2017-05-07 Thread Zheng Lin Edwin Yeo
ching speed is quite fast currently, even during indexing. It is the indexing speed that is slow. Regards, Edwin On 7 May 2017 at 21:14, Shawn Heisey wrote: > On 5/6/2017 6:49 PM, Zheng Lin Edwin Yeo wrote: > > For my rich document handling, I'm using Extracting Reques

Re: Slow indexing speed when collection size is large

2017-05-07 Thread Shawn Heisey
On 5/6/2017 6:49 PM, Zheng Lin Edwin Yeo wrote: > For my rich document handling, I'm using Extracting Request Handler, and > it requires OCR. > > However, for the slow indexing speed I'm currently experiencing, the > indexing is done directly from the Syba

Re: Slow indexing speed when collection size is large

2017-05-06 Thread Zheng Lin Edwin Yeo
Hi Shawn, For my rich document handling, I'm using the Extracting Request Handler, and it requires OCR. However, for the slow indexing speed I'm currently experiencing, the indexing is done directly from the Sybase database. I will fetch about 1000 records at a time from S

Re: Slow indexing speed when collection size is large

2017-05-06 Thread Shawn Heisey
On 5/1/2017 10:17 AM, Zheng Lin Edwin Yeo wrote: > I'm using SolrJ for the indexing, not using curl. Normally I bundle > about 1000 documents for each POST. There's more than 300GB of RAM for > that server, and I do not use any sharding at the moment. Looking over your email history on the list, I

Re: Slow indexing speed when collection size is large

2017-05-01 Thread Zheng Lin Edwin Yeo
10:39:29 PM EDT, Zheng Lin Edwin Yeo < > edwinye...@gmail.com> wrote: > >Hi, > > > >I'm using Solr 6.4.2. > > > >Would like to check, if there are a lot of collections in my Solr which > >have > >very large index sizes, will the indexing speed be affected

Re: Slow indexing speed when collection size is large

2017-05-01 Thread Rick Leir
Edwin Yeo wrote: >Hi, > >I'm using Solr 6.4.2. > >Would like to check, if there are a lot of collections in my Solr which >have >very large index sizes, will the indexing speed be affected? > >Currently, I have created a new collection in Solr which has several

Slow indexing speed when collection size is large

2017-04-30 Thread Zheng Lin Edwin Yeo
Hi, I'm using Solr 6.4.2. I would like to check: if there are a lot of collections in my Solr with very large index sizes, will the indexing speed be affected? Currently, I have created a new collection in a Solr instance which has several collections with very large index sizes, and the indexing

Re: Indexing speed reduced significantly with OCR

2017-03-31 Thread Zheng Lin Edwin Yeo
This is my comparison of the indexing speed with and without Tesseract OCR. The smaller file takes longer to index, probably because it has more text to OCR than the bigger file, which has less text. Is that usually the case? *With Tesseract OCR* 174KB - 5.20 sec

RE: Indexing speed reduced significantly with OCR

2017-03-30 Thread Phil Scadden
Yes, that would seem an accurate assessment of the problem. -Original Message- From: Zheng Lin Edwin Yeo [mailto:edwinye...@gmail.com] Sent: Thursday, 30 March 2017 4:53 p.m. To: solr-user@lucene.apache.org Subject: Re: Indexing speed reduced significantly with OCR Thanks for your reply

Re: Indexing speed reduced significantly with OCR

2017-03-30 Thread Walter Underwood
om] > Sent: Thursday, March 30, 2017 7:37 AM > To: solr-user@lucene.apache.org > Subject: Re: Indexing speed reduced significantly with OCR > > The workflow is > -/ OCR new documents > -/ check quality and tune until you get good output text -/ keep the output > text in the

RE: Indexing speed reduced significantly with OCR

2017-03-30 Thread Allison, Timothy B.
> Note that the OCRing is a separate task from Solr indexing, and is best done > on separate machines. +1 -Original Message- From: Rick Leir [mailto:rl...@leirtech.com] Sent: Thursday, March 30, 2017 7:37 AM To: solr-user@lucene.apache.org Subject: Re: Indexing speed r

Re: Indexing speed reduced significantly with OCR

2017-03-30 Thread Rick Leir
The workflow is: (1) OCR new documents, (2) check quality and tune until you get good output text, (3) keep the output text in the file system, (4) index and re-index to Solr as necessary from the file system. Note that the OCRing is a separate task from Solr indexing, and is best done on separate mach
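
Rick's workflow keeps the OCR output on disk so that re-indexing never repeats the OCR step. A minimal Python sketch of that caching idea, assuming a hypothetical `ocr` callable (in practice it might wrap Tesseract, or pdftotext for PDFs that already have a text layer) and keying the cache on a SHA-256 hash of the file content:

```python
import hashlib
from pathlib import Path
from typing import Callable


def cached_text(source: Path, cache_dir: Path, ocr: Callable[[Path], str]) -> str:
    """Return extracted text for `source`, running the expensive `ocr`
    callable (hypothetical) only if this exact file content has no cached copy."""
    digest = hashlib.sha256(source.read_bytes()).hexdigest()
    cache_file = cache_dir / f"{digest}.txt"
    if cache_file.exists():
        # Re-indexing path: reuse the text produced by an earlier OCR run.
        return cache_file.read_text(encoding="utf-8")
    text = ocr(source)
    cache_dir.mkdir(parents=True, exist_ok=True)
    cache_file.write_text(text, encoding="utf-8")
    return text
```

Because the key is the content hash, an unchanged file is never OCRed twice, and the cache directory can be filled by the separate OCR machines Rick suggests before any Solr indexing starts.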

Re: Indexing speed reduced significantly with OCR

2017-03-29 Thread Zheng Lin Edwin Yeo
Thanks for your reply. From what I see, getting more hardware to do the OCR is inevitable? Even if we run the OCR outside of the Solr indexing stream, it will still take a long time to process if it is on just one machine. And we still need to wait for the OCR to finish converting before we can r

RE: Indexing speed reduced significantly with OCR

2017-03-28 Thread Phil Scadden
Well I haven’t had to deal with a problem of that size, but it seems to me that you have little alternative except to throw more computer hardware at it. For the job I did, I used OCR to convert PDFs to searchable PDFs outside the indexing workflow. I used the pdftotext utility to extract text from the PDFs. If t

Re: Indexing speed reduced significantly with OCR

2017-03-28 Thread Walter Underwood
28, 2017, at 2:52 AM, Zheng Lin Edwin Yeo wrote: > > Hi, > > Do you have suggestions that we can do to cope with the expensive process > of indexing documents which requires OCR. > > For my current situation, the indexing takes about 2 weeks to complete. If > the avera

Re: Indexing speed reduced significantly with OCR

2017-03-28 Thread Zheng Lin Edwin Yeo
Hi, Do you have any suggestions on how we can cope with the expensive process of indexing documents that require OCR? In my current situation, the indexing takes about 2 weeks to complete. If the average indexing speed is, say, 50 times slower, it will require 100 weeks to index
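
The estimate above is a multiplication, and the same arithmetic shows how much hardware the OCR step would need. The sketch assumes the work splits perfectly across machines, which real OCR jobs only approximate:

```python
import math
from typing import Tuple


def ocr_estimate(baseline_weeks: float, slowdown: float,
                 target_weeks: float) -> Tuple[float, int]:
    """Return (total single-machine weeks, machines needed to finish within
    target_weeks), assuming perfectly parallel work."""
    total = baseline_weeks * slowdown
    machines = math.ceil(total / target_weeks)
    return total, machines


print(ocr_estimate(2, 50, 2))   # (100, 50)
print(ocr_estimate(2, 100, 4))  # (200, 50)
```

With the 50x slowdown, finishing in the original 2 weeks would take about 50 machines; with the 100x figure Phil mentions, finishing in 4 weeks would also take about 50.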

Re: Indexing speed reduced significantly with OCR

2017-03-27 Thread Zheng Lin Edwin Yeo
with OCR can be 100 times slower than indexing a PDF > that is searchable (text extractable without OCR). > > -Original Message- > From: Zheng Lin Edwin Yeo [mailto:edwinye...@gmail.com] > Sent: Tuesday, 28 March 2017 4:13 p.m. > To: solr-user@lucene.apache.org > S

RE: Indexing speed reduced significantly with OCR

2017-03-27 Thread Phil Scadden
: Zheng Lin Edwin Yeo [mailto:edwinye...@gmail.com] Sent: Tuesday, 28 March 2017 4:13 p.m. To: solr-user@lucene.apache.org Subject: Indexing speed reduced significantly with OCR Hi, Is the indexing speed of Solr reduced significantly when we use Tesseract OCR to extract scanned inline

Indexing speed reduced significantly with OCR

2017-03-27 Thread Zheng Lin Edwin Yeo
Hi, Is the indexing speed of Solr reduced significantly when we use Tesseract OCR to extract scanned inline images from PDF? I found that after I implemented the solution to extract those scanned images from PDF, the indexing speed is now about 10 times slower, or even more. I'm

Re: Slow indexing speed when index size is large?

2016-10-16 Thread Zheng Lin Edwin Yeo
Hi Shawn, Thanks for the information. Regards, Edwin On 14 October 2016 at 20:19, Shawn Heisey wrote: > On 10/13/2016 9:58 PM, Zheng Lin Edwin Yeo wrote: > > Thanks for the reply Shawn. Currently, my heap allocation to each Solr > > instance is 22GB. Is that big enough? > > I can't answer tha

Re: Slow indexing speed when index size is large?

2016-10-14 Thread Shawn Heisey
On 10/13/2016 9:58 PM, Zheng Lin Edwin Yeo wrote: > Thanks for the reply Shawn. Currently, my heap allocation to each Solr > instance is 22GB. Is that big enough? I can't answer that question. I know little about your install. Even if I *did* know a few more things about your install, I could o

Re: Slow indexing speed when index size is large?

2016-10-13 Thread Zheng Lin Edwin Yeo
Thanks for the reply Shawn. Currently, my heap allocation to each Solr instance is 22GB. Is that big enough? Regards, Edwin On 13 October 2016 at 23:56, Shawn Heisey wrote: > On 10/13/2016 9:20 AM, Zheng Lin Edwin Yeo wrote: > > Would like to find out, will the indexing speed in a c

Re: Slow indexing speed when index size is large?

2016-10-13 Thread Shawn Heisey
On 10/13/2016 9:20 AM, Zheng Lin Edwin Yeo wrote: > Would like to find out, will the indexing speed in a collection with a > very large index size be much slower than one which is still empty or > a very small index size? This is assuming that the configurations, > indexing code and

Slow indexing speed when index size is large?

2016-10-13 Thread Zheng Lin Edwin Yeo
Hi, Would like to find out, will the indexing speed in a collection with a very large index size be much slower than one which is still empty or a very small index size? This is assuming that the configurations, indexing code and the files to be indexed are the same. Currently, I have a setup in

Re: Does EML files with inline images affect the indexing speed

2016-05-03 Thread Zheng Lin Edwin Yeo
t. That's what > doing the parsing. > > Regards, > Alex > On 3 May 2016 7:53 pm, "Zheng Lin Edwin Yeo" wrote: > > > Hi, > > > > I would like to find out, if the presence of inline images in EML files > > will slow down the indexing speed sign

Re: Does EML files with inline images affect the indexing speed

2016-05-03 Thread Alexandre Rafalovitch
L files > will slow down the indexing speed significantly? > > Even though the content of the EML files are in Plain Text instead of HTML. > but I still found that the indexing performance is not up to expectation > yet. Average speed which I'm getting are around 0.3GB/hr. >

Does EML files with inline images affect the indexing speed

2016-05-03 Thread Zheng Lin Edwin Yeo
Hi, I would like to find out whether the presence of inline images in EML files will slow down the indexing speed significantly. Even though the content of the EML files is in plain text instead of HTML, I still found that the indexing performance is not up to expectation yet. Average speed

Re: Optimal indexing speed in Solr

2016-04-14 Thread John Bickerstaff
d over time, optimizing your collection(s) may help. > On Apr 14, 2016 3:52 AM, "Emir Arnautovic" > wrote: > >> Hi Edwin, >> Indexing speed depends on multiple factors: HW, Solr configurations and >> load, documents, indexing client: More complex documents, more CPU

Re: Optimal indexing speed in Solr

2016-04-14 Thread John Bickerstaff
If you delete a lot of documents over time, or if you add updated documents with the same ID over time, optimizing your collection(s) may help. On Apr 14, 2016 3:52 AM, "Emir Arnautovic" wrote: > Hi Edwin, > Indexing speed depends on multiple factors: HW, Solr configurations
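
Whether deletes and replaced documents have piled up can be judged from the numDocs and maxDoc figures Solr reports for a core: maxDoc still counts deleted and replaced documents until their segments are merged away. A small sketch; the 25% threshold is an arbitrary illustrative value, not a Solr default or a recommendation from the thread:

```python
def deleted_ratio(num_docs: int, max_doc: int) -> float:
    """Fraction of document slots held by deleted or replaced documents.
    max_doc includes deleted docs still in segments; num_docs counts live docs."""
    if max_doc == 0:
        return 0.0
    return (max_doc - num_docs) / max_doc


def worth_optimizing(num_docs: int, max_doc: int, threshold: float = 0.25) -> bool:
    # threshold is a hypothetical illustrative value, tune for your own index
    return deleted_ratio(num_docs, max_doc) >= threshold


print(deleted_ratio(750, 1000))  # 0.25
```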

Re: Optimal indexing speed in Solr

2016-04-14 Thread Emir Arnautovic
Hi Edwin, Indexing speed depends on multiple factors: HW, Solr configurations and load, documents, indexing client: The more complex the documents, the more CPU time is needed to process each document before the index structure is written to disk. The bigger the index, the more heap is used and the more frequent the GCs. Maybe

Optimal indexing speed in Solr

2016-04-13 Thread Zheng Lin Edwin Yeo
Hi, I would like to find out: what is the optimal indexing speed in Solr? Previously, I managed to get more than 3GB/hour, but now the speed has dropped to 0.7GB/hr. What could be the potential reason behind this? Besides the index size getting bigger, I have only added more collections into the

Re: Single-sharded SolrCloud vs Lucene indexing speed

2015-11-29 Thread Erick Erickson
t two cases. The raw overhead imposed by Solr is probably your third case. Yes, slowest replica determines indexing speed. To guarantee data isn't lost, the process is: > leader receives updates. > leader indexes locally _and_ forwards docs to follower > follower acks back to leade
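
Erick's explanation of why the slowest replica sets the pace can be written as a toy latency model. This is a simplification, not something Solr computes: it assumes the leader indexes locally while forwarding to every replica in parallel, and acknowledges the client only after all of them have finished. The timings are made-up illustrative values.

```python
from typing import List


def update_ack_time(leader_ms: float, replica_ms: List[float],
                    forward_ms: float = 0.0) -> float:
    """Time until the client receives its ack in the simplified model:
    leader indexing and (forwarding + replica indexing) run in parallel,
    and the ack waits for whichever finishes last."""
    slowest_replica = max((forward_ms + r for r in replica_ms), default=0.0)
    return max(leader_ms, slowest_replica)


# One fast and one slow replica: the slow one dominates.
print(update_ack_time(10, [12, 30], forward_ms=2))  # 32
```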

Single-sharded SolrCloud vs Lucene indexing speed

2015-11-28 Thread Zisis Tachtsidis
I'm conducting some indexing experiments in SolrCloud and I want to confirm my conclusions and ask for suggestions on how to improve performance. My setup includes a single-sharded collection with 1 additional replica in SolrCloud 5.3.1. I'm using SolrJ and the indexing speed refers to

Re: Slow Indexing speed for csv files, multi-threaded indexing

2013-11-07 Thread Erick Erickson
nd commit time are set > to very large numbers. > > I have tried indexing a test set of csv files which contains 1.44M records > (total size 21MB). All my tests have been on different types of Amazon ec2 > instances - e.g. m1.xlarge (4vCPU, 15GB RAM) and m3.2xlarge(8vCPU, 30GB

Slow Indexing speed for csv files, multi-threaded indexing

2013-11-04 Thread Vikram Srinivasan
) and m3.2xlarge (8vCPU, 30GB RAM). I have set my JVM heap size large enough and tuned GC parameters as seen on various forums. Observations: 1. My indexing time for 1.44M records (rows in the CSV file) is 240s on the m1.xlarge instance and 160s on the m3.2xlarge instance. 2. The indexing speed is
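
Vikram's timings convert directly into throughput figures. A small sketch of that arithmetic; `scaling_efficiency` compares the observed speedup with the increase in vCPUs (1.0 would be perfectly linear scaling):

```python
def docs_per_second(docs: int, seconds: float) -> float:
    return docs / seconds


def docs_per_hour_to_per_second(per_hour: float) -> float:
    return per_hour / 3600


def scaling_efficiency(base_rate: float, new_rate: float,
                       base_cores: int, new_cores: int) -> float:
    """Observed speedup divided by the increase in cores."""
    return (new_rate / base_rate) / (new_cores / base_cores)


small = docs_per_second(1_440_000, 240)  # m1.xlarge, 4 vCPU
large = docs_per_second(1_440_000, 160)  # m3.2xlarge, 8 vCPU
print(small, large, scaling_efficiency(small, large, 4, 8))  # 6000.0 9000.0 0.75
```

So doubling the vCPUs gave a 1.5x speedup here, which suggests something other than CPU count also limits throughput.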

Re: howto increase indexing speed?

2013-10-16 Thread Walter Underwood
to use more cores you need to use solrj. Or maybe > more than one DIH and more cores of course. > > Primoz > > > > From: Giovanni Bricconi > To: solr-user > Date: 16.10.2013 16:25 > Subject: howto increase indexing speed? > > > > I

Re: howto increase indexing speed?

2013-10-16 Thread primoz . skale
increase indexing speed? I have a small Solr setup, not even on a physical machine but a VMware virtual machine with a single CPU that reads data using DIH from a database. The machine has no physical disks attached but stores data on a NetApp NAS. Currently this machine indexes 320 documents/sec

howto increase indexing speed?

2013-10-16 Thread Giovanni Bricconi
I have a small Solr setup, not even on a physical machine but a VMware virtual machine with a single CPU that reads data using DIH from a database. The machine has no physical disks attached but stores data on a NetApp NAS. Currently this machine indexes 320 documents/sec, not bad but we plan to d

Re: Storing/indexing speed drops quickly

2013-09-23 Thread Per Steffensen
Now running the tests on a slightly reduced setup (2 machines, quad-core, 8GB RAM ...), but that doesn't matter. We see that storing/indexing speed drops when using IndexWriter.updateDocument in DirectUpdateHandler2.addDoc. But it does not drop when just using IndexWriter.addDocument (update

Re: Storing/indexing speed drops quickly

2013-09-16 Thread Toke Eskildsen
On Fri, 2013-09-13 at 17:32 +0200, Shawn Heisey wrote: > Put your OS and Solr itself on regular disks in RAID1 and your Solr data > on the SSD. Due to the eventual decay caused by writes, SSD will > eventually die, so be ready for SSD failures to take out shard replicas. One of the very useful

Re: Storing/indexing speed drops quickly

2013-09-13 Thread Shawn Heisey
On 9/13/2013 12:03 AM, Per Steffensen wrote: What is it that will fill my heap? I am trying to avoid the FieldCache. For now, I am actually not doing any searches - focus on indexing for now - and certainly not group/facet/sort searches that will use the FieldCache. I don't know what makes up t

Re: Storing/indexing speed drops quickly

2013-09-13 Thread Per Steffensen
On 9/12/13 4:26 PM, Shawn Heisey wrote: On 9/12/2013 2:14 AM, Per Steffensen wrote: Starting from an empty collection. Things are fine wrt storing/indexing speed for the first two-three hours (100M docs per hour), then speed goes down dramatically, to an, for us, unacceptable level (max 10M per

Re: Storing/indexing speed drops quickly

2013-09-12 Thread Shawn Heisey
On 9/12/2013 2:14 AM, Per Steffensen wrote: >> Starting from an empty collection. Things are fine wrt >> storing/indexing speed for the first two-three hours (100M docs per >> hour), then speed goes down dramatically, to an, for us, unacceptable >> level (max 10M per ho

Re: Storing/indexing speed drops quickly

2013-09-12 Thread Per Steffensen
* doccount.png: Measured number of doc in Solr collection Starting from an empty collection. Things are fine wrt storing/indexing speed for the first two-three hours (100M docs per hour), then speed goes down dramatically, to an, for us, unacceptable level (max 10M per hour). At the same time as speed goes

Re: Storing/indexing speed drops quickly

2013-09-12 Thread Per Steffensen
: Measured number of doc in Solr collection Starting from an empty collection. Things are fine wrt storing/indexing speed for the first two-three hours (100M docs per hour), then speed goes down dramatically, to an, for us, unacceptable level (max 10M per hour). At the same time as speed goes down, we

Storing/indexing speed drops quickly

2013-09-11 Thread Per Steffensen
images * iowait.png: Measured I/O wait on the Solr machines * doccount.png: Measured number of doc in Solr collection Starting from an empty collection. Things are fine wrt storing/indexing speed for the first two-three hours (100M docs per hour), then speed goes down dramatically, to an, for us

Re: Indexing-speed issues (chart included)

2011-06-21 Thread Mathias Hodler
Sorry, here are some details: requestHandler: XmlUpdateRequestHandler protocol: http (10 concurrent threads) document: 1kb size, 15 fields cpu load: 20% memory usage: 50% But generally speaking, is that normal or must something be wrong with my configuration, ... 2011/6/17 Erick Erickson >

Re: Indexing-speed issues (chart included)

2011-06-17 Thread Erick Erickson
No, generally this isn't what I'd expect. There will be periodic slowdowns when segments are flushed (I'm assuming you're not using trunk, there have been speedups here, see: http://blog.jteam.nl/2011/04/01/gimme-all-resources-you-have-i-can-use-them/) Does your config have any parameters set? Y

Re: Indexing-speed issues (chart included)

2011-06-17 Thread Mark Schoy
Sorry, here are some details: requestHandler: XmlUpdateRequestHandler protocol: http (10 concurrent threads) document: 1kb size, 15 fields cpu load: 20% memory usage: 50% But generally speaking, is that normal or must something be wrong with my configuration, ... 2011/6/17 Erick Erickson > W

Re: Indexing-speed issues (chart included)

2011-06-17 Thread Erick Erickson
Well, it's kinda hard to say anything pertinent with so little information. How are you indexing things? What kind of documents? How are you feeding docs to Solr? You might review: http://wiki.apache.org/solr/UsingMailingLists Best Erick On Fri, Jun 17, 2011 at 8:10 AM, Mark Schoy wrote: > Hi,

RE: Solr index - Size and indexing speed

2009-08-29 Thread Fuad Efendi
>I tried to merge the 15 indexes again, and I found out that the new merged >index (without optimization) size was about 351 GB, but when I optimized it >the size went back to 411 GB. Why? Just as a sample, IOT in Oracle... OK, in kids' language, what does 'optimization' mean? It means that Ma

Re: Solr index - Size and indexing speed

2009-08-29 Thread Yonik Seeley
On Tue, Aug 25, 2009 at 3:30 PM, engy.ali wrote: > >  Summary > === > > I had about 120,000 object of total size 71.2 GB, those objects are already > indexed using Lucene. The index size is about 111 GB. > > I tried to use solr 1.4 nightly build to index the same collection. I > divided

Re: Solr index - Size and indexing speed

2009-08-29 Thread Yonik Seeley
On Sat, Aug 29, 2009 at 7:09 AM, engy.ali wrote: > I thought that optimization would decrease or at least be equal to the same > index size before optimization Some index structures like norms are non-sparse. Index one unique field with norms and there is a byte allocated for every document in th
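
Yonik's point about non-sparse norms can be illustrated with a rough size model. The sketch assumes the norms layout of the Lucene versions discussed in this 2009 thread, where each field with norms costs one byte per document in a segment, and uses made-up segment sizes and field names. It shows why merging segments that carry different fields can grow the norms even though no documents are added:

```python
from typing import List, Set, Tuple

# (maxDoc, names of fields with norms present in that segment); illustrative only
Segment = Tuple[int, Set[str]]


def norms_bytes(segments: List[Segment]) -> int:
    """Norms size before merging: each segment pays only for its own fields."""
    return sum(max_doc * len(fields) for max_doc, fields in segments)


def merged_norms_bytes(segments: List[Segment]) -> int:
    """Norms size after merging into one segment: every document pays one
    byte for every field that appears anywhere in the merged segment."""
    total_docs = sum(max_doc for max_doc, _ in segments)
    all_fields: Set[str] = set().union(*(fields for _, fields in segments))
    return total_docs * len(all_fields)


segs = [(1000, {"a", "x"}), (1000, {"a", "y"})]
print(norms_bytes(segs), merged_norms_bytes(segs))  # 4000 6000
```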

RE: Solr index - Size and indexing speed

2009-08-29 Thread engy.ali
00; do it once at the end... > > > > -Original Message- > From: engy.ali [mailto:omeshm...@hotmail.com] > Sent: August-25-09 3:31 PM > To: solr-user@lucene.apache.org > Subject: Solr index - Size and indexing speed > > > Summary > === > &

RE: Solr index - Size and indexing speed

2009-08-25 Thread Fuad Efendi
uninverted index) (Yonik), term vectors, stored=true, copyField, etc. Do not do commit per 100; do it once at the end... -Original Message- From: engy.ali [mailto:omeshm...@hotmail.com] Sent: August-25-09 3:31 PM To: solr-user@lucene.apache.org Subject: Solr index - Size and indexing
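The "do not commit per 100; do it once at the end" advice can be sketched as follows. This is a minimal Python illustration, assuming documents arrive as dicts and are sent as Solr XML update messages (`<add>` and `<commit/>`); the helper names `add_message` and `plan_update_requests` are hypothetical, and actually POSTing each body to the update handler is left out.

```python
from xml.sax.saxutils import escape, quoteattr


def add_message(docs):
    """Build one Solr XML <add> message for a batch of dicts."""
    parts = ["<add>"]
    for doc in docs:
        parts.append("<doc>")
        for name, value in doc.items():
            # Escape both the attribute and the text content.
            parts.append(f"<field name={quoteattr(name)}>{escape(str(value))}</field>")
        parts.append("</doc>")
    parts.append("</add>")
    return "".join(parts)


def plan_update_requests(docs, batch_size=100):
    """Send adds in batches, but issue a single <commit/> after the last batch."""
    requests = [
        add_message(docs[i:i + batch_size]) for i in range(0, len(docs), batch_size)
    ]
    requests.append("<commit/>")
    return requests


docs = [{"id": i, "title": f"doc {i}"} for i in range(250)]
reqs = plan_update_requests(docs, batch_size=100)
print(len(reqs), reqs.count("<commit/>"))  # 4 1
```

Batching the adds keeps the number of HTTP round trips down, while the single trailing commit avoids paying the commit cost once per batch.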

Solr index - Size and indexing speed

2009-08-25 Thread engy.ali
the size and the result was that the new index is about twice the size of the old index. Do you have any idea what the reason might be? 2. The indexing speed is slow: 100 objects on a single Solr instance were indexed in 1 hour, so I estimated that 1000 on a single instance could be done in 10 hours, but that was

Re: mergeFactor / indexing speed

2009-08-09 Thread Avlesh Singh
> > And - indexing 160k documents now takes 5min instead of 1.5h! > Awesome! It works for all! (Now I can go relaxed on vacation. :-D ) > Take me along! Cheers Avlesh On Fri, Aug 7, 2009 at 3:58 PM, Chantal Ackermann < chantal.ackerm...@btelligent.de> wrote: > Hooray, great news, guys. I merged m

Re: mergeFactor / indexing speed

2009-08-07 Thread Chantal Ackermann
Thanks for the tip, Shalin. I'm happy with 6 indexes running in parallel and completing in less than 10min right now, but I'll have a look anyway. Shalin Shekhar Mangar wrote: On Fri, Aug 7, 2009 at 3:58 PM, Chantal Ackermann < chantal.ackerm...@btelligent.de> wrote: Hooray, great news, guys.

Re: mergeFactor / indexing speed

2009-08-07 Thread Shalin Shekhar Mangar
On Fri, Aug 7, 2009 at 3:58 PM, Chantal Ackermann < chantal.ackerm...@btelligent.de> wrote: > Hooray, great news, guys. I merged my child entity into the root entity, and > changed the custom entityprocessor to handle the additional columns > correctly. > And - indexing 160k documents now takes 5min

Re: mergeFactor / indexing speed

2009-08-07 Thread Chantal Ackermann
Hooray, great news, guys. I merged my child entity into the root entity, and changed the custom entityprocessor to handle the additional columns correctly. And - indexing 160k documents now takes 5min instead of 1.5h! (Now I can go relaxed on vacation. :-D ) Conclusion: In my case performance w

Re: mergeFactor / indexing speed

2009-08-06 Thread Avlesh Singh
> > does DIH call commit periodically, or are things done in one big batch? > AFAIK, one big batch. Cheers Avlesh On Thu, Aug 6, 2009 at 11:23 PM, Yonik Seeley wrote: > On Mon, Aug 3, 2009 at 12:32 PM, Chantal > Ackermann wrote: > > avg-cpu: %user %nice %sys %iowait %idle > > 1

Re: mergeFactor / indexing speed

2009-08-06 Thread Yonik Seeley
On Mon, Aug 3, 2009 at 12:32 PM, Chantal Ackermann wrote: > avg-cpu:  %user   %nice    %sys %iowait   %idle >           1.23    0.00    0.03    0.03   98.71 > > Basically, it is doing very little? *scratch* How often is commit being called? (a Lucene commit syncs all of the index files so a cra

Re: mergeFactor / indexing speed

2009-08-06 Thread Avlesh Singh
> > Do you think it's possible to return (in the nested entity) rows > independent of the unique id, and let the processor decide when a document > is complete? > I don't think so. In my case, I had 9 (JDBC) entities for each document. Most of these entities returned a single column and limited nu

Re: mergeFactor / indexing speed

2009-08-06 Thread Chantal Ackermann
Hi all, to keep this thread up to date... ;-) d) JDBC batch size changed to 10 (was default: 500, then 1000). The problem with my DIH setup is that the root entity query returns a huge set (all IDs that shall be indexed). A larger fetchsize would be good for that query. The nested entity, ho

Re: mergeFactor / indexing speed

2009-08-03 Thread Chantal Ackermann
Hi Avlesh, hi Otis, hi Grant, hi all, (enumerating to keep track of all the input) a) mergeFactor 1000 is too high; I'll change that back to 10. I thought it would make Lucene use more RAM before starting IO. b) ramBufferSize: OK, or maybe more. I'll keep that in mind. c) solrconfig.xml - defau
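The difference between the two settings can be seen in an idealized model of log-style merging. This is a simplification, not Lucene's actual merge policy: it assumes every flush produces an equal-sized segment, ignores size limits, and the function name is hypothetical. In this model ramBufferSize only decides how much is buffered in RAM before a segment is flushed, while mergeFactor decides how many same-level segments pile up before they are merged into one.

```python
def idealized_segment_count(flushes: int, merge_factor: int) -> int:
    """Segments left after `flushes` equal-sized flushes, assuming that
    whenever `merge_factor` segments of the same level exist they are
    merged into one segment of the next level (like a base-N counter)."""
    levels = []  # levels[k] = number of segments currently at level k
    for _ in range(flushes):
        level = 0
        while True:
            if len(levels) <= level:
                levels.append(0)
            levels[level] += 1
            if levels[level] < merge_factor:
                break
            levels[level] = 0  # merge the full level...
            level += 1         # ...into one segment one level up
    return sum(levels)


print(idealized_segment_count(1234, 10), idealized_segment_count(1234, 1000))  # 10 235
```

Under this model a mergeFactor of 1000 defers nearly all merging, which saves merge I/O during indexing but leaves hundreds of segments behind (235 versus 10 after 1234 flushes here); it does not change how much RAM is used before a flush.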

Re: mergeFactor / indexing speed

2009-08-03 Thread Grant Ingersoll
How big are your documents? I haven't benchmarked DIH, so I am not sure what to expect, but it does seem like something isn't right. Can you fully describe how you are indexing? Have you done any profiling? On Aug 3, 2009, at 12:32 PM, Chantal Ackermann wrote: Hi all, I'm still struggli

Re: mergeFactor / indexing speed

2009-08-03 Thread Otis Gospodnetic
, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR - Original Message > From: Chantal Ackermann > To: "solr-user@lucene.apache.org" > Sent: Monday, August 3, 2009 12:32:12 PM > Subject: Re: mergeFactor / indexing speed > > Hi all, > > I'm still st

Re: mergeFactor / indexing speed

2009-08-03 Thread Avlesh Singh
> > avg-cpu: %user %nice %sys %iowait %idle > 1.23 0.00 0.03 0.03 98.71 > I agree, real bad statistics, actually. Currently, I've set mergeFactor to 1000 and ramBufferSize to 256MB. > To me the former appears to be too high and the latter too low (for your machine configur
