Hi David,
Yes, I do have this field "_root_" in the schema.
However, I don't think I have used the field, and there is no difference in
the indexing speed after I removed it.
Regards,
Edwin
On Wed, 3 Apr 2019 at 22:57, David Smiley wrote:
> Hi Edwin,
>
> I&
t who knows.
~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley
On Tue, Apr 2, 2019 at 10:17 PM Zheng Lin Edwin Yeo
wrote:
> Hi,
>
> I am setting up the latest Solr 8.0.0, and I am re-indexing the data from
> scratch in Solr 8.0.0
>
>
What/where is this benchmark? I recall Ishan was once working with a
volunteer to set up something like Lucene has, but sadly it was not
successful.
On Wed, Apr 3, 2019 at 6:04 AM Đạt Cao Mạnh wrote:
> Hi guys,
>
> I'm seeing the same problems with Shalin's nightly indexing benchmark. This
> happen
On Wed, 2019-04-03 at 18:04 +0800, Zheng Lin Edwin Yeo wrote:
> I have tried setting all the docValues in my schema.xml to false and
> running the indexing again.
> There isn't any difference in the indexing speed compared to when
> docValues were enabled.
Thank you for sp
Hi Toke,
I have tried setting all the docValues in my schema.xml to false and running
the indexing again.
There isn't any difference in the indexing speed compared to when docValues
were enabled.
Seems like the cause of the regression might be somewhere else?
Regards,
Edwin
On Wed,
Hi guys,
I'm seeing the same problems with Shalin's nightly indexing benchmark. This
happened around this period:
git log --before=2018-12-07 --after=2018-11-21
On Wed, Apr 3, 2019 at 8:45 AM Toke Eskildsen wrote:
> On Wed, 2019-04-03 at 15:24 +0800, Zheng Lin Edwin Yeo wrote:
> > Yes, I am using Do
On Wed, 2019-04-03 at 15:24 +0800, Zheng Lin Edwin Yeo wrote:
> Yes, I am using DocValues for most of my fields.
So that's a culprit. Thank you.
> Currently we can't share the test data yet as some of the records are
> sensitive. Do you have any data from CSV file that you can test?
Not really.
Yes, I am using DocValues for most of my fields.
I am using dynamicField, in which I have appended the field names with
suffixes like _s, _i, etc. in the CSV file.
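For reference, suffix-based dynamic fields like these are declared with dynamicField rules in schema.xml; a minimal sketch, assuming stock Solr field types (the exact names and docValues settings in Edwin's schema are not known):

```xml
<!-- Hypothetical dynamicField rules for the _s / _i suffix convention -->
<dynamicField name="*_s" type="string" indexed="true" stored="true" docValues="true"/>
<dynamicField name="*_i" type="pint"   indexed="true" stored="true" docValues="true"/>
```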
Currently we can't share the test data yet as some of the recor
On Wed, 2019-04-03 at 10:17 +0800, Zheng Lin Edwin Yeo wrote:
> What could be the reason that causes the indexing to be slower in
> Solr 8.0.0?
As Aroop states, there can be multiple explanations. One of them is the
change to how DocValues are handled in 8.0.0. The indexing impact
should be tiny, b
code from the SimplePostTools.
I have already tried it more than 10 times, and every time the indexing
speed in 8.0 was at least 40% slower than in 7.7.1.
Regards,
Edwin
On Wed, 3 Apr 2019 at 11:19, Aroop Ganguly wrote:
> Indexing speeds are function of a lot of va
>> Hi,
>>
>> I am setting up the latest Solr 8.0.0, and I am re-indexing the data from
>> scratch in Solr 8.0.0
>>
>> However, I found that the indexing speed is slower in Solr 8.0.0, as
>> compared to the earlier version like Solr 7.7.1. I have not
atch in Solr 8.0.0
>
> However, I found that the indexing speed is slower in Solr 8.0.0, as
> compared to the earlier version like Solr 7.7.1. I have not changed the
> schema.xml and solrconfig.xml yet, just did a change of the
> luceneMatchVersion in solrconfig.xml to 8.0.0
>
Hi,
I am setting up the latest Solr 8.0.0, and I am re-indexing the data from
scratch in Solr 8.0.0
However, I found that the indexing speed is slower in Solr 8.0.0, as
compared to the earlier version like Solr 7.7.1. I have not changed the
schema.xml and solrconfig.xml yet, just did a change of
On 1/1/2019 8:59 AM, John Milton wrote:
My document contains 65 fields. All the fields need to be indexed, but
indexing 100 documents takes 10 seconds.
I am using Solr 7.5 (2 cloud instances), with 50 shards.
The best way to achieve fast indexing in Solr is to index multiple items
Java heap space 15 GB.
How to improve indexing speed?
Note :
All the fields contain a maximum of 20 characters only. The field type is
text_general with case-insensitive analysis.
Thanks,
John Milton
> It's running on Windows OS and it has 32 GB RAM. Java heap space 15 GB.
> How to improve indexing speed?
> Note :
> All the fields contain a maximum of 20 characters only. The field type is
> text_general with case-insensitive analysis.
>
> Thanks,
> John Milton
Hi to all,
My document contains 65 fields. All the fields need to be indexed, but
indexing 100 documents takes 10 seconds.
I am using Solr 7.5 (2 cloud instances), with 50 shards.
It's running on Windows OS and it has 32 GB RAM. Java heap space is 15 GB.
How to improve indexing
Thank you for your advice on GC tools. Which one do you suggest for me?
2018-02-28 23:57 GMT+08:00 Shawn Heisey :
> On 2/28/2018 2:53 AM, 苗海泉 wrote:
>
>> Thanks for your detailed advice. The monitoring product you mentioned is
>> good, but our Solr system is running on a private network and seems
On 2/28/2018 2:53 AM, 苗海泉 wrote:
Thanks for your detailed advice. The monitoring product you mentioned is
good, but our Solr system runs on a private network, so it seems unusable
for us; there is no single downloadable application for analyzing specific
GC logs.
For analyzing GC logs
If you are after only visualising GC, there are several tools that you can
download or upload logs to visualise. If you would like to monitor all
host/solr/jvm, Sematext’s SPM also comes in on-premises version, where you
install and host your own monitoring infrastructure:
https://sematext.com
Thanks for your detailed advice. The monitoring product you mentioned is
good, but our Solr system runs on a private network, so it seems unusable
for us; there is no single downloadable application for analyzing specific
GC logs.
2018-02-28 16:57 GMT+08:00 Emir Arnautović :
> Hi,
> I w
Hi,
I would start with following:
1. have dedicated nodes for ZK ensemble - those do not have to be powerful
nodes (maybe 2-4 cores and 8GB RAM)
2. reduce heap size to value below margin where JVM can use compressed oops -
31GB should be safe size
3. shard collection to all nodes
4. increase roll
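Point 2 can be applied through the heap settings in solr.in.sh; a sketch, under the assumption that the standard Solr startup scripts are used:

```shell
# solr.in.sh -- keep the heap at or below ~31GB so the JVM can still use
# compressed oops (object pointers); the exact threshold depends on the JVM
SOLR_JAVA_MEM="-Xms31g -Xmx31g"
```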
Thank you. Regarding the memory footprint: I set the threshold at 75%, and
memory occupancy is at about 76%. Also, our ZooKeeper is not on a dedicated
server; perhaps this is the cause of the instability.
What else do you recommend for me to check?
2018-02-27 22:37 GMT+08:00 Emir Arnautović :
> This does not s
This does not show much: only that your heap is around 75% (24-25GB). I was
thinking that you should compare metrics (heap/GC as well) when running on
without issues and when running with issues and see if something can be
concluded.
About instability: Do you run ZK on dedicated nodes?
Emir
--
Thank you. We originally had 49 shards on 49 nodes, but later found that in
this setup Solr often disconnected from ZooKeeper; too many nodes made
things unstable, so we reduced to 25. As a follow-up, performance could not
keep up, so we need to increase it back.
Very slow when solr and zookeeper not found an
Ah, so there are ~560 shards per node and not all nodes are indexing at the
same time. Why is that? You can have better throughput if indexing on all
nodes. If you are happy with the shard size, you can create a new collection
with 49 shards every 2 hours, keep everything else the same, and index on
all nodes.
Back
Thanks for your reply again.
As I just said, there may be some misunderstanding: we have 49 Solr nodes,
each collection has 25 shards, and each shard has only one replica of the
data; there are no extra copies. I have also reduced part of the cache. If
you need the metric data, I can check it and tell you, i
Hi,
It is hard to tell without looking more into your metrics. It seems to me
that you are reaching the limits of your cluster. I would double-check
whether memory is the issue. If I got it right, you have ~1120 shards per
node. It takes some heap just to keep them open. If you have some caches
enabled an
In addition, we found that the rate was normal when the number of
collections was kept below 936, and the speed became slower and slower at
984. Therefore, we could only temporarily delete the older collections, but
now we need more collections online. There has been no good way out, and
this has confused us for a long
Thank you for the reply.
One collection has 25 shards with one replica each, and one Solr node holds
about 5 TB on disk.
GC has been checked and modified as follows:
SOLR_JAVA_MEM="-Xms32768m -Xmx32768m "
GC_TUNE=" \
-XX:+UseG1GC \
-XX:+PerfDisableSharedMem \
-XX:+ParallelRefProcEnabled \
-XX:G1HeapRegionSize=8m \
-XX:MaxGCPaus
Hi,
To get a more complete picture, can you tell us how many shards/replicas you
have per collection? Also, what is the index size on disk? Did you check GC?
BTW, using a 32GB heap prevents you from using compressed oops, resulting in
less memory available than with a 31GB heap.
Thanks,
Emir
--
Monitoring - Log
I encountered a rather serious problem in the process of using Solr. We use
Solr version 6.0; our daily data volume is about 500 billion documents; we
create a collection every hour; there are more than a thousand collections
online, across 49 Solr nodes. If there are fewer than 800 collections, the
speed is
The searching speed is quite fast currently, even during indexing. It is the
indexing speed that is slow.
Regards,
Edwin
On 7 May 2017 at 21:14, Shawn Heisey wrote:
> On 5/6/2017 6:49 PM, Zheng Lin Edwin Yeo wrote:
> > For my rich documentation handling, I'm using Extracting Reques
On 5/6/2017 6:49 PM, Zheng Lin Edwin Yeo wrote:
> For my rich document handling, I'm using the Extracting Request Handler, and
> it requires OCR.
>
> However, currently, for the slow indexing speed which I'm experiencing, the
> indexing is done directly from the Syba
Hi Shawn,
For my rich document handling, I'm using the Extracting Request Handler,
and it requires OCR.
However, currently, for the slow indexing speed which I'm experiencing, the
indexing is done directly from the Sybase database. I will fetch about 1000
records at a time from S
On 5/1/2017 10:17 AM, Zheng Lin Edwin Yeo wrote:
> I'm using SolrJ for the indexing, not curl. Normally I bundle
> about 1000 documents for each POST. There's more than 300GB of RAM on
> that server, and I do not use any sharding at the moment.
Looking over your email history on the list, I
10:39:29 PM EDT, Zheng Lin Edwin Yeo <
> edwinye...@gmail.com> wrote:
> >Hi,
> >
> >I'm using Solr 6.4.2.
> >
> >Would like to check, if there are a lot of collections in my Solr which
> >have
> >very large index sizes, will the indexing speed be affected
Edwin Yeo
wrote:
>Hi,
>
>I'm using Solr 6.4.2.
>
>Would like to check, if there are a lot of collections in my Solr which
>have
>very large index sizes, will the indexing speed be affected?
>
>Currently, I have created a new collections in Solr which has several
&g
Hi,
I'm using Solr 6.4.2.
Would like to check: if there are a lot of collections in my Solr which have
very large index sizes, will the indexing speed be affected?
Currently, I have created a new collection in a Solr setup which has several
collections with very large index sizes, and the indexing
This is my comparison of the indexing speed with and without Tesseract OCR.
The smaller file is taking longer to index, probably because there is more
text to OCR, compared to the bigger file, which has less text.
Is that usually the case?
*With Tesseract OCR*
174KB - 5.20 sec
Yes, that would seem an accurate assessment of the problem.
-Original Message-
From: Zheng Lin Edwin Yeo [mailto:edwinye...@gmail.com]
Sent: Thursday, 30 March 2017 4:53 p.m.
To: solr-user@lucene.apache.org
Subject: Re: Indexing speed reduced significantly with OCR
Thanks for your reply
om]
> Sent: Thursday, March 30, 2017 7:37 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Indexing speed reduced significantly with OCR
>
> The workflow is
> -/ OCR new documents
> -/ check quality and tune until you get good output text -/ keep the output
> text in the
> Note that the OCRing is a separate task from Solr indexing, and is best done
> on separate machines.
+1
-Original Message-
From: Rick Leir [mailto:rl...@leirtech.com]
Sent: Thursday, March 30, 2017 7:37 AM
To: solr-user@lucene.apache.org
Subject: Re: Indexing speed r
The workflow is
-/ OCR new documents
-/ check quality and tune until you get good output text
-/ keep the output text in the file system
-/ index and re-index to Solr as necessary from the file system
Note that the OCRing is a separate task from Solr indexing, and is best done on
separate mach
Thanks for your reply.
From what I see, getting more hardware to do the OCR is inevitable?
Even if we run the OCR outside of Solr indexing stream, it will still take
a long time to process it if it is on just one machine. And we still need
to wait for the OCR to finish converting before we can r
Well, I haven't had to deal with a problem that size, but it seems to me that
you have little alternative except to throw more computer hardware at it. For
the job I did, I OCRed to convert PDF to searchable PDF outside the indexing
workflow. I used the pdftotext utility to extract text from PDF. If t
28, 2017, at 2:52 AM, Zheng Lin Edwin Yeo wrote:
>
> Hi,
>
> Do you have suggestions that we can do to cope with the expensive process
> of indexing documents which requires OCR.
>
> For my current situation, the indexing takes about 2 weeks to complete. If
> the avera
Hi,
Do you have suggestions that we can do to cope with the expensive process
of indexing documents which requires OCR.
For my current situation, the indexing takes about 2 weeks to complete. If
the average indexing speed is say to be 50 times slower, it means it will
require 100 weeks to index
with OCR can be 100 times slower than indexing a PDF
> that is searchable (text extractable without OCR).
>
> -Original Message-
> From: Zheng Lin Edwin Yeo [mailto:edwinye...@gmail.com]
> Sent: Tuesday, 28 March 2017 4:13 p.m.
> To: solr-user@lucene.apache.org
> S
: Zheng Lin Edwin Yeo [mailto:edwinye...@gmail.com]
Sent: Tuesday, 28 March 2017 4:13 p.m.
To: solr-user@lucene.apache.org
Subject: Indexing speed reduced significantly with OCR
Hi,
Does the indexing speed of Solr reduce significantly when we are using
Tesseract OCR to extract scanned inline
Hi,
Does the indexing speed of Solr reduce significantly when we are using
Tesseract OCR to extract scanned inline images from PDF?
I found that after I implemented the solution to extract those scanned images
from PDF, the indexing speed is now more than 10 times slower.
I'm
Hi Shawn,
Thanks for the information.
Regards,
Edwin
On 14 October 2016 at 20:19, Shawn Heisey wrote:
> On 10/13/2016 9:58 PM, Zheng Lin Edwin Yeo wrote:
> > Thanks for the reply Shawn. Currently, my heap allocation to each Solr
> > instance is 22GB. Is that big enough?
>
> I can't answer tha
On 10/13/2016 9:58 PM, Zheng Lin Edwin Yeo wrote:
> Thanks for the reply Shawn. Currently, my heap allocation to each Solr
> instance is 22GB. Is that big enough?
I can't answer that question. I know little about your install. Even
if I *did* know a few more things about your install, I could o
Thanks for the reply Shawn.
Currently, my heap allocation to each Solr instance is 22GB.
Is that big enough?
Regards,
Edwin
On 13 October 2016 at 23:56, Shawn Heisey wrote:
> On 10/13/2016 9:20 AM, Zheng Lin Edwin Yeo wrote:
> > Would like to find out, will the indexing speed in a c
On 10/13/2016 9:20 AM, Zheng Lin Edwin Yeo wrote:
> Would like to find out, will the indexing speed in a collection with a
> very large index size be much slower than one which is still empty or
> a very small index size? This is assuming that the configurations,
> indexing code and
Hi,
Would like to find out: will the indexing speed in a collection with a very
large index size be much slower than in one which is still empty or has a
very small index size? This is assuming that the configurations, indexing
code and the files to be indexed are the same.
Currently, I have a setup in
t. That's what
> doing the parsing.
>
> Regards,
> Alex
> On 3 May 2016 7:53 pm, "Zheng Lin Edwin Yeo" wrote:
>
> > Hi,
> >
> > I would like to find out, if the presence of inline images in EML files
> > will slow down the indexing speed sign
L files
> will slow down the indexing speed significantly?
>
> Even though the content of the EML files is in plain text instead of HTML,
> I still found that the indexing performance is not up to expectations
> yet. The average speed I'm getting is around 0.3GB/hr.
>
&
Hi,
I would like to find out if the presence of inline images in EML files
will slow down the indexing speed significantly.
Even though the content of the EML files is in plain text instead of HTML, I
still found that the indexing performance is not up to expectations yet.
Average speed
d over time, optimizing your collection(s) may help.
> On Apr 14, 2016 3:52 AM, "Emir Arnautovic"
> wrote:
>
>> Hi Edwin,
>> Indexing speed depends on multiple factors: HW, Solr configurations and
>> load, documents, indexing client: More complex documents, more CPU
If you delete a lot of documents over time, or if you add updated documents
with the same ID over time, optimizing your collection(s) may help.
On Apr 14, 2016 3:52 AM, "Emir Arnautovic"
wrote:
> Hi Edwin,
> Indexing speed depends on multiple factors: HW, Solr configurations
Hi Edwin,
Indexing speed depends on multiple factors: HW, Solr configuration and
load, documents, and the indexing client. More complex documents take more
CPU time to process before the index structure is written down to disk. The
bigger the index, the more heap is used and the more frequent GCs become.
Maybe
Hi,
Would like to find out, what is the optimal indexing speed in Solr?
Previously, I managed to get more than 3GB/hour, but now the speed has
dropped to 0.7GB/hr. What could be the potential reason behind this?
Besides the index size getting bigger, I have only added in more
collections into the
t two cases. The raw
overhead imposed by Solr is probably your third case.
Yes, the slowest replica determines indexing speed. To guarantee data
isn't lost, the process is:
> leader receives updates.
> leader indexes locally _and_ forwards docs to follower
> follower acks back to leade
I'm conducting some indexing experiments in SolrCloud and I want to confirm
my conclusions and ask for suggestions on how to improve performance.
My setup includes a single-sharded collection with 1 additional replica in
SolrCloud 5.3.1. I'm using SolrJ and the indexing speed refers to
nd commit time are set
> to very large numbers.
>
> I have tried indexing a test set of csv files which contains 1.44M records
> (total size 21MB). All my tests have been on different types of Amazon ec2
> instances - e.g. m1.xlarge (4vCPU, 15GB RAM) and m3.2xlarge(8vCPU, 30GB
) and m3.2xlarge(8vCPU, 30GB
RAM).
I have set my jvm heap size large enough and tuned gc parameters as seen on
various forums.
Observations:
1. My indexing speed for 1.44M records (or row in CSV file) is 240s on the
m1.xlarge instance and 160s on the m3.2xlarge instance.
2. The indexing speed is
to use more cores you need to use solrj. Or maybe
> more than one DIH and more cores of course.
>
> Primoz
>
>
>
> From: Giovanni Bricconi
> To: solr-user
> Date: 16.10.2013 16:25
> Subject: howto increase indexing speed?
>
>
>
> I
increase indexing speed?
I have a small Solr setup, not even on a physical machine but a VMware
virtual machine with a single CPU that reads data using DIH from a
database. The machine has no physical disks attached but stores data on a
NetApp NAS.
Currently this machine indexes 320 documents/sec
I have a small Solr setup, not even on a physical machine but a VMware
virtual machine with a single CPU that reads data using DIH from a
database. The machine has no physical disks attached but stores data on a
NetApp NAS.
Currently this machine indexes 320 documents/sec, not bad but we plan to
d
Now running the tests on a slightly reduced setup (2 machines, quad-core,
8GB RAM ...), but that doesn't matter.
We see that storing/indexing speed drops when using
IndexWriter.updateDocument in DirectUpdateHandler2.addDoc, but it does
not drop when just using IndexWriter.addDocument (update
On Fri, 2013-09-13 at 17:32 +0200, Shawn Heisey wrote:
> Put your OS and Solr itself on regular disks in RAID1 and your Solr data
> on the SSD. Due to the eventual decay caused by writes, SSD will
> eventually die, so be ready for SSD failures to take out shard replicas.
One of the very useful
On 9/13/2013 12:03 AM, Per Steffensen wrote:
What is it that will fill my heap? I am trying to avoid the FieldCache.
For now, I am actually not doing any searches - focus on indexing for
now - and certainly not group/facet/sort searches that will use the
FieldCache.
I don't know what makes up t
On 9/12/13 4:26 PM, Shawn Heisey wrote:
On 9/12/2013 2:14 AM, Per Steffensen wrote:
Starting from an empty collection. Things are fine wrt
storing/indexing speed for the first two-three hours (100M docs per
hour), then speed goes down dramatically, to an, for us, unacceptable
level (max 10M per
On 9/12/2013 2:14 AM, Per Steffensen wrote:
>> Starting from an empty collection. Things are fine wrt
>> storing/indexing speed for the first two-three hours (100M docs per
>> hour), then speed goes down dramatically, to an, for us, unacceptable
>> level (max 10M per ho
* doccount.png: Measured number of doc in Solr collection
Starting from an empty collection. Things are fine wrt
storing/indexing speed for the first two-three hours (100M docs per
hour), then speed goes down dramatically, to an, for us, unacceptable
level (max 10M per hour). At the same time as speed goes
: Measured number of doc in Solr collection
Starting from an empty collection. Things are fine wrt
storing/indexing speed for the first two-three hours (100M docs per
hour), then speed goes down dramatically, to an, for us, unacceptable
level (max 10M per hour). At the same time as speed goes down, we
images
* iowait.png: Measured I/O wait on the Solr machines
* doccount.png: Measured number of doc in Solr collection
Starting from an empty collection. Things are fine wrt storing/indexing
speed for the first two-three hours (100M docs per hour), then speed
goes down dramatically, to an, for us
Sorry, here are some details:
requestHandler: XmlUpdateRequestHandler
protocol: http (10 concurrent threads)
document: 1kb size, 15 fields
cpu load: 20%
memory usage: 50%
But generally speaking, is that normal, or must there be something wrong
with my configuration, ...
2011/6/17 Erick Erickson
>
No, generally this isn't what I'd expect. There will be periodic
slowdowns when segments are flushed (I'm assuming
you're not using trunk, there have been speedups here, see:
http://blog.jteam.nl/2011/04/01/gimme-all-resources-you-have-i-can-use-them/)
Does your config have any parameters set? Y
Sorry, here are some details:
requestHandler: XmlUpdateRequestHandler
protocol: http (10 concurrent threads)
document: 1kb size, 15 fields
cpu load: 20%
memory usage: 50%
But generally speaking, is that normal, or must there be something wrong
with my configuration, ...
2011/6/17 Erick Erickson
> W
Well, it's kinda hard to say anything pertinent with so little
information. How are you indexing things? What kind of documents?
How are you feeding docs to Solr?
You might review:
http://wiki.apache.org/solr/UsingMailingLists
Best
Erick
On Fri, Jun 17, 2011 at 8:10 AM, Mark Schoy wrote:
> Hi,
>I tried to merge the 15 indexes again, and I found out that the new merged
>index (without optimization) size was about 351 GB, but when I optimize it
>the size goes back up to 411 GB. Why?
Just as a sample, IOT in Oracle...
OK, just in kids' language, what does 'optimization' mean? It means that Ma
On Tue, Aug 25, 2009 at 3:30 PM, engy.ali wrote:
>
> Summary
> ===
>
> I had about 120,000 object of total size 71.2 GB, those objects are already
> indexed using Lucene. The index size is about 111 GB.
>
> I tried to use solr 1.4 nightly build to index the same collection. I
> divided
On Sat, Aug 29, 2009 at 7:09 AM, engy.ali wrote:
> I thought that optimization would decrease the index size, or at least
> keep it equal to the size before optimization
Some index structures like norms are non-sparse. Index one unique
field with norms and there is a byte allocated for every document in
th
00; do it once at the end...
>
>
>
> -Original Message-
> From: engy.ali [mailto:omeshm...@hotmail.com]
> Sent: August-25-09 3:31 PM
> To: solr-user@lucene.apache.org
> Subject: Solr index - Size and indexing speed
>
>
> Summary
> ===
>
&
uninverted index) (Yonik), term
vectors, stored=true, copyField, etc.
Do not do a commit per 100 documents; do it once at the end...
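Instead of committing from the client every 100 documents, the commit policy can also be left to autoCommit in solrconfig.xml; a sketch with illustrative values (not from the original poster's config):

```xml
<!-- Hypothetical autoCommit: flush to disk every 60s, without opening
     a new searcher on each flush -->
<autoCommit>
  <maxTime>60000</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>
```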
-Original Message-
From: engy.ali [mailto:omeshm...@hotmail.com]
Sent: August-25-09 3:31 PM
To: solr-user@lucene.apache.org
Subject: Solr index - Size and indexing
the size, and the result was that the new index is about twice the size of
the old index.
Do you have any idea what might be the reason?
2. The indexing speed is slow: 100 objects on a single Solr instance were
indexed in 1 hour, so I estimated that 1000 on a single instance can be done
in 10 hours, but that was
>
> And - indexing 160k documents now takes 5min instead of 1.5h!
>
Awesome! It works for all!
(Now I can go relaxed on vacation. :-D )
>
Take me along!
Cheers
Avlesh
On Fri, Aug 7, 2009 at 3:58 PM, Chantal Ackermann <
chantal.ackerm...@btelligent.de> wrote:
> Juhu, great news, guys. I merged m
Thanks for the tip, Shalin. I'm happy with 6 indexes running in parallel
and completing in less than 10 min right now, but I'll have a look anyway.
Shalin Shekhar Mangar schrieb:
On Fri, Aug 7, 2009 at 3:58 PM, Chantal Ackermann <
chantal.ackerm...@btelligent.de> wrote:
Juhu, great news, guys.
On Fri, Aug 7, 2009 at 3:58 PM, Chantal Ackermann <
chantal.ackerm...@btelligent.de> wrote:
> Juhu, great news, guys. I merged my child entity into the root entity, and
> changed the custom entityprocessor to handle the additional columns
> correctly.
> And - indexing 160k documents now takes 5min
Juhu, great news, guys. I merged my child entity into the root entity,
and changed the custom entityprocessor to handle the additional columns
correctly.
And - indexing 160k documents now takes 5min instead of 1.5h!
(Now I can go relaxed on vacation. :-D )
Conclusion:
In my case performance w
>
> does DIH call commit periodically, or are things done in one big batch?
>
AFAIK, one big batch.
Cheers
Avlesh
On Thu, Aug 6, 2009 at 11:23 PM, Yonik Seeley wrote:
> On Mon, Aug 3, 2009 at 12:32 PM, Chantal
> Ackermann wrote:
> > avg-cpu:  %user  %nice  %sys  %iowait  %idle
> > 1
On Mon, Aug 3, 2009 at 12:32 PM, Chantal
Ackermann wrote:
> avg-cpu: %user %nice %sys %iowait %idle
> 1.23 0.00 0.03 0.03 98.71
>
> Basically, it is doing very little? *scratch*
How often is commit being called? (a Lucene commit sync's all of the
index files so a cra
>
> Do you think it's possible to return (in the nested entity) rows
> independent of the unique id, and let the processor decide when a document
> is complete?
>
I don't think so.
In my case, I had 9 (JDBC) entities for each document. Most of these
entities returned a single column and limited nu
Hi all,
to keep this thread up to date... ;-)
d) jdbc batch size
changed to 10. (Was default: 500, then 1000)
The problem with my DIH setup is that the root entity query returns a
huge set (all the ids that shall be indexed). A larger fetch size would be
good for that query.
The nested entity, ho
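In DIH's data-config.xml, the JDBC fetch size being discussed is controlled by the batchSize attribute on the data source; a sketch (the driver, URL, and value are placeholders, not the poster's actual setup):

```xml
<!-- Hypothetical JdbcDataSource: batchSize is handed to the JDBC driver
     as the statement fetch size -->
<dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
            url="jdbc:mysql://localhost/mydb" batchSize="500"/>
```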
Hi Avlesh,
hi Otis,
hi Grant,
hi all,
(enumerating to keep track of all the input)
a) mergeFactor 1000 too high
I'll change that back to 10. I thought it would make Lucene use more RAM
before starting IO.
b) ramBufferSize:
OK, or maybe more. I'll keep that in mind.
c) solrconfig.xml - defau
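For reference, the two settings from points (a) and (b) are index-writer settings in solrconfig.xml; a sketch reflecting the values under discussion (mergeFactor back at the default 10, ramBufferSizeMB at 256 — note that in the Solr version of this era these lived under indexDefaults/mainIndex rather than indexConfig):

```xml
<!-- Hypothetical index settings matching the discussion above -->
<indexDefaults>
  <ramBufferSizeMB>256</ramBufferSizeMB>
  <mergeFactor>10</mergeFactor>
</indexDefaults>
```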
How big are your documents? I haven't benchmarked DIH, so I am not
sure what to expect, but it does seem like something isn't right. Can
you fully describe how you are indexing? Have you done any profiling?
On Aug 3, 2009, at 12:32 PM, Chantal Ackermann wrote:
Hi all,
I'm still struggli
, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR
- Original Message
> From: Chantal Ackermann
> To: "solr-user@lucene.apache.org"
> Sent: Monday, August 3, 2009 12:32:12 PM
> Subject: Re: mergeFactor / indexing speed
>
> Hi all,
>
> I'm still st
>
> avg-cpu:  %user  %nice  %sys  %iowait  %idle
>            1.23   0.00   0.03   0.03    98.71
>
I agree, real bad statistics, actually.
Currently, I've set mergeFactor to 1000 and ramBufferSize to 256MB.
>
To me the former appears to be too high and the latter too low (for your
machine configur