Hi,
I found that in IndexMergeTool.java there is this line, which sets
maxNumSegments to 1:
writer.forceMerge(1);
Does this mean that there will always be only 1 segment after the
merging?
Is there any way we can allow the merging to produce multiple segments,
w
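For reference, Lucene's IndexWriter.forceMerge takes the maximum number of segments to leave, so a locally modified copy of IndexMergeTool could simply pass a value larger than 1. A minimal sketch of that idea (the class name and directory paths are hypothetical; it assumes the Lucene 6.5 core and misc jars on the classpath, and mirrors IndexMergeTool's own use of a null analyzer):

```java
import java.nio.file.Paths;

import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.IndexWriterConfig.OpenMode;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class MultiSegmentMergeTool {
    public static void main(String[] args) throws Exception {
        // Destination and source paths below are placeholder examples.
        try (Directory merged = FSDirectory.open(Paths.get("merged-index"))) {
            IndexWriterConfig config = new IndexWriterConfig(null)
                    .setOpenMode(OpenMode.CREATE);
            try (IndexWriter writer = new IndexWriter(merged, config)) {
                writer.addIndexes(
                        FSDirectory.open(Paths.get("index1")),
                        FSDirectory.open(Paths.get("index2")));
                // forceMerge takes the MAXIMUM segment count to leave,
                // so passing e.g. 10 allows up to 10 segments instead of 1.
                writer.forceMerge(10);
            }
        }
    }
}
```

Note this only caps the segment count; Lucene's merge policy decides how many segments actually remain.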
Hi Shawn,
Thanks for the info. We will most likely do sharding when we migrate
to Solr 7.1.0, and re-index the data.
But as Solr 7.1.0 is still not ready to index EML files due to this
JIRA, https://issues.apache.org/jira/browse/SOLR-11622, we have to make
do with our current Solr 6.5.
On 11/22/2017 6:19 PM, Zheng Lin Edwin Yeo wrote:
I'm doing the merging on the SSD drive, so the speed should be ok?
The speed of virtually all modern disks will have almost no influence on
the speed of the merge. The bottleneck isn't disk transfer speed, it's
the operation of the merge code in
Hi Erick,
Yes, we are planning to do sharding when we upgrade to the newer Solr
7.1.0, and will probably re-index everything. But currently we are waiting
for certain issues with indexing EML files in Solr 7.1.0 to be addressed
first, such as this JIRA, https://issues.apache.org/jira/browse/SOL
Sure, sharding can give you accurate faceting, although do note there
are nuances: JSON faceting can occasionally be inexact, although
there are JIRAs being worked on to correct this.
"Traditional" faceting has a refinement phase that gets accurate counts.
But the net-net is that I believe your
I'm doing the merging on the SSD drive, so the speed should be ok?
We need to merge because the data are indexed in two different collections,
and we need them to be under the same collection, so that we can do things
like faceting more accurately.
Will sharding alone achieve this? Or do we have to m
Really, let's back up here though. This sure seems like an XY problem.
You're merging indexes that will eventually be something on the order
of 3.5TB. I claim that an index of that size is very difficult to work
with effectively. _Why_ do you want to do this? Do you have any
evidence that you'll be
On 11/21/2017 9:10 AM, Zheng Lin Edwin Yeo wrote:
> I am using the IndexMergeTool from Solr, from the command below:
>
> java -classpath lucene-core-6.5.1.jar;lucene-misc-6.5.1.jar
> org.apache.lucene.misc.IndexMergeTool
>
> The heap size is 32GB. There are more than 20 million documents in the two
Hi Emir,
Yes, I am running the merging on a Windows machine.
The hard disk is a SSD disk in NTFS file system.
Regards,
Edwin
On 22 November 2017 at 16:50, Emir Arnautović
wrote:
> Hi Edwin,
> Quick googling suggests that this is the issue of NTFS related to large
> number of file fragments cau
Hi Edwin,
Quick googling suggests that this is an NTFS issue related to a large number
of file fragments, caused by a large number of files in one directory or by
huge files. Are you running this merging on a Windows machine?
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr &
Hi,
I have encountered this error during the merging of the 3.5TB of index.
What could be the cause that lead to this?
Exception in thread "main" Exception in thread "Lucene Merge Thread #8"
java.io.IOException: background merge hit exception: _6f(6.5.1):C7256757
_6e(6.5.1):C6462072 _6d(6.5.1
I am using the IndexMergeTool from Solr, from the command below:
java -classpath lucene-core-6.5.1.jar;lucene-misc-6.5.1.jar
org.apache.lucene.misc.IndexMergeTool
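For context, the command as quoted omits the tool's arguments: IndexMergeTool expects the destination index directory first, followed by two or more source index directories. A sketch of the full invocation (directory names are placeholders, not paths from this thread; the ';' classpath separator is for Windows, use ':' on Linux/macOS):

```shell
# Destination index first, then the source indexes to merge.
java -Xmx32g -classpath "lucene-core-6.5.1.jar;lucene-misc-6.5.1.jar" \
  org.apache.lucene.misc.IndexMergeTool merged-index index1 index2
```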
The heap size is 32GB. There are more than 20 million documents in the two
cores.
Regards,
Edwin
On 21 November 2017 at 21:54, Sha
On 11/20/2017 9:35 AM, Zheng Lin Edwin Yeo wrote:
Does anyone know how long the merging in Solr will usually take?
I am currently merging about 3.5TB of data, and it has been running for
more than 28 hours and is still not completed. The merging is running on
an SSD disk.
The following will app
Hi Edwin,
I’ll let somebody with more knowledge about merging comment on the merge aspects.
What do you use to merge those cores - the merge tool, or do you run it using Solr’s
core API? What is the heap size? How many documents are in those two cores?
Regards,
Emir
Hi Emir,
Thanks for your reply.
There is only 1 host, 1 node and 1 shard for these 3.5TB.
The merging has already written an additional 3.5TB to another segment.
However, it is still not a single segment, and the size of the folder where
the merged index is supposed to be is now 4.6TB. This ex
Hi Edwin,
How many hosts/nodes/shards hold those 3.5TB? I am not familiar with the merge
code, but I am trying to think what it might include, so don’t take any of the
following as ground truth.
Merging will for sure include a segment rewrite, so you had better have an
additional 3.5TB free if you are merging it to a single
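That transient disk cost can be put in rough numbers (a back-of-the-envelope assumption on my part, not a figure from the thread): while a merge to one segment runs, the old segments and the newly written copy coexist on disk until the merge commits, so the worst case needs roughly twice the source index size free.

```java
public class MergeSpaceEstimate {
    // Worst-case transient disk usage during a full merge: the original
    // segments plus one complete rewritten copy coexist until commit.
    static double worstCaseDiskTb(double indexSizeTb) {
        return 2.0 * indexSizeTb;
    }

    public static void main(String[] args) {
        // For the 3.5TB index discussed in the thread:
        System.out.println(worstCaseDiskTb(3.5) + " TB"); // 7.0 TB
    }
}
```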