I want to test Virtuoso performance with a data set that consists of
approximately 100 million triples. I read through the Virtuoso web-scale
PDF, which says 16 GB of RAM per 1 billion triples is a reasonable estimate.
1. What is the basis of this calculation?
2. Is the relationship between the number of triples and RAM linear?
3. When you say 16 GB of RAM, how much of the machine's RAM is assigned to
Virtuoso in terms of the number of buffers? (See the sketch after this list.)
4. Would a single instance of Virtuoso handle 100 million triples, or should
I go for a cluster setup? If so, how many triples per cluster node? In
general, how is this decided?
5. I am planning to do this on Amazon EC2 infrastructure and would be very
glad if someone could tell me which machine configuration would be best for
this test :). If anyone is interested, I can share the test results once I
am done, as I will be loading public life-sciences data for this test.
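
For reference, here is the back-of-the-envelope calculation I am working
from, as a small Python sketch. It assumes the 16 GB per 1 billion triples
figure scales linearly (my question 2) and that one buffer corresponds to
one 8 KB database page (an assumption I would like confirmed as part of
question 3):

    # Rough sizing sketch. Assumptions to be confirmed: the 16 GB per
    # 1e9 triples figure scales linearly, and one Virtuoso buffer is
    # one 8 KB database page.
    GB = 1024 ** 3
    TRIPLES = 100000000  # my test data set

    # Linear extrapolation from the web-scale PDF's figure.
    ram_bytes = 16.0 * GB * TRIPLES / 1000000000
    print("Estimated RAM for working set: %.1f GB" % (ram_bytes / GB))  # ~1.6

    # The NumberOfBuffers value that would dedicate this much RAM to
    # Virtuoso's cache, if one buffer really is one 8 KB page.
    PAGE_BYTES = 8 * 1024
    print("NumberOfBuffers ~= %d" % (ram_bytes // PAGE_BYTES))  # ~210,000

If that reasoning holds, the working set for 100 million triples would fit
in a couple of GB of RAM, which is part of why I am asking which EC2
instance type makes sense.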
--
Cheers,
Abhi