Hi,

A couple more things we have learned about Blazegraph: it is pretty bad at
reclaiming free space, especially if you use multiple namespaces, since
dropping a namespace will not reduce the size of the journal.
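
As a rough illustration, the figures quoted further down in this thread (Wikidata: ~15B triples in ~1.1TB; OpenCitations: almost 8 billion triples in almost 1.5TB) can be turned into an average journal size per triple. This is a minimal sketch that assumes "TB" means decimal terabytes; the names `DEPLOYMENTS` and `bytes_per_triple` are hypothetical.

```python
# Figures quoted in this thread. Assumption: "TB"/"T" means decimal
# terabytes (10**12 bytes). If they are TiB, both absolute values shift by
# the same ~10%, so the comparison between deployments does not change.
DEPLOYMENTS = {
    "Wikidata (WDQS)": {"journal_bytes": 1.1e12, "triples": 15e9},
    "OpenCitations": {"journal_bytes": 1.5e12, "triples": 8e9},
}


def bytes_per_triple(journal_bytes: float, triples: float) -> float:
    """Average journal bytes per stored triple.

    Everything in the journal (statement indexes, lexicon, free space that
    was never reclaimed) is counted, so this is a coarse density measure.
    """
    if triples <= 0:
        raise ValueError("triples must be positive")
    return journal_bytes / triples


if __name__ == "__main__":
    for name, figures in DEPLOYMENTS.items():
        ratio = bytes_per_triple(figures["journal_bytes"], figures["triples"])
        print(f"{name}: ~{ratio:.0f} bytes per triple")
```

Run as a script, it reports roughly 73 bytes per triple for Wikidata and roughly 188 for OpenCitations. The gap could come from longer literals, different index or inlining settings, or space that was freed but never reclaimed, so it is a prompt for investigation rather than a diagnosis.
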
We are also experiencing a deadlock that makes the service totally
unresponsive. We believe it is triggered by some query load, since it only
happens on the public-facing nodes [0]. The main symptom is a steadily
increasing thread count, with all queries blocked.
I would suggest taking a few thread dumps of Blazegraph when the hang
occurs; there might be things to learn from them.
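
A minimal sketch of that workflow, assuming a JDK is on the PATH so that `jstack <pid>` (or equivalently `jcmd <pid> Thread.print`) can be run against the Blazegraph JVM. The helper names (`take_dumps`, `count_threads`, `state_histogram`, `steadily_increasing`) and the 10-second default interval are hypothetical choices for illustration; the parsing relies only on the usual HotSpot dump layout, where each thread section starts with its quoted name and Java threads report a `java.lang.Thread.State` line.

```python
import re
import subprocess
import time
from collections import Counter
from typing import List

# In jstack / `jcmd <pid> Thread.print` output, every thread section starts
# with a line beginning with the quoted thread name, e.g.
#   "qtp123-45" #45 prio=5 os_prio=0 tid=0x... nid=0x... waiting for monitor entry
THREAD_HEADER = re.compile(r'^"[^"]*"', re.MULTILINE)
# Java threads also report their state on a line such as
#   java.lang.Thread.State: BLOCKED (on object monitor)
THREAD_STATE = re.compile(r"java\.lang\.Thread\.State: (\w+)")


def take_dumps(pid: int, count: int = 3, interval_s: float = 10.0) -> List[str]:
    """Capture `count` thread dumps of a JVM with the JDK's jstack tool."""
    dumps = []
    for i in range(count):
        result = subprocess.run(
            ["jstack", str(pid)], capture_output=True, text=True, check=True
        )
        dumps.append(result.stdout)
        if i < count - 1:  # no need to wait after the last dump
            time.sleep(interval_s)
    return dumps


def count_threads(dump: str) -> int:
    """Number of threads listed in one dump (JVM-internal threads included)."""
    return len(THREAD_HEADER.findall(dump))


def state_histogram(dump: str) -> Counter:
    """How many Java threads are in each state (RUNNABLE, BLOCKED, WAITING, ...)."""
    return Counter(THREAD_STATE.findall(dump))


def steadily_increasing(counts: List[int]) -> bool:
    """True if every dump lists strictly more threads than the previous one."""
    return len(counts) >= 2 and all(b > a for a, b in zip(counts, counts[1:]))


if __name__ == "__main__":
    SAMPLE = '''Full thread dump OpenJDK 64-Bit Server VM:

"qtp1-10" #10 prio=5 os_prio=0 tid=0x1 nid=0x2 waiting for monitor entry
   java.lang.Thread.State: BLOCKED (on object monitor)

"qtp1-11" #11 prio=5 os_prio=0 tid=0x3 nid=0x4 waiting for monitor entry
   java.lang.Thread.State: BLOCKED (on object monitor)

"main" #1 prio=5 os_prio=0 tid=0x5 nid=0x6 runnable
   java.lang.Thread.State: RUNNABLE
'''
    print(count_threads(SAMPLE))  # 3
    print(state_histogram(SAMPLE))  # Counter({'BLOCKED': 2, 'RUNNABLE': 1})
    print(steadily_increasing([120, 180, 260]))  # True
```

Comparing thread counts and state histograms across successive dumps (for example, a growing number of BLOCKED request threads) is usually more telling than any single dump; the stack traces of the blocked threads then show what they are waiting on.
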
You are very welcome to join the Search Platform office hours [1] to
discuss Blazegraph further.

Hope it helps,

David.

0: https://phabricator.wikimedia.org/T242453
1: https://www.mediawiki.org/wiki/Wikimedia_Search_Platform#Office_Hours

On Wed, Jan 18, 2023 at 10:57 AM Guillaume Lederrey <[email protected]>
wrote:

> Hello!
>
> Wikidata currently has ~15B triples [1] for an on-disk journal size of
> ~1.1TB. We have previously experienced issues with running out of
> allocators (some docs in [2]), which prevents adding any more triples to
> the store. We did some limited tuning with the help of a Blazegraph expert
> a while back ([3][4]), but that work is not particularly well documented.
>
> I would take the 50B triples limit with a grain of salt. I suspect that
> this number comes from a fairly specific data set and workload that might
> not reflect real-world usage.
>
> So overall, I'm afraid that we don't have great advice to share. We are
> struggling ourselves with scaling Blazegraph to the size of Wikidata. You
> might want to try the Blazegraph issue tracker on GitHub [5], but activity
> there is limited.
>
> Good luck! And let us know what you find!
>
>   Have fun!
>
>      Guillaume
>
>
> [1]
> https://grafana.wikimedia.org/d/000000489/wikidata-query-service?orgId=1&refresh=1m&viewPanel=7
> [2] https://github.com/blazegraph/database/wiki/RWStore
> [3] https://phabricator.wikimedia.org/T213210
> [4] https://phabricator.wikimedia.org/T238362
> [5] https://github.com/blazegraph/database/issues
>
>
>
> On Tue, 17 Jan 2023 at 16:59, Ivan Heibi <[email protected]> wrote:
>
>> Dear Wikimedia,
>> Let me introduce myself first. My name is Ivan Heibi; I am a researcher
>> at the University of Bologna working at OpenCitations (directed by Silvio
>> Peroni), where I am responsible for the technical infrastructure.
>>
>> We are currently facing a technical issue with our triplestore that I
>> wanted to share with you, hoping that your expertise with similar issues
>> might give us some new insights to help us deal with it.
>> Thank you in advance for your time and support; below I briefly explain
>> the issue.
>>
>> Currently OpenCitations stores and maintains its data (citations and
>> bibliographic metadata) in one big triplestore (JNL format) using the
>> Blazegraph database. The size of the current JNL file has reached almost
>> 1.5 TB, and the file is regularly updated (roughly every two months) with
>> new triples (data regarding new citations). However, the current JNL file
>> no longer seems to accept any further data, even though its size and its
>> total number of triples (almost 8 billion) are below the limits that
>> Blazegraph states (50 billion triples). Any attempt to DATA LOAD
>> additional triples into the JNL file leaves the process hanging forever,
>> with no effect on the triplestore.
>>
>> We tried to LOAD new data into the JNL file using different properties
>> when launching the Blazegraph triplestore, but every test gave the same
>> negative result.
>>
>> Have you ever faced similar behaviour? Are you aware of any Blazegraph
>> limits that we might be overlooking? If you have faced such problems,
>> what solutions did you adopt, and what would you suggest?
>>
>> Thank you in advance for your support and help,
>> Have a nice day,
>> Ivan Heibi
>>
>> ----------------------------------------------------------------
>> Ivan Heibi, Ph.D.
>> Digital Humanities Advanced Research Centre (DHARC),
>> Department of Classical Philology and Italian Studies,
>> University of Bologna, Bologna (Italy)
>>
>> E-mail: [email protected]
>> Twitter: @ivanHeiB <https://twitter.com/ivanheib>
>> Personal web site: ivanhb.it
>> University web page: unibo.it/sitoweb/ivan.heibi2
>> <https://www.unibo.it/sitoweb/ivan.heibi2/>
>> _______________________________________________
>> Discovery mailing list -- [email protected]
>> To unsubscribe send an email to [email protected]
>>
>
>
> --
> *Guillaume Lederrey* (he/him)
> Engineering Manager
> Wikimedia Foundation <https://wikimediafoundation.org/>
>
