Hi,

A couple more things we learned about Blazegraph: it is pretty bad at reclaiming free space, especially if you use multiple namespaces. Dropping a namespace will not reduce the size of the journal.

We are also experiencing a deadlock that makes the service totally unresponsive. We believe it is triggered by some query load, since it only happens on the public-facing nodes [0]. The main symptom is the thread count increasing steadily, which ends up blocking all queries. I would suggest taking a few thread dumps of Blazegraph when this happens; there might be things to learn.

You are very welcome to join our office hours [1] to discuss Blazegraph further.
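In case it is useful, here is a minimal sketch of how one could collect a few thread dumps and summarize them. It assumes the `jstack` tool from the JDK is on the PATH and is run as the same user as the Blazegraph JVM; the function names, the output directory and the pid are illustrative only, not something we use in production.

```python
import re
import subprocess
import time
from collections import Counter
from pathlib import Path

# jstack prints one "java.lang.Thread.State: <STATE>" line per Java thread.
STATE_RE = re.compile(r"^[ \t]*java\.lang\.Thread\.State: (\w+)", re.MULTILINE)


def summarize_thread_states(dump_text):
    """Count threads per JVM state (RUNNABLE, BLOCKED, WAITING, ...) in one dump."""
    return Counter(STATE_RE.findall(dump_text))


def capture_thread_dumps(pid, count=3, interval_s=10.0,
                         out_dir="thread-dumps", run=subprocess.run):
    """Run `jstack -l <pid>` `count` times and save each dump to out_dir.

    `run` is injectable so the function can be exercised without a live JVM.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for i in range(count):
        result = run(["jstack", "-l", str(pid)],
                     capture_output=True, text=True, check=True)
        path = out / f"jstack-{pid}-{i:02d}.txt"
        path.write_text(result.stdout)
        # A steadily growing BLOCKED/WAITING count across dumps is the
        # symptom described above.
        print(path.name, dict(summarize_thread_states(result.stdout)))
        paths.append(path)
        if i < count - 1:
            time.sleep(interval_s)
    return paths
```

For example, `capture_thread_dumps(12345, count=5, interval_s=30)` (with 12345 standing in for the real Blazegraph pid) would write five dumps thirty seconds apart. Comparing the per-state counts between dumps should show whether the number of blocked threads keeps growing.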
Hope it helps,
David.

0: https://phabricator.wikimedia.org/T242453
1: https://www.mediawiki.org/wiki/Wikimedia_Search_Platform#Office_Hours

On Wed, Jan 18, 2023 at 10:57 AM Guillaume Lederrey <[email protected]> wrote:

> Hello!
>
> Wikidata currently has ~15B triples [1] for an on disk journal size of
> ~1.1TB. We have previously experienced issues with running out of
> allocators (some docs in [2]), which leads to not being able to add more
> triples to the store. We did some limited tuning with the help of a
> Blazegraph expert a while back ([3][4]), but those are not particularly
> well documented.
>
> I would take the 50B triples limit with a grain of salt. I suspect that
> this number comes from a fairly specific data set and workload that might
> not reflect real world usage.
>
> So overall, I'm afraid that we don't have great advice to share. We are
> struggling ourselves with scaling Blazegraph to the size of Wikidata. You
> might want to try the Blazegraph issues on Github [5], but activity there
> is limited.
>
> Good luck! And let us know what you find!
>
> Have fun!
>
> Guillaume
>
>
> [1] https://grafana.wikimedia.org/d/000000489/wikidata-query-service?orgId=1&refresh=1m&viewPanel=7
> [2] https://github.com/blazegraph/database/wiki/RWStore
> [3] https://phabricator.wikimedia.org/T213210
> [4] https://phabricator.wikimedia.org/T238362
> [5] https://github.com/blazegraph/database/issues
>
>
> On Tue, 17 Jan 2023 at 16:59, Ivan Heibi <[email protected]> wrote:
>
>> Dear Wikimedia,
>> Let me introduce myself first. My name is Ivan Heibi, I am a researcher
>> at the University of Bologna working at OpenCitations (directed by Silvio
>> Peroni) as the responsible of the technical infrastructure.
>>
>> We are currently facing a technical issue while managing our triplestore
>> I wanted to share with you, hoping that maybe your expertise regarding
>> similar issues might give us some new insights to help us deal with it.
>> Thank you in advance for your time and support; here I will briefly
>> explain the issue.
>>
>> Currently OpenCitations stores and maintains its data (citations and
>> bibliographic metadata) in one big triplestore (JNL format) using the
>> Blazegraph database. The size of the current JNL file has reached almost
>> 1.5T, and this JNL file is regularly updated (almost every two months) with
>> new triples (data regarding new citations). However, it seems that the
>> current JNL file does not accept any further addition of data, yet its size
>> and total number of triples (almost 8 billion) are less than the limits that
>> Blazegraph states (50 billion). Any attempt to DATA LOAD additional
>> triples to the JNL file makes the process hang forever, with no effect
>> on the triplestore.
>>
>> We tried to LOAD new data into the JNL file using different properties
>> when launching the Blazegraph triplestore, yet all the tests we have tried
>> gave us the same negative results.
>>
>> Did you ever face a similar behaviour? Are you aware of some limits that
>> Blazegraph has (that we are ignoring)? What are the solutions you have
>> adopted and suggest in order to deal with such issues (in case you have
>> faced such problems)?
>>
>> Thank you in advance for your support and help,
>> Have a nice day,
>> Ivan Heibi
>>
>> ----------------------------------------------------------------
>> Ivan Heibi, Ph.D.
>> Digital Humanities Advanced Research Centre (DHARC),
>> Department of Classical Philology and Italian Studies,
>> University of Bologna, Bologna (Italy)
>>
>> E-mail: [email protected]
>> Twitter: @ivanHeiB <https://twitter.com/ivanheib>
>> Personal web site: ivanhb.it
>> University web page: unibo.it/sitoweb/ivan.heibi2
>> <https://www.unibo.it/sitoweb/ivan.heibi2/>
>> _______________________________________________
>> Discovery mailing list -- [email protected]
>> To unsubscribe send an email to [email protected]
>>
>
> --
> *Guillaume Lederrey* (he/him)
> Engineering Manager
> Wikimedia Foundation <https://wikimediafoundation.org/>
