Thank you very much, Hugh!
On Sat, Apr 18, 2015 at 12:57 PM, Hugh Williams <hwilli...@openlinksw.com>
wrote:
> Hi Gang,
>
> The “with_delete” option is only designed to work with N-Quads, with all
> quads of the same graph name being in the same dataset file. You can use
> SPARQL DELETE, but for large changes to the database it will be rather
> inefficient: even when updating an existing triple it will always delete
> and re-insert it, which is not very efficient and is why the
> “with_delete” option was implemented in the first place …
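>
> For illustration, a typical bulk-load sequence from isql looks roughly like
> this (the directory and file mask are hypothetical, and the exact
> "with_delete" setup should be taken from the wiki page linked later in this
> thread):
>
>     -- register the weekly N-Quads dump files for loading; for N-Quads the
>     -- graph IRI is taken from each quad, so no graph is passed here
>     ld_dir ('/data/weekly_dump', '*.nq', NULL);
>     -- run the loader (can be invoked once per core for parallel loading)
>     rdf_loader_run ();
>     -- make the loaded data durable
>     checkpoint;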
>
> Best Regards
> Hugh Williams
> Professional Services
> OpenLink Software, Inc. // http://www.openlinksw.com/
> Weblog -- http://www.openlinksw.com/blogs/
> LinkedIn -- http://www.linkedin.com/company/openlink-software/
> Twitter -- http://twitter.com/OpenLink
> Google+ -- http://plus.google.com/100570109519069333827/
> Facebook -- http://www.facebook.com/OpenLinkSoftware
> Universal Data Access, Integration, and Management Technology Providers
>
> On 17 Apr 2015, at 13:29, Gang Fu <gangfu1...@gmail.com> wrote:
>
> Thank you very much, Hugh! You are right, we are going to update the large
> data set on a weekly basis, and the updates are on the scale of a couple of
> million triples, or less than that. The bulk loader with the delete option
> sounds good to me, but only N-Quads files are accepted. Our input files are
> in Turtle, dumped from a SQL database. Preparing another set of dump
> scripts is not ideal, and converting Turtle to N-Quads requires an extra
> step in the pipeline. Is there a way to run the RDF loader 'with delete' on
> Turtle files? Otherwise, I think SPARQL DELETE is better for us, since no
> extra effort is needed.
>
> On Wed, Apr 8, 2015 at 12:59 PM, Hugh Williams <hwilli...@openlinksw.com>
> wrote:
>
>> Hi Gang,
>>
>> To be clear, when you say "I want to update a large RDF store with 10
>> billion triples once a week", I presume you are *NOT* loading 10 billion
>> new triples every week, but rather that the base 10 billion triples are to
>> be updated, with triples/graphs being inserted/deleted/updated, so the
>> overall number of triples does not increase (or decrease) on that scale?
>>
>> If these updates are in the form of documents, i.e. datasets, and they are
>> in or can be converted to N-Quads format to meet the requirements of the
>> Virtuoso RDF Bulk Loader "with_delete" [1] option, then this would be the
>> fastest and most efficient way to do this, I would say ...
>>
>> [1]
>> http://virtuoso.openlinksw.com/dataspace/doc/dav/wiki/Main/VirtRDFBulkLoaderWithDelete
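>>
>> For reference, an N-Quads line is just an N-Triples line with the target
>> graph IRI added as a fourth element, e.g. (IRIs here are made up):
>>
>>     <http://example.org/s> <http://example.org/p> "o" <http://example.org/g> .
>>
>> so a Turtle dump can be converted by serialising it to N-Triples and
>> appending the graph IRI to each statement.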
>>
>> Best Regards
>> Hugh Williams
>> Professional Services
>> OpenLink Software, Inc. // http://www.openlinksw.com/
>> Weblog -- http://www.openlinksw.com/blogs/
>> LinkedIn -- http://www.linkedin.com/company/openlink-software/
>> Twitter -- http://twitter.com/OpenLink
>> Google+ -- http://plus.google.com/100570109519069333827/
>> Facebook -- http://www.facebook.com/OpenLinkSoftware
>> Universal Data Access, Integration, and Management Technology Providers
>>
>> On 8 Apr 2015, at 12:27, Gang Fu <gangfu1...@gmail.com> wrote:
>>
>> Will using isql or JDBC or HTTP make any difference?
>>
>> On Wed, Apr 8, 2015 at 7:25 AM, Gang Fu <gangfu1...@gmail.com> wrote:
>>
>>> There are millions of triples to be updated on a weekly basis.
>>>
>>> On Wed, Apr 8, 2015 at 7:24 AM, Gang Fu <gangfu1...@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> I want to update a large RDF store with 10 billion triples once a week.
>>>> The triples to be inserted or deleted are saved in documents.
>>>> There are no variable bindings or blank nodes in the documents.
>>>> So I guess the best-fitting SPARQL Update operations are
>>>> INSERT DATA / DELETE DATA.
>>>>
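>>>> For example, per weekly batch I would run statements like the following
>>>> (graph and triples are made-up placeholders), either over the SPARQL
>>>> endpoint or from isql with the SPARQL keyword prefixed:
>>>>
>>>>     DELETE DATA { GRAPH <http://example.org/g> {
>>>>       <http://example.org/s> <http://example.org/p> "old value" .
>>>>     } } ;
>>>>     INSERT DATA { GRAPH <http://example.org/g> {
>>>>       <http://example.org/s> <http://example.org/p> "new value" .
>>>>     } }
>>>>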
>>>> What is the best way to do this?
>>>> Using a JDBC connection pool or HTTP?
>>>> Using 'MODIFY GRAPH <graph-iri>' with INSERT/DELETE, or
>>>> INSERT DATA / DELETE DATA?
>>>> Is it possible to run concurrent update jobs?
>>>>
>>>>
>>>> Best,
>>>> Gang
>>>>
>>>>
>>>
>>