Kunal,

The log_mode parameter of DB.DBA.RDF_LOAD_RDFXML_MT() does not map
one-to-one onto the modes set by log_enable(). In the multithreaded RDF
loaders, quads are always committed without waiting for the end of
loading; the log_mode parameter controls how much data is written to
the transaction log.

There is an integrity issue to keep in mind. Some tables store 'numeric'
values of type IRI_ID for all IRIs stored in the database, so it is
possible to find an existing IRI_ID or create a new one for a given IRI
string, or to find the IRI string by its IRI_ID. The RDF_QUAD table
should never contain IRI_IDs that are not listed in those tables. If
this ever happens, it is better to zap the entire RDF storage and reload
all RDF data from scratch. That may not be possible, so the integrity
violation should be avoided in the first place.
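To make the invariant concrete, here is a minimal Python sketch. It is a
toy model, not Virtuoso's schema: the hypothetical IriDictionary class
stands in for the IRI_ID tables, quads are tuples of integer IRI_IDs
(all four positions are treated as IRIs for simplicity), and the
hypothetical find_dangling_ids() lists every IRI_ID used in the quad
table that the dictionary does not know. A non-empty result is exactly
the violation described above.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class IriDictionary:
    """Toy stand-in for the IRI <-> IRI_ID tables (hypothetical, not Virtuoso's schema)."""

    iri_to_id: dict[str, int] = field(default_factory=dict)
    id_to_iri: dict[int, str] = field(default_factory=dict)

    def get_or_create(self, iri: str) -> int:
        """Find the existing IRI_ID for an IRI string, or allocate a new one."""
        if iri not in self.iri_to_id:
            new_id = len(self.iri_to_id) + 1
            self.iri_to_id[iri] = new_id
            self.id_to_iri[new_id] = iri
        return self.iri_to_id[iri]

    def lookup(self, iri_id: int) -> str | None:
        """Find the IRI string by its IRI_ID, or None if the ID is unknown."""
        return self.id_to_iri.get(iri_id)


def find_dangling_ids(quads: list[tuple[int, ...]], dictionary: IriDictionary) -> set[int]:
    """Return every IRI_ID used in the quad table but missing from the dictionary."""
    return {i for quad in quads for i in quad if dictionary.lookup(i) is None}


d = IriDictionary()
g, s, p, o = (d.get_or_create(f"http://example.com/{n}") for n in "gspo")
quads = [(g, s, p, o), (g, s, p, 999)]  # IRI_ID 999 was never allocated
print(find_dangling_ids(quads, d))  # {999}
```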

The only way to get this integrity violation is for the server to be
killed after some quads have been written to the database and logged
while the new IRI_IDs they refer to have not been logged. This can
happen, for instance, when a transaction with logging disabled loads a
resource and allocates a new IRI_ID for a new IRI, the same IRI then
occurs in triples loaded by a second client with logging enabled, and
then a crash happens. After server restart and log replay, the database
will contain the quads of the second client but not the IRI_IDs that
the first client never wrote to the log.
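The crash scenario can be replayed in a small toy model, sketched in
Python under loud assumptions: ToyServer is hypothetical, only log
records survive a crash, no checkpoint happens between the loads and the
crash, and plain triples are used instead of quads. The first client
loads with logging disabled, the second with logging enabled and reuses
the IRI_ID already allocated for ex:a; after replay, the second client's
triple refers to IRI_ID 1, which nothing in the log defines.

```python
from __future__ import annotations


class ToyServer:
    """Toy model: an in-memory IRI_ID allocator plus a transaction log (not Virtuoso code)."""

    def __init__(self) -> None:
        self.iri_to_id: dict[str, int] = {}
        self.log: list[tuple] = []  # in this model only log records survive a crash

    def load(self, triples: list[tuple[str, str, str]], logging: bool) -> None:
        for triple in triples:
            ids = []
            for iri in triple:
                if iri not in self.iri_to_id:
                    self.iri_to_id[iri] = len(self.iri_to_id) + 1
                    if logging:  # a new IRI_ID is logged only if this client logs
                        self.log.append(("iri", self.iri_to_id[iri], iri))
                # An existing IRI_ID is reused silently, whoever allocated it.
                ids.append(self.iri_to_id[iri])
            if logging:
                self.log.append(("quad", tuple(ids)))

    def crash_and_replay(self) -> tuple[dict[int, str], list[tuple[int, ...]]]:
        """Forget in-memory state and rebuild the database from the log alone."""
        id_to_iri: dict[int, str] = {}
        quads: list[tuple[int, ...]] = []
        for record in self.log:
            if record[0] == "iri":
                id_to_iri[record[1]] = record[2]
            else:
                quads.append(record[1])
        return id_to_iri, quads


server = ToyServer()
server.load([("ex:a", "ex:p", "ex:b")], logging=False)  # first client, logging disabled
server.load([("ex:a", "ex:q", "ex:c")], logging=True)   # second client reuses ex:a
id_to_iri, quads = server.crash_and_replay()
print(quads)      # [(1, 4, 5)]
print(id_to_iri)  # {4: 'ex:q', 5: 'ex:c'}; IRI_ID 1 is dangling
```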

To avoid the problem, the multithreaded parsers use the following
logging modes:

log_mode=0 means no logging at all; all RDF data should be reloaded in
case of a server crash.

log_mode=1 means logging only IRI_IDs, not quads. This preserves RDF
storage integrity across any crash, but the administrator should reload
the resources that were being loaded at the time of the crash;
otherwise the storage will contain none of their new quads, or only
some of them.

log_mode=2 means logging both IRI_IDs and quads. This is the default;
when unsure, use it.
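The effect of the three log_mode values on the same two-client scenario
can be sketched as follows. Again this is a hypothetical toy model
(simulate() is not a Virtuoso function), assuming the crash happens
after both loads and before any checkpoint, and that the second client
always uses the default log_mode=2.

```python
from __future__ import annotations


def simulate(first_client_mode: int, second_client_mode: int = 2):
    """Toy crash/replay model of the three log_mode values (hypothetical, not Virtuoso code).

    Returns (replayed triples, dangling IRI_IDs) for a crash that happens after
    both clients have loaded their triples and before any checkpoint.
    """
    iri_to_id: dict[str, int] = {}
    log: list[tuple] = []

    def load(triples, log_mode: int) -> None:
        for triple in triples:
            ids = []
            for iri in triple:
                if iri not in iri_to_id:
                    iri_to_id[iri] = len(iri_to_id) + 1
                    if log_mode >= 1:  # modes 1 and 2 log new IRI_IDs
                        log.append(("iri", iri_to_id[iri]))
                ids.append(iri_to_id[iri])
            if log_mode == 2:  # only mode 2 logs the triples themselves
                log.append(("quad", tuple(ids)))

    load([("ex:a", "ex:p", "ex:b")], first_client_mode)
    load([("ex:a", "ex:q", "ex:c")], second_client_mode)  # reuses ex:a's IRI_ID

    # Replay: only what reached the log survives the crash.
    known_ids = {rec[1] for rec in log if rec[0] == "iri"}
    quads = [rec[1] for rec in log if rec[0] == "quad"]
    dangling = {i for quad in quads for i in quad if i not in known_ids}
    return quads, dangling


for mode in (0, 1, 2):
    print(mode, simulate(mode))
# 0 ([(1, 4, 5)], {1})                 integrity lost, reload everything
# 1 ([(1, 4, 5)], set())               consistent, first client's data missing
# 2 ([(1, 2, 3), (1, 4, 5)], set())    everything recovered
```

With log_mode=1 the replayed storage is consistent, but the first
client's own triple is missing, which is why its resource has to be
reloaded.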

Best Regards,

Ivan Mikhailov,
OpenLink Software.

On Wed, 2008-02-27 at 15:09 -0800, Kunal Patel wrote:
> Hi all,
> 
>   When I use the function DB.DBA.RDF_LOAD_RDFXML_MT with the log_mode
> parameter set to 0, does it log all the transactions?  Would it be
> more efficient to set the value of log_mode to 2 instead of 0 when
> loading a huge rdf file.  I believe using the value 2 turns on the row
> by row autocommit which the virtuoso documentation says, 
> 
> "Row by row autocommit mode is good for any batch operations where
> concurrent updates are not expected or are not an issue. Examples
> include bulk loading of data, materialization of RDF inferred data
> etc."  http://docs.openlinksw.com/virtuoso/coredbengine.html
> 
> Regards,
> Kunal
> 
> _______________________________________________ Virtuoso-users mailing list 
> Virtuoso-users@lists.sourceforge.net 
> https://lists.sourceforge.net/lists/listinfo/virtuoso-users

