Re: [Virtuoso-users] full-text indexing after mass import (was: rdf_loader_run logs "missed delete of name id cache" messages)

Jörn Hees Sun, 22 Nov 2015 19:02:54 -0800

Hi,

> On 23 Nov 2015, at 03:36, Hugh Williams <hwilli...@openlinksw.com> wrote:
> 
>>>     
>>> http://docs.openlinksw.com/virtuoso/sparqlextensions.html#rdfsparqlrulefulltext
>> 
>> I really think that this could be it, as by default there seems to be an 
>> "all" index.
> 
> [Hugh] If you installed the Virtuoso Faceted Browser then the FT index would 
> be enabled and run as a scheduled job.


It's a "vanilla" virtuoso-server, installed from .deb packages built from the 
7.2.1 sources; the only VAD package installed is the Conductor.


>> 
>> Reading the doc page, I have two remaining questions:
>> 
>> After a normal `rdf_loader_run()`, would a 
>> `DB.DBA.VT_INC_INDEX_DB_DBA_RDF_OBJ();` be sufficient to get a complete 
>> full-text index? Or do I have to run `DB.DBA.RDF_OBJ_FT_RECOVER();` in those 
>> cases, because otherwise the free-text index will never be complete (not 
>> even after the background tasks have finished)?
> 
> [Hugh] The scheduler will run `DB.DBA.VT_INC_INDEX_DB_DBA_RDF_OBJ();` so you 
> can wait for it to run or run it manually yourself.


Yes, that's what I understood, but the documentation at 
http://docs.openlinksw.com/virtuoso/sparqlextensions.html#rdfsparqlrulefulltext 
contains this remark:

> One problem related to free-text indexing of DB.DBA.RDF_QUAD is that some 
> applications (e.g. those that import billions of triples) may set off 
> triggers. This will make free-text index data incomplete. Calling procedure 
> DB.DBA.RDF_OBJ_FT_RECOVER () will insert all missing free-text index items by 
> dropping and re-inserting every existing free-text index rule.

So I'm asking: is `rdf_loader_run();` one of those "applications" that 
deactivate some triggers, leading to an incomplete full-text index, and that 
therefore need `DB.DBA.RDF_OBJ_FT_RECOVER();` to be called afterwards?

Or are my only two real options to wait for the scheduler or to call 
`DB.DBA.VT_INC_INDEX_DB_DBA_RDF_OBJ();` myself?
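
In isql terms, the sequence in question would look roughly like the sketch 
below. The directory, file mask and graph IRI are placeholders and not my 
actual setup, and whether the last step is needed at all is exactly the open 
question.

```sql
-- Register the dump files for the bulk loader
-- (path, file mask and graph IRI are placeholders).
ld_dir ('/path/to/dumps', '*.ttl.gz', 'http://example.org/graph');

-- Run the bulk load and persist the result.
rdf_loader_run ();
checkpoint;

-- Option 1: do not wait for the scheduler, trigger the incremental
-- full-text index update by hand.
DB.DBA.VT_INC_INDEX_DB_DBA_RDF_OBJ ();

-- Option 2: only needed if the bulk loader bypasses the triggers
-- (unconfirmed); re-inserts all missing free-text index items.
DB.DBA.RDF_OBJ_FT_RECOVER ();
```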


>> If I have to, a mention of this around 
>> http://docs.openlinksw.com/virtuoso/rdfperformancetuning.html#rdfperfloadinglod
>>  would be nice.
>> 
>> I ran `DB.DBA.RDF_OBJ_FT_RECOVER();` on a small instance with just the 
>> DBpedia core (~ 430 M triples) and it seems to only use 2 - 3 CPUs with very 
>> little IO. The whole importing of that dataset only took 1:30 hours, but the 
>> full-text indexing is still running after 3 hours now... Is there any way to 
>> go full speed at the cost of locking the whole DB or something?
> 
> [Hugh] Will have to check with development as I am not aware of a param to 
> control CPU usage, it should run with full platform utilisation I would have 
> thought 

Thanks,

Jörn


