Hi All,

We ran a test on 15,000 XML documents that contain xi:include elements (sample given below). The required large XML (hierarchy XML) is generated in just PT23.040552S (about 23 seconds). We used the node-expand API to generate the XML, whereas our old recursive approach takes more than 30 minutes to perform the same operation.

Can you please share any thoughts? Is there anything else we should consider?
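A duration written as PT23.040552S is an xs:dayTimeDuration, which is the type xdmp:elapsed-time() returns in MarkLogic. The message does not say how the figure was measured, so the following is only a minimal sketch of one way such a measurement might be taken. It assumes the root URI from the sample in this message, and it counts the expanded *:object elements purely to force the expansion to be evaluated before the time is read.

```xquery
xquery version "1.0-ml";

import module namespace xinc = "http://marklogic.com/xinclude"
  at "/MarkLogic/xinclude/xinclude.xqy";

(: Assumption: root document URI taken from the sample in this message. :)
let $root := fn:doc("/data/d14d44ec-59d5-4ada-b47d-3d62b69633c8")
(: Expand the xi:include references found under the root. :)
let $expanded := xinc:node-expand($root)
(: Counting the objects forces the expansion to actually run. :)
let $object-count := fn:count($expanded//*:object)
return (
  $object-count,
  (: Time elapsed since this request started, as xs:dayTimeDuration. :)
  xdmp:elapsed-time()
)
```

Because MarkLogic can evaluate expressions lazily, the reported time is best treated as approximate; running the same query several times gives a better picture than a single run.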
import module namespace xinc = "http://marklogic.com/xinclude" at "/MarkLogic/xinclude/xinclude.xqy";

xinc:node-expand(fn:doc("/data/d14d44ec-59d5-4ada-b47d-3d62b69633c8"))

Where "/data/d14d44ec-59d5-4ada-b47d-3d62b69633c8" is the root xml URI in the hierarchy.

1- Root object which contains relationships

<object name="package" id="d14d44ec-59d5-4ada-b47d-3d62b69633c8">
  <properties>
    <property name="myPackage" type="string">
      <value>somevalue</value>
    </property>
    ..... ....
  </properties>
  <relationships>
    <include href="/data/c525e14d-59d5-4ada-b47d-3d62b69633c8" xpointer="xpath(/*:object)" xmlns="http://www.w3.org/2001/XInclude"/>
    <include href="/data/12970f40-053d-4f22-8e39-073ca3a17454" xpointer="xpath(/*:object)" xmlns="http://www.w3.org/2001/XInclude"/>
    ....
  </relationships>
</object>

2- Child object which contains further relationships (It is one of the child which is inside the relationships)

<object name="myImage" id="c525e14d-59d5-4ada-b47d-3d62b69633c8">
  <properties>
    <property name="pixelXDimension" type="int">
      <value>645</value>
    </property>
    ..... ....
  </properties>
  <relationships>
    <include href="/data/xyzzqqka-59d5-4ada-b47d-125shydtt2bs" xpointer="xpath(/*:object)" xmlns="http://www.w3.org/2001/XInclude"/>
    ....
  </relationships>
</object>

3- Further Child object which contains other relationships

<object name="thumbnail" id="xyzzqqka-59d5-4ada-b47d-125shydtt2bs">
  <properties>
    <property name="pixelXDimension" type="int">
      <value>645</value>
    </property>
    ..... ....
  </properties>
  <relationships>
    <include href="/data/abcgdt13-59d5-125a-b47d-425shydtt2bs" xpointer="xpath(/*:object)" xmlns="http://www.w3.org/2001/XInclude"/>
    ....
  </relationships>
</object>

And so on.

And final xml which we want :

<object name="package" id="d14d44ec-59d5-4ada-b47d-3d62b69633c8">
  <properties>
    <property name="myPackage" type="string">
      <value>somevalue</value>
    </property>
    ..... ....
  </properties>
  <relationships>
    <object name="myImage" id="c525e14d-59d5-4ada-b47d-3d62b69633c8">
      <properties>
        <property name="pixelXDimension" type="int">
          <value>645</value>
        </property>
        ..... ....
      </properties>
      <relationships>
        <object name="thumbnail" id="xyzzqqka-59d5-4ada-b47d-125shydtt2bs">
          <properties>
            <property name="pixelXDimension" type="int">
              <value>645</value>
            </property>
            ..... ....
          </properties>
          <relationships>
            ....
          </relationships>
        </object>
        ....
      </relationships>
    </object>
    ....
  </relationships>
</object>

Regards,
Abhinav

From: Mishra, Abhinav Kumar (Cognizant)
Sent: Thursday, June 16, 2016 12:55 PM
To: MarkLogic Developer Discussion
Cc: Singh, Vikas (Cognizant)
Subject: RE: [MarkLogic Dev General] performance issue for creating large xml

Hi Geert,

We are creating an XML document that looks like a hierarchy. Once the hierarchy is assembled from small chunks, we use an XSLT to transform it into another format. The small chunks contain metadata for various files. Currently we have more than 30,000 small chunks, and we have to create a large XML (hierarchy XML) out of these chunks in memory. The generated large XML will be more than 30 MB in size, and this process takes more than 45 minutes to complete. So we are looking for a design change. Vikas suggested using xi:include, so we thought of having a discussion here.

Let me try to explain what we are doing.

1- Root object which contains relationships

<object name="package" id="d14d44ec-59d5-4ada-b47d-3d62b69633c8">
  <properties>
    <property name="myPackage" type="string">
      <value>somevalue</value>
    </property>
    ..... ....
  </properties>
  <relationships>
    <value>c525e14d-59d5-4ada-b47d-3d62b69633c8</value>
    <value>12970f40-053d-4f22-8e39-073ca3a17454</value>
    ....
  </relationships>
</object>

2- Child object which contains further relationships (it is one of the children listed inside the relationships)

<object name="myImage" id="c525e14d-59d5-4ada-b47d-3d62b69633c8">
  <properties>
    <property name="pixelXDimension" type="int">
      <value>645</value>
    </property>
    ..... ....
  </properties>
  <relationships>
    <value>xyzzqqka-59d5-4ada-b47d-125shydtt2bs</value>
    ....
  </relationships>
</object>

3- Further child object which contains other relationships

<object name="thumbnail" id="xyzzqqka-59d5-4ada-b47d-125shydtt2bs">
  <properties>
    <property name="pixelXDimension" type="int">
      <value>645</value>
    </property>
    ..... ....
  </properties>
  <relationships>
    <value>abcgdt13-59d5-125a-b47d-425shydtt2bs</value>
    ....
  </relationships>
</object>

And so on.

At the end we create a large XML which looks like this:

<object name="package" id="d14d44ec-59d5-4ada-b47d-3d62b69633c8">
  <properties>
    <property name="myPackage" type="string">
      <value>somevalue</value>
    </property>
    ..... ....
  </properties>
  <contains>
    <object name="myImage" id="c525e14d-59d5-4ada-b47d-3d62b69633c8">
      <properties>
        <property name="pixelXDimension" type="int">
          <value>645</value>
        </property>
        ..... ....
      </properties>
      <contains>
        <object name="thumbnail" id="xyzzqqka-59d5-4ada-b47d-125shydtt2bs">
          <properties>
            <property name="pixelXDimension" type="int">
              <value>645</value>
            </property>
            ..... ....
          </properties>
          <contains>
            ....
          </contains>
        </object>
        ....
      </contains>
    </object>
    ....
  </contains>
</object>

We then use XSLT to transform this into another format, which we need as a business requirement.

Regards,
Abhinav

From: [email protected] On Behalf Of Geert Josten
Sent: Thursday, June 16, 2016 10:29 AM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] performance issue for creating large xml

Hi Vikas,

XInclude processing requires building the large XML in memory too, regardless of where it will be going.
So whether this will work well enough for your case depends on how large `large` is.

Kind regards,
Geert

From: <[email protected]> on behalf of "[email protected]" <[email protected]>
Reply-To: MarkLogic Developer Discussion <[email protected]>
Date: Thursday, June 16, 2016 at 4:24 PM
To: "[email protected]" <[email protected]>
Subject: Re: [MarkLogic Dev General] performance issue for creating large xml

Thanks, Geert, for the quick reply.

In the current process we also create the large XML by adding all related fragments, but we do not commit this large XML to the database. So we are planning to create XML as below.

<object name="Test">
  <!--Some metadata properties -->
  <relationships>
    <relationship type="reference">
      <value>49d7116c24d541aea73328b761cdd89f</value>
      <xi:include href="/49d7116c24d541aea73328b761cdd89f.xml" xpointer="49d7116c24d541aea73328b761cdd89f" />
    </relationship>
  </relationships>
</object>

As shown in the XML above, we plan to add an <xi:include> element alongside each <value> element. It refers to the same document as the value element but contains the exact XPath. When we want the expanded form, it will be expanded automatically based on the xi:include. Will this approach improve our performance? Each xi:include will point to different content with the same structure.

Regards,
Vikas Singh

From: [email protected] On Behalf Of Geert Josten
Sent: Thursday, June 16, 2016 7:29 PM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] performance issue for creatign large xml

Hi Vikas,

Keep in mind you will be buffering all related fragments in memory while building this large XML. It might work out, but it won't scale well. To keep memory usage small and stream through the results, you are better off returning all XML chunks without wrapping them in a single large document or element node.
Not very elegant, but this would probably work:

"<wrapper>",
<p>hello world</p>,
<p>hello world</p>,
"</wrapper>"

You can replace the p elements with anything that produces results in a streaming manner.

Cheers,
Geert

From: <[email protected]> on behalf of "[email protected]" <[email protected]>
Reply-To: MarkLogic Developer Discussion <[email protected]>
Date: Thursday, June 16, 2016 at 3:47 PM
To: "[email protected]" <[email protected]>
Subject: [MarkLogic Dev General] performance issue for creatign large xml

Hi All,

In the current design of our project we create a large XML by combining all the small XML chunks needed for a final outcome. To achieve this we use cts:search, and this search runs recursively.

Example: we have one XML document which contains metadata and all of its references. When we create the final result, we fetch all the references and the metadata of each reference and build one large XML. Child references also contain other references, and so on. This process takes around one hour to create the final result.

Can we change our design and use XInclude in all the parent documents, so that when we want the final output it is automatically expanded for all children and there is no need to search the database? Will this improve our performance for generating the final outcome?

Regards,
Vikas Singh

This e-mail and any files transmitted with it are for the sole use of the intended recipient(s) and may contain confidential and privileged information. If you are not the intended recipient(s), please reply to the sender and destroy all copies of the original message. Any unauthorized review, use, disclosure, dissemination, forwarding, printing or copying of this email, and/or any action taken in reliance on the contents of this e-mail is strictly prohibited and may be unlawful.
Where permitted by applicable law, this e-mail and other e-mail communications sent to and from Cognizant e-mail addresses may be monitored.
_______________________________________________ General mailing list [email protected] Manage your subscription at: http://developer.marklogic.com/mailman/listinfo/general
