Hi Geert,
We are creating an XML document that is structured as a hierarchy. Once the
hierarchy has been assembled from small chunks, we use an XSLT to transform it
into another format. The small chunks contain metadata for different files.
Currently we have more than 30,000 small chunks, and we have to build one large
XML document (the hierarchy XML) out of these chunks in memory. The generated
hierarchy XML is more than 30 MB in size, and the process takes more than 45
minutes to complete, so we are looking for a design change. Vikas suggested
using xi:include, so we thought of discussing it here.
Let me try to explain what we are doing.
1- Root object which contains relationships
<object name="package" id="d14d44ec-59d5-4ada-b47d-3d62b69633c8">
<properties>
<property name="myPackage" type="string">
<value>somevalue</value>
</property>
.....
....
</properties>
<relationships>
<value>c525e14d-59d5-4ada-b47d-3d62b69633c8</value>
<value>12970f40-053d-4f22-8e39-073ca3a17454</value>
....
</relationships>
</object>
2- Child object, which contains further relationships (it is one of the
children listed in the root's relationships)
<object name="myImage" id="c525e14d-59d5-4ada-b47d-3d62b69633c8">
<properties>
<property name="pixelXDimension" type="int">
<value>645</value>
</property>
.....
....
</properties>
<relationships>
<value>xyzzqqka-59d5-4ada-b47d-125shydtt2bs</value>
....
</relationships>
</object>
3- Further child object, which contains other relationships
<object name="thumbnail" id="xyzzqqka-59d5-4ada-b47d-125shydtt2bs">
<properties>
<property name="pixelXDimension" type="int">
<value>645</value>
</property>
.....
....
</properties>
<relationships>
<value>abcgdt13-59d5-125a-b47d-425shydtt2bs</value>
....
</relationships>
</object>
And so on. At the end we create one large XML document, which looks like:
<object name="package" id="d14d44ec-59d5-4ada-b47d-3d62b69633c8">
  <properties>
    <property name="myPackage" type="string">
      <value>somevalue</value>
    </property>
    .....
    ....
  </properties>
  <contains>
    <object name="myImage" id="c525e14d-59d5-4ada-b47d-3d62b69633c8">
      <properties>
        <property name="pixelXDimension" type="int">
          <value>645</value>
        </property>
        .....
        ....
      </properties>
      <contains>
        <object name="thumbnail" id="xyzzqqka-59d5-4ada-b47d-125shydtt2bs">
          <properties>
            <property name="pixelXDimension" type="int">
              <value>645</value>
            </property>
            .....
            ....
          </properties>
          <contains>
            ....
          </contains>
        </object>
        ....
      </contains>
    </object>
    ....
  </contains>
</object>
We then use XSLT to transform this hierarchy into another format, which is a
business requirement.
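
For illustration only, the assembly step described above can be sketched
outside MarkLogic in Python. This is a minimal sketch under assumptions, not
the implementation discussed in this thread: `build_hierarchy`, the in-memory
`by_id` index, and the cycle guard are hypothetical names and choices. It
parses every chunk once, indexes it by its id, and then follows the
<relationships> values to nest children under <contains>, so each child is
found by a dictionary lookup instead of a separate search.

```python
import copy
import xml.etree.ElementTree as ET


def build_hierarchy(chunks, root_id):
    """Nest <object> chunks under <contains>, following <relationships>.

    chunks:  iterable of XML strings, one <object id="..."> per string.
    root_id: id of the object that becomes the root of the hierarchy.
    """
    # Index every chunk by id once, so children are found by lookup.
    by_id = {}
    for text in chunks:
        elem = ET.fromstring(text)
        by_id[elem.get("id")] = elem

    def expand(obj_id, path):
        src = by_id[obj_id]
        out = ET.Element("object", dict(src.attrib))
        props = src.find("properties")
        if props is not None:
            out.append(copy.deepcopy(props))
        contains = ET.SubElement(out, "contains")
        rels = src.find("relationships")
        values = rels.findall("value") if rels is not None else []
        for value in values:
            child_id = (value.text or "").strip()
            # Skip ids with no chunk, and ids already on the current path
            # (a guard against cyclic relationships; an assumption here).
            if child_id in by_id and child_id not in path:
                contains.append(expand(child_id, path | {child_id}))
        return out

    return expand(root_id, frozenset({root_id}))
```

The whole result is still held in memory in this sketch; it only shows the
nesting logic, not a way around the memory cost.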
Regards
Abhinav
From: [email protected]
[mailto:[email protected]] On Behalf Of Geert Josten
Sent: Thursday, June 16, 2016 10:29 AM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] performance issue for creating large xml
Hi Vikas,
XInclude processing requires building the large XML in memory too, regardless
of where it will be going. So whether this will work well enough for your case
depends on how large `large` is.
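
To make that concrete, here is a rough sketch in Python rather than MarkLogic,
using the standard library's xml.etree.ElementInclude. The `FRAGMENTS` table,
the `loader` function and the `/child.xml` href are made up for the example,
and the xpointer attribute is left out. The point it shows is that expansion
replaces each xi:include with the included element inside the same in-memory
tree, so the expanded document is fully materialized.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

# Hypothetical stand-in for documents stored in a database.
FRAGMENTS = {
    "/child.xml": '<object name="child"><properties/></object>',
}


def loader(href, parse, encoding=None):
    """Resolve an xi:include href against the in-memory table above."""
    if parse != "xml":
        raise ValueError("only parse='xml' is handled in this sketch")
    return ET.fromstring(FRAGMENTS[href])


def expand(parent_xml):
    """Parse the parent and expand its xi:include elements in place."""
    root = ET.fromstring(parent_xml)
    ElementInclude.include(root, loader=loader)
    return root  # the complete expanded tree now lives in memory
```

With 30,000 fragments the expanded tree would hold all of them at once, which
is the same memory cost as building the large document by hand.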
Kind regards,
Geert
From: <[email protected]>
on behalf of "[email protected]" <[email protected]>
Reply-To: MarkLogic Developer Discussion <[email protected]>
Date: Thursday, June 16, 2016 at 4:24 PM
To: "[email protected]" <[email protected]>
Subject: Re: [MarkLogic Dev General] performance issue for creating large xml
Thanks, Geert, for the quick reply.
In the current process we also create the large XML by adding all related
fragments, but we do not commit this large XML to the database. So we are
planning to create the XML as below.
<object name="Test" xmlns:xi="http://www.w3.org/2001/XInclude">
<!--Some metadata properties -->
<relationships>
<relationship type="reference">
<value>49d7116c24d541aea73328b761cdd89f</value>
<xi:include href="/49d7116c24d541aea73328b761cdd89f.xml"
xpointer="49d7116c24d541aea73328b761cdd89f" />
</relationship>
</relationships>
</object>
As shown in the XML above, we plan to add an <xi:include> element next to each
<value>. It refers to the same object as the value element but contains the
exact path. When we want the expanded form, the xi:include will be expanded
automatically. Will this approach improve our performance? Each xi:include
will point to different content with the same structure.
Regards,
Vikas Singh
From: [email protected]
[mailto:[email protected]] On Behalf Of Geert Josten
Sent: Thursday, June 16, 2016 7:29 PM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] performance issue for creatign large xml
Hi Vikas,
Keep in mind you will be buffering all related fragments in memory while
building this large XML. It might work out, but it won't scale well. To allow
keeping memory usage small, and streaming through the results, you are better
off returning all xml chunks without wrapping them in a single large document
or element node.
Not very elegant, but this would probably work:
"<wrapper>",
<p>hello world</p>,
<p>hello world</p>,
"</wrapper>"
You can replace the p elements with anything that produces results in a
streaming manner.
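
The same idea, sketched in Python rather than XQuery purely for illustration:
emit the opening tag, then each chunk as it becomes available, then the closing
tag, without ever holding the combined document. `stream_wrapped` and
`write_stream` are hypothetical names, and the sketch assumes each chunk is
already a well-formed XML string.

```python
import sys
from typing import Iterable, Iterator


def stream_wrapped(chunks: Iterable[str], wrapper: str = "wrapper") -> Iterator[str]:
    """Yield the wrapper's start tag, each chunk in order, then the end tag."""
    yield f"<{wrapper}>"
    for chunk in chunks:
        # Only one chunk is held at a time if `chunks` is itself lazy.
        yield chunk
    yield f"</{wrapper}>"


def write_stream(chunks: Iterable[str], out=sys.stdout) -> None:
    """Write the wrapped chunks to a file-like object piece by piece."""
    for piece in stream_wrapped(chunks):
        out.write(piece)
```

Passing a generator as `chunks` keeps memory use proportional to a single chunk
rather than to the full result.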
Cheers,
Geert
From: <[email protected]>
on behalf of "[email protected]" <[email protected]>
Reply-To: MarkLogic Developer Discussion <[email protected]>
Date: Thursday, June 16, 2016 at 3:47 PM
To: "[email protected]" <[email protected]>
Subject: [MarkLogic Dev General] performance issue for creatign large xml
Hi All,
As per the current design in our project, we create a large XML document by
combining all the small XML chunks into a final result. To achieve this we use
cts:search, and this search runs recursively.
Example: we have one XML document that contains metadata and all of its
references. When we create the final result, we fetch all the references and
the metadata of all the references, and create one large XML document. Child
references also contain other references, and so on.
This process takes around one hour to create the final result.
Can we change our design and use XInclude in all the parent documents, so that
when we want the final output it is automatically expanded for all children,
with no need to search the database?
Will this improve the performance of generating the final result?
Regards,
Vikas Singh
This e-mail and any files transmitted with it are for the sole use of the
intended recipient(s) and may contain confidential and privileged information.
If you are not the intended recipient(s), please reply to the sender and
destroy all copies of the original message. Any unauthorized review, use,
disclosure, dissemination, forwarding, printing or copying of this email,
and/or any action taken in reliance on the contents of this e-mail is strictly
prohibited and may be unlawful. Where permitted by applicable law, this e-mail
and other e-mail communications sent to and from Cognizant e-mail addresses may
be monitored.
_______________________________________________
General mailing list
[email protected]
Manage your subscription at:
http://developer.marklogic.com/mailman/listinfo/general