Re: [OpenBD] Re: Memory Issue while looping over large file

Alex Skinner Thu, 12 Jan 2012 11:13:29 -0800

Basically, I don't think you want to hold the whole file in memory; there is
no reason to. Try the code I provided, but instead of outputting each line
just output a counter, e.g.
1
2
3
4
5
6
7
See if it barfs at the same line number
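
Something along these lines, as a rough Java sketch of the idea rather than a
drop-in fix. The class and method names (NodeStreamer, scan) are made up for
illustration and are not part of OpenBD or the original script. It assumes the
end tag never spans two lines, and it only counts lines and complete nodes;
a real import would parse and process the node text where the comment says so.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;

// Hypothetical helper for illustration only.
class NodeStreamer {

    /** Totals from one pass over the input. */
    static final class Counts {
        final long lines;
        final long nodes;

        Counts(long lines, long nodes) {
            this.lines = lines;
            this.nodes = nodes;
        }
    }

    /**
     * Reads the source line by line and counts lines and end tags.
     * Only the text of the node currently being assembled is buffered,
     * so memory use does not grow with the size of the file.
     */
    static Counts scan(Reader source, String endTag) {
        long lines = 0;
        long nodes = 0;
        StringBuilder current = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines++;
                current.append(line);
                int end = current.indexOf(endTag);
                while (end >= 0) {
                    nodes++;
                    // A real import would parse and process the node here.
                    // Drop the finished node, keep any trailing text.
                    current.delete(0, end + endTag.length());
                    end = current.indexOf(endTag);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new Counts(lines, nodes);
    }
}
```

If a counter-only pass like this gets through all the lines without memory
creeping up, the read itself is probably not what is eating the heap.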


A

On 12 January 2012 19:09, Aaron J. White <[email protected]> wrote:

> midstring and split taken from cflib
>
> http://www.cflib.org/udf/MidString
> http://www.cflib.org/udf/split
>
> On Jan 12, 1:03 pm, "Aaron J. White" <[email protected]> wrote:
> > Not really.
> >
> >         <cfset locals.startOfTitle = "<example_node>" />
> >         <cfset locals.endOfTitle = "</example_node>" />
> >
> >         <cfloop index="locals.line" file="#locals.absFilePath#">
> >                 <cfif locals.line DOES NOT CONTAIN locals.endOfTitle>
> >                         <!--- add line to titleitem  --->
> >                         <cfset locals.titleItem &= locals.line />
> >                         <cfset application.import.lineCount += 1 />
> >                         <cfif application.import.stop>
> >                                 <cfabort />
> >                         </cfif>
> >                 <cfelse>
> >                         <cfset locals.titleItem &= locals.line />
> >                         <cfset application.import.lineCount += 1 />
> >                         <!--- we hit the end of a title. first get extra chars from back. we'll need those later --->
> >                         <cfset locals.tempArr = application.utility.split(locals.titleItem, locals.endOfTitle) />
> >                         <cfset locals.tempItem = locals.tempArr[arraylen(locals.tempArr)] & "" />
> >                         <!--- now get everything in middle of nodes --->
> >                         <cfset locals.titleItem = locals.startOfTitle & application.utility.midstring(locals.titleItem, locals.startOfTitle, locals.endOfTitle) & locals.endOfTitle />
> >                         <!--- convert title item to xml object --->
> >                         <cfset locals.titleXml = xmlparse(locals.titleItem) />
> >                         <!--- we have our node. prepare titleItem text for next iteration --->
> >                         <cfset locals.titleItem = locals.tempItem />
> >                         <cfif application.import.stop>
> >                                 <cfabort />
> >                         <cfelse>
> >                                 <!--- process the title xml and add required info to the database --->
> >                                 <cfset processTitleItem(locals.titleXml) />
> >                         </cfif>
> >                 </cfif>
> >         </cfloop>
> >
> > On Jan 12, 12:43 pm, Alex Skinner <[email protected]> wrote:
> >
> > > Seeing some code would be good. How are you doing the read?
> >
> > > I googled and found something like this:
> >
> > > <cfscript>
> > > // Define the file to read, use forward slashes only
> > > FileName="C:/Example/ReadMe.txt";
> > > // Initialize Java File IO
> > > FileIOClass=createObject("java","java.io.FileReader");
> > > FileIO=FileIOClass.init(FileName);
> > > LineIOClass=createObject("java","java.io.BufferedReader" );
> > > LineIO=LineIOClass.init(FileIO);
> > > </cfscript>
> >
> > > <CFSET EOF=0>
> > > <CFLOOP condition="NOT EOF">
> > >     <!--- Read in next line --->
> > >     <CFSET CurrLine=LineIO.readLine()>
> > >     <!--- If CurrLine is not defined, we have reached the end of file --->
> > >     <CFIF IsDefined("CurrLine") EQ "NO">
> > >         <CFSET EOF=1>
> > >         <CFBREAK>
> > >     </CFIF>
> > >     <CFOUTPUT>#CurrLine#<br></CFOUTPUT><CFFLUSH>
> > > </CFLOOP>
> > > <!--- Close the reader so the file handle is released --->
> > > <CFSET LineIO.close()>
> >
> > > Is your solution similar ?
> >
> > > A
> >
> > > On 12 January 2012 17:57, Aaron J. White <[email protected]> wrote:
> >
> > > > Hey all,
> >
> > > > I am receiving an OutOfMemory error while running a script that is
> > > > trying to loop over a 1.2gb+ xml file (~ 12 million lines). I'm not
> > > > really sure if what I am doing is just horrible and there is a better
> > > > way or if it is a memory issue in openbd.
> >
> > > > I have assigned tomcat 2gb max memory. While I'm running the script I
> > > > can see the memory usage slowly creep up in task manager. With 4gb of
> > > > ram on the vps I get to about 7 million lines before tomcat gives up.
> > > > When I had 3gb of ram on the server and 1gb applied to Tomcat I could
> > > > only get to about 4 million lines.
> >
> > > > Here's the logic behind what I am doing.
> >
> > > > I am interested in one particular node in the large file, so I loop
> > > > over the file line by line. As I loop, if the line does not contain the
> > > > end of the node I'm looking for, then I <cfset locals.exampleNode &=
> > > > locals.line />.
> > > > Once I hit a line that contains the end of the node ( </example_node> ),
> > > > I do a few operations to clean up any extra text from the front and back
> > > > of the node string and then convert it to xml with xmlparse.
> >
> > > > Once I have the node as xml I push it to another function that does
> > > > several things.
> > > > ** uses xpath to grab particular information from the node. Seven
> > > > xpath searches are done on each node unless I decide to skip the node
> > > > after the first two xpath searches.
> > > > ** Depending on the content I either add the information to my
> > > > database, update the information, or skip it. I have about 5 tables
> > > > that are getting modified from the script. A few of the unimportant
> > > > queries use background="yes".
> > > > The whole script runs in a cfthread so it doesn't time out.
> >
> > > > Can anyone give any insight? Also, I could post some code example, but
> > > > my script is about 600 lines long.
> >
> > > > --
> > > > online documentation:http://openbd.org/manual/
> > > >   google+ hints/tips:https://plus.google.com/115990347459711259462
> > > >    http://groups.google.com/group/openbd?hl=en
> >
> > > >     Join us @http://www.OpenCFsummit.org/Dallas, Feb 2012
> >
> > > --
> > > Alex Skinner
> > > Managing Director
> > > Pixl8 Interactive
> >
> > > Tel: +448452600726
> > > Email: [email protected]
> > > Web: pixl8.co.uk
>



-- 
Alex Skinner
Managing Director
Pixl8 Interactive

Tel: +448452600726
Email: [email protected]
Web: pixl8.co.uk

