[Tutor] Newbie Trouble Processing SRT Strings In Text File

2014-10-31 Thread Matt Varner
TL;DR - Skip to 'My Script: "subtrans.py"'



Optional Links to (perhaps) Helpful Images:
1. The SRT download button:
http://i70.photobucket.com/albums/i82/RavingNoah/Python%20Help/tutor1_zps080f20f7.png

2. A visual comparison of my current problem (see 'Desire Versus
Reality' below):
http://i70.photobucket.com/albums/i82/RavingNoah/Python%20Help/newline_problem_zps307f8cab.jpg


The SRT File


The SRT file that you can download for every lesson that has a video
contains the caption transcript data and is organized according to
text snippets with some connecting time data.


Reading the SRT File and Outputting Something Useful


There may be a hundred different ways to read one of these file types.
The reliable method I chose was to use a handy editor for the purpose
called Aegisub.  It will open the SRT file and let me immediately
export a version of it, without the time data (which I don't
need...yet).  The result of the export is a plain-text file containing
each string snippet and a newline character.

==
Dealing with the Text File
==

One of these text files can be anywhere from 130 to 500 lines or
longer, depending (obviously) on the length of its attendant video.
For my purposes, as a springboard for extending my own notes for each
module, I need to concatenate each string with an acceptable format.
My desire for this is to interject spaces where I need them and kill
all the newline characters so that I get just one big lump of properly
spaced paragraph text.  From here, I can divide up the paragraphs how
I see fit and I'm golden...

==
My first Python script: Issues
==

I did my due diligence.  I have read the tutorial at www.python.org.
I went to my local library and have a copy of "Python Programming for
the Absolute Beginner, 3rd Edition" by Michael Dawson.  I started
collecting what seemed like logical little bits here and there from
examples found using Uncle Google, but none of the examples anywhere
were close enough, contextually, to be automatically picked up by my
dense 'noobiosity.'  For instance, when discussing string
methods...almost all operations taught to beginners are done on
strings generated "on the fly," directly inputted into IDLE, but not
on strings that are contained in an external file.  There are other
examples for file operations, but none of them involved doing string
operations afterward.  After many errors about not being able to
directly edit strings in a file object, I finally figured out that
lists are used to read and store strings kept in a file like the one
I'm sourcing from...so I tried using that.  Then I spent hours
unsuccessfully trying to call strings using index numbers from the
list object (I guess I'm dense).  Anyhow, I put together my little
snippets and have been banging my head against the wall for a couple
of days now.

After many frustrating attempts, I have NEARLY produced what I'm
looking to achieve in my test file.


Example - Source


My test file contains just twelve lines of a much larger (but no more
complex) file that is typical of the SRT subtitle caption files, of
which I expect to have to process a hundred...or hundreds, depending
on how many there are in all of the courses I plan to take
(coincidentally, there is one on Python).

Line 01: # Exported by Aegisub 3.2.1
Line 02: [Deep Dive]
Line 03: [CSS Values & Units Numeric and Textual Data Types with
Guil Hernandez]
Line 04: In this video, we'll go over the
Line 05: common numeric and textual values
Line 06: that CSS properties can accept.
Line 07: Let's get started.
Line 08: So, here we have a simple HTML page
Line 09: containing a div and a paragraph
Line 10: element nested inside.
Line 11: It's linked to a style sheet named style.css
Line 12: and this is where we'll be creating our new CSS rules.


My Script: "subtrans.py"


# Open the target file, create file object
f = open('tmp.txt', 'r')

# Create an output file to write the changed strings to
o = open('result.txt', 'w')

# Create a list object that holds all the strings in the file object
lns = f.readlines()

# Close the source file you no longer need now that you have your strings
f.close()

# Import sys to get at stdout (standard output) - "print" results will
# be written to file
import sys

# Associate stdout with the output file
sys.stdout = o

# Try to print strings to output file using loopback variable (line)
# and the list object
for line in lns:
    if ".\n" in line:
        a = line.replace('.\n','.  ')
        print(a.strip('\n'))
    else:
        b = line.strip('\n')
        print(b + " ")

# Close your output file
o.close()

=
Desire Versus Reality
=

The source file contains a series of strings with n

Re: [Tutor] Newbie Trouble Processing SRT Strings In Text

2014-11-01 Thread Matt Varner
Alan G wrote: "This is a bad idea.  Instead, write your strings directly to o:

o.write(s)

Print adds newlines automatically (unless you explicitly suppress
them). But printing to a file is messy compared to writing directly to
the file. (And also means you can't print debug messages while
developing your code!)"
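
A small illustration of the difference Alan describes, as a sketch only. It uses in-memory io.StringIO objects in place of real files (an assumption made so the snippet runs anywhere), and two lines borrowed from the test file above.

```python
import io

lines = ["In this video, we'll go over the\n", "Let's get started.\n"]

# print() appends its own newline after each string, so the newline
# that strip() removed comes straight back.
printed = io.StringIO()
for line in lines:
    print(line.strip('\n') + " ", file=printed)

# write() emits exactly the string it is given and nothing more.
written = io.StringIO()
for line in lines:
    written.write(line.strip('\n') + " ")

print(repr(printed.getvalue()))
print(repr(written.getvalue()))
```

The first repr still shows a "\n" after every snippet; the second shows one continuous run of text.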

>>> Thank you so much, Alan.  I had the feeling I was making it more difficult 
>>> on myself.  In fact, before I tried using stdout as a solution, I was 
>>> getting errors on my syntax because I was (apparently) trying to "call" the 
>>> list.  I'm sure I was putting syntax in the wrong order...but now I better 
>>> understand the use of file objects and lists (and strings!).  :D

===

Danny Yoo wrote: "You may want to look at existing parsers that people
have written." (https://github.com/byroot/pysrt)

>>> That will definitely help out in the future once I get a little better 
>>> wrapping my head around programming with Python.  I would like to cut out 
>>> the middle man (app: Aegisub) if I could...eventually.  Now that my very 
>>> basic stumbling block is worked out, I can start building on it.  This will 
>>> likely come in handy.  :)
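
For cutting out the Aegisub step, one possible direction is to pull the caption text out of the SRT data directly. The sketch below uses only the standard library rather than pysrt, and assumes the usual SRT layout: a counter line, a timing line containing "-->", the caption text, then a blank line. The function name srt_to_text and the timing values in the sample are made up for illustration; only the caption text comes from the test file above.

```python
import re

# Hypothetical sample in the usual SRT layout. The timings are invented.
SAMPLE_SRT = """1
00:00:00,500 --> 00:00:02,000
In this video, we'll go over the

2
00:00:02,000 --> 00:00:04,500
common numeric and textual values

3
00:00:04,500 --> 00:00:06,000
that CSS properties can accept.
"""

def srt_to_text(srt):
    """Return only the caption text of an SRT string, joined by spaces."""
    snippets = []
    # Cues are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", srt.strip()):
        lines = [ln.strip() for ln in block.splitlines()]
        for i, ln in enumerate(lines):
            # Everything after the timing line is caption text.
            if "-->" in ln:
                snippets.extend(text for text in lines[i + 1:] if text)
                break
    return " ".join(snippets)

print(srt_to_text(SAMPLE_SRT))
```

This ignores styling tags and other extras a real SRT file may carry, which is where an existing parser would likely do a better job.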

Danny Yoo wrote: "Rather than immediately print the string, you may
want to accumulate results in a list.  You can then do some processing
on your list of strings."

>>> Indeed, this is what I was trying with:

lns = f.readlines()

I was creating more trouble for myself by not having the basic syntax
order right, and then using stdout tied to my output file, using print
statements that were adding newlines, even though I was trying to edit
them out.
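
Danny's suggestion of accumulating results in a list might look roughly like the sketch below. The helper name join_snippets is hypothetical, and the spacing rule (two spaces after a line ending in a period, one space otherwise) simply mirrors the script in this thread.

```python
def join_snippets(lines):
    """Collect cleaned snippets in a list, then join them in one step."""
    pieces = []
    for line in lines:
        text = line.strip()
        if not text:
            continue  # ignore blank lines
        # Two spaces after a sentence-ending period, one space otherwise.
        pieces.append(text + ("  " if text.endswith(".") else " "))
    # Join once at the end and drop the trailing padding.
    return "".join(pieces).rstrip()

sample = [
    "In this video, we'll go over the\n",
    "common numeric and textual values\n",
    "that CSS properties can accept.\n",
    "Let's get started.\n",
]
print(join_snippets(sample))
```

Keeping the pieces in a list leaves room for more processing (dropping the Aegisub header line, splitting paragraphs) before anything is written out.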

Thank you both again for your time!

This result works perfectly (REMs removed):

f = open('tmp.txt', 'r')
o = open('result.txt', 'w')
lns = f.readlines()
f.close()
for line in lns:
    if ".\n" in line:
        a = line.replace('.\n','.  ')
        o.write(a)
    else:
        a = line.strip('\n')
        o.write(a + " ")
o.close()
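
The same logic can also be written with "with" blocks, which close both files automatically even if an error occurs partway through. This is only a sketch of an alternative layout. The function name flatten and its parameters are hypothetical, and it iterates over the file object directly instead of calling readlines().

```python
def flatten(src_path, dst_path):
    """Join the lines of src_path into one run of text in dst_path."""
    with open(src_path, 'r') as f, open(dst_path, 'w') as o:
        for line in f:
            if line.endswith('.\n'):
                # End of a sentence: swap the newline for two spaces.
                o.write(line.replace('.\n', '.  '))
            else:
                # Mid-sentence: drop the newline, add a single space.
                o.write(line.strip('\n') + ' ')
```

With the file names from this thread, the call would be flatten('tmp.txt', 'result.txt').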
___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor