TL;DR - Skip to 'My Script: "subtrans.py"' <beg>
Optional Links to (perhaps) Helpful Images:

1. The SRT download button:
   http://i70.photobucket.com/albums/i82/RavingNoah/Python%20Help/tutor1_zps080f20f7.png
2. A visual comparison of my current problem (see 'Desire Versus Reality' below):
   http://i70.photobucket.com/albums/i82/RavingNoah/Python%20Help/newline_problem_zps307f8cab.jpg

============
The SRT File
============

The SRT file that you can download for every lesson that has a video contains the caption transcript data, organized as text snippets with their associated timing data.

========================================
Reading the SRT File and Outputting Something Useful
========================================

There may be a hundred different ways to read one of these file types. The reliable method I chose was to use a handy editor made for the purpose, called Aegisub. It will open the SRT file and let me immediately export a version of it without the time data (which I don't need...yet). The result of the export is a plain-text file in which each string snippet is followed by a newline character.

==========================
Dealing with the Text File
==========================

One of these text files can be anywhere from 130 to 500 lines or longer, depending (obviously) on the length of its attendant video. For my purposes, as a springboard for extending my own notes for each module, I need to concatenate the strings into an acceptable format. What I want is to insert spaces where I need them and kill all the newline characters so that I get just one big lump of properly spaced paragraph text. From there, I can divide up the paragraphs how I see fit and I'm golden...

==============================
My first Python script: Issues
==============================

I did my due diligence. I have read the tutorial at www.python.org. I went to my local library and have a copy of "Python Programming for the Absolute Beginner, 3rd Edition" by Michael Dawson.
I started collecting what seemed like logical little bits here and there from examples found using Uncle Google, but none of the examples anywhere were close enough, contextually, to be automatically picked up by my dense 'noobiosity.' For instance, when discussing string methods...almost all operations taught to beginners are done on strings generated "on the fly," typed directly into IDLE, but not on strings that are contained in an external file. There are other examples for file operations, but none of them involved doing string operations afterward.

After many errors about not being able to directly edit strings in a file object, I finally figured out that lists are used to read and store strings kept in a file like the one I'm sourcing from...so I tried using that. Then I spent hours unsuccessfully trying to call strings using index numbers from the list object (I guess I'm dense). Anyhow, I put together my little snippets and have been banging my head against the wall for a couple of days now.

After many frustrating attempts, I have NEARLY produced what I'm looking to achieve in my test file.

================
Example - Source
================

My test file contains just twelve lines of a much larger (but no more complex) file that is typical of the SRT subtitle caption files, of which I expect to have to process a hundred...or hundreds, depending on how many there are in all of the courses I plan to take (coincidentally, there is one on Python).

Line 01: # Exported by Aegisub 3.2.1
Line 02: [Deep Dive]
Line 03: [CSS Values & Units Numeric and Textual Data Types with Guil Hernandez]
Line 04: In this video, we'll go over the
Line 05: common numeric and textual values
Line 06: that CSS properties can accept.
Line 07: Let's get started.
Line 08: So, here we have a simple HTML page
Line 09: containing a div and a paragraph
Line 10: element nested inside.
Line 11: It's linked to a style sheet named style.css
Line 12: and this is where we'll be creating our new CSS rules.

========================
My Script: "subtrans.py"
========================

# Open the target file, create file object
f = open('tmp.txt', 'r')

# Create an output file to write the changed strings to
o = open('result.txt', 'w')

# Create a list object that holds all the strings in the file object
lns = f.readlines()

# Close the source file you no longer
# need now that you have your strings
f.close()

# Import sys to get at stdout (standard output) - "print" results will be written to file
import sys

# Associate stdout with the output file
sys.stdout = o

# Try to print strings to output file using loopback variable (line) and the list object
for line in lns:
    if ".\n" in line:
        a = line.replace('.\n','.  ')
        print(a.strip('\n'))
    else:
        b = line.strip('\n')
        print(b + " ")

# Close your output file
o.close()

=================
Desire Versus Reality
=================

The source file contains a series of strings, each with a newline character directly following the last character in the snippet...with absolutely no whitespace. This is a problem for me if I want to concatenate them back together into paragraph text to use as the jumping-off point for my additional notes. I've literally been taking four hours to type out the dialogue from the videos I've been watching...and I know this is going to save me a lot of time and get me interacting with the lessons faster and more efficiently.

However...

My script succeeds in processing the source file and adding the right number of spaces for each line, the rule being "two spaces added following a period, and one space added following a string with no period in it" (technically, a period/newline pairing, which was the only way I could figure out not to target the period in 'example.css' or 'version 2.3.2').
But, even though it successfully kills these additional newlines that seem to form in the list-making process...I end up with what is basically a non-concatenated file of strings...with the right spaces I need, but not the one big chunk of text I expected from using the s.strip('\n') functionality.

============================================================
What I'm Holding Out For - This is what my output should look like (for the test file)
============================================================

# Exported by Aegisub 3.2.1 [Deep Dive] [CSS Values & Units Numeric and Textual Data Types with Guil Hernandez] In this video, we'll go over the common numeric and textual values that CSS properties can accept.  Let's get started.  So, here we have a simple HTML page containing a div and a paragraph element nested inside.  It's linked to a style sheet named style.css and this is where we'll be creating our new CSS rules.

===========================
Thank You For Your Time and Efforts!
===========================

</beg>
_______________________________________________
Tutor maillist - Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor
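
A note on why the output of the script above stays on separate lines: print() ends every call with a newline by default, so stripping '\n' from a string and then printing it puts a newline right back. What follows is only a minimal sketch of one way to get the single block of text described above, assuming the spacing rule from the post (two spaces after a snippet that ends in a period, one space otherwise). The names join_captions and convert_file are hypothetical and not part of the original script, and skipping blank lines is an extra assumption.

```python
def join_captions(lines):
    """Join caption snippets into one block of text.

    Assumed rule (taken from the post): a snippet ending in a period is
    followed by two spaces, any other snippet by one space.
    """
    pieces = []
    for line in lines:
        text = line.rstrip("\n")
        if not text:
            continue  # assumption: blank lines carry no caption text
        separator = "  " if text.endswith(".") else " "
        pieces.append(text + separator)
    # Join everything in one go, then drop the separator after the last snippet.
    return "".join(pieces).rstrip()


def convert_file(source_path, target_path):
    """Hypothetical helper: read snippets from one file, write one paragraph to another."""
    with open(source_path) as source:
        lines = source.readlines()
    with open(target_path, "w") as target:
        target.write(join_captions(lines) + "\n")


# A few lines from the test file, as readlines() would return them.
sample = [
    "In this video, we'll go over the\n",
    "common numeric and textual values\n",
    "that CSS properties can accept.\n",
    "Let's get started.\n",
]
print(join_captions(sample))
```

Because the check is on how the snippet ends, not on whether it contains a period, a line such as "It's linked to a style sheet named style.css" still gets a single space. With the file names from the script above, the call would be convert_file('tmp.txt', 'result.txt').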