[Tutor] Newbie Trouble Processing SRT Strings In Text File
TL;DR - Skip to 'My Script: "subtrans.py"'

Optional Links to (perhaps) Helpful Images:

1. The SRT download button:
   http://i70.photobucket.com/albums/i82/RavingNoah/Python%20Help/tutor1_zps080f20f7.png
2. A visual comparison of my current problem (see 'Desire Versus Reality' below):
   http://i70.photobucket.com/albums/i82/RavingNoah/Python%20Help/newline_problem_zps307f8cab.jpg

== The SRT File ==

The SRT file that you can download for every lesson that has a video contains the caption transcript data and is organized according to text snippets with some connecting time data.

== Reading the SRT File and Outputting Something Useful ==

There may be a hundred different ways to read one of these file types. The reliable method I chose was to use a handy editor for the purpose called Aegisub. It will open the SRT file and let me immediately export a version of it, without the time data (which I don't need...yet). The result of the export is a plain-text file containing each string snippet and a newline character.

== Dealing with the Text File ==

One of these text files can be anywhere from 130 to 500 lines or longer, depending (obviously) on the length of its attendant video. For my purposes, as a springboard for extending my own notes for each module, I need to concatenate each string with an acceptable format. My desire for this is to interject spaces where I need them and kill all the newline characters so that I get just one big lump of properly spaced paragraph text. From here, I can divide up the paragraphs how I see fit and I'm golden...

== My first Python script: Issues ==

I did my due diligence. I have read the tutorial at www.python.org. I went to my local library and have a copy of "Python Programming for the Absolute Beginner, 3rd Edition" by Michael Dawson.
I started collecting what seemed like logical little bits here and there from examples found using Uncle Google, but none of the examples anywhere were close enough, contextually, to be automatically picked up by my dense 'noobiosity.' For instance, when discussing string methods...almost all operations taught to beginners are done on strings generated "on the fly," directly inputted into IDLE, but not on strings that are contained in an external file. There are other examples for file operations, but none of them involved doing string operations afterward.

After many errors about not being able to directly edit strings in a file object, I finally figured out that lists are used to read and store strings kept in a file like the one I'm sourcing from...so I tried using that. Then I spent hours unsuccessfully trying to call strings using index numbers from the list object (I guess I'm dense). Anyhow, I put together my little snippets and have been banging my head against the wall for a couple of days now.

After many frustrating attempts, I have NEARLY produced what I'm looking to achieve in my test file.

== Example - Source ==

My test file contains just twelve lines of a much larger (but no more complex) file that is typical for the SRT subtitle caption file, of which I expect to have to process a hundred...or hundreds, depending on how many there are in all of the courses I plan to take (coincidentally, there is one on Python).

Line 01: # Exported by Aegisub 3.2.1
Line 02: [Deep Dive]
Line 03: [CSS Values & Units Numeric and Textual Data Types with Guil Hernandez]
Line 04: In this video, we'll go over the
Line 05: common numeric and textual values
Line 06: that CSS properties can accept.
Line 07: Let's get started.
Line 08: So, here we have a simple HTML page
Line 09: containing a div and a paragraph
Line 10: element nested inside.
Line 11: It's linked to a style sheet named style.css
Line 12: and this is where we'll be creating our new CSS rules.
== My Script: "subtrans.py" ==

    # Open the target file, create file object
    f = open('tmp.txt', 'r')

    # Create an output file to write the changed strings to
    o = open('result.txt', 'w')

    # Create a list object that holds all the strings in the file object
    lns = f.readlines()

    # Close the source file you no longer
    # need now that you have your strings
    f.close()

    # Import sys to get at stdout (standard output) - "print" results will be written to file
    import sys

    # Associate stdout with the output file
    sys.stdout = o

    # Try to print strings to output file using loopback variable (line) and the list object
    for line in lns:
        if ".\n" in line:
            a = line.replace('.\n','. ')
            print(a.strip('\n'))
        else:
            b = line.strip('\n')
            print(b + " ")

    # Close your output file
    o.close()

= Desire Versus Reality =

The source file contains a series of strings with n
Re: [Tutor] passing named arguments through command line
cool, thanks guys :)

-Robert

On Thu, Oct 30, 2014 at 7:24 PM, Danny Yoo wrote:

> On Thu Oct 30 2014 at 7:58:32 AM Lukas Nemec wrote:
>
>> Hello,
>>
>> take a look at argparse library.
>
> Hi Robert,
>
> As Lukas mentions, it sounds like you're looking for a "flag parsing"
> library. A flag parsing library reads a set of key/value pairs that are
> encoded in sys.argv, so they let command-line programs provide variable
> values through the use of these flags.
>
> There are a few of these flag libraries in Python due to Python's long
> history. The one that Lukas recommends, 'argparse', is probably the one
> you want to use.
>
> You can find documentation for argparse at:
>
> https://docs.python.org/2/howto/argparse.html#id1
> https://docs.python.org/2/library/argparse.html
>
> Good luck!

___
Tutor maillist - Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor
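As a rough illustration of the flag parsing described above, here is a minimal argparse sketch. It is written for Python 3, and the flag names `--name` and `--count` and the helper functions are hypothetical, chosen only for the example; they are not taken from Robert's program.

```python
import argparse


def build_parser():
    # Hypothetical flags, for illustration only.
    parser = argparse.ArgumentParser(description="Demo of named command-line arguments.")
    parser.add_argument("--name", default="world", help="who to greet")
    parser.add_argument("--count", type=int, default=1, help="how many greetings to produce")
    return parser


def greetings(argv):
    # parse_args() reads the given list instead of sys.argv[1:], which makes testing easy.
    args = build_parser().parse_args(argv)
    return ["Hello, %s!" % args.name] * args.count


if __name__ == "__main__":
    import sys
    for text in greetings(sys.argv[1:]):
        print(text)
```

Saved under a hypothetical name such as greet.py, this could be run as `python greet.py --name Robert --count 2`.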
Re: [Tutor] Newbie Trouble Processing SRT Strings In Text File
On 31/10/14 11:07, Matt Varner wrote:

> # Import sys to get at stdout (standard output) - "print" results will be written to file
> import sys

This is a bad idea. Instead, write your strings directly to o:

    o.write(s)

Print adds newlines automatically (unless you explicitly suppress them). But printing to a file is messy compared to writing directly to the file. (And it also means you can't print debug messages while developing your code!)

HTH
--
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.flickr.com/photos/alangauldphotos
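A minimal sketch of the "write directly to the file" suggestion, assuming Python 3. The function name `write_joined` is hypothetical; the file names in the usage comment mirror the ones in Matt's script.

```python
def write_joined(source_path, result_path):
    """Copy source lines to the result file, replacing each newline with a space."""
    # "with" closes both files automatically, even if an error occurs.
    with open(source_path, "r") as source, open(result_path, "w") as result:
        for line in source:
            # write() adds nothing of its own, so no newline sneaks back in.
            result.write(line.rstrip("\n") + " ")


# Example call, using the file names from the original script:
# write_joined("tmp.txt", "result.txt")
```

Because sys.stdout is left alone, ordinary print() calls remain available for debugging while the script is being developed.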
[Tutor] Practicing with sockets
Hello all, hope everyone is doing well.

I have been practicing with sockets and I am trying to send a small png from the client to the server.

The client code is...

    import socket

    f = open('/Users/Bo/Desktop/logo_ONEConnxt.png', 'rb')
    strf = f.read()
    client_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client_socket.connect(("ip.ip.ip.ip", 8999))
    client_socket.sendall(strf)
    f.close()
    exit()

and the server code is...

    import socket

    f = open('img.png', 'wb')
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    port = 8999
    s.bind(('', port))
    s.listen(5)

    client_socket, address = s.accept()
    data = client_socket.recv(4029)
    f.write(data)
    client_socket.close()

Both the client and the server code above run without error; however, the "img.png" file that is placed on the server shows zero bytes. Will someone please show me what I am doing wrong?

Thank you,

Bo
Re: [Tutor] Newbie Trouble Processing SRT Strings In Text File
[code cut]

Hi Matt,

It looks like you're trying to write your own srt parser as part of this problem. If you're in a hurry, you may want to look at existing parsers that people have written. For example:

    https://github.com/byroot/pysrt

> But, even though it successfully kills these additional newlines that
> seem to form in the list-making process...I end up with basically a
> non-concatenated file of strings...with the right spaces I need, but
> not one big chunk of text, like I expect using the s.strip('\n')
> functionality.

Rather than immediately print the string, you may want to accumulate your results in a list. You can then do some processing on your list of strings.
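One way the "accumulate in a list" idea might look, sketched for Python 3. The function name `join_caption_lines` is hypothetical, and skipping blank lines is an assumption about what the exported file may contain.

```python
def join_caption_lines(lines):
    """Collect stripped caption lines in a list, then join them with single spaces."""
    pieces = []
    for line in lines:
        text = line.strip()  # removes the trailing newline and stray spaces
        if text:             # assumption: blank lines carry no caption text
            pieces.append(text)
    # Joining once at the end yields one continuous paragraph.
    return " ".join(pieces)


sample = [
    "In this video, we'll go over the\n",
    "common numeric and textual values\n",
    "that CSS properties can accept.\n",
]
print(join_caption_lines(sample))
```

With a real file, the list returned by f.readlines() could be passed in place of `sample`.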
Re: [Tutor] Practicing with sockets
On Fri Oct 31 2014 at 10:31:20 AM Bo Morris wrote:

> Hello all, hope everyone is doing well.
>
> I have been practicing with sockets and I am trying to send a small png
> from the client to the server.

Hey Bo,

Very cool! Socket programming is fun, because it lets your programs start talking to other programs. But it can get frustrating at times too, since it's all about communication, and we know communication can fail for so many different reasons. :P We'll try to help where we can.

Just to make sure, you are probably following the Socket HOWTO:

    https://docs.python.org/2/howto/sockets.html

Reading code...

> import socket
>
> f = open('/Users/Bo/Desktop/logo_ONEConnxt.png', 'rb')
> strf = f.read()
> client_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
> client_socket.connect(("ip.ip.ip.ip", 8999))
> client_socket.sendall(strf)
> f.close()
> exit()

This is problematic: the server code won't know up front how many bytes it should expect to read from the socket. That is, the code here is sending a "variable-length" message, and variable lengths are difficult to work with.

One common solution is to prefix the payload with a fixed-size byte length. That way, the server can read the fixed-size length first, and then run a loop that reads the rest of the bytes. This looks something like:

    import struct
    # ...
    # Send the length...
    client_socket.send(struct.pack("!I", len(strf)))
    # followed by the content
    client_socket.sendall(strf)

Your server code will symmetrically read the first four bytes, use struct.unpack() to find out how large the rest of the message is going to be, and then do a loop until it reads the exact number of bytes.

Ok, I'm reading through the server code a bit more...

> data = client_socket.recv(4029)
> f.write(data)
> client_socket.close()

You probably want to open the output file _after_ the socket has accepted. Otherwise, it seems a bit premature to open that "f" file. Also, don't forget to close the "f" file once you've finished reading the bytes.

Also note here that since recv() doesn't guarantee how many bytes you'll read at a time, the byte-reading code needs to be in a loop.

Also, I strongly suggest that you place some logging messages in both your client and server to trace where your programs are. One distinguishing feature of network programs is that they are typically long-running, and so logs help to expose what the heck they're doing at a given time. See:

    https://docs.python.org/2/howto/logging.html#logging-basic-tutorial
    https://docs.python.org/2/library/logging.html

As it stands, your server might not have ever accepted a message from your client, and you'll still see an empty file, since the code is opening the file for writing before listening for a request. That's the other reason why you want to move the file opening to _after_ the socket is accepted.
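The length-prefix scheme and the recv() loop described in this message could be sketched roughly as follows. This is a Python 3 sketch under stated assumptions, not the code from the thread: the helper names `send_message`, `recv_exactly`, and `recv_message` are hypothetical, and the 4096-byte chunk size is an arbitrary choice.

```python
import socket
import struct

# 4-byte unsigned length in network byte order, matching struct.pack("!I", ...)
HEADER = struct.Struct("!I")


def send_message(sock, payload):
    """Send the payload length first, then the payload itself."""
    sock.sendall(HEADER.pack(len(payload)) + payload)


def recv_exactly(sock, count):
    """Keep calling recv() until exactly count bytes have arrived."""
    chunks = []
    remaining = count
    while remaining > 0:
        chunk = sock.recv(min(remaining, 4096))
        if not chunk:
            # An empty result means the peer closed the connection early.
            raise ConnectionError("socket closed with %d bytes still expected" % remaining)
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def recv_message(sock):
    """Read the 4-byte length prefix, then read that many payload bytes."""
    (length,) = HEADER.unpack(recv_exactly(sock, HEADER.size))
    return recv_exactly(sock, length)
```

On the server side, the bytes returned by recv_message() would then be written to a file opened in 'wb' mode after accept() has returned.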
Re: [Tutor] Practicing with sockets
Hey Danny, yes I have been having quite a bit of fun learning to work with sockets. Thank you for your response. I have applied what you suggested with the exception of the "logging." I read through the logging docs and figured logging would be learning for another day. I have a hard enough time staying focused on one task at a time haha. I did however insert some print statements into the code so I could keep track of where it was at, but to keep my email short, I omitted them here.

After implementing what you suggested, the image file that is saved on the server is now 4 bytes, but I assume that is due to...

"Your server code will symmetrically read the first four bytes, use struct.unpack() to find out how large the rest of the message is going to be, and then do a loop until it reads the exact number of bytes"

and I have not quite got the correct loop to read all the bytes?

I also reread the docs at https://docs.python.org/2/howto/sockets.html and decided to remove the "b" from open('myfile.png', 'wb') and open('myfile.png', 'rb'), seeing how binary could be different depending on the machine and I have not yet learned how to deal with this. Would I be better off converting the image to base64 prior to sending it to the server, then decoding it on the server?

Here is my updated code...for brevity's sake, I have omitted the "import" statements...
Client:

    f = open('/Users/Bo/Desktop/SIG.png', 'r')
    strf = f.read()
    client_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client_socket.connect(("ip,ip,ip,ip", 8999))
    payload = client_socket.send(struct.pack("!I", len(strf)))
    for data in payload:
        client_socket.sendall(strf)
    f.close()
    exit()

Server:

    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    port = 8999
    s.bind(('', port))
    s.listen(5)
    client_socket, address = s.accept()
    data = client_socket.recv(4029)
    f = open('img.png', 'w')
    for item in data:
        f.write(item)
        f.flush()
    f.close()
    client_socket.close()

At least I am getting 4 bytes as opposed to 0 like I was getting before.
Re: [Tutor] Practicing with sockets
Ok, so I finally got all the bytes to be transferred to the server; however, I am unable to open the image on the server. Although the file is saved as a png file on the server, the server does not recognize the file as png format?

I changed the loops to the following...

Client:

    f = open('/Users/Bo/Desktop/SIG.png', 'r')
    strf = f.read()
    client_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client_socket.connect(("25.78.28.110", 8999))
    while True:
        client_socket.send(struct.pack("!I", len(strf)))
        data = client_socket.sendall(strf)
        if not data:
            break
    f.close()
    print "Data Received successfully"
    exit()

Server:

    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    port = 8999
    s.bind(('', port))
    s.listen(5)
    client_socket, address = s.accept()
    f = open('img.png', 'w')
    while True:
        data = client_socket.recv(4029)
        f.write(data)
        if not data:
            break
    #f.flush()
    f.close()
    client_socket.close()
Re: [Tutor] Practicing with sockets
> I also reread the docs at https://docs.python.org/2/howto/sockets.html and
> decided to remove the "b" from open('myfile.png', 'wb') and
> open('myfile.png', 'rb'), seeing how binary could be different depending on
> the machine and I have not yet learned how to deal with this.

Whoa, wait. I think you're misunderstanding the point of binary mode.

You _definitely_ need binary mode on when working with binary file formats like PNG. Otherwise, your operating system environment may do funny things to the file content like treat the 0-character (NULL) as a terminator, or try to transparently translate line ending sequences.

> Would I be better off converting the image to base64 prior to sending it to
> the server, then decoding it on the server?

The socket approach is low-level: all you've got is a pipe that can send and receive bytes. It's _all_ binary from the perspective of the network layer. base64-encoding and decoding these bytes won't harm anything, of course, but I don't see it helping either.
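To make the binary-mode point concrete, here is a small Python 3 sketch. The helper names are hypothetical, and passing newline="\r\n" is only an assumption used to imitate, on any platform, the kind of line-ending translation that text mode can perform.

```python
import os
import tempfile

# The first eight bytes of every PNG file; note the embedded \r\n and \n.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def roundtrip_binary(payload):
    """Write bytes with 'wb' and read them back with 'rb'; nothing is altered."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, "wb") as out:
            out.write(payload)
        with open(path, "rb") as src:
            return src.read()
    finally:
        os.remove(path)


def text_mode_translation(text):
    """Write text in text mode with newline translation, then read the raw bytes."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        # newline="\r\n" imitates a platform that rewrites "\n" on output.
        with open(path, "w", newline="\r\n") as out:
            out.write(text)
        with open(path, "rb") as src:
            return src.read()
    finally:
        os.remove(path)
```

Here roundtrip_binary(PNG_SIGNATURE) returns the signature unchanged, while text_mode_translation("a\nb") returns b"a\r\nb": one extra byte that would be enough to corrupt an image file.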