On Sun, Mar 09, 2014 at 03:22:35PM -0400, street.swee...@mailworks.org wrote: > - In the get_long_names() function, the for/if thing is reading > the whole fileNames.tab file every time, isn't it? In reality, > the file was only a few dozen lines long, so I suppose it doesn't > matter, but is there a better way to do this? Yes, read the file once and build a dictionary mapping the first column to the second column.
> - Really, I wanted to create a new sequence number at the end of > each file name, but I thought this would be difficult. In order > for it to count from 01 to whatever the last file is per set p01, > p02, etc, it would have to be aware of the set name and how many > files are in it. So I settled for getting the last 3 digits of > the original file name using splitext(). The strings were unique, > so it worked out. However, I can see this being useful in other > places, so I was wondering if there is a good way to do this. > Is there a term or phrase I can search on? Buckets? You'll need a function to differentiate the files into the desired groups(buckets). You can use that function to generate a dictionary mapping a bucket name to a list of files(the bucket). Iterate over all the files, sorting them into buckets. Then iterate over the buckets, increasing the counter every file, and reseting the counter in between buckets. Here's how I'd do it for your script: def get_bucket_key(filename): prefix = filename.split('_')[0] return prefix def sort_files_into_buckets(file_list): buckets_to_files = {} for filename in file_list: key = get_bucket_key(filename) if key in buckets_to_files: buckets_to_files[key].append(filename) else: buckets_to_files[key] = [filename] return buckets_to_files file_list = os.listdir(...) buckets_to_files = sort_files_into_buckets(file_list) for (bucket_key, bucket_file_list) in buckets_to_files.items(): counter = 1 for file in bucket_file_list: print "{0}_{1}.jpg".format(bucket_key, counter) counter += 1 > - I'd be interested to read any other comments on the code. > I'm new to python and I have only a bit of computer science study, > quite some time ago. You should read PEP8[http://legacy.python.org/dev/peps/pep-0008/] for the basic Python style guide. If you'll be writing things to share or come back to after a while, it would be good to learn about documentation, start by reading PEP257[http://legacy.python.org/dev/peps/pep-0257/]. Note that I've written my own versions alongside my comments. I'm not sure if that's OK on this mailing list, and if you want a spoiler-free version with just the commentary and no code let me know and I'll resend it: > # get longnames from fileNames.tab > def get_long_name(glnAbbrev): Don't be afraid to use longer function/argument names and skip the comment, your code will end up much more readable. Does the "gln" in "glnAbbrev" stand for "get_long_name"? It seems repetitive to include the name of the function in the name of it's parameters. And I think we're passing the function a prefix, not an abbreviation: def get_long_names_from_file(prefix): > with open( > os.path.join(os.path.expanduser('~'),'temp2','fileNames.tab') > ) as filenames: Good, pythonic usage of context managers but it took a second longer to click than splitting it out into 2 lines. Also you should add spaces after each parameter in a function call: filenames_path = os.path.join(os.path.expanduser('~'), 'temp2', 'fileNames.tab') with open(filenames_path) as input_file: > filenamesdata = csv.reader(filenames, delimiter='\t') > for row in filenamesdata: > if row[0] == glnAbbrev: > return row[1] This is where you would build a dictionary. You could skip the csv module and just iterate through the file, splitting each line to get the prefix and long_name. The split() function splits a string into a list of strings. If every line in this file has only 2 columns, we can unpack the list directly into 2 variables. I usually name my dictionary variables "key_to_value": prefix_to_long_name = {} for line in input_file: prefix, long_name = line.split() prefix_to_long_name[prefix] = long_name return prefix_to_long_name This could also be done by dictionary comprehensions: return {prefix: long_name for (prefix, long_name) in (line.split() for line in input_file)} Then just return your finished dictionary. Note that since we return a dictionary instead of searching for a specific prefix, we do not need our "prefix" parameter: def get_long_names_from_file(): But this sounds like we are returning a list of long_names, not a dictionary so I would rename it again to: def get_prefix_to_long_name_from_file() > # find shortname from slice in picture filename > def get_slice(fn): > threeColSlice = fn[1:4] > return threeColSlice I agree that this is overkill but with the suggestions above we can at least make it nicer. Variables should be lower_case_with_underscores: def get_prefix_from_filename(filename): prefix = filename[1:4] return prefix I try to avoid hardcoded splices, so I would opt to split by underscores and keep the leading "p" (at least document the expected filenames somewhere in the script so people understand what those indexes represent): def get_prefix_from_filename(filename): prefix = filename.split('_')[0] return prefix > # get 3-digit sequence number from basename > def get_bn_seq(fn): > seq = os.path.splitext(fn)[0][-3:] > return seq Again you could use better names and a split instead of a splice. The fact that the function uses the basename instead of the full filename is not something that needs to be exposed in the function name: def get_number_from_filename(filename): base_filename = os.path.splitext(filename)[0] number = filename.split('_')[-1] return number I would also write 2 more functions. The first to get paths to the temp2/4/5 folders and prevent code repetition(Don't Repeat Yourself): def get_folder_in_home(folder): folder_path = os.path.join(os.path.expanduser('~'), folder) Then we can change the relevant line in get_long_names_from_file(): folder_path = get_folder_in_home('temp2') filenames_path = os.path.join(folder_path, 'fileNames.tab') The second function would return the new name of a file: def get_new_filename(filename, prefix_to_long_name): prefix = get_prefix_from_filename(filename) long_name = prefix_to_long_name[prefix] number = get_number_from_filename(filename) new_filename = long_name + "_-_" + number + ".jpeg" return new_filename Using String Formatting is safer than concatenation(which would fail if number was an int): new_filename = "{0}_-_{1}.jpeg".format(long_name, number) Now that we're past the definitions we should build our prefix->long_name dictionary: prefix_to_long_name = get_prefix_to_long_name_from_file() > # directory locations > indir = os.path.join(os.path.expanduser('~'),'temp4') > outdir = os.path.join(os.path.expanduser('~'),'temp5') The input and output directories are set and then never change. We can indicate this to the reader by writing the variable in ALL_CAPS: INPUT_DIRECTORY = get_folder_in_home('temp4') OUTPUT_DIRECTORY = get_folder_in_home('temp5') > # rename > for f in os.listdir(indir): > if f.endswith(".jpg"): > os.rename( > os.path.join(indir,f),os.path.join( > outdir, > get_long_name(get_slice(f))+"_-_"+get_bn_seq(f)+".jpeg") > ) I would at least rename "f" to "input_file". We could also use a list comprehension to generate the list and rename in one step: [os.rename(os.path.join(INPUT_DIRECTORY, image_file), os.path.join(OUTPUT_DIRECTORY, get_new_filename(image_file, prefix_to_long_name)) ) for image_file in os.listdir(INPUT_DIRECTORY) if image_file.endswith(".jpg")] Although that's a bit much: for input_file in os.listdir(INPUT_DIRECTORY): if input_file.endswith(".jpg"): input_path = os.path.join(INPUT_DIRECTORY, input_file) new_filename= get_new_filename(input_file, prefix_to_long_name) output_path = os.path.join(OUTPUT_DIRECTORY, new_filename) os.rename(input_path, output_path) Some would also stick all of the top-level scripting into a function called main. You set the main() function to run automatically only if the script is called as a file. It won't execute if imported as a module in another script: if __name__ == '__main__': main() Here's the fully revised script: import os from os.path import join def main(): INPUT_DIRECTORY = get_folder_in_home('temp4') OUTPUT_DIRECTORY = get_folder_in_home('temp5') prefix_to_long_name = get_prefix_to_long_name_from_file() for input_file in os.listdir(INPUT_DIRECTORY): if input_file.endswith(".jpg"): new_filename = get_new_filename(input_file, prefix_to_long_name) input_path = join(INPUT_DIRECTORY, input_file) output_path = join(OUTPUT_DIRECTORY, new_filename) os.rename(input_path, output_path) def get_folder_in_home(folder): folder_path = join(os.path.expanduser('~'), folder) return folder_path def get_prefix_to_long_name_from_file(): prefix_to_long_name = {} filenames_path = join(get_folder_in_home('temp2'), 'fileNames.tab') with open(filenames_path) as input_file: for line in input_file: prefix, long_name = line.split() prefix_to_long_name[prefix] = long_name return prefix_to_long_name def get_prefix_from_filename(filename): prefix = filename.split('_')[0] return prefix def get_number_from_filename(filename): base_filename = os.path.splitext(filename)[0] number = base_filename.split('_')[-1] return number def get_new_filename(filename, prefix_to_long_name): prefix = get_prefix_from_filename(filename) long_name = prefix_to_long_name[prefix] number = get_number_from_filename(filename) new_filename = "{0}_-_{1}.jpeg".format(long_name, number) return new_filename if __name__ == '__main__': main()
signature.asc
Description: GnuPG Digital Signature
_______________________________________________ Tutor maillist - Tutor@python.org To unsubscribe or change subscription options: https://mail.python.org/mailman/listinfo/tutor