George Nyoro wrote:
Hey all, again:
Thanks for the delete thing. Helped with that problem a lot. Especially the
getattr thing came in handy.

Question 1:
How do you guys indent and put in the triple greater than signs and all
that when sending mail? Manually? Coz the last mail I sent was using the
gmail indent thing and it doesnt seem to have worked. Anyway:


Copy text including >>> signs, and any other characters you want.
Paste into your email client. The characters should appear exactly as you copied and pasted them.

If your email client (gmail) takes it upon itself to mangle your text, then use a better email client that doesn't suck. (Any software which thinks it knows what you want better than you do should be taken out and slapped with a wet fish.)


Question 2:
I have this script that downloaded a lot of files from one server into one
local directory and the files are named in a particular order and I want
them to be opened in this very same order so that I can copy the contents
of each to one main file and then convert to pdf. However, the naming
schema is like this (Roman numerals in parts) and python doesn't seem to be
able to sort it properly.
1.I.1; 1.I.2; ....; 1.I.20;
1.II.1.1;1.II.1.2
1.II.2;...1.II.10; 2.I.3;

I don't understand this. Is each line ONE file name, or are there many file names per line? Are the semi-colons part of the file names?

I'm going to *guess* that each file name has THREE sections only, with no semi-colons, and look like these 3 examples:

1.I.1
1.III.20
5.IX.7

etc. Am I close?


Because of the roman numerals that are always in the second part fo the
filename, it could sort based on how I wanted.

I listed all the files in the said directory:
files=os.listdir(location)

I tried using a sort algorithm that goes through the list starting from
index 0 and then compares each index with the next one and if they are not
arranged as they should be, the two are switched and then it starts again.
The thing is, it is so slow! It hardly goes past index three in the list
and there are so many files. *about fifty*.

Congratulations! You appear to have re-invented one of the world's slowest sorting functions, bubblesort. <wink> Look it up on Wikipedia.

Only probably with a bug in it, because even bubblesort shouldn't get stuck after only three files.

Except as a learning exercise, or to prove you can, never try to program your own sort in Python. That's like buying a car, and then putting it on the back of your push-bike and trying to cycle home with it. Uphill. In the snow. With a flat tire.

Instead, you should use Python's built-in sort function, which if you use it properly is faster than anything you can write (unless you are Tim Peters, who invented it in the first place).

Normally, you would just do something like this:

files = os.listdir(location)
files.sort()  # sort the list in-place, without making a copy


or perhaps this:

files = sorted(os.listdir(location))  # make a copy and sort


but in this case you have special requirements, thanks to the naming convention for your files. In this case, you don't want the ordinary sort order for strings, you need a special sort order based on Roman numerals.

The best way to sort these is with a technique called:

Decorate, Sort, Undecorate

also known as DSU, which sorted() supports using a key function. It sounds harder than it is. In this case, you need to decorate each file name (which is hard to sort!) with something that is easier to sort. In this case, the filenames look like:

number DOT roman number DOT number

all as one string, so the obvious way to sort them is to convert to three numbers. The first and last numbers are easy, you just use the int() function, the roman number is a bit trickier because Python doesn't have a built-in converter.

So here is my converter, and a decorate function that uses it:

def from_roman(s):
    # Convert Roman numerals into an integer.
    table = {'I': 1, 'II': 2, 'III': 3, 'IV': 4, 'V': 5, 'VI': 6,
             'VII': 7, 'VIII': 8, 'IX': 9, 'X': 10}  # Is this enough?
    return table[s.upper()]


def decorate(filename):
    a, b, c = filename.split(".")  # Split into three pieces, at the dots.
    a = int(a)  # Convert the first and last into integers.
    c = int(c)
    b = from_roman(b)  # and the middle using roman numerals
    return (a, b, c)



And finally you can do the sorting like so:


files = sorted(os.listdir(location), key=decorate)


Note that Python will do the undecorate part automatically.

Warning: I have not tested any of this. It should work, but it's not unpossible for me to have make a misteak.



--
Steven

_______________________________________________
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor
  • [Tutor] Sort George Nyoro
    • Re: [Tutor] Sort Steven D'Aprano

Reply via email to