Bob, I re-ran the script with the files being output and may have changed the names. File2 was being cut into file3. The ^M (\r) was not in file2 but shows up in file3. I did the first cut at 22 so that I would have a leading space just to avoid getting the whole line. If you look at file2 in the previous message, you will see that it is indented one space on all lines. Therefore, the optvg line does have a delimiter - the leading space and that is why I am pulling field 2 instead of field 1. I tried it the way you are suggesting and still got the ^M.
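As a sanity check on the leading-space reasoning, here is a minimal sketch. The three sample lines are hypothetical stand-ins for file2 (one leading space on every line), not the actual vgdisplay output. It shows that field 2 is then the first word on each line, and that cut adds no carriage return of its own.

```shell
# Hypothetical sample mimicking file2: every line starts with one space.
sample=$(printf ' optvg\n 4 MB\n 3171 / 12.39 GB\n')

# With a leading space, field 1 is empty and field 2 is the first word.
fields=$(printf '%s\n' "$sample" | cut -f2 -d' ')
printf '%s\n' "$fields"

# Dump the bytes; a carriage return would show up as 0d / \r.
printf '%s\n' "$fields" | od -tx1 -c
```

If the od dump from the real file2 on the affected system still shows 0d, the carriage return has to be coming from the input rather than from this cut invocation.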
Unfortunately, we are not moving the actual file; we are taking the output from the screen and copying that to Excel for further processing. I never said I was a gawk scripter, but it was the only way I could think of to quickly get groups of three lines catted together. Remember, we usually have multiple groups of three lines in the input file. I would have preferred Perl, but that is not on all *nix systems there.

There are no extra ^M characters in the output from vgdisplay. I can reproduce this on that particular system repeatedly (have not tried it on sister systems yet). My main concern was a potential gotcha in the coreutils that could creep back in future releases. You may be right in that this might be a RedHat problem that they corrected in a later release. But you never know. I will be glad when the client forces everyone onto a current release.

dmg

-----Original Message-----
From: Bob Proulx [mailto:[EMAIL PROTECTED]
Sent: Friday, October 26, 2007 2:55 PM
To: Gambs, David (CONT)
Cc: [email protected]
Subject: Re: Undocumented cut feature

Gambs, David (CONT) wrote:
> From vi w/set list I have the following -
>
> file3:
> optvg^M$
> 4$
> 3171$

That shows that the carriage return was already in the file before 'cut' processed it.  That is the source of the issue.

> file2 (the one that I cut on):

But your previous example showed that you were cutting file3 into file1.

> optvg$
> 4 MB$
> 3171 / 12.39 GB$
>
> The command:
> cut -f2 -d' ' ~/file2 > ~/file3

Okay.  No carriage returns going in.

> Your suggested command gives:
> $ cut -f2 -d' ' file2 | od -tx1 -c
> 0000000  6f  70  74  76  67  0d  0a  34  0a  33  31  37  31  0a
>           o   p   t   v   g  \r  \n   4  \n   3   1   7   1  \n
                               ^^
A carriage return.  I cannot recreate this behavior on a RHEL3 machine.  Can you double check your input files?  I believe there may be a mixup in which file is which.  Your first example in the previous message showed you using file3, and the above shows that file3 contains carriage returns in the data.
Note that cut prints the entire line if no delimiter is present.

  `-f FIELD-LIST'
  `--fields=FIELD-LIST'
       Select for printing only the fields listed in FIELD-LIST.  Fields
       are separated by a TAB character by default.  Also print any line
       that contains no delimiter character, unless the
       `--only-delimited' (`-s') option is specified

I believe what is happening is that your original input data contains a carriage return.  The optvg line is the only line without any delimiters and is therefore passed through by cut.  This is why you are seeing the carriage return in the output.

> And I have found differences within RedHat on the vgdisplay.  This
> vgdisplay is in /sbin and not linked to anything.  On the system where
> the problem does not happen (newer coreutils & OS release) the command
> is /usr/sbin/vgdisplay and is linked to lvm.  Don't know where that
> would make a difference though.

You should be able to use 'rpm -qf FILE' where FILE is /sbin/vgdisplay and /usr/sbin/vgdisplay to determine what package contains that file.  I don't think vgdisplay should output carriage returns.

> cd ~
> /sbin/vgdisplay | egrep -e Name -e "PE Size" -e Free | cut -b 22- | cut -f2 -d' ' > file1
> rm file.out
> touch file.out
> gawk '
> { line0 = /[:alpha:]/ }
> { printf "%s ", $line0 >> "file.out" }
> { getline }
> { line1 = /[:print:]/ }
> { printf "%i ", $1 >> "file.out" }
> { getline }
> { line2 = /[:print:]/ }
> { print $1 >> "file.out" }
> ' file1
> rm file1

That is a very unconventional awk script!  Unfortunately I do not have the time right now to look at what it is doing in detail.

> In the gawk script when you output line0, the ^M puts the cursor at
> the beginning of the line.  The next print lines then overwrite what
> was there.  In this case optvg is completely overwritten.  A longer vg
> name would have some of it left.

Overwriting would only happen to a terminal.  The character stream would still contain all of the characters.
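To illustrate the pass-through behavior described in the manual excerpt, here is a small sketch. The input is a hypothetical reconstruction (a name line with no space delimiter and a trailing carriage return), not the actual data from the affected system.

```shell
# Hypothetical input: the first line has no space delimiter and ends in CR.
input=$(printf 'optvg\r\n 4 MB\n 3171 / 12.39 GB\n')

# Without -s, cut passes the delimiter-less first line through whole,
# carriage return included; od makes the \r visible.
passed=$(printf '%s\n' "$input" | cut -f2 -d' ' | od -c)
printf '%s\n' "$passed"

# With -s (--only-delimited), lines lacking the delimiter are dropped.
only_delimited=$(printf '%s\n' "$input" | cut -s -f2 -d' ')
printf '%s\n' "$only_delimited"
```

Under that assumption the first dump contains the same \r after optvg that appears in the od output quoted earlier, while the -s run emits only the two delimited lines.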
> $ cat file.out
> 4 3171

I think if you can debug why CRs are in the vgdisplay output and ensure that they are removed there, everything else will flow through normally.

> And all this started on HP-UX.  The script works just fine there.  It
> was when I brought it over to Linux that problems arose and
> modifications had to be made.

About the time you have ported to three different systems is when most scripts start to get portable. :-)

Bob

_______________________________________________
Bug-coreutils mailing list
[email protected]
http://lists.gnu.org/mailman/listinfo/bug-coreutils
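For what it is worth, one possible way to strip the carriage returns and join each group of three lines is sketched below. This is not the script from the thread; the printf input is a hypothetical stand-in for the cut output, and tr plus a one-line awk program are swapped in for the getline-based gawk script.

```shell
# Hypothetical stand-in for the cut output, with a stray CR on the name line.
joined=$(printf 'optvg\r\n4\n3171\n' |
  tr -d '\r' |
  awk '{ printf "%s%s", $0, (NR % 3 ? " " : "\n") }')

# Each group of three input lines becomes one space-separated output line.
printf '%s\n' "$joined"
```

Removing the CR before awk sees the data means the result looks the same on a terminal and in a file, and the NR % 3 test handles any number of three-line groups without explicit getline calls.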
