Problem with reading a file and executing other stuff?

2007-11-01 Thread Horinius

I've been struggling with the following code, which reads a text file
(test.txt) and counts the number of lines.  Well, I know there are simpler
ways to count the number of lines in a text file, but that's not the point
of this post.
__

n=0
cat test.txt |
while read line
do
    n=$((n+1))
    echo "${line}"
done

echo "$n"
__

The result of the last echo is zero, meaning that n is never incremented
inside the while loop.  It seems to me that, apart from echoing the lines,
nothing done inside this while loop has any effect afterwards.

Pitfall? Bug?  Or feature?

I'd appreciate it if somebody could shed some light on this.






Re: Problem with reading a file and executing other stuff?

2007-11-02 Thread Horinius


Paul Jarc wrote:
> 
> Read entry E4 in the bash FAQ:
> http://tiswww.case.edu/php/chet/bash/FAQ
> 

OK, I see: the problem comes from the use of a pipeline, which triggers the
creation of a subshell (why should it do so?  -- no need to answer this
question :p ).  I've read that section several times, but I'm not sure how
to use IFS.

However,
http://en.wikipedia.org/wiki/Bash_syntax_and_semantics#I.2FO_redirection
seems to give some hints.  Need to try it.
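
If I read that page correctly, the file-descriptor variant would look
roughly like this (an untested sketch, using fd 6 as the spare descriptor):
__

exec 6< test.txt      # open test.txt on spare file descriptor 6
n=0
while read line <&6
do
    n=$((n+1))
    echo "${line}"
done
exec 6<&-             # close fd 6

echo "$n"
__

Since there's no pipeline, the while loop runs in the current shell and n
keeps its value after the loop.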






Re: Problem with reading a file and executing other stuff?

2007-11-02 Thread Horinius


Paul Jarc wrote:
> 
> If you're reading from a regular file, you can just eliminate the
> useless use of cat:
> while read line; do ...; done < test.txt
> 

Oh yes!  This is a lot better and syntactically simpler than using file
descriptor 6 (which is nevertheless also a working solution).
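
For the record, here's the original counting script rewritten that way; it
prints the right count, since the loop no longer runs in a subshell:
__

n=0
while read line
do
    n=$((n+1))
    echo "${line}"
done < test.txt

echo "$n"
__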

It's a pity the filename can't be put before the while loop; that would be
a lot easier to read, especially when the while loop is very long.  (Once
more, no need to answer this comment of mine :p )

Are there any pitfalls in using this solution of yours?  You talked about a
"regular file"; what is that supposed to be?  Text file vs binary file?

I've found that if the last line isn't terminated by a new-line, that line
can't be read.  This seems to be a very common error and I've seen it in
other commands.
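
A workaround I've come across (just a sketch, I haven't tested it much) is
to also check whether read left anything in the variable when it hits
end-of-file, so an unterminated last line still gets processed:
__

n=0
while read line || [ -n "$line" ]
do
    n=$((n+1))
    echo "${line}"
done < test.txt

echo "$n"
__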






Re: Problem with reading a file and executing other stuff?

2007-11-02 Thread Horinius



Hugh Sasse wrote:
> 
> On Fri, 2 Nov 2007, Horinius wrote:
>> I've found that if the last line isn't terminated by a new-line, that
>> line
>> can't be read.  This seems to be a very common error and I've seen it in
>> other commands.
> 
> This is a Unix convention.  I don't know the origins.
> 

I was not talking about newline vs carriage-return vs carriage-return plus
newline.

I was saying that if the last character of the last line is also the last
character of the file (i.e. there is no final newline), then that line isn't
read.  Or is that really what you're referring to as the "Unix convention"?
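
A quick way to reproduce what I mean (printf writes two "lines" here, but
no final newline):
__

printf 'first\nsecond' |
while read line
do
    echo "got: ${line}"
done
__

This prints only "got: first"; the unterminated "second" is never echoed.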





Re: Problem with reading a file and executing other stuff?

2007-11-08 Thread Horinius


Hugh Sasse wrote:
> 
> And vi warns about it in a similar way to ed.
> 
> Again, what problem are you trying to solve, if any?  
> 
I'm doing some processing on a big file which is well formatted.  It's sort
of a database table (or a CSV file, if you like).  Every line contains a
unique element that determines what should be done.  Of course, I could
run a grep on the file for each element, but that would give a
complexity of O(n^2).

I know that every line is processed only once, and the pointer to the
current line never goes back.  So I figured I could read every line into
an array element and process the file line by line.  That would be O(n),
which is much faster.
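
In case it helps, what I have in mind is roughly this (a sketch; table.txt
stands for the real file):
__

i=0
while read line
do
    lines[i]="$line"     # store line number i of the file
    i=$((i+1))
done < table.txt

# lines[0] .. lines[i-1] now hold the whole file, one line per element
__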






Re: Problem with reading a file and executing other stuff?

2007-11-12 Thread Horinius


Hugh Sasse wrote:
> 
> OK, if it is in fields, like /etc/passwd, then awk is probably more
> suited to this problem than reading it directly with shell script.
> 
> If it has some delimited keyword, but each line has variable structure,
> then you'd be better using sed.
> 

The files contain something like:
aaa xxx xxx x x 
bbb xxx xxx xxx xxx xx  xx
ccc xx x  x 

aaa, bbb, ccc are the known unique elements.  No, they don't have a fixed
size.  And no, there's no delimiting keyword except the first space after
them.  The xxx's are sequences of characters that can be anything, from
numbers to letters, and of varying length.

The elements are known and unique, and I need to extract the whole line
beginning with each element.  That's why I used the "database table"
analogy.  Is awk suitable?  I know nothing about awk.


Hugh Sasse wrote:
> 
> Both of these operate linewise on their input, and can use regular
> expressions and actions in braces to produce some textual response.
> You can pass that response to `xargs -n 1` or  something.
> 

I'm not sure I understand, since I know nothing about awk.  But this could
be postponed to a later discussion if appropriate.


Hugh Sasse wrote:
> 
>> unique element that determines what should be done.  Of course, I could
>> run a grep on the file for each element, but that would give a
>> complexity of O(n^2).
> 
> Not sure how you get the O(n^2) from that unless you don't know what
> the unique elements are, but I still make that "one pass to read them
> all, one pass to execute them" [with apologies to Tolkien :-)]
>> 
>> I know that every line is processed only once, and the pointer to the
>> current line never goes back.  So I figured I could read every line into
>> an array element and process the file line by line.  That would be O(n),
>> which is much faster.
> 
> Yes, agreed.  Throw us a few example lines, fictionalised, then we may
> be able to give you an example of an approach with greater simplicity.
> 

To put it simply, the pseudo-algorithm for extracting the lines is this:

n = number of lines in the file (also the number of elements to process)
element = array(1 to n) of known elements
for i = 1 to n
   use grep or whatever to extract the whole line beginning with element(i)
   // process the line
end

Here, grep has to scan the whole file to extract one line.  In other words,
if there are 3 elements, grep scans 3 lines for each of them, so 9 lines
over the whole run.  In general, with n elements, grep scans n lines n
times, hence O(n^2).

Even if grep stops at the first occurrence of the element, it still scans
n/2 lines on average, so the time is proportional to n^2/2 and the
complexity is still O(n^2).
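
If awk really does handle this in one pass, I suppose the whole extraction
could be done with something like the following (an untested sketch;
elements.txt holds the known elements one per line, and table.txt is the
data file -- both names are made up):
__

awk 'NR == FNR { want[$1] = 1; next }   # first file: remember the elements
     $1 in want                         # second file: print matching lines
    ' elements.txt table.txt
__

Each file is read only once, so that would be the O(n) behaviour I'm after.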
