HELO,

Please check the mailing list archives. If I remember correctly, I have heard about a package that breaks after a certain time, a bug that has since been resolved. I don't remember which package it was, but you could at least try searching for it.
I know this reply is very vague, but maybe it gives you somewhere to start...

Regards,
Björn Elwhagen

On Mon, Oct 23, 2000 at 08:57:59AM +0200, Jean Orloff wrote:
> Hello, dear debian fellows!
>
> Please forgive my paranoid anonymity, in view of the last section of
> this message.
>
> 1) My problem:
>
> I have happily used debian since 1995 (0.93R6 if I recall?). But since I
> installed 2.1 on my new PC at work, about a year ago, that machine
> undergoes about one crash per month on average. Nothing to scare a
> windblows user, of course, but unbearable for someone who knows this
> should not be so. Especially as these crashes are unrecoverable: screen
> frozen, mouse/keyboard frozen (no vt switching nor clean reboot
> possible) and even no access from outside through the network. Thus no
> alternative to the brutal power switchoff, with subsequent fsck'ing of
> the whole disk.
>
> When does this happen? Always with a heavy load (2-3 users on a 128 MB
> pentium 400, each with several windows, netscape, emacs etc + some
> compilation or latex2html going on); always with at least one remote
> ssh login. I also sometimes had the impression of the mouse freezing
> temporarily before the total crash, but you know how short time
> causality can be violated in the human brain.
>
> 2) Software problems?
>
> In the beginning, I attributed this to the network interface card
> (3C905C=Tornado) that was not officially supported by Donald Becker's
> 3C59x driver. Indeed, a twin sister machine (same install, same hardware
> except for an officially supported 3C905B) had no problem whatsoever. So
> I fetched the official driver from the 3com site, tried a precompiled
> mandrake kernel with this 3com driver, but the problem remained. I then
> tried various kernel+3C59x pre- or home-compiled versions (2.13, 2.15,
> 2.17) but with each I endured at least one crash.
> I checked the NIC autoconfigured network parameters with ether-diag,
> found out half duplex generated fewer error messages but nothing more
> serious. I installed the gnu-accounting package (acct), to see the last
> commands that were executed before the crash, but found nothing special.
>
> 3) Hardware problems?
>
> Bored with switching kernels, I followed the hardware problem track.
> Despite a successful memory test at boot, maybe one of the 2 memory bars
> was bad? I ran during the summer on half memory, but it ended in a
> crash again. I switched the memory bar: problem again a month later.
> Maybe the NIC slot was bad? I switched with the soundcard last week. No
> crash yet, but I have reasons to believe it won't help.
>
> 4) Hacker/virus problems?
>
> During the very first hour of the very first install, I got port-scan
> attacked (see log below). Bad point for debian, I thought: what is the
> probability of a PC being attacked in the first hour of its connection
> to the net? Looks more probable that the attack was triggered by the
> install process! Anyway, the ports for telnet (23) and ftp (??) were
> filtered by the local router (except for local machines), I was not
> running any daemon, so I was not too scared. After watching the logs for
> about a week, I opened the machine to full internet exclusively through
> ssh connections which are not filtered by the local router.
>
> Until last week, I had no reason to think of hacker origin to my
> crashes. But last week, I got 2 crashes. And I noticed something very
> curious in the accounting logs. Among the last processes that finished
> less than 5 minutes before the crash, there was a bunch of NAMELESS root
> processes, that started at 0 unix time (Jan 1 1970) and lasted 0 seconds
> (!?). E.g:
>
> # lastcomm
>
> S20acct   S  root   ??  0.01 secs Thu Oct 19 19:40
> accton    S  root   ??  0.00 secs Thu Oct 19 19:40
> ---> reboot
>              root   ??  0.00 secs Thu Jan 1 01:00
>              root   ??  0.00 secs Thu Jan 1 01:00
>              root   ??  0.00 secs Thu Jan 1 01:00
>              root   ??  0.00 secs Thu Jan 1 01:00
>              root   ??  0.00 secs Thu Jan 1 01:00
>              root   ??  0.00 secs Thu Jan 1 01:00
>              root   ??  0.00 secs Thu Jan 1 01:00
>              root   ??  0.00 secs Thu Jan 1 01:00
>              root   ??  0.00 secs Thu Jan 1 01:00
>              root   ??  0.00 secs Thu Jan 1 01:00
>              root   ??  0.00 secs Thu Jan 1 01:00
>              root   ??  0.00 secs Thu Jan 1 01:00
>              root   ??  0.00 secs Thu Jan 1 01:00
>              root   ??  0.00 secs Thu Jan 1 01:00
>              root   ??  0.00 secs Thu Jan 1 01:00
> perl         user1  ??  0.18 secs Thu Oct 19 18:57
> sh           user1  ??  0.01 secs Thu Oct 19 18:57
>
> Suspicious, no? Such nameless processes occurred in both crashes last
> week. I kept the acct logs of the previous one too: there they were,
> less than 5 minutes before the crash. Unfortunately, I could not check
> the ones before.
>
> But even more curious is my previous machine (call it PC2), which has
> the same install but totally different (older) hardware. Users from my
> machine (call it PC1) often ssh into PC2 and vice-versa. Furthermore,
> during the portscan-attacked install, the new PC1 was bearing the name
> and address of the previously existing PC2. Anyway, for the first time
> last week, PC2 endured 2 crashes too. On one of these crashes, there
> were nameless-timeless root processes just before the crash also. But
> no sign of remote login the full day: looks more like a virus than a
> hacker? Unless the nameless root processes were in fact erasing the
> footprints of the remote-login in the various logs, and their name got
> erased by the process that crashed the machine and thus left no track.
>
> All this is a bit loose: not that many statistics to be sure. But I am
> not too keen on accumulating statistics! So I would like to be able to
> get as much info out of each crash as I can. Acct is better than
> nothing, but in fact, I have no way to know which processes were
> actually active *during* the crash. Nor how much memory was used and
> things like that.
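
On getting more out of each crash: one option is a small script that periodically appends a snapshot of the running processes and a few memory figures to a file and forces it to disk, so that the last snapshot before a freeze survives the power-off. What follows is only a minimal sketch, assuming a Linux /proc filesystem. The function names, the log path and the 60-second default interval are placeholders of mine, not an existing tool.

```python
import os
import time


def list_processes(proc_root="/proc"):
    """Return sorted (pid, command name) pairs found under proc_root."""
    procs = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # only numeric directories are processes
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                stat = f.read()
        except OSError:
            continue  # process exited between listdir() and open()
        # The command name is the parenthesised second field of the stat file.
        name = stat[stat.find("(") + 1:stat.rfind(")")]
        procs.append((int(entry), name))
    return sorted(procs)


def read_meminfo(path="/proc/meminfo"):
    """Return the first few lines of the meminfo file as one string."""
    with open(path) as f:
        return "".join(f.readlines()[:4])


def snapshot_text(now, meminfo, procs):
    """Format one snapshot: a timestamp header, memory lines, one line per process."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(now))
    lines = ["=== %s ===" % stamp, meminfo.rstrip()]
    lines.extend("%d %s" % (pid, name) for pid, name in procs)
    return "\n".join(lines) + "\n"


def append_snapshot(log_path, text):
    """Append text to log_path and push it to disk before returning."""
    with open(log_path, "a") as f:
        f.write(text)
        f.flush()
        os.fsync(f.fileno())  # otherwise the data may still sit in cache at freeze time


def log_forever(log_path, interval=60):
    """Write a snapshot every `interval` seconds until interrupted."""
    while True:
        text = snapshot_text(time.time(), read_meminfo(), list_processes())
        append_snapshot(log_path, text)
        time.sleep(interval)
```

Started as root with something like log_forever("/var/log/snapshots.log") (a hypothetical path), this would leave at most one interval of blind time before a freeze. A shorter interval narrows that window at the cost of more disk writes, and it only shows what was running, not what caused the freeze.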
>
> 5) Questions:
>
> 0) Is anybody experiencing the same kind of flakiness? If it is really
>    the install that triggered it for me, it should have triggered
>    it for others!
> 1) How can one generate nameless processes in acct logs? Can this be
>    normal?
> 2) What tools could I use to help pinpoint the problem? E.g. a
>    process accounting that would log the beginning (instead of
>    the ending) of processes...
> 3) Can a network driver really freeze the full kernel?
> 4) How can the kernel be frozen? Is there a kernel bug that propagated
>    through 2.2.13-17?
>
> Many thanks for any help!
>
> PS: you can privately reply to this mail.
>
> Annex 1: Portscan attack (november 99)
>
> 9:07:13 tcplogd: port 1114 connection attempt from
>         [EMAIL PROTECTED] [123.4.576.89]
> 9:07:13 tcplogd: port 1116 (idem)
> 9:07:15 tcplogd: port 1171   "
> 9:07:18 tcplogd: port 1174   "
> 9:07:20 tcplogd: port 1183   "
> 9:07:24 tcplogd: port 1186
> 9:07:26 tcplogd: port 1192
> 9:08:34 tcplogd: port 1195
> 9:09:10 tcplogd: port 1203
> 9:09:42 tcplogd: port 1206
> 9:10:05 tcplogd: port 1212
> 9:10:27 tcplogd: port 1215
> 9:11:03 tcplogd: port 1282
> 9:13:15 tcplogd: port 1371
> 9:14:07 tcplogd: port 1430
> 9:14:37 tcplogd: port 1433
> 9:14:48 tcplogd: port 1503
> 9:15:00 tcplogd: port 1506
> 9:18:23 tcplogd: port 1599
> 9:19:05 tcplogd: port 1634
> 9:19:13 tcplogd: port 1667
> 9:19:15 tcplogd: port 1794
> 9:19:17 tcplogd: port 1888
> 9:19:18 tcplogd: port 2042
> 9:19:20 tcplogd: port 2089
> 9:19:22 tcplogd: port 2093
> 9:21:20 tcplogd: port 2098
> 9:21:33 tcplogd: port 2103
> 9:21:35 tcplogd: port 2106
> 9:21:37 tcplogd: port 2146
> 9:21:38 tcplogd: port 2149
> 9:21:40 tcplogd: port 2153
> 9:21:42 tcplogd: port 2157
> 9:22:05 tcplogd: port 2160
> 9:24:01 tcplogd: port 2166
> 9:24:09 tcplogd: port 2169
> 9:26:10 tcplogd: port 2174
> 9:27:57 tcplogd: port 2213
> 9:27:57 tcplogd: port 2216
> 9:28:27 tcplogd: port 2221
> 9:31:17 tcplogd: port 2224
> 9:31:29 tcplogd: port 2232
> 9:31:48 tcplogd: port 2235
> 9:32:03 tcplogd: port 2243
> 9:32:16 tcplogd: port 2252
> 9:32:29 tcplogd: port 2255
> 9:32:42 tcplogd: port 2258
> 9:32:59 tcplogd: port 2266
> 9:33:23 tcplogd: port 2308
> 9:34:39 tcplogd: port 2377
> 9:34:41 tcplogd: port 2383
> 9:34:42 tcplogd: port 2386
> 9:34:45 tcplogd: port 2456
> 9:34:48 tcplogd: port 2465
> 9:35:29 tcplogd: port 2480
> 9:35:34 tcplogd: port 2545
> 9:35:38 tcplogd: port 2662
> 9:35:42 tcplogd: port 2666
> 9:35:46 tcplogd: port 2670
> 9:35:51 tcplogd: port 2857
> 9:35:58 tcplogd: port 2904
> 9:36:11 tcplogd: port 3084
> 9:36:13 tcplogd: port 3138
> 9:36:22 tcplogd: port 3141
> 9:36:36 tcplogd: port 3146
> 9:36:40 tcplogd: port 3203
> 9:36:51 tcplogd: port 3271
> 9:37:03 tcplogd: port 3329
> 9:37:15 tcplogd: port 3388
> 9:37:23 tcplogd: port 3444
> 9:37:26 tcplogd: port 3631
> 9:37:29 tcplogd: port 3689
> 9:37:32 tcplogd: port 3695
> 9:37:34 tcplogd: port 3755
> 9:37:38 tcplogd: port 3879
> 9:37:41 tcplogd: port 4003
> 9:37:43 tcplogd: port 4126
> 9:37:45 tcplogd: port 4129
> 9:37:54 tcplogd: port 4136
> 9:37:57 tcplogd: port 4142
> 9:38:03 tcplogd: port 4147
> 9:38:10 tcplogd: port 4152
> 9:38:19 tcplogd: port 4156
> 9:38:26 tcplogd: port 4159
> 9:38:28 tcplogd: port 4163
> 9:38:30 tcplogd: port 4169
> 9:38:41 tcplogd: port 4174
> 9:38:45 tcplogd: port 4180
> 9:38:54 tcplogd: port 4183
> 9:38:58 tcplogd: port 4188
> 9:39:03 tcplogd: port 4191
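
A log like the one in Annex 1 is easier to judge once it is reduced to a few numbers: how many attempts, over how long, across which port range, and whether the ports only ever go up. The sketch below does that for lines in the format quoted above. It assumes that format only (time, "tcplogd: port", number), the function names are made up for illustration, and it draws no conclusion about what the pattern means.

```python
import re

# Matches the start of a line such as "9:07:13 tcplogd: port 1114 ...".
LINE_RE = re.compile(r"^\s*(\d{1,2}):(\d{2}):(\d{2}) tcplogd: port (\d+)")


def parse_tcplogd(lines):
    """Return (seconds since midnight, port) for every line that matches."""
    events = []
    for line in lines:
        # Drop any leading "> " quote markers before matching.
        match = LINE_RE.match(line.lstrip("> "))
        if match:
            hours, minutes, seconds, port = map(int, match.groups())
            events.append((hours * 3600 + minutes * 60 + seconds, port))
    return events


def summarize(events):
    """Reduce parsed events to a handful of summary figures."""
    ports = [port for _, port in events]
    return {
        "count": len(events),
        "duration_s": events[-1][0] - events[0][0] if events else 0,
        "min_port": min(ports, default=None),
        "max_port": max(ports, default=None),
        # True when each port is higher than the one logged before it.
        "strictly_increasing": all(a < b for a, b in zip(ports, ports[1:])),
    }
```

Feeding the saved annex into summarize(parse_tcplogd(open(path))) would give those figures in one go, where path is wherever the log was saved.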