Bizarre System Crash... Any Ideas?

Discussion in 'Tech Talk' started by Bushflyr, May 8, 2012.


  1. Bushflyr

    ʇno uıƃuɐɥ ʇsnɾ
    Millennium Member

    Joined: Mar 17, 1999
    Location: Western WA

    I noticed my server's (Ubuntu 11.10 Server) HDD light was on when nothing was accessing the server. I already had an SSH window open on my Mac, so I tried a few commands to see what was going on.

    Code:
    [/usr/local/sbin/hourly.active]: htop
    -bash: /usr/bin/htop: Input/output error
    [/usr/local/sbin/hourly.active]: sudo cat /proc/mdstat
    -bash: /usr/bin/sudo: Input/output error
    [/usr/local/sbin/hourly.active]: ls
    /usr/local/sbin/ls: line 50: 19495 Bus error               /bin/ls $@ 1>&1
    [/usr/local/sbin/hourly.active]: la
    /usr/local/sbin/ls: line 50: 19500 Bus error               /bin/ls $@ 1>&1
    [/usr/local/sbin/hourly.active]: cd
    [~]: ls
    /usr/local/sbin/ls: line 50: 19507 Bus error               /bin/ls $@ 1>&1
    [~]: top
    Segmentation fault
    
    top, htop, sudo, and ls all gave errors, but cd worked fine.
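
The one command that worked is the telling one: cd is built into bash, so it runs inside the shell process that is already in memory, while top, htop, sudo and ls are separate programs that must be read from disk before they can run. That pattern suggests reads from the OS drive itself were failing at that moment, not only from the RAID member. The type builtin shows which kind each command is (a small sketch; on the server you would list whichever names you want to check):

```shell
# For each name, "type" reports whether the shell runs it itself (a builtin,
# no disk access needed) or has to load it from a file on disk first.
for cmd in cd ls; do
    type "$cmd"
done
```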

    I tried a reboot, but wound up with a "no operating system found" sort of error. I had to go to work, so I shut down and switched off the PSU. After coming home I booted into recovery mode, powered down the system normally (halt -p), and rebooted. It came up fine except for a failed sdb in the RAID, and the array rebuilt fine onto the spare. I'm currently running a SMART test (smartctl -t long /dev/sdb), but I don't expect any errors: the RAID has dropped disks before, and they checked out fine.
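
When a member drops out, Linux md also records it in /proc/mdstat: the failed device is tagged (F), a hot spare is tagged (S), and the status field shows an underscore for each missing or rebuilding slot, e.g. [_UU]. A minimal shell sketch that checks a live or saved copy for both; md_failed and md_degraded are hypothetical helper names, and the parsing assumes the usual mdstat layout shown in the comments.

```shell
# Sketch: spot md RAID trouble in /proc/mdstat (or a saved copy of it).
# Assumes the usual Linux md layout, for example:
#   md0 : active raid5 sde1[3](S) sdd1[2] sdc1[1] sdb1[0](F)
#         3907023872 blocks level 5, 512k chunk, algorithm 2 [3/2] [_UU]

# md_failed (hypothetical helper): print each member tagged "(F)".
md_failed() {
    # Members look like NAME[ROLE], optionally followed by (F)=failed or (S)=spare.
    # grep -o prints only the matching member; sed strips the "[ROLE](F)" suffix.
    grep -o '[[:alnum:]]*\[[0-9]*](F)' "${1:-/proc/mdstat}" | sed 's/\[.*//'
}

# md_degraded (hypothetical helper): succeed if any status field such as
# [UU_U] contains "_", i.e. a member slot is missing or still rebuilding.
md_degraded() {
    grep -q '\[[U_]*_[U_]*]' "${1:-/proc/mdstat}"
}
```

For the example line in the comments, md_failed prints sdb1, and `md_degraded && echo "array is degraded"` makes a quick one-line check.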

    It seems odd, though, that a single failed RAID disk (the OS is on a separate drive) would take the whole system down.

    Any thoughts?
     

  2. gemeinschaft

    AKA Fluffy316

    Joined: Feb 7, 2004
    Location: Houston, TX

    I'm not sure, but I would consider setting up a cron job to check your disks daily, so you can tell whether you just had a bad drive or something else is going on.
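
A minimal sketch of such a job as an /etc/cron.d entry. It assumes smartmontools is installed (on Ubuntu it provides /usr/sbin/smartctl) and that the RAID members are /dev/sdb through /dev/sde; the file name raid-smart and the device list are placeholders to adjust. The first entry starts a short self-test on each disk; an hour later the second entry prints only problems (a failing health check or self-tests that logged errors), and cron mails any output to root if local mail delivery is configured.

```
# /etc/cron.d/raid-smart  (hypothetical file name; device list is an assumption)
# m  h  dom mon dow user command
# 02:30 daily: start a short SMART self-test on each RAID member
30 2 * * * root for d in /dev/sdb /dev/sdc /dev/sdd /dev/sde; do /usr/sbin/smartctl -q silent -t short "$d"; done
# 03:30 daily: print only problems (failed health check, self-tests that logged errors)
30 3 * * * root for d in /dev/sdb /dev/sdc /dev/sdd /dev/sde; do /usr/sbin/smartctl -q errorsonly -H -l selftest "$d"; done
```

smartd, which ships with smartmontools, can schedule the same tests and mail on failure from /etc/smartd.conf, and may be the tidier long-term option.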
     

  3. Linux3

    Joined: Dec 31, 2008

    There's not enough info about your system to say much, but check the logs:

    Code:
    cd /var/log
    ls -al

    Look at the timestamps on dmesg and syslog, then:

    Code:
    grep sdb dmesg
    grep sdb syslog

    Or use dmesg.0 and syslog.1 (or whichever rotated files match the time of the problems).

    Any errors?
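
Compressed rotations are easy to miss: on a stock Ubuntu install, logrotate keeps syslog.1 as plain text but gzips older copies (syslog.2.gz and up), which plain grep can't read. A minimal sketch that searches current and rotated syslog, kern.log and dmesg files in one pass; search_logs is a hypothetical helper name, and the file names assume Ubuntu's default layout under /var/log.

```shell
# search_logs (hypothetical helper): grep current and rotated logs for a
# device name such as "sdb". zgrep reads gzip-compressed and plain files alike.
# Usage: search_logs DEVICE [LOGDIR]   (LOGDIR defaults to /var/log)
search_logs() {
    dev=$1
    logdir=${2:-/var/log}
    for f in "$logdir"/syslog* "$logdir"/kern.log* "$logdir"/dmesg*; do
        [ -f "$f" ] || continue    # skip glob patterns that matched no file
        # Print a header only for files that actually contain a match.
        if hits=$(zgrep -i "$dev" "$f"); then
            printf '== %s\n%s\n' "$f" "$hits"
        fi
    done
    return 0
}
```

Run it as root or as a member of the adm group (which can read /var/log/syslog on Ubuntu), e.g. `search_logs sdb`, and compare the timestamps of any hits with the time of the crash.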
     
  4. Bushflyr

    ʇno uıƃuɐɥ ʇsnɾ
    Millennium Member

    Joined: Mar 17, 1999
    Location: Western WA

    Thanks for the ideas. I've gone through all the log files and there's nothing there. I'll try a smartctl cron job, but I don't expect it to turn up much. The drives are all new, and I've run a long test after each failure with no errors. Different drives have dropped out at different points, but it has been running reliably for a few weeks now with no problems. :dunno:
     
  5. Detectorist

    Joined: Jul 16, 2008
    Location: Missouri

    Windows 7 would have prevented that.


    :rofl:
     
  6. Bushflyr

    ʇno uıƃuɐɥ ʇsnɾ
    Millennium Member

    Joined: Mar 17, 1999
    Location: Western WA

    If by "would have prevented that" you mean "would have prevented my even installing a RAID since Win7 wouldn't know a RAID if it bit it on the ASSH," then yes, you are correct.

    Oh, wait, it doesn't do ASSH either. :upeyes:
     
    #6 Bushflyr, May 11, 2012
    Last edited: May 11, 2012
  7. Detectorist

    Joined: Jul 16, 2008
    Location: Missouri

    Win 7 Professional/Ultimate supports mirrored RAID (RAID 1).
     
  8. Bushflyr

    ʇno uıƃuɐɥ ʇsnɾ
    Millennium Member

    Joined: Mar 17, 1999
    Location: Western WA

    I know. But adding in the exception ruined the lyrical flow. :supergrin:

    And the intent is still correct, since Windows 7 Professional Ultimate Super Duper Apex Pinnacle etc., etc. still doesn't do RAID 5 (which is what I'm using), RAID 6, or any sort of nested RAID. It does RAID 1. And I'm purposely leaving out "RAID" 0 because it's not really RAID: there's nothing Redundant in it.
     
    #8 Bushflyr, May 12, 2012
    Last edited: May 12, 2012
  9. jarubla

    Dos Pistolas

    Joined: Feb 16, 2010
    Location: UT

    RAID 5 is single parity, right? Can you ID which disk failed or hiccuped? Any chance more than one disk reported an issue, maybe even during the rebuild? It smells like a possible RAID rebuild issue to me, and disks sometimes do funny things at the worst possible times. That's one of the main reasons I'm a RAID 6 guy. The extra disk costs more, but the array can survive two failed disks at once.

    Are you able to parse through any log files to pinpoint when the issue occurred? I'm hoping an error message can be pulled that we can run through the Ubuntu bug tracker:

    https://bugs.launchpad.net/ubuntu

    Also, as a side note, I just saw your thread over on http://ubuntuforums.org; I'm following this one too, since I'm curious about the outcome.

    -Jay
     
  10. Bushflyr

    ʇno uıƃuɐɥ ʇsnɾ
    Millennium Member

    Joined: Mar 17, 1999
    Location: Western WA

    RAID 5 is single parity, but I'm also running a hot spare, so there is some extra safety there. I've lost sdb and sde at one point or another, but no errors ever showed up when scanning the drives afterward, and I re-added them to the RAID without issue.
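
For comparison, here is the back-of-the-envelope capacity arithmetic for n identical disks of size s, where n counts every disk including any hot spare (a generic sketch that ignores filesystem and metadata overhead):

```latex
\begin{aligned}
\text{RAID 5:} &\quad C = (n-1)\,s &&\text{(survives one failed disk)}\\
\text{RAID 5 + 1 hot spare:} &\quad C = (n-2)\,s &&\text{(one failure at a time; rebuild starts immediately)}\\
\text{RAID 6:} &\quad C = (n-2)\,s &&\text{(survives any two simultaneous failures)}
\end{aligned}
```

With four 2 TB drives, for example (hypothetical sizes), RAID 5 plus a hot spare and RAID 6 both give (4 - 2) x 2 TB = 4 TB usable. The difference is that RAID 6 keeps the array intact if a second disk drops out while the first is still rebuilding, which is the dual-failure case raised earlier in the thread.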

    The first couple of times it happened I suspected the cables, but there were no I/O errors and nothing in any of the log files. At this point I'm wondering whether it's flaky power in my house (I haven't gotten a UPS yet, but it's on the list). I recall reading somewhere that RAIDs are particularly sensitive to power fluctuations, and all my lights dim for a second when the wife turns on the hair dryer.

    Also, previous RAID failures never took down the OS. Everything is back up and running fine, so at this point :dunno:
     
Similar Threads Forum Date
Any ideas? Reloading Aug 6, 2011
any ideas Business Forum Jun 25, 2008
Great spyware ?? Any ideas? Tech Talk Jan 16, 2007
WEIRD Problem With System...Any Ideas??? Tech Talk May 3, 2005