[kernel-xen] Kernel Panic

Wed May 18 09:04:12 AEST 2016

Steve,

I thought I had a response from you somewhere I could reply to but I can’t find it. Perhaps it was a note in the tracker.

I’m reporting back as requested, in case anyone else runs into a similar issue.

I think this turned out to be a combination of a couple of things that were not the fault of the kernel.

1) I use fibre channel from each of 4 Dell R610 Xen hosts to FreeNAS on an R710, directly connected with T/R reversed on one end. As far as I can figure out, FreeNAS does not allow me to map a file extent/LUN to a particular port. So, all 4 extents show up as devices in /dev on each server. I had the extents mounted by the dev id on each server. Upon a reboot of FreeNAS I found out that one of the extents was mounted by 2 different servers over FC. This clearly threw a bunch of woes my way. I *seem* to have resolved it by mounting the volumes by UUID rather than by device. So far this has withstood a couple of reboots with all extents mounted on each server correctly.
2) When setting up these Xen hosts I put a USB thumb drive inside each one with a full install of CentOS 7 on it for rescue purposes. I did not disable the USB port via the BIOS once I was done setting up the rescue thumb drive. When I installed CentOS on the RAID, for some reason, it included the thumb drive in the kernel init string and also in fstab. Recipe for disaster! All the swap on the thumb drive killed it to the point where it would intermittently work and intermittently cause kernel panics. Looking into the USB errors showing up in the console at boot, the only thing I found was a few threads saying the error is thrown by a USB device drawing too much power. I removed the thumb drive and all references to it in fstab and in the kernel init string in grub.conf, rebooted the server, and it’s been up now for a few weeks without a kernel panic. On the other three R610 Xen hosts, I simply disabled the thumb drives in the BIOS and removed the references as described above. This ensures they are there for emergency purposes and can be enabled as needed. Thank God for a KVM over IP. This makes it easy.
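
For anyone wanting to check their own hosts for the same two problems, here is a minimal sketch in Python that scans fstab text for entries referenced by kernel device name (such as /dev/sdc1) instead of UUID= or LABEL=. The sample fstab contents, the mount points, and the function name unstable_entries are all made up for illustration; they are not copied from my servers.

```python
# Sketch: flag fstab entries that reference a disk by kernel device name.
# The sample entries below are hypothetical, not taken from the hosts in this thread.

SAMPLE_FSTAB = """\
# /etc/fstab (hypothetical example)
UUID=0a1b2c3d-1111-2222-3333-444455556666  /          ext4  defaults  1 1
/dev/sdc1                                  /mnt/lun0  ext4  defaults  0 0
/dev/sdf2                                  swap       swap  defaults  0 0
/dev/mapper/vg0-data                       /data      ext4  defaults  0 0
"""

# /dev/sdX names are handed out in discovery order and can change between boots.
UNSTABLE_PREFIXES = ("/dev/sd",)


def unstable_entries(fstab_text):
    """Return (device, mount_point) pairs that use a kernel device name."""
    found = []
    for line in fstab_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        if len(fields) < 2:
            continue  # malformed line, ignored in this sketch
        device, mount_point = fields[0], fields[1]
        if device.startswith(UNSTABLE_PREFIXES):
            found.append((device, mount_point))
    return found


if __name__ == "__main__":
    for device, mount_point in unstable_entries(SAMPLE_FSTAB):
        print(f"{device} -> {mount_point}: consider UUID= or LABEL= instead")
```

On a real host the contents of /etc/fstab would be passed in instead of the sample text, and blkid will report the UUID to use for each device. The same check is worth doing by eye on the kernel line in grub.conf.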

I never did figure out why "module scsi_wait_scan not found" kept flooding the console, but once the issues above were fixed, it has not returned.

Anyway, long-winded, yes, but at least now anyone who hits any of these issues will know what the cause is.

Thanks again Steve for being of great assistance and for your contributions to the OS community!

Thank you,
Steffan Cline
602-793-0014

On 4/5/16, 11:11 PM, "kernel-xen on behalf of Steffan Cline" <kernel-xen-bounces at lists.wireless.org.au on behalf of steffan at hldns.com> wrote:

>Steven,
>
>
>
>On 4/5/16, 3:22 PM, "kernel-xen on behalf of Steven Haigh" <kernel-xen-bounces at lists.wireless.org.au on behalf of netwiz at crc.id.au> wrote:
>>On 6/04/2016 1:11 AM, Steffan Cline wrote:
>>> Had an odd issue a couple weeks back where I checked on my server and
>>> there was a kernel panic. I didn’t think much of it and restarted. It
>>> booted into the default CentOS 6 kernel. Once I realized what happened I
>>> rebooted back into the Xen kernel just fine.
>>> 
>>> This morning I woke to find that the server had been crashing and
>>> rebooting for several hours.
>>> 
>>> I logged in and the cause seems to be the notorious error "module
>>> scsi_wait_scan not found". There are no good easy solutions on how to fix
>>> this.
>>
>>This isn't a crash. The error shows because the module doesn't exist.
>>I'm not keen on patching the init scripts of EL6 to remove the message
>>as that has the potential to cause more problems than to remove a
>>cosmetic only error.
>
>Can you please tell me how to patch it? I’ll happily boot back to an older kernel, change it and reboot.
>
>
>>> Most docs I came across say it was supposed to be removed a while back.
>>
>>And it has - hence you see the 'error'.
>
>That ‘error’ repeated itself about a dozen times before the computer crashed and rebooted repeatedly. I should have taken a screenshot to be able to document some of the different kernel panics I saw when trying different kernels. One I did see was usb 1-3.3: device descriptor read/8, error -110 on a couple different kernels. The RAM did pass the memory checks if that matters.
>
>
>>> Suggestions anyone? I only got my system back up by using an older Xen
>>> kernel. Hard part is my mail runs in a VM on the system. Such luck.
>>
>>Which kernel were you using? DomU configuration? Dom0 configuration? Are
>>you using PVH? Linux guest? Windows guest? 'Older Xen kernel' in what?
>>The DomU? The Dom0?
>
>I was using kernel-xen-4.4.6-2.el6xen.x86_64. I’m only a novice with virtualization but if I have it right, it’s Dom0. 
>The host is CentOS 6.7 and I had 2 CentOS 6.7 guests and a Windows Server 2008R2 guest.
>
>The older kernel I reverted to in order to get the system back up is 4.4.4-2.el6xen.x86_64.
>
>
>>
>>As a side note, I'd recommend removing the stock kernels on the Dom0 -
>>it will operate quite fine without them.
>
>I’d heard that before too, but if it weren’t for the stock kernel, I wouldn’t have been able to get it back up and check everything first. I’ll probably do that once I figure out how to fix this.
>>
>
>
>
>Thank you,
>Steffan Cline
>602-793-0014
>
>
>_______________________________________________
>kernel-xen mailing list
>kernel-xen at lists.wireless.org.au
>https://lists.wireless.org.au/mailman/listinfo/kernel-xen