Stability Problem with Mercury & Jaunty

Posted by kcoop on March 16, 2010 at 5:54pm

I had a soft lockup this morning on my 32-bit Mercury instance (AMI ami-9f9f70f6). No use of XFS at all. Rebooting failed with more soft lockups. I've appended the initial error output, fwiw.

As I searched for the issue, I ran into the following from Eric Hammond. The kernel we use sounds like the same one - is it?

[EDIT: oops - misread Eric's post. The security problem is referring to an earlier build. My bad.]

http://developer.amazonwebservices.com/connect/message.jspa?messageID=15...

danpeg1:

It was started with ami-ccf615a5 (alestic/ubuntu-9.04-jaunty-base-20091011.manifest.xml). I'm going to start up a new one now, but was curious what happened.

Eric Hammond:

If you're using XFS, that AMI uses an Amazon kernel which does not support XFS (reboot triggering the problem).

If you need XFS, you might switch to the 20090804 version of that AMI, though the Amazon kernel there has a security hole which allows any logged in user to become root.

Better yet, upgrade to the Karmic AMIs published by Canonical which use true Ubuntu kernels that do not have problems with XFS or the security hole.

BUG: soft lockup detected on CPU#0!
 [<c104a2b2>] softlockup_tick+0xa5/0xb4
 [<c1009814>] timer_interrupt+0x55f/0x5b7
 [<ee0118bd>] do_blkif_request+0x32e/0x377 [xenblk]
 [<c1129d7e>] __add_entropy_words+0x56/0x18b
 [<c104a53a>] handle_IRQ_event+0x36/0x6e
 [<c104b9f2>] handle_level_irq+0x81/0xc7
 [<c104b971>] handle_level_irq+0x0/0xc7
 [<c100719a>] do_IRQ+0xac/0xd2
 [<c114f086>] evtchn_do_upcall+0x82/0xdb
 [<c100585e>] hypervisor_callback+0x46/0x4e
 [<c1201b17>] _spin_lock+0xa/0xf
 [<c10181e9>] mm_unpin+0x14/0x23
 [<c101835f>] _arch_exit_mmap+0x167/0x16f
 [<c1062769>] exit_mmap+0x1c/0xee
 [<c101f2c3>] __cond_resched+0x25/0x3c
 [<c10219ef>] mmput+0x34/0x78
 [<c1026457>] do_exit+0x215/0x730
 [<c10063bd>] die+0x227/0x24c
 [<c10067cb>] do_invalid_op+0x0/0xab
 [<c100686d>] do_invalid_op+0xa2/0xab
 [<c101bdf3>] xen_l2_entry_update+0x8d/0x98
 [<c1057b78>] __do_page_cache_readahead+0x8b/0x202
 [<c105645c>] get_page_from_freelist+0x28c/0x33c
 [<c1201edd>] error_code+0x35/0x3c
 [<c105007b>] utrace_report_vfork_done+0x5a/0x156
 [<c101bdf3>] xen_l2_entry_update+0x8d/0x98
 [<c105e16d>] __pte_alloc+0x1e1/0x269
 [<c105fc4b>] __handle_mm_fault+0x199/0x1146
 [<c10e1738>] prio_tree_remove+0x8d/0x9b
 [<c1061366>] free_pgtables+0x70/0x7c
 [<c105a71a>] vma_prio_tree_insert+0x17/0x2a
 [<c1203862>] do_page_fault+0x72d/0xc24
 [<c10641ec>] do_mmap_pgoff+0x584/0x6ec
 [<c1203135>] do_page_fault+0x0/0xc24
 [<c1201edd>] error_code+0x35/0x3c

Comments

EBS?

Posted by joshk on March 24, 2010 at 5:46pm

Are you running off an EBS volume or have you done anything else to alter the base AMI? We've run the system through many a reboot without issues, but I've personally had some problems in the past with improperly formatted/mounted EBS volumes.

https://pantheon.io | http://www.chapterthree.com | https://www.outlandishjosh.com

I didn't have an EBS volume mounted

Posted by kcoop on March 24, 2010 at 6:27pm

I'm pretty sure it was a stock configuration (other than my own /var/www and MySQL data). It had been running for about a week. The problem wasn't so much the reboot as that the instance failed while running and was then left in a bad state. Have you had instances running for long periods of time?
