[kernel-xen] kernel page allocation failure on numa

Glenn Enright glenn at rimuhosting.com
Thu Apr 2 12:41:14 AEDT 2015


Hi all

We are seeing the kind of error shown below on some of our larger Xen
hosts. The pattern seems to relate to an old issue around NUMA settings
in the kernel, e.g.
http://rhaas.blogspot.co.nz/2014/06/linux-disables-vmzonereclaimmode-by.html

We are using 3.14.33-1. The thing that seems to fix it is the
following sysctl setting:

sysctl -w 'vm.zone_reclaim_mode=1'
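
To make that persist across reboots, a minimal sketch (assuming the
stock /etc/sysctl.conf is read at boot; adjust the path for your
distro) would be something like:

# append the setting and reload sysctl values (illustrative snippet)
echo 'vm.zone_reclaim_mode = 1' >> /etc/sysctl.conf
sysctl -p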

All our VMs are pinned to specific CPUs, and we are using mainly Intel
CPUs. To my understanding the pinning is best practice, so it makes
sense to have that setting on by default.
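
For reference, by pinning I mean the usual Xen vCPU pinning; a minimal
sketch with xl (the domain name and CPU numbers here are purely
illustrative) looks like this:

# pin vcpu 0 of the guest "guest1" to physical cpu 2
xl vcpu-pin guest1 0 2
# show the resulting vcpu-to-cpu assignments
xl vcpu-list guest1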

Given this kernel is tuned for Xen servers, I wonder if it is worth
patching it to set that to 1 by default? Especially with so many
multiprocessor machines coming out now, this is going to be an
increasingly common setting. Does anyone know if this could affect
ARM-based machines as well?
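
If anyone wants to check whether their host is in the same situation,
the following (assuming numactl is installed) shows the NUMA nodes and
inter-node distances the kernel sees, plus the current reclaim mode:

# list NUMA nodes, per-node memory and inter-node distances
numactl --hardware
# current setting; 0 means allocations spill to remote nodes first
cat /proc/sys/vm/zone_reclaim_mode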

Regards, Glenn
http://ri.mu - Startups start here.
Hosting. DNS. Web Programming. Email. Backups. Monitoring.


kswapd0: page allocation failure: order:1, mode:0x204020
CPU: 0 PID: 37 Comm: kswapd0 Tainted: GF 3.14.33-1.el6xen.x86_64 #1
Hardware name: Supermicro X9DR3-F/X9DR3-F, BIOS 4.6.5 02/08/2012
   0000000000000001 ffff88007a203618 ffffffff8161e672 0000000000000010
   0000000000204020 ffff88007a2036a8 ffffffff811415db fffffffe00203638
   ffff880101b6cb38 0100000000000000 0000000000000041 fffffffe00203601
Call Trace:
   <IRQ>  [<ffffffff8161e672>] dump_stack+0x49/0x5f
   [<ffffffff811415db>] warn_alloc_failed+0xeb/0x150
   [<ffffffff811447bb>] __alloc_pages_nodemask+0x74b/0xaa0
   [<ffffffff811447bb>] ? __alloc_pages_nodemask+0x74b/0xaa0
   [<ffffffff8161e5f4>] ? printk+0x4d/0x4f
   [<ffffffff8118fba8>] kmem_getpages+0x78/0x170
   [<ffffffff81190703>] fallback_alloc+0x193/0x240
   [<ffffffff81190494>] ____cache_alloc_node+0x94/0x170
   [<ffffffff8119197b>] kmem_cache_alloc+0x11b/0x1c0
   [<ffffffff81541a48>] sk_prot_alloc+0x48/0x150
   [<ffffffff81541c60>] sk_clone_lock+0x20/0x320
   [<ffffffff8159de66>] inet_csk_clone_lock+0x16/0xd0
   [<ffffffff815b8893>] tcp_create_openreq_child+0x23/0x4e0
   [<ffffffff815b5111>] tcp_v4_syn_recv_sock+0x41/0x2b0
   [<ffffffff815b85c7>] tcp_check_req+0x237/0x4e0
   [<ffffffff815b6739>] tcp_v4_do_rcv+0x339/0x4a0
   [<ffffffff815b8196>] tcp_v4_rcv+0x696/0x700
   [<ffffffff81592a40>] ? ip_rcv+0x3a0/0x3a0
   [<ffffffff81592ae8>] ip_local_deliver_finish+0xa8/0x230
   [<ffffffff81592cf8>] ip_local_deliver+0x88/0x90
   [<ffffffff815922f9>] ip_rcv_finish+0x119/0x380
   [<ffffffff81592965>] ip_rcv+0x2c5/0x3a0
   [<ffffffff815bbe32>] ? tcp4_gro_receive+0xf2/0x140
   [<ffffffff81558a6e>] __netif_receive_skb_core+0x5fe/0x7a0
   [<ffffffff81558c37>] __netif_receive_skb+0x27/0x70
   [<ffffffff81558ebd>] netif_receive_skb_internal+0x2d/0x90
   [<ffffffff81559b98>] napi_gro_receive+0x98/0x100
   [<ffffffffa00ee226>] igb_poll+0x6b6/0x1040 [igb]
   [<ffffffff810c0429>] ? handle_irq_event_percpu+0xc9/0x200
   [<ffffffff81559271>] net_rx_action+0x111/0x2a0
   [<ffffffff8107253c>] __do_softirq+0xfc/0x2b0
   [<ffffffff810727fd>] irq_exit+0xbd/0xd0
   [<ffffffff813b88e5>] xen_evtchn_do_upcall+0x35/0x50
   [<ffffffff8162d0be>] xen_do_hypervisor_callback+0x1e/0x30

