
apache 2.2 - Server Freeze Up Under Load

I'm having a problem with a Debian server. I thought it was due to bad RAM, but the problem persists.




It's a Dell PowerEdge 6800 with two dual-core 3.6 GHz Xeon processors and 5 GB of DDR2 ECC 333.



I've got a single 73GB SCSI Drive.



I'm working it to death right now, pulling records from MySQL to build asterisk .call files (small text files) which trigger SIP calls.
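For context, here is a minimal sketch of how one of those .call files could be written. It is not my production script. It assumes the usual Asterisk call-file keys (`Channel`, `Context`, `Extension`, `Priority`) and that the staging directory sits on the same filesystem as the spool directory (by default `/var/spool/asterisk/outgoing`), so the final rename is atomic and Asterisk never sees a half-written file. The function name, channel, and context shown are hypothetical placeholders.

```python
import os
import tempfile


def write_call_file(spool_dir, staging_dir, channel, context, extension, priority=1):
    """Write a call file in staging_dir, then move it into spool_dir.

    Hypothetical helper: staging_dir and spool_dir are assumed to be on
    the same filesystem so that os.rename() is an atomic move.
    """
    lines = [
        f"Channel: {channel}",
        f"Context: {context}",
        f"Extension: {extension}",
        f"Priority: {priority}",
    ]
    # Create the file outside the spool so Asterisk cannot pick it up early.
    fd, tmp_path = tempfile.mkstemp(suffix=".call", dir=staging_dir)
    with os.fdopen(fd, "w") as handle:
        handle.write("\n".join(lines) + "\n")
    final_path = os.path.join(spool_dir, os.path.basename(tmp_path))
    os.rename(tmp_path, final_path)
    return final_path
```

A call would look something like `write_call_file("/var/spool/asterisk/outgoing", "/var/spool/asterisk/staging", "SIP/example-trunk/5551234", "example-context", "s")`, where the staging path, trunk, and context names are made up for illustration.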



We manage it via a cgi interface, and the system is also running citadel for our mail, but we have less than five users. It's not a huge drain.



My peak usage seems to be about 460 calls per minute. Load hovers between 2.0 and 4.3; if I push it past that, it spikes above 22.0.
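One way to keep the dialer under that ceiling would be to pause between batches while the 1-minute load average is too high. The following is only a sketch of that idea, not something I have running; the function name, the 4.3 ceiling (taken from the numbers above), and the 5-second poll interval are assumptions.

```python
import os
import time


def wait_for_headroom(ceiling=4.3, poll_seconds=5.0,
                      get_load=os.getloadavg, sleep=time.sleep):
    """Block until the 1-minute load average is at or below ceiling.

    Returns the number of polls spent waiting. get_load and sleep are
    injectable so the logic can be exercised without a loaded machine.
    """
    waited = 0
    while get_load()[0] > ceiling:
        sleep(poll_seconds)
        waited += 1
    return waited
```

Calling `wait_for_headroom()` before queuing each batch of call files would hold the dialer back whenever the load climbs past the ceiling.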




The problem I'm having is that, about an hour into a dial, it's freezing up on me. Last night I started it at 5:59, and at 6:55:17 the system became non-responsive. Nothing was logged, and I couldn't connect via ssh or http. It responded to ping, and nmap showed open ports that I was able to telnet to, but I couldn't elicit any response from them.



My sar data collection ran at 6:50, and at that time, I was seeing heavy usage, as expected, but nothing outrageous, as far as I can tell.



The system had been complaining of a memory error in one of the new 2GB strips I'd installed, so after the first crash, I replaced that pair with the 512MB strips we upgraded from.



I'm currently dialing with a live sar data collection running, in case it crashes again. At least I'll be able to dial in with a little more granularity.



Other than that, I'm lost as to how to diagnose the system freeze in the absence of any relevant log data or a crash dump, since the system is still running but completely nonresponsive until I power-cycle it. Any ideas?




NOTE: I have new servers on order to take some of the load off this system by distributing services, but in the meantime our production relies on this workhorse.



Here's the Sar Data from Last Night's crash.



UPDATE: This sar snapshot was running in 10-second increments; the last sample was gathered 1 second before the freeze-up.



I've purchased a terminal console server, and can now see what's going on when the system freezes up.



This set of messages just repeats every 30 seconds or so, cycling through CPU1 and CPU2:




[17675.940127] BUG: soft lockup - CPU#1 stuck for 61s! [asterisk:4579]
[17675.940127] Modules linked in: btrfs zlib_deflate crc32c libcrc32c ufs qnx4 hfsplus hfs minix ntfs vfat msdos fat jfs xfs reiserfs ext]
[17675.940127]
[17675.940127] Pid: 4579, comm: asterisk Not tainted (2.6.32-5-686-bigmem #1) PowerEdge 6800
[17675.940127] EIP: 0060:[] EFLAGS: 00000202 CPU: 1
[17675.940127] EIP is at native_flush_tlb_others+0x85/0xa6
[17675.940127] EAX: 00000282 EBX: c14620ac ECX: c102fb3a EDX: 00000020
[17675.940127] ESI: 00000001 EDI: 00000040 EBP: c14620a0 ESP: f35d1a3c
[17675.940127] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
[17675.940127] CR0: 80050033 CR2: b3f06946 CR3: 36787000 CR4: 000006f0
[17675.940127] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[17675.940127] DR6: ffff0ff0 DR7: 00000400
[17675.940127] Call Trace:
[17675.940127] [] ? flush_tlb_page+0x5d/0x65
[17675.940127] [] ? ptep_set_access_flags+0x59/0x63
[17675.940127] [] ? do_wp_page+0x3b9/0x7dd
[17675.940127] [] ? kmap_atomic_prot+0xd7/0xfc
[17675.940127] [] ? handle_mm_fault+0x982/0xa22
[17675.940127] [] ? lock_hrtimer_base+0x15/0x2f
[17675.940127] [] ? hrtimer_try_to_cancel+0x2f/0x35
[17675.940127] [] ? do_page_fault+0x2f1/0x307
[17675.940127] [] ? do_page_fault+0x0/0x307
[17675.940127] [] ? error_code+0x73/0x78
[17675.940127] [] ? copy_strings+0x94/0x1ba
[17675.940127] [] ? do_sys_poll+0x2c3/0x312
[17675.940127] [] ? __pollwait+0x0/0xa5
[17675.940127] [] ? pollwake+0x0/0x65
[17675.940127] [] ? pollwake+0x0/0x65
[17675.940127] [] ? pollwake+0x0/0x65
[17675.940127] [] ? pollwake+0x0/0x65
[17675.940127] [] ? activate_task+0x1e/0x24
[17675.940127] [] ? push_rt_task+0x208/0x242
[17675.940127] [] ? post_schedule+0x31/0x3e
[17675.940127] [] ? schedule+0x78f/0x7dc
[17675.940127] [] ? futex_wait_setup+0x5c/0xcd
[17675.940127] [] ? futex_wait_queue_me+0x87/0x98
[17675.940127] [] ? sched_clock+0x5/0x7
[17675.940127] [] ? zone_watermark_ok+0x16/0x99
[17675.940127] [] ? cpupri_find+0x4c/0xd6
[17675.940127] [] ? get_page_from_freelist+0xc0/0x3c7
[17675.940127] [] ? check_preempt_curr_rt+0x76/0xe3
[17675.940127] [] ? smp_invalidate_interrupt+0x73/0x86
[17675.940127] [] ? __alloc_pages_nodemask+0xf3/0x4d9
[17675.940127] [] ? cpumask_any_but+0x20/0x2b
[17675.940127] [] ? flush_tlb_page+0x4a/0x65
[17675.940127] [] ? mutex_lock+0xb/0x24
[17675.940127] [] ? do_sync_read+0xc0/0x107
[17675.940127] [] ? do_send_sig_info+0x4f/0x59
[17675.940127] [] ? autoremove_wake_function+0x0/0x2d
[17675.940127] [] ? ktime_get_ts+0xcd/0xd5
[17675.940127] [] ? sys_poll+0x44/0x8d
[17675.940127] [] ? sysenter_do_call+0x12/0x28
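Since these messages only show up on the serial console, it may help to capture the console output to a file and tally which CPUs and processes get reported. This is a rough sketch that assumes each report starts with a line in exactly the `BUG: soft lockup` format shown above; the function names are my own.

```python
import re
from collections import Counter

# Matches lines like:
# [17675.940127] BUG: soft lockup - CPU#1 stuck for 61s! [asterisk:4579]
SOFT_LOCKUP = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\] BUG: soft lockup - CPU#(?P<cpu>\d+) "
    r"stuck for (?P<secs>\d+)s! \[(?P<comm>[^:\]]+):(?P<pid>\d+)\]"
)


def parse_soft_lockups(lines):
    """Return one dict per soft-lockup header line found in lines."""
    events = []
    for line in lines:
        match = SOFT_LOCKUP.search(line)
        if match:
            events.append({
                "timestamp": float(match.group("ts")),
                "cpu": int(match.group("cpu")),
                "stuck_seconds": int(match.group("secs")),
                "comm": match.group("comm"),
                "pid": int(match.group("pid")),
            })
    return events


def lockups_per_cpu(events):
    """Count how many lockup reports each CPU produced."""
    return Counter(event["cpu"] for event in events)
```

Feeding the captured console log through `parse_soft_lockups` and then `lockups_per_cpu` would show whether the reports are evenly split between the two CPUs or concentrated on one.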


The first iteration had another set of modules listed.



[267866.376128] Modules linked in: cpufreq_powersave cpufreq_stats cpufreq_conservative cpufreq_userspace parport_pc ppdev lp parport sco bridge stp bnep rfcomm l2cap crc16 bluetooth rfkill nfsd lockd nfs_acl auth_rpcgss sunrpc exportfs binfmt_misc fuse loop radeon ttm psmouse drm_kms_helper serio_raw evdev pcspkr drm i2c_algo_bit rng_core i2c_core dcdbas shpchp button pci_hotplug processor ext3 jbd mbcache sd_mod crc_t10dif sg sr_mod cdrom ata_generic uhci_hcd ata_piix mptspi mptscsih ehci_hcd mptbase usbcore nls_base libata tg3 scsi_transport_spi scsi_mod floppy libphy thermal thermal_sys [last unloaded: scsi_wait_scan]


I installed intel-microcode and microcode.ctl, but I haven't figured out how to disable hyperthreading, as some other forums have suggested.
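To at least confirm whether hyperthreading is currently active, the `siblings` and `cpu cores` fields in `/proc/cpuinfo` can be compared: on x86 Linux, more siblings than cores per package indicates that hyperthreading is on. A small sketch of that check follows; the function name is my own, and it only looks at the first processor entry.

```python
def hyperthreading_enabled(cpuinfo_text):
    """Return True/False from cpuinfo text, or None if the fields are missing.

    Compares the first 'siblings' value with the first 'cpu cores' value.
    """
    fields = {}
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        key = key.strip()
        if sep and key in ("siblings", "cpu cores") and key not in fields:
            fields[key] = int(value.strip())
    if "siblings" not in fields or "cpu cores" not in fields:
        return None
    return fields["siblings"] > fields["cpu cores"]
```

On the server itself this would be called as `hyperthreading_enabled(open("/proc/cpuinfo").read())`.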
