
linux - Memory leak? RHEL 5.5: RSS looks OK, almost no free memory left, swap heavily used

I have run into a very interesting problem: some physical memory seems to disappear quietly. I am very puzzled, so I would really appreciate any help.



Here is the top output, sorted by memory usage:


Cpu(s): 0.8%us, 1.0%sy, 0.0%ni, 81.1%id, 14.2%wa, 0.0%hi, 2.9%si, 0.0%st
Mem: 4041160k total, 3947524k used, 93636k free, 736k buffers
Swap: 4096536k total, 2064148k used, 2032388k free, 41348k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
15168 root 20 0 3127m 290m 1908 S 108.2 7.4 43376:10 STServer-1
18303 root 20 0 99.7m 12m 912 S 0.0 0.3 0:00.86 sshd
7129 root 20 0 17160 7800 520 S 0.5 0.2 5:37.52 thttpd
2583 root 10 -10 4536 2488 1672 S 0.0 0.1 1:19.33 iscsid
4360 root 20 0 15660 2308 464 S 0.0 0.1 15:42.71 lbtcpd.out

4361 root 20 0 186m 1976 964 S 0.5 0.0 82:00.36 lbsvr.out
3932 root 20 0 100m 1948 836 S 0.0 0.0 30:31.38 snmpd
18604 root 20 0 66212 1184 820 S 0.0 0.0 0:00.06 bash
18305 root 20 0 66112 1136 764 S 0.0 0.0 0:00.03 bash
18428 root 20 0 12924 1076 708 R 1.0 0.0 0:21.10 top
15318 root 20 0 99.7m 1020 996 S 0.0 0.0 0:01.15 sshd
15320 root 20 0 66228 996 788 S 0.0 0.0 0:00.80 bash
1719 root 20 0 90216 980 884 S 0.0 0.0 0:02.29 sshd
15492 root 20 0 66216 972 780 S 0.0 0.0 0:00.20 bash
15382 root 20 0 90300 964 892 S 0.0 0.0 0:00.57 sshd

1688 root 20 0 90068 960 852 S 0.0 0.0 0:00.57 sshd
2345 root 20 0 90068 928 852 S 0.0 0.0 0:00.50 sshd
16175 root 20 0 90216 924 884 S 0.0 0.0 0:00.64 sshd
2377 root 20 0 90068 908 852 S 0.0 0.0 0:00.44 sshd
2725 root 20 0 90216 896 884 S 0.0 0.0 0:05.27 sshd
3929 root 20 0 182m 896 816 S 0.0 0.0 0:43.61 systemInfoSubAg
15986 root 20 0 66216 884 772 S 0.0 0.0 0:00.03 bash


And here is the free output:





[root@ric ~]# free -m
total used free shared buffers cached
Mem: 3946 3846 100 0 0 48
-/+ buffers/cache: 3796 149
Swap: 4000 2037 1963


Here is the iostat output:





[root@ric ~]# iostat -x -d -m 2
Linux 2.6.37 (ric) 08/16/2011

Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util
sda 93.24 222.57 95.44 64.40 4.10 1.12 66.96 1.37 25.46 2.78 44.44
sda1 0.00 0.00 0.00 0.00 0.00 0.00 40.80 0.00 4.00 3.10 0.00
sda2 0.00 0.00 0.00 0.00 0.00 0.00 22.35 0.00 22.52 14.80 0.00
sda4 0.00 0.00 0.00 0.00 0.00 0.00 2.00 0.00 33.00 33.00 0.00

sda5 92.73 7.49 53.39 45.79 0.57 0.21 16.08 0.72 34.67 3.19 31.67
sda6 0.50 215.08 42.06 18.61 3.53 0.91 150.14 0.65 55.27 6.36 38.58

Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util
sda 596.02 139.30 248.26 153.73 3.38 1.14 23.02 147.54 482.67 2.49 99.90
sda1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sda2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sda4 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sda5 596.02 129.35 244.28 150.25 3.30 1.09 22.79 146.51 488.14 2.53 99.90 this is swap partition
sda6 0.00 9.95 3.98 3.48 0.08 0.05 35.20 1.03 193.60 75.20 56.12



Some numbers from /proc/meminfo:



MemTotal:        4041160 kB
MemFree: 130288 kB
Buffers: 820 kB
Cached: 40940 kB
SwapCached: 82632 kB
SwapTotal: 4096536 kB

SwapFree: 2005408 kB


uname -a shows:
Linux ric 2.6.37 #4 SMP Fri Jan 14 10:23:46 CST 2011 x86_64 x86_64 x86_64 GNU/Linux



We can see that the swap partition is heavily used and that it consumes a lot of I/O. But when we look at the RES (RSS) column in top, the sum over all processes is not large.
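The comparison described above (summed per-process RSS against the "used" figure) can be done without eyeballing top. The following is a minimal sketch, assuming a Linux /proc layout; the function names are made up for illustration. Shared pages are counted once per process, so the sum is an upper bound on what processes actually hold.

```python
import glob
import re


def rss_kb(status_text):
    """Return VmRSS in kB from the text of /proc/<pid>/status, or 0.

    Kernel threads have no VmRSS line, so they count as 0.
    """
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else 0


def total_rss_kb(pattern="/proc/[0-9]*/status"):
    """Sum VmRSS over every process whose status file can be read."""
    total = 0
    for path in glob.glob(pattern):
        try:
            with open(path) as handle:
                total += rss_kb(handle.read())
        except OSError:
            # The process exited between glob() and open(); skip it.
            continue
    return total


if __name__ == "__main__":
    print("sum of VmRSS: %d kB" % total_rss_kb())
```

If this sum is far below the "used" value that free reports after subtracting buffers and cache, the missing memory is not held as resident pages by any process.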



So my question is: is this a kernel-level leak, or is something wrong with the STServer-1 process? (STServer uses a memory pool to cache file data; that pool was swapped out because it had not been used for a few days.)




Any comments are welcome. Thanks!



Update 1: slabtop shows




Active / Total Objects (% used) : 487002 / 537888 (90.5%)
Active / Total Slabs (% used) : 39828 / 39873 (99.9%)
Active / Total Caches (% used) : 102 / 168 (60.7%)
Active / Total Size (% used) : 145605.37K / 154169.46K (94.4%)
Minimum / Average / Maximum Object : 0.02K / 0.29K / 4096.00K


OBJS ACTIVE USE OBJ SIZE SLABS OBJ/SLAB CACHE SIZE NAME
133920 133862 99% 0.02K 930 144 3720K avtab_node
98896 94881 95% 0.03K 883 112 3532K size-32
74052 73528 99% 1.00K 18513 4 74052K size-1024
72112 70917 98% 0.44K 9014 8 36056K skbuff_fclone_cache
...


Update 2: added pmap -x 15168 (STServer-1) results





0000000000881000 45116 17872 17272 rw--- [ anon ]
00000000403a1000 4 0 0 ----- [ anon ]
00000000403a2000 8192 8 8 rw--- [ anon ]
...
00000000510aa000 4 0 0 ----- [ anon ]
00000000510ab000 8192 0 0 rw--- [ anon ]
... (this pattern repeats for up to 32 regions of 8192 kB)


00007f8f2c000000 9832 4004 3964 rw--- [ anon ]
00007f8f2c99a000 55704 0 0 ----- [ anon ]
00007f8f34000000 11992 5068 5032 rw--- [ anon ]
00007f8f34bb6000 53544 0 0 ----- [ anon ]
00007f8f38000000 9768 4208 4164 rw--- [ anon ]
00007f8f3898a000 55768 0 0 ----- [ anon ]
00007f8f3c000000 13064 4080 4024 rw--- [ anon ]
00007f8f3ccc2000 52472 0 0 ----- [ anon ]
00007f8f40000000 11244 3700 3688 rw--- [ anon ]
00007f8f40afb000 54292 0 0 ----- [ anon ]

00007f8f44000000 11824 7884 7808 rw--- [ anon ]
00007f8f44b8c000 53712 0 0 ----- [ anon ]
00007f8f4c000000 19500 6848 6764 rw--- [ anon ]
00007f8f4d30b000 46036 0 0 ----- [ anon ]
00007f8f54000000 18344 6660 6576 rw--- [ anon ]
00007f8f551ea000 47192 0 0 ----- [ anon ]
00007f8f58774000 1434160 0 0 rw--- [ anon ] memory pool
00007f8fb0000000 64628 32532 30692 rw--- [ anon ]
00007f8fb7dfe000 1028 1016 1016 rw--- [ anon ]
00007f8fb8000000 131072 69512 65300 rw--- [ anon ]

00007f8fc0000000 65536 52952 50220 rw--- [ anon ]
00007f8fc40a8000 3328 1024 1024 rw--- [ anon ]
00007f8fc4aa5000 1028 1028 1028 rw--- [ anon ]
00007f8fc4d12000 1028 1020 1020 rw--- [ anon ]
00007f8fc4f15000 2640 988 936 rw--- [ anon ]
00007f8fc53b6000 2816 924 848 rw--- [ anon ]
00007f8fc5bf6000 102440 0 0 rw--- [ anon ]

total kB 3202160 348944 327480



It seems that the kernel moved the old memory (not used for a few days) to the swap partition, but the private memory is not large. If this program leaks memory, where is the leaked memory: in swap or in RSS?
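One way to answer the "in swap or in RSS?" part is to add up the Swap: fields that /proc/<pid>/smaps reports for each mapping. This is a rough sketch, assuming the running kernel exposes those fields in smaps; the helper names are hypothetical.

```python
import re


def swap_kb(smaps_text):
    """Sum the 'Swap:' fields (kB) over all mappings in smaps text."""
    values = re.findall(r"^Swap:\s+(\d+)\s+kB", smaps_text, re.MULTILINE)
    return sum(int(value) for value in values)


def process_swap_kb(pid):
    """Read /proc/<pid>/smaps and return the swapped-out total in kB."""
    with open("/proc/%d/smaps" % pid) as handle:
        return swap_kb(handle.read())
```

Calling process_swap_kb(15168) as root would give the amount of STServer-1 memory currently in swap. If the total over all processes is well below the "Swap used" figure from free, the swapped pages do not belong to any running process.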



Update 3: killed STServer-1
I killed the STServer-1 process, then used free -m to check the physical memory. There was still not much free: about 400 MB, with no cache and no buffers either.
I wrote a small program to allocate memory. It could only get about 400 MB of physical memory; after that, swap was heavily used again.



So should I conclude that there is a kernel memory leak?
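A rough way to test the kernel-leak hypothesis is to subtract every counter that /proc/meminfo does account for from MemTotal and see what remains. The sketch below is only an estimate: the list of fields is an assumption chosen for illustration, and some counters can overlap. The sample text holds the values quoted earlier in this post.

```python
import re

# Counters subtracted from MemTotal. This selection is an assumption made for
# illustration; some counters can overlap, so the result is a rough estimate.
ACCOUNTED = ("MemFree", "Buffers", "Cached", "SwapCached",
             "AnonPages", "Slab", "PageTables", "KernelStack")

# The /proc/meminfo values quoted in the post (other fields were not shown).
POST_SAMPLE = """MemTotal:        4041160 kB
MemFree: 130288 kB
Buffers: 820 kB
Cached: 40940 kB
SwapCached: 82632 kB
"""


def parse_meminfo(text):
    """Map each 'Name: value' line of /proc/meminfo text to an int."""
    pairs = re.findall(r"^(\w+):\s+(\d+)", text, re.MULTILINE)
    return {name: int(value) for name, value in pairs}


def unaccounted_kb(text, fields=ACCOUNTED):
    """MemTotal minus every listed counter that is present in the text."""
    info = parse_meminfo(text)
    return info["MemTotal"] - sum(info.get(name, 0) for name in fields)


if __name__ == "__main__":
    print("unaccounted: %d kB" % unaccounted_kb(POST_SAMPLE))
```

With only the fields quoted in the post, 3786480 kB remain unexplained. Passing the full contents of /proc/meminfo instead would show how much of that is covered by AnonPages, Slab and the other counters; whatever is still left over is memory the kernel holds outside those counters.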



Update 4: it happened again!

Here is the output of grep ^VmPea /proc/*/status | sort -n -k+2 | tail:




/proc/3841/status:VmPeak: 155176 kB
/proc/3166/status:VmPeak: 156408 kB
/proc/3821/status:VmPeak: 169172 kB
/proc/3794/status:VmPeak: 181380 kB
/proc/3168/status:VmPeak: 210880 kB
/proc/3504/status:VmPeak: 242268 kB
/proc/332/status:VmPeak: 254184 kB

/proc/5055/status:VmPeak: 258064 kB
/proc/3350/status:VmPeak: 336932 kB
/proc/28352/status:VmPeak: 2712956 kB


top shows:




Tasks: 225 total, 1 running, 224 sleeping, 0 stopped, 0 zombie
Cpu(s): 1.9%us, 1.3%sy, 0.0%ni, 51.9%id, 43.6%wa, 0.0%hi, 1.3%si, 0.0%st

Mem: 4041160k total, 3951284k used, 89876k free, 1132k buffers
Swap: 4096536k total, 645624k used, 3450912k free, 382088k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
28352 root 20 0 2585m 1.6g 2320 D 52.2 42.7 267:37.28 STServer-1
3821 snort 20 0 165m 8320 3476 S 10.2 0.2 1797:20 snort
21043 root 20 0 17160 7924 520 S 0.0 0.2 1:50.55 thttpd
2586 root 10 -10 4536 2488 1672 S 0.0 0.1 0:28.59 iscsid



iostat shows:




Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util
sda 72.50 0.00 351.00 2.50 12.25 0.01 71.02 174.22 213.93 2.83 100.20
sda1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sda2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sda4 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sda5 64.00 0.00 50.00 0.00 0.43 0.00 17.76 76.06 59.44 20.04 100.20 swap partition
sda6 8.50 0.00 301.00 2.50 11.81 0.01 79.79 98.16 239.39 3.30 100.20



Any ideas?
