
Working through a performance issue: Linux and Oracle

I have recently built a Red Hat Network Satellite system on Red Hat Enterprise Linux 6.1 x86_64 and Satellite 5.4.1 with the embedded database (Oracle - I wasn't sure which version at first; I work that out below). Also noteworthy is the fact that this virtual machine is running on VMware vSphere 4.1.0 using the LSI Logic Parallel SCSI controller.  Anyhow - I have built this type of system numerous times previously, and this particular one is running rather poorly.  At the time I am writing this, I am still unsure what is causing the issue(s), but two symptoms stand out:
1.)  Processes are taking more time than I am accustomed to (I sound like a typical user now, don't I?)
2.)  I/O wait on this host seems to be relatively high most of the time

Now - while I am attempting to troubleshoot, I am running a satellite-sync of one of my RHN channels.  The software has all been downloaded, and at this point I believe it is simply being catalogued and inserted into the Satellite DB.

I have plenty of memory dedicated to this system, and I also assigned 2 CPUs to this host - after noticing that there were consistently 2 to 3 blocked processes, I had hoped another processor would help.  It seems to have improved response times when the network is involved.

[root@rhnsat01 ~]# free -m
             total       used       free     shared    buffers     cached
Mem:          3833       3715        118          0         18       2029
-/+ buffers/cache:       1667       2166
Swap:         6015         29       5986

[root@rhnsat01 ~]# vmstat 2 3
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1  0  30188 115300  18544 2072444    0    1   349   532  396  775  9  3 36 51  0
 0  2  30188 112324  18732 2075204    0    0  1118   540  339  459  5  2 39 55  0
 1  2  30188 107216  19180 2079660    0    0  2446   120  447  702  6  3 49 41  0

[root@rhnsat01 ~]# ps aux | grep \ D
oracle    1617  0.6 19.3 1283752 758144 ?      Ds   16:19   1:39 ora_dbw0_rhnsat
oracle    1619  0.6 19.2 1282216 754064 ?      Ds   16:19   1:38 ora_dbw1_rhnsat
root      2847 12.3  9.0 863964 353428 pts/0   D+   16:38  31:16 /usr/bin/python /usr/bin/satellite-sync --channel rhel-x86_64-server-5 --email --traceback-mail=blah_blah@blah.com
root      6364  0.0  0.0 103232   832 pts/1    S+   20:51   0:00 grep  D
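
As an aside - that grep matches itself (note the last line above).  A slightly cleaner way to list only D-state (uninterruptible sleep) processes, just a sketch using standard ps and awk:

# List D-state processes along with the kernel function each is
# sleeping in (wchan), without matching the grep/awk process itself
ps -eo pid,user,stat,wchan:30,cmd | awk 'NR==1 || $3 ~ /^D/'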

Now things get a little ridiculous (and a bit over my head at this point, but I thought I'd start to look anyhow...).  I'm going to step into the database process with strace (this one happens to be a DB writer) and see if something looks amiss.

[root@rhnsat01 ~]# strace -p 1617
Process 1617 attached - interrupt to quit
times(NULL)                             = 431066880
semtimedop(425986, {{9, -1, 0}}, 1, {0, 860000000}) = -1 EAGAIN (Resource temporarily unavailable)
getrusage(RUSAGE_SELF, {ru_utime={12, 757060}, ru_stime={87, 712665}, ...}) = 0
getrusage(RUSAGE_SELF, {ru_utime={12, 757060}, ru_stime={87, 712665}, ...}) = 0
times(NULL)                             = 431066969
times(NULL)                             = 431066969
times(NULL)                             = 431066970
semtimedop(425986, {{9, -1, 0}}, 1, {3, 0}) = -1 EAGAIN (Resource temporarily unavailable)
times(NULL)                             = 431067270
times(NULL)                             = 431067270
times(NULL)                             = 431067270
pwrite(26, "\6\242\0\0\350\r\200\1\252\333O\0\0\0\1\6{\314\0\0\2\0\0\0,)\0\0\251\333O\0"..., 8192, 29163520) = 8192

So - I notice the 8192 in that pwrite() call and assume it indicates a block size, which makes me wonder about a few things.
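
As another aside: that pwrite() is going to file descriptor 26, and the kernel will happily tell us which file that actually is.  A quick check (assuming the dbw0 process, PID 1617, is still running):

# Map fd 26 of PID 1617 back to the file it refers to - this
# should point at one of the Oracle datafiles under /rhnsat
ls -l /proc/1617/fd/26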

Leaving a few details out of this effort, I happen to know that the database has its own volume.
[root@rhnsat01 ~]# df -k
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/mapper/vg_rhnsat01-lv_root
                      30584700   4166684  24864404  15% /
tmpfs                  1962640         0   1962640   0% /dev/shm
/dev/sda1               495844     88764    381480  19% /boot
/dev/mapper/vg_satellite-lv_varsatellite
                      61927420  38073216  20708476  65% /var/satellite
/dev/mapper/vg_satellite-lv_varcacherhn
                      10321208   1166960   8629960  12% /var/cache/rhn
/dev/mapper/vg_satellite-lv_rhnsat
                      16513960   7200656   8474444  46% /rhnsat
[root@rhnsat01 ~]# stat /dev/mapper/vg_satellite-lv_rhnsat
  File: `/dev/mapper/vg_satellite-lv_rhnsat' -> `../dm-2'
  Size: 7         Blocks: 0          IO Block: 4096   symbolic link
Device: 5h/5d Inode: 10272       Links: 1
Access: (0777/lrwxrwxrwx)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2011-12-04 16:18:46.153088686 -0600
Modify: 2011-12-04 16:18:45.960093540 -0600
Change: 2011-12-04 16:18:45.960093540 -0600

Hmm... I wonder if this means that there are 2 filesystem writes for each 8k database block, rather than 1.  As much as this bothers me, it doesn't matter, as mkfs will only accept 1 of 3 block-size values (1024, 2048 and 4096).
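
Worth noting: the stat above was run against the /dev/mapper path, which is a symlink, so the "IO Block: 4096" it reports describes the link itself rather than the filesystem.  To confirm the two block sizes actually in play - the ext4 block size from the superblock, and Oracle's db_block_size - something like this (the sqlplus step assumes sysdba access as the oracle user):

# ext4 block size as recorded in the superblock
tune2fs -l /dev/mapper/vg_satellite-lv_rhnsat | grep 'Block size'
# Oracle's block size, from sqlplus:  SQL> show parameter db_block_size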

My database resides on an LVM volume created with default options, and likewise with default mkfs and mount options.  Time to investigate these things...
I determine that I am running:

SQL> Disconnected from Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
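
(That banner is just the normal sqlplus disconnect message.  For the record, a way to pull it on demand - assuming the oracle user's environment is set up for the embedded instance:)

# Print the Oracle version banner
echo "select banner from v\$version;" | sqlplus -S / as sysdba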

In my research I discovered a site covering ext4 journal tuning for database workloads.  I determined that my ext4 filesystem already has a journal, so I'll add the writeback option that its author recommends.

[root@rhnsat01 tmp]#  dumpe2fs /dev/mapper/vg_satellite-lv_rhnsat | head -50 > /var/tmp/dumpe2fs.20111204.1

[root@rhnsat01 tmp]#  tune2fs -o journal_data_writeback  /dev/mapper/vg_satellite-lv_rhnsat
[root@rhnsat01 tmp]#  dumpe2fs /dev/mapper/vg_satellite-lv_rhnsat | head -50 > /var/tmp/dumpe2fs.20111204.2

[root@rhnsat01 tmp]# sdiff dumpe2fs.20111204.1 dumpe2fs.20111204.2
dumpe2fs 1.41.12 (17-May-2010)                                <
Filesystem volume name:   lv_rhnsat                              Filesystem volume name:   lv_rhnsat
Last mounted on:          /rhnsat                                Last mounted on:          /rhnsat
Filesystem UUID:          48d7816e-0276-4e76-ac51-432d8fe827a    Filesystem UUID:          48d7816e-0276-4e76-ac51-432d8fe827a
Filesystem magic number:  0xEF53                                 Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)                            Filesystem revision #:    1 (dynamic)
Filesystem features:      has_journal ext_attr resize_inode d    Filesystem features:      has_journal ext_attr resize_inode d
Filesystem flags:         signed_directory_hash                  Filesystem flags:         signed_directory_hash
Default mount options:    (none)                               | Default mount options:    journal_data_writeback
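
One caveat: a default mount option set via tune2fs only takes effect the next time the filesystem is mounted.  To see which options are currently in effect:

# Show the mount options /rhnsat is currently mounted with
grep rhnsat /proc/mounts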

However, the mount options seem reasonably compelling (and those options are actually the reason I found his site in the first place).  Unfortunately my database is in use and I can't simply remount the filesystem at this time.
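
For when a maintenance window allows, a sketch of what the tuned fstab entry might look like - the specific options here (noatime, data=writeback) are my assumption of the usual recommendations, not a quote from his site:

# /etc/fstab - hypothetical tuned entry for the database volume
/dev/mapper/vg_satellite-lv_rhnsat  /rhnsat  ext4  defaults,noatime,data=writeback  1 2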

I am really hoping this will fix the problem.  Otherwise, I may have to look at separating the different parts of the database (redo logs, data files, indexes - at least I can assume that this database has all of those components, like a typical Oracle database).
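
If it comes to that, a reasonable first step would be to see where the datafiles and redo logs actually live.  A sketch, again assuming sysdba access as the oracle user:

# List datafile and redo log paths
echo "select name from v\$datafile union all select member from v\$logfile;" | sqlplus -S / as sysdba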

One part, in particular, that is really throwing me: why is there very little I/O, yet the system appears to be waiting on I/O?

[root@rhnsat01 tmp]# iostat 2 3
Linux 2.6.32-131.21.1.el6.x86_64 (rhnsat01.ncell.com) 12/04/2011 _x86_64_ (2 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           7.71    0.00    3.06   55.02    0.00   34.22

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sdb              75.67      1028.61      1718.27   23746366   39667848
sda               2.24        79.21        19.87    1828622     458690
dm-0              3.55        78.05        16.17    1801938     373184
dm-1              0.49         0.19         3.70       4432      85488
dm-2            261.66       170.33      1671.87    3932154   38596584
dm-3              7.13        51.21        44.06    1182154    1017272
dm-4              8.06       807.03         2.34   18630994      54000

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.50    0.00    0.25   72.32    0.00   26.93

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sdb              12.00         0.00       112.00          0        224
sda               0.00         0.00         0.00          0          0
dm-0              0.00         0.00         0.00          0          0
dm-1              0.00         0.00         0.00          0          0
dm-2             23.00         0.00       112.00          0        224
dm-3              0.00         0.00         0.00          0          0
dm-4              0.00         0.00         0.00          0          0

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.50    0.00    0.50   63.41    0.00   35.59

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sdb              18.50         0.00       216.00          0        432
sda               0.00         0.00         0.00          0          0
dm-0              0.00         0.00         0.00          0          0
dm-1              0.00         0.00         0.00          0          0
dm-2             41.50         0.00       224.00          0        448
dm-3              0.00         0.00         0.00          0          0
dm-4              0.00         0.00         0.00          0          0
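
One thing plain iostat can't show is latency.  The extended statistics add per-request wait time (await) and device utilization (%util), which is exactly what separates "little I/O but lots of waiting" from a throughput problem - worth a follow-up run:

# Extended per-device stats; watch the await and %util columns
iostat -x 2 3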

You can reverse the journal update with the following (note the lower-case -o, which clears just the default mount option I set above):
# Clear the journal_data_writeback default mount option
tune2fs -o ^journal_data_writeback /dev/sda10
# Check fs options
dumpe2fs /dev/sda10 | more

(The similar-looking upper-case form, tune2fs -O ^has_journal, deletes the filesystem's journal entirely and requires a forced fsck afterwards - e2fsck -f - which is a much bigger change than the one I made.)

I wonder if adding another processor to the VM would possibly help.
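
Before adding a vCPU, it is probably worth confirming the existing two are actually busy - a quick per-CPU breakdown (assuming the sysstat package, which provides iostat, is installed):

# Per-CPU utilization, 2-second samples
mpstat -P ALL 2 3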

-bash-4.1$ ps aux | grep \ D
oracle    1617  0.5 19.9 1283752 784684 ?      Ds   16:19   2:17 ora_dbw0_rhnsat
oracle    1619  0.5 19.9 1282216 781256 ?      Ds   16:19   2:16 ora_dbw1_rhnsat
oracle    1621  0.8  0.7 1295304 29636 ?       Ds   16:19   3:37 ora_lgwr_rhnsat

UPDATE:
The change to the filesystem journal seems to have helped.  I still experience some I/O wait; however, the improvement in CPU utilization is rather tremendous, as is the improvement in the I/O numbers (in comparison to how they were, of course).

[jradtke@rhnsat01 ~]$ iostat 2 3
Linux 2.6.32-131.21.1.el6.x86_64 (rhnsat01.ncell.com) 12/05/2011 _x86_64_(2 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           5.06    0.00    1.09   11.78    0.00   82.07

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda               2.55       131.65        12.65     691254      66394
sdb              20.00       337.59       197.35    1772542    1036224
dm-0              3.53       126.94        12.64     666522      66376
dm-1              0.06         0.50         0.00       2616          0
dm-2             16.71       216.64       101.71    1137474     534024
dm-3             12.57        14.43        95.64      75778     502176
dm-4             11.37       106.44         0.00     558850         24

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           3.52    0.00    2.76   45.48    0.00   48.24

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda               0.00         0.00         0.00          0          0
sdb             132.50     33188.00        72.00      66376        144
dm-0              0.00         0.00         0.00          0          0
dm-1              0.00         0.00         0.00          0          0
dm-2             11.00         0.00        88.00          0        176
dm-3              0.00         0.00         0.00          0          0
dm-4            131.00     33188.00         0.00      66376          0

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           7.91    0.00    2.04   54.85    0.00   35.20

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda               0.00         0.00         0.00          0          0
sdb             131.50     29004.00       292.00      58008        584
dm-0              0.00         0.00         0.00          0          0
dm-1              0.00         0.00         0.00          0          0
dm-2             24.00         0.00       192.00          0        384
dm-3              1.50         0.00        12.00          0         24
dm-4            136.50     29004.00        72.00      58008        144

/dev/sdb is the device that hosts my entire RHN Satellite environment (packages, cache and database).

[root@rhnsat01 ~]# pvdisplay -m /dev/sdb
  --- Physical volume ---
  PV Name               /dev/sdb
  VG Name               vg_satellite
  PV Size               100.00 GiB / not usable 4.00 MiB
  Allocatable           yes 
  PE Size               4.00 MiB
  Total PE              25599
  Free PE               3583
  Allocated PE          22016
  PV UUID               Q9qv0z-12gu-5aNN-Ae8L-Ua5T-oL1e-s0uKgH
   
  --- Physical Segments ---
  Physical extent 0 to 4095:
    Logical volume /dev/vg_satellite/lv_rhnsat
    Logical extents 0 to 4095
  Physical extent 4096 to 6655:
    Logical volume /dev/vg_satellite/lv_varcacherhn
    Logical extents 0 to 2559
  Physical extent 6656 to 22015:
    Logical volume /dev/vg_satellite/lv_varsatellite
    Logical extents 0 to 15359
  Physical extent 22016 to 25598:
    FREE
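
To tie the dm-N names from the iostat output back to the logical volume names above, the /dev/mapper symlinks give the mapping directly:

# Each /dev/mapper entry is a symlink to its dm-N device node
ls -l /dev/mapper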




