Christian Bilien’s Oracle performance and tuning blog

An integrated view of Oracle, Unix and storage arrays performance and tuning


I’ll be a presenter at Oracle OpenWorld 2010


I submitted an abstract to Oracle OpenWorld 2010, and it was accepted. Here are the session details:

Speaker(s) Christian BILIEN, BNP PARIBAS Corporate Investment Banking, Head of the DBAs
Monday, September 20, 2:00PM | Moscone South, Rm 236 60 min.
Session ID: S314649
Title: Large-Scale Oracle RAC (and Sun Oracle Database Machine) Scalability Design
Abstract: In this technical session, hear how a large corporate investment bank tames a yearly 50 percent to 100 percent data growth rate with Oracle Real Application Clusters (Oracle RAC) and Sun Oracle Database Machines. With databases totaling hundreds of terabytes, an integrated design of databases, storage, and servers is the cornerstone of scalability. This presentation highlights storage design from an Oracle RAC perspective as well as the specific challenges of Oracle RAC data warehouses and Oracle RAC geographical clusters. Real-life measurements on Oracle Solaris, Linux, EMC² storage, and RDS over InfiniBand (outside and inside the Sun Oracle Database Machine) illustrate performance and scalability perspectives and bottlenecks.


This page is an introduction to my blog, which is dedicated to Oracle database performance management. Over the years I have found sharing knowledge to be an invaluable tool in my various consulting assignments. Spending most of my time in capacity planning, tuning, application sizing and performance management, I thought it might be of interest to broaden my knowledge by creating new opportunities to exchange views on these ever-changing topics.

A quick word on this blog’s title (an integrated view of Oracle, Unix and storage arrays performance and tuning): many Oracle blogs, books and web sites are around, and some are extremely good. I chose to focus on what seems to me a seldom-explored topic: correlating Oracle, Unix and storage performance and tuning knowledge to better understand the performance ins and outs of large, high-end Oracle databases.

I will also strive to avoid errors in my English, but I have to ask for clemency in advance.

I have also found that some legitimate comments end up flagged as spam. I try to screen the spam queue before it gets deleted, but if you do not get an answer to a comment, it might just be that I did not see it.


67 Comments

  1. Christian, I recently read your blogs on array performance and was hoping you could help me out or point me in the right direction. I have been asked to take over a performance management group; however, I have zero experience in that area. My company has 3 data centers using several EMC CX700s and DMXs. I use Navisphere and Control Center as my primary tools to monitor performance. My issue is this: I have been unable to get EMC to give me a baseline on performance measurements. I am trying to establish a basic green/yellow/red dashboard and need to understand at what point I should be concerned with response time, for example (i.e. a 10 ms response time is yellow).

    I saw the books you recommended and I will be looking at them today. I was actually hoping you knew of a white paper that could provide basic acceptable metrics on disk performance to help me out immediately.

    Also, I read your “Join the BAARF party (or not)” and still am not clear on your opinion. Are you a proponent of RAID 5 over RAID 10? My DB guys scream for RAID 10 every day and my SA’s are all about RAID 5. Does the performance of RAID 10 justify the cost?

    Any input you can give is greatly appreciated.

    Dan Howard

    Comment by Dan Howard — March 28, 2007 @ 6:56 pm

  2. Hi,
    – Array monitoring: I wrote some ideas at
    https://christianbilien.wordpress.com/2007/03/23/storage-array-bottlenecks. Please let me know if this answers your question.
    – RAID5 vs RAID10: I am neither pro nor anti RAID5 (sorry!). RAID5 is certainly acceptable when the disk configuration is designed to be fast enough to handle RAID5’s drawbacks (the small-write penalty and possibly array processor time) without hurting I/O response times, so it is really the throughput, the number of writes to the disks, and what response time is acceptable that will ultimately lead the choice. Without more information, a rule of thumb could be to keep reads+writes below 100 I/O/s per disk. In case you are really in the dark because you know nothing about the apps, then as far as Oracle is concerned, write hot spots are often the log files (whose I/Os are synchronous as seen from the user transactions, as opposed to data buffer writes, which should be asynchronous), the archive logs, and often the temp files. You could limit the risk by putting these files on RAID10, but another trap to avoid is putting all the hot spots on the same RAID10 disks! Exports and backups are also write intensive, but their time to complete might not be as critical as user transactions.
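    As an illustration of the write-penalty arithmetic behind that rule of thumb, here is a back-of-the-envelope sketch (the workload numbers are hypothetical, not taken from any real array):

```python
def per_disk_iops(reads_s, writes_s, disks, raid):
    """Approximate back-end I/O/s per spindle for a host workload.

    RAID5: each small host write costs 2 reads + 2 writes on the
    back end (read old data and old parity, write new data and parity).
    RAID10: any disk can serve reads, but each write lands on both
    sides of a mirror pair.
    """
    if raid == "raid5":
        backend = reads_s + 4 * writes_s
    elif raid == "raid10":
        backend = reads_s + 2 * writes_s
    else:
        raise ValueError("unknown RAID level: " + raid)
    return backend / disks

# Hypothetical workload: 800 reads/s + 200 writes/s
print(round(per_disk_iops(800, 200, 12, "raid5")))   # 133: above the ~100 I/O/s comfort zone
print(round(per_disk_iops(800, 200, 20, "raid10")))  # 60
```

    The same host workload can thus sit comfortably inside or well outside the rule of thumb depending only on the RAID level and spindle count.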

    Sorry to be a bit evasive, but no black and white answer exists to this question.

    Christian

    Comment by christianbilien — March 28, 2007 @ 7:14 pm

  3. Hi,
    Do you think Sun Cluster 3.2 for RAC has a future? I know that with Oracle 9i RAC it was popular to configure RAC with Sun Cluster, but with Oracle 10gR2 RAC, Oracle has its own clusterware. So I am confused: do we really need Oracle 10gR2 Clusterware and Sun Cluster 3.2 installed together? Or, as some say, should we just use Oracle 10g Clusterware to keep it simple and easy to manage, and to decrease cost? What do you recommend? Could you provide me some real-world examples of Sun Cluster 3.2 with 10gR2 RAC?

    I know Sun Cluster 3.2 provides cluster services to non-Oracle applications, but in the case of Oracle RAC, is Sun Cluster necessary? Thanks, I look forward to hearing from you.

    Comment by Amos — May 8, 2007 @ 7:46 am

  4. Hi Amos,

    Thank you for your comment. If you are only interested in RAC and have no other clustering need:

    Geo Cluster (at least 2 nodes AND 2 storage arrays):

    I believe that at the end of the day it all boils down to whether you are comfortable with ASM or not, especially when failure groups have to be reconstructed. ASM is not very intuitive for the beginner, and it is also not so widespread, so expertise may be lacking. If it is felt preferable to avoid ASM, I favor the Veritas cluster file system. The main feature SCS and VCS would bring is fencing/membership, but I have several years of RAC/Solaris experience without a third-party cluster, several failures, and no data corruption.

    Local cluster: As raw devices can be used and mirroring is usually offloaded to the storage array, ASM can be avoided.

    Christian

    Comment by christianbilien — May 8, 2007 @ 7:46 am

  5. I’ve never believed in usefulness of load-balancing MPIO.

    >However, reads can use only one path, but writes are duplicated, i.e. a host write ends up as one write on each host port.

    (from the linked post):

    >Because write I/Os always involves both controller for cache mirroring, write performance will be the same regardless of which controller receives the I/O.

    Nicely said.
    I usually use these two arguments to persuade customers *not* to use MPIO for load balancing. Sadly, many customers believe it’s a very useful feature.

    Comment by Sean — May 8, 2007 @ 7:54 am

  6. Hi Sean,

    Thank you for your comment.

    Christian

    Comment by christianbilien — May 8, 2007 @ 7:55 am

  7. Hi Christian, thank you very much for your quick reply. So from your own experience managing Oracle RAC, Oracle 10g Clusterware is good enough, with no need for third-party clusterware? Do you see any projects using Sun Cluster or the Veritas cluster file system with Oracle Clusterware?
    I am working as an Oracle DBA, and I agree with you that we should take an integrated/holistic view of Oracle, Unix (OS) and storage; it will be very beneficial for DBAs.

    Comment by Amos — May 8, 2007 @ 3:29 pm

  8. Hi Amos,

    Yes, I have experience with sites where the RAC clusterware was used on top of Sun Cluster and MC/ServiceGuard. The major drawback is that you are actually running two clusterwares, and it is not surprising that the two get mixed up and sometimes conflict. It is even worse on two-node geo clusters (an often-found configuration) than on local clusters, because the diverging node eviction strategies are magnified when you have two storage arrays. Legacy seems to be one of the reasons for using a third-party cluster (it worked on 9i, so why change when going to 10g?).

    I also used the Veritas CFS with RAC without a 3rd party cluster. The GAB interconnect is the only add-on, and the only significant action I could see was to redefine the master volumes when the master node has gone down (using vxclustadm), which I believe does not have any influence whatsoever on the CRS.

    Regards

    Christian

    Comment by christianbilien — May 8, 2007 @ 4:34 pm

  9. Hi Christian,

    Thank you very much for your quick reply and comments.

    Amos

    Comment by Amos — May 9, 2007 @ 1:02 pm

  10. Thanks for your article! I am working on a rather large sort, and I found the information really helpful – a real lifesaver! I have a question for you: we currently have PGA_AGGREGATE_TARGET set to 4.2G, _pga_max_size to 1G and _smm_max_size to 500M. I’m sure the sort I’m running cannot be performed in memory at those levels, but instead of seeing disk sorts, I’m getting 4030 errors:
    -4030 ORA-04030: out of process memory when trying to allocate
    8192 bytes (sort subheap,sort key)
    Any idea why this might be happening? We’re running Oracle 9.2.0
    Thanks very much again for the great info!
    -Steve

    Comment by Steve K — May 31, 2007 @ 7:54 pm

  11. Hi Steve,

    Sorry for not replying until now: I found your comment in the spam area while routinely checking for spam to delete. I am not sure why WordPress classified it that way.

    As I’m sure you guessed, you are running out of memory. Did you solve your problem?

    If not: which Unix platform/version and Oracle version are you running on? What are your RAM and SGA sizes? What is the ulimit?

    Christian

    Comment by christianbilien — May 31, 2007 @ 7:57 pm

  12. Christian,

    I found your blog via Jonathan Lewis’ and it looks very interesting indeed. However, when I go to https://christianbilien.wordpress.com it takes me to your posting of Feb 7th, rather than your latest posting, which seems a little strange.

    Regards,

    Doug

    Comment by Doug Burns — July 16, 2007 @ 9:04 pm

  13. Doug,

    Thank you for your kind comments. I also like your blog.

    As for the opening page, I cannot find a way to remove the post date. I checked other WordPress blogs, including Jonathan’s; they all show the creation date on the first page. I’ll have to live with it, I guess.

    Christian

    Comment by christianbilien — July 16, 2007 @ 9:20 pm

  14. Christian,
    Thanks for the reply. Still having memory issues. We’re running Oracle 9.2.6.0 on HP-UX 11.0. I believe the RAM on the box is about 12G. I get the 4030 errors with the following: PGA_AGGREGATE_TARGET = 8.2G, _pga_max_size = 4.2G and _smm_max_size = 2.1G (apparently the maximum for Oracle 9.2).

    When the settings are as follows, I do not get the errors: PGA_AGGREGATE_TARGET = 4.2G, _pga_max_size = 1G, _smm_max_size = 500M. But I do get a lot of disk sorting with these settings, which results in unacceptable performance.

    I’m not sure what the ulimit is (probably the default), but this is the only query running on the instance when I get the 4030s.

    Any ideas/help would be greatly appreciated.

    Thanks again for a great article!
    Steve

    Comment by Steve K — July 29, 2007 @ 8:28 am

  15. Steve,

    1. What are your SGA parameters? It is likely that sga + allocated sort areas > memory: the SGA is allocated up front, but the PGA is not.
    2. Check swap area I/Os: use vmstat or Glance.
    3. Check the memory size: dmesg | grep Physical
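    A quick arithmetic check of point 1, with hypothetical sizes (not Steve’s actual SGA, which is unknown here), just to show the shape of the problem:

```python
def memory_headroom_gb(ram_gb, sga_gb, pga_target_gb):
    """The SGA is allocated up front; PGA work areas grow on demand,
    so a rough worst case is sga + pga_aggregate_target (and in 9i
    the target is not even a hard limit)."""
    return ram_gb - (sga_gb + pga_target_gb)

# Hypothetical: a 12 GB box, a 3 GB SGA, 8.2 GB PGA_AGGREGATE_TARGET
print(round(memory_headroom_gb(12, 3, 8.2), 1))  # 0.8 GB left for the OS and everything else
```

    With so little headroom, any extra process memory pushes the box into ORA-04030 territory.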

    Christian

    Comment by christianbilien — July 29, 2007 @ 8:31 am

  16. Hi Christian

    Thanks for the link!

    Very interesting post. Oracle have been very much peripheral to the ITIL world, but now I think the Kraken wakes.

    BTW, just to be pedantic: APMG governs ITIL certification, as an outsourced service provider to the British Government (OGC). Likewise, publishing is outsourced (a long-standing relationship) to TSO. TSO has recently been privatised. And under some relationship that nobody quite understands, itSMF is the marketing arm of OGC for ITIL.

    But as far as I know, ITIL is still owned and, I guess, “maintained” by OGC.

    Comment by The IT Skeptic — August 4, 2007 @ 9:24 am

  17. Hi,

    Thanks for the clarification regarding the APMG/OGC roles.

    Christian

    Comment by christianbilien — August 4, 2007 @ 9:25 am

  18. If you get a chance to blog a review of “Forecasting Oracle Performance” it would be appreciated. It went onto my Amazon wishlist yesterday, which probably means I will read it in about 12 months’ time (big backlog of books to catch up on).

    Regards,

    Mathew Butler

    Comment by Mathew Butler — August 13, 2007 @ 4:59 pm

  19. Mathew,

    I’ll modestly try to give some feedback (to be honest, I could not resist going through it this afternoon, but I need to really read it before commenting).

    Cheers

    Christian

    Comment by christianbilien — August 13, 2007 @ 5:00 pm

  20. You said:

    “The checkpoints I/Os are doubly asynchronous, because the I/Os are also asynchronous at the DBWR level.”

    Are you saying that DBWR performs delayed (cached) write operations, or did I misunderstand your “doubly asynchronous”? If you are, then it is incorrect: DBWR performs synchronous writes that are not complete until the data are on the disk (or on a battery-backed-up RAID). Of course, those synchronous writes can be initiated asynchronously!

    Comment by Val Carey — August 29, 2007 @ 7:15 pm

  21. This is why I said “the word asynchronous could be a source of confusion”. Let’s put it this way: asynchronous from the program’s point of view (it should be, at least), provided async I/O is used (libaio calls, i.e. kernelized aio calls on most platforms). Each individual I/O is synchronous (direct or not).

    Comment by christianbilien — August 29, 2007 @ 7:16 pm

  22. Recently Joze Senegacnik was in İstanbul for a two-day seminar on Optimizing for Performance. We also discussed this topic. In my opinion tkprof is good for basic analysis but lacks:
    * SQL statement hierarchy
    * exclusive timings for recursive SQL statements
    * a report for SQLs which were not parsed
    * a report for cursor #0

    Two questions I didn’t have time to investigate yet:
    * trcsess after 10g may also have introduced some new pitfalls during merging
    * what about 11g: is there any progress with the tkprof pitfalls?

    I hope Oracle will invest more in tkprof, since 10046 response-time-based analysis is very important for root cause analysis in application performance tuning. We may no longer need the support of trace analysers like itrprof in the future.

    ps: my seminar notes.

    Comment by H.Tonguç Yılmaz — August 29, 2007 @ 7:25 pm

  23. Tonguç,

    Thanks very much for the notes. I mainly use trcsess because of the HTML output. However, it really takes time to compile the reports when the trace is big. I had not realized the recursive stuff I wrote about until I nearly fell into the trap.

    Christian

    Comment by christianbilien — August 29, 2007 @ 7:26 pm

  24. Hi Christian-

    I found your blog entry re: geographic RAC clusters very useful, as it’s something that we’re looking at currently.

    I am reviewing options right now, but it seems like the Symantec solution is the only one that works. My environment will consist of two sites about 3000 miles apart, and I’d like to be able to implement a failure aware solution for Oracle & Siebel. A cluster would be the ideal environment, but I’m worried about the distance causing heartbeat communication issues.

    What are your thoughts on Symantec HA solution? Have you worked with any other solutions that would offer high availability for an Oracle environment across such long distances?

    Comment by Khoa — October 10, 2007 @ 3:09 pm

  25. Khoa,

    Frankly, I would be extremely cautious before considering setting up an infrastructure where a heartbeat is used over such a distance.

    However, what you are describing may actually be a continental cluster (two local clusters, with manual or semi-automatic switchover) between the two sites. I know Symantec/VCS and HP MC/ServiceGuard have such an option, but I never used them, nor do I know anyone using either of them.

    Another option would be Data Guard without the broker: it is a cheap replication, but you may potentially lose some data (because you would use async replication). Manual switchover intervention would also be required.

    Christian

    Comment by christianbilien — October 10, 2007 @ 3:15 pm

  26. Hello,
    thanks for the nice article!

    How do you get the demands D_cpu and D_io for every transaction running in an Oracle DB? Is Statspack the source?

    Thanks

    Comment by christianbilien — November 12, 2007 @ 7:25 pm

  27. Hi,

    The easiest is to use v$sesstat/v$sysstat to get ‘user commits’, ‘CPU used when call started’, ‘CPU used by this session’, ‘physical reads’ and ‘physical writes’. Physical writes are the trickiest because the delayed writes also need to be accounted for. Very often this is not a problem, as the database has a large read-to-write ratio.
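    In queueing terms, those counters give the service demands through the utilization law (D = busy time / completions). A small sketch with made-up counter deltas, not real v$sysstat output:

```python
def service_demand(busy_seconds, completions):
    """Service demand per transaction: D = B / C (utilization law)."""
    return busy_seconds / completions

# Hypothetical v$sysstat deltas over a measurement interval:
# 'CPU used by this session' = 1500 centiseconds, 'user commits' = 300
cpu_seconds = 1500 / 100      # v$sysstat CPU statistics are in centiseconds
d_cpu = service_demand(cpu_seconds, 300)
print(d_cpu)  # 0.05 s of CPU per transaction
```

    The same division with ‘physical reads’ (and delayed-write-corrected ‘physical writes’) as the completions gives D_io.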

    Cheers

    Christian

    Comment by christianbilien — November 12, 2007 @ 7:26 pm

  28. When we first hit mutex hangs in 10.2.0.1, we turned them off and never looked back. Perhaps when 10.2.0.4 is finally unleashed upon us, we will take a look at it again, but trying to reproduce it is always dicey.

    Thanks for such a clear, detailed write-up.

    Comment by Charles Schultz — November 14, 2007 @ 8:22 pm

  29. As a matter of curiosity, do you remember the circumstances in which your instance hung?

    Thanks

    Christian

    Comment by christianbilien — November 14, 2007 @ 8:25 pm

  30. Our SR (5374131.993) is over a year old, hence I am not able to retrieve it easily via Metalink. I will ask our Sales/Tech Consultant for details. I fear that we did not capture quite the detail that you did, merely because we were flying blind and clueless at the time. =) Whatever information I can find, I will pass along.

    Comment by Charles Schultz — November 17, 2007 @ 5:04 pm

  31. Thanks

    Christian

    Comment by christianbilien — November 17, 2007 @ 5:06 pm

  32. Apparently, we hit bug 5485914. We were executing DBMS_MONITOR.SESSION_TRACE_ENABLE and were unable to disable the trace and locked up the session. I can email you our SR text if you are curious, but you may get more mileage out of researching that bug (and other related bugs). To my surprise, we left the mutexes on (”_kks_use_mutex_pin” = true) and never applied the bug fix. Interesting….

    We had another case where the whole instance would inexplicably hang (waiting for the infamous “cursor: pin S wait on X”); since we were never able to reproduce it for Oracle, I do not have any further information on that particular case.

    Comment by Charles Schultz — November 17, 2007 @ 5:06 pm

  33. Sorry, I am stupid. Your whole post was about that very bug. My apologies.

    Comment by Charles Schultz — November 17, 2007 @ 5:07 pm

  34. Thanks very much for the info, it was helpful to me: it confirms that it had to do with explain plans and/or tracing. I could also reproduce the problem by using set autotrace traceonly in sqlplus.

    Christian

    Comment by christianbilien — November 17, 2007 @ 5:08 pm

  35. Christian,
    this bug has been around since auto-management of the SGA came out, or more precisely, since 9i. It’s apparently only fixed in 11 or 10.2.0.4.

    I find it astonishing that something like this has been without a fix for so long and folks don’t even complain about it. In fact, most are not even aware it exists!

    To me, it shows that the vast majority of databases out there are running a single schema or very close to it, or are on much earlier versions of the software. Neither of these is good news…

    Comment by Noons — December 6, 2007 @ 9:10 pm

  36. Noons,

    I must say I could not believe it either when I found out about this problem. Having said that, this application has been happily running before and after. The bug only showed up when the statistics calculations started thrashing the library cache.

    Thanks for your comment.

    Christian

    Comment by christianbilien — December 6, 2007 @ 9:11 pm

  37. Just dropping in to thank you for mentioning this. One of my customers has been suffering from this problem for a couple of weeks, and until today all focus had been on hunting for bugs in the application. I notice the patch is listed in 10.2.0.3 and that it is not marked as a “particularly notable bug”, even though it can result in very serious side effects. This particular application is used by different organizations with their own schemas in the same database, and we have experienced both SELECTs and INSERTs being performed in the wrong schema, especially in queries coming from Tuxedo. We just hope that all bad updates have been caught by constraints, but unfortunately we can’t be sure…

    Comment by Johan — December 6, 2007 @ 9:11 pm

  38. Thanks, Johan. Did you also have library cache invalidations?

    Comment by christianbilien — December 6, 2007 @ 9:12 pm

  39. Could you explain where the “800 reads/s” number came from? Also, I do not understand the three tables, in particular the meaning of the “LUN I/O rate” column and the derivation of the cell numbers.

    Thanks

    Comment by Val Carey — December 6, 2007 @ 9:12 pm

  40. Val,

    1. I had dropped one line: the I/O rate as seen from the server is 1000 I/O/s (80% reads) (from glance/ux). Thanks for pointing it out; I added it in the post.
    2. The LUN I/O rate is the incoming rate the DB would emit if there were no queuing at the disk array. It is 1000 I/O/s now, but it would increase if the RAID were able to deliver at a faster rate. The cells are individual disk I/O rates. In the first table for example, a RAID 5 with 12 columns would deliver 139 I/O/s per disk. This computation assumes no stripe aggregation, hence 1 write = 2 reads + 2 writes, evenly split between 2 disks.

    Christian

    Comment by christianbilien — December 6, 2007 @ 9:13 pm

  41. OK, thank you.

    I am still uncomfortable with some figures, so if you could spend a bit of time explaining further I’d be grateful.

    Let’s take the LUN IO request arrival rate to be 1000 per sec and consider RAID10 with 10 mirrored disks for simplicity. How did you compute the 61 figure as the individual disk IO request rate? Why not 10 (one request per each disk) or 100 (a request for the whole stripe), or 50 (absent any info about individual request length distribution)?

    Thanks.

    Comment by Val Carey — December 6, 2007 @ 9:13 pm

  42. For Raid 10 (on 20 disks):

    – Assuming we have 1000 I/O/s at the OS level spread over 10 mirrored pairs, that is 100 I/O/s per pair; 100 × 80% = 80 reads/s for each pair of mirrored disks, so each disk actually serves only 40 reads/s (reads alternate between the mirrors).
    – Our write rate is 20/s for each pair of mirrored disks, which translates into 20 writes/s per disk (each write goes to both mirrors).

    That’s 40+20=60 I/O/s per disk.

    My spreadsheet was actually based on 22% writes, which I rounded to 20% in the post, hence the slight variation.
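    The same derivation, written out numerically (a sketch reproducing the figures above):

```python
def raid10_per_disk(total_iops, write_fraction, mirrored_pairs):
    """Per-disk I/O rate on RAID10: reads alternate between the two
    sides of each mirror pair, writes land on both sides."""
    reads = total_iops * (1 - write_fraction)
    writes = total_iops * write_fraction
    per_disk_reads = reads / mirrored_pairs / 2   # alternating mirrors
    per_disk_writes = writes / mirrored_pairs     # duplicated on each side
    return per_disk_reads + per_disk_writes

print(round(raid10_per_disk(1000, 0.20, 10)))  # 60: the rounded figure in the post
print(round(raid10_per_disk(1000, 0.22, 10)))  # 61: the spreadsheet figure
```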

    Christian

    Comment by christianbilien — December 6, 2007 @ 9:13 pm

  43. Ah, OK, that makes sense, assuming a uniform distribution of I/O requests amongst the disks, with request sizes of no more than one stripe block (stripe size).

    My comment above should actually have been : “Why not 100 (one request per each disk) or 1000 (a request for the whole stripe), or 500 (absent any info about individual request length distribution) ?”

    With your assumptions, 100 translates into 60.

    Thanks.

    Comment by Val Carey — December 6, 2007 @ 9:14 pm

  44. Dear Christian,

    I am an Oracle DBA, but I do not have much experience with Oracle RAC. In our office, someone installed Oracle 10g Release 10.2.0.2.0 RAC on 2 nodes, on 64-bit CentOS Linux. Since the installation, one of the nodes is sometimes not available. One of our DBAs experienced the same problem in a Sun environment, and he applied Oracle patch 10.2.0.3.0; since then the problem has not occurred. Yesterday I downloaded and applied the patch on our Linux RAC environment, but since then one of the instances has never come up. I tried to start the instance from the node manually, but the system gives an end-of-communication-channel error. I would like to solve the problem but could not understand how to troubleshoot it. I am looking forward to your help.

    Thanks
    Asit

    Comment by Asit Roychoudhury — February 8, 2008 @ 1:11 pm

  45. Hi,
    I have been working on an Oracle DBA/Sysadmin Dashboard. What do you think are the key ingredients of a DBA dashboard?

    Your opinion is highly regarded
    Regards
    Nilesh
    Dashboards

    Comment by Dashboards — February 13, 2008 @ 1:42 am

  46. Hi Nilesh,

    Some generalities for a performance dashboard:
    – I believe that for a DBA/sysadmin the best dashboard indicators are response times and key batch job durations.
    – Indicators can be I/O rate (throughput vs. capacity), vmstat scan rate, CPU busy, etc., but they must be put into context: I am often involved, for example, in environments where the requirements are very high, such as throughput to stock markets, or reactivity to an event where the scales are in the hundreds of microseconds, which means you cannot wait for a timeslice.

    It’s a bit vague, but I do not think there is a one-size-fits-all.

    Cheers

    Christian

    Comment by christianbilien — February 13, 2008 @ 12:13 pm

  47. Christian,

    The “last postings” link only shows the introduction page, not the postings.
    Is this normal?

    regards

    Freek

    Comment by Freek — March 12, 2008 @ 7:50 am

  48. Hi Freek,

    No, it isn’t. This is weird, as it has been working for several months. I removed the link.

    Cheers

    Christian

    Comment by christianbilien — March 12, 2008 @ 8:23 am

  49. Nice post.

    Thanks

    Comment by moonstudio — April 21, 2008 @ 10:18 am

  50. Dear Christian,
    I read your blog about ASM. Very interesting.
    I’m Fabio from Napoli, ITALY.

    I need your comment on the following:
    I have to install Oracle 10g RAC on two Windows 2003 servers with two SANs (EMC CX310c with MirrorView).

    What do you suggest?

    1) Use ASM in order to replicate across the two SANs (SAN used only for INFRA)
    2) Use MirrorView (without ASM)

    Waiting for your feedback

    Ciao
    Fabio

    Comment by Fabio from Napoli — June 6, 2008 @ 3:36 pm

  51. Hi Fabio,

    I’d suggest you take the following things into account:
    – If 10g, then beware the ASM disk group mirror rebuild after a shutdown of one of the arrays. The rebuild can only be performed one DG at a time. If you do not have too much space to mirror, this might be OK though.
    – The SPs have to manage MirrorView. That may be a good point for ASM if you have write-intensive applications, especially on RAID5.
    – Operations may be simpler with MirrorView.
    – For heavy writes, MV can bottleneck before ASM does. This is because all synchronous writes have to go through the replication ports.
    – Bandwidth: if you need a lot of bandwidth you may prefer to take advantage of the two arrays: service processors and cache can be shared, and reads will be done alternately on each array.

    HTH

    Christian

    Comment by christianbilien — June 6, 2008 @ 3:50 pm

  52. Any thoughts on optimizing large work unit commits on replicated arrays?

    This relates, sort of, to your log file sync article. I have a single instance (HP DL585, RHES4, Oracle 10.2.0.2, EMC DMX SRDF/S storage; everything lightly loaded) which is performing well, except where it processes file ETL. I would like to optimize the instance solely to support very large work units, but SRDF/S is getting in the way.

    The ETL is basically single threaded, based on file receipt. Each work unit probably generates ~2 MB of redo, and the commit times are very long. The redo is SRDF/Synchronous (to a DR site 100 km away; ping time 2 ms on dedicated fibre). `strace(1)` of the log writer shows writes of 512 bytes, so the slow commit seems to be caused by breaking a large log write into sequential sync writes, which are slow as they are replicated from the primary to the secondary DMX frame.

    I figure the tuning strategy is to increase the redo write block size. The default filesystem here is EXT3 (with async + direct IO). I’ve tried fiddling with the blocksize there and using VxFS filesystems, but cannot seem to affect performance. Do you have any thoughts on either the strategy of trying to change the log writes to fewer, larger blocks, or strategies to implement this?

    Comment by Aaron — June 16, 2008 @ 6:43 pm

  53. Hi Aaron,

    Sorry, I’m (very) late to answer. I would first make sure to lower as much as possible the number of I/Os outside a COMMIT WORK. Some ideas here: https://christianbilien.wordpress.com/2007/03/28/log-buffer-issues-in-10g-22/

    I am fairly ignorant about EXT3, but I would think the 512-byte redo write size is because the LGWR emits 512-byte writes. I suspect you are committing very frequently, fast enough to generate only very small log writes (or it may be that the Linux LGWR always emits 512-byte writes – I do not have any Linux box to test, but it would not be true of Solaris or HP-UX). Assuming you are committing very frequently (say one insert, then commit work), there is not much you can do at the OS level. I can see two ways to speed things up: the most recent EMC Enginuity version packages the control and the data in the same frame, which basically means the 2 ms would become 1 ms (sorry, I do not have the Enginuity reference with me). Another alternative is to switch to host-based mirroring using ASM or a Linux LVM (https://christianbilien.wordpress.com/2007/06/26/log-file-write-time-and-the-physics-of-distance/).
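    To illustrate why the 512-byte writes hurt so much over a synchronous replication link, here is a back-of-the-envelope sketch using the 2 MB and 2 ms figures quoted above (the 128 KB alternative is purely hypothetical):

```python
def commit_time_s(redo_bytes, write_size_bytes, round_trip_ms):
    """Sequential synchronous writes: each write waits for the remote
    array acknowledgement before the next one is issued."""
    n_writes = redo_bytes / write_size_bytes
    return n_writes * round_trip_ms / 1000.0

redo = 2 * 1024 * 1024                       # ~2 MB of redo per work unit
print(commit_time_s(redo, 512, 2.0))         # 8.192 s in 512-byte writes
print(commit_time_s(redo, 128 * 1024, 2.0))  # 0.032 s in 128 KB writes
```

    This is why fewer, larger log writes (or halving the round trip) dominates any filesystem-level tuning here.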
    Cheers
    Chris

    Comment by christianbilien — June 22, 2008 @ 8:44 pm

  55. thanks, very interesting

    Comment by claport — June 26, 2008 @ 9:26 pm

  56. Christian,
    We have recently installed Oracle RAC on Solaris 10 (X4200 Opterons) using CRS and ASM, and we have a BCP cluster running in the same configuration, except that it is 2 nodes and our primary is 3. We are now concerned about the level of support and knowledge Oracle seems to have on this platform; e.g., getting a bug port for Data Guard which had been available on SPARC and RHL for 8 months took another 30 days. Can you tell me if you have found the Oracle support and knowledge behind this product mix good or bad? The DB is 400 GB and is mission critical to the company, so I am trying to work out if the architecture will really help us in the long run.

    Thanks
    John

    Comment by John Carroll — July 4, 2008 @ 7:23 am

  57. Hi John,

    Well, simply put, I would discourage anyone from going to Solaris/x86. One of the customers I worked for went for an x86 RAC. There were major bugs in Solaris, PowerPath and Oracle, not to mention hardware problems with the X4600. Oracle also explained that Solaris x86 was in the third tier (aka “exotic boxes”) in terms of releases. 10.2.0.3 is not even out yet on Solaris x86.

    HTH

    Christian

    Comment by christianbilien — July 4, 2008 @ 7:33 am

  58. Hello Christian:

    I found your blog very interesting. I have just started working with Oracle RAC, so I have little administrative exposure to it, though I have installed a two-node 10g RAC on top of Solaris 10 with 3.2 clusterware. During the installation I tried to use ASM as the storage, and that was done successfully. The problem I saw during the ASM configuration was that when I tried to assign the full raw disks mapped from the SAN, ASM was unable to detect them, but when I mapped the raw disks under Sun Volume Manager they were detected. So ASM is now holding disks under Sun Volume Manager.

    I want to know: is this a potential problem for me in the future, or can I continue with ASM on top of Sun Volume Manager? Second, what might be the way to skip Sun Volume Manager, given that, as I already told you, those raw disks were not being detected?

    TIA

    Comment by nayyares — October 30, 2008 @ 9:51 am

  59. Hi nayyares,

    Check if your asm_diskstring is correctly configured and the raw devices’ ownership is oracle/dba (or whatever you use).

    Christian

    Comment by christianbilien — November 3, 2008 @ 10:18 am
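    The two checks in the reply above (discovery string and device ownership) can be scripted. A minimal sketch, assuming a POSIX system; the expected uid/gid values and any device paths are site-specific assumptions, not Oracle requirements:

    ```python
    import os
    import stat

    def check_raw_device(path, expected_uid, expected_gid):
        """Verify an ASM candidate raw device is a character device
        owned by the expected user/group (e.g. the oracle/dba ids).
        Returns a list of problems; an empty list means it looks usable."""
        st = os.stat(path)
        problems = []
        if not stat.S_ISCHR(st.st_mode):
            problems.append("not a character device")
        if st.st_uid != expected_uid:
            problems.append(f"uid is {st.st_uid}, expected {expected_uid}")
        if st.st_gid != expected_gid:
            problems.append(f"gid is {st.st_gid}, expected {expected_gid}")
        return problems
    ```

    Anything returned points at the ownership or device-type fix needed before the asm_diskstring discovery pass can see the disk.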

  60. I have come to have a high respect for your knowledge of both Oracle and the Storage arena – kudos. Where do you go to learn about advances in Storage Technology and how that affects/influences Oracle RDBMS? For example, implementing RAID 6 (or RAID 50 or RAID 60) on EMC DMX? What about throwing Solaris ZFS into the picture?

    Keep up the good work! =)

    Comment by Charles Schultz — November 6, 2008 @ 2:26 pm

  61. Hi Charles,

    Sorry for the late answer, I took a week off sailing. Thank you for your kind words, it’s always nice to read. I like your blog too. As for the sources, I have been in performance tuning ever since I started working at HP 18 years ago. I do not really have time to blog nowadays as I have shifted towards managing a rather large DBA team and 4 different DB technologies. ZFS is full of promise, but we experienced so many bugs that I would not really dare to put it in production. I’m considering some Exadata investigations, as I had an InfiniBand pilot (cluster interconnect and storage blades) built up which was quite impressive. I expect the Exadata machine to be even better.

    Kind regards,

    Christian

    Comment by christianbilien — November 14, 2008 @ 9:25 pm

  62. Hi Christian,
    thank you for your site, a very useful resource, very nice blog

    Comment by Красивая женщина — January 13, 2009 @ 10:22 pm

  64. Thanks! very interesting !!!

    Comment by Nikolay Putin — October 1, 2009 @ 6:36 pm

  65. Christian:

    Nice blog. We recently upgraded our databases to 11.1.0.7 running on Solaris 10, with CLARiiON storage arrays. For the last week we have been facing a severe performance problem that was not noticed before: log writes report periodic slowness, with log write times jumping to over 3000 ms on a routine basis. From OEM Grid we also notice over 400 ms average wait time on the top files. The sysadmins reviewed the storage configuration and do not suspect any issues with the storage; iostat does not indicate any storage latency either (asvc_t < 10 ms), so they have ruled out the storage layer. How shall we go about troubleshooting this issue?

    Thanks in advance for your help.

    Comment by Ganesh — October 4, 2009 @ 5:58 am
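    One way to double-check the iostat figures mentioned above is to parse `iostat -xn` output programmatically and flag any device whose `asvc_t` crosses a threshold. A minimal sketch; the sample output below is fabricated for illustration:

    ```python
    def slow_devices(iostat_lines, threshold_ms=10.0):
        """Return (device, asvc_t) pairs above threshold from Solaris
        'iostat -xn' output. Assumed column order:
        r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device"""
        slow = []
        for line in iostat_lines:
            parts = line.split()
            if len(parts) != 11 or parts[-1] == "device":
                continue  # skip headers and malformed lines
            try:
                asvc_t = float(parts[7])  # average service time, ms
            except ValueError:
                continue
            if asvc_t > threshold_ms:
                slow.append((parts[-1], asvc_t))
        return slow

    sample = [
        "    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device",
        "   12.0   88.0  512.0 4096.0  0.0  0.3    0.0    3.2   0   9 c2t0d0",
        "    1.0  140.0   16.0 8192.0  0.1  2.1    0.8  412.5   1  97 c2t1d0",
    ]
    print(slow_devices(sample))  # [('c2t1d0', 412.5)]
    ```

    Running this against samples taken while the log writes stall would show whether the latency ever reaches the devices, or whether the waits accumulate above the storage layer.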

  66. Do you have any thoughts on the strategy of changing the log writes to fewer, larger blocks, or on ways to implement this?

    Comment by John — January 20, 2010 @ 9:02 pm

  67. Any ideas as to why I get much better throughput in back-to-back burst writes
    when I disconnect and then reconnect to the database between bursts?

    In my scenario, I write a SINGLE row to a simple table comprised of string data.
    One column of a SINGLE row is constantly changed 25,000 times during the burst.
    The first burst completes and is committed in about 25 seconds.
    The client sleeps for 60 seconds and then repeats the burst.
    The second burst takes about 75 seconds.
    The third burst about 3 minutes.

    If the connection to the database is toggled between bursts,
    the burst of writes consistently completes in about 25 seconds.

    It was observed that the server-side orasrv process runs at nearly
    100% utilization during the burst and well beyond the commit, unless
    the connection is toggled, in which case the orasrv process terminates.

    The behavior would imply that the writes are buffered, except that NO
    data is lost when the connection is toggled and each successive burst
    has a consistent throughput. So the question is: what would cause
    a growing write latency that seems to go away by toggling the connection?

    I believe the current Oracle version is at least 11g, the driver
    uses the OCI API, the results are the same whether the client is local
    or remote to the server, and the Oracle install is pretty much
    out of the box.

    Any insight would be greatly appreciated.
    thanks.

    Comment by Brian — June 30, 2010 @ 2:27 am


RSS feed for comments on this post. TrackBack URI

Blog at WordPress.com.