SCO Unix Performance Tuning
© Anthony Lawrence, aplawrence.com
Comments on older SCO Unix performance tuning kernel parameters
This is a look at SCO Unix performance tuning. You might also want to read SCO Unix memory tuning.
There is more tuning related material in the SCO Unix FAQ also.
You definitely want to read the SCO Unix Performance Guide.
More modern kernels tune for memory automatically. There is also the fact that today's hardware is very fast to begin with - fine tuning is seldom needed and, because the self tuning is probably much more competent than you are, is not usually indicated.
There's even more to consider if you are running virtualized. While memory tuning might still make sense, disk buffer related tuning might not as the virtualization software may already be providing caching. Still, only experimentation will prove anything.
Should you need to fiddle, here are thoughts from people in the newsgroups. To save space, I've eliminated extraneous material and indicated that with "...". Links to the original post or thread are provided where they still work.
High buffs low buffs NHBUF tuning
Linux automatically uses any currently available ram for buffer cache, releasing it back to you when you need it. On SCO OSR5, however, you tune these parameters:
I believe that it has been suggested that 30% of memory is a good starting point. You need to test and see where it becomes either silly (no more gain) or starts to interfere with other memory needs.
SCO also has "high bufs" and "low bufs". High bufs are necessary when there's no more room under 16M. They are "bad" in the sense that non-32 bit controllers can't access them directly. Search in your online help for "PLOWBUFS" and then look at the "Positioning the buffer cache in memory" link.
Practical limit? Brian Wong (see Book Review: Configuration and Capacity Planning for Solaris Servers) mentions systems configured with 80% of memory for cache! However, he also points out that if you have LOTS of memory, and are doing normal file I/O (as opposed to raw database caches), the normal functioning of virtual memory keeps your data available in memory anyway. The difference is that the buffer cache forces disk reads into a buffer; otherwise they are just competing with everything else in memory. So when you have an OS and hardware that supports more memory than the size of your data files, you can stop worrying about NBUF.
The maximum possible setting for NBUF is 450000.
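As a rough worked example of the 30% starting point, here is a small Python sketch. It assumes each buffer is 1K (as stated in the newsgroup post quoted in this article) and applies the 450000 ceiling; the function name suggest_nbuf is made up for illustration and is not a SCO tool.

```python
NBUF_MAX = 450000       # maximum NBUF setting quoted in the article
BUFFER_SIZE_KB = 1      # assumption: each buffer is 1K bytes, per the quoted post

def suggest_nbuf(ram_mb, fraction=0.30):
    """Return a starting NBUF for the given RAM size, capped at NBUF_MAX."""
    cache_kb = int(ram_mb * 1024 * fraction)   # KB of RAM to devote to the cache
    nbuf = cache_kb // BUFFER_SIZE_KB
    return min(nbuf, NBUF_MAX)

for ram in (64, 256, 1024, 2048):
    print(ram, "MB RAM -> NBUF", suggest_nbuf(ram))
```

Treat the result only as a first guess to test against; on large-memory machines the cap is reached well before 30%.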
Next, we have someone asking about database tuning. Usually the database vendor will be very specific about what they need, so consider this general advice should you be stuck with something for which you can no longer get any advice.
The questions referred to are those asked by running "./configure" in /etc/conf/cf.d. That command can also display the value of resources: "./configure -y NHBUF", for example, displays the current value of NHBUF. Check the man page for more and see Kernel parameters that you can change using configure.
Message-ID: <[email protected]>
From: John Gray <[email protected]>
Newsgroups: comp.unix.sco.misc,dropbox
Subject: Re: Scenario for the gurus
Date: Thu, 15 Apr 1999 21:28:33 -0700

...

Openserver 5.05 will use by default 10% of your total memory for disk buffers with a cap of about 6 meg by default. Depending on how much memory you have you can starve the database engine. This can be changed by running configure in /etc/conf/cf.d. select 1 and then alter NBUF, each buffer is 1K bytes. NHBUF will auto size so don't worry about that one. If you have more than 64 meg there's a good chance you may want to raise this.

By default PLOWBUFS is 30 which will attempt to grab 30% of the space allocated to your disk buffers and place them below 16 meg. If your system is PCI you won't need much except for maybe your floppy. No more than a meg of low buffers will be needed.

Depending on how large your buffer cache is and how many temp files get created and removed you may want to increase, decrease or leave alone the BDFLUSHR parameter. This is how often the kernel scans the disk buffers and writes them out to disk. The default is once every 30 seconds which may be perfect for your needs but you can always adjust it. For example if you do lots of compiles maybe raising it to 45 or 60 seconds may work wonders but the down side is you leave your self more vulnerable to disk corruption if you have a kernel PANIC.

Selecting option 2 will allow you to adjust GPGSLO and GPGSHI. By default the kernel will wait until there are about 25 pages (100k) left before it starts kicking out dormant processes. ( hopefully dormant :-) ) It will then continue until there are 40 pages free (160k). If you are tight on memory this might be fine but if you have more RAM then you may not want to wait around and bounce around the limit of your memory. It's really an art that is system specific but being a little more aggressive with the GPGS lo and hi numbers may prevent some disk thrashing.

Option 7, since you are running a database, which may or may not have many daemons running with its user ID you may need to up MAXUP. This is the number of processes that can belong to one single user at a time. When running large compiles I have found that I need to up this to 500 or my builds can fail.

Option 12, NSTRPAGES. by default it is set to 500 which is 2K. If there is heavy network traffic this can be raised also. The pages are allocated as needed and this number is a limit. Using netstat -m to monitor the system during peak loads might give you a good idea if it needs to be increased. ( weird error messages too)

Options 13, 15 and 16 (messages queues, semaphores and shared data) probably have some values that your database vendor can specify in the documentation which is platform specific. Each database vendor will use these differently and the kernel defaults are not optimal.

There is a good book that was written for 3.2V4 but has lots of good info which details tuning your system for database performance. It's "The SCO Performance Tuning Handbook" Gina Miscovich and David Simons. ISBN 0-13-102690-9. It was published in 1994. Take this into account when they talk about memory and cpu load. The HW has changed a little. :-)

Lastly I can't take much credit for what I have stated here. I have learned most of this from working with some very talented people.

hope it helps

-john
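The GPGSLO and GPGSHI figures in that post (25 pages is about 100k, 40 pages about 160k) imply a 4K page. A minimal Python sketch of the conversion follows, in case you want to think about the thresholds in kilobytes. The page size is inferred from the post, and the "more aggressive" values shown are purely hypothetical, not recommendations.

```python
PAGE_KB = 4   # inferred from the post: 25 pages ~ 100k, 40 pages ~ 160k

def pages_to_kb(pages):
    """Free-memory threshold in KB for a GPGSLO/GPGSHI value in pages."""
    return pages * PAGE_KB

def kb_to_pages(kb):
    """Pages needed to cover at least `kb` kilobytes (rounds up)."""
    return -(-kb // PAGE_KB)

# Defaults described in the post
print("GPGSLO 25 pages =", pages_to_kb(25), "KB")
print("GPGSHI 40 pages =", pages_to_kb(40), "KB")

# Hypothetical, more aggressive thresholds of 1 MB and 2 MB free
print("1 MB =", kb_to_pages(1024), "pages; 2 MB =", kb_to_pages(2048), "pages")
```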
Unixware kernel parameter tuning
While I wouldn't call SCO OSR5 "user friendly", it was far more so than Unixware. That said, for admins coming from SCO, most of it was just confusion from being different.
This is an example of the difference in tuning kernel parameters.
From: Dave Noble <[email protected]>
Newsgroups: comp.unix.sco.misc
Subject: Re: Kernelparameter
Date: Wed, 21 Jul 1999 12:39:29 +0100
References: <7n3ntn$52u$[email protected]>

UnixWare doesn't use the mtune/stune files like OpenServer does, you will need to edit the files in /etc/conf/mtune.d to change the default kernel parameters. But the safest way to change kernel parameters, if you are not going to use the System Tuner from the GUI, is to use /etc/conf/bin/idtune - see the man page for idtune(1M).

Dave.
-------------------------------------------------------
Dave Noble | Sphinx CST | www.sphinxcst.co.uk
-------------------------------------------------------
SDSKOUT is the maximum number of simultaneous requests that can be queued for each SCSI disk. You can check what's being used with "sar -S" (see Why is my system so slow? for sar on SCO Unix).
From: "James R. Sullivan" <[email protected]>
Newsgroups: comp.unix.sco.misc
Subject: Re: Performance question
Date: Thu, 27 Sep 2001 08:52:06 -0700
References: <[email protected]>

...

I can't remember if the NHBUFS get automatically adjusted when you change NBUF, but you should make sure that they are appropriately sized. In the past, you wanted a 4:1 ratio between NBUF and NHBUFS, with NHBUFS being a power of 2. This later changed to a 1:2 ratio on MP systems. Either way, make sure that NHBUFS is the right size for NBUF=100000, probably around 65536 or 32768.

I'd set SDSKOUT as high as I could, generally 256, based on the mtune entries. The higher the number, the harder the SCSI bus will be working. I have seen instances where increasing this number caused the system to crash, due to the quality of the SCSI bus/termination. Go neutral, bump it to 128 and see what happens.

--
Jim Sullivan
Director, North American System Engineers
Tarantella! http://www.tarantella.com
831 427 7384 - [email protected]
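The NHBUF arithmetic in that post can be sketched in a few lines of Python. This is only an illustration: it rounds up to the next power of two, and it reads the ratios as "NBUF buffers per hash slot". Under that reading a 4:1 ratio gives 32768 for NBUF=100000, and two buffers per slot gives 65536, which happens to match the other figure in the post. The "1:2" wording is ambiguous, so that second reading is an assumption, and suggest_nhbuf is a made-up name.

```python
def next_power_of_two(n):
    """Smallest power of two that is >= n."""
    p = 1
    while p < n:
        p *= 2
    return p

def suggest_nhbuf(nbuf, buffers_per_slot):
    """NHBUF candidate: NBUF divided by the ratio, rounded up to a power of two."""
    return next_power_of_two(nbuf / buffers_per_slot)

print("4 buffers per hash slot:", suggest_nhbuf(100000, 4))
print("2 buffers per hash slot:", suggest_nhbuf(100000, 2))
```

If NHBUF auto-sizes on your release, as the earlier post says it does, compare the computed figure with what "./configure -y NHBUF" reports rather than overriding it blindly.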
inconfig network tuning
This was about adjusting tcp/ip download performance with "inconfig". Max value is 65535!
From: Jeff Liebermann <[email protected]>
Subject: Re: TCP tuning?
Date: Sat, 21 Apr 2001 16:07:42 -0700

Yes. It's easy. Use "inconfig". It will patch the running kernel as well as the /etc/default/inet configuration file. No kernel relink required.

Watch out for this inconfig bug: inconfig breaks the symbolic link for /etc/default/inet.

Quoted from some email I excavated from my archives:

======
I've noticed that my download performance on my DSL line is seriously sluggish with 5.0.6 BL7 as compared to my other boxes. I tracked the problem down to the maximum receive window size as defined in /etc/default/inet. The current value is:

    in_recvspace 4096

Running:

    inconfig in_recvspace 32768

and relinking the kernel resulted in 2.5 times the download performance. With the original 4096 value, I was getting about 40KBytes/sec download from various ftp sites resident within PacHell. After the change, it was 100-120KBytes/sec. Absolute max on my DSL is 150KBytes/sec, but I have other users sharing the bandwidth. Unfortunately, I only have 128Kbits/sec upload bandwidth and therefore cannot test upload performance.

(some drivel deleted)

Notes:
1. I always downloaded everything twice using the 2nd download as the benchmark. The idea was to fill up the ISP's object cache.
2. I suspect (i.e. guess) that in_sendspace also needs to be increased.

--
Jeff Liebermann [email protected]
150 Felker St #D Santa Cruz CA 95060
831-421-6491 pager 831-429-1240 fax
http://www.cruzio.com/~jeffl/sco/ SCO stuff
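Why would a bigger receive window help so much? A TCP sender can have at most one window of data in flight per round trip, so throughput is bounded by roughly window / round-trip time. The Python sketch below works backwards from the figures in the post. The round-trip time is inferred from the 40KBytes/sec observation, not measured, and it assumes the window was the only limit at the time.

```python
KB = 1024

def window_limited_rate(window_bytes, rtt_seconds):
    """Upper bound on throughput (bytes/s) with one window in flight per round trip."""
    return window_bytes / rtt_seconds

observed = 40 * KB              # ~40 KBytes/sec with in_recvspace 4096 (from the post)
implied_rtt = 4096 / observed   # round-trip time that would explain that rate
line_max = 150 * KB             # DSL maximum quoted in the post

print("implied round trip: %.0f ms" % (implied_rtt * 1000))
for window in (4096, 32768, 65535):
    ceiling = window_limited_rate(window, implied_rtt)
    usable = min(ceiling, line_max)      # the line itself is the other limit
    print("in_recvspace %5d -> about %.0f KB/s" % (window, usable / KB))
```

On these assumptions, 32768 already lifts the window ceiling above what the line can carry, so going all the way to 65535 would not be expected to gain anything on that particular link.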
NAUTOUP BDFLUSHR disk performance tuning
BDFLUSHR is how often (in seconds) to write the filesystem buffers to the disk. Writes are affected by NAUTOUP and BDFLUSHR; the documentation explains:
Specifies the rate for the bdflush daemon process to run, checking
the need to write the filesystem buffers to the disk. The range
is 1 to 300 seconds. The value of this parameter must be chosen in
conjunction with the value of NAUTOUP. For example, it is nonsensical
to set NAUTOUP to 10 and BDFLUSHR to 100; some buffers would be
marked delayed-write 10 seconds after they were written, but would
not be written to disk for another 90 seconds. Choose the values for
these two parameters considering how long a delayed-write buffer
may have to wait to be written to disk and how much disk-writing
activity will occur each time bdflush becomes active. For example,
if both NAUTOUP and BDFLUSHR are set to 40, buffers are 40 to 80
seconds old when written to disk and the system will sustain a large
amount of disk-writing activity every 40 seconds. If NAUTOUP is
set to 10 and BDFLUSHR is set to 40, buffers are 10 to 50 seconds
old when written to disk and the system sustains a large amount of
disk-writing activity every 40 seconds. Setting NAUTOUP to 40 and
BDFLUSHR to 10 means that buffers are 40 to 50 seconds old when
written, but the system sustains a smaller amount of disk writing
activity every 10 seconds. With this setting, however, the system
may devote more overhead time to searching the block lists.
WARNING: If the system crashes with BDFLUSHR set to 300 (its maximum possible value) then 150 seconds worth of data, on average, will be lost from the buffer cache. A high value of BDFLUSHR may radically improve disk I/O performance but will do so at the risk of significant data loss.
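The arithmetic in the documentation above reduces to two small formulas: a buffer is written when it is between NAUTOUP and NAUTOUP + BDFLUSHR seconds old, and the average exposure from the flush interval is BDFLUSHR / 2. Here is a short Python sketch that reproduces the documented examples; the function names are made up for illustration.

```python
def buffer_age_range(nautoup, bdflushr):
    """(youngest, oldest) age in seconds of a delayed-write buffer when it is flushed."""
    return nautoup, nautoup + bdflushr

def average_loss_window(bdflushr):
    """Average seconds of buffered data at risk, per the WARNING above."""
    return bdflushr / 2

# The three combinations discussed in the documentation
for nautoup, bdflushr in ((40, 40), (10, 40), (40, 10)):
    low, high = buffer_age_range(nautoup, bdflushr)
    print("NAUTOUP=%d BDFLUSHR=%d: buffers are %d to %d seconds old, flushed every %d s"
          % (nautoup, bdflushr, low, high, bdflushr))

print("BDFLUSHR=300 average loss:", average_loss_window(300), "seconds")
```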
The discussion that follows was about tuning a RAID system that suffered from intermittent long pauses as it flushed disk buffers.
I can't find the "tls613" referred to below.
From: Bela Lubkin <[email protected]>
Subject: Re: Long pauses
Date: Tue, 16 Apr 2002 10:02:33 GMT
References: <[email protected]>

...

In general I would expect reducing BDFLUSHR to improve this, and reducing NAUTOUP to make it worse. The configuration you tested says: every time a block is written, keep it around in cache until it's at least 5 seconds old; sweep the cache for blocks old enough to write out every 20 seconds. Since it's only being swept every 20 seconds, 20 seconds worth of writes suddenly "come due" at once (blocks written between 20 and 25 seconds ago). If something was busy writing during that period, the performance hit is large.

I have been experimenting recently with BDFLUSH=1. This sweeps the cache every second, so any one load of blocks to be written should be reasonably small. This should work with any reasonable NAUTOUP value. e.g. BDFLUSH=1, NAUTOUP=1 will sweep blocks out to disk no more than 2 seconds after they're made dirty. BDFLUSH=1, NAUTOUP=30 keeps most writes around for a full 30 seconds, then sweeps them to disk in small chunks.

Modern CPUs are fast enough that it isn't terribly costly to run the sweep every second. A test box which is a busy multiuser system with 2 Pentium III-1000 CPUs has been up for 37 days and bdflush has accumulated about 50 minutes of CPU time, or about 1/1000 of one CPU over the time it's been up.

Setting BDFLUSHR to 1 does have a noticeable effect on the system. You can _hear_ it writing to the disk every second, like clockwork. Sometimes it's a little write, sometimes big. The overall amount of work being done by your disks is about the same, but it's distributed differently and you will notice the difference.

Even with BDFLUSHR=1, you can have big disk hits (but not quite as big). The problem is, the CPU is _much_ faster than the disk. Since the buffer cache decouples write performance from disk performance, a process can "write" data much faster than the disk could accept it. It sits in cache for the specified time (NAUTOUP + cycle time until the next BDFLUSHR), then all comes due at once. Even though it's delayed, it's still faster than the disk can accept it, so the disk gets very busy for a while.

What's needed is a way to tell the buffer cache to tone it down: that getting writes out exactly when NAUTOUP and BDFLUSHR conspire to isn't _that_ important -- after all, hardly anyone even knows that they exist, much less how to tune them for a specific application! Even when the buffer cache has a lot of "due" write buffers, it should take time out to let a lot more read requests through. Well, it currently doesn't. Musings for future development. I need to find papers on the subject and read up on what's been done elsewhere.

>Bela<

...

From: Bela Lubkin <[email protected]>
Subject: Re: Long pauses
Date: Tue, 16 Apr 2002 18:59:01 GMT
References: <[email protected]>

FYI, both of these parameters can safely be tuned on a running system. (This is not _generally_ true of all parameters, but I've looked carefully at the kernel source that uses these two parameters and it both re-acquires the values from the main tunable variables, and uses them in a safe manner which won't be harmed by dynamic changes.)

To change them, you can use /etc/scodb:

    # scodb -w
    scodb> tune.t_bdflushr=1
    scodb> v.v_autoup=A
    scodb> q

(or whatever values you want; note: hexadecimal!)

You could also use /etc/pat, which is in tls613 at ftp://stage.sco.com/TLS/:

    # pat -n /unix tune+20 ........ = 00000001   # tune.t_bdflushr=1: run bdflush often
    # pat -n /unix v+4c ........ = 0000000A      # v.v_autoup=10: flush dirty bufs after 10s

`pat` has the advantage that you can put comments into the flow of control. It has the fairly large disadvantage that it doesn't know the shapes of the structures, so you have to manually compute them and could mistakenly patch the wrong field.

If you patch BDFLUSHR, bdflush will wake up after the last sleep at the old BDFLUSHR, then start sleeping at the new rate. So for instance you may not notice the once-a-second write cycle until up to old-BDFLUSHR seconds have passed. I believe you could also force the issue by doing a `sync` immediately after tuning it.

If you patch NAUTOUP, buffers already in the buffer cache _will_ be affected -- it is checked by bdflush() whenever it wakes up -- the timeout is a property considered by the bdflush algorithm rather than a property stored in each individual buffer.

(I suppose you could dynamically tune these parameters to deal with external conditions -- set both to 1 during a lightning storm, trading performance for minimal data loss in the event of a catastrophic failure?)

>Bela<

From: Bela Lubkin <[email protected]>
Subject: Re: Long pauses
Date: Thu, 25 Apr 2002 10:26:06 GMT
References: <[email protected]>

A process doing continuous writes for even one second could probably store up several seconds' worth of write activity. You could experiment with setting BDFLUSHR to 0, which (if my reading of the code is right) will cause it to be executed on every timer tick. Then you should have no more than 1/100s worth of writes come due at once. And likewise, try setting NAUTOUP to 0. The combination means "do all writes pretty much immediately (but waste a fair amount of CPU time figuring out what to do)".

NOTE: to any current or future readers, we're talking about a RAID system where presumably the RAID caching controller handles write scheduling. You would _not_ want to configure a system with either of these parameters set to 0 if that meant actually performing all writes immediately. Performance would suffer terribly!

>Bela<
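Because scodb and pat both want hexadecimal, it is easy to type 10 and end up with 16. A tiny Python sketch for converting the decimal value you have in mind follows. It only formats numbers; it does not run scodb or pat, and the function names are made up.

```python
def scodb_value(seconds):
    """Decimal seconds -> hex digits as typed at the scodb prompt (10 -> 'A')."""
    return format(seconds, "X")

def pat_value(seconds):
    """Decimal seconds -> 8-digit hex word in the style of the pat examples."""
    return format(seconds, "08X")

for seconds in (1, 10, 30):
    print("%3d seconds: scodb %s, pat %s" % (seconds, scodb_value(seconds), pat_value(seconds)))
```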
raid nhbuf nbuf bdflush performance tuning
More questions about BDFLUSHR and NAUTOUP
From: [email protected] (Stephen M. Dunn)
Subject: Re: Raid
References: <[email protected]>
Date: Mon, 21 Oct 2002 02:58:01 GMT

...

Nope. bdflush doesn't write _all_ delayed write data each time it's run; it writes only those blocks which have been waiting for at least NAUTOUP seconds.

If BDFLUSHR is 30 seconds and NAUTOUP is (say) 10 seconds, then every 30 seconds, the system writes out everything that's older than 10 seconds (so anything that's 10-40 seconds), but doesn't write out anything under 10 seconds.

Set BDFLUSHR to 1 second, for example, and leave NAUTOUP at 10 seconds. Now, every second, the system writes out everything that's older than 10 seconds (so anything that's 10-11 seconds is fair game; nothing is older than 11 seconds because if it were, it would have been written last time, or the time before, or ...), and still doesn't write anything under 10 seconds.

In the first case, you only have to put up with bdflush blocking you twice a minute, but since it has to write half a minute's worth of dirty blocks each time, the pauses could be quite long. In the second case, bdflush blocks you every second - but since it only has to write a second's worth of data, the pauses will generally be quite short and, unless you have very heavy write activity (or very slow hard drives), the pauses will probably be less noticeable or objectionable to the user.

The first case is slightly more efficient overall, but if the second case generates fewer user complaints, it may be better than the first one even though it's less efficient. And of course you don't have to pick between 30 seconds and 1 second; you can choose whatever value of BDFLUSHR best suits your system (and ditto for NAUTOUP).

FWIW, my home PC, which is on a small UPS and doesn't have a caching disk controller, has NAUTOUP at 10 seconds and BDFLUSHR at 1 second. At times when disk activity is heavy (like a batch of incoming news being processed), it doesn't have big pauses like it used to when I used the default settings.

This is one of the classic performance vs. reliability tradeoffs. Delayed writes (whether done in the OS or in the disk subsystem) can significantly improve performance, but at the cost of potentially greater data loss in the case of a crash.

Also, if you're using a caching disk controller with battery backup, make sure you're monitoring the status of the battery. I can't speak for all such controllers, but many (most?) of them use a lithium battery, which will probably expire 5-10 years down the road. Some might use rechargeable batteries, but they don't have infinite lives, either. Some battery-backed caching controllers will automatically switch to write-through mode if they detect that the battery is dying, to ensure reliability at the cost of performance. Check the manuals for your controller to find out how you can get status reports on the battery - perhaps there will be messages displayed on the console and/or written to syslog, or maybe there's a special monitoring program you have to run in order to find out if there are any problems.

--
Stephen M. Dunn <[email protected]>
>>>----------------> http://www.stevedunn.ca/ <----------------<<<
------------------------------------------------------------------
Say hi to my cat -- http://www.stevedunn.ca/photos/toby/
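To see the effect described in that post, here is a toy Python simulation of a steady writer. It is a deliberately simplified model and not the kernel's algorithm: one batch of blocks is dirtied each second, bdflush wakes every BDFLUSHR seconds and writes every batch that is at least NAUTOUP seconds old. The write rate of 100 blocks per second and the two-minute run are arbitrary assumptions.

```python
def flush_bursts(nautoup, bdflushr, rate=100, duration=120):
    """Return the number of blocks written at each bdflush wakeup.

    rate     -- blocks dirtied per second (hypothetical steady load)
    duration -- seconds simulated
    """
    pending = []   # second in which each still-dirty batch entered the cache
    bursts = []
    for now in range(1, duration + 1):
        pending.append(now - 1)                # one batch dirtied per second
        if now % bdflushr == 0:                # bdflush wakes up
            due = [t for t in pending if now - t >= nautoup]
            pending = [t for t in pending if now - t < nautoup]
            bursts.append(len(due) * rate)
    return bursts

slow = flush_bursts(nautoup=10, bdflushr=30)
fast = flush_bursts(nautoup=10, bdflushr=1)
print("BDFLUSHR=30: largest burst", max(slow), "blocks, total written", sum(slow))
print("BDFLUSHR=1:  largest burst", max(fast), "blocks, total written", sum(fast))
```

In this model both settings write the same total, but the 30-second sweep delivers it in bursts thirty times larger, which is the "long pause" versus "many short pauses" tradeoff the post describes.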
vhand plowbufs tuning configure memory low buffers
PLOWBUFS is a percentage, so 5% of 1 gig is 50,000,000 bytes, and so on. Nowadays there's not much you'd need low buffers for anyway (these are for things that need DMAable memory; see Why do I get "vhand spinning"? for details).
On 5.0.6, if you specify MORE than 100, it's taken as the actual amount of memory to allocate, which eliminates the percentage problem entirely. If you are running 5.0.6, just cd /etc/conf/cf.d; ./configure; choose Buffers and fix it.
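The "percentage problem" is easy to see with numbers. The Python sketch below assumes 1K buffers and that PLOWBUFS is a percentage of the buffer cache, as the newsgroup post quoted earlier in this article describes. For values over 100 it simply passes the number through as a count of 1K buffers; that choice of units is my assumption, so check the 5.0.6 documentation before relying on it.

```python
LOW_MEMORY_KB = 16 * 1024   # low buffers must fit below 16 MB

def low_buffers_kb(nbuf, plowbufs):
    """KB of buffer cache requested below 16 MB.

    plowbufs <= 100: percentage of the buffer cache (nbuf 1K buffers).
    plowbufs  > 100: treated as a literal count of 1K buffers (assumed units).
    """
    if plowbufs <= 100:
        return nbuf * plowbufs // 100
    return plowbufs

for nbuf, plowbufs in ((6144, 30), (450000, 30), (450000, 1024)):
    wanted = low_buffers_kb(nbuf, plowbufs)
    note = "fits" if wanted < LOW_MEMORY_KB else "more than 16 MB can hold"
    print("NBUF=%d PLOWBUFS=%d -> %d KB low buffers (%s)" % (nbuf, plowbufs, wanted, note))
```

With a small cache the default 30% is harmless; with a very large cache the same percentage asks for far more low memory than exists, which is why a fixed figure is the saner choice.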
scsi disk performance tuning wio sar
From: "James R. Sullivan" <[email protected]>
Newsgroups: comp.unix.sco.misc
Subject: Re: Performance question
Date: Wed, 26 Sep 2001 10:14:46 -0700
References: <[email protected]>

Trimmed and commented:

Adrian wrote:
>
> Hi,
>
> Thanks for your prompt response. I actually thought I had posted the memory
> section of the sar output. I should just post all of the output. See below.
>
>
> Here is the whole sar output (with my comments/questions again):
>
> ===========================
>
> SCO_SV thor 3.2v5.0.5 PentII(D)ISA 09/25/2001
>
> 10:22:49    %usr    %sys    %wio   %idle (-u)
> Average        9      28      45      18
>
> 10:22:49 bread/s lread/s %rcache bwrit/s lwrit/s %wcache pread/s pwrit/s (-b)
> Average      416    5953      93      83     119      31       0       0
>
> 10:22:49 device   %busy   avque   r+w/s  blks/s  avwait  avserv (-d)
> Average  Sdsk-0   45.84    1.04   54.55  441.62    0.38    8.40
>          Sdsk-2  100.00    1.05  122.42  554.50    0.54   10.28

Here is a problem, without a doubt. WaitIO (the wio in the first line), indicates a condition where a process is ready to run, but is blocked waiting for some IO event to clear. In all likelihood, they are waiting for Sdsk-2 to become free. Some possibilities, drawn from old memories, would be:

increase buffer cache. You have spare memory and are not swapping, so increasing the buffer cache would/could help. Your % of Read Cache is generally good. The write % is low, but you're probably writing to different parts of the disk/database so there's little you could do about that. I suspect that the program is writing with a sync of some sort, which may cause the significant waitio number.

increase SDSKOUT. This used to be the number of SCSI transactions that the system would queue. A higher number would queue more transactions and may improve the disk performance.

Get a better disk subsystem :-)

> 10:22:49 runq-sz %runocc swpq-sz %swpocc (-q)
> Average      1.7     100
>
> => according to 'man sar' if runq-sz is >2 and %runocc is >90% then the
> CPU is heavily loaded and response time will be degraded.
> These results seem to concur with the CPU utilization above, suggesting
> that CPU is the bottleneck. Again, does this make sense?

Not with your Disk situation. They're ready to run, but the disk is holding them back. Fix that first. Any time that system is in WaitIO, nothing is happening. In all my performance tuning over the years, I've always focused on reducing WaitIO when I see it.

my $0.02, from an old SCO SE.

--
Jim Sullivan
Director, North American System Engineers
Tarantella! http://www.tarantella.com
831 427 7384 - [email protected]
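The reasoning in that reply (look at %wio and per-disk %busy before blaming the CPU) can be written down as a rough rule of thumb. The Python sketch below is only that: the thresholds are illustrative guesses of mine, not figures from the sar documentation, and the function name diagnose is made up.

```python
def diagnose(wio_pct, idle_pct, disk_busy):
    """Very rough reading of sar -u and sar -d averages.

    disk_busy maps device name -> %busy. All thresholds are illustrative guesses.
    """
    saturated = sorted(dev for dev, busy in disk_busy.items() if busy >= 90.0)
    if wio_pct >= 20 and saturated:
        return "disk-bound: " + ", ".join(saturated)
    if idle_pct < 10 and wio_pct < 10:
        return "CPU-bound"
    return "no obvious bottleneck"

# Averages from the sar output quoted above
print(diagnose(wio_pct=45, idle_pct=18, disk_busy={"Sdsk-0": 45.84, "Sdsk-2": 100.00}))
```

Fed the averages from the post, it points at Sdsk-2, the same conclusion Jim reached by eye.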