APLawrence.com -  Resources for Unix and Linux Systems, Bloggers and the self-employed

Linux 2.6

© April 2006 Dominique Heger & Philip Carinhas, Fortuitous Technologies (https://fortuitous.com)

New Features in Linux 2.6 - Performance, Scalability, and Stability

Over the last several years, the Linux operating system has gained acceptance as the operating system of choice in many scientific and commercial environments. Today, the performance of the Linux operating system has improved significantly compared to traditional UNIX flavors. This is particularly true for smaller SMP systems with up to 4 processors. Recently, there has been an increased emphasis on Linux performance in mid- to high-end enterprise-class environments, consisting of SMP systems configured with up to 64 CPUs. Scalability and performance of Linux 2.6 are therefore paramount for applications on large systems with high CPU counts. This article highlights some of the performance and scalability improvements of the Linux 2.6 kernel.

The Virtual Memory (VM) Subsystem

Most modern computer architectures support more than one memory page size. To illustrate, the IA-32 architecture supports either 4KB or 4MB pages. The 2.4 Linux kernel utilized large pages only for mapping the kernel image. In general, large page usage is primarily intended to provide performance improvements for high performance computing applications, as well as database applications that have large working sets. Any memory-access-intensive application that utilizes large amounts of virtual memory may obtain performance improvements by using large pages. Linux 2.6 can utilize 2MB or 4MB large pages, AIX uses 16MB large pages, whereas Solaris large pages are 4MB in size. The large page performance improvements are attributable to reduced translation lookaside buffer (TLB) misses. Large pages further improve the process of memory prefetching by eliminating the necessity to restart prefetch operations on 4KB boundaries.

CPU Scheduler

The Linux 2.6 scheduler is a multi-queue scheduler that assigns a run-queue to each CPU, promoting a local scheduling approach. The previous incarnation of the Linux scheduler utilized the concept of goodness to determine which thread to execute next. All runnable tasks were kept on a single run-queue that represented a linked list of threads. In Linux 2.6, the single run-queue lock was replaced with a per-CPU lock, ensuring better scalability on SMP systems. The new per-CPU run-queue scheme decomposes the run-queue into a number of buckets (in priority order) and utilizes a bitmap to identify the buckets that hold runnable tasks. Locating the next task to execute requires a read from the bitmap to identify the first bucket with runnable tasks, followed by choosing the first task in that bucket's run-queue.

It should be pointed out that the Linux 2.6 environment provides a Non-Uniform Memory Access (NUMA) aware extension to the new scheduler. The focus is on increasing the likelihood that memory references are local rather than remote on NUMA systems. The NUMA-aware extension augments the existing CPU scheduler implementation via a node-balancing framework. Alongside the preemptible kernel support in Linux 2.6, the Native POSIX Threading Library (NPTL) represents the next-generation POSIX threading solution for Linux, and hence has received a lot of attention from the performance community. The new threading implementation in Linux 2.6 has several major advantages, such as in-kernel POSIX signal handling. In a well-designed multi-threaded application, fast user-space synchronization (futex) can be utilized. In contrast to Linux 2.4, the futex framework avoids a scheduling collapse during heavy lock contention among different threads.

I/O Scheduling

The I/O scheduler in Linux is the interface between the generic block layer and the low-level device drivers. The block layer provides functions that are utilized by file systems and the virtual memory manager to submit I/O requests to block devices. As prioritized resource management seeks to regulate the use of a disk subsystem by an application, the I/O scheduler is considered an important kernel component in the I/O path.

It is further possible to tune disk usage in the kernel layers above and below the I/O scheduler. Adjusting the I/O pattern generated by the file system or the virtual memory manager (VMM) is now an option. Another option is to adjust the way specific device drivers or device controllers handle the I/O requests. Further, a new read-ahead algorithm designed and implemented by Dominique Heger and Steve Pratt for Linux 2.6 significantly boosts read I/O throughput for all of the I/O schedulers discussed below.

The Deadline I/O scheduler available in Linux 2.6 incorporates a per-request, expiration-based approach and operates on five I/O queues. The basic idea behind the implementation is to aggressively reorder requests to improve I/O performance while simultaneously ensuring that no I/O request is starved. More specifically, the scheduler introduces the notion of a per-request deadline, which is used to assign a higher preference to read than to write requests. To summarize, the basic idea behind the deadline scheduler is that all read requests are satisfied within a specified time period, whereas write requests have no specific deadlines associated with them. When the block device driver is ready to launch another disk I/O request, the core algorithm of the deadline scheduler is invoked. In a simplified form, the first action taken is to check whether there are I/O requests waiting in the dispatch queue; if so, there is no additional decision to be made about what to execute next. Otherwise, a new set of I/O requests is moved to the dispatch queue.

The Anticipatory I/O scheduler's design attempts to reduce the per-thread read response time. It introduces a controlled delay component into the dispatching equation. The delay is applied to any new request to the device driver, thereby allowing a thread that just finished its I/O request to submit a new request. This enhances the chances (based on locality) that this scheduling behavior will result in smaller seek operations. The tradeoff between reduced seeks and decreased disk utilization (due to the additional delay factor in dispatching a request) is managed by utilizing an actual cost-benefit calculation.

The Completely Fair Queuing (CFQ) I/O scheduler can be considered as representing an extension to the better known stochastic fair queuing (SFQ) scheduler implementation. The focus of both implementations is on the concept of fair allocation of I/O bandwidth among all the initiators of I/O requests. A SFQ based scheduler design was initially proposed for some network subsystems. The goal to be accomplished is to distribute the available I/O bandwidth as equally as possible among the I/O requests.

The Linux 2.6 Noop I/O scheduler can be considered a minimal I/O scheduler that performs basic merging and sorting functions. The main usage of the noop scheduler revolves around non-disk-based block devices such as memory devices, as well as specialized software or hardware environments that incorporate their own I/O scheduling and caching functionality and hence require only minimal assistance from the kernel. For large-scale I/O configurations that incorporate RAID controllers and many disk drives, the noop scheduler therefore has the potential to outperform the other three I/O schedulers.


The Linux 2.6 kernel represents another evolutionary step forward, and builds upon its predecessors to boost (application) performance, through enhancements to the VM subsystem, the CPU scheduler and the I/O scheduler. In addition, this new version of the kernel delivers important functional enhancements in security, scalability, and networking.

This outline highlights the major performance features in Linux 2.6. Please visit the Fortuitous Website https://Fortuitous.com for the full article on Linux 2.6 Performance Enhancements. Fortuitous Technologies provides high quality IT services, focusing on performance tuning, capacity planning, and training.


Fri Apr 14 11:30:18 2006: 1913   drag

With kernel 2.6, that NUMA and scheduling stuff seems about the most important change in terms of enterprise-level servers.

With the scheduling system in 2.4, Linux had a very hard time scaling well above 4 or so CPUs. When you started to get into 8 or more processors, the overhead of scheduling was so large that it effectively cancelled out any benefits that you gained from those systems over 4 CPUs.

Now the vanilla Linux kernel should scale fairly easily up to 64 CPUs. SGI's Altix systems now have single-system Linux support for up to 512 CPUs and up to 128 terabytes (!) worth of RAM in what they call 'Globally Shared Memory'. (Not sure what that means, but it's certainly very impressive.)

Considering what AMD is doing with its Opteron systems and new PC-based NUMA stuff, it seems a match made in heaven with Linux. Future CPUs are coming out with 4 or more cores, motherboards with up to 4 sockets are commonly available, and you can combine that with PCI Express and such. Linux is poised to take full advantage of these changes in PC-server-land.

IBM did some extensive testing of the kernel for the 2.6 release. The following link shows an example of what they do. It's an 8-way Pentium III machine in a static-webpage benchmarking test. By keeping all the software the same and only upgrading from a 2.4 series kernel to a 2.6 series kernel, they showed over a 600 increase in performance.

The main thing people have a problem with is the fast-moving nature of kernel development nowadays. With previous kernels you had the 'stable' even series and the unstable 'odd' series. Now there is no 2.7 kernel, and all development happens in 2.6-related kernels in secondary Git (the source code management system originally developed by Linus to replace the proprietary BitKeeper) repositories called kernel trees. Stuff like 2.6.17-rc1-mm2 is where the development happens, and acceptable patches from that tree will be rolled back into the official 'vanilla' Linux 2.6.17 kernel when it gets released.

The days of going to kernel.org and compiling your own kernel are over except for an adventurous few. You're now very dependent on your distro-supplied kernel if you have many machines to deal with. But distro kernels tend to be much nicer than they were with the 2.4 and 2.2 kernels: they are stable, and security patches are backported to older kernels to avoid breaking third-party driver support as much as possible.

Fri Apr 14 11:32:08 2006: 1914   drag

Doh, got to remember to spell out percent rather than try to do the sign. I meant that it showed a 600 percent improvement in performance over 2.4.

Fri Apr 14 12:00:45 2006: 1915   TonyLawrence

The percent key is part of the spam and "bad" html filtering I do in comments - it's a potentially dangerous character.

However, it is annoying to have to avoid it, so I've translated it to a numeric entity equivalent. You can now type %

Mon Apr 17 05:39:45 2006: 1928   fortuitous

I still favor compiling kernels for systems when practical. This is not burdensome if you have a large number of systems that are similar or identical. The main reason I do this is security: every module is a potential risk. It may be a low risk, or near zero, but it's still a risk. I don't dispute that distro kernels are better than they used to be, but there are still potential hazards in using them, and certain performance benefits to a trimmed kernel.

