Copyright (C) 2002, Andrew D. Hwang

----------------------------------------------------------------------------
Date: Tue, 27 Aug 2002 10:27:49 -0400 (EDT)
Subject: Operating systems (Part I: The kernel)

If any true computer scientists (I'm "honorary" :) join the list, I trust
they'll correct any errors I make.

A computer operating system has three "physical" components:

1. The kernel
2. Libraries
3. Utility programs

The Kernel
==========

The kernel is the special program that speaks directly to the hardware, and
that allocates memory, disk space, processor time, and access to peripherals
(mouse, keyboard, screen, video, sound, and network cards...) among all the
other programs that run.

An analogy: the programs running on your computer are a bunch of
subcontractors. Each of them needs building materials ("system resources")
in varying amounts that depend on what the program does. An email program
has to read from the keyboard and write to the screen, but does not require
much time on the processor. An action game requires a huge amount of
processor time and video memory. A web browser keeps copies of all the pages
you visit, so it requires a lot of disk space and memory, and probably a
fair amount of processor time.

Now, only limited amounts of each resource are available. The kernel is the
warehouse manager, who inventories the resources continually and allots each
subcontractor (program) the resources it needs according to complicated
rules. If the kernel is well written, each program gets a fair share, and no
program has to wait long.

"Linux" is the kernel of GNU/Linux. "Darwin" is the kernel of MacOS X. The
Windows kernel does not have a name, probably because Microsoft has designed
Windows to be indivisible. ("The browser is an inseparable part of the
operating system.") There are dozens of more obscure kernels, many of which
run on Intel-family PC hardware.

The kernel is always running when your PC is booted.
It is the first program to start (this is a small oversimplification), and
the last to shut down. A "smart" kernel like Linux can be configured while
it is running. Other kernels read their configuration parameters from the
hard drive, and must be rebooted every time a change is made to a system
parameter, or a piece of hardware is added or removed.

In GNU/Linux, you can (and many do!) "compile" or "build" your own kernel
from scratch. This is not as difficult as it sounds, but it does require
detailed knowledge of your hardware and a fairly good understanding of the
operating system as a whole.

----------------------------------------------------------------------------
Date: Tue, 27 Aug 2002 11:36:58 -0400 (EDT)
Subject: Operating systems (Part II - Libraries)

Libraries
=========

A modern operating system includes several thousand programs. However, many
of the individual functions are common to many programs. For example, most
programs have to draw rectangles on the screen and fill them with text, or
accept input from the keyboard and mouse. Other functions are less apparent:
a program must be able to ask the kernel for more memory when needed, and a
well-written program will "free" memory when the space is no longer needed.
A program that is not "well-written" in this sense is an extremely bad
citizen, and will, in a very short time, either crash the machine (under
Windows '95/98) or slow it to a crawl (under Linux).

It is wasteful in many ways to have every program contain the code needed to
perform common operations. (Aside from the waste of using memory several
times for the same bits of code, programmers' time is wasted re-writing
essentially the same code over and over.) The solution is to put
commonly-used code into a "library". Windows users recognize libraries by
the extension ".dll" (dynamically-linked library), while *nix users have two
types of library: ".a" (for "archive", or static) and ".so" (for "shared
object", or dynamic).
Libraries are one of the most important (yet arcane) parts of an operating
system. If the kernel is the "brain", libraries are the spinal cord and
peripheral nerves. Even minor data corruption in a library directory can
have disastrous consequences (read: irrecoverable system crash).

All computer software is written in a human-readable language. The result is
"source code". Source code is then processed by a program called a
"compiler" into executable code a machine can run. Each computer language
has its own compiler. As a *nix user, you are likely to become familiar with
source code and compilers. This is rightly likened to becoming familiar with
kitchens, recipes, and food ingredients after eating out of cans your entire
life. :)

Linking
=======

When source code is compiled, the functions provided in libraries are made
available in a process called "linking". A program linked to an archive file
(e.g., libpcap.a) is "statically linked". A statically linked program is
self-contained; it has a built-in copy of all the library functions it
requires. Statically-linked programs are often used for system recovery, for
the obvious reason that they will run even if the system libraries are
trashed.

A program can also be linked against a shared object (libXaw.so). In this
case, when the program is run, the kernel looks to see what functions the
program needs, then loads into memory any required functions that are not
yet in memory. In this way, the same library functions are potentially
available to several programs, which saves memory.

Exercise: List some advantages and disadvantages of statically-linked and
dynamically-linked programs. Explain why the kernel *must* be statically
linked.

At some point, you will want to install your own software, and shortly
thereafter you'll have your first experience installing libraries. Most
libraries are no more difficult to install than software, which is to say
"you'll find it easy once you get the hang of it".
Exercise: The kernel and most system utilities are written in C. How are the
C libraries installed in the first place? What special problems are inherent
in upgrading the C libraries? Try to outline a procedure by which the C
libraries could be replaced in a running system. (Tricky:)

Dependencies
============

Most libraries refer to other libraries; for example, the math library
libm.so depends on the C library. Under *nix, libraries are carefully
separated, and their dependencies are never circular. Installing one library
will never change another library. If some dependency is broken, the new
library will not install: a short-term inconvenience, but long-term
stability. If a library is installed and then uninstalled, the system is
returned to its previous state.

Microsoft Windows is written as a user-friendly, feature-rich operating
system. Consequently, an application like Word or an Explorer plugin may
require non-standard features not provided by the vanilla system libraries.
The obvious solution is to allow the application's install wizard to fiddle
with the system libraries...right?

Let's see how this scheme works in practice. When you install Office, some
system libraries get changed, which allows components of Office to provide
you with a richer experience. Next you install a driver for your new mouse
(which also changes some of the system libraries). If you're unlucky, both
operations have updated the *same file*. If you're *really* unlucky (read:
you're installing four or five new pieces of software), one of the modified
files is also linked to the install wizard. Oh dear...that means that if you
had installed the mouse first and Office second, your system could be in a
different state than it is now.

Exercise: List three more things that are horribly wrong, from the point of
view of having a stable, predictable system. Address the following: Can the
system be returned to a known working state? Without re-installing from
scratch?
Can the helpdesk technician reliably assist you just by knowing what
software you tried to install? If you uninstall the mouse driver (the last
thing you installed), is the system necessarily in the same state it was in
before you installed the driver? How does Microsoft itself keep track of
library dependencies? Why does Windows become less stable as more software
is installed?

----------------------------------------------------------------------------
Date: Tue, 27 Aug 2002 12:28:23 -0400 (EDT)
Subject: Operating systems (Part III - System Utilities)

The most familiar part of an operating system (at least, of GNU/Linux)
comprises the programs you run to administer the system. These range from a
"shell" (command interpreter), to administrative programs that affect the
state of the running kernel, to the compilers and linkers used to build
software from source code.

The final part of this trilogy is more political than technical, and
attempts to explain why I am careful to say "GNU/Linux" rather than just
"Linux" (unless I am referring to the kernel).

The GNU project was started by Richard Stallman in 1984 with the ambitious
aim of providing a Free Unix-like operating system. The project's name is a
self-referential acronym, "GNU's Not Unix!", and is pronounced guh-NEW. The
concept of "Free" meant that everyone would have access to the source code
of the entire project, and that everyone would be free--even welcome--to
make changes and improvements. The only stipulation was to be an enforcement
of good community behavior: if you take GNU code, change it, and publish
your programs, you must make the source code of your modifications available
to the community under the same terms by which you obtained the community's
code in the first place. This license is today the famous GNU General Public
License, known as "GPL" for short.
By 1990, the GNU project had matured into a nearly complete operating
system--text editors and processing utilities, system and application
libraries, compilers and related utilities, a shell, and so forth--whose
components were in wide use on Unix. However, the project's kernel, the
HURD, remained fraught with problems.

It was in this atmosphere that Linus Torvalds made his now-famous Usenet
posting in 1991, describing his intention to create from scratch a clone of
the Unix kernel that would run on an Intel 386. Within two years, the kernel
now known as "Linux" was sufficiently stable and featureful to be in wide
use among a community of dedicated hackers [1], and to provide the final
piece in the GNU puzzle. For political reasons, the community has settled on
the name "Linux", a name that omits the GNU project's foundational role in
the Free software community.

Very roughly, the community can be divided into the "open source" camp and
the "Free software" camp. Open source advocates tend toward
Libertarianism--no one has the right to restrict what legal activities I do
with my computer--and regard GNU/Linux highly because of its technical
quality. Free software advocates *in addition* take the view that there is a
fundamental ethical component to sharing source code: the freedom to share
makes high-quality software and robust data formats accessible to anyone
with a computer, anywhere in the world. Information is not restricted by
commercial licenses designed to concentrate wealth and power in the hands of
a few, but is available to anyone with an Internet connection. The high
quality of Free software is desirable, but is merely a fringe benefit of the
open development model. To Free software advocates, access is a basic right
of democracy in the true sense.

This concept of freedom has been championed by the GNU project.
Understandably, Stallman and others feel slighted by the omission of its
name from the operating system they helped create and develop.
One of the largest issues facing the community is the creation of a workable
business model. The GNU GPL does not prohibit charging money for software,
but it does effectively lower the price that can be charged, since anyone
may *legally* copy and distribute GPL-ed programs. To many, the GNU project
seems anti-business, which partially explains the removal of its name from
GNU/Linux.

Within the past year (2002), large companies including IBM and HP have begun
supporting "Linux" actively, while Microsoft has become increasingly
propagandistic and vehement in its attacks on Free software (calling the GPL
"viral" and "un-American", and Linux a "cancer", for example). The movement
is increasingly influential, and we in academia who depend on freedom of
access to information would do well not to forget the foundational role of
the GNU project.

[1] Hacker: a problem solver, usually one who enjoys finding novel uses for
technology. Not to be confused with a computer criminal, properly called a
"cracker" or "vandal".

Andrew D. Hwang                      ahwang@mathcs.holycross.edu
Department of Math and CS            http://math.holycross.edu/~ahwang
College of the Holy Cross            (508) 793-2458 (Office: 320 Swords)
Worcester, MA, 01610-2395            (508) 793-3530 (fax)