Main Page
From Linux Troubleshooting
This is a guide to basic (and sometimes not so basic) troubleshooting and debugging on Linux systems. Its goals are to describe common tools and how to use them, how to find information, and what to do with that information. The emphasis is on software issues, but hardware issues may be covered as well.
Something isn't working, what do you do?
It happens to the best of us. At some point, our perfectly configured and optimized systems decide to make our lives interesting. A process suddenly will not start up, the database is returning bogus results, or the installation of that new app is being more troublesome than anyone would like. There is a problem.
So what's the next step? Troubleshooting a problem is always an interesting challenge. Generally, it's not obvious what the problem is, so the first task is to start working out what is going on. The more you learn about what is actually happening, the easier it is to see why the thing you expect to happen isn't.
Tools
Efficient debugging and troubleshooting is often a matter of knowing the right tools for the job, and how to use them.
oprofile
See in general:
- SourceForge: oprofile (http://sourceforge.net/projects/oprofile/)
- RHEL 3 manual: oprofile (http://www.redhat.com/docs/manuals/enterprise/RHEL-3-Manual/sysadmin-guide/ch-oprofile.html)
- IBM Developerworks: Smashing performance with OProfile (http://www.ibm.com/developerworks/linux/library/l-oprof.html)
perfmon
Perfmon is the work of the folks at HP Labs and was designed specifically for the IA64 platform. See this site (http://www.hpl.hp.com/research/linux/perfmon/) for more information. Perfmon (like the Q-Tools listed below) will be of little to no use to the mainstream x86 Linux community, but for the IA64 Linux crowd it offers capabilities that have no equal on the x86 platform.
Installing OProfile
OProfile is included in the Linux 2.5 kernel and above and in most newer distributions, including Red Hat 9. You can also download OProfile from the SourceForge project page linked above. If your kernel was not built with OProfile support, you'll need to recompile it with OProfile enabled; here is how to do that:
Enable OProfile:
- cd /usr/src/linux
- make xconfig (or make menuconfig)
Enable OProfile in the profiling menu so that CONFIG_PROFILING=y and CONFIG_OPROFILE=y are set in the .config file. Also enable Local APIC and IO-APIC in the "Processor type and features" menu.
Recompile the kernel:
- make dep (needed only for 2.4 kernel versions)
- make bzImage
Boot the new kernel.
To configure and install the OProfile utilities, run:
- ./configure --with-linux=/usr/src/linux/ --with-qt-dir=/usr/lib/qt/ --with-kernel-support
- make
- make install
For information about system requirements and for more detailed installation instructions, see the OProfile links at the start of the oprofile section.
Overview of OProfile tools
- op_help: lists available events with short descriptions
- opcontrol: controls OProfile data collection
- oprofpp: retrieves useful profile data
- op_time: lists the relative profile values for all images on the system
- op_to_source: produces annotated source, assembly, or mixed source/assembly
- op_merge: merges sample files belonging to the same application
- op_import: converts sample database files from the foreign format (abi) to the native format
Three quick steps to start profiling
Start up the profiler:
- opcontrol --setup --ctr0-event=CPU_CLK_UNHALTED --ctr0-count=600000 --vmlinux=/usr/src/linux-2.4.20/vmlinux
For RTC mode users, use --rtc-value=2048.
- opcontrol --start
Now the profiler is running; go do whatever you want to profile.
Dump the collected data with opcontrol --dump. Use opcontrol --stop to stop collecting data, or opcontrol --shutdown to stop collecting and also shut down the OProfile daemon.
OProfile analysis: Cache memory utilization problem
Caches are the memory closest to the processor execution unit. Cache memory is much smaller and faster than main memory and can be either internal or external to the processor chip. Caches contain copies of the most frequently used instructions and data. Because that data can be accessed quickly, software runs much faster than it would if every access went to main memory. On the Pentium III used for the profiles below, data is stored in cache lines of 32 bytes each; later processors use larger lines.
On a multiprocessor system, when one CPU modifies data that is shared between CPUs, the copies of that cache line in the other CPUs' caches are invalidated.
If the data or instruction is not present in the cache, or if the cache line has been invalidated, the CPU fills its cache by reading the data from main memory. The processor event that counts these fills (lines allocated in the L2 cache) is L2_LINES_IN. Reading data from main memory takes many more CPU cycles than reading it from the cache. OProfile helps you identify a cache memory problem like the one in Listing 1.
Listing 1. Code with cache memory problem
/*
 * Shared data being modified by two threads running on different CPUs.
 * Build without optimization (gcc's default -O0) so that the compiler
 * keeps the increment loops as written.
 */
#define _GNU_SOURCE             /* for clone() */
#include <sched.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>

/* Shared structure between two threads, which will be optimized later */
struct shared_data_align {
    unsigned int num_proc1;
    unsigned int num_proc2;
};

/*
 * Shared structure between two threads that remains unchanged (not optimized).
 * This is required in order to collect some samples for the L2_LINES_IN event.
 */
struct shared_data_nonalign {
    unsigned int num_proc1;
    unsigned int num_proc2;
};

/* Declare global shared data */
struct shared_data_nonalign common_aln;
static char buff[8192];         /* stack for the cloned thread */
int func1(void *arg);

/*
 * The parent process creates a clone thread sharing its memory space. The
 * parent thread, running on one CPU, increments the num_proc1 element of
 * common and common_aln. The cloned thread, running on another CPU,
 * increments the num_proc2 element of the same two structures.
 */
int main(void)
{
    /* Declare local shared data */
    struct shared_data_align common = { 0, 0 };
    pid_t pid;
    int i, j;

    /* Clone a thread sharing memory space with the parent process; the stack
     * grows down, so pass the top of buff. SIGCHLD lets the parent wait. */
    if ((pid = clone(func1, buff + sizeof(buff), CLONE_VM | SIGCHLD, &common)) < 0) {
        perror("clone");
        exit(1);
    }
    /* Increment the value of num_proc1 in a loop */
    for (j = 0; j < 200; j++)
        for (i = 0; i < 100000; i++)
            common.num_proc1++;
    /* Increment the value of num_proc1 in a loop */
    for (j = 0; j < 200; j++)
        for (i = 0; i < 100000; i++)
            common_aln.num_proc1++;
    waitpid(pid, NULL, 0);      /* wait for the cloned thread to finish */
    return 0;
}

/*
 * The routine below is called by the cloned thread to increment the num_proc2
 * element of the common and common_aln structures in a loop.
 */
int func1(void *arg)
{
    struct shared_data_align *com = arg;
    int i, j;
    /* Increment the value of num_proc2 in a loop */
    for (j = 0; j < 200; j++)
        for (i = 0; i < 100000; i++)
            com->num_proc2++;
    /* Increment the value of num_proc2 in a loop */
    for (j = 0; j < 200; j++)
        for (i = 0; i < 100000; i++)
            common_aln.num_proc2++;
    return 0;
}
The above program is profiled for the event L2_LINES_IN. Note the samples collected in func1 and in main:
Listing 2. OProfile data for L2_LINES_IN
- opcontrol --setup --ctr0-event=L2_LINES_IN --ctr0-count=500 --vmlinux=/usr/src/linux-2.4.20/vmlinux
- opcontrol --start
- ./appln
- opcontrol --stop
- oprofpp -l ./appln
Cpu type: PIII
Cpu speed was (MHz estimation) : 699.57
Counter 0 counted L2_LINES_IN events (number of allocated lines in L2) with a unit mask of 0x00 (No unit mask) count 500
vma       samples  %        symbol name
080483d0  0        0        _start
080483f4  0        0        call_gmon_start
08048420  0        0        __do_global_dtors_aux
08048480  0        0        fini_dummy
08048490  0        0        frame_dummy
080484c0  0        0        init_dummy
08048630  0        0        __do_global_ctors_aux
08048660  0        0        init_dummy
08048670  0        0        _fini
080484d0  4107     49.2033  main
080485b8  4240     50.7967  func1
Now, profile the same application (executable) with the event CPU_CLK_UNHALTED, which counts the clock cycles during which the CPU is not halted. The number of samples collected in a routine is proportional to the time the processor spent executing its instructions: the more samples collected, the more time the processor spent there. Note the number of samples collected in main and func1:
Listing 3. OProfile data for CPU_CLK_UNHALTED
- oprofpp -l ./appln
Cpu type: PIII
Cpu speed was (MHz estimation) : 699.667
Counter 0 counted CPU_CLK_UNHALTED events (clocks processor is not halted) with a unit mask of 0x00 (No unit mask) count 10000
vma       samples  %        symbol name
080483d0  0        0        _start
080483f4  0        0        call_gmon_start
08048420  0        0        __do_global_dtors_aux
08048480  0        0        fini_dummy
08048490  0        0        frame_dummy
080484c0  0        0        init_dummy
08048640  0        0        __do_global_ctors_aux
08048670  0        0        init_dummy
08048680  0        0        _fini
080484d0  40317    49.9356  main
080485bc  40421    50.0644  func1
We will now optimize the shared data structure by separating its two elements into different cache lines. On the Pentium III used here, each L2 cache line is 32 bytes, so inserting 28 bytes of padding after the first element of shared_data_align (4 + 28 = 32 bytes) places num_proc2 on a different cache line from num_proc1. The parent thread modifies num_proc1, which is read into a cache line on CPU 1 on first access; later accesses by the parent thread are then served from that cache line. The cloned thread modifies num_proc2, which ends up on a different cache line on CPU 2. Because the two threads running in parallel now modify elements on different cache lines, a write by one thread no longer invalidates the cache line used by the other, so that line does not have to be read in again from memory. In this way, the number of cache lines read in is reduced. On processors with larger cache lines (most current x86 CPUs use 64 bytes), the padding has to be correspondingly larger.
Listing 4. Optimized data structures
/*
* The padding is added to separate the two unsigned ints in such a
* way that the two elements num_proc1 and num_proc2 are on two
* different cache lines.
*/
struct shared_data_align {
    unsigned int num_proc1;
    char padding[28];
    unsigned int num_proc2;
};
/*
 * This structure remains unchanged, so that some cache lines
 * read in can be seen in profile data.
 */
struct shared_data_nonalign {
    unsigned int num_proc1;
    unsigned int num_proc2;
};
Note that shared_data_nonalign has not been optimized.
Since you already have OProfile enabled and running (you do, don't you?), here are some profiles you can try yourself: rebuild the program with the structures from Listing 4, then collect OProfile data for the event L2_LINES_IN with the count set to 500, as shown in Listing 2.
Also try collecting OProfile data for the event CPU_CLK_UNHALTED with the count set to 10000; compare the data collected with and without optimization and note the performance improvement.
OProfile analysis: Branch misprediction
Modern processors use branch prediction, because the branches in most programs follow regular patterns. When a prediction is correct, a branch is cheap, but predictions are not always correct, and some branches are hard to predict. You can find such problem branches by profiling the application and the kernel for the event BR_MISS_PRED_TAKEN_RET, and then restructure your code so that its branches are easier to predict.
The code in Listing 5 shows branch misprediction. This example program creates a cloned thread that shares memory space with the parent process. The parent process, running on one processor, sets num_proc1 based on the value of num_proc2, so it branches on a variable modified by the other process. The code generated for this branch assumes by default that num_proc2 will be equal to 1 every time; whenever that assumption is false, a branch misprediction occurs.
The cloned thread running on another processor toggles the value of num_proc2 based on the value of num_proc1 (and branches based on the value of the variable modified by another process). This causes num_proc2 to not always be equal to 1, and hence branch misprediction occurs in the parent thread. Similarly, toggling the value of num_proc1 by the parent thread causes branch misprediction in the clone thread.
Listing 5. Code showing branch misprediction
/* Shared structure between the two processes */
struct share_both {
    unsigned int num_proc1;
    unsigned int num_proc2;
};

/*
 * The parent process clones a thread by sharing its memory space. The parent
 * process just toggles the value of num_proc1 in a loop. (This part is the
 * body of main(); pid, i, j, and buff, the char array used as the cloned
 * thread's stack, are declared elsewhere.)
 */

/* Declare local shared data */
struct share_both common;

/* Clone a thread with the same memory space as the parent thread */
if ((pid = clone(func1, buff+8188, CLONE_VM, &common)) < 0) {
    perror("clone");
    exit(1);
}

/* Toggle the value of num_proc1 in a loop */
for (j = 0; j < 200; j++)
    for (i = 0; i < 100000; i++) {
        if (common.num_proc2)
            common.num_proc1 = 0;
        else
            common.num_proc1 = 1;
    }

/*
 * The function below is called by the cloned thread, which just toggles the
 * value of num_proc2 every time through the loop.
 */
int func1(void *arg)
{
    struct share_both *com = arg;    /* clone() passes &common as arg */
    int i, j;

    /* Toggle the value of num_proc2 in a loop */
    for (j = 0; j < 200; j++)
        for (i = 0; i < 100000; i++) {
            if (com->num_proc1)
                com->num_proc2 = 0;
            else
                com->num_proc2 = 1;
        }
    return 0;
}
Branch misprediction can be shown by compiling the above code without optimization.
- gcc -o branch parent_thread_source_code clone_thread_source_code
Now, profile the branch application for the event BR_MISS_PRED_TAKEN_RET with the count set to 500, as shown in Listing 2. Note the samples collected in main and func1.
Also profile the same executable for the event CPU_CLK_UNHALTED and with the count set to 10000, as shown in Listing 2.
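You can reduce branch misprediction by compiling the clone-thread code with the -O2 compiler option; at -O2, GCC can turn a simple if/else assignment like the one in func1 into branch-free code: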
- gcc -O2 -c clone_thread_source_code
- gcc -o branch clone_thread_source_code.o parent_thread_source_code
Now, start profiling the application for the BR_MISS_PRED_TAKEN_RET event and CPU_CLK_UNHALTED as in Listing 2; note the performance improvement.
Let's look at another example of branch misprediction in the next section on kernel profiling.
Examples of kernel profiling
The profile data shown below were collected for the event BR_MISS_PRED_TAKEN_RET while running the kernbench benchmark on the 2.5.70 kernel. 23,360 samples were collected for vma_merge and 20,717 for do_mmap_pgoff.
Listing 6. Kernel profile data
- oprofpp -l -i /boot/vmlinux | tail -20
c0143510  4719     1.26446  page_add_rmap
c0117740  4791     1.28375  schedule
c0140320  4825     1.29286  find_vma_prepare
c010f720  4862     1.30278  sys_mmap2
c0134fc0  5005     1.34109  __alloc_pages
c0123670  5473     1.46649  run_timer_softirq
c0134800  5648     1.51339  bad_range
c0139250  6571     1.7607   mark_page_accessed
c0143bd0  6919     1.85395  __pte_chain_free
c013f180  6973     1.86842  do_no_page
c0140ec0  7393     1.98096  get_unmapped_area
c01400f0  8020     2.14896  vm_enough_memory
c0140ff0  9897     2.65191  find_vma
c01594e0  10939    2.93111  link_path_walk
c0134e70  11467    3.07259  buffered_rmqueue
c0117370  11690    3.13234  scheduler_tick
c013eeb0  17463    4.67922  do_anonymous_page
c01153e0  20322    5.44529  do_page_fault
c01408e0  20717    5.55113  do_mmap_pgoff
c0140600  23360    6.25933  vma_merge
Branch misprediction can be eliminated by removing branches. On Intel IA32 processors, branches can be eliminated using the SETcc instructions or, on P6-family and later processors, the conditional move instructions CMOVcc and FCMOVcc.
The following line of C code shows conditional branching:
X = (A < B) ? CONST1 : CONST2;
Here are the assembly instructions for the above C code:
Listing 7. Equivalent assembly instructions
cmp A, B        ; compare
jge L30         ; conditional branch
mov ebx, CONST1
jmp L31         ; unconditional branch
L30:
mov ebx, CONST2
L31:
This code can be optimized to eliminate branches like this:
Listing 8. Equivalent assembly instructions, branches removed
xor ebx, ebx               ; clear ebx
cmp A, B
setge bl                   ; ebx = 1 if A >= B, else 0
dec ebx                    ; ebx = 0 or 0xFFFFFFFF (all ones)
and ebx, (CONST1-CONST2)   ; ebx = 0 or CONST1-CONST2
add ebx, CONST2            ; ebx = CONST2 or CONST1
The optimized code sets register EBX to zero, then compares A and B. If A is greater than or equal to B, EBX is set to 1. EBX is then decremented, giving either zero or all ones, and ANDed with the difference CONST1-CONST2. This leaves EBX holding zero when A >= B, or CONST1-CONST2 when A < B. Adding CONST2 then yields CONST2 or CONST1 respectively, the same result as Listing 7 but without any branch.
I hope you have gained some insight into OProfile and the ways you can optimize the kernel code. The 2.6 kernel release is coming up and there will be a lot to profile.
q-tools
The Q-toolset (most commonly used through q-syscollect) is a set of tools developed by David Mosberger for the IA64 version of Linux. It provides a robust set of profiling tools that do not require modifying the source code of the program under investigation. The tools have no blind spots even inside the kernel, and they work even when interrupts are disabled. See the Gelato website (http://www.gelato.org) for more information.
strace
Strace is one of the most powerful tools available for troubleshooting. It allows you to see what an application is doing, to some degree.
`strace` displays all the system calls that an application makes, the arguments it passes to them, and their return codes. A system call is a request for the kernel to do something on the program's behalf. This generally means I/O of all sorts, process management, shared memory and IPC usage, memory allocation, and network usage.
examples
The simplest example of using strace is as follows:
strace ls -al
This starts the strace process, which then starts `ls -al` and shows every system call. For `ls -al` this is mostly I/O related calls. You can see it calling stat() on files, opening config files, opening the libs it is linked against, allocating memory, and calling write() to output the contents to the screen.
What files is it trying to open?
A common troubleshooting technique is to see what files an app is reading. You might want to make sure it's reading the proper config file, or looking at the correct cache, etc. `strace` by default shows all file I/O operations.
To make this easier, you can filter the strace output. To see just the calls to open():
strace -eopen ls -al
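This is a wonderful way to discover any configuration files that might be queried, as well as the order in which search paths are tried. On newer systems, the C library implements open() with the openat() system call, so you may need strace -e trace=open,openat to see these calls.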
What is this thing doing to the network?
To see all network-related system calls (name resolution, opening sockets, reading from and writing to sockets, etc.):
strace -e trace=network curl --head http://www.redhat.com
Rudimentary profiling
strace can also be used for some simple profiling, which is useful for debugging performance problems.
strace -c ls -la
Invoking strace with -c prints a cumulative report of system call usage when the program exits. This includes the approximate amount of time spent in each call and how many times each system call was made.
This can sometimes help pinpoint performance issues, especially if an app is doing something like repeatedly opening/closing the same files.
strace -tt ls -al
The -tt option prefixes each line with the time of day, including microseconds.
strace -r ls -al
The -r option prints a relative timestamp: the time elapsed since the previous system call started. This can be used to spot where a process is spending large amounts of time in user space, or in especially slow system calls.
Following forks and attaching to running processes
Often it is difficult or impossible to run a command under strace (an apache httpd, for instance). In this case, it's possible to attach to an already running process.
strace -p 12345
where 12345 is the PID of the process. This is very handy for trying to determine why a process has stalled. Often a stalled process is blocked waiting for I/O; with strace -p, this is easy to detect.
Lots of processes start other processes, and it is often desirable to trace all of them.
strace -f /etc/init.d/httpd start
will strace not just the bash process that runs the script, but any helper utilities executed by the script, and httpd itself.
Since strace output is often a handy way to help a developer solve a problem, it's useful to be able to write it to a file. The easiest way to do this is with the -o option.
strace -o /tmp/strace.out program
Being somewhat familiar with the common Linux system calls is helpful in understanding strace output, but most of the common ones are simple enough to figure out from context.
A line of strace output is essentially the system call name, the arguments to the call in parentheses (sometimes truncated...), and then the return status. The return status for an error is typically -1, but this varies. For more information about the return status of a particular system call, run `man 2 syscallname`; the return status is usually documented in the "RETURN VALUE" section.
Another thing to note about strace is that it often shows "errno" status. If you're not familiar with UNIX system programming, errno is a variable that system calls and library functions set to a specific error code when they fail. More info on this can be found in `man errno`. Typically, strace shows the symbolic name and a brief description for any errno value it gets, e.g.
open("/foo/bar", O_RDONLY) = -1 ENOENT (No such file or directory)
strace -s X
The -s option tells strace to show the first X characters of each string. The default is 32 characters, which is sometimes not enough; raising it shows more of the data being passed around.
More info
Overview of linux system calls (http://www.quepublishing.com/articles/article.asp?p=23618&rl=1)
PDF version of Advanced Linux Programming (http://www.advancedlinuxprogramming.com/alp-folder)
ltrace
ltrace is very similar to strace, except ltrace focuses on tracing library calls.
For apps that use a lot of libs, this can be a very powerful debugging tool. However, because most modern apps use libraries very heavily, the output from ltrace can sometimes be painfully verbose.
There is a distinction between a system call and a call to a library function. Sometimes the line between the two is blurry, but the basic difference is that system calls communicate with the kernel, while library calls just run more userland code (though many library functions make system calls themselves). System calls are usually required for things like I/O, process control, memory management, and other kernel services.
For the most part, library calls are calls to the standard C library (glibc), but they can of course be calls to any library, for example Gtk, libjpeg, libnss, etc. Luckily, most glibc functions are well documented and have either man or info pages. Documentation for other libraries varies greatly.
ltrace supports the -r, -tt, -p, and -c options the same as strace. In addition it supports the -S option which tells it to print out system calls as well as library calls.
One of the more useful options is "-n 2" which will indent 2 spaces for each nested call. This can make it much easier to read.
Another useful option is "-l", which lets you specify a particular library to trace, potentially cutting down on the rather verbose output.
gdb
`gdb` is the GNU debugger. A debugger is typically used by developers to debug applications in development. It allows for a very detailed examination of exactly what a program is doing.
That said, gdb isn't as useful as strace/ltrace for troubleshooting/sysadmin types of issues, but occasionally it comes in handy.
For troubleshooting, it's useful for determining which application created a core file (`file core` will typically show you this information too). But gdb can also show you where the program crashed. Once you determine the name of the app that caused the failure, you can start gdb with:
gdb filename corefile
then at the prompt type
where
The unfortunate thing is that distribution binaries are typically stripped of debugging symbols to make them smaller, so this often returns less than useful information. However, starting with Red Hat Enterprise Linux 3, and also in Fedora, there are "debuginfo" packages. These packages include all the debugging symbols. You can install them the same way as any other RPM, so `rpm`, `up2date`, and `yum` all work.
The only difficult part about debuginfo RPMs is figuring out which ones you need. Generally, you want the debuginfo package for the source RPM of the package that's crashing.
rpm -qif /path/to/app
will show the information for the binary package the app is part of. That information includes the name of the source RPM; just use the package name of the source RPM plus "-debuginfo".
top
`top` is a simple text-based system monitoring tool. It packs a lot of information onto the screen, which can be helpful when troubleshooting problems, particularly performance-related ones.
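The top of the "top" output includes a basic summary of the system. The top line shows the current time, the uptime since the last reboot, the number of users logged in, and the load average. The load average values are the average load over the last 1, 5, and 15 minutes. On a single-CPU system, a load of 1.0 is considered 100% utilization, so loads over 1 typically mean processes are having to wait; on a machine with N CPUs, the equivalent point is a load of N. On Linux, the load also counts processes blocked in uninterruptible sleep (usually waiting on disk I/O), so a high load does not always mean the CPUs are busy. There is a lot of leeway and approximation in these load values, however.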
The memory line shows the total physical ram available on the system, how much of it is used, how much is free, and how much is shared, along with the amount of ram in buffers. These buffers are typically file system caching, but can be other things. On a system with a significant uptime, expect the buffer value to take up all free physical ram not in use by a process. The swap line is similar.
Each process entry contains several fields by default. The most interesting are RES, %CPU, and TIME. RES shows the amount of physical RAM the process is consuming. %CPU shows the percentage of the available processor time the process is taking, and TIME shows the total amount of processor time the process has used. A processor-intensive program can easily accumulate more TIME in just a few seconds than a long-running, low-CPU process.
Sorting the output
- M : sorts the output by memory usage. Pretty handy for figuring out which version of openoffice.org to kill.
- P : sorts the process by the percentage of cpu time they are using.
- T : sorts by cumulative cpu time used
- A : sorts by age of the process, newest process first
Command line options
The most useful command line option is:
- b [batch mode] writes the standard top output to stdout. Useful for a quick "system monitoring hack".
e.g.:
top d 360 b >> foo.output
to get a snapshot of the system appended to foo.output every six minutes.
ps
`ps` can be thought of as a one shot `top`. But it's a bit more flexible in its output than top.
As far as `ps` commandline options go, it can get pretty hairy. The Linux version of `ps` inherits ideas from both the BSD version, and the SYSV version. So be warned.
The `ps` man page does a pretty good job of explaining this, so look there for more examples.
One thing to be aware of is that ps behaves differently depending on whether a - is prepended to the options:
ps ef
and
ps -ef
are two very different things: without the dash, the options are interpreted BSD-style; with it, System V-style.
examples
ps aux
shows all the processes on the system in a "user" oriented format. In this case meaning the username of the owner of the process is shown in the first column.
ps auxww
the "w" option, when used twice, allows the output to be of unlimited width. For apps started with lots of commandline options, this will allow you to see all the options.
ps auxf
The "f" option, for "forest", presents the list of processes in a tree format. This is a quick and easy way to see which processes are children of which.
ps -eo pid,%cpu,vsz,args,wchan
This is an interesting example of the -eo option, which lets you customize the output of `ps`. In this case, the interesting bit is the "wchan" field, which shows the kernel function a sleeping process is waiting in; this often hints at the system call it is blocked in.
For things like apache httpds, this can be useful to get an idea of what all the processes are doing at one time. See the strace section for more on understanding system calls.
sysstat/sar
Sysstat has two parts: a daemon process that collects information, and a "monitoring" tool.
The start script is typically called "sysstat", and the monitoring tool is called `sar`, which will normally perform its monitoring via the `sadc` command.
To start it, run the sysstat start script:
/etc/init.d/sysstat start
To see a list of `sar` options, just try `sar --help`
examples
Things to note: there are lots of command line options. The number at the end of the command line is the interval, in seconds, between updates; an optional second number sets how many updates to print.
sar 3
Will run the default sar invocation every three seconds.
For a complete summary, try:
sar -A
This generates a very large pile of info ;->
To get a good idea of disk i/o activity:
sar -b 3
For something like a heavily used web server, you may want to get a good idea how many processes are being created per second:
sar -c 2
Kind of surprising to see how many processes can be created.
There's also some degree of hardware monitoring built in. Monitoring how many times an IRQ is triggered can also provide good hints at what's causing system performance problems.
Show the total number of system interrupts
sar -I SUM 3
Watch the standard IDE controller IRQ every two seconds.
sar -I 14 2
Network monitoring is in here too. Show the number of packets sent and received, bytes transferred, etc.:
sar -n DEV 2
Show stats on network errors.
sar -n EDEV 2
Memory usage can be monitored with something like:
sar -r 2
This is similar to the output from `free`, except more easily parsed.
For SMP machines, you can monitor per CPU stats with:
sar -U 0
where 0 is the first processor. The keyword ALL will show all of them. (If your version of sar doesn't support the -U flag, try -P, as in sar -P 0 2.)
A really useful one on web servers and other configurations that use lots and lots of open files is:
sar -v
This will show the number of used file handles, the percentage of available file handles in use, and the same for inodes.
To show the number of context switches per second (a high rate can mean the system spends a lot of time switching between processes rather than doing useful work):
sar -w 2
vmstat
This util is part of the procps package and can provide lots of useful information when diagnosing performance problems.
Here's a sample vmstat output on a lightly used desktop:
procs              memory             swap       io       system       cpu
 r  b  w   swpd   free   buff  cache  si  so    bi    bo   in    cs  us  sy  id
 1  0  0   5416   2200   1856  34612   0   1     2     1  140   194   2   1  97
And here's some sample output on a heavily used server:
procs              memory             swap       io       system       cpu
 r  b  w   swpd   free   buff  cache  si  so    bi    bo   in    cs  us  sy  id
16  0  0   2360 264400  96672   9400   0   0     0     1   53    24   3   1  96
24  0  0   2360 257284  96672   9400   0   0     0     6 3063 17713  64  36   0
15  0  0   2360 250024  96672   9400   0   0     0     3 3039 16811  66  34   0
The interesting numbers here are in the first column, r. This is the number of processes in the run queue: processes that are ready to be executed but cannot run at the moment because the CPU is busy with other processes. On lightly loaded systems, this is almost never above 1-3, and numbers consistently higher than 10 indicate the machine is getting pounded.
Other interesting values include the "system" numbers, in and cs. The in value is the number of interrupts per second the system is getting. A system doing a lot of network or disk I/O will have high values here, as devices raise interrupts when network packets arrive and disk transfers complete.
The cs value is the number of context switches per second. A context switch is when the kernel stops running one process on a CPU, saves its state, and switches to running another. It's actually _way_ more complicated than that, but that's the basic idea. Lots of context switches are bad: each one takes a fairly large number of cycles, so if you are doing lots of them, you are spending all your time changing jobs and not actually doing any work. I think we can all understand that concept.
A note on Linux memory management
This is one area where the saying "Linux is not Unix" is accurate.
Linux does not manage memory like a traditional Unix (such as HP-UX). Linux memory management uses a free-on-demand system, whereby memory isn't actually freed unless there is a demand for the pages. The kernel will use all otherwise unused memory for buffer cache until and unless there is memory pressure.
So, if you are coming from a closed Unix to Linux, don't freak out when you see only 10 meg of that 4GB free - it's being used for file system buffer cache.
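A quick way to see how much memory is really spoken for is to count the cache back in. This sketch adds the MemFree, Buffers and Cached lines of /proc/meminfo; treat the result as an approximation, since not every cached page can be dropped. Kernels from 3.14 on also publish a MemAvailable line, which is a better estimate when it is present. reclaimable_kb is a made-up name.

```shell
# reclaimable_kb [FILE]: approximate memory available to applications, in kB,
# as MemFree + Buffers + Cached (FILE defaults to /proc/meminfo).
reclaimable_kb() {
    awk '$1 == "MemFree:" || $1 == "Buffers:" || $1 == "Cached:" { sum += $2 }
         END { print sum + 0 }' "${1:-/proc/meminfo}"
}
```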
tcpdump/ethereal
Ethereal (since renamed Wireshark) displays all the connections it traced during the capture. There are a couple of ways to look for bandwidth hogs.
The "Statistics" menu has a couple of useful options. "Protocol Hierarchy" shows what percentage of the packets in the trace belongs to each protocol. In the case of a bandwidth hog, at least the culprit protocol should be easy to spot here.
The "Conversations" screen is also helpful for looking for bandwidth hogs. Since you can sort the "conversations" by number of packets, the culprit is likely to hop to the top. This isn't always the case, as it could easily be many small connections killing the bandwidth, not one big heavy connection.
As far as tcpdump goes, the simplest way to spot bandwidth hogs is just to start it up. Since it dumps all traffic to the screen in text form, keep your eyes peeled for whatever seems to come up a lot.
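Rather than eyeballing the scroll, you can let the shell do the counting. A sketch, assuming tcpdump's default one-line output for IPv4 (timestamp, IP, source, >, destination followed by a colon); IPv6 and other link types print differently and are simply ignored here. top_talkers is a made-up name.

```shell
# top_talkers [N]: read `tcpdump -nn -l` output on stdin and print the N
# (default 10) most frequent "source > destination" pairs, busiest first.
top_talkers() {
    awk '$2 == "IP" { dst = $5; sub(/:$/, "", dst); print $3 " > " dst }' |
        sort | uniq -c | sort -rn | head -n "${1:-10}"
}
```

For example, tcpdump -nn -l -c 2000 -i eth0 | top_talkers 5 stops after 2000 packets; -nn keeps addresses and ports numeric, so identical pairs are counted together.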
tcpdump can also be used to check whether a service seems unresponsive because your packets are simply not reaching the remote machine. Since tcpdump is a command-line tool, you will almost certainly need to add filters, especially when you're running tcpdump on a remote machine you're logged into via SSH. Otherwise you'll get lots of packet dumps of SSH packets that are telling you of packets dumped that belong to SSH telling you of packets dumped...
tcpdump -l -i eth0 port 25
This will dump all packets to or from TCP or UDP port 25 on eth0. The -l option makes tcpdump's output line-buffered, so each packet is printed as soon as it crosses the wire (this also matters when piping the output into another command).
If you're debugging network connections over an SSH connection, the following will probably be the most frequent way that you'll invoke tcpdump:
tcpdump -l not port 22
And to monitor the communication between Server A.local.net (running tcpdump) and the remote server B.remote.net:
tcpdump -l src or dst B.remote.net
The tcpdump filter syntax is actually surprisingly powerful - take 5 minutes and grab your nearest manpage on tcpdump if you need a better filter.
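A filter of not port 22 also hides every other SSH session on the box, which may be exactly the traffic you are chasing. OpenSSH sets SSH_CLIENT to "client-ip client-port server-port" in your login session, so a small sketch can build a filter that excludes only your own connection. The function name is invented for this example, and it falls back to not port 22 when SSH_CLIENT is unset.

```shell
# ssh_exclude_filter: print a tcpdump filter that drops only the packets of
# the current SSH session, based on OpenSSH's SSH_CLIENT variable
# ("client_ip client_port server_port"). Falls back to "not port 22".
ssh_exclude_filter() {
    set -- ${SSH_CLIENT:-}
    if [ $# -ge 2 ]; then
        echo "not (host $1 and port $2)"
    else
        echo "not port 22"
    fi
}
```

Usage: tcpdump -l -n "$(ssh_exclude_filter)". The quotes matter, since the filter contains spaces and parentheses.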
netstat
Netstat is an app for getting general information about the status of network connections to the machine.
netstat
will show the currently open sockets on the machine: UNIX domain sockets, TCP sockets, UDP sockets, etc. Without -a, listening (server) sockets are left out.
One of the more useful options is:
netstat -pa
The `-p` option tells netstat to show which program (PID and name) has each socket open, which is often very useful info. For example, say you nmap your own system and want to know what is using port 666. Running netstat -pa will show you it's satand running on that TCP port. Run it as root, or you'll only see the programs behind your own user's sockets.
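One of the more twisted, but useful, invocations is:
netstat -tan | awk 'NR > 2 { print $6 }' | sort | uniq -c | sort -n
This skips netstat's two header lines, takes the State column of every TCP socket (-a includes listening sockets, -n skips slow name lookups), and shows you a sorted list of how many sockets are in each connection state. For example:
      9 LISTEN
     21 ESTABLISHED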
In short, netstat tells you:
- what process is doing what and to whom over the network
- how many sockets are open
- what state each socket is in
A quick and dirty way to see what daemons are running and accepting connections on your machine is
netstat -tlpn
for TCP services and
netstat -ulpn
for UDP services. Unix domain sockets are usually more abundant than either of these two and a lot less interesting.
If you're having trouble with network throughput for some reason, try
netstat -s
This will print out a summary of the network stack state counters, going into way more detail than the RX/TX frames dropped counter of ifconfig. By looking at what counters are rapidly increasing, you may be able to find out why your network throughput is misbehaving.
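Comparing counters by eye gets tedious. A sketch that takes two saved copies of netstat -s output and prints the counters that grew, largest increase first. It keys each counter on its section (Ip:, Tcp:, ...) plus its text with the number replaced by N, because the exact wording of netstat -s lines varies between versions; counter_deltas is a hypothetical name.

```shell
# counter_deltas OLD NEW: OLD and NEW are saved `netstat -s` outputs.
# Prints "increase<TAB>section counter-text" for every counter that went up,
# biggest increase first.
counter_deltas() {
    awk '
        /^[^ \t].*:$/ { section = $0; next }   # section headers: "Tcp:", ...
        match($0, /[0-9]+/) {
            val = substr($0, RSTART, RLENGTH) + 0
            key = section " " substr($0, 1, RSTART - 1) "N" substr($0, RSTART + RLENGTH)
            gsub(/[ \t]+/, " ", key)
            if (FNR == NR) { old[key] = val; next }   # first file: remember
            if ((key in old) && val > old[key]) {
                d = val - old[key]
                print d "\t" key
            }
        }' "$1" "$2" | sort -rn
}
```

Take one snapshot with netstat -s > /tmp/ns.1, wait while the problem happens, take another with netstat -s > /tmp/ns.2, then run counter_deltas /tmp/ns.1 /tmp/ns.2 | head.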
lsof
/usr/sbin/lsof lists all the open files on the system. There's a ton of options, almost none of which you'll ever need.
This is mostly useful for seeing which processes have which files open: for example, when you need to unmount a partition, or when you have deleted a file but its space wasn't reclaimed (because some process still has it open) and you want to know why.
The EXAMPLES section of the lsof man page includes many useful examples. One of the more common usages is to see which services are accepting network connections over TCP:
lsof -i tcp
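For the deleted-file case, lsof +L1 lists open files whose link count has dropped to zero. If lsof isn't installed, the same information is in /proc: each /proc/PID/fd entry is a symlink, and the kernel appends " (deleted)" to its target once the file has been unlinked. A rough sketch; the function name is made up, and as a normal user you will only see your own processes.

```shell
# deleted_open_files: list "PID path (deleted)" for every open file descriptor
# that still points at an unlinked file. Run as root to see all processes.
deleted_open_files() {
    for fd in /proc/[0-9]*/fd/*; do
        target=$(readlink "$fd" 2>/dev/null) || continue
        case $target in
            *' (deleted)')
                pid=${fd#/proc/}               # "1234/fd/5" -> PID is the first part
                echo "${pid%%/*} $target" ;;
        esac
    done
}
```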
fuser
Displays the PIDs of processes that are using some filesystem object. Kind of like the little brother of lsof.
The most frequent use will be the '-m' option when you're trying to unmount a filesystem and you get an error message telling you that the specified device is busy:
turing:/home/sr# umount /usr
umount: /usr: device is busy
turing:/home/sr# fuser -m /usr
/usr: 2522e 2604e 2646e 2652e 2662e 2761e 2764e 2775e 2798e 2804e 2843e 2846e 2849e
2988m 3018m 3740e 3741e 3759m 3772m 3773e 3776e 3779e 3782e 3785e 3789e 3791e
3793e 3828e 3832e 3833m 3869e 3893e 3907e 3908m 3915e 3999e 4124m 4125m 4127m
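These are the PIDs of all the processes that are using the /usr mountpoint and keeping you from unmounting the filesystem. The letter after each PID says how: c current directory, e executable being run, r root directory, m mmap'ed file or shared library (a PID with no letter simply has a file open). Check what each one is with 'ps -fp PID' and kill them gently.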
ldd
ldd prints out shared library dependencies.
For apps that are reporting missing libraries, this is a handy utility. It shows all the libraries a given app or library is linked to.
For most cases, what you will be looking for is missing libs. In the ldd output, they will show up something like this (current glibc versions print "=> not found" instead):
libpng.so.3 => (file not found)
In this case, you need to figure out why libpng.so.3 isn't being found. It might not be in the standard lib paths or in a path listed in /etc/ld.so.conf, or you may need to run `ldconfig` again to update the ld cache.
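When ldd's output is long, a one-line filter pulls out just the missing entries. This sketch matches both the "(file not found)" wording shown above and the "=> not found" wording printed by current glibc; missing_libs is a made-up name.

```shell
# missing_libs: read ldd output on stdin and print the names of unresolved libs.
missing_libs() {
    awk '/=> *(not found|\(file not found\))/ { print $1 }'
}
```

Usage: ldd /usr/bin/someapp | missing_libs (someapp is a placeholder).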
ldd can also be useful when tracking down cases where an app is finding a library, but it's finding the wrong library. This can happen if there are two libraries with the same name installed on a system in different paths.
Since the `ldd` output includes the full path to each lib, you can see if anything is pointing at the wrong path. One thing to look for is a lib that's in a different lib path than the rest: if an app uses libs from /usr/lib, except for one from /usr/local/lib, there's a good chance that's your culprit.
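To make the odd one out jump to the top, you can count resolved libraries per directory. A sketch, assuming the usual ldd line shape "name => /full/path (address)"; directories that contribute only one or two libraries are listed first. lib_dirs is a made-up name.

```shell
# lib_dirs: read ldd output on stdin and print how many resolved libraries
# come from each directory, least-used directories first.
lib_dirs() {
    awk '$2 == "=>" && $3 ~ /^\// {
             dir = $3
             sub(/\/[^\/]*$/, "", dir)        # strip the file name
             print dir
         }' | sort | uniq -c | sort -n
}
```

Usage: ldd /usr/bin/someapp | lib_dirs (someapp is a placeholder).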
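If a library is installed in a directory the dynamic linker doesn't search, add that directory to your ld config file (typically /etc/ld.so.conf, or a file in /etc/ld.so.conf.d/ on newer systems) and re-run ldconfig.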
nm
`nm` lists the symbols in a binary or library. With -D it shows the dynamic symbols; in an application, the ones marked U (undefined) are the library symbols it expects to find at run time. It can be used in combination with `ldd` and `ldconfig` to try to track down library linking problems.
A common case is a binary compiled against a newer version of a library than the one it dynamically links against at run time, so it expects symbols the older library does not have.
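A rough sketch of checking that case. It compares the undefined dynamic symbols of the binary (nm -D --undefined-only, GNU binutils) against the symbols the library defines (nm -D --defined-only), stripping any @VERSION suffix. Expect some noise: symbols provided by other libraries, such as printf from libc, show up too, so look for names that belong to the library in question. The function name and file names are illustrative.

```shell
# missing_symbols NEEDED PROVIDED
#   NEEDED:   saved output of `nm -D --undefined-only /path/to/app`
#   PROVIDED: saved output of `nm -D --defined-only /path/to/libfoo.so`
# Prints symbols the app needs that the library does not define.
missing_symbols() {
    needed=$(mktemp) && provided=$(mktemp) || return 1
    awk '$1 == "U" { s = $2; sub(/@.*/, "", s); print s }' "$1" | sort -u > "$needed"
    awk 'NF >= 3 && $2 != "U" { s = $3; sub(/@.*/, "", s); print s }' "$2" | sort -u > "$provided"
    comm -23 "$needed" "$provided"         # lines only in the "needed" list
    rm -f "$needed" "$provided"
}
```

Usage:
nm -D --undefined-only /usr/bin/someapp > needed.txt
nm -D --defined-only /usr/lib/libpng.so.3 > provided.txt
missing_symbols needed.txt provided.txt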
file
`file` is a simple utility that tries to figure out what kind of data a given file contains. It does this mostly by comparing the start of the file against a database of "magic" signatures (see magic(5)), rather than trusting the file name.
Where this sometimes comes in handy for troubleshooting is looking for rogue files: a .jpg file that is actually an HTML file, or a tar.gz that's not actually compressed. Cases like those can sometimes cause apps to behave very strangely.
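To show the idea behind file's magic tests, here is a hand-rolled check for just one signature: gzip data always starts with the two bytes 0x1f 0x8b. The sketch (function names are made up) flags .tar.gz and .tgz files that aren't really gzip-compressed; file itself knows thousands of such signatures, so prefer it when it's available.

```shell
# is_gzip FILE: succeed if FILE begins with the gzip magic bytes 1f 8b.
is_gzip() {
    [ "$(head -c 2 "$1" | od -An -tx1 | tr -d ' \n')" = "1f8b" ]
}

# check_tarballs DIR: report .tar.gz/.tgz files in DIR that are not gzip data.
check_tarballs() {
    for f in "$1"/*.tar.gz "$1"/*.tgz; do
        [ -f "$f" ] || continue            # skip unmatched glob patterns
        is_gzip "$f" || echo "not gzip: $f"
    done
}
```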
netcat / nc
Ah, netcat. That wonderful utility which works just like the normal cat command, except that it reads from or writes to a network connection (a host and port) instead of a file.
One common use is cloning a system over a network. With a pair of commands along the lines of "dd | netcat", you can copy a system disk bit for bit. Here's what you do:
On the slave system, boot from a CD (like Knoppix) and issue a command such as
netcat -l -p 5678 | dd of=/dev/sda
Then, on the master, start sending a bit image over the network
dd if=/dev/sda | netcat <slave_IP> 5678
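Start the listener on the slave before running dd on the master. The -l -p syntax is for the traditional netcat; the OpenBSD nc shipped by many distributions takes the port directly (nc -l 5678). The target disk must be at least as large as the source and will be overwritten completely, and the source disk should not be mounted read-write while it is being copied, so boot the master from a CD as well.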
md5sum
`md5sum` is a utility that calculates a checksum of a file. For troubleshooting purposes, you can assume that different files will have different checksums. That holds for accidental changes, but MD5 is not collision resistant: files with the same MD5 sum can be created deliberately, and the same is now true of SHA-1. When security matters, use a stronger hash such as `sha256sum`. `sha1sum` and `sha256sum` work exactly like `md5sum` in these examples.
verifying files
Since an MD5 sum will change if any part of a file changes, it can also be used to verify that a file has not changed. Systems like `tripwire` use this to detect if a file has been compromised in a security breach.
This can be used to see if a file has been modified or corrupted if you know what the MD5 sum is supposed to be.
You can also use it to see if two files are exactly the same or not. A common case is to check to see if a config file has been modified or if it's different from what's in a config management system.
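A minimal sketch of that workflow with GNU md5sum: record checksums for a directory of config files once, then check them later. snapshot_sums and changed_files are made-up names; --quiet (GNU) suppresses the OK lines so only changed files are printed. For two files on the same machine, cmp -s file1 file2 answers the same-or-not question without checksums.

```shell
# snapshot_sums DIR SUMFILE: store an MD5 sum for every regular file under DIR.
# Keep SUMFILE outside DIR (ideally on another machine).
snapshot_sums() {
    find "$1" -type f -exec md5sum {} + > "$2"
}

# changed_files SUMFILE: print "path: FAILED" for each file whose contents no
# longer match; the exit status is non-zero if anything changed.
changed_files() {
    md5sum -c --quiet "$1"
}
```

Usage: run snapshot_sums /etc/httpd /root/httpd.md5 once, then changed_files /root/httpd.md5 whenever you suspect a change (the paths are placeholders).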
verifying ISOs
Linux distributions are often distributed as CD images, or ISOs. MD5 sums of these images are almost always provided to verify the integrity of the downloaded ISOs. A few corrupted bits here and there are enough to make an install a painful experience.
Check the location the ISOs were downloaded to for a text file containing the MD5 sums of the ISOs. It will typically look something like:
2af10158545bc24477381e80412ff209  bar.iso
9761d6ce118a1230bc48b0a59f7b5639  foo.iso
You can run `md5sum` directly on the ISOs:
bash# md5sum bar.iso
2af10158545bc24477381e80412ff209 bar.iso
Or you can often use the md5sums text file as input to `md5sum` to tell it what to check and to verify. If the above example was in a file called "iso.md5s":
md5sum -c iso.md5s
That command will check both ISOs and check the computed checksum against what the file lists as correct.
md5sum is also a good way to verify a burned CD. Something like:
find /mnt/cdrom -type f -exec md5sum {} \;
will run md5sum on every file on the CD mounted at /mnt/cdrom (-type f skips directories, which md5sum can't read and which would otherwise produce misleading errors). Since md5sum reads every bit (literally) of each file, if the CD is bad, there's a good chance this will find it. If the command reports any errors about the media, chances are the CD is bad. Better to find it now than later.
For recent Red Hat and Fedora based distros, the installer includes an option to perform a media check. It checks the disc against a checksum embedded in the image, so it is essentially the same as verifying the ISO and the burned CD by hand. If you have already done both, you can skip the media check.
diff
diff compares two files and shows the difference between them.
For troubleshooting, this is most often used on config files. If one version of a config file works, but another does not, a `diff` of the two files can often be enlightening. Since it can be very easy to miss a small difference in a file, being able to see just the differences is useful.
For debugging during development, diff (especially the versions built into revision control systems like cvs) is invaluable. Seeing exactly what changed between two versions is a great help.
For example, if foo-2.2 is acting weird, where foo-2.1 worked fine, it's not uncommon to `diff` the source code between the two versions to see if anything related to your problem changed.
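With config files, comment and whitespace churn can bury the one line that matters. A sketch that strips # comments, trailing whitespace and blank lines before diffing; it assumes # always starts a comment, which is not true of every config format (a # inside a quoted value would be cut). strip_conf and confdiff are made-up names.

```shell
# strip_conf FILE: print FILE without comments, trailing blanks or empty lines.
strip_conf() {
    sed -e 's/#.*//' -e 's/[[:space:]]*$//' -e '/^$/d' "$1"
}

# confdiff OLD NEW: diff two config files, ignoring comments and blank lines.
# Exit status follows diff: 0 = same, 1 = different, 2 = trouble.
confdiff() {
    a=$(mktemp) && b=$(mktemp) || return 2
    strip_conf "$1" > "$a"
    strip_conf "$2" > "$b"
    diff "$a" "$b"
    status=$?
    rm -f "$a" "$b"
    return $status
}
```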
find
For troubleshooting a system that seems to have suddenly stopped working, find has a few tricks up its sleeve.
When a system stops working suddenly, the first question to ask is "what changed?".
find / -mtime -1
That command will recursively list all the files from / that have changed in the last day.
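When you can reproduce the change (an install, an upgrade, starting an app), a reference file is more precise than counting days or minutes: touch a marker first, do the thing, then ask find for everything newer than the marker. A sketch; the marker path is arbitrary, changed_since is a made-up name, and -xdev keeps find on each starting filesystem so it doesn't wander into /proc.

```shell
# changed_since MARKER DIR...: list files under DIR(s) modified after MARKER
# was last touched. -xdev keeps find on each starting filesystem.
changed_since() {
    marker=$1
    shift
    find "$@" -xdev -newer "$marker" -print
}
```

Usage:
touch /tmp/marker
(run the installer, upgrade, or app here)
changed_since /tmp/marker / /var
Separately mounted filesystems such as /var have to be listed explicitly, because -xdev keeps find from crossing into them.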
To list all the files in /usr/lib that changed in the last 30 minutes:
find /usr/lib -mmin -30
Similar options exist for ctime and atime (-ctime/-cmin and -atime/-amin).
To show all the files in /tmp that have been accessed in the last 30 minutes.
find /tmp -amin -30
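The -atime/-amin options are useful when trying to determine if an app is actually reading the files it is supposed to. If you run the app, then run that command where the files are, and nothing has been accessed, something is wrong. One caveat: many systems mount filesystems with relatime or noatime, in which case access times are only updated occasionally or not at all, so check the mount options (in /proc/mounts) before drawing conclusions.
If no "+" or "-" is given for the time value, find will match only exactly that time. This is handy in several cases, such as finding which files were modified or created at the same time.
A good example of this is cleaning up from a tar package that was unpacked into the wrong directory. tar restores each file's original modification time, but the ctime of every unpacked file is the time of extraction, so -cmin or -ctime will pick them all out. You can then use find with -exec to delete them (check the list before deleting anything).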
`find` can also find files with particular permissions set. To find all world-writable files from / down:
find / -type f -perm -0002
To find all files in /tmp owned by "alikins":
find /tmp -user alikins
Using find in combo with grep to find markers (errors, filename, etc)
When troubleshooting, there are plenty of cases where you want to find all instances of a filename, or a hostname, etc.
To recursively grep a large number of files, you can use find and its -exec option. This will grep for "foo" in all files from the current working directory down, printing the file name with each match:
find . -type f -exec grep -H foo {} +
Note that in many cases you can also use `grep -r`. Another common usage is with xargs (-print0 and -0 keep file names containing spaces intact):
find / -type f -print0 | xargs -0 grep "look for this"
ls/stat
While `ls` is one of the first commands Linux users learn, don't overlook its usefulness in troubleshooting. It's the easiest way to see what's on the file system.
finding sym links and hard links
A simple `ls -al` will show the contents of a directory. But it will also indicate what files are symlinks.
Normally, having a file being a symlink is fine, but some apps, especially security sensitive apps, are picky about what can and can not be a symlink.
The other thing to look for is dangling or broken symlinks. Some apps don't expect to get handed a symlink that doesn't go anywhere.
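To hunt for dangling symlinks across a whole tree instead of reading ls output, you can let find test each link's target. This sketch uses only POSIX find options (GNU find users can write find DIR -xtype l instead); broken_links is a made-up name.

```shell
# broken_links DIR: print symlinks under DIR whose target does not exist.
# test -e follows the link, so it fails only when the target is missing
# (symlink loops are reported too).
broken_links() {
    find "$1" -type l ! -exec test -e {} \; -print
}
```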
file system usage
Some simple `ls` invocations useful for troubleshooting.
Show a detailed view of all files, sorted by last modification time with the newest at the bottom. Quick, easy way to see if an app is modifying files:
`ls -lart`
Show a detailed view of all files in the current directory, sorted by size with the largest at the bottom. Quick, easy way to see which files are consuming all of your precious disk space.
`ls -larS`
Mark each name with its file type (/ for directories, @ for symlinks, * for executables, | for FIFOs, = for sockets). Maybe that directory the app is looking for is a file or vice versa?
`ls -F`
df
Running out of disk space causes so many apps to fail in weird and bizarre ways. A quick `df -h` is a pretty good troubleshooting starting point.
Using it is easy: look for any volume that is 100% full, or, for apps that might be writing lots of data at once, reasonably close to full.
It's pretty common to spend more time than anyone would like to admit debugging a problem, only to suddenly hear someone yell "Damnit! It's out of disk space!".
A quick check avoids that problem.
In addition to running out of space, it's possible to run out of file system inodes. A `df -h` will not show this, but a `df -i` will show the number of inodes available on each filesystem.
Being out of inodes can cause even more obscure failures than being out of space, so keep it in mind.
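The eyeball check is easy to script. A sketch that reads df -P output (the POSIX format puts each filesystem on one line, with the use percentage in the fifth column and the mount point in the sixth) and prints anything at or above a threshold; feed it df -Pi output to check inodes the same way. The 90% default and the name full_fs are arbitrary, and mount points containing spaces will be cut short.

```shell
# full_fs [PERCENT]: read `df -P` (or `df -Pi`) output on stdin and print
# filesystems whose use% is at or above PERCENT (default 90).
full_fs() {
    awk -v limit="${1:-90}" 'NR > 1 {
        use = $5
        sub(/%$/, "", use)                 # "97%" -> "97"; "-" stays non-numeric
        if (use ~ /^[0-9]+$/ && use + 0 >= limit + 0)
            print $6 " is " $5 " full"
    }'
}
```

Usage: df -P | full_fs 90 for space, df -Pi | full_fs 90 for inodes.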
watch
`watch` is a command that runs another command, displays its output full-screen, then repeats (every two seconds by default; change that with -n). It is mostly used to keep an eye on the output of a reporting command. The -d option highlights any output that changes between runs.
For example, to watch disk space usage:
watch -d df
Another example is to simply watch `ls -al` output, to look for any tmp files that get created:
watch -d "ls -al"
Note that the above example only runs `ls -al` every two seconds, so it will miss files that are created and removed between runs.
"watch" is often used in combo with commands like "ls", "df", "netstat", "ps".
ipcs/ipcrm
- anything that uses shm/ipc
- oracle/apache/etc
A lot of apps make fairly extensive use of SysV shared memory and IPC (Oracle, Apache, GIMP, etc.). Most of the time, on current Linux systems, this works pretty well. But it's occasionally useful to see what shared memory is being used and how; `ipcs` is the tool for that (and `ipcrm` removes segments left behind by a crashed app).
One common usage is to check for Oracle's usage of "shared memory glue" (typically noticed as shm_glue), which is the method they use for large SGA creation when they cannot obtain a single shm segment large enough for their needs. A good rule of thumb is that if you see Oracle with a large number of maximum sized shared memory segments, then you have a problem and need to tune your shm sizes (the kernel.shmmax sysctl sets the maximum segment size) and restart Oracle. shm_glue is a performance killer.
Typically, you will use ipcs -ma on Linux to see shared memory segments, semaphore arrays, and message queues. Here's an example from a lightly loaded system:
# ipcs -ma

------ Shared Memory Segments --------
key        shmid      owner      perms      bytes      nattch     status
0x00000000 8093696    root       600        393216     2          dest
0x00000000 8126465    root       600        393216     2          dest
0x00000000 19759106   root       666        262080     1          dest
0x00000000 19529731   root       600        393216     2          dest
0x00000000 19562500   root       600        393216     2          dest

------ Semaphore Arrays --------
key        semid      owner      perms      nsems

------ Message Queues --------
key        msqid      owner      perms      used-bytes   messages
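With many segments, totals per owner are easier to read than the raw list. A sketch that sums the bytes column of ipcs -m output (the column layout shown above) per owner and counts the segments; shm_by_owner is a made-up name. Many segments of exactly the kernel.shmmax size are the pattern worth a closer look.

```shell
# shm_by_owner: read `ipcs -m` (or `ipcs -a`) output on stdin and print
# "owner segments total-bytes", looking only at the shared memory section.
shm_by_owner() {
    awk '
        /Shared Memory Segments/ { in_shm = 1; next }
        /^------/                { in_shm = 0 }     # next section starts
        in_shm && $1 ~ /^0x/     { count[$3]++; bytes[$3] += $5 }
        END { for (o in count) print o, count[o], bytes[o] }' | sort
}
```

Usage: ipcs -m | shm_by_owner.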
Searching the web for error messages
A pretty common and often very effective approach to tracking down the cause of errors or problems is searching the web. Search engines like Google or Yahoo can find documentation, FAQs, web forum posts, mailing list archives, Usenet posts, and other useful resources.
Start by searching for the entire error message exactly as it appears, in double quotes. If it's a common problem, there's a good chance you will get some hits. Anything that looks like a FAQ is a good start; mailing list archives can also be a good source. Just be sure to check the archive indexes for other messages in the same discussion.
If you are using a commercial distribution, consider checking its knowledge base as well. Both Red Hat and SUSE have useful troubleshooting documents in theirs.
source code
For most Linux distros, you have the source code, so it can often be useful to search through the code for error messages, filenames, or other markers related to the problem. In many cases, you don't really need to be able to understand the programming language to get some useful info.
Kernel drivers are a great example for this, since they often include very detailed info about which hardware is supported, what's likely to break, etc.
On RPM based systems, to get the source code, you install the source RPM. To see which source RPM corresponds to a given file or utility, use the command:
rpm -qif /path/to/file
The output includes a Source RPM field with the name of the source RPM. If you have the source CD, you can install it from there.
Alternatively, you can use up2date or other package tools to get the source RPM.
up2date --get-source packagename
will download the source RPM to /var/spool/up2date.
To install a source RPM, just issue the command:
rpm -Uvh /path/to/package.src.rpm
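On Red Hat Linux systems, the source gets installed in /usr/src/redhat/SOURCES, with a spec file in /usr/src/redhat/SPECS (newer rpm versions default to ~/rpmbuild/SOURCES and ~/rpmbuild/SPECS instead). Other distros will be similar.
The easiest way to unpack the source and apply the distribution's patches (into the BUILD directory next to SOURCES) is: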
rpmbuild -bp /usr/src/redhat/SPECS/package.spec
where package.spec is the spec file for the src package installed.
`find` and `grep` are good tools for searching for the markers of interest.
strings
`strings` is a utility that will search through a file and try to find text strings. For troubleshooting sometimes it is handy to be able to look for strings in an executable.
For example, you can run `strings` on a binary to see if it has any hard-coded paths to helper utilities. If those utils are in the wrong place, that app may fail.
Searching for error messages can help as well, especially in cases where you are not sure what binary is reporting an error message.
In some ways, it's a bit like grep'ing through source code for error messages, but a bit easier. Unfortunately, it also provides far less info.
syslog/log levels
Syslog is a daemon that mutated out of a sendmail debugging aid into a catch-all logging service for Unix. A lot of applications send their log output to syslog, but only messages that are actually sent to syslog get logged there. To keep logs apart, syslog grew facilities (nothing more than "categories" in syslog-speak) and severities over time. What gets written where is defined in /etc/syslog.conf (see syslog.conf(5)).
Getting stuff into Syslog
Syslog generally can receive messages in three ways:
- Through the syslog() function most languages provide (optionally preceded by a call to openlog())
- Through a local socket such as /dev/log, which is enabled by default on most distributions
- Via UDP on port 514, if syslogd is running with the -r option (this can be a security hole, since the standard syslog protocol implements no authentication or authorization! Caveat emptor!)
Defining Filters in /etc/syslog.conf
The basic syntax of this file is easy, but it contains some subtleties that can lead you into a long, slow suffering (when using synchronous writes on logfiles, more about that below).
- Empty lines and everything after a hash mark (#) are ignored
- Rules are of the format
<What> <Goes Where>
What
Your basic "what" is a specification of a facility and a severity delimited by a period:
<facility>.<severity>
This will catch all messages belonging to the given facility that have the given severity and higher.
If you only want to catch messages of exactly the given severity, prefix the severity with an equals sign (=):
<facility>.=<severity>
You can also negate the severity selection by prepending an exclamation mark (!):
<facility>.!<severity>
This will select all messages belonging to the given facility and that have a severity lower than the one specified. Note that this also weeds out messages belonging to the given severity - which is logical, since the opposite of >= is <.
Of course this can get tedious if you have to list all combinations of the 20 facilities and 8 severities by hand. So there are shortcuts, such as specifying an asterisk (*) as a catch-all:
<facility>.*     -> All messages belonging to <facility>
*.<severity>     -> All messages of the given <severity> or higher
*.*              -> All messages
And then you can specify lists of "whats", delimited by semicolons (;):
<facility>.<severity>;<facility>.<severity>
Or, if you want to process the same severities of different facilities, list the facilities using commas (,) first:
<facility>,<facility>.<severity>
To make matters interesting, there is also a special severity called "none", which means that no messages of the given facility are to be logged by this rule:
*.*;<facility>.none -> Log all messages except those of the given facility
Goes Where
After the "What" part with all its twists and turns, the "Where" is actually pretty simple:
</path/to/logfile>
will log everything to the given logfile.
Asynchronously
By default, this logging is done with synchronous writes: after each log entry, syslog waits for the kernel to confirm that the data has actually been written to disk before writing the next entry. This can slow down your system tenfold for services with extensive logging (especially mail servers!), a factor that has been observed in the wild. So if you can afford to lose the last few messages when the machine crashes, write logs asynchronously.
To indicate to syslog that you want log entries to be written asynchronously, prepend a minus (-) to the logfile:
-</path/to/logfile>
This is basically what is needed in 99% of everyday life.
Note that you can specify the same "What" multiple times pointing to different "wheres" for each. The messages will then be logged to all "wheres" given.
Goes Where Again?
OK, the "Where" part isn't actually all that simple. You have a few other choices:
- Remote machines:
@<hostname>
- Named pipes:
|<path to fifo>
- Terminals, by giving their device files as logfiles
- Specific users (if they're logged on), whose terminals get the messages as with write:
<user>,<user>
- All users logged on:
*
But again, these are things you don't need that often, and if you do, you'd better read up on them in the manpage first!
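Putting the pieces together, here is a hypothetical /etc/syslog.conf fragment using the selector and destination forms described above. The log file names and the loghost.example.com host are placeholders, not recommendations, and some older syslogd implementations insist on tabs rather than spaces between the two fields.

```
# Everything at info or higher, except mail and private authentication messages
*.info;mail.none;authpriv.none          /var/log/messages

# All mail messages, written asynchronously (note the leading minus)
mail.*                                  -/var/log/maillog

# Exactly debug severity from local0 and local1, nothing higher
local0,local1.=debug                    /var/log/local-debug.log

# Critical kernel messages to the console and to a remote log host
kern.crit                               /dev/console
kern.crit                               @loghost.example.com

# Emergencies are written to every logged-in user
*.emerg                                 *
```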
RPM
RPM is the RPM Package Manager. It's a package tool widely used on many Linux distributions, including Red Hat Enterprise Linux, Fedora, Novell, and Mandriva.
It's commonly used to install, update, and remove software and to keep track of software dependencies. The RPM database also includes a lot of information about the software currently installed, and can often be a useful resource for troubleshooting.
using rpm to verify package contents
`rpm` includes support for verifying a file's contents, size, permissions, mtime, user and group ownership, and SELinux context.
If you are having problems with "gaim", you might want to verify if all of the files are correct:
rpm -V gaim
That command will check the ondisk files against the expected values in the RPM database. If a file has been modified, it will show up. See the `rpm` man page for info on decoding the string of chars at the left of the output. But, if the file shows up at all, `rpm` thinks something has changed about the file, which is often enough to know, without decoding the info.
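When you do want to decode it: each position in the first column stands for one test, and a dot means the test passed. The positions are S size, M mode (permissions and file type), 5 digest (MD5 sum), D device numbers, L symlink target, U user, G group, T mtime, and on newer rpm versions P capabilities; a "?" means the test could not be performed, and a c before the path marks a config file. A sketch that spells those out; decode_rpmv is a made-up name, and it assumes paths without spaces.

```shell
# decode_rpmv: read `rpm -V` output on stdin and print each file with the
# attributes rpm reports as changed. A trailing "?" means rpm could not run
# that test.
decode_rpmv() {
    awk '
        BEGIN { n = split("size mode digest device symlink user group mtime caps", name, " ") }
        $1 == "missing" { print $NF ": missing"; next }
        {
            out = ""
            for (i = 1; i <= n; i++) {
                c = substr($1, i, 1)
                if (c == "" || c == ".") continue
                out = out (out == "" ? "" : ",") name[i] (c == "?" ? "?" : "")
            }
            print $NF ": " out
        }'
}
```

Usage: rpm -V gaim | decode_rpmv.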
Also useful is verifying all packages. Sometimes you just don't know what's changed and want an overview of files that have been edited or modified from the original:
rpm -Va
That will take a while on most systems, but it will print out a list of all files `rpm` thinks have been modified. Note that on most systems, some files will show up that are perfectly acceptable (config files you have edited, for example).
using rpm to find config files
A good place to start looking when some software is having trouble is the config files. To see a list of the config files for package "up2date":
rpm -q --configfiles up2date
using rpm to see what was installed recently
One of the bits of information `rpm` keeps track of is when a package was installed. Since many software problems start when software is updated or installed, this is useful information.
To get a list of all installed RPM packages, with their installation dates:
rpm -qa --last
The list is sorted so that the newest packages are at the top of the list. If you are troubleshooting a problem that recently appeared, that's a good place to start looking for clues.
resetting file permissions and user/group info
If you think a file from a package has had its permissions or ownership changed, an easy way to reset the permissions is:
rpm --setperms packagename
and to reset user and group ownership:
rpm --setugids packagename
ksymoops
To quote from the ksymoops web page, "The Linux kernel produces error messages that contain machine specific numbers which are meaningless for debugging. 'ksymoops' reads machine specific files and the error log and does its best to converts the code to instructions and map addresses to kernel symbols. "
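See the man page for more info. ksymoops is only needed for 2.4 and older kernels; 2.6 kernels built with CONFIG_KALLSYMS print symbol names in the oops themselves.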
Kernel core dumps (netdump, diskdump and crash)
Netdump and diskdump are utilities for logging kernel crashes. `netdump` sends the core image of the kernel (vmcore) across the network to a netdump server, while `diskdump` writes it to disk. The image can be examined with the `crash` utility.
Netdump and diskdump create a vmcore, a representation of what was in the system's memory when the crash occurred. The `crash` utility is a modified version of gdb that automates the basic steps required to analyse a vmcore.
At the time of writing, `netdump` does not work on Itanium or Itanium 2 architecture systems.
Netdump
Netdump requires another machine to capture the crash from the crashing kernel. The machine that is crashing is considered the netdump client, the machine that is going to host the core is considered the netdump server. One netdump server can capture crashes from multiple clients.
Server Side Configuration
The netdump server does not need any specific network card. It must be on the same subnet as the client, and there must be a clear path between them (no Network Address Translation or other packet modification).
Start the service with the command
service netdump-server start
The server saves the vmcore file in /var/crash, so make sure there is enough space there to receive it. A rough formula for the space needed is:
(RAM on client + SWAP on client * 1.1)
Also note that there is a RAM limit; only the first 4GB of RAM is dumped, so you can feel safe in allocating 5GB per concurrent client dump on your server. For example, if you wanted to have 4 clients dumping at the same time, allocate 20GB of storage for the core files.
The next step is to set the password for the netdump user. Do so with the command
passwd netdump-user
Be sure to set a strong password for this user.
Client Side Configuration
Currently, only a limited set of hardware is able to send a core to a netdump server. The chosen LAN card for sending the crashdump should support one of the following drivers: 3c59x, e100, e1000, eepro100, pcnet32, tg3, tlan, and tulip.
The next step is to modify /etc/sysconfig/netdump and add the following line:
NETDUMPADDR=10.0.0.222
The address 10.0.0.222 should be the IP address of the machine configured as the netdump server.
Note that you can also set up the netdump server as a syslog server for the messages the client generates during the crash. Don't worry: the messages are only logged during a crash, not during the client's normal operation. This is handy to know, since interrupts are disabled on the client during a netdump.
The netdump client now needs to connect to the netdump server and set up its SSH keys there. Enter the command:
service netdump propagate
You should be prompted for a password. Enter the password of the netdump user on the netdump server.
The next step is to start the netdump service. Run the command
service netdump start
Next, test the setup by crashing your machine on purpose. The example given in the Netdump How-To assumes an old 2.4 kernel. For a crash module that works with 2.6 kernels, see this site (http://blog.dkpdev.com) or just copy and paste the code below into a pair of files.
Note that the Makefile uses $(PWD), so put the two files in a directory of their own (for example panic/) and run make from inside it.
Makefile:
obj-m += panic.o

all:
	make -C /lib/modules/$(shell uname -r)/build M=$(PWD) modules

clean:
	make -C /lib/modules/$(shell uname -r)/build M=$(PWD) clean
panic.c
/*
* Panic kernel module
*/
#include <linux/module.h>
#include <linux/kernel.h>
#define DRIVER_AUTHOR "Thundarr <thundarr@gmail.com>"
#define DRIVER_DESC "A panic module to test NetDump on 2.6 kernels"
MODULE_LICENSE("GPL");
MODULE_AUTHOR(DRIVER_AUTHOR);
MODULE_DESCRIPTION(DRIVER_DESC);
int init_module(void)
{
	printk(KERN_INFO "Panic module inserted to force a crash.\n");
	panic("Panic module inserted to force a crash.\n");
	return 0;	/* never reached: panic() does not return */
}

void cleanup_module(void)
{
	printk(KERN_INFO "How did we get here? Failed to panic?\n");
}
You can now just insmod panic.ko and watch your box die a painful death. Be sure to stop activity and sync your disks before inserting this module.
The crash should end up in /var/crash/ on the netdump server.
Diskdump
Supported cards
Cross platform:
* aic7xxx
* aic79xx
* megaraid2
* mpt fusion
* sata_promise
* sym53c8xx
i386, AMD64 and Intel® EM64T:
* ata_piix
i386 only:
* dpt_i2o
How do you turn on diskdump?
What DEVICE can you use (in /etc/sysconfig/diskdump)? Can you use a device already in use (/var or swap), or must it be an unused partition?
From what I can see you need an unused partition, and you need to format the DEVICE before first use:
service diskdump initialformat
This initializes the device for the service.
Crash
The crash package can be used to investigate live systems as well as kernel core dumps created by the netdump or diskdump packages.
xev
`xev` is a small utility that can be used to debug problems with X11, in particular odd behaviour related to key presses and mouse clicks.
`xev` just shows all the X11 "events" that get passed to it. For example, if a key press doesn't seem to do what it is supposed to do, you can check whether X11 is actually getting the key press, and if so, what value it is getting. For basic troubleshooting, no knowledge of X11 is needed, but `xev` can present a ton of information that only the most diehard X11 hacker cares about.
Related, but lower-level, are the files in /proc/bus/input. If you're having trouble getting an input device accepted by X, you can check whether you're giving the correct device file and protocol in xorg.conf/XF86Config by cross-referencing your config file with the information in these proc files (/proc/bus/input/devices in particular).
pmap
`pmap` is part of the "procps" suite of tools. It displays the memory map of a process and is essentially a wrapper for reading /proc/PID/maps.
It's useful to be able to see which libraries and modules an app has loaded. `ldd` can show the list of libraries an executable is linked against, but it doesn't know anything about dynamically loaded modules. A variety of large applications make significant use of dynamically loaded modules, as do most scripting languages, so `pmap` can come in handy when diagnosing issues that might be related to modules.
Scripting languages and shell programming
Shell scripting and scripting languages are what make Unix and Linux work. They are everywhere, so knowing how to track down problems with scripts is a handy skill.
For more information, see Scripting Languages
Logs
The key to troubleshooting is knowing what is going on. For core system services, there is a significant amount of logging turned on by default, especially for error cases. The trick is knowing where to look.
For more info, see Log Files
Environment settings
Allowing Core Files
"core" files are dumps of a process's memory. When a program crashes, it can leave behind a core file; loading that file into a debugger can help determine what caused the crash.
By default, most Linux distributions disable core files by setting the maximum allowed core file size to 0.
To allow a crashing (for example, segfaulting) application to leave a core, you need to raise this limit with `ulimit` in the shell the application is started from. To allow core files of unlimited size, issue:
ulimit -c unlimited
See the section on GDB for more information on what to do with core files.
LD_ASSUME_KERNEL
LD_ASSUME_KERNEL is an environment variable used by the dynamic linker to decide which implementation of a library to use. In most cases, the most important library is the C library, "libc" or "glibc".
glibc matters here because it contains the thread implementation for the system.
The values you can set LD_ASSUME_KERNEL to are Linux kernel versions. Since glibc and the kernel are tightly bound, glibc needs to change its behaviour based on which kernel version is installed.
For properly written apps, there should be no reason to use this setting. However, for some legacy apps that depend on a particular thread implementation in glibc, LD_ASSUME_KERNEL can be used to force the app to use an older implementation.
The primary values are: LD_ASSUME_KERNEL=2.4.20 selects the NPTL thread library; LD_ASSUME_KERNEL=2.4.1 selects the implementation in /lib/i686 (newer LinuxThreads); LD_ASSUME_KERNEL=2.2.5 or older selects the implementation in /lib (old LinuxThreads).
For an app that requires the old thread implementation, it can be launched as:
LD_ASSUME_KERNEL=2.2.5 ./some-old-app
See http://people.redhat.com/drepper/assumekernel.html for more details.
glibc environment variables
There's a wide variety of environment variables that glibc uses to alter its behaviour, many of which are useful for debugging or troubleshooting.
A good reference on these variables is at http://www.scratchbox.org/documentation/general/tutorials/glibcenv.html
Some interesting ones:
LANG and LANGUAGE
LANG sets the default locale for all categories; the individual LC_* variables (LC_COLLATE, LC_TIME, ...) override it per category, and LC_ALL overrides all of them. LANGUAGE is a GNU extension that only affects which message catalogs (translations) are used. Together, these control the locale-specific parts of glibc.
Lots of programs are written expecting to run in one locale and can break in others. Since locale settings change things like sort order (LC_COLLATE) and time formats (LC_TIME), shell scripts are particularly prone to problems from this.
A script that assumes a particular sort order is a good example.
A common way to test this is to run the troublesome app with the locale forced to "C", the default locale:
LC_ALL=C ls -al
If the app starts behaving when run that way, there is probably something in the code that assumes the "C" locale (sorted lists and time formats are strong candidates).
glibc malloc stuff
Recent (>5.4.23 for libc/>2.0 for glibc) libc implementations offer a small scale malloc debugger by way of the MALLOC_CHECK_ environment variable. MALLOC_CHECK_ can be set to 3 different values:
- 0: ignores any heap corruption
- 1: prints diagnostics on STDERR
- 2: calls abort(3) as soon as memory corruption is detected
This will help with the kind of memory corruption that can't be found with the tried and proven software engineering method of "staring at the code", but where electric fence/valgrind would be overkill.
Types Of Problems
Software is complicated and there can be a wide variety of problems that occur. But there are categories of problems that come up often, and it's useful to have tools and techniques for solving them.
For more info, see Types Of Problems
App specific troubleshooting info
apache
mod_status
mod_status is an Apache module that serves an HTML page showing information about Apache's internal status: the number of httpd processes and their current status, network connections, amount of traffic, etc.
It's very useful when trying to track down performance-related issues.
module debugging
Some Apache httpd modules include options to enable extra debugging info. Unfortunately, this seems to depend on the module.
log files
Log files, the httpd error log in particular (typically /var/log/httpd/error_log), are often the best place to look when troubleshooting. It's also where any module debugging information is written.
Testing the configuration file for syntax errors
Apache comes with an executable called apachectl(8). This program can run a configuration check on Apache's configuration files by issuing the command
apachectl configtest
Some distributions (like Red Hat/Fedora) also make this check available through Apache's init script (e.g. `service httpd configtest`), which invokes apachectl behind the scenes.
-X debug mode
One of the biggest obstacles to tracking down problems with Apache httpd is its multiprocess nature, which makes it difficult to strace it or to attach gdb to it.
To force httpd to run in a single process mode start it with:
httpd -X
Note that on Red Hat Linux boxes you probably need to include the command-line arguments that the init scripts start httpd with. The easiest way to get them is to start httpd normally, then run `ps auxwwww` and copy the command line of one of the httpd processes.
PHP
The following assumes that you know PHP coding.
The most informative (but also most disruptive in a visual sense) thing to do is set
error_reporting = E_ALL
in your php.ini (on Debian: /etc/php/<calling entity>/php.ini). Remember to restart your web server/calling entity after changing this setting. If you come from the C corner of things, you'll know that good programming style dictates that you treat warnings and notices as errors. So off you go, clean up that code!
Back and still not working? OK, now it gets ugly. PHP doesn't come with a debugger like `gdb`. Such things exist, but usually they are embedded in an IDE that also emulates a web server and costs $$$. So basically you get to do what you would in regular shell scripts: debug echos. Echo early, echo often. Hand in hand with echo statements comes the print_r function, which prints arrays/hashes (same thing in PHP) recursively. Drawback here: print_r formats in plain ASCII, not HTML. So you'll either have to look at the page source to see a clean version of the output, or have print_r return its output as a string (by passing true as the second argument) and wrap it in <pre> tags yourself:
echo "<pre>" . print_r($myarray, true) . "</pre>";
FIXME: can you turn on warnings about variables only used once, like in perl? One of my most frequent errors....
X apps
FIXME
nosync stuff
X log
iptables
I have a Windows VPN client behind a Linux gateway doing NAT and I can't connect to the server
First things first, you'll want to know what kind of Windows VPN tunnel you're building. The following will assume the standard PPTP tunnel.
FIXME: What about L2TP tunnels?
Next, you need rules that allow forwarding of the connections involved, and rules for NATing them. The tricky part is that a PPTP tunnel uses two connections: a TCP connection to port 1723 on the server, and a GRE tunnel (which means you can only have one PPTP NATting session active on the gateway at a time). So you'll need the following rules to allow the forwarding:
iptables -A FORWARD -p tcp --dport 1723 -d vpn-server-address -j ACCEPT
iptables -A FORWARD -p gre -d vpn-server-address -j ACCEPT
and the NATting is handled by these rules:
iptables -t nat -A PREROUTING -p tcp --sport 1723 -s vpn-server-address -j DNAT --to-dest vpn-client-ip:1723
iptables -t nat -A PREROUTING -p gre -s vpn-server-address -j DNAT --to-dest vpn-client-ip
iptables -t nat -A POSTROUTING -p tcp --dport 1723 -d vpn-server-address -j SNAT --to-source gateway-public-ip
iptables -t nat -A POSTROUTING -p gre -d vpn-server-address -j SNAT --to-source gateway-public-ip
If you're still having trouble connecting and Windows gives you error 721 (or tcpdump shows the 1723/tcp connection working fine but the GRE tunnel failing, because the GRE packets still carry the private IP of the machine running the VPN client as their source), you will need to build the PPTP connection tracking and NAT helper modules for the Linux kernel (as of 2.6.x?) and load them:
modprobe ip_conntrack_pptp
modprobe ip_nat_pptp
On newer kernels these modules are called nf_conntrack_pptp and nf_nat_pptp.
Now everything should be working as expected.
SSH
Most problems here occur when you're trying to set up logins via RSA/DSA keys (probably without passphrases too...). It usually comes down to basics: make sure that your ~/.ssh is owned by your user and set to mode 700, and that ~/.ssh/authorized_keys is set to mode 600. Your home directory must not be group- or world-writable either. If these basic conditions aren't met, sshd (with its default StrictModes setting) will refuse to even look at your authorized_keys file and drop you back to password logins.
Another word about the format of the authorized_keys file: it's one key per line. Make sure that each key you add is on a single line! vi is notorious for adding line breaks if you have 'tw' (textwidth) set in your ~/.vimrc and use copy and paste to add a new key to the file. Use cat or ssh-copy-id instead.
You can run
ssh -v fred@godot
to see what SSH is up to and where things start hiccupping. You can go all the way up to
ssh -vvv fred@godot
if you really want to know about how modulo groups are being prodded. Usually -vv suffices.
I just updated my openssh packages and now I can't login
If the error message is something like "Unsupported Protocol - Remote host closed the connection", it's probably due to an incompatibility between OpenSSH 4.2 and anything pre-4.2. If the server is under your control, the solution is easy: update the server to 4.2 as well (recommended anyway, as there are some nasty zlib buffer overruns in pre-4.2 versions).
FIXME: What other solutions are there?
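On the server side, running sshd in debug mode often shows exactly why a login is being refused. With -d, sshd stays in the foreground, handles a single connection and prints verbose debug output (if the regular sshd is already using port 22, add -p with a spare port number):
sshd -d -D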
pam/auth/nss
FIXME
- logging options?
- getent
LDAP
FIXME
Kerberos
When something goes wrong with Kerberos, it's usually down to a few things:
- Something in the network topology changed, mandating that you re-check your /etc/krb5.conf
- Your Kerberos server is unreachable
- You entered a wrong password while generating a keytab file, or the associated user/service name is not known to the server.
Unfortunately, while tools like kinit(1) do have a verbose option, it only starts printing useful information after they acquire a TGT from the KDC. It's more useful to watch the logs of the KDC and see what (if anything) actually happens there.
/etc/krb5.conf
This configuration file is read and used by the Kerberos libraries, so any settings here affect everything on your system that uses Kerberos. The most important setting is
[realms]
    <YOUR DEFAULT REALM> = {
        kdc = <IP of your KDC>
    }
There may be several realm definitions within the [realms] section. Be sure that you set the correct IP here. Otherwise your Kerberos requests will just hang there and time out after a while.
The second most important setting is
[libdefaults]
    default_realm = <YOUR DEFAULT REALM>
This specifies what realm Kerberos tools will use if no explicit realm is given for a request.
Finally, if you're fooling around with a KDC that resides on a Windows Server 2003 machine, be sure that you've enabled arcfour-hmac-md5 and des-cbc-crc as crypto algorithms for the settings default_tgs_enctypes, default_tkt_enctypes and permitted_enctypes in the [libdefaults] section. Otherwise your keytab files will be unreadable.
OpenSwan/IPSEC
FIXME
sendmail
FIXME
Desktop Environments
Gnome
- http://dcs.nac.uci.edu/~strombrg/Troubleshooting-a-gnome-problem-early-in-the-login.html
- http://docs.sun.com/app/docs/doc/817-1740
Links
- Linux server system tuning (http://people.redhat.com/~alikins/system_tuning.html) Similar concepts.
- glibc env variables explained (http://www.scratchbox.org/documentation/general/tutorials/glibcenv.html)
- Mac OSX debugging (http://developer.apple.com/technotes/tn2004/tn2124.html)
- Unix and Linux Troubleshooting (http://aplawrence.com/Unixart/troubleshooting.html)
- Linux Troubleshooting Tutorials - Solving Problems (http://www.tutorialized.com/tutorial/Solving-Problems/4521)
- Unix Debugging Tips at sial.org (http://sial.org/howto/debug/unix/)
- How To Be A Programmer (http://samizdat.mines.edu/howto/HowToBeAProgrammer.html) Info on debugging strategies
Credits
Comments, suggestions, hints, ideas, criticisms, pointers, and other useful info from various folks were used to create the original version of this document. Check the history for more.
- Adrian Likins
- Mihai Ibanescu
- Chip Turner
- Chris MacLeod
- Todd Warner
- Nicholas Hansen
- Sven Riedel
- Jacob Frelinger
- James Clark
- Brian Naylor
- Drew Puch
- Ted Johnson
License
This work is licensed under a Creative Commons Attribution 2.5 License (http://creativecommons.org/licenses/by/2.5/)
If folks are interested in also applying other licenses (GNU FDL, etc), let Adrian know.
How to Help
See How To Help for more info.
