How to debug core dumps and collect snapcore and errpt logs in AIX?

AIX provides a rich set of error-logging and debugging tools as part of its RAS (Reliability, Availability and Serviceability) features. The goal is for the operating system to withstand errors and stay available when something goes wrong.

AIX offers several ways to check system errors, debug failures and collect logs. Usage details follow.

Syslog 

Syslog is a log collection daemon present in Linux and Unix flavors. The syslogd daemon receives messages at priorities such as debug, info, notice and crit, and writes them to the destinations listed in /etc/syslog.conf, which the root user maintains.

Add a new log file, /var/log/debuglog1.log, to the syslogd configuration to collect debug messages and rotate the file once a day:

touch /var/log/debuglog1.log
echo "*.debug /var/log/debuglog1.log rotate time 1d" >> /etc/syslog.conf
stopsrc -s syslogd
startsrc -s syslogd
# tail -f /var/log/debuglog1.log
Jan 18 04:21:14 partition1 syslog:info syslogd: restart
Jan 18 04:21:14 partition1 daemon:info src[11010414]: The syslogd subsystem was requested to STARTED by user root

Similarly, *.info, *.notice and *.crit messages can be redirected to any file the root user chooses. Developers can also log from their own programs through the syslog API; see the syslog manual pages for details.
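Appending with >> adds a duplicate line every time the command is rerun. A minimal sketch of an idempotent alternative is given here. add_syslog_rule is a hypothetical helper name, not an AIX command, and the touch step assumes syslogd wants the destination file to exist before it logs to it.

```shell
#!/bin/sh
# add_syslog_rule (hypothetical helper): append "selector file options" to a
# syslog configuration file unless that selector/file pair is already there.
# Usage: add_syslog_rule SELECTOR LOGFILE [OPTIONS] [CONF]
add_syslog_rule() {
    asr_selector=$1                      # e.g. *.debug
    asr_logfile=$2                       # e.g. /var/log/debuglog1.log
    asr_options=$3                       # e.g. "rotate time 1d", may be empty
    asr_conf=${4:-/etc/syslog.conf}      # default path taken from the text

    # Assumption: create the destination up front so syslogd can write to it.
    [ -f "$asr_logfile" ] || touch "$asr_logfile"

    # Look for a line whose first two fields match the selector and the file.
    if awk -v s="$asr_selector" -v f="$asr_logfile" \
        '$1 == s && $2 == f { found = 1 } END { exit found ? 0 : 1 }' "$asr_conf"
    then
        echo "rule already present in $asr_conf"
    else
        printf '%s %s %s\n' "$asr_selector" "$asr_logfile" "$asr_options" >> "$asr_conf"
        echo "rule added to $asr_conf"
    fi
}
```

For example, run add_syslog_rule '*.debug' /var/log/debuglog1.log 'rotate time 1d' as root and then restart syslogd with stopsrc and startsrc. The selector is quoted so the shell does not expand the asterisk.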

Errpt 

The error reporting tool, errpt, produces processed reports of failures logged by the operating system. Below is an example of errpt output for an AIX shutdown event.

# errpt | grep -i shutdown
69350832   0118040420 T S SYSPROC        SYSTEM SHUTDOWN BY USER
# errpt -aj 69350832
---------------------------------------------------------------------------
LABEL:          REBOOT_ID
IDENTIFIER:     69350832

Date/Time:       Mon Jan 18 04:04:30 2020
Sequence Number: 6419
Machine Id:      00C3FEE84C00
Node Id:         partition1
Class:           S
Type:            TEMP
WPAR:            Global
Resource Name:   SYSPROC

Description
SYSTEM SHUTDOWN BY USER

Probable Causes
SYSTEM SHUTDOWN

Detail Data
USER ID
           0
0=SOFT IPL 1=HALT 2=TIME REBOOT
           0
TIME TO REBOOT (FOR TIMED REBOOT ONLY)
           0
PROCESS ID
              11862344
PARENT PROCESS ID
              11207126
PROGRAM NAME
reboot
PARENT PROGRAM NAME
ksh
#
# errclear 0 
# errpt

errclear 0 deletes all entries from the error log.
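The second column of the errpt summary packs the timestamp as MMDDhhmmYY, which is easy to misread. The sketch below unpacks it; decode_errpt_ts is a hypothetical helper name, and it assumes every entry falls in the years 2000 to 2099.

```shell
#!/bin/sh
# decode_errpt_ts (hypothetical helper): turn an errpt summary timestamp
# (MMDDhhmmYY) into "20YY-MM-DD hh:mm".
decode_errpt_ts() {
    det_ts=$1
    # Accept exactly ten digits, reject anything else.
    case $det_ts in
        [0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]) ;;
        *) echo "invalid errpt timestamp: $det_ts" >&2; return 1 ;;
    esac
    det_mm=$(printf '%s\n' "$det_ts" | cut -c1-2)    # month
    det_dd=$(printf '%s\n' "$det_ts" | cut -c3-4)    # day
    det_hh=$(printf '%s\n' "$det_ts" | cut -c5-6)    # hour
    det_mi=$(printf '%s\n' "$det_ts" | cut -c7-8)    # minute
    det_yy=$(printf '%s\n' "$det_ts" | cut -c9-10)   # two-digit year
    # Assumption: the century is always 20xx.
    echo "20${det_yy}-${det_mm}-${det_dd} ${det_hh}:${det_mi}"
}
```

With the shutdown entry above, decode_errpt_ts 0118040420 prints 2020-01-18 04:04, which agrees with the Date/Time field of the detailed report.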


Cores


On a VIOS partition, core files may be in the home directory or under /home/ios/logs.

List the core files, if any, by executing
ls -l /home/ios/logs/*core*
core.12345678.12345678

Get the name of the binary that dumped the core (ls in this example)
# cd /home/ios/logs/
# file core.* 
ls
# which ls
/usr/bin/ls


Open the core file in dbx, then use the function backtrace and the register contents at the time of the dump to see where the failure occurred
# dbx `which ls` core.12345678.12345678
(dbx) where
(dbx) registers
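When the backtrace has to be attached to a support ticket, the same subcommands can be run without an interactive session. This is a sketch only: dbx_batch_cmds and core_backtrace are hypothetical helper names, and it relies on the dbx -c CommandFile flag, which runs the subcommands in the file before reading standard input.

```shell
#!/bin/sh
# dbx_batch_cmds (hypothetical helper): print the subcommands to run,
# ending with quit so that dbx exits instead of waiting for input.
dbx_batch_cmds() {
    printf '%s\n' where registers quit
}

# core_backtrace (hypothetical helper): run the batch against a core file.
# Usage: core_backtrace PROGRAM COREFILE
core_backtrace() {
    cb_prog=$1
    cb_core=$2
    cb_cmds=${TMPDIR:-/tmp}/dbx_cmds.$$
    dbx_batch_cmds > "$cb_cmds"
    if command -v dbx >/dev/null 2>&1; then
        dbx -c "$cb_cmds" "$cb_prog" "$cb_core"
    else
        # Not on AIX, or dbx is not installed: only show what would be run.
        echo "dbx not found; would run: dbx -c $cb_cmds $cb_prog $cb_core"
    fi
    rm -f "$cb_cmds"
}
```

For the example core this would be core_backtrace /usr/bin/ls /home/ios/logs/core.12345678.12345678 > /tmp/ls_core.txt, where the output file name is an arbitrary choice.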


Collect the core file with snapcore; the archive is stored under /tmp/snapcore
# snapcore /home/ios/logs/core.12345678.12345678 /usr/bin/ls
Creating directory /tmp/snapcore ...

Core file "/home/ios/logs/core.12345678.12345678" created by "/usr/bin/ls"

pass1() in progress ....

                Calculating space required .

                Total space required is 598715 kbytes ..

                Checking for available space ...

                Available space is 4977380 kbytes

pass1 complete.

pass2() in progress ....

                Collecting fileset information .

                Collecting error report of CORE_DUMP errors ..

                Creating readme file ..

                Creating archive file ...

                Compressing archive file ....

pass2 completed.

Snapcore completed successfully. Archive created in /tmp/snapcore.

Show information about the most recent system dump
sysdumpdev -L

KDB


The Kernel Debugger (KDB) is a facility for diagnosing major failures and hangs in the AIX operating system. When KDB is enabled and such a failure occurs, the system drops into Kernel Debugger mode, where the user can debug the root cause of the failure.

Below are the ways to enable and disable KDB. Each bosboot command rebuilds the boot image, so the change takes effect at the reboot that follows.

Disable KDB
bosboot -a -d /dev/device123; reboot

Enable KDB and load the low-level debugger
bosboot -a -d /dev/device123 -D; reboot

Enable KDB, then load and invoke the low-level debugger
bosboot -a -d /dev/device123 -I; reboot


Once the AIX operating system is in KDB mode, the following commands can be used to gather data

System Configuration
stat

CPU Status
status

Most recent VMM errorlog entry
vmlog

Machine State Save Area
mst 

Function back-trace     
f

VMM Statistics
vmstat

Thread
th

Virtual I/O Ethernet Interface
vioent
vioent setacs ent0
vioent phypbuflist

SCSI disk list
scsidisk

Virtual Fibre Channel adapter statistics
vfcs
vfcs fcs0

Registers
dr
dr iar

Error Record
errpt
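Several of these subcommands are also commonly run on a live system by piping them into the kdb command as root. The sketch below assumes that the subcommands behave the same way there and that kdb exits when its input ends. kdb_input and kdb_snapshot are hypothetical helper names.

```shell
#!/bin/sh
# kdb_input (hypothetical helper): emit one subcommand per line.
kdb_input() {
    printf '%s\n' "$@"
}

# kdb_snapshot (hypothetical helper): feed the subcommands to kdb.
# Usage: kdb_snapshot SUBCOMMAND...
kdb_snapshot() {
    if command -v kdb >/dev/null 2>&1; then
        # Assumption: kdb reads subcommands from stdin and exits at end of input.
        kdb_input "$@" | kdb
    else
        # Not on AIX: only show what would be sent.
        echo "kdb not found; would send:"
        kdb_input "$@"
    fi
}
```

A call such as kdb_snapshot stat status 'vfcs fcs0' > /tmp/kdb_snapshot.txt would capture three of the reports in one file; the output path is an arbitrary choice.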


Dialing for support

For general information such as how-to questions and advice
1-800-IBM-4YOU (1-800-426-4968)
1-800-IBM-CALL (1-800-426-2255)

For software support (hangs, loops, operating system problems)
1-800-IBM-SERV (1-800-426-7378)

Before calling, run the commands below and keep the output handy
# lsconf | grep -i "Machine Serial Number"
# lsconf | grep -i "System Model"
# oslevel -s
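The three values can also be gathered in one pass. This sketch assumes lsconf prints each item as a "Label: value" line. lsconf_field and support_summary are hypothetical helper names.

```shell
#!/bin/sh
# lsconf_field (hypothetical helper): print the value of one "Label: value"
# line read from standard input.
# Usage: lsconf | lsconf_field 'System Model'
lsconf_field() {
    awk -F': *' -v key="$1" '$1 == key { print $2; exit }'
}

# support_summary (hypothetical helper): print the serial number, the model
# and the OS level in one block.
support_summary() {
    if command -v lsconf >/dev/null 2>&1 && command -v oslevel >/dev/null 2>&1; then
        ss_conf=$(lsconf)    # run lsconf once and reuse its output
        echo "Serial: $(printf '%s\n' "$ss_conf" | lsconf_field 'Machine Serial Number')"
        echo "Model:  $(printf '%s\n' "$ss_conf" | lsconf_field 'System Model')"
        echo "Level:  $(oslevel -s)"
    else
        echo "lsconf or oslevel not found; run this on the AIX partition" >&2
        return 1
    fi
}
```

Running support_summary > /tmp/support_info.txt before the call keeps the details in one place; the file name is an arbitrary choice.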