Sunday, June 21, 2015

System Panics During the Boot Sequence with "Can't invoke /etc/init, error <errno>"


Symptoms

During the boot sequence an error message with the following synopsis is reported on the console.
panic[cpu<n>]/thread=<thread_id>: Can't invoke /etc/init, error <errno>
where <n> represents the CPU number, <thread_id> represents the thread running on that CPU at the time of the panic, and <errno> represents an error code.
For example,
panic[cpu0]/thread=300004f9d00: Can't invoke /etc/init, error 2
The boot sequence is then restarted and the message is reported again, resulting in a panic cycle that loops indefinitely.

Cause

Interpreting the Error Code:

In order to arrive at a resolution it is necessary to understand what the error message is stating. The error code reported may be found in the /usr/include/sys/errno.h file, delivered with the "SunOS Header Files" (SUNWhea) package, and this should give a good indication as to the nature of the problem.




Solution

Since the system will not boot, you can either view this file on another system or, if you have a Solaris CD-ROM or network Boot Server available, boot to the single-user run level and view the file as follows.
  1. Boot from the CD or boot server to single-user.
    ok boot cdrom -s

    OR
    ok boot net -s
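  2. Mount the file system that contains /usr, e.g.,
    # /usr/sbin/mount /dev/dsk/c0t0d0s0 /a

    Note : /usr may not be on the same disk partition as the root (/) file system. If a separate /usr partition is mounted directly on /a, the file is found at /a/include/sys/errno.h rather than at the path shown in the next step.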
  3. List the errno.h file contents, e.g.,
    # /usr/bin/cat /a/usr/include/sys/errno.h
Note : It is advisable, but not necessary, to boot from a CD-ROM or Boot Server which contains the same Solaris image as the one originally used to install the system, as you may need to restore the /sbin/init binary from this image.
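Rather than reading the whole header, a single code can be looked up directly. The following is a minimal sketch only: errno_lookup is a hypothetical helper name, and it assumes the header uses the "#define NAME number" layout.

```shell
# Hypothetical helper: print the #define line for one errno value.
# $1 = path to errno.h, $2 = numeric error code from the panic message.
errno_lookup() {
    # Match only lines whose third field equals the requested code.
    awk '$1 == "#define" && $3 == code { print }' code="$2" "$1"
}
```

For instance, with the file system mounted on /a as above, errno_lookup /a/usr/include/sys/errno.h 2 should print the line defining error 2.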
Example 1
As a first example, consider the following message.
panic[cpu0]/thread=300004f9d00: Can't invoke /etc/init, error 2

Examination of the error codes listed in the /usr/include/sys/errno.h file reveals that this means "No such file or directory", i.e., the file contains the following line.

#define ENOENT  2       /* No such file or directory            */

It is important to realise that /etc/init is actually a symbolic link to the /sbin/init binary file. For example, if we look at a host running Solaris 9 4/04 we see,
# /usr/bin/ls -l /etc/init
lrwxrwxrwx 1 root root 12 Apr 26 07:37 /etc/init -> ../sbin/init
You will therefore see "error 2" reported if either /etc/init or /sbin/init (or both) is missing from the system.

In order to determine the exact nature of the problem you will need to boot to the single-user run level from CD-ROM or a network Boot Server and mount the root (/) partition as follows.

  1. Boot from the CD-ROM or Boot Server to single-user.
    ok boot cdrom -s

    OR

    ok boot net -s

  2. Mount the root (/) file system, e.g.,
    # /usr/sbin/mount /dev/dsk/c0t0d0s0 /a

    Note: If your /usr file system is on a different partition to the root (/) file system and you have already mounted this on /a you will first need to unmount it with the umount(1M) command.
  3. Check whether /etc/init and /sbin/init exist.
    # /usr/bin/ls -l /a/sbin/init
    # /usr/bin/ls -l /a/etc/init
If /etc/init does not exist you can recreate the symbolic link by running:
# /usr/bin/ln -s ../sbin/init /a/etc/init
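The two checks above can be combined into one small function. This is a sketch under stated assumptions: check_init is a hypothetical name, and the root file system is assumed to be mounted at the directory passed as the first argument.

```shell
# Hypothetical helper: report which of the two paths is missing.
# $1 = mount point of the damaged root file system (e.g. /a).
check_init() {
    root=$1
    status=0
    # The real binary must exist as a regular file.
    if [ ! -f "$root/sbin/init" ]; then
        echo "missing: $root/sbin/init"
        status=1
    fi
    # /etc/init must exist as a symbolic link.
    if [ ! -h "$root/etc/init" ]; then
        echo "missing: $root/etc/init (symbolic link)"
        status=1
    fi
    return $status
}
```

Running check_init /a prints one line for each missing path and returns non-zero if anything is missing. It does not verify where the link points.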

If the /sbin/init file does not exist you can either restore it from backup or, since you are currently booted from a CD-ROM or Boot Server, obtain the file from the SUNWcsr package as follows.
    1. Copy the "none.bz2" bzip2(1) compressed file to a temporary location.
      # /usr/bin/cp /cdrom/Solaris_[0-9]*/Product/SUNWcsr/archive/none.bz2 /tmp/root/var/tmp

    2. Uncompress the bzip2 file.
      # /usr/bin/bunzip2 /tmp/root/var/tmp/none.bz2

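    3. Extract /sbin/init from the cpio archive. Change to the temporary directory first, since cpio(1) extracts relative to the current directory.
      # cd /tmp/root/var/tmp
      # /usr/bin/cat /tmp/root/var/tmp/none | /usr/bin/cpio -icd "sbin/init"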

      This will create an sbin directory below the /tmp/root/var/tmp directory which will contain the "init" binary.

  1. Examine the checksum for the extracted file.
    You can make sure that the file which you are restoring is the same as the original binary by examining the /var/sadm/install/contents file if it exists, e.g.,
    # /usr/bin/grep -w '/sbin/init' /a/var/sadm/install/contents
    /sbin/init f none 0555 root sys 912888 3681 1047665212 SUNWcsr


    In the example above the checksum is shown as "3681" and we can use the sum(1) command to check this value for the file we have obtained from the CD-ROM or Boot Server, i.e.,
    # /usr/bin/sum /tmp/root/var/tmp/sbin/init
    3681 1783 /tmp/root/var/tmp/sbin/init
    The first value shown is the one we are interested in and, in this example, it clearly matches the value in the "contents" file, so it is safe to restore the file.

    Note: Even if the checksum is different you may still be able to use the binary in order to get the system back up. However, you should endeavour to restore the correct binary as soon as possible, otherwise pkgchk(1M) will report a "cksum" error.

  2. Copy the file to the /sbin directory.
    # /usr/bin/cp /tmp/root/var/tmp/sbin/init /a/sbin
    # /usr/bin/ls -l /a/sbin/init
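The checksum lookup in the "contents" file can also be done without reading the line by eye. The sketch below uses a hypothetical helper, contents_cksum, and assumes the field layout of the sample entry shown above, where the checksum is the eighth field of a regular-file ("f") entry.

```shell
# Hypothetical helper: print the checksum recorded for a file entry.
# $1 = path to the contents file, $2 = installed path (e.g. /sbin/init).
# Assumes the eighth field of an "f" entry holds the checksum.
contents_cksum() {
    awk '$1 == path && $2 == "f" { print $8 }' path="$2" "$1"
}
```

The value printed by contents_cksum /a/var/sadm/install/contents /sbin/init can then be compared with the first field reported by sum(1) for the extracted file.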
Example 2
As a second example, let's change the permissions on the /sbin/init file on a system running Solaris 9 12/02 and see what happens.
First we make sure that the "init" symbolic link is in place in the /etc directory, i.e.,
# /usr/bin/ls -l /etc/init
lrwxrwxrwx 1 root other 12 Apr 21 10:50 /etc/init -> ../sbin/init
The current permissions on the /sbin/init file may be viewed using the ls(1) command, i.e.,
# /usr/bin/ls -l /sbin/init
-r-xr-xr-x 1 root sys 913676 Jun 19 2002 /sbin/init

So, let's change the permissions using the chmod(1) command so that this binary is no longer executable, e.g.,
# chmod 0444 /sbin/init
# /usr/bin/ls -l /sbin/init
-r--r--r-- 1 root sys 913676 Jun 19 2002 /sbin/init
If the system is now rebooted the following error is reported.
panic[cpu0]/thread=300004f9d00: Can't invoke /etc/init, error 13

Looking at the /usr/include/sys/errno.h file we see that this means "Permission denied", i.e.,
#define EACCES  13       /* Permission denied                    */

In order to resolve this you would need to boot to the single-user run level from CD-ROM or a network Boot Server, mount the root (/) file system and then correct the permissions using the chmod(1) command.
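As a sketch of that correction, the function below restores the mode seen in the original listing (-r-xr-xr-x, i.e. 0555). restore_init_mode is a hypothetical name, and the root file system is assumed to be mounted at the directory given as the first argument.

```shell
# Hypothetical helper: restore the mode shown in the earlier listing.
# $1 = mount point of the root file system (e.g. /a).
restore_init_mode() {
    # 0555 = read and execute for owner, group and other.
    chmod 0555 "$1/sbin/init" && ls -l "$1/sbin/init"
}
```

After booting from CD-ROM or the Boot Server and mounting root on /a, restore_init_mode /a changes the mode and lists the file so the result can be confirmed.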

Example 3
As a final example let's see what happens if we turn /sbin/init into an ASCII file, e.g.,
# /usr/bin/echo "this is not binary" >/sbin/init

When we try to reboot the system the following error is reported.
panic[cpu0]/thread=30000523d00: Can't invoke /etc/init, error 8

The /usr/include/sys/errno.h file shows that this means "Exec format error", i.e.,
#define ENOEXEC 8       /* Exec format error                    */

In order to resolve this we would need to restore the /sbin/init binary using one of the methods discussed in Example 1 above.
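Before restoring, it can be useful to confirm that the file really is not an executable. The file(1) command is the usual tool for this. The sketch below instead checks directly for the "ELF" signature that follows the first byte of an ELF binary. is_elf is a hypothetical helper, and it deliberately skips the leading 0x7f byte, so it is only a rough check.

```shell
# Hypothetical helper: succeed if bytes 2-4 of the file spell "ELF".
# $1 = file to test (e.g. /a/sbin/init).
is_elf() {
    # Skip the first byte (0x7f in a real ELF file) and read the next three.
    magic=`dd if="$1" bs=1 skip=1 count=3 2>/dev/null`
    [ "$magic" = "ELF" ]
}
```

For the file created in this example, is_elf /a/sbin/init fails, because the bytes read are "his" rather than "ELF".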
