Friday, January 10, 2014

Solaris 9 server rebooting again and again (bootblk missing)

After a routine reboot, the server went into a reboot loop.
The boot block (bootblk) was missing on the system. The system had 2 mirrored disks under metadevices.
The missing bootblk caused the system to reboot again and again.
The error was as below:
SunOS Release 5.9 Version Generic_118558-28 64-bit
Copyright 1983-2003 Sun Microsystems, Inc.  All rights reserved.
Use is subject to license terms.
Cannot mount root on /pseudo/md@0:0,0,blk fstype ufs
panic[cpu0]/thread=140a000: vfs_mountroot: cannot mount root
0000000001409970 genunix:vfs_mountroot+70 (0, 0, 0, 200, 14a8270, 0)
%l0-3: 000000000149bc00 000000000149bc00 0000000000002000 00000000014e5b00
%l4-7: 00000000014eb800 0000000001422568 000000000149c400 000000000149f400
0000000001409a20 genunix:main+90 (1409ba0, f000d348, 1409ec0, 399604, 2000, 500)
%l0-3: 0000000000000001 000000000140a000 0000000001423728 0000000000000000
%l4-7: 0000000078002000 000000000039c000 00000000014fdb18 000000000106a128
skipping system dump – no dump device configured
rebooting…
Resetting …
Rebooting with command: boot rootmirror
Boot path: /ssm@0,0/pci@18,600000/scsi@2/disk@1,0:a  Boot args:
Loading ufs-file-system package 1.4 04 Aug 1995 13:02:54.
FCode UFS Reader 1.12 00/07/17 15:48:16.
Loading: /platform/SUNW,Netra-T12/ufsboot
Loading: /platform/sun4u/ufsboot
SunOS Release 5.9 Version Generic_118558-28 64-bit
Copyright 1983-2003 Sun Microsystems, Inc.  All rights reserved.
Use is subject to license terms.
Cannot mount root on /pseudo/md@0:0,0,blk fstype ufs
panic[cpu0]/thread=140a000: vfs_mountroot: cannot mount root
0000000001409970 genunix:vfs_mountroot+70 (0, 0, 0, 200, 14a8270, 0)
%l0-3: 000000000149bc00 000000000149bc00 0000000000002000 00000000014e5b00
%l4-7: 00000000014eb800 0000000001422568 000000000149c400 000000000149f400
0000000001409a20 genunix:main+90 (1409ba0, f000d348, 1409ec0, 399604, 2000, 500)
%l0-3: 0000000000000001 000000000140a000 0000000001423728 0000000000000000
%l4-7: 0000000078002000 000000000039c000 00000000014fdb18 000000000106a128
skipping system dump – no dump device configured
rebooting…
Resetting …
--> Tried to boot into single-user (maintenance) mode, but no luck: the same problem repeated.
init S
We got the error below while trying single-user mode:
WARNING: /ssm@0,0/pci@18,600000/scsi@2/sd@0,0 (sd30):
Error for Command: read(10)                Error Level: Retryable
Requested Block: 286637632                 Error Block: 286637632
Vendor: SEAGATE                            Serial Number: 0534424F0E
Sense Key: Media Error
ASC: 0x11 (unrecovered read error), ASCQ: 0x0, FRU: 0xf
WARNING: /ssm@0,0/pci@18,600000/scsi@2/sd@0,0
--> Tried an interactive boot. Luckily it booted, but the problem could not be fixed because the file system was mounted read-only :(
OK> boot -a
root@server1 # mount -f ufs /dev/dsk/c0t1d0s0 /mnt
root@server1 # cd /mnt
root@server1 # ls
bin             etc             mnt             sbin            vol
cdrom           export          net             sysadm          xfn
configcron.bak  home            nsr             tmp
depl_admin      kernel          opt             users
dev             lib             platform        usr
devices         lost+found      proc            var
We had not been able to see any metadb earlier, but after the above commands we could see it:
root@server1 # metadb -i
flags           first blk       block count
a        u         16              8192            /dev/dsk/c0t1d0s7
a m  p  luo        16              8192            /dev/dsk/c0t0d0s7
a        u         8208            8192            /dev/dsk/c0t1d0s7
a        u         16400           8192            /dev/dsk/c0t1d0s7
But the same problem came back (the server went into the reboot loop again):
root@server1 # reboot
syncing file systems… done
rebooting…
Resetting …
obp-tftp
Sun Fire V1280
OpenFirmware version 5.20.9 (02/26/08 13:13)
Copyright 2008 Sun Microsystems, Inc.  All rights reserved.
Use is subject to license terms.
SmartFirmware, Copyright (C) 1996-2001.  All rights reserved.
32768 MB memory installed, Serial #*******.
Ethernet address *******, Host ID: *******.
Rebooting with command: boot
Boot path: /ssm@0,0/pci@18,600000/scsi@2/disk@0,0:a  Boot args:
Loading ufs-file-system package 1.4 04 Aug 1995 13:02:54.
FCode UFS Reader 1.12 00/07/17 15:48:16.
Loading: /platform/SUNW,Netra-T12/ufsboot
Loading: /platform/sun4u/ufsboot
SunOS Release 5.9 Version Generic_118558-28 64-bit
Copyright 1983-2003 Sun Microsystems, Inc.  All rights reserved.
Use is subject to license terms.
Cannot mount root on /pseudo/md@0:0,0,blk fstype ufs
panic[cpu0]/thread=140a000: vfs_mountroot: cannot mount root
0000000001409970 genunix:vfs_mountroot+70 (0, 0, 0, 200, 14a8270, 0)
%l0-3: 000000000149bc00 000000000149bc00 0000000000002000 00000000014e5b00
%l4-7: 00000000014eb800 0000000001422568 000000000149c400 000000000149f400
0000000001409a20 genunix:main+90 (1409ba0, f000d348, 1409ec0, 399604, 2000, 500)
%l0-3: 0000000000000001 000000000140a000 0000000001423728 0000000000000000
%l4-7: 0000000078002000 000000000039c000 00000000014fdb18 000000000106a128
skipping system dump – no dump device configured
rebooting…
Resetting …
The server had 2 disks, as below:
AVAILABLE DISK SELECTIONS:
0. c1t0d0 <SUN146G cyl 14087 alt 2 hd 24 sec 848>
/ssm@0,0/pci@18,600000/scsi@2/sd@0,0
1. c1t1d0 <SUN146G cyl 14087 alt 2 hd 24 sec 848>
/ssm@0,0/pci@18,600000/scsi@2/sd@1,0
Specify disk (enter its number): Specify disk (enter its number):
Solution:
We had to boot the system from CD-ROM. Unluckily we could not find a Solaris 9 DVD to boot from, because the data center was in Europe
and the onsite person had only a Solaris 10 U9 DVD. We also tried the network boot options, but no luck.
So let's try to recover the Solaris 9 OS with the Solaris 10 DVD.
{0} ok boot cdrom -s
Boot path: /ssm@0,0/pci@18,700000/ide@3/cdrom@0,0:f  Boot args: -s
hsfs-file-system
Loading: /platform/sun4u/boot_archive
ramdisk-root ufs-file-system
Loading: /platform/SUNW,Netra-T12/kernel/sparcv9/unix
SunOS Release 5.10 Version Generic_142909-17 64-bit
Copyright (c) 1983, 2010, Oracle and/or its affiliates. All rights reserved.
os-io Hardware watchdog enabled
Booting to milestone “milestone/single-user:default”.
Configuring devices.
Unretrieved lom log history follows …
12/13/11 2:12:09 AM Domain Reboot A: Initiating keyswitch: on, domain A.
Using RPC Bootparams for network configuration information.
Attempting to configure interface ce5…
Skipped interface ce5
Attempting to configure interface ce4…
Skipped interface ce4……………………………………………………
output truncated…
SINGLE USER MODE
We checked which update of Solaris 10 the DVD carried, at the following location:
/cdrom/Solaris_10/Product/SUNWsolnm
# cat release
Oracle Solaris 10 9/10 s10s_u9wos_14a SPARC
Copyright (c) 2010, Oracle and/or its affiliates. All rights reserved.
Assembled 11 August 2010
This was a much higher version than we wanted for recovering the boot block on the server; we needed something below U6.
Having no choice, we stuck with this version, tried the recovery, and in the end it worked.
Step 1: Tried to mount the root file system but got the errors below:
# mount /dev/dsk/c0t0d0s0 /a
mount: /dev/dsk/c0t0d0s0 write-protected
# mount -o ro /dev/dsk/c0t0d0s0 /a
NOTICE: mount: not a UFS magic number (0x0)
mount: /dev/dsk/c0t0d0s0 is not this fstype
# mount -o ro /dev/dsk/c0t1d0s0 /a
mount: /dev/dsk/c0t1d0s0 or /a, no such file or directory
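Then we checked the available disks and found the scenario had changed: booted from the Solaris 10 DVD, the disks appeared as c1t0d0 and c1t1d0, not c0t0d0 and c0t1d0: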
AVAILABLE DISK SELECTIONS:
0. c1t0d0 <SUN146G cyl 14087 alt 2 hd 24 sec 848>
/ssm@0,0/pci@18,600000/scsi@2/sd@0,0
1. c1t1d0 <SUN146G cyl 14087 alt 2 hd 24 sec 848>
/ssm@0,0/pci@18,600000/scsi@2/sd@1,0
Step 2: Mounted the root slice using the new device path:
# mount -o ro /dev/dsk/c1t0d0s0 /a
# df -k
Filesystem            kbytes    used   avail capacity  Mounted on
/ramdisk-root:a       201463  178943    2374    99%    /
/devices                   0       0       0     0%    /devices
ctfs                       0       0       0     0%    /system/contract
proc                       0       0       0     0%    /proc
mnttab                     0       0       0     0%    /etc/mnttab
swap                 32244232     344 32243888     1%    /etc/svc/volatile
objfs                      0       0       0     0%    /system/object
sharefs                    0       0       0     0%    /etc/dfs/sharetab
swap                 32244480     592 32243888     1%    /tmp
/tmp/dev             32244480     592 32243888     1%    /dev
fd                         0       0       0     0%    /dev/fd
/devices/ssm@0,0/pci@18,700000/ide@3/sd@0,0:f
2224620 2224620       0   100%    /cdrom
df: cannot statvfs /platform/sun4u-us3/lib/libc_psr.so.1: Operation not applicable
df: cannot statvfs /platform/sun4u-us3/lib/sparcv9/libc_psr.so.1: Operation not applicable
swap                 32243896       8 32243888     1%    /tmp/root/var/run
/dev/dsk/c1t0d0s0    8268461 2305007 5880770    29%    /a
Step 3: Remounted /a read-write to make the necessary changes in vfstab:
# mount -o remount,rw /dev/dsk/c1t0d0s0 /a
Step 4: Copied the existing metadevice configuration file (md.conf) from the disk into the driver directory of the running, DVD-booted system:
# cp /a/kernel/drv/md.conf  /kernel/drv/
Step 5: Forced the md driver to pick up the copied configuration:
# update_drv -f md
devfsadm: mkdir failed for /dev 0x1ed: Read-only file system
(The devfsadm error above can be ignored; /dev on the DVD-booted system is read-only.)
Step 6a: Noted the current boot path:
root@server1 # prtconf -vp|grep bootpath
bootpath:  '/ssm@0,0/pci@18,600000/scsi@2/disk@0,0:a'
Step 6b: Commented out all MD entries in /a/etc/vfstab and added new entries with the original device paths for mounting / and /var, as per the previous metastat output:
#device         device          mount           FS      fsck    mount   mount
#to mount       to fsck         point           type    pass    at boot options
#
fd      -       /dev/fd fd      -       no      -
/proc   -       /proc   proc    -       no      -
#/dev/md/dsk/d1 -       -       swap    -       no      -
#/dev/md/dsk/d0 /dev/md/rdsk/d0 /       ufs     1       no      logging
#/dev/md/dsk/d3 /dev/md/rdsk/d3 /var    ufs     1       no      logging
#/dev/md/dsk/d5 /dev/md/rdsk/d5 /export/home    ufs     2       yes     logging
#/dev/md/dsk/d4 /dev/md/rdsk/d4 /var/crash      ufs     2       yes     logging
swap    -       /tmp    tmpfs   -       yes     -
# Following metadisks are soft partitions created from d100 mirror (slice 6)
#/dev/md/dsk/d103        /dev/md/rdsk/d103       /opt/xacct      ufs     2       yes     logging
#/dev/md/dsk/d104        /dev/md/rdsk/d104       /var/XX ufs     2       yes     logging
#/dev/md/dsk/d105        /dev/md/rdsk/d105       /var/X   ufs     2       yes     logging
#/dev/md/dsk/d106        /dev/md/rdsk/d106       /opt/XXX     ufs     2       yes     logging
#/dev/md/dsk/d107        /dev/md/rdsk/d107       /var/XXXX     ufs     2       yes     logging
/dev/dsk/c0t0d0s0 /dev/rdsk/c0t0d0s0 / ufs 1 no logging
/dev/dsk/c0t0d0s3 /dev/rdsk/c0t0d0s3 /var ufs 1 no logging
“/a/etc/vfstab” 20 lines, 1120 characters
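The vfstab edit above was done by hand in vi. As a rough sketch only, the same commenting-out could be scripted with a small filter. The function name comment_md_entries and the backup file name are hypothetical, not something used in the original recovery:

```shell
#!/bin/sh
# Hypothetical helper (not part of the original procedure): read a vfstab
# on stdin and write it to stdout with every active line whose first field
# is an SVM metadevice (/dev/md/...) commented out.
comment_md_entries() {
    awk '
        /^#/                { print; next }          # already a comment: keep as-is
        $1 ~ /^\/dev\/md\// { print "#" $0; next }   # active md entry: disable it
                            { print }                # everything else unchanged
    '
}

# Illustrative usage (keep a backup first):
#   cp /a/etc/vfstab /a/etc/vfstab.orig
#   comment_md_entries < /a/etc/vfstab.orig > /a/etc/vfstab
```

The plain-slice entries for / and /var still have to be added by hand afterwards, because only the administrator knows which slice backed which metadevice.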
Step 7: Removed all metadevices:
metaclear -r <metadevice>   (-r recursively clears the named metadevice and the metadevices underneath it; repeated until all were gone)
We hit an error while removing metadevice d0 because the dump device was configured on it, so we first removed the dump device and then cleared the metadevice.
Step 8: Removed all metadatabases (state database replicas) and the rootdev entry from /etc/system:
# metadb -f -d  /dev/dsk/c0t1d0s7
# metadb -f -d  /dev/dsk/c0t0d0s7
# metadb -i
Before the deletion, the replicas looked like this:
# metadb -i
flags           first blk       block count
a m  p  luo        16              8192            /dev/dsk/c0t1d0s7
a    p  luo        16              8192            /dev/dsk/c0t0d0s7
a    p  luo        8208            8192            /dev/dsk/c0t1d0s7
a       luo        16400           8192            /dev/dsk/c0t1d0s7
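The post does not show how the rootdev entry was taken out of /etc/system. One possible way, sketched here under assumptions, is to comment the line out rather than delete it. Comment lines in /etc/system start with an asterisk. The function name comment_rootdev and the backup file name are hypothetical, and /a/etc/system assumes the on-disk root is mounted at /a as in the earlier steps:

```shell
#!/bin/sh
# Hypothetical helper (not from the original post): read an /etc/system
# file on stdin and write it to stdout with any "rootdev:" line turned
# into a comment. /etc/system uses "*" as its comment character.
comment_rootdev() {
    awk '
        /^rootdev:/ { print "* " $0; next }   # disable the metadevice root
                    { print }                 # leave every other line alone
    '
}

# Illustrative usage:
#   cp /a/etc/system /a/etc/system.orig
#   comment_rootdev < /a/etc/system.orig > /a/etc/system
```

This sketch leaves the forceload lines for the md modules in place.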
Step 9: Installed the bootblk from the copy on the disk itself (not from the CD-ROM, as that was Solaris 10 U9):
# /a/installboot /usr/platform/`uname -i`/lib/fs/ufs/bootblk /dev/dsk/c0t0d0s0
dd: /dev/rdsk/c0t0d0s0: open: Read-only file system
* This error occurred because under the DVD boot the disk shows up as c1t0d0s0, so we corrected the command for the current device name and took both installboot and bootblk from the Solaris 9 tree under /a:
# /a/usr/sbin/installboot /a/usr/platform/`uname -i`/lib/fs/ufs/bootblk /dev/rdsk/c1t0d0s0
# echo $?
0
It worked.. :)
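Since the device name is easy to get wrong when booted from other media, here is a small dry-run sketch that only prints the installboot command line so it can be checked before running. The function name installboot_cmd is hypothetical, and all three arguments are values the operator has to supply. It assumes installboot is given the raw (/dev/rdsk) device, as the dd error above suggests:

```shell
#!/bin/sh
# Hypothetical dry-run helper: print (do not run) the installboot command
# for a root slice whose file system is mounted at an alternate root.
installboot_cmd() {
    altroot=$1     # where the on-disk root is mounted, e.g. /a
    platform=$2    # output of `uname -i`, e.g. SUNW,Netra-T12
    slice=$3       # slice name as the running OS sees it, e.g. c1t0d0s0
    printf '%s/usr/sbin/installboot %s/usr/platform/%s/lib/fs/ufs/bootblk /dev/rdsk/%s\n' \
        "$altroot" "$altroot" "$platform" "$slice"
}

# Illustrative usage:
#   installboot_cmd /a "`uname -i`" c1t0d0s0
```

Taking both the installboot script and the bootblk file from under the alternate root keeps the boot block matched to the installed Solaris 9 rather than to the rescue media.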
Step 10: Unmounted /a and ran fsck on the root file system (twice, until it came back clean):
# umount /a
# fsck /dev/rdsk/c1t0d0s0
** /dev/rdsk/c1t0d0s0
** Last Mounted on /a
** Phase 1 – Check Blocks and Sizes
** Phase 2 – Check Pathnames
** Phase 3a – Check Connectivity
** Phase 3b – Verify Shadows/ACLs
** Phase 4 – Check Reference Counts
** Phase 5 – Check Cylinder Groups
CORRECT BAD CG SUMMARIES FOR CG 0? yes
CORRECTED SUPERBLOCK SUMMARIES FOR CG 0
CORRECTED SUMMARIES FOR CG 0
FRAG BITMAP WRONG
FIX? y
CG 164: IMPOSSIBLE NUMBER OF CYLINDERS IN GROUP (0 is less than 1)
REPAIR? y
84447 files, 2296800 used, 5955245 free (92141 frags, 732888 blocks, 1.1% fragme
***** FILE SYSTEM WAS MODIFIED *****
#  fsck /dev/rdsk/c1t0d0s0
** /dev/rdsk/c1t0d0s0
** Last Mounted on /a
** Phase 1 – Check Blocks and Sizes
** Phase 2 – Check Pathnames
** Phase 3a – Check Connectivity
** Phase 3b – Verify Shadows/ACLs
** Phase 4 – Check Reference Counts
** Phase 5 – Check Cylinder Groups
84447 files, 2296800 used, 5955245 free (92141 frags, 732888 blocks, 1.1% fragme
# echo |format
Searching for disks…done
AVAILABLE DISK SELECTIONS:
0. c1t0d0 <SUN146G cyl 14087 alt 2 hd 24 sec 848>
/ssm@0,0/pci@18,600000/scsi@2/sd@0,0
1. c1t1d0 <SUN146G cyl 14087 alt 2 hd 24 sec 848>
/ssm@0,0/pci@18,600000/scsi@2/sd@1,0
Specify disk (enter its number): Specify disk (enter its number):
# init 0
# syncing file systems… done
NOTICE: f_client_exit: Program terminated!
debugger entered.
Step 11: The system now came up on the plain root slice configured in /etc/vfstab:
{b} ok boot rootdisk
Resetting …
Copying IO PROM to CPU DRAM
.{/N0/SB0/P0} @(#) lpost        5.20.9  2008/02/26 13:14
{/N0/SB0/P0} Copyright 2008 Sun Microsystems, Inc.  All rights reserved.
{/N0/SB0/P0} Use is subject to license terms.
{/N0/SB0/P1} @(#) lpost         5.20.9  2008/02/26 13:14
.{/N0/SB0/P1} Copyright 2008 Sun Microsystems, Inc.  All rights reserved.
.{/N0/SB0/P1} Use is subject to license terms.
…………………………………………..
output truncated
obp-tftp
Sun Fire V1280
OpenFirmware version 5.20.9 (02/26/08 13:13)
Copyright 2008 Sun Microsystems, Inc.  All rights reserved.
Use is subject to license terms.
SmartFirmware, Copyright (C) 1996-2001.  All rights reserved.
32768 MB memory installed, Serial #######
Ethernet address######, Host ID: ######.
Rebooting with command: boot rootdisk
Boot path: /ssm@0,0/pci@18,600000/scsi@2/disk@0,0:a  Boot args:
Loading ufs-file-system package 1.4 04 Aug 1995 13:02:54.
FCode UFS Reader 1.12 00/07/17 15:48:16.
Loading: /platform/SUNW,Netra-T12/ufsboot
Loading: /platform/sun4u/ufsboot
SunOS Release 5.9 Version Generic_118558-28 64-bit
Copyright 1983-2003 Sun Microsystems, Inc.  All rights reserved.
Use is subject to license terms.
WARNING: forceload of misc/md_trans failed
WARNING: forceload of misc/md_raid failed
WARNING: forceload of misc/md_hotspares failed
WARNING: forceload of misc/md_sp failed
WARNING: forceload of misc/md_stripe failed
WARNING: forceload of misc/md_mirror failed
Hardware watchdog enabled
configuring IPv4 interfaces: ce1 ce3.
Hostname: server1
The system is coming up.  Please wait.
………………………………………….
output truncated….
console login: root
Password:
Step 12: Now it was time to recreate the original metadevices and restore the server to its original state.
First we created the root metadevice; since /var was part of root, we did not create a separate metadevice for /var.
metainit -f d10 1 1 c0t0d0s0    # creates metadevice d10
metaroot d10             # updates /etc/system and /etc/vfstab for the metadevice root
Create the metadatabases as well (the state database replicas must exist before metainit can run):
metadb -f -a -c 3 c0t0d0s7 c0t1d0s7
Step 13: Flush the file systems and halt the server:
lockfs -fa
sync
init 0
The server now comes up with root mounted on the metadevice.
Step 14: Attach the second submirror (d20) to the root mirror d0:
metattach d0 d20
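Step 15: Now it's time to create all the other metadevices and mount them.
metainit d41 1 1 c0t0d0s4    # and likewise for each of the other slices
metattach <mirror> <submirror>    # attach the second submirror to each mirror
All the metadevices will then resync.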
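Step 15 only hints at the per-slice sequence. As a hedged sketch, the usual SVM order for mirroring one slice pair can be printed by a small dry-run function. The function name mirror_plan is hypothetical, and so are the names d4, d42 and c0t1d0s4 in the usage line; only d41 and c0t0d0s4 come from the step above:

```shell
#!/bin/sh
# Hypothetical dry-run helper: print the usual SVM command sequence for
# mirroring one pair of slices. Nothing is executed.
mirror_plan() {
    mirror=$1; sub1=$2; slice1=$3; sub2=$4; slice2=$5
    echo "metainit -f $sub1 1 1 $slice1"   # -f: the slice may hold a mounted file system
    echo "metainit $sub2 1 1 $slice2"      # second submirror on the other disk
    echo "metainit $mirror -m $sub1"       # one-way mirror on top of the live data
    echo "metattach $mirror $sub2"         # attach the second half; this starts the resync
}

# Illustrative usage (d4, d42 and c0t1d0s4 are assumed names):
#   mirror_plan d4 d41 c0t0d0s4 d42 c0t1d0s4
```

For a file system other than root, /etc/vfstab would normally be switched to the mirror device and the file system remounted between the third and fourth commands. Resync progress can be watched with metastat.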
Step 16: Mount all file systems after uncommenting their entries in /etc/vfstab:
mountall
Step 17: Recreate the dump device as it was earlier.
