Unix Administration: VCS questions

1:       How do you check the status of VERITAS Cluster Server (VCS)?
Ans:   hastatus -sum

2:       Which is the main config file for VCS and where is it located?
Ans:   main.cf is the main configuration file for VCS; it is located in /etc/VRTSvcs/conf/config.

3:       Which command would you use to check the syntax of main.cf?
Ans:   hacf -verify /etc/VRTSvcs/conf/config
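
A small wrapper can make this syntax check part of a maintenance script. This is a minimal sketch, assuming hacf -verify returns a non-zero exit status when main.cf has errors; the function name verify_maincf and the VCS_CONFIG_DIR variable are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: check main.cf syntax before (re)starting VCS.
# Assumes hacf -verify exits non-zero when the configuration has errors.
VCS_CONFIG_DIR=${VCS_CONFIG_DIR:-/etc/VRTSvcs/conf/config}

verify_maincf() {
    if hacf -verify "$VCS_CONFIG_DIR"; then
        echo "main.cf syntax OK"
    else
        echo "main.cf syntax check failed in $VCS_CONFIG_DIR" >&2
        return 1
    fi
}
```

Call verify_maincf after editing main.cf by hand and before running hastart.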

9:       How to switch over a service group in VCS?
Ans:   # hagrp -switch <service group> -to <system name>

10:     How to bring the service groups online in VCS?
Ans:   # hagrp -online <service group> -sys <system name>

11:     How to make the VCS configuration read-only?
Ans:   # haconf -dump -makero

12:     How to make the VCS configuration read-write?
Ans:   # haconf -makerw
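
Most changes in this post follow the same pattern: open the configuration with haconf -makerw, run one command, then save and close it with haconf -dump -makero. A minimal sketch of that pattern as a shell function is shown here; the name with_rw_config and the group name appsg are hypothetical. The configuration is closed again even if the wrapped command fails, and the command's exit status is passed back to the caller.

```shell
#!/bin/sh
# Hypothetical helper: open the VCS configuration, run one ha* command,
# then dump and close it again even if the command failed.
with_rw_config() {
    haconf -makerw || return 1
    "$@"
    cmd_status=$?
    haconf -dump -makero
    return $cmd_status
}

# Example (not run here):
#   with_rw_config hagrp -freeze appsg -persistent
```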

13:     How to display the list of all snapshots?
Ans:   # hasnap -display -list

14:     How to add a user with cluster Administrator or Operator access?
Ans:   # hauser -add <user> -priv <Administrator|Operator>

15:     How to add a user with group Administrator or Operator access?
Ans:   # hauser -add <user> -priv <Administrator|Operator> -group <service group>

Querying Service Groups and Resources
16:     How to display the status of a service group on a system?
Ans:   # hagrp -state <service group> -sys <system>

17:     How to display the resources for a specific service group?
Ans:   # hagrp -resources <service group>

18:     How to display the service group dependencies?
Ans:   # hagrp -dep <service group>

19:     How to display information about a service group on a system?
Ans:   # hagrp -display <service group> -sys <system name>

20:     How to display resource dependencies?
Ans:   # hares -dep <resource name>

21:     How to display information about a resource?
Ans:   # hares -display <resource name>

22:     How to display resources of a service group?
Ans:   # hares -display -group <service group>

23:     How to display resources of a resource type?
Ans:   # hares -display -type <resource type>

24:     How to display the resources on a system?
Ans:   # hares -display -sys <system name>

25:     How to display all resource types?
Ans:   # hatype -list

26:     How to list the systems in the cluster?
Ans:   # hasys -list

27:     How to display information about a particular system?
Ans:   # hasys -display <system name>

28:     How to display information about the cluster?
Ans:   # haclus -display

29:     How to display the status of all service groups, including resources, in the cluster?
Ans:   # hastatus

30:     How to display the status of cluster faults, including faulted service groups, systems, links and agents?
Ans:   # hastatus -summary
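
When many groups are configured, it can help to filter the summary for faults only. This is a sketch under an assumption about the output layout: group lines are taken to start with "B" and end with the State column, for example "B  dbsg  node2  Y  N  OFFLINE|FAULTED". Check this against the hastatus -sum output of your VCS version; the function name faulted_groups is hypothetical.

```shell
#!/bin/sh
# Hypothetical filter: print "group system" for every faulted group line
# in hastatus -sum output read from stdin.
# Assumed layout of group lines: B <group> <system> <probed> <autodisabled> <state>
faulted_groups() {
    awk '$1 == "B" && $NF ~ /FAULTED/ { print $2, $3 }'
}

# Example (not run here):
#   hastatus -sum | faulted_groups
```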

Administering Service Group:
31:     How to add a service group to a cluster?
Ans:   # hagrp -add <service group>

32:     How to delete a service group from a cluster?
Ans:   # hagrp -delete <service group>

33:     How to modify a service group attribute such as SystemList, AutoStartList, Parallel, etc.?
Ans:
          (A) How to populate the SystemList attribute of service group groupX with SystemA and SystemB?
          # hagrp -modify groupX SystemList -add SystemA 1 SystemB 2

          (B) How to populate the AutoStartList attribute of service group groupX with SystemA and SystemB?
          # hagrp -modify groupX AutoStartList -add SystemA SystemB

          (C) How to define the service group as parallel?
          # hagrp -modify <service group> Parallel 1
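
Steps (A) and (B) are usually done together when a new failover group is created. The sketch below combines them with hagrp -add and the read-write/read-only steps. It assumes SystemList priorities should simply follow the order in which the systems are given, starting at 0 (in VCS a lower number means a higher priority); the function name create_failover_group is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: create a failover service group, populate SystemList
# (priorities 0, 1, 2, ... in the order given; lower number = preferred)
# and AutoStartList, then save and close the configuration.
create_failover_group() {
    new_group=$1
    shift
    [ $# -gt 0 ] || { echo "usage: create_failover_group GROUP SYSTEM..." >&2; return 1; }
    sys_priorities=""
    priority=0
    for sys in "$@"; do
        sys_priorities="$sys_priorities $sys $priority"
        priority=$((priority + 1))
    done
    haconf -makerw || return 1
    hagrp -add "$new_group"
    # $sys_priorities is intentionally unquoted so it splits into "system priority" pairs.
    hagrp -modify "$new_group" SystemList -add $sys_priorities
    hagrp -modify "$new_group" AutoStartList -add "$@"
    haconf -dump -makero
}

# Example (not run here):
#   create_failover_group groupX SystemA SystemB
```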

34:     How to bring a service group online?
Ans:   # hagrp -online <service group> -sys <system name>

35:     How to take a service group offline?
Ans:   # hagrp -offline <service group> -sys <system name>

36:     How to take a service group offline if all resources are probed?
Ans:   # hagrp -offline <service group> -ifprobed -sys <system name>

37:     How to switch a service group from one system to another system?
Ans:   # hagrp -switch <service group> -to <system name>

38:     How to freeze a service group?
Ans:   # hagrp -freeze <service group> -persistent

39:     How to unfreeze a frozen service group?
Ans:   # hagrp -unfreeze <service group> -persistent

40:     How to disable a service group?
Ans:   # hagrp -disable <service group> -sys <system name>

41:     How to enable a service group?
Ans:   # hagrp -enable <service group> -sys <system name>

42:     How to enable all resources in a service group?
Ans:   # hagrp -enableresources <service group>

43:     How to disable all resources in a service group?
Ans:   # hagrp -disableresources <service group>

44:     How to clear faulted, non-persistent resources in a service group?
Ans:   # hagrp -clear <service group> -sys <system name>

45:     How to clear resources in ADMIN_WAIT state in a service group?
Ans:   # hagrp -clearadminwait <service group> -sys <system name>

46:     How to flush a service group?
Ans:   # hagrp -flush <service group> -sys <system name>

47:     How to link a service group with another?
Ans:   # hagrp -link <parent service group> <child service group> <gd_category> <gd_location> <gd_type>

          gd_category  =  Category of group dependency (online/offline)
          gd_location  =  Scope of the dependency (local/global/remote)
          gd_type      =  Type of group dependency (soft/firm/hard)

48:     How to unlink a service group from another?
Ans:   # hagrp -unlink <parent service group> <child service group>
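
A typo in one of the three dependency arguments is easy to make. The sketch below is a hypothetical wrapper (link_groups is not a VCS command) that checks each argument against the values listed above before calling hagrp -link; as with other changes, the configuration must already be read-write.

```shell
#!/bin/sh
# Hypothetical wrapper around hagrp -link that rejects values outside the
# categories, locations and types listed above before calling VCS.
link_groups() {
    [ $# -eq 5 ] || { echo "usage: link_groups PARENT CHILD CATEGORY LOCATION TYPE" >&2; return 1; }
    case $3 in online|offline) ;; *) echo "invalid gd_category: $3" >&2; return 1 ;; esac
    case $4 in local|global|remote) ;; *) echo "invalid gd_location: $4" >&2; return 1 ;; esac
    case $5 in soft|firm|hard) ;; *) echo "invalid gd_type: $5" >&2; return 1 ;; esac
    hagrp -link "$1" "$2" "$3" "$4" "$5"
}

# Example (not run here; appsg and dbsg are hypothetical group names):
#   link_groups appsg dbsg online local firm
```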

VCS agents:
VCS agents communicate with HAD (the High Availability Daemon) and act on resources according to the configuration and attributes they are given. For example, the Mount agent monitors a file system and mounts or unmounts it when instructed.

Agents are multi-threaded processes that provide the logic to manage resources. VCS has one agent per resource type. The agent monitors all resources of that type; for example, the Mount agent manages all Mount resources. When the agent is started, it obtains the necessary configuration information from VCS. It then periodically monitors the resources and updates VCS with the resource status.
The notes above are from https://sort.symantec.com.

You may have the following questions about VCS agents. It is worth knowing what an agent is and what it actually does.

Who provides the VCS agents?
The resource agents listed below are shipped along with VCS. If these agents do not start, you need to contact Symantec to fix the issue.
1.Oracle
2.Sybase
3.LDOM
4.IP
5.NIC
6.Zone
7.Application
and many more.
Symantec also provides separately developed agents to support various applications on VCS. For example, if you want to cluster an SAP application, you need an SAP agent. In some cases the original software vendor (an independent software vendor, ISV) provides the agent for VCS.

What are the properties of VCS agents?
1.Only one agent daemon runs on a system for each configured resource type.
2.An agent runs a single operation on a resource at one time.
3.Agents can perform operation on multiple resources of the same type in parallel.
4.If there are no resources of a particular type anywhere in the cluster, the agent for that type is not started.
5.A resource cannot be managed without an agent.

What are the VCS agent functions?
1.Start a specified program
2.Stop a specified program
3.Monitor the program
4.Clean up after a fault.

How is an agent built?
1.An agent binary file, which contains all the necessary functions within a single binary to control the resource.
2.An agent binary plus a collection of scripts that implement the agent functions not included in the binary.

Issues with VCS agents:
In a production environment we may face the issue below quite often. Here we will see how to fix it.
VCS switchover issue due to a failed agent
VCS error V-16-1-10195, resulting from a failed failover of a service group.
The failover of a service group fails, and hastatus -sum shows that an agent is 'failed' on the system the service group should fail over to.

Example of hastatus -sum output:

-- AGENTS FAILED
-- Type System
I IPMultiNIC node1

This issue can be resolved with the following command. It simply starts the VCS agent if it failed abnormally or stopped for an unknown reason.

# haagent -start IPMultiNIC -sys node1

This command applies to any VCS agent shown as failed in the hastatus output. Here I have just shown the example with "IPMultiNIC"; the resource agent can be anything, such as Mount, DiskGroup, Volume, etc.
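
If several agents have failed, the "I <type> <system>" lines from the example output above can be fed straight into haagent -start. This is a minimal sketch that assumes failed agents always appear on lines beginning with "I", as in the sample; the function name restart_failed_agents is hypothetical. Find out why an agent failed (for example in the VCS engine log) before relying on a blind restart.

```shell
#!/bin/sh
# Hypothetical helper: read hastatus -sum output on stdin and restart every
# agent listed under "AGENTS FAILED" (lines of the form "I <type> <system>").
restart_failed_agents() {
    awk '$1 == "I" && NF >= 3 { print $2, $3 }' |
    while read -r agent_type agent_sys; do
        echo "Restarting agent $agent_type on $agent_sys"
        haagent -start "$agent_type" -sys "$agent_sys"
    done
}

# Example (not run here):
#   hastatus -sum | restart_failed_agents
```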

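Q-1     How to add a new node to an existing cluster?
Ans:   Adding a node to a cluster includes many steps, which are given below:

1:       Set up the hardware
Before adding a node to an existing cluster, the node must be physically connected to the cluster.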
      1: Connect the VCS private Ethernet controllers
      2: Connect the node to the shared storage

2:       Install the VCS software on the node
          Install the VCS software and install the license.

3:       Configure LLT and GAB
Create the LLT and GAB configuration files (/etc/llthosts, /etc/llttab and /etc/gabtab) on the new node and update the files on the existing nodes.
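
The three files have a simple line format: llthosts maps LLT node IDs to node names (identical on every node), llttab names the local node, the cluster ID and the private links, and gabtab holds the gabconfig command that seeds the cluster. The generator below is a sketch that writes them into a scratch directory for review. The node names, cluster ID 100 and the qfe interfaces are hypothetical, and the link line uses the Solaris device syntax; other platforms use a different link format.

```shell
#!/bin/sh
# Hypothetical generator for llthosts, llttab and gabtab.
# Files are written to OUTDIR for review before copying them to /etc.
# Usage: gen_llt_gab OUTDIR THIS_NODE CLUSTER_ID "TAG=DEVICE ..." NODE0 NODE1 ...
#   e.g. gen_llt_gab /tmp/llt node3 100 "qfe0=/dev/qfe:0 qfe1=/dev/qfe:1" node1 node2 node3
gen_llt_gab() {
    out_dir=$1 this_node=$2 cluster_id=$3 llt_links=$4
    shift 4
    mkdir -p "$out_dir" || return 1

    # llthosts: one "<LLT node ID> <node name>" line per cluster node.
    node_id=0
    for node in "$@"; do
        echo "$node_id $node"
        node_id=$((node_id + 1))
    done > "$out_dir/llthosts"

    # llttab: local node name, cluster ID and one "link" line per private interconnect.
    {
        echo "set-node $this_node"
        echo "set-cluster $cluster_id"
        for link in $llt_links; do
            echo "link ${link%%=*} ${link#*=} - ether - -"
        done
    } > "$out_dir/llttab"

    # gabtab: gabconfig -c -nN starts GAB and seeds the cluster once N nodes are up.
    echo "/sbin/gabconfig -c -n$#" > "$out_dir/gabtab"
}
```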

4:       Add the node to an existing cluster
Perform the tasks below on any existing node of the cluster:
          1: Make the cluster configuration R/W
          # haconf -makerw

          2: Add the new node to the cluster
          # hasys -add <new node name>

          3: Copy the main.cf file from an existing node to the new node
          # scp /etc/VRTSvcs/conf/config/main.cf new_node:/etc/VRTSvcs/conf/config/main.cf

          4: Start VCS on the new node
          # hastart

          5: Now make the configuration read-only again.
          # haconf -dump -makero

5:       Start VCS and verify the cluster
          1:Start VCS on the new node
          # hastart

6:       Run the GAB configuration command on each node to verify that port a and port h include the new node in the membership.
          # /sbin/gabconfig -a
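
On each node, gabconfig -a prints lines such as "Port a gen a36e0003 membership 012", where the digits after "membership" are the LLT node IDs in that port's membership. The check below is a sketch that assumes this layout and handles only node IDs 0 to 9 (larger clusters print a different membership format); the function name gab_has_node and the sample values are hypothetical.

```shell
#!/bin/sh
# Hypothetical check: succeed only if both port a (GAB) and port h (HAD)
# list the given LLT node ID in their membership.
# Reads gabconfig -a output on stdin; handles node IDs 0-9 only.
gab_has_node() {
    awk -v id="$1" '
        $1 == "Port" && ($2 == "a" || $2 == "h") && /membership/ {
            members = substr($0, index($0, "membership") + length("membership"))
            if (index(members, id) > 0) found[$2] = 1
        }
        END { exit !(found["a"] && found["h"]) }'
}

# Example (not run here): verify that new node with LLT ID 2 has joined
#   /sbin/gabconfig -a | gab_has_node 2 && echo "node 2 is a member"
```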

Q-2 How to remove a node from an existing cluster?
Ans:   Removing a node from a cluster includes many steps, which are given below:

1:       Backup the configuration file
          # cp /etc/VRTSvcs/conf/config/main.cf /etc/VRTSvcs/conf/config/main.cf.orig

2:       Check the status of the nodes and the service groups
          # hastatus -summary

3:       Switch the service groups that are online on the node leaving the cluster
          # hagrp -switch <service group> -to <node name>

4:       Delete the node from the VCS configuration
          1:       Make the cluster configuration R/W
                    # haconf -makerw

          2:       Stop the cluster on the leaving node
                    # hastop -sys <node>

          3:       Delete the leaving node from the service group’s SystemList attribute.
                    # hagrp -modify <group> SystemList -delete <node>

          4:       Delete the node from the cluster
                    # hasys -delete <node>

          5:       Now make the cluster configuration read-only again.
                    # haconf -dump -makero
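
Sub-steps 1 to 5 can be scripted when the leaving node appears in several SystemList attributes. The sketch below takes the node and every affected group on the command line; it does not discover the groups itself, so any group you leave out will still reference the node and hasys -delete may then fail. The function name remove_node_from_vcs is hypothetical.

```shell
#!/bin/sh
# Hypothetical wrapper for step 4: stop VCS on the leaving node, drop it from
# the SystemList of each group named on the command line, delete it from the
# cluster and save the configuration read-only.
# Usage: remove_node_from_vcs NODE GROUP...
remove_node_from_vcs() {
    leaving_node=$1
    shift
    haconf -makerw || return 1
    hastop -sys "$leaving_node"
    for grp in "$@"; do
        hagrp -modify "$grp" SystemList -delete "$leaving_node"
    done
    hasys -delete "$leaving_node"
    haconf -dump -makero
}
```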

5:       Modify the LLT and GAB configuration files to reflect the changes
Modify the /etc/llthosts, /etc/llttab and /etc/gabtab files on the remaining nodes in the cluster.

6:       Remove the VCS configuration on the node leaving the cluster
          1:       Unconfigure GAB and LLT
                    # /sbin/gabconfig -U
                    # /sbin/lltconfig -U

          2:       Unload the GAB and LLT modules
                    # modunload -i <gab_module_id>
                    # modunload -i <llt_module_id>

          3:       Rename the startup files to prevent LLT, GAB and VCS from starting up in future.
                    # mv /etc/rc2.d/S70llt /etc/rc2.d/s70llt
                    # mv /etc/rc2.d/S92gab /etc/rc2.d/s92gab
                    # mv /etc/rc3.d/S99vcs /etc/rc3.d/s99vcs

          4:       Remove the VCS packages from the node
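
On Solaris, modunload -i needs the numeric module ID, which is the first column of modinfo output. The filter below is a sketch that assumes the usual modinfo columns (Id, Loadaddr, Size, Info, Rev, Module Name) and that the GAB and LLT modules are listed under the names gab and llt; the function name module_id is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper for sub-step 2 on Solaris: print the module ID that
# modunload -i expects, taken from modinfo output on stdin.
# Assumes the usual modinfo columns: Id Loadaddr Size Info Rev Module-Name ...
module_id() {
    awk -v mod="$1" '$6 == mod { print $1; exit }'
}

# Example (not run here):
#   modunload -i "$(modinfo | module_id gab)"
#   modunload -i "$(modinfo | module_id llt)"
```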

Some General Questions:
Q-1     How to shut down a node in a VCS cluster?
Ans:   Shutting down a VCS node is a multi-step process.

1) Make the cluster configuration Read/Write
          # haconf -makerw

2) Switch over (or fail over) all the service groups that are online on the node being shut down to the remaining nodes
          # hagrp -switch <service group> -to <node name>

3) Freeze all the service groups that are online in the cluster.
          # hagrp -freeze <service group> -persistent

4) Stop the cluster on the node that is going down.
          # hastop -local -force

5) Rename the VCS startup script
          # cd /etc/rc3.d
          # mv S99vcs s99vcs

6) Now reboot the box.
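
Step 2 can be scripted by switching every group that hastatus -sum reports as ONLINE on the node being shut down. This is a sketch that assumes the same group-line layout as the summary examples ("B <group> <system> <probed> <autodisabled> <state>"); it only switches the groups, leaving the freeze in step 3 as a separate action, and the function name switch_online_groups is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper for step 2: switch every group that hastatus -sum
# reports ONLINE on FROM_NODE over to TO_NODE.
# Assumed layout of group lines: B <group> <system> <probed> <autodisabled> <state>
switch_online_groups() {
    from_node=$1 to_node=$2
    hastatus -sum |
    awk -v n="$from_node" '$1 == "B" && $3 == n && $NF == "ONLINE" { print $2 }' |
    while read -r grp; do
        echo "Switching $grp from $from_node to $to_node"
        hagrp -switch "$grp" -to "$to_node"
    done
}

# Example (not run here):
#   switch_online_groups node1 node2
```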

Once the system comes up after the reboot, follow the instructions below.

1) Start VCS on this node
                    # hastart -force
2) Bring the service groups online if they were taken offline before the system went down.
                    # hagrp -online <service group> -sys <node name>

3) Unfreeze all the service groups that are frozen.
                    # hagrp -unfreeze <service group> -persistent

4) Now make the cluster configuration Read-Only
                    # haconf -dump -makero

5) Now move the VCS startup script back
                    # cd /etc/rc3.d
                    # mv s99vcs S99vcs
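
The renames used in this procedure (and when removing a node) work because Solaris init runs only the rc scripts whose names start with an upper-case "S" at boot. A small helper makes the two directions symmetric; the function name rc_toggle is hypothetical, and it refuses to rename a script that is already in the requested state.

```shell
#!/bin/sh
# Hypothetical helper for the rename steps: an rc script whose name starts
# with an upper-case "S" is run at boot; renaming it to a lower-case "s"
# stops init from running it. "disable" and "enable" swap the first letter.
rc_toggle() {
    action=$1 script=$2
    dir=$(dirname "$script")
    name=$(basename "$script")
    rest=${name#?}
    case $action in
        disable) [ "${name%"$rest"}" = "S" ] && mv "$script" "$dir/s$rest" ;;
        enable)  [ "${name%"$rest"}" = "s" ] && mv "$script" "$dir/S$rest" ;;
        *) echo "usage: rc_toggle disable|enable SCRIPT" >&2; return 1 ;;
    esac
}

# Example (not run here):
#   rc_toggle disable /etc/rc3.d/S99vcs    # before the reboot
#   rc_toggle enable  /etc/rc3.d/s99vcs    # after VCS is back
```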

Q-2     How do you check the status of VERITAS Cluster Server?
Ans:   hastatus -sum

Q-3     Which is the main config file for VCS and where is it located?
Ans:   main.cf is the main configuration file for VCS; it is located in /etc/VRTSvcs/conf/config.

Q-4     Which command would you use to check the syntax of main.cf?
Ans:   hacf -verify /etc/VRTSvcs/conf/config

Q-5     How will you check the status of an individual resource in a VCS cluster?
Ans:   hares -state <resource>

Q-6     What is the service group in VCS?
Ans:   A service group is made up of the resources and resource links (dependencies) that you normally require to keep an application highly available.

Q-7     What is the use of halink command?
Ans:   halink is used to link the dependencies of the resources

Q-8     What is the difference between switchover and failover?
Ans:   Switchover is a manual task, whereas failover is automatic. An administrator switches a service group over from the node where it is online to another cluster node, for example before a planned power outage, a scheduled shutdown or a reboot. Failover happens when VCS itself moves the service group to another node after a fault, for example when a resource faults or when a node crashes, hangs or loses power and stops sending heartbeats over the VCS interconnect.

Q-9     What is the use of hagrp command?
Ans:   hagrp is used for administrative actions on service groups, such as online, offline, switch, etc.

Q-10   How to switch over the service group in VCS?
Ans:   hagrp -switch <service group> -to <node>

Q-11   How to bring the service groups online in VCS?
Ans:   hagrp -online <service group> -sys <node>

Q-12   How to access the VCS cluster management console?
Ans:   VCS cluster management console can be accessed by the below given URLs:
          http://Servername:8181/cmc/
                              or
          https://Servername:8443/cmc

Q-13   How to access the Cluster Manager Java Console?
Ans:   # /opt/VRTSvcs/bin/hagui

Q-14   What is Jeopardy?
Ans:   When a node in the cluster has only one interconnect link remaining, it is very difficult for GAB to discriminate between a system failure and a network failure. A special membership category, called jeopardy membership, takes effect in this situation. This membership protects the cluster from a split brain condition. When a system is placed in jeopardy membership, two actions occur:
1:       Service groups running on this node are placed in the autodisabled state. A service group in the autodisabled state may fail over on a resource or group fault but cannot fail over on a system fault.
2:       VCS operates the system as a single-node cluster. Other systems in the cluster are partitioned off in a separate cluster membership.

Q-15   What is the main daemon of VCS?
Ans:   had (High Availability Daemon). It is monitored by the hashadow daemon, which restarts had if it fails.

Q-16   What is GAB?
Ans:   Group Membership Services/Atomic Broadcast (GAB) is responsible for cluster membership and reliable cluster communication. GAB has two major functions:
          1: Cluster membership
GAB maintains cluster membership by receiving heartbeats from LLT. When a system no longer receives heartbeats from a cluster peer, GAB marks the peer as down.
          2: Cluster communication
GAB provides guaranteed delivery of messages to all systems. HAD uses the atomic broadcast functionality to ensure that all systems within the cluster receive configuration change messages.

Q-17   What is LLT?
Ans:   Low Latency Transport (LLT) is used for all cluster communication. LLT has 2 major functions:
          1: Traffic Distribution
LLT works as the backbone for GAB. LLT distributes all internode communication across all configured network links. If a link fails, traffic is redirected to the remaining links.
          2: Heartbeat
LLT is responsible for sending and receiving heartbeat signals.

Q-18   How many network links are supported in LLT?
Ans:   Up to 8 links are supported.

Q-19   How many nodes can join a Cluster?
Ans:   Maximum of 32 nodes is supported in VCS.

Q-20   What is heartbeat?
Ans:   Heartbeat is an Ethernet broadcast packet. This packet notifies all other nodes that the sender is functional. This is the only broadcast traffic generated by VCS. Each node sends 2 heartbeat packets per second per interface. Heartbeat is used by GAB to determine cluster membership.
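
Using the rate stated above, the heartbeat load grows linearly with the number of links L and nodes N. The 4-node, 2-link cluster in the example line is hypothetical.

```latex
P_{\text{node}} = 2L \ \text{packets/s}, \qquad P_{\text{cluster}} = 2LN \ \text{packets/s}
% Example: N = 4 nodes, L = 2 links  =>  P_cluster = 2 \cdot 2 \cdot 4 = 16 packets/s
```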

Q-21   What is split brain condition?
Ans:   When all the cluster interconnect links fail, one cluster can separate into 2 subclusters, each of which doesn't know about the other subcluster. Each subcluster could then carry out recovery actions for the departed systems. For example, two systems could try to import the same storage and cause data corruption.

Q-22   How do you shut down Veritas Cluster Server from the command line, leaving the applications running?
Ans:   # hastop -all -force

Q-23   What is coordinator disk?
Ans:   Coordinator disks are three standard disks or LUNs set aside for I/O fencing during cluster reconfiguration. Coordinator disks do not serve any other storage purpose in the VCS configuration. These disks provide a lock mechanism to determine which nodes get to fence off data drives from other nodes. A node must eject a peer from the coordinator disks before it can fence the peer from the data drives. This concept of racing for control of the coordinator disks to gain the ability to fence data disks is key to understanding prevention of split brain through fencing.

Q-24   What is IO fencing and how to configure IO fencing?
Ans:   IO fencing is a feature that prevents data corruption in the event of a communication breakdown in a cluster. IO fencing is used to remove the risk associated with split brain condition. I/O fencing allows write access for members of the active cluster and blocks access to storage from non-members; even a node that is alive is unable to cause damage.
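
For disk-based fencing, the configuration mainly consists of two small files: /etc/vxfendg names the coordinator disk group and /etc/vxfenmode selects the fencing mode. The sketch below writes them under a chosen root directory so they can be reviewed first. It assumes SCSI-3 fencing with DMP devices as used in VCS 5.x; the disk group name vxfencoorddg and the function name write_fencing_config are hypothetical, and the exact vxfenmode keys should be checked against the templates and documentation for your release.

```shell
#!/bin/sh
# Hypothetical sketch: write the two disk-based fencing files under ROOT
# (ROOT=/ on a real node, or a scratch directory to review them first).
# Assumes SCSI-3 fencing with DMP devices; verify the keys for your release.
write_fencing_config() {
    fence_root=$1 coord_dg=$2
    mkdir -p "$fence_root/etc" || return 1
    # /etc/vxfendg names the coordinator disk group.
    echo "$coord_dg" > "$fence_root/etc/vxfendg"
    # /etc/vxfenmode selects the fencing mechanism and disk policy.
    printf '%s\n' 'vxfen_mode=scsi3' 'scsi3_disk_policy=dmp' > "$fence_root/etc/vxfenmode"
}

# Remaining steps (not run here): add "UseFence = SCSI3" to the cluster
# definition in main.cf, restart fencing and VCS on every node, then
# confirm the fencing mode with: vxfenadm -d
```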

Q-25   How to upgrade VCS?
Ans:
1) Remove the deprecated resource types
2) Start the installvcs program, which is in the cluster_server directory

Q-26   How to perform minimal downtime up-gradation in VCS?

Q-27   How to upgrade Solaris OS in which VCS is running?
Ans:   To upgrade Solaris on a node where VCS is running, follow the instructions below:

1) Stop VCS on this node
Make the VCS configuration R/W:
# haconf -makerw

Move all service groups from this node to another node and freeze this node:
# hasys -freeze -persistent -evacuate <node name>

Make the cluster configuration read-only:
# haconf -dump -makero

Stop the cluster on this node:
# hastop -force -local

2) Stop, unconfigure and uninstall LLT and GAB on this node
Unconfigure GAB:
# gabconfig -U

Unconfigure LLT:
# lltconfig -U

Now remove the GAB and LLT packages:
# pkgrm VRTSgab VRTSllt

3) Now upgrade Solaris and switch to single-user mode

4) Now install and configure LLT and GAB (LLT first, because GAB depends on it)
# pkgadd -d . VRTSllt VRTSgab

5) Now switch to multi-user mode and start VCS
# init 3
# hastart

6) Now unfreeze this node (a persistent unfreeze needs a writable configuration)
# haconf -makerw
# hasys -unfreeze -persistent <node name>
# haconf -dump -makero

Thursday, November 28, 2013
