9 Troubleshooting
This section covers various areas where problems might occur and offers suggestions for
troubleshooting and fixing the issues.
9.1 Network Configuration Issues
Cluster configuration problems are often related to improper network configuration. Some areas
to check are:
•
Active Directory—The head node and each compute node must be members of the same
domain before the Compute Cluster Pack is installed.
•
Firewalls—The Windows Firewall can occasionally prevent nodes from being accessed. This
depends on your exact network configuration, so turning the Windows Firewall on or off
in the HPC Management Console may help. The firewall setting for the Private and
Application networks should be turned off.
•
File Shares—If the compute nodes are isolated on the private network (without direct access
to your public network), then all access to a public network file server from compute nodes
is through the Network Address Translation (NAT) or Routing and Remote Access on the
head node. The file server will see these connections through the NAT as coming from a
single node and will therefore allow only one connection at a time. This can cause jobs to
fail if multiple compute nodes require access to the same server at the same time. If compute
nodes are not on the public network, HP recommends configuring the file share on a node
on the private network. This can be the head node, a compute node, or a dedicated file
server.
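When a firewall or file share problem is suspected, a quick TCP connection test from a compute node can help separate a blocked port from other causes. The following is a minimal sketch, not an HP-supplied tool: the function name can_connect is illustrative, port 445 is the standard SMB port used by Windows file shares, and the address 10.1.3.1 assumes the share is hosted on the head node of a preconfigured cluster.

```python
import socket


def can_connect(host: str, port: int = 445, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and opens a TCP socket.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution errors.
        return False


if __name__ == "__main__":
    # Assumption: the file share is on the head node's private address.
    target = "10.1.3.1"
    if can_connect(target):
        print(f"{target}:445 is reachable")
    else:
        print(f"{target}:445 is not reachable; check firewall and network settings")
```

A successful connection only shows that the port is open. It does not confirm that the share exists or that the account has permission to use it.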
9.2 Provisioning Issues
Provisioning can fail for many reasons. Watch the provisioning log on the head node
to track each node's progress. If a node fails to provision, check the following items:
•
Watch the node console as the node boots. The node should PXE boot and get an IP address
through DHCP from the head node. If neither of these happens, check the following:
— The BIOS is selecting the correct NIC to PXE boot. This must be the NIC connected to
the private network.
— The head node is connected to the private network, and the Private NIC on the head
node is properly configured with a private IP address. For preconfigured clusters, this
should be 10.1.3.1.
— Verify the NIC is obtaining an IP address through DHCP. It should indicate the NIC
obtained an address and list the head node's private NIC IP as the DHCP server. For
preconfigured clusters, this is 10.1.3.1.
— Verify the DHCP service is running on the head node.
•
Continue monitoring the compute node console. If the correct NIC is PXE booting and getting
a correct IP address through DHCP from the head node private IP, verify the boot record
is being sent to the compute node:
— If you see an error indicating no boot record/file was sent, verify that the head node
Windows Deployment Services (WDS) service is running.
— Verify the firewall on the head node for the private network is turned off.
— Check the event log for any WDS errors.
— Check that the head node is actively provisioning the specified compute node. If not,
right-click the node and assign the desired template to the node for preconfigured
nodes. For non-preconfigured nodes, click Add Node and wait for the node to respond.
Then select and assign a template.
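The service checks in the list above (DHCP and WDS running on the head node) can be scripted. The following is a minimal sketch, assuming Python is available on the head node and that the services use the names DHCPServer and WDSServer; confirm both names on your system before relying on the result. It calls the standard Windows sc query command and looks for the RUNNING state in English-language output. The function names are illustrative.

```python
import subprocess

# Assumed service names; confirm them on your head node.
SERVICES = {
    "DHCP Server": "DHCPServer",
    "Windows Deployment Services": "WDSServer",
}


def is_running(sc_output: str) -> bool:
    """Return True if `sc query` output reports the RUNNING state."""
    for line in sc_output.splitlines():
        # The state line looks like "STATE : 4  RUNNING" in English output.
        if line.strip().startswith("STATE"):
            return "RUNNING" in line
    return False


def query_service(name: str) -> bool:
    """Run `sc query <name>` and report whether the service is running."""
    try:
        result = subprocess.run(
            ["sc", "query", name], capture_output=True, text=True
        )
    except FileNotFoundError:
        # sc is not available, for example on a non-Windows system.
        return False
    return result.returncode == 0 and is_running(result.stdout)


if __name__ == "__main__":
    for label, name in SERVICES.items():
        state = "running" if query_service(name) else "not running or not found"
        print(f"{label} ({name}): {state}")
```

If either service is reported as not running, start it and check the event log before retrying the provisioning operation.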