IPU-POD128 build and test guide
Step 12: Restart IPU-M2000s on lrack2
12a)
Restart the IPU-M2000s on lrack2 by power cycling them manually through the BMC. Run the following
commands on lrack2:
rack_tool.py power-cycle
rack_tool.py run-command -c reboot -d bmc
This will use the old /home/ipuuser/.rack_tool/rack_config.json file.
12b)
Save a copy of the old rack_config.json file on lrack2, since the IPU-POD64 configuration is no longer required:
cp /home/ipuuser/.rack_tool/rack_config.json /home/ipuuser/.rack_tool/rack_config.json_pod64
12c)
Update the rack_config.json file on lrack2 to an IPU-POD128 setup:
cp /home/ipuuser/.rack_tool/rack_config.json_pod128 /home/ipuuser/.rack_tool/rack_config.json
More details about the contents of
rack_config.json_pod128
can be found in
12d)
Use rack_tool on lrack2 to verify that the new lrack2 IPU-M2000 IP addresses have been set up correctly:
rack_tool.py status
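Steps 12b and 12c above can be combined into a small guarded helper, so that the active configuration is only replaced when both files are present. This is a minimal sketch, not part of rack_tool: the function name switch_rack_config is hypothetical, and the directory is passed as an argument so the helper can be tried on a scratch copy first.

```shell
#!/bin/sh
# Hypothetical helper (not part of rack_tool): performs the copies from
# steps 12b and 12c, but only if both input files exist.
switch_rack_config() {
    dir="$1"   # directory holding the files, e.g. /home/ipuuser/.rack_tool

    [ -f "$dir/rack_config.json" ] || {
        echo "missing $dir/rack_config.json" >&2; return 1; }
    [ -f "$dir/rack_config.json_pod128" ] || {
        echo "missing $dir/rack_config.json_pod128" >&2; return 1; }

    # 12b: keep a copy of the IPU-POD64 configuration
    cp "$dir/rack_config.json" "$dir/rack_config.json_pod64" || return 1
    # 12c: make the IPU-POD128 configuration the active one
    cp "$dir/rack_config.json_pod128" "$dir/rack_config.json"
}
```

On lrack2 this might be called as: switch_rack_config /home/ipuuser/.rack_tool && rack_tool.py status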
Step 13: Verify IPU-M2000 interface access
Note: this step is only required if the IPU-POD128 is NOT fully connected with spine switches.
Run the following rack_tool command on lrack1 to verify that there is no access to the IPU-M2000 interfaces
on lrack2 from lrack1:
rack_tool.py status
The status checks for the IPU-M2000 RNICs on lrack2 are expected to fail, because these interfaces are not reachable from lrack1 unless there are spine switches.
Step 14: Create V-IPU cluster on lrack1
Create a new V-IPU cluster on lrack1 and add all the V-IPU agents from both IPU-POD64 racks (lrack1 and lrack2)
to the cluster. There is one V-IPU agent per IPU-Gateway on each IPU-M2000, so there will be 32 in total. Make
sure the cluster is added as a torus if the IPU-Link and GW-Link cables are connected as a loop. To do this you
need to use the --cluster-topology looped and --topology torus options. The --cluster-topology argument
defines the GW-Link topology (horizontal torus) and the --topology argument defines the IPU-Link topology
(vertical torus).
For example:
vipu-admin create cluster cl128 --num-ilds 2 --topology torus --cluster-topology looped --agents ${ALL_IPUM_NAMES_FROM_vipu_list_agents}
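The agent list in the command above has to be assembled by hand, so it can help to check the count before creating the cluster. The following is a minimal sketch under several assumptions: the agent names have already been collected into a text file with one name per line, the --agents option accepts a comma-separated list, and each IPU-POD64 rack contributes 16 agents (giving the 32 mentioned above). The function name build_create_cluster_cmd and the file name in the usage note are hypothetical. The helper only prints the command, it does not run it.

```shell
#!/bin/sh
# Hypothetical helper: prints the create-cluster command from this step
# after checking that the expected number of agent names is present.
build_create_cluster_cmd() {
    agents_file="$1"   # text file with one V-IPU agent name per line (assumption)
    expected="$2"      # expected number of agents, 32 for two IPU-POD64 racks

    [ -r "$agents_file" ] || { echo "cannot read $agents_file" >&2; return 1; }

    # Count non-empty lines; grep exits non-zero when the count is 0
    count=$(grep -c . "$agents_file" || true)
    if [ "$count" -ne "$expected" ]; then
        echo "expected $expected agents, found $count" >&2
        return 1
    fi

    # Join the names with commas (assumed format for --agents)
    agents=$(grep . "$agents_file" | paste -sd, -)
    echo "vipu-admin create cluster cl128 --num-ilds 2 --topology torus --cluster-topology looped --agents $agents"
}
```

For example, build_create_cluster_cmd agents.txt 32 prints the full command for review, or reports a mismatch if an agent from either rack is missing from the list.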
Version: latest (2021-11-25)