ASIC Bitmain Antminer S9K Maintenance Manual Download Page 12

 

                                                                                      S9k S9SE Maintenance Guide 

 


 

V. Fault Type 

Common fault types of the S9K S9SE computing board: 

1. Cooling fin falls, shifts and deforms 

Before power-on, the cooling fins on the PCB behind the computing board chips must not shift or touch one another, especially fins that sit at different voltages. Contact between cooling fins in different voltage domains can short-circuit points that are at different voltages. Also confirm that every cooling fin on the computing board is firmly fixed and conducts heat well.

When replacing or re-installing a cooling fin, clean the residual thermally conductive adhesive off both the fin and the chip, then apply a fresh coat. The residual adhesive can be cleaned with absolute alcohol.

2. Impedance imbalance in each voltage domain 

When the impedance of a voltage domain deviates from the normal value, there is an open or short circuit in that domain. A chip is the most likely cause. However, each voltage domain contains ten chips (for example, U1-U10 in domain 1), and usually only one of them is at fault.

To find the faulty chip, measure the ground impedance at the test points of each chip and compare the readings; the abnormal point stands out from the others.

If there is a short circuit, first remove the cooling fins from the chips in the same voltage domain, then check whether solder is bridging the chip pins. If no short-circuit point is visible, locate it with the resistance method or the current cut-off method.
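The impedance comparison can be sketched in Python. This is a minimal illustration, not part of any Bitmain tooling: the function name `find_impedance_outliers`, the sample readings, and the 30% tolerance are all assumptions chosen for the example.

```python
from statistics import median

def find_impedance_outliers(readings, tolerance=0.30):
    """Return the chips whose ground impedance deviates from the median.

    readings: dict mapping a chip designator (e.g. "U4") to the ground
    impedance in ohms, measured at the same test point on every chip.
    tolerance: allowed relative deviation from the median (assumed value).
    """
    reference = median(readings.values())
    outliers = []
    for chip, ohms in readings.items():
        if abs(ohms - reference) > tolerance * reference:
            outliers.append(chip)
    return outliers

# Hypothetical measurements: U4 reads close to a short to ground.
sample_readings = {"U1": 410, "U2": 405, "U3": 398, "U4": 12, "U5": 402}
print(find_impedance_outliers(sample_readings))
```

Comparing against the median rather than a fixed value avoids needing a known-good reference figure, which this section does not give for chip test points.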

3. Voltage imbalance in voltage domain  

When the voltage of a voltage domain is too high or too low, there is usually an abnormal IO signal in that domain or in an adjacent one. The abnormal signal makes the next voltage domain work incorrectly, which unbalances the voltage. The abnormal point can usually be found by checking the signal and voltage at each test point; in some cases the impedance of each test point must also be compared.

Note that the CLK and NRST signals are the two most likely to cause a voltage imbalance.

4. Lack of chips 

Lack of chips means that the test box detects fewer than all 60 chips. The chips that are actually lost (undetected) are often not at the position the test box displays, so the abnormal chip has to be located accurately through testing.

The RI cutoff method can be used to locate the abnormal chip: ground the RI signal of a chosen chip and check how many chips the test box detects. For example, after the RI output of the 50th chip is grounded, the test box should in theory display 50 chips if all the chips before it are normal. If fewer than 50 chips are detected, the abnormal chip is before the 50th chip; if 50 chips are detected, it is after the 50th chip. Repeat this bisection until the abnormal chip is located.
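The RI cutoff procedure is a binary search over the chip chain, which the following Python sketch makes explicit. The callback `detected_with_ri_grounded` is hypothetical: it stands in for the manual step of grounding one chip's RI output and reading the chip count from the test box. The sketch also assumes exactly one break point exists.

```python
def locate_abnormal_chip(detected_with_ri_grounded, total_chips=60):
    """Bisect the chain to find the first abnormal chip.

    detected_with_ri_grounded(k) represents the manual step: ground the RI
    output of chip k, run the test box, and return the chip count it shows.
    If chips 1..k are all normal, the count equals k.
    """
    low, high = 1, total_chips      # the abnormal chip lies in [low, high]
    while low < high:
        mid = (low + high) // 2
        if detected_with_ri_grounded(mid) == mid:
            low = mid + 1           # chips 1..mid are fine: look after mid
        else:
            high = mid              # the fault is at or before mid
    return low

# Simulated board for illustration only: chip 37 does not respond.
def simulated_probe(k, faulty=37):
    return k if k < faulty else k - 1

print(locate_abnormal_chip(simulated_probe))
```

On a 60-chip board the search needs at most six measurements, compared with up to 60 when checking chip by chip.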

5. Broken chain 

A broken chain is similar to lack of chips, but not every undetected chip is itself abnormal: one abnormal chip makes all the chips after it unreachable. For example, a chip may work by itself but fail to forward information from other chips. The signal chain then ends abruptly at that chip and a large part of the chain is lost, which is called a broken chain.

The test box can show where the chain breaks. For example, suppose the test box detects only 30 chips. Because the test box does not run unless it detects the preset number of chips, it only displays how many chips were detected. Using the displayed number "30", the problem can be found by measuring the voltage and impedance of each test point before and after the 30th chip.

6. No running 

No running means that the test box cannot detect any chip information from the computing board and displays "NO hash board". This is the most common fault, and its possible causes cover a wide range.

1) Abnormal voltage in a voltage domain. Measure the voltage of each voltage domain to find the problem.
2) An abnormal chip. Measure the signal at each test point to find it:

CLK signal: 0.9V. The signal is passed from chip U1 to chip U60. In the current version there are only two crystal oscillators: Y1 drives the 1st to the 30th chip and Y2 drives the 31st to the 60th chip. Trace an abnormal CLKO signal along the direction of signal transmission.

CO signal: 1.8V. The signal is transmitted through chips U1, U2, ..., U60. When bisection finds an abnormal point, continue checking forward from that point.

RI signal: 1.8V. The signal is returned through chips U60, ..., U2, U1. Confirm the cause of the fault by following the direction of the signal. When the S9K S9SE computing board does not run, this signal has the highest priority, so check it first.

BO signal: 0V. The chip pulls this signal low when it detects that the RI return signal is normal; otherwise the signal is at high level.

NRST signal: 1.8V. After the computing board is powered and the IO signal is connected, the signal is transmitted from U1 through U2, ..., U60 to the last chip.
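As a quick reference, the nominal levels listed above can be put into a small Python lookup that flags out-of-range readings. This is only a sketch: the 0.1 V tolerance and the function names are assumptions, not values from the guide.

```python
# Nominal test-point voltages as listed in this section (volts).
NOMINAL_VOLTS = {"CLK": 0.9, "CO": 1.8, "RI": 1.8, "BO": 0.0, "NRST": 1.8}

def abnormal_signals(measured, tolerance=0.1):
    """Return the signals whose reading is outside nominal +/- tolerance.

    measured: dict mapping a signal name to the voltage read at one chip.
    tolerance: acceptance band in volts (assumed value, not from the guide).
    """
    bad = []
    for name, nominal in NOMINAL_VOLTS.items():
        if name not in measured:
            continue                    # this signal was not probed
        if abs(measured[name] - nominal) > tolerance:
            bad.append(name)
    return bad

def clock_source(chip):
    """Return the crystal oscillator that clocks chip U<chip>."""
    if not 1 <= chip <= 60:
        raise ValueError("chip number must be between 1 and 60")
    return "Y1" if chip <= 30 else "Y2"

# Hypothetical readings at one chip: RI is missing its 1.8 V level.
readings = {"CLK": 0.91, "CO": 1.79, "RI": 0.2, "BO": 0.0, "NRST": 1.8}
print(abnormal_signals(readings), clock_source(31))
```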

3) LDO 0.8V, 1.8V abnormality maintenance

The normal ground impedance of the LDO 0.8V IC output is 50-100 Ω, and the normal impedance of the LDO 1.8V IC output is 0.9 kΩ.

 

Each computing board has six 1.8V LDOs and twelve 0.8V LDOs. For example, in domain 1 the chips U1-U10 are powered by the 1.8V LDO U61, U1-U5 are powered by the 0.8V LDO U117, and U6-U10 are powered by the 0.8V LDO U79. Since the LDOs operate in series, an LDO short to ground can be repaired by bisection: start from the middle chip, remove chips one by one until the problem chip is found, and replace it.
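To narrow down which chips share a shorted LDO rail, the grouping can be computed from the chip number. The Python sketch below assumes that the domain-1 layout quoted above (ten chips per 1.8V LDO, five chips per 0.8V LDO) repeats across all 60 chips. Only the domain-1 LDO designators are given in this section, so the function returns chip ranges rather than LDO part references.

```python
def ldo_groups(chip):
    """Return the chip ranges that share LDO rails with chip U<chip>.

    Assumption: every domain follows the domain-1 pattern, with ten chips
    per 1.8V LDO and five chips per 0.8V LDO.
    """
    if not 1 <= chip <= 60:
        raise ValueError("chip number must be between 1 and 60")
    domain = (chip - 1) // 10 + 1
    first_1v8 = (domain - 1) * 10 + 1       # first chip on the 1.8V rail
    first_0v8 = (chip - 1) // 5 * 5 + 1     # first chip on the 0.8V rail
    return {
        "domain": domain,
        "ldo_1v8_chips": (first_1v8, first_1v8 + 9),
        "ldo_0v8_chips": (first_0v8, first_0v8 + 4),
    }

print(ldo_groups(7))
```

For U7 this gives domain 1 with the 0.8V group U6-U10, which matches the example in the text.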

4) Single board Pattern NG repair

Check the serial port print log (log information). The nonce return rate of each single chip and of the whole computing board must reach 98%; if the nonce return rate is lower than 98%, Pattern NG is reported. Based on the serial port print log, give priority to replacing the chip with the lowest single-chip nonce return rate.
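The 98% rule can be expressed as a short Python check. The input layout is hypothetical: this section does not describe the serial log format, so the sketch assumes the returned and expected nonce counts per chip have already been read from the log by hand or by another tool.

```python
def pattern_check(counts, threshold=0.98):
    """Evaluate nonce return rates against the 98% requirement.

    counts: dict mapping a chip designator to (returned, expected) nonce
    counts. This layout is an assumption made for the example.
    Returns (passed, worst_chip, worst_rate).
    """
    rates = {chip: got / want for chip, (got, want) in counts.items()}
    total_got = sum(got for got, _ in counts.values())
    total_want = sum(want for _, want in counts.values())
    board_rate = total_got / total_want
    worst_chip = min(rates, key=rates.get)  # chip to replace first
    passed = board_rate >= threshold and rates[worst_chip] >= threshold
    return passed, worst_chip, rates[worst_chip]

# Hypothetical counts for three chips; U2 returns too few nonces.
sample = {"U1": (114, 114), "U2": (101, 114), "U3": (113, 114)}
passed, worst_chip, worst_rate = pattern_check(sample)
print("Pattern OK" if passed else "Pattern NG", worst_chip, round(worst_rate, 3))
```

Returning the worst chip alongside the verdict reflects the repair order in the text: the chip with the lowest single-chip rate is replaced first.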

5) The whole machine J: 4 maintenance

1: The temperature sensing chip position is not stored. Test the board once with the single board test jig, which writes the temperature sensing information into the EEPROM IC.

2: The single board jig configuration file is wrong (the chip or BIN level of the computing board does not match the jig configuration file), so the whole machine reports J: 4.

 
 
 
 
 
 
 
