# ALMA MATER STUDIORUM - UNIVERSITÀ DI BOLOGNA

### SCHOOL OF ENGINEERING AND ARCHITECTURE

DEPARTMENT of ELECTRICAL, ELECTRONIC AND INFORMATION ENGINEERING "Guglielmo Marconi" DEI

## **MASTER'S DEGREE in**

Advanced Automotive Electronic Engineering

MASTER'S THESIS in Power Electronics

# Lifetime Modeling of SiC Power Modules in Automotive Traction Inverters

CANDIDATE

Matteo Totaro

SUPERVISOR

Chiar.mo Prof. Alessandro Chini

HOST COMPANY SUPERVISOR

Ing. Nicola Fercodini

Academic Year 2021/2022

Session II

#### Abstract

One of the key issues in building a sustainable society is the reduction of CO2 emissions. Many industries in the fields of electrical power production, home appliances and transportation are saving energy and reducing their emissions. Electronics is now widely used in many application fields, such as industrial, consumer, automotive and aerospace. In the automotive sector, electric vehicles (EVs) are an effective solution.

EVs include power electronic components, such as inverters, motors and batteries. Power semiconductor devices are fundamental components of all electronic systems that generate, manage and distribute energy. The increasing demand for reduced size and cost, as well as for higher efficiency and power capability, is common to all these applications. In particular, there is strong pressure to downsize inverters, even though the output power required for EV traction is large. Therefore, both power density and power efficiency are key performance factors.

Silicon has dominated the power semiconductor industry for several decades, thanks to many years of development and a well-established fabrication technology that provides high manufacturing capability at extremely low cost. Nevertheless, Si-based devices have almost reached their performance limits; consequently, a different strategy is needed to overcome this issue.

In power electronics, the power device that performs the switching is the key to performance. As a result, SiC and GaN, two wide-bandgap (WBG) semiconductors (also known as third-generation semiconductors), have gradually attracted attention. The three main advantages of SiC MOSFETs in electric drive inverters are superior high-voltage, high-temperature, and high-frequency performance.

Silicon Carbide is a promising material for the realization of high voltage, high power devices, offering improved efficiency, reduced size, and lower overall system cost.

In the SiC field, the recent moves of leading electric vehicle companies and new vehicle manufacturers have also been attracting attention.

One example is Tesla, which recently released a new version of its Model S, the Model S Plaid, using a SiC inverter. While launching the "BYD Han", its first model with SiC technology, BYD Company also announced that by 2023 it will fully replace Si-based IGBTs with SiC automotive power semiconductor devices.

NIO has also announced that it will adopt an electric drive system based on SiC technology on its new ET7 models for 2022.

In response to climate change and the rise in average global temperature, the European Union set carmakers a target to cut carbon dioxide emissions by 40% between 2007 and 2021. In December 2018, EU lawmakers also agreed on a further cut in CO2 emissions from cars of 37.5% by 2030 compared with 2021 levels. For this reason, carmakers are either switching completely to hybrid and electric vehicles or starting to add them to their production.

Electric vehicles are one of the biggest additions to the market but also one of the newest; consequently, one of the main concerns of both customers and manufacturers is how long these vehicles will last. Guidelines and validation procedures are still being established as experience with these new electric systems grows. New challenges that differ from those of the internal combustion engine must be faced.

In this context, the aim of this thesis is the formulation of a lifetime model for SiC power MOSFETs used as power modules inside an automotive inverter, through both software simulations and experimental characterizations. The goal is to estimate how the module will degrade over time throughout its lifetime inside the inverter of an electric vehicle.

The formulation of the model and the tests were carried out at a famous Italian luxury carmaker during my internship in the Research and Development branch of the Hybrid department.

# Contents

- 1 Silicon Carbide
  - 1.1 SiC Technology
  - 1.2 Silicon Carbide Devices
    - 1.2.1 SiC diodes
    - 1.2.2 SiC Power MOSFETs
  - 1.3 DSC SiC MOSFET
    - 1.3.1 Dual-side Cooling
    - 1.3.2 NTC Thermistor
- 2 Lifetime Modeling
  - 2.1 ECPE AQG 324 "Automotive Qualification Guideline"
  - 2.2 Lifetime Testing
    - 2.2.1 Power Cycling
  - 2.3 MATLAB Script Implementation
    - 2.3.1 Lifetime Estimation
    - 2.3.2 Script Results
- 3 Proposed Hardware Tests
  - 3.1 Test Bench Setup for Power Cycling
  - 3.2 Test Bench Setup for $R_{th}$ Measurement
    - 3.2.1 Temperature measurement with the internal body-diode
    - 3.2.2 Diode Characterization in Temperature
- 4 Conclusions

# Chapter 1

# Silicon Carbide

## 1.1 SiC Technology

Silicon Carbide (SiC) is a wide band-gap semiconductor material that crystallizes in a wide variety of structures, each of which exhibits unique electrical, optical, thermal, and mechanical properties. It is a very robust and reliable material whose physical properties are very important subjects of academic study as well as critical parameters for accurate simulation of devices.

SiC is a compound semiconductor, which means that only a rigid stoichiometry, 50% silicon (Si) and 50% carbon (C), is allowed. It exists in different polymorphic crystalline structures called polytypes, i.e. it can have more than one crystal structure, differing in the stacking order of successive layers of carbon and silicon atoms. Each polytype has its specific physical features. Among all structures, 3C-SiC, 4H-SiC and 6H-SiC are the most studied for electronic applications.

For power devices, 4H-SiC is considered to be ideal.

In recent years, Silicon Carbide power devices, mainly power diodes and MOSFETs, have become commercially available and have begun to replace their Silicon counterparts in many application areas.

| Properties                                    | Si      | 4H-SiC    | GaN       |
|-----------------------------------------------|---------|-----------|-----------|
| Crystal Structure                             | Diamond | Hexagonal | Hexagonal |
| Energy Gap $E_G\ [eV]$                        | 1.12    | 3.26      | 3.5       |
| Electron Mobility $\mu_n\ [cm^2/Vs]$          | 1400    | 900       | 1250      |
| Hole Mobility $\mu_p\ [cm^2/Vs]$              | 600     | 100       | 200       |
| Breakdown Field $E_C\ [10^6\ V/cm]$           | 0.3     | 3         | 3         |
| Thermal Conductivity [W/(cm·°C)]              | 1.5     | 4.9       | 1.3       |
| Saturation Drift Velocity $v_s\ [10^7\ cm/s]$ | 1       | 2.7       | 2.7       |
| Relative Dielectric Constant $\varepsilon_r$  | 11.8    | 9.7       | 9.5       |

Table 1.1: Comparison between electrical properties of Silicon, Silicon Carbide and Gallium Nitride

#### Wide band-gap

The physical and electrical properties of wide band-gap materials determine the functional and application characteristics of the power semiconductors built with them. From a physical standpoint, all solid-state materials have electrons that are either bound to the nucleus (the valence band) or free to move at a higher energy level (the conduction band). The energy gap between the valence and conduction bands, the band-gap, is the essential physical parameter that defines a wide band-gap semiconductor.

The wide band-gap of WBG materials translates to a higher breakdown electric field, higher operating-temperature capability, and lower susceptibility to radiation. Silicon Carbide has a band-gap approximately 3 times wider than that of Si. As the operating temperature rises, the thermal energy of the electrons in the valence band increases accordingly, and they pass into the conduction band once a specific threshold temperature is reached. In the case of silicon, this threshold temperature is about 150°C. Because of their larger energy gap, WBG semiconductors can reach much higher temperatures before electrons accumulate enough energy to make this transition. Thus, the greater the band-gap, the higher the sustainable operating temperature of the semiconductor, enabling temperatures up to 200°C.
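The link between band-gap and temperature capability can be illustrated with the Boltzmann factor $\exp(-E_G / 2kT)$, which dominates the density of thermally generated carriers. The following is only a rough sketch: it uses the band-gap values of Table 1.1, assumes that they do not change with temperature, and ignores the density-of-states prefactor, so only the relative comparison between the two materials is meaningful.

```python
import math

K_B_EV = 8.617e-5  # Boltzmann constant in eV/K

# Band-gap values from Table 1.1 (assumed temperature independent in this sketch)
BAND_GAP_EV = {"Si": 1.12, "4H-SiC": 3.26}


def boltzmann_factor(e_gap_ev, temp_c):
    """Return exp(-E_G / (2 k T)) for a band-gap in eV and a temperature in °C."""
    temp_k = temp_c + 273.15
    return math.exp(-e_gap_ev / (2.0 * K_B_EV * temp_k))


for material, e_gap in BAND_GAP_EV.items():
    for temp_c in (25, 150, 200):
        factor = boltzmann_factor(e_gap, temp_c)
        print(f"{material:7s} {temp_c:4d} °C  factor = {factor:.3e}")
```

Under these assumptions, the factor for 4H-SiC at 200°C is still several orders of magnitude below that of Si at 25°C, which is consistent with the higher operating temperature attributed to WBG materials.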

#### Low on-resistance/area

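$$R_{ON} = \frac{4 \, V_{BR}^2}{\mu \, \varepsilon \, E_C^3} \quad [\Omega \cdot cm^2]$$
(1.1)

where $V_{BR}$ is the breakdown voltage, $\mu$ the carrier mobility, $\varepsilon$ the dielectric permittivity and $E_C$ the critical (breakdown) electric field of the material.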

SiC is considered a promising material for power devices that can exceed the limits of Si: with a dielectric breakdown field strength approximately ten times higher than that of Si, SiC devices can achieve very high breakdown voltages, from 600 V to thousands of volts. A higher energy gap brings several advantages in terms of power density compared to Silicon: the same breakdown voltage can be obtained in a much smaller form factor, ideally up to 1000 times smaller. This translates to a lower $R_{ON}$, as Equation 1.1 suggests. The resulting increase in power density is not always beneficial, however, since a large amount of heat must then be dissipated from a small package.



**Figure 1.1:** *Relationship between field strength and depletion layer width of Si SBD and SiC SBD [source: Fuji]* 

Fortunately, SiC also has a very high thermal conductivity, which helps keep the device cool when heat sinks are properly designed. A small $R_{ON}$ per unit area also means a smaller chip and therefore lower parasitic capacitances, which is why the device can switch very fast. Doping concentrations can be made higher than in Si devices, and drift layers can be made thinner. Nearly all of the resistance of a high-voltage power device comes from the drift layer, and this resistance increases in proportion to the drift layer thickness. With SiC the drift layer can be made thin, so a device with a high breakdown voltage and extremely low on-resistance per unit area can be fabricated. Theoretically, for a given breakdown voltage, the drift layer resistance per unit area can be reduced to 1/300 of that of Si.
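As a rough cross-check of this order of magnitude, the ideal drift resistance of Si and 4H-SiC can be compared at equal breakdown voltage using the material data of Table 1.1. The sketch below assumes that the cubed term in the denominator of Equation 1.1 is the breakdown field listed in the table, and that the constant factors cancel in the ratio; the function names are illustrative only.

```python
# Material data from Table 1.1: electron mobility [cm^2/Vs],
# relative dielectric constant, breakdown field [MV/cm]
MATERIALS = {
    "Si": {"mobility": 1400.0, "eps_r": 11.8, "e_break": 0.3},
    "4H-SiC": {"mobility": 900.0, "eps_r": 9.7, "e_break": 3.0},
}


def drift_figure_of_merit(props):
    """Denominator of Equation 1.1 up to constant factors: mu * eps * E_break^3."""
    return props["mobility"] * props["eps_r"] * props["e_break"] ** 3


def on_resistance_ratio(material, reference="Si"):
    """Ideal drift resistance of `material` relative to `reference` at equal breakdown voltage."""
    return drift_figure_of_merit(MATERIALS[reference]) / drift_figure_of_merit(MATERIALS[material])


ratio = on_resistance_ratio("4H-SiC")
print(f"R_ON(4H-SiC) / R_ON(Si) = {ratio:.5f} (about 1/{1 / ratio:.0f})")
```

With the values of Table 1.1 the ratio comes out at roughly 1/500, the same order of magnitude as the theoretical figure quoted above; the exact number depends on the material data that is assumed.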

#### Advantages

Currently, IGBT modules that combine Si IGBTs and FRDs (fast recovery diodes) are widely used as power modules that can handle large currents. IGBTs in particular are the main competitors of SiC MOSFETs. Being a bipolar device, the IGBT is slow to turn off: its tail current causes a large switching loss. This loss can be significantly reduced with SiC modules, which are unipolar devices. The following are the main advantages of implementing a SiC MOSFET:

- 1. Improvement in the power supply efficiency and simplification of the cooling structure due to reduction in the switching loss:
  - Lower diode losses, thanks to the fast recovery response of the body diode
  - Lower switching losses, since it can switch on and off faster, so less heat generation
  - Lower conduction losses, thanks to a lower  $R_{DS,on}$  per area

(example: downsizing of heat-sink, replacement of water cooling/forced air cooling with natural air cooling)

- 2. Downsizing of peripheral parts due to the increase in operating frequency (example: downsizing of reactors, capacitors, etc.). Applications include power supplies for industrial equipment and power conditioners for solar power generation.
- 3. Fast-switching capabilities thanks to a small  $R_{DS,on}$ .
- 4. High thermal conductivity, which helps remove the heat generated and is especially useful when the chip is very small.

Whenever size, weight and volume are key constraints, fast-switching, high-power SiC MOSFETs are the way to go.

#### Disadvantages

Despite the fast progress recently experienced in device technology, which has allowed the fabrication of devices with increasing performance, there is still margin for quality and cost improvements. In fact, a wider adoption of these devices cannot be achieved without a deep analysis of the elements that might affect their reliability. SiC modules also have a higher risk of failure linked to their higher switching frequency: this trade-off is largely balanced by the robustness of the material.

Also, the ability to reach very high temperatures and cool down quickly, thanks to its high thermal conductivity, means that the device is exposed to large temperature swings.

As a matter of fact, temperature is the main concern for this type of module: in reliability, durability and lifetime testing, temperature is almost always a critical test parameter.

In general though, SiC is not a technology as mature as Silicon, so improvements are expected in the following years.

# **1.2 Silicon Carbide Devices**

## 1.2.1 SiC diodes

Silicon Carbide enables diodes with breakdown voltages above 1200 V that use the Schottky barrier diode (SBD) structure; in fact, SiC diodes are mostly Schottky diodes. Si-based Schottky diodes, by comparison, reach only about 200 V.



Figure 1.2: Diode's reverse recovery phenomena comparison [source: ST]

In Si-based high-speed PN diodes (FRDs, fast recovery diodes), a large transient current flows momentarily when the bias switches from forward to reverse, causing a large loss during the transition to the reverse-bias state. This is due to the minority carriers stored within the drift layer during forward conduction, which keep contributing to electrical conduction until they disappear (storage time).

The larger the forward current and the higher the temperature, the longer the recovery time and the larger the recovery current, resulting in a significant loss (the reverse recovery phenomenon).

In contrast, SiC SBDs are majority-carrier (unipolar) devices, which means they use only electrons for electrical conduction, so in principle no accumulation of minority carriers occurs. Only the small current needed to discharge the junction capacitance flows, and the loss is significantly reduced compared with Si-based fast diodes. Since this transient current is mostly independent of the temperature and the forward current, a stable and high-speed recovery can be achieved in any environment. These fast recovery characteristics translate to a significant power-loss reduction with respect to Silicon, as Figure 1.2 shows. SiC SBDs are commonly used in hard-switching applications such as high-end server and telecom power supplies, and are also intended for solar inverters, motor drives, power-factor correction (PFC) circuits and uninterruptible power supplies (UPS).



Figure 1.3: Charge stored during recovery [source: Infineon]

The quasi "reverse recovery" charge  $Q_C$  and the switching power losses of SiC Schottky diodes are not only ultra low; unlike silicon ultra-fast diodes, whose losses strongly depend on dI/dt, current level and temperature, they are also more or less independent of these boundary conditions, as shown in Figure 1.3.

On the scale of a benchmark Si diode, no dependency of  $Q_C$  on these parameters can be seen. This is due to the capacitance-like behavior of the SiC device in the reverse direction.

## **1.2.2** SiC Power MOSFETs

Silicon Carbide makes it possible to attain simultaneously the three main features of a power device: a high breakdown voltage, a low on-resistance and high-speed switching. In this way the converter can achieve high-frequency and high-efficiency performance. The intrinsic material properties explained above increase the power density of the chip, allowing smaller passive components and reduced cooling requirements.

|            | Voltage Breakdown | Switching Speed | Power    |
|------------|-------------------|-----------------|----------|
| SiC MOSFET | high              | medium          | > 6-10kW |
| Si IGBT    | high              | low             | > 6-10kW |
| GaN        | mid-low           | high            | < 3-4kW  |

**Table 1.2:** Comparison of the main properties of typical power modules

In Si power devices, IGBTs (insulated-gate bipolar transistors) and other minority-carrier devices (bipolar transistors) have mainly been used in order to limit the increase in on-resistance that accompanies higher breakdown voltages. However, their large switching losses give rise to heat generation problems, imposing limits on high-frequency driving.

With SiC, fast majority-carrier devices such as Schottky barrier diodes and MOSFETs can be designed for high voltages, making possible the simultaneous attainment of these three main features of a power device, namely "high breakdown voltage", "low on-resistance", and "high speed".

As expected from these comparisons, silicon-based power semiconductor devices have almost reached the theoretical limits of the material; therefore, new alternatives (SiC, GaN, etc.) are progressively taking the place of Silicon to satisfy the requirements of high-performance power systems. Silicon is still widely used thanks to its lower cost, and it covers the majority of the voltage spectrum since 'it does the job adequately'.

While GaN devices have their place at low to mid voltages, where they offer high frequency, efficiency and power density, SiC stands out at higher voltages with the same advantages: already at 650 V, SiC devices can reach higher temperatures than GaN devices or IGBTs, and they have one of the lowest  $R_{DS,ON}$  values (lower conduction losses), which, combined with the lower switching losses of fast switching, increases efficiency dramatically. On the other hand, a GaN device that works between 650 V and 1200 V needs to be extended laterally, since its voltage capability is dictated by the gate-to-drain gap. This means that the die is larger and the on-resistance higher. Also, since GaN cannot avalanche, it must be extended even further for margin, which is why SiC outperforms GaN in terms of power density, size and cost.



**Figure 1.4:** *Comparison of Si, SiC and GaN power modules in terms of output power and switching frequency.* 

#### **Device features**

A high breakdown voltage and a low on-resistance can be achieved simultaneously in the SiC MOSFET, which has a high-speed device structure. Therefore, replacing the IGBT with the SiC MOSFET can significantly reduce the switching loss and allow cooling structures such as heat sinks to be downsized. High-frequency operation of the MOSFET also contributes to downsizing the passive components, which is not possible with IGBTs. Furthermore, the SiC MOSFET also has advantages over the Si MOSFET in the range between 600 V and 900 V, such as a smaller chip area (enabling mounting in smaller packages) and a very small recovery loss of the body diode, as explained in the previous section.

Since the dielectric breakdown field strength of SiC is approximately 10 times higher than that of Si, a high breakdown voltage can be achieved with a thin drift layer of low specific resistance. Therefore, at the same breakdown voltage, a device with a smaller normalized on-resistance ($R_{on}A$: on-resistance per unit area) can be manufactured. At a breakdown voltage of 900 V, for example, the SiC MOSFET can realize the same on-resistance with a chip size approximately 1/100 of that of the Si MOSFET. This enables a lower on-resistance in a smaller package, as well as a lower gate charge  $Q_g$  and capacitance.

#### $V_{DS} - I_D$ characteristics

Since its output characteristic has no knee voltage, unlike that of the IGBT, the SiC MOSFET achieves a low conduction loss over a wide current range, from small to large currents. Furthermore, while the on-resistance of the Si MOSFET increases by a factor of two or more from room temperature to 150°C, the SiC MOSFET shows a relatively low rate of increase, which makes thermal design easier and keeps the on-resistance low even at high temperature.



**Figure 1.5:**  $V_{DS} - I_D$  characteristics at 25°C (left) and 150°C (right) [source: ROHM]

The drift layer resistance of the SiC MOSFET is lower than that of the Si MOSFET. However, since the mobility in the MOS channel is limited by the current level of technology, the channel resistance is higher than in Si devices. A lower on-resistance can therefore be obtained with a higher gate voltage (the improvement saturates gradually above  $V_{GS} = 20$ V). The SiC MOSFET cannot deliver its full low on-resistance performance if it is driven at  $V_{GS} = 10$ to $15$ V, the range typically used for IGBTs and Si MOSFETs.

Therefore, manufacturers recommend driving the SiC MOSFET around  $V_{GS} = 18$  V in order to obtain a sufficiently low on-resistance (source: ROHM [9]).

#### **Temperature coefficient of on-resistance**



**Figure 1.6:** *Temperature characteristics of standardized*  $R_{DS,on}$  *of 650 V SiC MOSFET, Si MOSFET, and Si IGBT [source: ROHM [9]]* 

The on-resistance of a typical high-breakdown-voltage Si MOSFET increases significantly at high temperature. The reason is as follows: the resistance of the drift layer ($R_{EPI}$), which accounts for 90% or more of the on-resistance of the device, approximately doubles when the temperature increases by 100°C. The drift layer resistance of SiC increases in the same way. However, the rate of increase of the on-resistance of the whole device is lower than for the Si MOSFET, because the drift layer accounts for a smaller proportion of the on-resistance in a SiC device, which contains many other resistance components.
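The effect of the drift-layer share can be made concrete with a small calculation. The sketch below assumes that the drift resistance doubles for every 100°C of temperature rise and, as a simplification, that all other resistance components stay constant. The 90% drift share for Si comes from the text, while the 30% share used for SiC is a hypothetical value chosen only for illustration.

```python
def drift_scaling(delta_t_c):
    """Drift resistance multiplier, assuming it doubles for every 100 °C of temperature rise."""
    return 2.0 ** (delta_t_c / 100.0)


def normalized_on_resistance(drift_fraction, delta_t_c):
    """On-resistance relative to its room-temperature value.

    Only the drift-layer share scales with temperature; the remaining share is
    held constant, which is a simplifying assumption of this sketch.
    """
    return drift_fraction * drift_scaling(delta_t_c) + (1.0 - drift_fraction)


DELTA_T = 125.0  # temperature rise from 25 °C to 150 °C

si_ratio = normalized_on_resistance(0.90, DELTA_T)   # 90% drift share, as stated for Si
sic_ratio = normalized_on_resistance(0.30, DELTA_T)  # 30% drift share: hypothetical value for SiC

print(f"Si  MOSFET: R_on(150 °C) / R_on(25 °C) = {si_ratio:.2f}")
print(f"SiC MOSFET: R_on(150 °C) / R_on(25 °C) = {sic_ratio:.2f}")
```

Under these assumptions the Si device more than doubles its on-resistance at 150°C, while the SiC device increases by roughly 40%, which matches the qualitative trend described above.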

#### **Recovery characteristics of body diode**

The high-speed recovery of the body diode of a SiC MOSFET can reduce the turn-on loss  $(E_{on})$  by several percentage points. In terms of speed, it is similar to a SiC SBD (Schottky Barrier Diode). Although the body diode of the SiC MOSFET is a pn diode, the storage effect of minority carriers is scarcely observed because of their short lifetime, and a very fast recovery (several tens of nanoseconds), similar to that of the SBD, is obtained. The most notable feature of the SiC MOSFET is that the tail current observed in the IGBT is not generated in principle. With SiC, the high-speed MOSFET structure can be manufactured even at a breakdown voltage of 1200 V or greater.

Therefore, reduction in the turn-off loss  $(E_{off})$  by approximately 90% compared with the IGBT can be achieved, contributing to energy conservation of the circuit as well as simplification and downsizing of the cooling mechanism.

While the tail current in the IGBT increases at higher temperature, almost no temperature dependence exists in the MOSFET.

It is known that the breakdown voltage increases with temperature in SiC MOSFETs, just as in Si-based MOSFETs. The switching characteristics, instead, are largely unaffected by temperature, as Figure 1.7 from ROHM [9] shows:



 $V_{\rm DS}$ = 600V ,  $I_{\rm D}$ = 20A ,  $R_{\rm G_{EXT}}$ = 0 $\Omega$ 

Figure 1.7: Temperature dependence of switching loss [source: ROHM [9]]

#### Reliability

Since SiC power devices are used in industrial equipment with a generally long service life, the device performance must be maintained over a long period. Furthermore, since they are often used in harsh environments and subjected to very large thermal and electrical stress, they must be evaluated under various conditions, and the period over which performance is maintained must be guaranteed (life design). Ideally, reliability should be evaluated under the stress conditions to which the devices will be subjected in their operating environment.

However, since products with a service life of over 10 years cannot be evaluated under exactly these conditions, alternative testing methods, such as accelerated tests, are generally used.

In this case, it is important to understand the operating environment correctly and set the test conditions, including the acceleration factor, accordingly.

As for the chip structure of the SiC MOSFET, reliability tests must anticipate various types of external stress, including thermal, electrical, and mechanical ones. In addition, since these stress factors are often combined under actual operating conditions, it is important to evaluate reliability under conditions as close as possible to actual use.



**Figure 1.8:** Failure mode due to  $\Delta T_{vj}$  power cycling [source: ROHM [9]]

When the MOSFET is switched ON and OFF within a few ms to a few seconds, the junction temperature of the internal chip  $(T_{vj})$  may vary by a certain amount  $(\Delta T_{vj})$ , even when the case temperature  $(T_c)$  appears to be constant. A  $\Delta T_{vj}$  generated in such a short period causes thermal stress on the bonding surface because of the difference between the linear expansion coefficients of the source wire and the SiC chip.

Then, if the number of  $\Delta T_{vj}$  cycles exceeds a certain value, the interface cracks and the bonding strength is reduced, as shown in Figure 1.8. Repeating this ON/OFF behaviour N times is a specific lifetime test called Power Cycling; it will be investigated in the next chapter and it is the basis of the lifetime modeling of power electronic components.

Finally, separation of the source wire or an increase in the contact resistance of the bonding surface can cause  $R_{DS,on}$  to increase, which raises heat generation and can damage the power device. In fact,  $V_{DS}$  is one of the two parameters monitored in these lifetime tests [1], since it quantifies how much the module has degraded during the test. For the power cycle rating of the source wire, the life tends to shorten exponentially as  $\Delta T_{vj}$  increases. To secure a product life of 10 years or longer,  $\Delta T_{vj}$  must be kept sufficiently low during the design phase (e.g. by adjusting the driving conditions or selecting elements with low  $R_{DS,on}$ ), together with a suitable cooling design. This concept is explored further in the next chapter, since lifetime modeling is the goal of this thesis.
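One common empirical way to express this strong dependence on  $\Delta T_{vj}$  is a Coffin-Manson type power law, $N_f = A \cdot \Delta T_{vj}^{-n}$. The sketch below uses this form only as an illustration; it is not necessarily the model developed in the next chapter, and the scale factor `A` and exponent `n` are hypothetical values rather than fitted data.

```python
def cycles_to_failure(delta_t_vj, scale=1.0e14, exponent=5.0):
    """Coffin-Manson type estimate N_f = A * dT^(-n); `scale` and `exponent` are hypothetical."""
    return scale * delta_t_vj ** (-exponent)


def lifetime_gain(delta_t_old, delta_t_new, exponent=5.0):
    """Factor by which the number of cycles grows when the swing drops from old to new."""
    return (delta_t_old / delta_t_new) ** exponent


for delta_t in (40.0, 60.0, 80.0, 100.0):
    print(f"dT_vj = {delta_t:5.1f} K -> N_f ~ {cycles_to_failure(delta_t):.2e} cycles")

print(f"Halving dT_vj from 80 K to 40 K multiplies N_f by {lifetime_gain(80.0, 40.0):.0f}")
```

With the assumed exponent of 5, halving the temperature swing extends the estimated number of cycles by a factor of 32, which shows why reducing  $\Delta T_{vj}$  is so effective at the design stage.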

#### **Inverter Implementation**

A three-phase inverter converts a DC input into a three-phase AC output. Its three legs are normally phase-shifted by 120° with respect to each other so as to generate a three-phase AC supply. The goal is to take a steady DC voltage and, by means of six switches (e.g. transistors), emulate a three-phase sinusoidal waveform whose frequency and amplitude are adjustable.

An automotive inverter system has a DC-link capacitor for energy storage and voltage stabilization, connected on one side to the vehicle battery and on the other side to the inverter bridge. The six transistors (power MOSFETs) are arranged in three pairs connected in parallel, where each pair consists of two transistors in series forming a leg of the inverter. The middle node between the two transistors of a leg is the phase node of that leg. From each phase node, an output terminal carries one of the three phases of the inverter to the three-phase electric motor.



Figure 1.9: Typical Three-Phase Inverter Circuit

The controller shown in Figure 1.9 measures, among hundreds of parameters, the speed of the motor through a speed/position sensor in order to feed the information back, process it and act accordingly through the gate driver.

We can simplify the schematic by replacing each transistor with a simple ON/OFF switch in order to understand how the sine wave is formed. The gate driver opens and closes the switches based on the requests of the micro-controller of the ECU.



Figure 1.10: Simplified version of a Three-Phase Inverter [25]

Each of the three binary digits shown in Figure 1.10 refers to one bridge leg, where the value 1 indicates that the top transistor is closed and the value 0 indicates that the bottom transistor is closed. The switching combinations of these three legs generate a sine wave according to the PWM pattern applied through the gate driver. The rate at which the switches are turned on and off is the switching frequency, and SiC power modules in particular are capable of very high switching frequencies compared to Silicon IGBTs.
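The eight possible switching combinations can be enumerated with a few lines of code. The sketch below assumes ideal switches and a balanced star-connected load, for which the line-to-neutral voltage of phase a is $V_{DC}(2a - b - c)/3$ (and similarly for the other phases); the DC-link voltage of 800 V is a hypothetical value.

```python
from itertools import product


def phase_voltages(state, v_dc):
    """Line-to-neutral voltages of a balanced star-connected load for one switching state.

    `state` is a tuple (a, b, c): 1 means the top switch of that leg is closed,
    0 means the bottom switch is closed.
    """
    a, b, c = state
    return (
        v_dc * (2 * a - b - c) / 3.0,
        v_dc * (2 * b - c - a) / 3.0,
        v_dc * (2 * c - a - b) / 3.0,
    )


V_DC = 800.0  # hypothetical DC-link voltage in volts

for state in product((0, 1), repeat=3):
    v_a, v_b, v_c = phase_voltages(state, V_DC)
    print(f"{state}: Va = {v_a:7.1f} V, Vb = {v_b:7.1f} V, Vc = {v_c:7.1f} V")
```

The states (0, 0, 0) and (1, 1, 1) apply no voltage to the load, while the six remaining states are the active ones that the PWM pattern alternates to synthesize the sinusoidal output.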

This thesis is focused on the lifetime of a single power module and since our DUT has two MOSFETs that form a leg, we will study the damage only on one phase out of three of the inverter.

# **1.3 DSC SiC MOSFET**

The DUT (device under test) that we refer to is a high-power-density, high-efficiency dual-side cooled (DSC) Silicon Carbide power module that is part of a three-phase automotive inverter system. The package of the DUT contains two MOSFETs in series that form a leg of the inverter, a high-side and a low-side device, with the phase node between them. Figure 1.11 shows the internal body diodes of each transistor.



Figure 1.11: DUT circuit schematic

#### **Body Diode**

Current SiC MOSFET modules are divided into two types: those that contain only SiC MOSFETs in parallel, and those composed of SiC MOSFETs and SiC SBDs (Schottky Barrier Diodes) in parallel. However, the thermosensitive electrical parameter measurement is only applicable to the first type of module. Our SiC DUT uses its body diode as the anti-parallel diode, without the need for an additional Schottky Barrier Diode (SBD). This choice makes it possible to measure the voltage of the internal body diode of the SiC junction.

This measurement is possible only when no additional diode is connected in parallel, because the threshold voltage of the Schottky diode is lower than that of the internal junction  $(V_{th,SBD} < V_{th,BODY})$. If a Schottky diode were present, it would start conducting at lower voltages and it would be impossible to measure the voltage of the body diode.

As previously anticipated, the voltage reading of the body diode will be very useful since we are going to use it as an internal temperature sensor: in the next sections we will investigate the voltage measurement of the diode, from which we can directly estimate the junction temperature of the power module.

## 1.3.1 Dual-side Cooling

The fundamental purpose of heat sinks and airflow is to allow high power dissipation levels while maintaining safe junction temperatures. This module takes cooling to another level by implementing water cooling on both sides, a dual-side cooling (DSC).



Figure 1.12: Single vs Dual-side cooling

During normal operation hundreds of amperes will flow through the power module without any issue; being a device for the automotive industry, it has been built with the intention of sourcing large amounts of power.

For this reason, an ad-hoc cooling system for its dual side parts has been properly designed.



Figure 1.13: Typical Dual-side cooled Power Module [source: OnSemi [26]]

Experimental results [11] have shown a 35% reduction in thermal resistance when implementing DSC. In high-speed switching modules this means lower switching losses.



**Figure 1.14:** *Experimental results of thermal resistance using single-sided and double-sided direct-cooling structures*

For this type of power module, the AQG 324 Guideline [1] specifies that temperature measurements must be performed with simultaneous cooling from both sides, and that two heat sink temperatures  $T_{S1}$  and  $T_{S2}$  must be measured. The sensors must be placed in blind holes on each side, centrally below the DUT. Each blind hole must have a diameter of 2.5 mm and end  $2\pm 1$  mm below the heat sink surface, see Figure 1.15.

The thermal resistance  $R_{th,j-s}$  must therefore be determined using the well-known relation between power, thermal resistance and temperature,  $R_{th} = \frac{\Delta T_j}{P_{loss}}$, which for dual-side cooling can be written as follows:

$$R_{th,j-s} = \frac{T_{vj} - (\frac{T_{S1} + T_{S2}}{2})}{P_{loss}}$$
(1.2)
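As a minimal sketch of Equation 1.2, the snippet below computes $R_{th,j-s}$ from the virtual junction temperature, the two heat sink temperatures and the power loss. It is written in Python purely for illustration; the function name and the numerical values are hypothetical and are not measurements of the DUT.

```python
def rth_junction_to_sink(t_vj, t_s1, t_s2, p_loss):
    """Equation 1.2: junction-to-sink thermal resistance in K/W.

    t_vj       virtual junction temperature [degC]
    t_s1, t_s2 heat sink temperatures on the two cooled sides [degC]
    p_loss     dissipated power [W]
    """
    if p_loss <= 0:
        raise ValueError("p_loss must be positive")
    # With dual-side cooling the reference is the mean of both sink temperatures.
    t_sink_mean = (t_s1 + t_s2) / 2.0
    return (t_vj - t_sink_mean) / p_loss


# Hypothetical operating point, for illustration only.
r_th = rth_junction_to_sink(t_vj=150.0, t_s1=64.0, t_s2=66.0, p_loss=500.0)
print(f"R_th,j-s = {r_th:.3f} K/W")
```

Only temperature differences enter the formula, so degrees Celsius and kelvin can be used interchangeably here.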



**Figure 1.15:** *Reference points for determining the heat sink temperatures*  $T_{S1}$  *and*  $T_{S2}$  *for double sided cooling modules [source: AQG 324 [1] ]* 

## **1.3.2** NTC Thermistor

The device under test is equipped with an NTC thermistor, so the temperature inside the module case can be monitored. A Negative Temperature Coefficient (NTC) thermistor is a type of resistor whose resistance is strongly dependent on temperature, more so than in standard resistors. NTCs have less resistance at higher temperatures.

The resistance value at 25°C and the B constant (temperature coefficient) of a thermistor are specified in its product specifications. The NTC has been chosen by the manufacturer and it has its own conduction model. The resistance value of a thermistor at temperature $T_1$ is derived from Equation 1.3 below.

$$R(T_1) = R(T_0) * e^{B_{T_0/T_1} * (\frac{1}{T_1} - \frac{1}{T_0})}$$
(1.3)

- $T_0$ : reference temperature, 25°C in general
- $T_1$ : temperature of the thermistor being detected
- $R(T_1)$ : resistance value of the thermistor read from the instrument
- $R(T_0)$ : resistance value of the thermistor at the reference temperature
- $B_{T_0/T_1}$ : constant for the specific NTC thermistor
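The sketch below evaluates Equation 1.3 and its inverse, which is the form needed in practice to convert a resistance reading into a temperature. It is an illustrative Python example: the function names, the nominal resistance of 68 kΩ taken as $R(T_0)$ and the B constant of 4000 K are assumptions, not datasheet values of the DUT. Note that the temperatures in the exponent must be expressed in kelvin.

```python
import math

KELVIN_OFFSET = 273.15


def ntc_resistance(t1_c, r0, b, t0_c=25.0):
    """Equation 1.3: resistance [ohm] at temperature t1_c [degC]."""
    t1 = t1_c + KELVIN_OFFSET
    t0 = t0_c + KELVIN_OFFSET
    return r0 * math.exp(b * (1.0 / t1 - 1.0 / t0))


def ntc_temperature(r1, r0, b, t0_c=25.0):
    """Inverse of Equation 1.3: temperature [degC] from a resistance reading."""
    t0 = t0_c + KELVIN_OFFSET
    inv_t1 = 1.0 / t0 + math.log(r1 / r0) / b
    return 1.0 / inv_t1 - KELVIN_OFFSET


# Assumed parameters, for illustration only.
R0 = 68e3   # resistance at the reference temperature [ohm]
B = 4000.0  # B constant [K]

r_hot = ntc_resistance(100.0, R0, B)
print(f"R(100 degC) = {r_hot:.0f} ohm")
print(f"T(R = {r_hot:.0f} ohm) = {ntc_temperature(r_hot, R0, B):.1f} degC")
```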



**Figure 1.16:** *NTC Thermistor of* 68  $k\Omega$  [source: Vishay [24]]

It's important to specify that the thermistor is mounted on the isolated substrate of the power module, so it's not on the main heat radiation path. This means that a certain thermal resistance exists between the NTC and the SiC MOSFET junction, so the thermistor can only approximate the temperature of the junction. The difference between the two temperatures also depends on the external cooling conditions.

Generally the maximum temperature of the junction of a SiC MOSFET is around 200°C.

However, the NTC thermistor can't be used to monitor the junction temperature in real-time or in the transient state since it can't be physically placed close enough to the MOSFET's junction. Its reading therefore gives us a qualitative indication of the overall temperature inside the package, but not an estimation of the virtual junction temperature. That's why the electrical measurement performed through the voltage readings from the diode is very useful.

# Chapter 2

# **Lifetime Modeling**

## 2.1 ECPE AQG 324 "Automotive Qualification Guideline"

The AQG324 represents an industry guideline based on best practices and outstanding requirement engineering alignment through the automotive supply chain for power electronics converter units. The original version is based on the supply specification LV 324 which has been developed by German automotive OEMs together with representatives from the power electronics supplier industry in a joint working group of ECPE and the German ZVEI association. The ECPE is the European Center for Power Electronics, ZVEI is the association of the electrical and digital industry and promotes the industry's joint economic, technological and environmental policy interests on a national, European and global level.



This document defines requirements, test conditions and tests for validating properties, including the lifetime of power electronics modules and equivalent special designs based on discrete devices, for use in power electronics converter units (PCUs) of motor vehicles up to 3.5 tonnes gross vehicle weight.

The tests that the document proposes concern the module design as well as the qualification of devices on module level (i.e. the assembly), but not the qualification of semiconductor chips or manufacturing processes. In particular, these tests are left to the manufacturer and are subject to different standards.

The requirements, test conditions and tests listed in the main document essentially refer to power modules based on Si power semiconductors while the specialities of SiC-based power modules are addressed in Annex III.A of the guideline. Future releases of the AQG 324 Guideline will address further wide band-gap power semiconductors (e.g. GaN). The tests listed in this document also apply for validating power module properties when using a thermal interface between the power module and the cooling system on PCU-level, if this interface is not a part of the module structure as a result of the design.

In our specific case, we are going to test the module and build the lifetime model following what the guideline says about our DUT: a SiC-based Power Module with Dual-Side Cooling (DSC) and a thermal paste as a Thermal Interface Material (TIM).

The AQG 324 guideline describes different typologies of testing:

- Module test, which determines the electrical and mechanical parameters,
- Characterization testing, for stray inductances, thermal resistance, short-circuit capability,
- Environmental testing like Thermal shock and Vibration,
- Lifetime testing like Power Cycling, High/Low Temperature/Humidity storage.

These tests cover the majority of the conditions that the DUT will encounter during its life and aim to stress the most common failure modes identified from literature or from the manufacturer's experience.

Besides general characterization tests to assess the performance limits of the DUT, environmental tests are needed: variations in temperature cause different types of damage depending on the materials chosen by the manufacturer, and vibrations that simulate different road conditions can cause cracks on the PCBs or near the solder joints.

Since the scope of this project is to estimate the damage of the power module under typical working conditions, we are interested in the lifetime model and how it compares to a real-life scenario. It is important not to change the conditions, the devices or any other material directly involved in the test. All changes concerning the module and semiconductor design must be reported, as well as any process-related changes.

A new technology qualification will be needed if one of those conditions varies.

In this regard, we are going to investigate the Lifetime testing section in order to build and validate, as a final goal, a more reliable model based on real-life data.

The AQG Guideline considers the following lifetime tests:

- Power cycling (shorter, $PC_{sec}$)
- Power cycling (longer, $PC_{min}$)
- High-temperature storage (HTS)
- Low-temperature storage (LTS)
- High-temperature reverse bias (HTRB)
- High-temperature gate bias (HTGB)
- High-humidity High-temperature reverse bias (H3TRB)

As their names suggest, these lifetime tests all focus on how temperature impacts the device under test: keeping the device at high/low storage temperature simulates the conditions of the car during air/sea transport, while the off-time that the car will experience in garages from Miami to Helsinki is taken into account in the high-temperature high-humidity test.

The Power Cycling test (both  $PC_{sec}$  and  $PC_{min}$ ) instead focuses on the thermal jump history of the device: it repeatedly reproduces a cycle of heating up  $(t_{ON})$  and cooling down  $(t_{OFF})$  by means of a large current, up to 85% - 90% of the maximum current, that flows into the module for a specific  $t_{ON}$.

It is the basis for verification of the lifetime model provided by the module manufacturer for the DUTs; since these two tests can also be used to support the creation of the lifetime model, we are primarily going to focus on them. In our case, the goal is to create a model that will be improved over time with experimental data, so in this regard we internally started designing a hardware test rig to perform the Power Cycling ( $PC_{sec}$ ,  $PC_{min}$ ) tests.

# 2.2 Lifetime Testing

Lifetime testing has the objective of triggering the typical degradation mechanisms of power electronics modules. This process primarily differentiates between two failure mechanisms – fatigue of close-to-chip interconnections (chip-near) and fatigue of interconnections with a wider distance to the chip (chip-remote).

Both failure mechanisms are triggered by thermo-mechanical stress between the different materials, with different thermal expansion coefficients, in each case.

The reliability of both chip-near and chip-remote interconnections depends on the thermal interface to the cooling system. For this reason, module qualification tests relating to these interconnections can only be performed using an application-based setup for modules without direct connection to the cooling system (e.g. modules without base plate, connected via TIM). The number of devices under test (DUTs) for environmental and lifetime testing must be agreed upon between the PCU manufacturer and the module manufacturer in advance, following the test flow chart defined in Annex I.A of the AQG 324 document. In particular, the test flow chart attempts to minimize the number of DUTs by using them multiple times in case of non-destructive testing.



Figure 2.1: Test flow chart

## 2.2.1 Power Cycling

The objective of this test is to generate targeted stress situations in a power electronics module under strongly accelerated conditions which lead to signs of wear and degradation on the module. The test consists of a high pulsed current that flows into the DUT in a short period of time: this pulse will dramatically increase the temperature of the DUT from  $T_{vj,min}$  to  $T_{vj,max}$ . In a general power module structure, if the chip temperature varies during the operation, stress generated due to difference in the linear expansion coefficient of the aluminum wire and the SiC chip causes a crack on the wire bonding surface. If this crack advances, a failure finally occurs in the separation mode.



Figure 2.2: Current and temperature curve for PCsec

Figure 2.2 shows  $\Delta T_{vj}$ , which is the swing of the virtual junction temperature, and  $T_{c/s}$ , the temperature of the generalised contact surface "c" or "s".

By varying the key parameter  $t_{ON}$  (on-time of the load current) one can obtain two different test conditions:

- $t_{ON} < 5s \rightarrow PC_{sec}$ , which exerts targeted stress on the chip-near interconnections (die-attach and top-side contacting), numbered with 7 in Figure 2.3 below.
- $t_{ON} > 15s \rightarrow PC_{min}$ , which applies stress to the chip-remote interconnection (system soldering) as well as to the chip-near interconnection technology (die-attach, top-side contacting), numbered respectively with 8 and 7 in Figure 2.3 below.

The chip-near interconnection technology describes a design of the chip topside connection as well as the chip backside connection with the substrate.

The chip-remote interconnection technology describes a design of the connections which do not directly include the chip. For this, a differentiation must be made between electrical and thermal interfaces. As a result of the design, chip-remote interconnection technology can be electrical as well as thermal. Signs of stress can then be found in various locations of the chip interconnections:

- chip-near: wire bonding, copper clip, sintering technology, chip soldering
- chip-remote: soldering between substrate and base-plate or between module and cooling system



**Figure 2.3:** On the left, sample cross section of a power module: 1. Base plate, 2. Solder, 3. Ceramic insulator, 4. Copper, 5. Chip, 6. Bonding wire, 7. Chip-near connection, 8. Chip-remote connection. On the right, a schematic diagram of the thermal expansion coefficient in the individual layers

The results of this test are the reliability data for the module-specific, chip-near interconnection technology, as well as the marking of these data in the numerical representation of the lifetime curve  $N_f = f(\Delta T_{vj}, T_{vj,max}, t_{on})$ , the number of cycles to failure of the DUT. This curve is obtained with module parameters that must be provided by the manufacturer and, as we will see, it is a function of:

- the thermal jump  $\Delta T_{vj}$  of the device junction
- the maximum junction temperature  $T_{vj,max}$
- the interval of time  $t_{on}$  during which this  $\Delta T_{vj}$  occurs.

| Parameter                   | Symbol     | Value                |
|-----------------------------|------------|----------------------|
| On-time of the Load Current | $t_{on}$   | $< 5s$ ($PC_{sec}$)  |
|                             | $t_{on}$   | $> 15s$ ($PC_{min}$) |
| Value of Load Current       | $I_L$      | $> 0.85 * I_{DN}$    |
| Gate Voltage                | $V_{gate}$ | 20 V                 |
| Coolant Flow-rate           | $Q_{cool}$ | constant             |

$I_{DN}$: Drain Nominal current of the SiC-MOSFET

 Table 2.1: Limits for test parameters PCsec / PCmin
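The limits of Table 2.1 can be expressed as a simple pre-test check. The following Python sketch is only an illustration: the function names are hypothetical, and the handling of on-times between 5 s and 15 s (returned as `None`) is an assumption, since the table does not assign them to either test.

```python
def classify_on_time(t_on_s):
    """Map the on-time of the load current [s] to the power cycling test type."""
    if t_on_s < 5.0:
        return "PC_sec"
    if t_on_s > 15.0:
        return "PC_min"
    return None  # between the two limits: neither test is defined here


def check_power_cycling_setup(t_on_s, i_load, i_dn):
    """Return (test type, load current within limit) according to Table 2.1."""
    test_type = classify_on_time(t_on_s)
    current_ok = i_load > 0.85 * i_dn
    return test_type, current_ok


# Hypothetical setup: 2 s pulses of 270 A on a module with I_DN = 300 A.
print(check_power_cycling_setup(t_on_s=2.0, i_load=270.0, i_dn=300.0))
```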

Any change in the limits of the parameters or in the test conditions must be chosen by the manufacturer before the test begins and must be documented according to the AQG Guideline. During operation in actual applications, while the case temperature  $T_c$  of the module varies relatively moderately over a longer period, the variation in the junction temperature  $T_{vj}$  can be steep with a shorter time cycle, as shown in Figure 2.4. The main reasons for the short time cycle include the acceleration and deceleration operations of the device. The cycle is also generated constantly by the operation of the circuit topology. In contrast, the long time cycle is caused by the starting and stopping times of the device, among other factors.



**Figure 2.4:** Power cycle operation mode:  $T_c$  vs  $T_{vj}$ 

#### End of life criteria (EOL)

Power Cycling is the basis of the lifetime modeling analysis of a power module, which means that the degradation of the device is evaluated in terms of "failure criteria". The AQG Guideline identifies two parameters that must be monitored in order to declare that the device has failed: the forward voltage of the diode  $(V_F)$  or MOSFET  $(V_{DS,on})$  and the thermal resistance of the internal silicon junction. It's interesting to note that these criteria are the very same for both Silicon and Silicon-Carbide semiconductors. The failure criteria are defined as follows:

| Parameter                               | Change from standard value |
|-----------------------------------------|----------------------------|
| Increase of forward voltage $V_{DS,on}$ | +5%                        |
| Increase of thermal resistance $R_{th}$ | +20%                       |

 Table 2.2: End Of Life (EOL) criteria for PCsec / PCmin

As soon as one of these two parameters increases by its specific drift offset, we can claim that the DUT has failed. This means that even though the device might still function and switch, and current may still flow according to its nominal value, it must be considered as broken. For reliability purposes, this means that it is not safe to use the device since it can behave abnormally. Cracks in the solder joints, in the substrates or in the die parts are expected after the end of the test, which suggests the use of scanning measurements like acoustic microscopy and tomography. The  $V_{DS,on}$  will be the only parameter that we will investigate in the HW test bench that we are designing.
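The end-of-life criteria of Table 2.2 can be checked with a few lines of code. The Python sketch below is a hypothetical helper, not part of the thesis test bench: it compares the current readings with the initial values measured at the start of the test and reports which criterion, if any, has been reached. All numerical values are made up for illustration.

```python
VDS_ON_LIMIT = 0.05  # +5 % increase of V_DS,on (Table 2.2)
RTH_LIMIT = 0.20     # +20 % increase of R_th (Table 2.2)


def end_of_life(vds_on_0, vds_on, rth_0, rth):
    """Return the list of EOL criteria reached by the DUT (empty if none).

    vds_on_0, rth_0  values at the beginning of the test
    vds_on, rth      current values
    """
    reasons = []
    if (vds_on - vds_on_0) / vds_on_0 >= VDS_ON_LIMIT:
        reasons.append("V_DS,on")
    if (rth - rth_0) / rth_0 >= RTH_LIMIT:
        reasons.append("R_th")
    return reasons


# Hypothetical readings: V_DS,on drifted by 6 %, R_th by about 12 %.
print(end_of_life(vds_on_0=1.00, vds_on=1.06, rth_0=0.17, rth=0.19))
```

A single criterion is enough to declare the DUT failed, so any non-empty result ends the test for that device.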

As for the  $R_{th}$  of the module, it is left to the manufacturer to characterize it in the way the AQG Guideline indicates. In our case, the  $V_{DS,on}$  is more sensitive and easier to monitor, so for this project we are going to investigate this parameter only.

As we will further discuss in the next section, we are not going to include the  $R_{th}$  measurement in our test bench; instead we will use raw data of each layer of the chip to build its thermal network and simulate in every iteration of the script the junction temperature  $T_{vj}$  based on the current power loss.

The lifetime data  $N_f = f(\Delta T_{vj}, T_{vj,max}, t_{on})$  determined for the individual DUTs during the test must be marked in the reliability curve for the power electronics module provided by the manufacturer.

It must be ensured that only DUTs are used whose failure patterns have identical failure mechanisms. DUTs with deviating failure mechanisms must be removed and the test must be repeated with new DUTs. Failures of the semiconductor which cannot be clearly attributed to the aging of the assembly and interconnection technology are excluded from the analysis.

The parameters are verified with the use of a module test, the electro-mechanical characterization of the DUT, which must be conducted before and after this qualification test. The results and parameters of the test must be documented so that lifetime data can be created.
## 2.3 MATLAB Script Implementation

A major life test described in the AQG 324 is the power cycling, which induces several wear out mechanisms on the power modules in different areas. These tests are used both to create and verify the lifetime model of the DUT, that is provided by the power module manufacturer.

In the Annex II.D of the AQG 324 guideline called "Guideline for Lifetime Calculation of Power Modules" the steps of mission profile simulation are described, that make use of the lifetime model that is verified with the Power Cycling tests.



Figure 2.5: MATLAB Script Flow

The input of the lifetime model is a mission profile generated by the Reliability department. The company where this model was created, like every car manufacturer, has its own mission profiles that are based on the experience and projections for that specific car model, its specifications and performance requirements. Starting from a mission profile, the model estimates the damage of the power module under test. The first step is to calculate the loss profile in time, which is fundamental to estimate the junction temperature through the thermal model of the chip. Once we have a thermal-jump history of the DUT, we can use the temperature variation per switching cycle  $\Delta T_{vj}$ , its length in time and how many times it occurs in order to calculate the acceleration factor (AF) of the mission profile under observation. Finally, the AF quantifies the amount of damage that the DUT has experienced. It is fundamental to know:

- the parameters of the application:  $I_L, V_{DC}, m, cos(\phi), f_{sw}, T_{vj}$ ,
- the electrical characteristics of the chip:  $V_{DS,on}, V_{diode}, E_{on}/E_{off}, E_{cond}, E_{diode},$
- the electro-thermal behaviour of the module: thermal network, the cooling profile of the fluid,
- a precise lifetime model characterized from experimental data, information given by the manufacturer.

The type of electrical drive and the topology of the drive system will influence different parameters.

### **Power Loss**

The mission profile of the vehicle is composed of the switching frequency, the varying phase currents, the DC voltages, the modulation index and the  $\cos(\phi)$  of the inverter. Additionally, mission profiles can be generated from requirements on the number of specific load cycles, e.g. boost, recuperation, motor start during engine operation, warm starts.

In order to evaluate the losses in the power electronics components, a complete knowledge on the electrical drive is necessary: while the mission profiles are internally generated, the manufacturer provides the forward on-voltage of the MOSFETs  $V_{DS,on}$ , the turn-on and turn-off energies  $E_{on}/E_{off}$  as well as the limits and the thermal resistances and capacitances of the layers that constitute the power module.

The switching losses are strongly influenced by the inverter design, in particular by the gate driver circuit and the choice of the duty-cycle and the dead-time between the turn-on of the two internal MOSFETs. Starting from the static electrical characteristics of the module, the power loss distribution is then computed and it is function of these parameters that are specific for each mission profile:

$$P_{loss} = f(I_L, V_{DC}, m, \cos(\phi), f_{sw}, T_{vj})$$
(2.1)

As we can see from formula 2.1, the power losses are affected not only by the load current  $I_L$  and the DC bus voltage  $V_{DC}$ : the modulation index and the power factor have a strong influence on the loss sharing between MOSFET and diode. The switching frequency and the junction temperature must also be considered.

In fact, among all these parameters, the junction temperature is fed back into the power loss calculation in a closed loop shown in Figure 2.6:



Figure 2.6: Loop between Ploss - Zth - Tjunc
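The feedback between power loss and junction temperature can be illustrated with a steady-state fixed-point iteration. The Python sketch below is a strongly simplified assumption and not the thesis MATLAB script: it considers only a conduction loss $I_{rms}^2 R_{DS,on}(T_{vj})$ with a hypothetical linear temperature coefficient, and replaces the thermal impedance with a single thermal resistance to the heat sink. All parameter values are invented for illustration.

```python
def conduction_loss(i_rms, t_vj, r_on_25=2.0e-3, alpha=0.004):
    """Assumed loss model: R_DS,on grows linearly with T_vj (alpha in 1/K)."""
    r_on = r_on_25 * (1.0 + alpha * (t_vj - 25.0))
    return i_rms ** 2 * r_on


def solve_junction_temperature(i_rms, t_sink, r_th, tol=1e-6, max_iter=100):
    """Iterate P_loss -> T_vj -> P_loss until the temperature settles."""
    t_vj = t_sink  # start from the heat sink temperature
    for _ in range(max_iter):
        p_loss = conduction_loss(i_rms, t_vj)
        t_new = t_sink + r_th * p_loss  # steady state: T_vj = T_s + R_th * P
        if abs(t_new - t_vj) < tol:
            return t_new, p_loss
        t_vj = t_new
    raise RuntimeError("electro-thermal loop did not converge")


t_vj, p_loss = solve_junction_temperature(i_rms=300.0, t_sink=65.0, r_th=0.17)
print(f"T_vj = {t_vj:.1f} degC, P_loss = {p_loss:.1f} W")
```

The iteration converges here because each pass changes the temperature by only a fraction of the previous change; with a much larger thermal resistance or temperature coefficient the same loop would indicate thermal runaway by failing to converge.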

The total power loss of a SiC MOSFET takes into account both the static and the dynamic part. Double Pulse Tests (DPT) are used to capture the switching transients of a semiconductor as a method to determine the power losses (efficiency) of the chip. Two consecutive pulses are used since it's important to build up current in the complementary device or diode, so that when the switch turns on the effects of any reverse recovery current can be evaluated.

Besides the reverse recovery effect of the diode, parasitic elements and temperature-dependent parameters are two other effects that must be taken into account when considering a power loss estimation method.



Figure 2.7: Components of SiC MOSFET's losses

• Turn-on parameters: Turn-on delay  $(t_{d,on})$ , rise time  $(t_r)$ ,  $t_{on}$  (turn-on time),  $E_{on}$  (On-Energy),  $\frac{dv}{dt}$ , and  $\frac{di}{dt}$ . Energy loss is then determined.

$$E_{on} = \frac{1}{2} * V_{ds} * I_d * t_{on}$$
(2.2)

• Turn-off parameters: Turn-off delay  $(t_{d,off})$ , fall time  $(t_f)$ ,  $t_{off}$  (turn-off time),  $E_{off}$  (Off-Energy),  $\frac{dv}{dt}$ , and  $\frac{di}{dt}$ . Energy loss is then determined.

$$E_{off} = \frac{1}{2} * V_{ds} * I_d * t_{off}$$
(2.3)

Switching losses will then be obtained with:

$$P_{sw,on} = E_{on} * f_{sw}$$

$$P_{sw,off} = E_{off} * f_{sw}$$
(2.4)
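Equations 2.2 to 2.4 can be evaluated directly, as in the following Python sketch. The triangular approximation of the switching energy is the one given above; the function names and the operating point (400 V, 300 A, 50 ns and 40 ns transitions, 10 kHz) are hypothetical values chosen only to show the order of magnitude.

```python
def switching_energy(v_ds, i_d, t_sw):
    """Equations 2.2 / 2.3: energy of one switching transition [J]."""
    return 0.5 * v_ds * i_d * t_sw


def switching_power(e_sw, f_sw):
    """Equation 2.4: average switching loss [W] at frequency f_sw [Hz]."""
    return e_sw * f_sw


# Hypothetical operating point, for illustration only.
V_DS, I_D, F_SW = 400.0, 300.0, 10e3
e_on = switching_energy(V_DS, I_D, t_sw=50e-9)
e_off = switching_energy(V_DS, I_D, t_sw=40e-9)
p_sw_on = switching_power(e_on, F_SW)
p_sw_off = switching_power(e_off, F_SW)
print(f"E_on = {e_on * 1e3:.2f} mJ, P_sw,on = {p_sw_on:.1f} W")
print(f"E_off = {e_off * 1e3:.2f} mJ, P_sw,off = {p_sw_off:.1f} W")
```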



Figure 2.8: ON state switching transition

• On-state parameters: the equivalent resistance  $R_{DS,on}$  of the device when it's turned on, the effective value of the through current  $I_{rms}$ 

$$P_{cond} = I_{rms}^2 * R_{DS,on}$$
(2.5)

• Reverse-recovery parameters: time  $t_{rr}$ , current  $I_{rr}$ , charge  $Q_{rr}$ , energy  $E_{rr}$ ,  $\frac{di}{dt}$ , and  $V_{sd}$  (forward on voltage).

Compared to conduction and switching losses, the leakage current of a SiC device in the blocking state is negligible.

Thanks to this property, the energy losses experienced by the diode are considered negligible [8]. Here dt denotes the dead-time between the turn-off of one MOSFET and the turn-on of the other in the same leg.



Figure 2.9: Reverse Recovery charge stored during t<sub>rr</sub>

The total power loss per switching period of one single MOSFET will then be the product between the energy losses and the switching frequency in that particular period:  $P_{xy,loss} = E_{xy,loss} * f_{sw}$ . Iterating this process for both the diode's and the MOSFET's contributions and for the whole mission profile, the total power loss will be the sum of the following:

$$P_{loss} = P_{sw,on} + P_{sw,off} + P_{cond} \tag{2.6}$$

These calculations are performed for every switching cycle to create a power loss profile in time that will be passed through the second part of the script, which will take care of the thermal network of the chip to estimate the temperature.

Generally, the key parameters of the device can be found in the datasheet given by the manufacturer but the measurement environment of the manufacturer and working state of the device are very different from the actual application, thus there is a certain degree of error between the actual dynamic and static characteristics of the device and the characteristics of the datasheet. To ensure the accuracy of the loss simulation model, it is necessary to test some key parameters of the devices.

The manufacturer has provided the majority of the parameters, measured in a real-life testing scenario, and these have been used to interpolate the data that the script produces. For example, reference [8] has simulated and measured the saturation voltage drop of the SiC MOSFET and the forward voltage drop of the anti-parallel diode. Since these two parameters both affect the on-state loss of the devices, it is necessary to take precise measurements in order to ensure the accuracy of the model. Figure 2.10 shows the curves of the saturation voltage drop of the MOSFET and the forward voltage drop of the diode at different temperatures.



**Figure 2.10:** *V-I characteristics of SiC MOSFET and anti-parallel SiC diode at different temperatures* [8]

Figure 2.10 shows that both voltages increase as temperature increases, which in turn increases the on-state losses.

Since the temperature of the junction of the module depends on the power loss of the module in that particular instant, the thermal impedance  $Z_{th}$  of the system is needed. In order to evaluate the  $Z_{th}$ , a thermal network formed by the layers of the chip must be modeled.

## **Thermal Network**

In many engineering investigations of the physical behaviour of mechanical, electrical, fluid dynamic, thermal and optical systems, the analytical or approximate numerical solution of the mathematical formulation of the problem may become very difficult and excessively time consuming. Analogies between systems of different energy domains are widely used by scientists and engineers [17]; thus an electrical system may be analogous to a mechanical, thermal or other system, provided there is a likeness between the two systems. In this type of modeling a lumped-element approach is used, which is valid when the size of an element is much smaller than the wavelength of the applied signals. In lumped elements, the effect of wave propagation can be neglected.

| Electrical            | Thermal                                                                       |
|-----------------------|-------------------------------------------------------------------------------|
| Voltage [V]           | Temperature [K]                                                               |
| Charge [C]            | Heat [J]                                                                      |
| Current [A]           | Heat flow-rate, $\frac{dQ}{dt}$ [W]                                           |
| Resistance $[\Omega]$ | Thermal Resistance $\theta_{XY} \left[\frac{K}{W}\right]$                     |
| Capacitance [F]       | Thermal Capacitance (Volumetric heat capacity) $\left[\frac{J}{K*m^3}\right]$ |
| Conductivity [S]      | Thermal Conductivity $\left[\frac{W}{K*m}\right]$                             |

 Table 2.3: Electro-thermal analogy



Figure 2.11: Heat transfer

The physical dimensions of lumped elements make it so that signals do not vary over the interconnects interfacing them. Complex systems are shown to be reducible to Norton/Thevenin equivalent circuits, which capture overall system performance in as little as one or two parameters. A key building block of this framework is Ohm's law, providing the basis of the electrical resistance concept. Similarities between these two different mechanisms (heat transfer and charge transfer) suggest that a similar network model could also apply to thermal networks.

There are two fundamental physical elements that make up thermal networks, thermal resistances and thermal capacitance. There are also three sources of heat, a power source, a temperature source, and fluid flow.

The heat dissipation path can be simulated by an electric circuit with the thermal resistance  $R_{th}$  and the heat capacity  $C_{th}$. The thermal resistance represents how strongly a specific material opposes the transfer of heat from one side to the other.

The thermal capacitance, also called thermal mass, is a measure of how much heat a specific object can store. If an object has thermal capacitance its temperature will rise as heat flows into the object, and the temperature will lower as heat flows out.



- $P_{JA}$  is the total power dissipated by the device from junction-to-ambient
- $T_{J,MAX}$  is typically 175 - 200°C for a SiC MOSFET junction

We will generally be interested in temperature differences, not absolute temperatures (much as electrical circuits deal with voltage differences). Therefore, we will generally take a reference temperature, and measure all temperatures relative to this reference.

In semiconductors, one temperature reference point is always the device junction  $T_{vj}$ , taken as the hottest spot inside the chip, operating within a given package. The other relevant reference point will be either the case of the device  $T_C$  or the ambient temperature  $T_A$  of the surrounding air.

This in turn leads to  $\theta_{JA}$  (junction-to-ambient) and  $\theta_{JC}$  (junction-to-case), two more specific terms used in dealing with thermal issues in electronics.

Finally, the network can be synthesized as a Foster or a Cauer network, depending on the topology used.



Figure 2.13: Cauer Network

The Cauer thermal model represents heat transfer through the multiple layers of a semiconductor module, such as the chip, the solder, the substrate and the base. Its resistive $R_{th}$ and capacitive $C_{th}$ parameters can be obtained either by calculations based on geometric parameters and thermal properties or by fitting the model to a thermal impedance curve. The individual RC elements can be assigned to the individual layers of the module.

$$Z_{th,tot} = \frac{R_{th,1} + A}{s * C_{th,1} * (R_{th,1} + \frac{1}{s * C_{th,1}} + A)},$$
(2.9)

$$A = \frac{R_{th,2} + B}{s * C_{th,2} * (R_{th,2} + \frac{1}{s * C_{th,2}} + B)},$$
(2.10)

$$B = \frac{R_{th,3} + C}{s * C_{th,3} * (R_{th,3} + \frac{1}{s * C_{th,3}} + C)},$$
(2.11)

$$C = \frac{R_{th,4}}{s * C_{th,4} * (R_{th,4} + \frac{1}{s * C_{th,4}})}$$
(2.12)

The nodes in a Cauer model have physical meaning and therefore allow access to the temperature of the internal layers  $T_{1,2,3}$ . As formula 2.9 suggests, the model is quite complex.
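The nested structure of formulas 2.9 to 2.12 can be evaluated numerically by starting from the innermost stage and working outwards. The following Python sketch is only an illustration, not the script used in this thesis: the function name and the four $R_{th}$/$C_{th}$ values are hypothetical placeholders.

```python
def cauer_impedance(s, r_th, c_th):
    """Evaluate Z_th,tot(s) of a Cauer ladder as in formulas 2.9-2.12.

    s    : Laplace variable (real or complex), e.g. 1j * omega
    r_th : thermal resistances [K/W], ordered from junction to case
    c_th : thermal capacitances [J/K], same order
    """
    z_tail = 0.0  # nothing beyond the last resistor (ideal heat sink)
    for r, c in zip(reversed(r_th), reversed(c_th)):
        branch = r + z_tail                       # R_th,k in series with the rest of the ladder
        z_tail = branch / (1.0 + s * c * branch)  # ... in parallel with 1/(s*C_th,k)
    return z_tail


# Hypothetical four-layer module (placeholder values, not datasheet data)
R_TH = [0.05, 0.10, 0.08, 0.02]   # K/W
C_TH = [0.01, 0.05, 0.50, 2.00]   # J/K

z_dc = cauer_impedance(0.0, R_TH, C_TH)        # steady state: all capacitors open
z_hf = cauer_impedance(1j * 1.0e6, R_TH, C_TH)  # fast excitation: shunted by C_th,1
```

At $s = 0$ the ladder collapses to the sum of the resistances, which is a quick sanity check for any parameter set.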



Figure 2.14: Foster Network

The Foster thermal model represents heat transfer through a semiconductor module. The individual RC elements of the partial-fraction circuit no longer represent the layer sequence.

In this case the network nodes do not have any physical significance. This representation is used in datasheets, as the coefficients can be easily extracted from a measured cooling curve of the module. Furthermore, they can be used to make analytical calculations. Each RC element can be modeled as follows:

$$Z_{th,i} = R_{th,i} \parallel \frac{1}{s * C_{th,i}} = \frac{R_{th,i}}{1 + s * C_{th,i}R_{th,i}}$$
(2.13)

The thermal impedance between $T_j$ and $T_c$ is the sum of the individual elements:

$$Z_{th,tot} = \frac{R_{th,1}}{1 + s * C_{th,1}R_{th,1}} + \dots + \frac{R_{th,4}}{1 + s * C_{th,4}R_{th,4}}$$
(2.14)



Figure 2.15: Modeling a real system with the Cauer network

The transient thermal impedance $Z_{th}$ describes the thermal resistance seen by the heat flow until the heat capacities $C_{thC1}$, $C_{thC2}$ and $C_{thC4}$ shown in the heat dissipation path in Figure 2.15 are saturated. It is a function of time and converges to the steady-state value: $\lim_{t\to\infty} Z_{th} \approx R_{th}$.

$$R_{th} = \frac{T_j - T_a}{P_{loss}} \tag{2.15}$$
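To make the relation between formula 2.14 and the limit $Z_{th} \to R_{th}$ concrete, the Foster network can be evaluated both in the $s$ domain and as a heating curve in time. This is a minimal Python sketch with hypothetical parameter values; the time-domain expression is the standard step response of a sum of parallel RC elements.

```python
import math


def foster_impedance(s, r_th, c_th):
    """Z_th,tot(s) of a Foster network: sum of parallel RC elements (formula 2.14)."""
    return sum(r / (1.0 + s * c * r) for r, c in zip(r_th, c_th))


def foster_step_response(t, r_th, c_th):
    """Transient thermal impedance Z_th(t) [K/W] after a unit power step at t = 0."""
    return sum(r * (1.0 - math.exp(-t / (r * c))) for r, c in zip(r_th, c_th))


# Hypothetical Foster coefficients (placeholders, not taken from a datasheet)
R_TH = [0.05, 0.10, 0.08, 0.02]   # K/W
C_TH = [0.01, 0.05, 0.50, 2.00]   # J/K

# Heating curve sampled at a few instants: it rises monotonically towards sum(R_TH)
curve = [foster_step_response(t, R_TH, C_TH) for t in (0.0, 0.001, 0.01, 0.1, 1.0)]
```

Once all capacitances are saturated, the curve settles at the total $R_{th}$, which is the value used in formula 2.15.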

## **Junction Temperature Estimation**

Since the steady-state thermal resistance $R_{th}$ is not affected by thermal capacitance, the junction temperature of a thermal network with that thermal resistance, placed in an environment at $T_c$ [°C] and dissipating $P$ [W], can be easily calculated by rearranging formula 2.15:

$$T_j = T_c + R_{th} * P$$



Figure 2.16: T<sub>vj</sub> response to different power loss profiles

A mission profile that represents real-life behaviour is far from constant in time, so the junction temperature is computed through $Z_{th}(i)$ for better accuracy and precision.



**Figure 2.17:** $R_{th}$ vs $Z_{th}$ evolution when a single impulse of $P_{loss}$ is applied

Figure 2.17 shows the main difference between $R_{th}$ and $Z_{th}$: the latter evolves exponentially, since it takes time for the thermal capacitances of the network to fully charge after a step of $P_{loss}$ [W], while $R_{th}$ stays at its constant value throughout the whole time interval. Using $R_{th}$ alone is a pessimistic approach that should be adopted only for continuous losses, as shown at the top of Figure 2.16.

The junction temperature can be estimated either by solving the full transfer function of the thermal network at each iteration, or by approximating the transfer function to first order in the variable $s$, as Table 2.4 shows.

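| Model                     | Expression |
|---------------------------|------------|
| First-order approximation | $Z_{th}(s) \approx \frac{R_{th}}{s * \tau + 1}$, with step response $\mathscr{L}^{-1}\left[\frac{R_{th}}{s(s * \tau + 1)}\right] = R_{th} * (1 - e^{-\frac{t}{\tau}})$ |
| Full response             | $Z_{th}(s) = \frac{b_m * s^m + ... + b_1 * s + b_0}{a_n * s^n + ... + a_1 * s + a_0}$ |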

 Table 2.4: Differences between first order approximation and full transfer function

m,n can be as high as 50-60



**Figure 2.18:** Junction temperature estimation when a single impulse of $P_{loss}$ is applied, calculated with/without approximation

Figure 2.18 shows that, in response to an impulse of 250 W, the junction temperature estimated with the exponential approximation and the one estimated with lsim (without approximation) both reach their final value in less than 5 seconds. The approximation is faster in terms of computing time but also less precise, as the graph suggests.

Figure 2.19 illustrates the concept stated previously: with a continuous sequence of power loss pulses, $Z_{th}$ is needed in order to keep up with small changes in the junction temperature. In fact, the lsim command used to simulate $Z_{th}$ in state-space form considers the previous state of the system, which in our case means that it takes into account the previous $\Delta T$ values stored inside the thermal capacitances of the layers of the thermal network. This behaviour is fundamental to carry the previous instant of the calculation forward and to estimate the behaviour (the next state) of each layer of the system.



**Figure 2.19:** Junction temperature estimation when continuous pulses of $P_{loss}$ are applied, calculated with/without approximation

As previously stated for the power loss profile, a junction temperature value is obtained for every switching period, during which (a few milliseconds) the power loss, and therefore the junction temperature, is considered constant. At each instant the coolant temperature is also added, since it plays a fundamental role in the thermal network of the system.

So far it has been explained how the power loss, the thermal network and the junction temperature are evaluated. When a mission profile is fed into the script, at each iteration (that is, at each switching period) the power loss is fixed and the transfer function of the thermal impedance $Z_{th}(i)$ provided by the manufacturer is simulated with the MATLAB command lsim, with the power loss $P_{loss}(i)$ as input, to obtain the i-th junction temperature $T_{junc}(i)$.

Once the whole mission profile has been passed through the script, a temperature profile for the junction of the SiC DSC module is obtained, called "thermal jump history". To compare the stress induced by the mission profile with power cycling test results, the temperature swings during the mission profile have to be extracted.
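The role of the stored state can be illustrated with a small Python stand-in for the lsim-based loop. This is a sketch under stated assumptions, not the thesis script: it assumes a Foster network with hypothetical coefficients, a power loss that is constant within each time step, and a constant coolant temperature. Under those assumptions the update of each RC element is exact.

```python
import math


def simulate_junction_temperature(p_loss, dt, r_th, c_th, t_coolant):
    """Return T_j at the end of each step for a piecewise-constant power profile.

    p_loss    : power loss per step [W]
    dt        : step duration [s]
    r_th/c_th : Foster coefficients [K/W], [J/K] (hypothetical values below)
    t_coolant : coolant temperature [degC], assumed constant here
    """
    decay = [math.exp(-dt / (r * c)) for r, c in zip(r_th, c_th)]
    delta_t = [0.0] * len(r_th)  # temperature rise stored in each RC element (the state)
    t_junction = []
    for p in p_loss:
        for i, r in enumerate(r_th):
            # previous state decays, the new power step charges the element
            delta_t[i] = delta_t[i] * decay[i] + r * p * (1.0 - decay[i])
        t_junction.append(t_coolant + sum(delta_t))
    return t_junction


R_TH = [0.05, 0.10, 0.08, 0.02]   # K/W, placeholders
C_TH = [0.01, 0.05, 0.50, 2.00]   # J/K, placeholders

# 5 s of constant 250 W loss sampled every millisecond, coolant at 65 degC
t_j = simulate_junction_temperature([250.0] * 5000, 1.0e-3, R_TH, C_TH, 65.0)
```

Because `delta_t` is carried from one iteration to the next, a pulse that ends before the capacitances are charged leaves a residual temperature rise that the following pulse builds upon, which is the behaviour attributed to lsim above.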

The Rainflow method, a procedure from material mechanics, gives the best results for counting temperature swings, since it yields comparable stress levels from different stress tests.

## **Rainflow Algorithm**

The Rainflow-counting algorithm is used in the analysis of fatigue data to reduce a spectrum of varying stress into an equivalent set of simple stress reversals. The method successively extracts the smaller interruption cycles from a sequence, which models the material memory effect seen with stress-strain hysteresis cycles.



Figure 2.20: Flow of the Rainflow algorithm and its equivalent stress

This simplification allows the fatigue life of a component to be determined for each Rainflow cycle. Unlike other counting methods, global maxima, resulting e.g. from a slow increase of the coolant temperature, are also counted. Depending on the applied lifetime model, the counting method has to be adapted to extract all the necessary data from each cycle.

In power semiconductor applications, cycle counting algorithms are applied to temperature data obtained from the loss calculation and thermal model of the semiconductors. By using lifetime models based on temperature obtained from manufacturer's test data, a degradation model can be estimated assuming linear accumulation of degradation with Miner's rule.

Miner's rule is one of the most widely used cumulative damage models for failures caused by fatigue. It states that if there are k different stress levels, the average number of cycles to failure at the i-th stress, $S_i$, is $N_i$, and $n_i$ cycles are accumulated at that stress, then the damage fraction, C, is:

$$\sum_{i=1}^{k} \frac{n_i}{N_i} = C \tag{2.19}$$
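Formula 2.19 translates directly into a short accumulation loop; failure is conventionally predicted when the damage fraction reaches 1. The Python sketch below uses made-up cycle counts purely for illustration.

```python
def miner_damage(cycle_counts, cycles_to_failure):
    """Damage fraction C = sum(n_i / N_i) according to Miner's rule (formula 2.19)."""
    if len(cycle_counts) != len(cycles_to_failure):
        raise ValueError("one N_i is required for every n_i")
    return sum(n / n_fail for n, n_fail in zip(cycle_counts, cycles_to_failure))


# Illustrative numbers only: 1000 cycles at a stress level that fails after 1e5
# cycles, plus 200 cycles at a harsher level that fails after 1e4 cycles.
damage = miner_damage([1000, 200], [1.0e5, 1.0e4])
lifetime_consumed_percent = 100.0 * damage
```

With these numbers the two stress levels consume 1% and 2% of the life respectively, so the accumulated damage is 3%.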

#### Lifetime Estimation Model



Figure 2.21: Block diagram of the steps involved in lifetime estimation of power semiconductor


To predict the life of a component subjected to a variable load history, cycle counting methods are applied to reduce the complex history into a number of events that can be compared to the available constant amplitude test data.

The results of the counting algorithm are the following for each jump in temperature:

- the range of the thermal jump $\Delta T_{vj}$,
- the duration $t_{on}$ of that jump,
- the average temperature $T_{avg}$ in that particular time frame,
- the number of cycles $N_{cycle}$, either 1 or 0.5, considered for the particular thermal jump.

The Rainflow Counting Algorithm according to ASTM International Standard (E 1049-85) is directly implemented in MATLAB in recent versions [13].

The first step of the algorithm is to find all the local extrema of the signal under test, as shown in Figure 2.22.



Figure 2.22: Extrema of a random signal
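The extraction of the extrema can be sketched in a few lines of Python. This is an illustrative helper with a hypothetical name and hypothetical sample values, not the routine used inside MATLAB's rainflow function: it keeps the first and last samples and every point where the slope of the signal changes sign.

```python
def turning_points(signal):
    """Return the peaks and valleys of a sampled signal, end points included."""
    points = []
    for value in signal:
        if points and value == points[-1]:
            continue  # flat segment: nothing new
        if len(points) >= 2 and (points[-1] - points[-2]) * (value - points[-1]) > 0:
            points[-1] = value  # still moving in the same direction: extend the ramp
        else:
            points.append(value)  # direction reversed: the previous point was an extremum
    return points


# Hypothetical junction temperature samples [degC]
samples = [20, 35, 50, 42, 30, 45, 45, 60, 25]
extrema = turning_points(samples)  # -> [20, 50, 30, 60, 25]
```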

The MATLAB syntax is very simple, since it just needs the signal array and either the time vector or the sampling frequency: $rainflow(T_j, time)$. The output of the function is a matrix filled with the occurrences of the temperature jumps, their time interval, the average temperature and the number of cycles.

Once the array of extrema has been obtained, the algorithm implements the Four Point Counting method [14] where, as the name suggests, four points at a time are considered. From their relative position, the temperature difference and the time difference between each pair of points are computed iteratively. Based on these values and the previous ones, either 1 or 0.5 of a cycle is assigned to the event. Citing the "Standard Practices for Cycle Counting in Fatigue Analysis" [14], the rules for the Rainflow Algorithm are:

Let X denote the range under consideration, Y the previous range adjacent to X, and S the starting point in the history.

- 1. Read next peak or valley. If out of data, go to Step 6.
- 2. If there are less than three points, go to Step 1. Form ranges X and Y using the three most recent peaks and valleys that have not been discarded.
- 3. Compare the absolute values of ranges X and Y.
  - If X < Y, go to Step 1.
  - If  $X \ge Y$ , go to Step 4.
- 4. If range Y contains the starting point S, go to Step 5; otherwise, count range Y as one cycle; discard the peak and valley of Y; and go to Step 2.

- 5. Count range Y as one-half cycle; discard the first point (peak or valley) in range Y; move the starting point to the second point in range Y; and go to Step 2.
- 6. Count each range that has not been previously counted as one-half cycle.

Example:



Figure 2.23: Rainflow example from [14]

- 1. S = A; Y = |A - B|; X = |B - C|; X > Y. Y contains S, that is, point A. Count |A - B| as one-half cycle and discard point A; S = B (See Figure 2.23b).
- 2. Y = |B - C|; X = |C - D|; X > Y. Y contains S, that is, point B. Count |B - C| as one-half cycle and discard point B; S = C.
- 3. Y = |C - D|; X = |D - E|; X < Y.
- 4. Y = |D - E|; X = |E - F|; X < Y.
- 5. Y = |E - F|; X = |F - G|; X > Y.

Count |E - F| as one cycle and discard points E and F. (See Figure 2.23d. Note that a cycle is formed by pairing range E-F and a portion of range F-G.) And so on.
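The six rules above can be written compactly with a stack of not-yet-discarded extrema. The Python sketch below is an illustration of those rules, not the MATLAB implementation. The nine peak/valley values assigned to points A to I are an assumption chosen to be consistent with the comparisons in the worked example (the figure itself is not reproduced here).

```python
def rainflow_count(extrema):
    """Rainflow counting on a list of peaks/valleys.

    Returns a list of (range, mean, count) tuples, with count 1.0 or 0.5.
    stack[0] always plays the role of the starting point S.
    """
    cycles = []
    stack = []
    for point in extrema:                       # Step 1: read next peak or valley
        stack.append(point)
        while len(stack) >= 3:                  # Step 2: need three points for X and Y
            x = abs(stack[-1] - stack[-2])
            y = abs(stack[-2] - stack[-3])
            if x < y:                           # Step 3: X < Y, read more data
                break
            mean = (stack[-2] + stack[-3]) / 2.0
            if len(stack) == 3:                 # Step 5: Y contains S, half cycle
                cycles.append((y, mean, 0.5))
                stack.pop(0)                    # discard first point, S moves on
            else:                               # Step 4: full cycle
                cycles.append((y, mean, 1.0))
                last = stack.pop()
                stack.pop()                     # discard the peak and valley of Y
                stack.pop()
                stack.append(last)
    for a, b in zip(stack, stack[1:]):          # Step 6: leftovers are half cycles
        cycles.append((abs(b - a), (a + b) / 2.0, 0.5))
    return cycles


# Assumed values for points A..I of the example
history = [-2, 1, -3, 5, -1, 3, -4, 4, -2]
counted = rainflow_count(history)

# Aggregate the counts per range
totals = {}
for rng, _mean, count in counted:
    totals[rng] = totals.get(rng, 0.0) + count
```

For this history the first three counted events are |A - B| and |B - C| as half cycles and |E - F| as one full cycle, matching steps 1, 2 and 5 of the example.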

## 2.3.1 Lifetime Estimation

During Power Cycling $PC_{sec}$, with short load current on-time $(t_{on} < 5s)$, the load currents periodically applied to the module cause rapid temperature changes. Typical degradation mechanisms, shown in Figure 2.24, are wire bond lift-off, degradation of the chip metallization and degradation of the chip-to-substrate solder joint.



Figure 2.24: Typical wear out mechanisms in power modules due to power cycling

Relevant temperature swings will only occur for very high currents and switching frequencies. Additionally, the power cycling capability increases with shorter $t_{on}$ because, for example, typical Al bond wires in power modules need more than 100 ms to heat up due to their thermal capacitance. Therefore, temperature swings with $t_{on}$ shorter than 100 ms can usually be neglected and filtered out for drive inverter applications.

When performing the Power Cycling $PC_{min}$ with a longer load current on-time ($t_{on} > 15$ s), higher stress on the solder joints is created and solder joint cracking between substrate and baseplate may also be induced. Therefore, both types of power cycling tests are required to verify the package technology.

The empirical lifetime model in formula 2.20, derived from test data, was developed in the LESIT project. It includes the dependence on the temperature cycle $\Delta T_{vj}$ through a Coffin-Manson law and on the mean temperature $T_{vj,avg}$ by means of an Arrhenius term. The extension of this model was necessary due to the time dependence of solder degradation. The empirical model that quantifies the number of Power Cycles needed to bring the power module to failure is given by the AQG Guideline [1]:

$$N_f = A * \Delta T_j^a * e^{\frac{E_a}{k_B * T_{j,avg}}}$$
(2.20)

where,

 $N_f$  is the field service life in terms of number of PC cycles,

A is an experimental constant, derived by the manufacturer,

 $\Delta T_j$ is the equivalent temperature variation on which the lifetime is evaluated, and $a$ is its experimentally fitted exponent,

 $E_a$ is the activation energy,

 $k_B$  is the Boltzmann constant,

 $T_{j,avg}$  is the mean temperature of the junction.
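As an illustration of how formula 2.20 is evaluated, the Python sketch below computes $N_f$ for two operating points. Everything numeric here is an assumption: the constants are placeholders of the order commonly quoted in the literature for the LESIT fit, not values from this thesis or from a manufacturer; $E_a$ is treated as an activation energy in joules and the mean temperature is converted to kelvin before use.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant [J/K]

# Placeholder fit parameters (assumed, NOT manufacturer data)
A_PLACEHOLDER = 302500.0    # experimental constant
EXP_PLACEHOLDER = -5.039    # exponent a of the Coffin-Manson term
E_A_PLACEHOLDER = 9.89e-20  # activation energy [J]


def cycles_to_failure(delta_t_j, t_j_avg_celsius,
                      a_const=A_PLACEHOLDER,
                      exponent=EXP_PLACEHOLDER,
                      e_a=E_A_PLACEHOLDER):
    """N_f = A * dT_j^a * exp(E_a / (k_B * T_j,avg)), as in formula 2.20."""
    t_avg_kelvin = t_j_avg_celsius + 273.15
    return a_const * delta_t_j ** exponent * math.exp(e_a / (K_B * t_avg_kelvin))


n_small_swing = cycles_to_failure(60.0, 105.0)
n_large_swing = cycles_to_failure(90.0, 105.0)
n_hot = cycles_to_failure(90.0, 125.0)
```

With a negative exponent and a positive activation energy, a larger temperature swing or a higher mean temperature both reduce the number of cycles to failure, which is the qualitative behaviour expected from the model.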

Other lifetime models derived from a large dataset of test results show the impact of more parameters, like diameter of bonding wires, current, on-time  $t_{on}$  of the module etc.

The test results for $PC_{sec}$ and $PC_{min}$ at different $\Delta T$ have to be provided by the power module manufacturer. A lifetime model, which can be based on the model presented in formula 2.20 or on another curve fitting method, should be aligned with the power module manufacturer. The curve should represent the individual test results and provide a continuous robustness function.

By including the Rainflow counting results in this function, it is possible to calculate a lifetime consumption from each individual $\Delta T(i)$, $t_{on}(i)$, $T_{vj,max}(i)$ and transfer it into one equivalent stress value. The lifetime curve has to represent a defined probability of survival, e.g. 95%. Therefore, a lifetime consumption of 100% means that 5% of the parts have reached their end-of-life criteria. Figure 2.25 shows an example of a lifetime model. In the diagram, the black asterisks on the curve mark data points that are verified with the $PC_{sec}$ and $PC_{min}$ tests. Figure 2.25 should be understood as an example only. Depending on the product, different curves for MOSFET, IGBT and diode might exist.



Figure 2.25: Lifetime model example PC<sub>sec</sub>, PC<sub>min</sub> from AQG324 [1]

### **Norris & Landzberg Acceleration Factor**

The Acceleration Factor (AF) has the main purpose of quantifying how much faster or slower a particular mission profile is degrading the device under test with respect to an equivalent stress that could be performed through a Power Cycling lifetime test.

The Rainflow Algorithm extracts the thermal jump history of the signal from the mission profile. This information is used to generate different acceleration factors, depending on which aspect of the fatigue analysis they address.

The one implemented in this thesis is the Norris-Landzberg [27] version, which is the product of three contributions: the thermal jumps $\Delta T_{vj}(i)$, their duration in time $t_{on}(i)$ and the number of cycles $N_{cycle}$, either 1 or 0.5 as explained before. It is an update of the Coffin-Manson model [3], which is used to describe the fatigue life of materials undergoing plastic deformation under the shear strain caused by thermal expansion and contraction.

The Coffin-Manson model is defined as follows in 2.21:

$$AF_{CM} = \left(\frac{\Delta T_{field}}{\Delta T_{test}}\right)^a \tag{2.21}$$

The available literature on solder systems [27] reveals that tin-lead solder is quite susceptible to metal fatigue. In the controlled chip collapse interconnection this can take the form of thermal fatigue, which differs from mechanical fatigue in that fixed strain levels rather than fixed stress levels are exhibited.

Norris and Landzberg [27] found that thermal fatigue of the solder joint will occur within the usage time period of the integrated circuit component, when the temperature is changed (as during machine turn-on and shutdown) and large strains are initiated.

They modified the equation to account for the effects of the thermal cycling frequency $(f_{sw})$. Converting the frequency into the amount of time the power module stays ON, we obtain the on-time acceleration term:

$$AF_{Time} = \left(\frac{t_{on,field}}{t_{on,test}}\right)^b \tag{2.22}$$

For the maximum temperature $T_{j,max}$, they instead introduced the Arrhenius term [3], which is used to describe the kinetics of many chemical and molecular phenomena:

$$AF_{Arr} = e^{c*\left(\frac{1}{T_{j,max,field}} - \frac{1}{T_{j,max,test}}\right)}$$
(2.23)

The coefficient c, which is evaluated experimentally, incorporates the activation energy of the SiC junction and the Boltzmann constant.

All three terms come from the definition of the acceleration factor as the ratio between the number of cycles of the accelerated test and the number of cycles that bring the device to failure, called field service life:

$$AF \coloneqq \frac{N_{test}}{N_{field}} \tag{2.24}$$

The number of cycles $N_{field}$ in the previous definition is the same value $N_f$ of formula 2.20 provided by the manufacturer: the number of $PC_{sec}/PC_{min}$ cycles that bring the device to failure.

Once the acceleration factor AF of a particular mission profile has been found, $N_{test}$ can be obtained from definition 2.24. Since the AF tells us how much accelerated each mission profile is with respect to one $PC_{sec}/PC_{min}$ cycle at a specific working condition $(\Delta T, t_{on}, T_j)$, $N_{test}$ is the equivalent number of Power Cycling cycles of the current mission profile.

As explained before in the Rainflow Algorithm section, once fixed the working conditions in terms of temperature and on-time, the goal of the acceleration factor is to compare different mission profiles through this fatigue analysis at the same equivalent stress of  $\Delta T_{field}$ ,  $t_{on,field}$ ,  $T_{j,field}$ . By calculating the strain levels in the interconnection for an applied temperature change, Norris & Landzberg had established and verified a model to determine the effects of thermal fatigue.

Since our focus was to understand the reliability of the chip-near/chip-far interconnection of the chip solder, this model proved very useful as a fatigue analysis tool. The Coffin-Manson equation alone was found to be inadequate for projecting the thermal fatigue of solder interconnections: in laboratory experiments it yielded very pessimistic estimates of fatigue lifetimes [27]. Using the modified Coffin-Manson model, the number of cycles to failure for all solder fatigue failures was converted to equivalent machine cycles to failure.

The Norris & Landzberg model, from their paper [27] from 1968, is defined as follows:

$$AF_{NorrisLandzberg} = \left(\frac{\Delta T_{field}}{\Delta T_{test}}\right)^{a} * \left(\frac{t_{on,field}}{t_{on,test}}\right)^{b} * e^{c*\left(\frac{1}{T_{j,max,field}} - \frac{1}{T_{j,max,test}}\right)}$$
(2.25)

which can be rewritten in compact form as:

$$AF_{NorrisLandzberg} = AF_{CM} * AF_{Time} * AF_{Arr}$$
(2.26)
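A minimal Python sketch of formulas 2.21 to 2.26 is given below. The function names are hypothetical, the exponents in the example are arbitrary placeholders (the real ones come from the manufacturer), and the conversion of $T_{j,max}$ from °C to kelvin is an assumption made here because an Arrhenius term requires absolute temperatures.

```python
import math


def acceleration_factor(dt_field, dt_test, ton_field, ton_test,
                        tmax_field_c, tmax_test_c, a, b, c):
    """Norris-Landzberg AF = AF_CM * AF_Time * AF_Arr (formulas 2.25 and 2.26)."""
    af_cm = (dt_field / dt_test) ** a                  # Coffin-Manson term, formula 2.21
    af_time = (ton_field / ton_test) ** b              # on-time term, formula 2.22
    af_arr = math.exp(c * (1.0 / (tmax_field_c + 273.15)
                           - 1.0 / (tmax_test_c + 273.15)))  # Arrhenius term, formula 2.23
    return af_cm * af_time * af_arr


def equivalent_test_cycles(af, n_field):
    """N_test from the definition AF = N_test / N_field (formula 2.24)."""
    return af * n_field


# Placeholder exponents, for illustration only
A_EXP, B_EXP, C_COEFF = 2.0, 0.3, 1000.0

# Identical field and test conditions must give AF = 1
af_same = acceleration_factor(90.0, 90.0, 2.0, 2.0, 150.0, 150.0, A_EXP, B_EXP, C_COEFF)

# Only the temperature swing differs: AF reduces to the Coffin-Manson term
af_half_swing = acceleration_factor(45.0, 90.0, 2.0, 2.0, 150.0, 150.0, A_EXP, B_EXP, C_COEFF)
```

Keeping the three factors as separate local variables mirrors the compact form of formula 2.26 and makes it easy to inspect which contribution dominates for a given counted cycle.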

The exponents in the formula 2.25 are chosen experimentally by the manufacturer, as U. Scheuermann says in the Conference of Integrated Power Electronics Systems (CIPS [16]).

In the MATLAB script implementation the acceleration factor is computed as shown in the extended formula 2.25. The equivalent fatigue stress is evaluated for a specific working condition, called field ($\Delta T_{field}$, $t_{on,field}$ and $T_{j,max,field}$), according to the AQG324 Guideline [1] and the Norris & Landzberg paper [27].



Figure 2.26: Flow of the Rainflow algorithm and its equivalent stress

The script repeats this loop once for each thermal jump counted by the Rainflow Algorithm.

Figures 2.27 and 2.28 show typical results of the rainflow algorithm for a common input power loss profile. The granularity of the plot indicates how much variability a specific mission profile can have.



**Figure 2.27:** *Cycle counts from a typical mission profile* 

**Figure 2.28:** *Rainflow matrix from a typical mission profile* 


## 2.3.2 Script Results

The following tests are performed at  $T_{j,max} = 150^{\circ}$ C,  $\Delta T_j = 90^{\circ}$ C,  $t_{on} = 2s$  conditions where the power loss input is the only change to the system.

#### Test with continuous pulses of power loss

Here the input is a continuous train of power loss pulses that lets the junction temperature increase constantly by 60%. The first-order approximation tracks the waveform of the power loss very closely, almost too closely. The full response, in orange, continuously solves the transfer function, which takes a few seconds to reach its steady state for that particular value of power loss.



**Figure 2.29:** Comparison between approximation and complete response with a continuous impulse loss input

Figure 2.30 shows a 3D matrix of the outputs of the rainflow algorithm in the range/mean visualization: for each cycle we have the average temperature, its thermal jump and how many times the temperature has made this shift. Each such triplet composes a block, whose size can be adjusted on the x, y and z axes based on the granularity of the temperature.



Figure 2.30: Rainflow matrix from the continuous impulse loss

In Figure 2.31 a 2D range histogram shows the cycle counts for each $\Delta T$ stress range. This simple test produced mainly two different types of temperature variation: the bigger one, $\Delta T = 30^{\circ}$C, is due to the initial conditions of the simulation; all the other variations, $\Delta T = 18 - 21^{\circ}$C, correspond to the actual continuous impulse losses.



Figure 2.31: Cycle counts of variation of temperature from the continuous impulse loss

The amount of damage (or lifetime loss) produced on the module by this simulation is only 0.012% of its life.

### Test with continuous random pulses of power loss

Figure 2.32 clearly shows the difference between the first-order approximation and the complete response. The complete response takes into account all the variations, positive and negative, where an exponential behaviour is expected. The first-order approximation instantly cuts the curve based on the instantaneous value of power, which is very visible when the two graphs are overlapped.



**Figure 2.32:** Comparison between approximation and complete response with a random impulse loss input

Figure 2.33 shows the rainflow matrix of the current simulation where temperature reaches higher values with respect to the previous in Figure 2.30. Due to the randomness of the signal, this 3D-matrix has different blocks at different entries, on stress range, average temperature and their occurrences.



Figure 2.33: Rainflow matrix from the random impulse loss

Figure 2.34 also makes clear that the randomness has caused a larger variance of the stress range: multiple columns spread across the left part of the x-axis.



Figure 2.34: Cycle counts of variation of temperature from the random impulse loss

The amount of damage (or lifetime loss) produced on the module by this simulation is only 0.34% of its life. This value is roughly 30 times higher than in the previous simulation, owing to the evidently higher range of $\Delta T$ all over the signal.

Finally, Figure 2.35 shows the full results of the rainflow algorithm for the random pulses of power loss, translated into reference temperatures as shown in Figure 2.26: these are the temperature range values $\Delta T_{ref}$ at which the $PC_{sec}$ must be performed N times to bring the DUT to failure.



**Figure 2.35:** *Lifetime model estimation on four reference temperatures*  $\Delta T_{ref}$ 

In particular, this figure says that the random pulses of power loss from the previous example correspond to a different number of $PC_{sec}$ cycles depending on the reference temperature range at which they are evaluated: the same damage caused by the random pulses can be reproduced either by N $PC_{sec}$ cycles at $\Delta T_{ref} = 125^{\circ}$C or by 7 \* N $PC_{sec}$ cycles at $\Delta T_{ref} = 75^{\circ}$C.
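As a rough cross-check of the factor 7, and assuming (this is not stated in the text) that only the Coffin-Manson term of formula 2.25 differs between the two reference conditions, the exponent implied by these two points can be back-calculated:

$$\frac{N_{75}}{N_{125}} = \left(\frac{125}{75}\right)^{a} = 7 \quad \Rightarrow \quad a = \frac{\ln 7}{\ln (125/75)} \approx 3.8$$

If the on-time or the maximum temperature also change between the two reference conditions, the other two terms of formula 2.25 contribute as well and this estimate no longer holds.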

This estimated model obtained from our script can be compared with the one from the AQG324 Guideline in Figure 2.25.

# Chapter 3

# **Proposed Hardware Tests**

## **3.1** Test Bench Setup for Power Cycling

The main goal of this proposed setup is to perform a temperature measurement through the internal body diode while monitoring the $V_{DS,on}$ of the same DUT, which is one of the two End of Life parameters in Table 2.2 stated by AQG324 [1]. With these measurements in hand we are able to correct the lifetime estimation script with real data, and thus to validate a theoretical model experimentally.

The first sketches of the hardware test bench design were necessary to understand the main challenges posed by this type of test. In particular, a first distinction between static and dynamic tests shows the different instruments needed:

| Static Test               | Dynamic Test              |
|---------------------------|---------------------------|
| High-Current Power Supply | High-Voltage Power Supply |
| Low-Voltage Power Supply  | Inverter Gate Driver      |
| 2x Multimeter             | 2x Multimeter             |
| SMU                       | SMU                       |
| Cooling System            | Cooling System            |

 Table 3.1: List of instruments for these tests

Even though half of this list is shared between the tests, the dynamic one has more drawbacks: a test bench for dynamic tests requires a load to be attached to the phase of one of the three legs of the inverter. Since the instruments need to stay physically attached, a very fast multi-relay system would be a must in order to properly feed a high current to the DUT and a high voltage a few milliseconds later. In short, the project would become too expensive and too complex.

The AQG 324 Guideline only considers the Dynamic Gate Stress (DGS) and Dynamic Reverse Bias (DRB) as dynamic tests, but they are still under discussion since SiC modules are quite new and these tests were added to the guideline only in 2021. This means that there is not much experience behind them.

All tests besides those two are static, whereas a real module switches continuously during its life; a SiC device in particular switches at even higher frequencies, so the dynamic counterpart will surely be expanded in the next few years. For now companies stick with the current AQG guideline, and static tests are enough for them.

In this regard, we proceeded to design a static HW-SW test bench in order to perform a Power Cycling test. Since the power module has two MOSFETs inside, we have two DUTs at our disposal for each module. Each of them will be separately turned on/off by a channel-independent low-voltage power supply, also capable of going to small negative values, in order to have a stable power signal for both devices.



**Figure 3.1:** *Typical high-current power supply* 



Figure 3.2: Typical low-voltage power supply

The power source of the system will be a High-Current DC Precision Power Supply, needed to heat the DUT with very high currents. An ideal instrument would be a 1000A - 80V power supply.

Since the module has an NTC sensor inside, a dedicated digital multimeter (DMM) is needed to continuously monitor the temperature inside the module.

The third instrument is another digital multimeter; its only task is to continuously monitor the forward voltage of the MOSFET, one of the two key parameters that the AQG guideline specifies as failure and evaluation criteria. The DUT is considered failed if the forward voltage $V_{DS,on}$ of a SiC MOSFET increases by more than 5% with respect to its nominal value.
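The failure criterion can be expressed as a simple check applied to every $V_{DS,on}$ reading. The Python sketch below is illustrative only: the function name and the voltage readings are hypothetical, and the 5% limit is the one quoted above.

```python
def vds_on_failed(v_measured, v_nominal, limit=0.05):
    """True when V_DS,on has drifted upwards by more than `limit` (5% by default)."""
    return (v_measured - v_nominal) / v_nominal > limit


# Hypothetical readings [V] logged by the DMM over the test, nominal value 1.00 V
readings = [1.00, 1.01, 1.03, 1.04, 1.06]
first_failure = next(
    (index for index, v in enumerate(readings) if vds_on_failed(v, 1.00)),
    None,  # returned when no reading exceeds the limit
)
```

In a real bench the index would correspond to a power cycle number, which is the quantity compared with the lifetime model.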



Figure 3.3: Digital Multimeter



Figure 3.4: SMU

The last key instrument of this setup is an SMU (Source Measurement Unit), an instrument that combines a sourcing function and a measurement function on the same pin or connector. It can source voltage or current and simultaneously measure voltage or current.

It integrates the capabilities of a power supply, a digital multimeter (DMM), a current source, and an electronic load into a single, tightly synchronized instrument.



Figure 3.5: General SMU block diagram

Last but not least, we have the cooling system. A water-cooled system must be considered also during these Power Cycling tests, which try to replicate daily usage: strong pulses of large load current $I_L$ heat up the power module very fast.

In many cases it might be required to collect lots of data so that a plot or graph of the performance over time is generated. However, doing this manually is time-consuming and error-prone. There are also lots of different experiments that require automated data collection to get faster or more accurate measurements or to take measurements over a long time-scale (months or even years).

The Hardware-Software test bench that we came up with has the job of reproducing either the $PC_{sec}$ or the $PC_{min}$.



Figure 3.6: Current and temperature curve for PC<sub>sec</sub>

In order to perform the power cycling,

- the current source will provide the load current  $I_L$  during  $t_{on}$ ,
- the first DMM will continuously monitor the  $V_{DS,on}$ ,
- the second DMM will continuously monitor the internal NTC,
- the SMU will inject a small constant current and measure the voltage across the diode while the switch is off.

## Instruments synchronization

The most complex part of this implementation is the measurement of the diode voltage by the SMU. The goal of the measurement is to convert the voltage into a temperature value through a look-up table that has to be filled in beforehand.
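The conversion through the look-up table can be sketched as a linear interpolation between calibration points. This Python example is an assumption-laden illustration: the calibration pairs are invented placeholders (a real table comes from characterizing the body diode at known temperatures), and linear interpolation between neighbouring points is one possible choice, not necessarily the one used on the bench.

```python
from bisect import bisect_left


def voltage_to_temperature(v_meas, lut):
    """Convert a body-diode voltage [V] into a temperature [degC].

    lut : list of (voltage, temperature) calibration pairs, in any order.
    Raises ValueError outside the calibrated range instead of extrapolating.
    """
    table = sorted(lut)                       # ascending voltage
    volts = [v for v, _ in table]
    if not volts[0] <= v_meas <= volts[-1]:
        raise ValueError("voltage outside the calibrated range")
    i = bisect_left(volts, v_meas)
    if volts[i] == v_meas:                    # exact calibration point
        return table[i][1]
    (v0, t0), (v1, t1) = table[i - 1], table[i]
    return t0 + (t1 - t0) * (v_meas - v0) / (v1 - v0)


# Placeholder calibration: the diode voltage falls as the temperature rises
LUT = [(2.60, 25.0), (2.45, 75.0), (2.30, 125.0), (2.15, 175.0)]

t_vj = voltage_to_temperature(2.375, LUT)  # halfway between the 75 and 125 degC points
```

Refusing to extrapolate is a deliberate choice here: a reading outside the table most likely indicates a drifted diode or a mistimed measurement rather than a valid temperature.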

Thanks to the properly designed cooling, the thermal properties of the materials and the thermal impedance of the power module, heat is easily lost to the ambient. This means that, in order to obtain a realistic value of the internal virtual junction temperature, the voltage measurement of the diode must be executed as fast as possible: within a few microseconds the chip will lose up to 20% of its temperature. On a $T_{j,max}$ of 175°C this is a loss of about 35°C.

In order to obtain valuable data it is important to have a very fast SMU, in the order of microseconds, like the R&S NGU [28] or the Keysight [29], so that accurate measurements (nA) can be performed just before the device changes temperature.

The SMU can also apply negative voltages, which is a very important feature since many devices, semiconductors in particular, behave differently depending on whether a positive or a negative voltage is applied. In our case the SMU will be connected to the drain and the source of the MOSFET in order to perform two tasks on the body diode of the SiC device:

- during each cycle of the Power Cycling, the SMU will measure the voltage across the diode in the few microseconds right after the device is turned OFF,
- every N (100 - 10000) cycles, the SMU will characterize the full I/V curve of the diode again, since the test itself can damage the diode, which is our internal probe.



Figure 3.7: Current excitation and body-diode measurement, steps to be repeated at each cycle

To characterize the body diode again as an internal temperature sensor, it is better to close the channel ( $V_{GS} = V_{GS,min}$ ).



Figure 3.8: I/V characteristics of the body-diode

The AQG 324 guideline specifies that the scope of random samples for these tests is at least six topological switches from at least three different DUTs. For this, it must be ensured that application-relevant current paths are tested in each case.

Finally, the steps that must be performed for every PC cycle are:

- 1. the low-power voltage supply sets $V_{GS} = \text{ON}$ and the high-current power supply sets $I_L$ for $t_{on}$ seconds,
- 2. the low-power voltage supply sets $V_{GS} = \text{OFF}$ and within a few microseconds the SMU injects a very stable and known low current to measure the anti-parallel diode voltage,
- 3. only after a certain number of PC cycles (10-1000), the body diode must be characterized again.
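The per-cycle steps above can be sketched as a small sequencing loop. This is a minimal Python sketch and not the control software of the bench: the class `PowerCyclingSequencer`, its callback names and the default re-characterization interval are hypothetical, and the callbacks stand in for whatever driver the real instruments expose. Stopping the load current right after switching the gate OFF is also an assumption about the ordering. Software calls like these cannot guarantee microsecond timing, so on real hardware the delay between switch-off and measurement would presumably rely on hardware triggering between the instruments.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class PowerCyclingSequencer:
    """Orders the per-cycle actions; every callback is a hypothetical stand-in."""

    set_gate: Callable[[bool], None]           # low-power supply: True = ON, False = OFF
    set_load_current: Callable[[float], None]  # high-current supply, in amperes
    wait: Callable[[float], None]              # blocking delay, in seconds
    read_diode_voltage: Callable[[], float]    # SMU: inject sense current, return V_SD
    characterize_diode: Callable[[], None]     # SMU: full I/V sweep of the body diode
    recharacterize_every: int = 100            # assumed value, chosen for illustration
    readings: List[float] = field(default_factory=list)

    def run(self, n_cycles: int, i_load: float, t_on: float) -> None:
        for cycle in range(1, n_cycles + 1):
            # Step 1: switch ON and heat the die with the load current for t_on.
            self.set_gate(True)
            self.set_load_current(i_load)
            self.wait(t_on)
            # Step 2: switch OFF, stop the load current, then sample V_SD at once.
            self.set_gate(False)
            self.set_load_current(0.0)
            self.readings.append(self.read_diode_voltage())
            # Step 3: periodically repeat the body-diode characterization.
            if cycle % self.recharacterize_every == 0:
                self.characterize_diode()
```

Passing the instrument actions as callbacks keeps the ordering logic testable without hardware: the same loop can be driven by logging stubs first and by real drivers later.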

The full setup for the static test, which comprises the four main instruments explained above, is shown in Figure 3.9. The real challenge of this system is the synchronization between the instruments: within a few microseconds the channel of the DUT must be closed and the flow of large current stopped, while the SMU must already be reading the voltage across the diode to capture it as soon as possible. In this way, one can obtain a fairly precise temperature value that actually describes the junction of the device.



**Figure 3.9:** Full setup for static test (Power Cycling PC<sub>sec</sub>/PC<sub>min</sub>)

### **3.2** Test Bench Setup for *R*<sub>th</sub> Measurement

Another test bench setup that can be implemented is the monitoring of the thermal resistance $R_{th}$, the second parameter of the End of Life criteria in Table 2.2 stated by AQG 324 [1]. The main idea behind this setup is to take advantage of the two switches present inside the power module. A specific constant current flows through the first switch for a certain number of seconds to stabilize the temperature, while the second switch, being in close proximity, is heated by the first.



Figure 3.10: DUT circuit schematic

Since the only power dissipated in the system is the one produced by the first switch, $P_{loss}$ can be found by multiplying the voltage drop on the diode by the known excitation current; once the ambient temperature $T_{amb}$ has been read, and since relation 3.1 always holds, one can easily obtain a first approximation of the $R_{th}$ of the power module.

$$R_{th} = \frac{T_j - T_{amb}}{P_{loss}} \tag{3.1}$$
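Relation 3.1 can be evaluated directly once the three quantities are known. The following is a minimal Python sketch under stated assumptions: the function name `thermal_resistance` and its parameter names are hypothetical, the junction temperature is assumed to come from the body-diode voltage conversion, and the numbers in the comment are illustrative, not measured data.

```python
def thermal_resistance(t_j_c: float, t_amb_c: float,
                       v_diode_v: float, i_exc_a: float) -> float:
    """Return R_th in K/W from relation 3.1.

    t_j_c      junction temperature in degrees Celsius
    t_amb_c    ambient temperature in degrees Celsius
    v_diode_v  voltage drop on the heating diode in volts
    i_exc_a    known excitation current in amperes
    """
    # Dissipated power: voltage drop times excitation current.
    p_loss_w = v_diode_v * i_exc_a
    if p_loss_w <= 0.0:
        raise ValueError("dissipated power must be positive")
    # A temperature difference in degrees Celsius equals the same difference in kelvin.
    return (t_j_c - t_amb_c) / p_loss_w


# Illustrative numbers only: 2.5 V at 10 A dissipates 25 W; a 50 K rise gives 2 K/W.
r_th_example = thermal_resistance(75.0, 25.0, 2.5, 10.0)
```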

The system can be organized as follows:

- a low power voltage supply to keep one of the switches OFF during the measurement: the  $R_{th}$  will be measured from this switch;
- a low power voltage supply to keep the other switch ON during the measurement: this switch will be used to heat the power module;
- two SMUs or one dual-channel SMU: one channel sources current into one body diode, while the other channel sources current and measures the voltage of the anti-parallel body diode.

Figure 3.11: Test setup for the proposed R<sub>th</sub> measurement

# **Chapter 4**

## Performed Hardware Tests

#### 4.1 Temperature measurement with the internal body-diode

Whether one implements the first hardware bench for the PC tests or the second one for the $R_{th}$ measurement, the main goal is to measure the forward voltage of the body diode $V_{SD}$ in order to extract a temperature measurement. The temperature of a power semiconductor device is used as an indicator of the aging of the module for reliability purposes.

There are three main temperature evaluation methods [19] that are currently used to evaluate the temperature of power semiconductor devices:

- Optical methods, which require modifying the power module: the chip has to be seen by the optical system; the polymer package and the dielectric gel have to be removed. High voltage operating conditions are therefore limited.
- Physical contact methods, where the main solution is using thermal probes (thermistors or thermocouples).
- Electrical methods, since the thermal dependence of electrical properties of semiconductor devices is used to determine the temperature: the chip is itself the temperature sensor.

The voltage measurement only gives a global temperature of the different dies inside the chip, and it is not possible to determine which one is the hottest or the coldest. Thermal mapping is needed in converters or in multi-chip modules where there are several devices in parallel. Our choice is to estimate the virtual junction temperature $T_{vj}$ of the power module through the voltage of the diode, that is, via an electrical method. The diode will be used as an internal temperature sensor. In order to convert the measurement from a voltage $V$ to a temperature value $T_j$, a look-up table must be filled during a characterization process.
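The conversion from diode voltage to temperature through a look-up table can be illustrated as follows. This is a minimal Python sketch assuming piecewise-linear interpolation between the characterized points; the function name `voltage_to_temperature`, the table layout and the values in `EXAMPLE_LUT` are hypothetical and purely illustrative, not measured data.

```python
from bisect import bisect_left
from typing import List, Sequence, Tuple


def voltage_to_temperature(v_sd: float,
                           lut: Sequence[Tuple[float, float]]) -> float:
    """Convert a diode voltage to a temperature by piecewise-linear interpolation.

    lut holds (voltage_V, temperature_C) pairs recorded at one fixed sense current.
    """
    if len(lut) < 2:
        raise ValueError("the look-up table needs at least two points")
    points: List[Tuple[float, float]] = sorted(lut)  # ascending voltage
    volts = [v for v, _ in points]
    if not volts[0] <= v_sd <= volts[-1]:
        # Refuse to extrapolate outside the characterized range.
        raise ValueError("voltage outside the characterized range")
    i = bisect_left(volts, v_sd)
    if volts[i] == v_sd:
        return points[i][1]
    (v0, t0), (v1, t1) = points[i - 1], points[i]
    return t0 + (t1 - t0) * (v_sd - v0) / (v1 - v0)


# Illustrative table only: the forward voltage decreases as the temperature rises.
EXAMPLE_LUT = [(2.90, 25.0), (2.70, 100.0), (2.50, 175.0)]
```

Rejecting voltages outside the table, instead of extrapolating, is a deliberate choice in this sketch: a reading beyond the characterized range more likely signals a drifted diode or a wiring problem than a valid temperature.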

#### 4.2 Diode Characterization in Temperature

Characterizing the body diode of the power module requires either the first or the second configuration listed below:

- 1. a low-voltage power supply, to keep the channel of the SiC module as free as possible from forward conduction,
  - a low-current power supply, to polarize the anti-parallel body diode,
  - a digital multimeter (DMM), to read the voltage drop on the diode,
- 2. a low-voltage power supply, to keep the channel of the SiC module as free as possible from forward conduction,
  - a Source Measure Unit (SMU) which, as explained before, embodies a current source and a voltage sense device in one instrument.

In order to inject a constant current and read the voltage, it is necessary to completely shut off the switch; that is why a low-voltage power supply is needed. The $V_{GS}$ must be set to its minimum value $V_{GS,min}$, which is in general a negative value for SiC power modules.

The datasheets from ST [21], Infineon [22] and ROHM [23] of 1200 V automotive SiC power modules report the following typical ranges of the gate-source voltage:

| Brand    | $V_{GS,min}$ [V] | $V_{GS,max}$ [V] |
|----------|------------------|------------------|
| ST       | -5               | 18               |
| Infineon | 0                | 15               |
| ROHM     | -4               | 22               |

**Table 4.1:** Typical values of Gate-Source  $V_{GS}$  voltage to switch on/off the power modules

Once $V_{GS} = V_{GS,min}$ is set, we can polarize the diode and read the voltage across it. The SMU can do this with its two probes without the need for other instruments. It has a sweep function that acquires the whole I/V diode characteristic in one shot, but in this case we are only interested in one voltage value at a specific constant current.

The characterization must be performed at different temperature values that range from the minimum to the maximum temperature that the power module will experience during its whole life. These automotive power modules are tested for a storage temperature of 150°C and their junction temperature can easily reach 175°C.

Sweep measurements at different temperatures look like those in Figure 4.1, while Figure 4.2 shows the final characterization graph obtained after a few measurements at a fixed constant current and interpolation of the results.



**Figure 4.1:** *Typical temperature dependence of an I/V diode characteristics* 



**Figure 4.2:** Forward voltage dependence on junction temperature obtained at different values of current
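One possible way to turn a few measurements at a fixed current into a continuous calibration is a least-squares straight line. The sketch below is only an illustration: it assumes that the forward voltage is approximately linear in temperature at the chosen sense current, which must be checked against the measured curves, and the function names `fit_calibration_line` and `temperature_from_voltage` are hypothetical.

```python
from typing import Sequence, Tuple


def fit_calibration_line(temps_c: Sequence[float],
                         volts_v: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of V_SD = intercept + slope * T at one fixed current.

    Returns (slope in V/degC, intercept in V).
    """
    n = len(temps_c)
    if n < 2 or n != len(volts_v):
        raise ValueError("need at least two (temperature, voltage) pairs")
    mean_t = sum(temps_c) / n
    mean_v = sum(volts_v) / n
    s_tt = sum((t - mean_t) ** 2 for t in temps_c)
    if s_tt == 0.0:
        raise ValueError("all temperatures are identical")
    s_tv = sum((t - mean_t) * (v - mean_v) for t, v in zip(temps_c, volts_v))
    slope = s_tv / s_tt
    intercept = mean_v - slope * mean_t
    return slope, intercept


def temperature_from_voltage(v_sd: float, slope: float, intercept: float) -> float:
    """Invert the fitted line to estimate the temperature from a voltage reading."""
    if slope == 0.0:
        raise ValueError("a flat calibration line cannot be inverted")
    return (v_sd - intercept) / slope
```

If the measured points show visible curvature, a piecewise-linear table is the safer choice; the fitted line is mainly useful for smoothing noisy calibration points.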

During the aging of the power module, this process must be repeated every 100-1000 cycles with the same excitation current in order to update the look-up table, since the diode itself can also age.

To start off with the test bench implementations and measurements, I soldered some fixtures on top of the power module to perform the characterization of the two body diodes inside the DUT. I used specific high-temperature wires able to withstand up to 180°C.

The device was placed inside a laboratory oven and the I/V characteristics were taken after 1 hour of soak time, that is, after the device had been at constant temperature for at least 1 hour.





**Figure 4.3:** *High-side body diode I/V characteristics over temperature* 

**Figure 4.4:** Low-side body diode I/V characteristics over temperature

These figures show that as the temperature increases, the curve shifts to the left, since the threshold voltage $V_{th}$ to activate the junction decreases. It is important to underline that, since this is a high-power, high-current module, the complete characteristics would have to be obtained while reaching voltages and currents similar to the ones used during normal operation. Of course, this approach would have required big and expensive power supplies. These tests were therefore performed within the maximum output power of the SMU (20-30 W).

During the characterization process I also injected three different stable constant currents through the SMU into the body diodes, one at a time. This measurement is the actual characterization needed, so that the same current can be injected into the power module during tests to retrieve the junction temperature.



Figure 4.5: Current injections at different temperatures for the voltage measurement,  $I_1 < I_2 < I_3$ 

# **Chapter 5**

## Conclusions

Thanks to the results gained both with simulations and experimental investigations, it was possible to create a lifetime model of a Silicon Carbide based power module.

The relationship between power, thermal resistance and temperature has been studied for a long time, but nowadays it plays a fundamental role in the understanding of semiconductor physics. Today the main goal is to simulate the behaviour of the devices in order to predict future problems in the early phases.

The coexistence of performance, efficiency and thermal management is key to obtaining better products over time, and simulation can save a lot of resources.

The hardware tests performed have the fundamental role of providing real-life data to populate and validate the model. On the other hand, the software part lets us easily change the parameters of the current simulation in order to have a wide range of case studies to analyze.

We obtained a fully working model thanks to the software implementations created throughout this internship and to the hardware data collected, which were useful to correct the estimation model.

Initial future work on this topic is explained in chapter 3, since Power Cycling is the main lifetime test to be performed to find the limits of the DUT. Both short $(PC_{sec})$ and long $(PC_{min})$ tests are fundamental to test all failure modes. Another important test to be performed is the thermal resistance $R_{th}$ measurement: the AQG 324 guideline [1] suggests sensing it from two holes on the dual-side heat-sink, while we presented another method in section 1.3.1, based on electrical measurements only, that can give a first qualitative result. Future tests also include a comparison between the only two available temperature sensors inside the power module: the readings from the voltage of the diode and the internal NTC thermistor; since the latter is far from the device junction, its measurements can differ considerably from the real values. For this reason, the electrical measurements from the diodes that we performed are even more important, and more research in this field is always welcome.

# **Bibliography**

- [1] AQG 324 Qualification of Power Modules for use in Power Electronics Converter Units in motor vehicles (2021 version, which includes SiC MOSFETs) - https://www.ecpe.org/research/working-groups/automotive-aqg-324/
- [2] CS.00054 General Electrical and EM Performance requirements for E/E components (Internal FCA Document)
- [3] CS.00056 Environmental Specification for Electrical/Electronics (E/E) components (Internal FCA Document)
- [4] QR.00001 Global Product Assurance Testing (GPAT, Internal FCA Document)
- [5] JEDEC Global Standards for the Microelectronics Industry https://www.jedec.org/document\_search/field\_doc\_type/150
  - JEP 183 Guidelines for measuring the threshold voltage (VT) of SiC MOSFETs,
  - JEP 184 Guideline for evaluating bias temperature instability of SiC MOSFETs devices for Power Electronics conversion
- [6] Wolfspeed wolfspeed.com/knowledge-center
- [7] Analog www.analog.com
- [8] Loss Model and Efficiency Analysis of Tram Auxiliary Converter Based on a SiC Device - energies - Hao Liu, Xianjin Huang, Fei Lin and Zhongping Yang - https://www.mdpi.com/1996-1073/10/12/2018/htm
- [9] ROHM SiC MOSFETs and Diodes Application Note https://www.ROHM.com/documents/11303/2861707/sic\_app-note.pdf
- [10] A New Lifetime Model for Advanced Power Modules with Sintered Chips and Optimized Al Wire Bonds - https://www.researchgate.net/publication/281631078\_A\_New\_Lifetime\_Model\_for\_Advanced\_Power\_Modules\_with\_Sintered\_Chips\_and\_Optimized\_Al\_Wire\_Bonds

- [11] EV Traction Inverter Employing Double-Sided Direct-Cooling Technology with SiC Power Device - https://ieeexplore.ieee.org/document/8507756
- [12] MATLAB lsim function https://it.mathworks.com/help/control/ref/lti.lsim.html
- [13] Rainflow Algorithm in MATLAB https://it.mathworks.com/help/signal/ref/rainflow.html
- [14] Rainflow Algorithm: Standard Practices for Cycle Counting in Fatigue Analysis https://tajhizkala.ir/doc/ASTM/E1049-85%20(Reapproved%202011)e1.pdf
- [15] Rainflow Algorithm Based Lifetime Estimation of Power Semiconductors in Utility Applications - https://www.osti.gov/pages/servlets/purl/1327629
- [16] U.Scheuermann, P.Beckedahl: The Road to the Next Generation Power Module - 100% Solder Free Design, Proc. CIPS 2008, Nuremberg, ETG-Fachbericht 111, 111-120. - https://www.researchgate.net/publication/224231941\_Model\_for\_Power\_Cycling\_lifetime\_of\_IGBT\_Modules\_-\_various\_factors\_influencing\_lifetime
- [17] Thermal Resistance Circuits https://web.mit.edu/16.unified/www/FALL/thermodynamics/notes/node118.html
- [18] Junction Temperature Measurement Method for Thermal Resistance Testing of SiC MOSFET Module - https://www.researchgate.net/publication/351468136\_Junction\_Temperature\_Measurement\_Method\_for\_Thermal\_Resistance\_Testing\_of\_SiC\_MOSFET\_Module
- [19] Temperature Measurement of Power Semiconductor Devices by Thermo-Sensitive Electrical Parameters - A Review - https://ieeexplore.ieee.org/document/6096434
- [20] Electric cars are all vying for SiC...but why? https://www.avnet.com/wps/portal/apac/resources/article/electric-cars-are-all-vying-for-sic/
- [21] Datasheet ST SiC Power Module https://www.st.com/en/power-transistors/sct070hu120g3ag.html
- [22] Datasheet Infineon SiC Power Module https://www.infineon.com/cms/en/product/power/mosfet/silicon-carbide/discretes/aimw120r080m1/
- [23] Datasheet ROHM SiC Power Module https://www.rohm.com/products/sic-power-devices/sic-mosfet/sct3080klhr-product

- [24] Datasheet NTC Thermistor from Vishay 68kΩ https://www.vishay.com/docs/29049/ntcle100.pdf
- [25] Functional Introduction about 3-phase Inverters https://www.switchcraft.org/learning/2017/3/15/space-vector-pwm-intro
- [26] OnSemi Dual-side Cooled Power Module Application Note https://www.onsemi.com/download/application-notes/pdf/and9984-d.pdf
- [27] Reliability of Controlled Collapse Interconnections, Norris, Landzberg https://dl.acm.org/doi/abs/10.1147/rd.133.0266
- [28] R&S NGU SMU Source Measurement Unit https://www.rohde-schwarz.com/it/prodotti/misura-e-collaudo/alimentatori-in-corrente-continua/rs-ngu\_63493-1005128.html
- [29] Keysight B2900 SMU Source Measurement Unit https://www.keysight.com/it/en/products/source-measure-units-smu/b2900-series-precision-source-measure-units-smu.html