# Delay and Power Consumption of Static Bulk-CMOS Gates Using Independent Bodies

D. Guerrero, A. Millan, J. Juan, M. J. Bellido, P. Ruiz-de-Clavijo and E. Ostua

Electronic Technology Department, University of Seville

phone: +34 954559964; fax: +34 954552764; e-mail: guerre@dte.us.es

## Abstract

Digital designs implemented using SOI processes employ a separate body for each transistor. This approach is not usually considered in digital bulk-CMOS design because of its obvious area penalty. However, the advantages obtained can justify its use in selected parts of the circuit. This paper discusses that trade-off.

## 1. Introduction

A vital part of the effort in digital design is to improve the performance of logic gates. Static CMOS gates are the most widely used, and most of them are produced using a bulk-CMOS process. In a bulk-CMOS process, the body of NMOS or PMOS transistors is formed using the semiconductor material of the substrate itself. So, in the conventional approach (which we will call COBO, for Common Body), transistors of the same type usually share the same body. As a consequence, they suffer from the so-called body effect [1, 4, 5], that is, their conductance is reduced when their source-body voltage $V_{sb}$ is not zero. This effect can be modeled as a dependence of the threshold voltage $V_t$ on the source-body voltage $V_{sb}$ [3].

The performance of traditional bulk-CMOS gates degrades dramatically as the number of inputs increases, due to the body effect and the internal parasitic capacitance associated with series-connected transistors. This is shown in Figure 1a: if all the other inputs are held at logic value 1 and input $I_0$ undergoes a rising transition, the series-connected NMOS transistors corresponding to inputs $I_1$, $I_2$ and $I_3$ will suffer from a reduced conductance due to an increased threshold voltage, lowering the performance of the gate.
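
As an illustration of the body effect described above, a commonly used first-order model writes the threshold voltage as $V_t = V_{t0} + \gamma(\sqrt{2\phi_F + V_{sb}} - \sqrt{2\phi_F})$. The following Python sketch evaluates that expression. The function name `threshold_voltage` and the default values of `v_t0`, `gamma` and `phi_f` are illustrative assumptions, not parameters of the process used in this paper.

```python
import math


def threshold_voltage(v_sb, v_t0=0.45, gamma=0.40, phi_f=0.40):
    """First-order body-effect model for an NMOS transistor.

    V_t = V_t0 + gamma * (sqrt(2*phi_F + V_sb) - sqrt(2*phi_F))

    v_t0 (V), gamma (V^0.5) and phi_f (V) are placeholder values chosen
    only for illustration; they are NOT taken from any foundry model card.
    """
    if v_sb < 0:
        raise ValueError("this sketch only covers V_sb >= 0")
    return v_t0 + gamma * (math.sqrt(2 * phi_f + v_sb) - math.sqrt(2 * phi_f))


if __name__ == "__main__":
    # Sweep the source-body voltage to see how the threshold rises.
    for v_sb in (0.0, 0.3, 0.6, 0.9):
        print(f"V_sb = {v_sb:.1f} V -> V_t = {threshold_voltage(v_sb):.3f} V")
```

With these placeholder values the threshold grows monotonically with $V_{sb}$, which is the trend that penalizes the transistors of a series chain whose sources sit above ground.
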
The number of transistors affected by the body effect depends on which input changes, so the pin-to-pin delay varies widely across the inputs (by about 45% in four-input gates). Additionally, the parasitic capacitance associated with nodes A, B and C in Figure 1a also degrades the delay and power consumption of the gate. The delay variation across the inputs makes the circuit design process more complex and limits the fan-in of the gates.

In SOI (Silicon On Insulator) processes the transistors are formed over a substrate of insulating material and each transistor uses a separate body. This makes it possible to avoid the body effect. Like SOI, current bulk-CMOS twin-well and triple-well technologies make it possible to use independent bodies for each NMOS and PMOS transistor of a logic gate. However, the possibility of using independent bodies for each series transistor has not typically been considered in bulk-CMOS logic design because of the associated area overhead. This possibility is explored in the proposed approach, which we will call INBO (for Independent Bodies). We propose to make $V_{sb} = 0$ by connecting the source and body terminals of the transistors in the series tree (Figure 1b).

Figure 1: Schematics of two implementations of a NAND gate

In this paper, the performance of both approaches is compared. Section 2 analyzes the behavior of gates using independent bodies. Section 3 presents the layout of the gates used in the comparison. Section 4 shows the results obtained for pin-to-pin delay and for dynamic and static power consumption. Conclusions are summarized in the last section.

## 2. Analysis of the behaviour of INBO gates

Connecting each body to the corresponding source has two remarkable consequences:

- The source-to-body junction parasitic capacitance can be neglected.
  Furthermore, drain-to-body junction capacitances in the series chain are no longer grounded and are charged at a lower voltage, accumulating less charge. Gate-body parasitic capacitances are also charged at a lower voltage. However, the body-substrate capacitance can negatively affect performance.
- Since $V_{sb} = 0$ for every transistor, the body effect is avoided and transistor conductance is improved.

Static power consumption in CMOS gates has two main causes [5]:

- Reverse-bias leakage currents
- Subthreshold conduction

The first cause is explained by the existence of parasitic diodes in CMOS gates such as the inverter shown in Figure 2. Each p-n junction forms a parasitic diode, so there is one for each drain and source terminal and one for the n-well. Depending on the input value, the diodes corresponding to the source and drain terminals can be reverse-biased, conducting a small reverse-bias leakage current that contributes to the static power consumption.

The second cause of static power consumption is that the conductivity of a MOS transistor is not exactly zero when the gate voltage is below $V_t$. Hence, inactive transistors allow a small subthreshold current to flow from supply to ground.

The reduced $V_t$ of the INBO gates can increase the subthreshold conduction, but the additional isolation provided by the individual wells should eliminate the reverse leakage currents.

Figure 2: Parasitic diodes in a CMOS inverter

## 3. Test set-up

Four-input NAND and NOR gates have been designed using the COBO (Figures 3 and 4) and INBO (Figures 5 and 6) styles in order to compare their performance by electrical simulation after parasitic extraction. They have been implemented using a 0.18 µm triple-well CMOS process with transistor sizes $w_n$ = 240 nm, $w_p$ = 240 nm for the NAND gate and $w_n$ = 240 nm, $w_p$ = 1440 nm for the NOR gate, and minimum lengths.
Inputs are named $I_0$, $I_1$, $I_2$ and $I_3$, with the index increasing for series transistors nearer the output.

Figure 3: Layout of a COBO NOR gate

Figure 4: Layout of an INBO NAND gate

Figure 5: Layout of a COBO NAND gate

Figure 6: Layout of an INBO NOR gate

Using independent bodies clearly introduces a considerable area overhead when compared to the equivalent traditional implementation using a common body. The COBO implementations of the NAND and NOR gates take up 14.2 $\mu\mathrm{m}^2$ and 19.6 $\mu\mathrm{m}^2$, while their INBO counterparts take up 43.7 $\mu\mathrm{m}^2$ and 48.5 $\mu\mathrm{m}^2$ respectively. It could be argued that the transistors of the COBO cell could be resized to match the area of its INBO counterpart in order to improve their conductance, but this would increase the input capacitance, so the driving gate would be slower and would consume more energy. The input capacitance of the INBO cell, on the other hand, is not increased with respect to its COBO counterpart, so the driving gate is not negatively affected. Moreover, since the bodies are not connected to ground, the gate-body capacitances will be charged to a lower voltage.

The next section shows the pin-to-pin delay and the static and dynamic power consumption of each implementation, obtained by electrical simulation. The simulations have been carried out after parasitic extraction using the HSPICE [2] electrical simulator with the model card provided by the foundry and a nominal supply voltage of 1.8 V. Fan-outs from 1 to 4 have been considered.

## 4. Results

### 4.1. Delay

The delay results for both NAND and NOR gates can be observed in Figures 7, 8, 9 and 10. Each graph shows the propagation delay versus fan-out for each input of the gate. In the proposed approach the source-to-body junction parasitic capacitances have been eliminated at the expense of introducing body-substrate capacitances.
At first glance, it might appear that the latter are larger, but this is not necessarily true: the capacitance depends on the doping, and the doping employed in the wells can differ from that used in the source/drain regions. In fact, there is a remarkable delay reduction using the proposed INBO style in almost all cases. The delay for input $I_3$ (the nearest to the output node) is similar in COBO and INBO gates because parasitic capacitances are of little importance in this case: they are already discharged when the input rises in the NAND gate (falls in the NOR), and they are isolated when the input falls in the NAND gate (rises in the NOR). For the rest of the inputs, parasitic capacitances become important and the INBO style provides better performance. The slowest inputs get the best delay improvements, so the pin-to-pin delay is more homogeneous across the inputs with the INBO style.

The NOR gate especially benefits from the INBO style and gets larger delay improvements in all cases. If we compare the $t_{PLH}$ delay of the slowest input ($I_0$) of both implementations, we see a reduction of 9% in the NAND gate for fan-out 1, while in the NOR gate $t_{PLH}$ is reduced by 25% for the same fan-out. The speed-up should be even larger for gates with more inputs.

Figure 7: $t_{\text{PLH}}$ versus fan-out of the COBO and INBO NAND gates

Figure 8: $t_{\text{PHL}}$ versus fan-out of the COBO and INBO NOR gates

Figure 9: $t_{\text{PLH}}$ versus fan-out of the COBO and INBO NOR gates

Figure 10: $t_{\text{PHL}}$ versus fan-out of the COBO and INBO NAND gates

### 4.2. Dynamic power consumption

Tables 1 and 2 show a marked reduction in the dynamic power consumption of the INBO NOR gate, especially for the inputs consuming more power. The energy consumed when a pulse is applied at $I_0$ (the input consuming most power) is reduced by 22%. In the NAND case, on the other hand, there is no clear improvement.
Thus, for the input consuming most power, the energy is reduced by 5% for fan-out 1, while it is increased by 0.5% for fan-out 4 (Tables 3 and 4).

| Input receiving the pulse | $I_0$ | $I_1$ | $I_2$ | $I_3$ |
|---------------------------|-------|-------|-------|-------|
| Energy for fan-out 1 (fJ) | 76.64 | 65.89 | 51.17 | 38.02 |
| Energy for fan-out 2 (fJ) | 88.16 | 77.72 | 64.25 | 49.28 |
| Energy for fan-out 3 (fJ) | 99.21 | 89.45 | 74.92 | 62.52 |
| Energy for fan-out 4 (fJ) | 112.4 | 102.8 | 88.07 | 72.32 |

Table 1: Energy consumed when the COBO NOR gate receives an input pulse

| Input receiving the pulse | $I_0$ | $I_1$ | $I_2$ | $I_3$ |
|---------------------------|-------|-------|-------|-------|
| Energy for fan-out 1 (fJ) | 60.40 | 57.17 | 46.60 | 35.77 |
| Energy for fan-out 2 (fJ) | 73.30 | 69.31 | 60.68 | 50.59 |
| Energy for fan-out 3 (fJ) | 85.40 | 81.64 | 74.23 | 66.76 |
| Energy for fan-out 4 (fJ) | 98.53 | 94.96 | 86.55 | 78.18 |

Table 2: Energy consumed when the INBO NOR gate receives an input pulse

| Input receiving the pulse | $I_0$ | $I_1$ | $I_2$ | $I_3$ |
|---------------------------|-------|-------|-------|-------|
| Energy for fan-out 1 (fJ) | 27.77 | 24.24 | 21.08 | 17.96 |
| Energy for fan-out 2 (fJ) | 32.57 | 28.95 | 25.27 | 22.36 |
| Energy for fan-out 3 (fJ) | 36.96 | 33.46 | 30.23 | 27.01 |
| Energy for fan-out 4 (fJ) | 42.09 | 38.42 | 35.11 | 31.62 |

Table 3: Energy consumed when the COBO NAND gate receives an input pulse

| Input receiving the pulse | $I_0$ | $I_1$ | $I_2$ | $I_3$ |
|---------------------------|-------|-------|-------|-------|
| Energy for fan-out 1 (fJ) | 26.34 | 23.93 | 20.86 | 17.91 |
| Energy for fan-out 2 (fJ) | 31.65 | 29.21 | 26.29 | 23.17 |
| Energy for fan-out 3 (fJ) | 36.68 | 34.34 | 31.66 | 28.35 |
| Energy for fan-out 4 (fJ) | 42.26 | 39.60 | 37.04 | 33.92 |

Table 4: Energy consumed when the INBO NAND gate receives an input pulse

### 4.3. Static power consumption

The static power consumption of the NAND and NOR gates for all possible input patterns and both INBO and COBO styles has been measured by electrical simulation. The input vectors are numbered so that the number associated with the input vector $(I_3, I_2, I_1, I_0)$ is $I_3 \cdot 2^3 + I_2 \cdot 2^2 + I_1 \cdot 2^1 + I_0 \cdot 2^0$. Input vector 5, for example, corresponds to $(I_3, I_2, I_1, I_0) = (0, 1, 0, 1)$.

When the series tree is conducting, the voltage of each body is the same in the INBO and COBO implementations, so the static power consumption does not vary (see Table 5).

|                               | NAND    | NOR      |
|-------------------------------|---------|----------|
| Input vector                  | 15      | 0        |
| COBO static power consumption | 76.1 pW | 483.5 pW |
| INBO static power consumption | 76.0 pW | 483.5 pW |

Table 5: Static power consumption when the series tree is conducting

When the series tree is not conducting, on the other hand, the INBO approach shows lower static power consumption for almost any input, as shown in Figures 11 and 12. The input vectors have been grouped depending on the number of active transistors in the series tree. The differences from one group of patterns to another are mainly due to subthreshold current: from group to group, the number of cut-off transistors increases and the total impedance of the chain increases as well in both types of gates (COBO and INBO).

Static power in INBO gates is almost constant inside a group, while it varies significantly in COBO gates. The reason is that in the COBO style, the static consumption within a group depends on the number of reverse-biased parasitic diodes corresponding to source/drain terminals. This is determined by the number of transistors in the series tree connected to the output by an active path.
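
The vector numbering and the dependence on the active path can be sketched in Python as follows. This is a simplified counting model for the NMOS chain of the COBO NAND gate only. It assumes that the output is high and that every chain node reachable from the output through conducting transistors contributes one reverse-biased source/drain diode. The function names are hypothetical, and the PMOS and well diodes are ignored.

```python
def decode_vector(n):
    """Return (I3, I2, I1, I0) for vector number n = I3*8 + I2*4 + I1*2 + I0."""
    if not 0 <= n <= 15:
        raise ValueError("a four-input gate has input vectors 0..15")
    return tuple((n >> k) & 1 for k in (3, 2, 1, 0))


def reverse_biased_diodes_cobo_nand(n):
    """Count chain nodes tied to the (high) output by an active path.

    I3 drives the NMOS transistor nearest the output and I0 the one
    nearest ground. Simplified model, an assumption of this sketch:
    one diode for the drain on the output node, plus one for each
    internal node reached through consecutive conducting transistors.
    """
    bits = decode_vector(n)
    if all(bits):
        raise ValueError("vector 15 makes the series tree conduct")
    count = 1  # drain diode on the output node itself
    for bit in bits:  # walk from the output towards ground
        if bit == 0:
            break  # first cut-off transistor isolates the rest of the chain
        count += 1
    return count
```

Under this model, vectors that share the same number of active transistors can still differ in the diode count, which is the within-group variation attributed to the COBO style.
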
Thus, in the COBO NAND gate, input vector 12 ($I_3 I_2 I_1 I_0 = 1100$) is leakier than input vector 10 ($I_3 I_2 I_1 I_0 = 1010$), since for the first vector there are three reverse-biased diodes while for the second there are only two. Input vectors 9 and 10 are in turn leakier than 3, 5 and 6, since in the latter only the drain diode connected to the output is reverse-biased.

In the INBO NAND gate, on the other hand, the source-to-body parasitic diodes can be neglected since source and body are connected together, and the drain-to-body diodes cannot be reverse-biased since the n-wells are not connected to ground (except the one corresponding to input $I_0$). The lack of reverse leakage currents in the INBO gates makes them less leaky than their COBO counterparts in almost all cases, and is the reason why their static power consumption is almost constant inside each group of patterns.

As mentioned, the transistors in the INBO gates present better conductance because they do not suffer from the body effect. This is evident when there is only one cut-off transistor in the tree. The transistors between the cut-off one and the output have better conductance in the INBO than in the COBO case. That is why the INBO NAND gate is a bit leakier for input vectors 7, 11, 13 and 14, while the INBO NOR gate is slightly leakier for input vector 8.

Figure 11: Static power of the NAND gates when the series tree is not conducting

Figure 12: Static power of the NOR gates when the series tree is not conducting

Considering all the input vectors with the same probability, there is a static power saving of 5% in the NAND case using the INBO style. In the NOR case the average static power saving is about 10% using the INBO style.

## 5. Conclusions

As triple-well technologies become mainstream, the use of independent bodies (INBO) for each series transistor in static CMOS logic brings remarkable improvements in speed and in dynamic and static power consumption when compared to the conventional common body approach (COBO). INBO gates also show more homogeneous pin-to-pin delay across the inputs. As mentioned, these improvements are obtained at the expense of area, but the proposed gates are not intended to replace the traditional ones in the whole design. By using the proposed approach only in critical paths, in nodes with high switching activity, or when a gate with a large number of inputs is required, the speed and power consumption of the system can be improved while keeping area under control.

## 6. Acknowledgments

This work has been partially supported by the Ministry of Education and Culture of the Spanish Government through the TEC2007-61802/MIC (HIPER) project and the Andalusian Regional Government's EXC-2005-TIC-1023 project.

## 7. References

- [1] Kontiala, M., Kuulusa, M. and Nurmi, J., "Comparison of static logic styles for low-voltage digital design", Proceedings of 8th IEEE International Conference on Electronics, Circuits and Systems (ICECS), Vol. 3, pp. 1421-1424, Malta (2001).
- [2] Synopsys Inc., *HSPICE Simulation and Analysis User Guide* (2003).
- [3] Tsividis, Y., *Operation and Modelling of the MOS Transistor*, McGraw-Hill (1987).
- [4] Veendrick, H., *Deep-Submicron CMOS ICs*, Kluwer Academic Publishers, Ten Hagen en Stam, Deventer, The Netherlands (2000).
- [5] Weste, N. and Eshraghian, K., *Principles of CMOS VLSI Design*, Addison Wesley (1993).