# Do Temperature and Humidity Exposures Hurt or Benefit Your SSDs?

Adnan Maruf<sup>\*</sup>, Sashri Brahmakshatriya<sup>\*</sup>, Baolin Li<sup>†</sup>, Devesh Tiwari<sup>†</sup>, Gang Quan<sup>\*</sup>, and Janki Bhimani<sup>\*</sup> <sup>\*</sup>Florida International University <sup>†</sup>Northeastern University

Abstract—SSDs are becoming mainstream data storage devices, replacing HDDs in most data centers, consumer goods, and IoT gadgets. In this work, we ask an uncharted research question: What is the environmental conditions' impact on SSD performance? To answer it, we systematically measure, quantify, and characterize the impact of various commonly changing environmental conditions such as temperature and humidity on the performance of SSDs. Our experiments and analysis uncover that exposure to changes in temperature and humidity can significantly affect SSD performance.

Index Terms-robust performance, SSDs, design of tests.

## I. INTRODUCTION

Mitigating environmental impacts from temperature and humidity has been a challenge for all kinds of computer systems, from high-performance supercomputers [1], [2], [3] to edge computing platforms [4]. Such environmental impacts have caused severe damage to large data centers, leading to long-duration service outages in the past. For example, Amazon's AWS and Microsoft's Azure datacenter failures were caused by unexpected weather in 2021 [5], 2020 [6], and 2018 [7]. Interestingly, a significant number of cases are caused by the storage system's performance degradation and failure under such impacts [8], [9], [10]. Therefore, system-level online testing for appropriate diagnosis is critical to the robust performance and reliability of storage systems operating under various environmental conditions [11], [12].

In the storage world, most data center providers worldwide are replacing Hard Disk Drives (HDDs) with flash-based Solid State Drives (SSDs) for their improved performance and reliability [13], [14]. We find very scarce literature that identifies SSD failures in adverse environments [15], [16], [17], [18], and none of these studies presents the fine-grained runtime and post-performance effects on SSDs exposed to commonly experienced temperatures and humidity. We believe the root cause of the vulnerability may be essentially different for SSDs than for HDDs, as SSDs rely on integrated circuit (IC) modules, which may be impacted more gradually by temperature and humidity [19], [20], [21] than HDDs' mechanical components. It is therefore very important to carefully design controlled accelerated lab tests for high-significance diagnoses that can later be used to predict storage failure and prevent data loss.

SSDs consist of components such as NAND cells, a memory controller, a SATA/NVMe interface, onboard DRAM, capacitors, and other integrated circuits [22]. The impact of temperature and humidity on SSDs reflects the effects of temperature and humidity on these internal components. We found much evidence in the solid-state physics literature [15], [19], [23], [24], [25] that high temperature ages NAND cells more quickly than normal temperature does, because charge leakage (i.e., retention loss) accelerates at a superlinear rate. Moreover, previous research [20], [21] on electron devices such as capacitors and ICs found that humidity levels impact the lifetime of capacitors and the performance of ICs, due to an increase in interconnect capacitance and dielectric loss. *Therefore, our core insight is that SSD performance is highly likely to be impacted by changes in temperature and humidity, as the literature [15], [19], [21], [23], [24], [25] shows scattered evidence that various SSD components, such as NAND flash cells, ICs, and capacitors, are individually impacted.*

In this paper, we design accelerated system-level online tests to understand the runtime and post-performance effects of various commonly experienced temperature and humidity changes on SSDs manufactured by multiple vendors. We provide an in-depth analysis of how SSD performance is impacted under a range of realistic environment settings. In particular, this is the first work to investigate the following research questions (RQs):

**RQ1:** What are the runtime impacts of temperature and humidity changes on SSD performance?

**RQ2:** What are the post-impacts of exposure to temperature and humidity?

**RQ3:** How do the impacts of the exposure to temperature and humidity on SSD performance vary across different SSD types and I/O operation types (e.g., read and write)?

To ensure the high significance of our diagnoses and the reproducibility of our experiments, we repeat each experiment over six new out-of-the-box SSDs. In total, we tested over a hundred SSDs to derive the results presented in this paper. We ensured that the humidity and temperature values were always within the vendor-specified limits for all our experiments. However, by the end of each experiment, many SSDs came out degraded due to post-impacts. Some SSDs succumbed to adverse aging effects because our accelerated experiments perform a large amount of I/O. Thus, one of the biggest challenges of this study was not being able to reuse the same SSD for multiple experiments, as that would impact the correctness of our observations. The high cost of data collection and the long experimentation time prevent us from exhaustively collecting data at every possible temperature and humidity within vendor-specified limits. We collect and analyze a large amount of sensor and performance data from each experiment using various I/O tools. Here, we share the selected findings and observations that we could conclude with high statistical significance. Anonymized experimental data is being made publicly available at https://github.com/adnanmaruf/SSD-Temp-Humid so that the research community can better understand, model, and design alternative solutions to overcome the adverse effects.

## II. BACKGROUND

Among the many components of an SSD, NAND flash cells, DRAM, the memory controller, capacitors, and ICs are the key ones. In this section, we briefly discuss the details of some of these internal components of the SSD.

[NAND flash cells] are used in SSD devices because of their density, durability, cost, and performance. NAND flash uses Floating-Gate MOSFET (FGMOS) transistors to store data. Fig. 1a shows a simple FGMOS cell. Similar to a MOSFET transistor, the FGMOS acts like an electrical switch in which current flows between the source and drain terminals. The MOSFET channel becomes conductive when a voltage greater than the threshold voltage (V1) is applied to the control gate (CG). In addition to the CG of a MOSFET, an FGMOS has another gate, called the floating gate (FG), to control the flow of current. The FG is separated from the CG and the MOSFET channel by an oxide layer. When a high positive voltage is applied to the CG and a high negative voltage is applied to the source, electrons tunnel through the thin oxide layer and reach the FG. This operation is called tunneling. Electrons trapped inside the FG stay there even after the tunneling operation, making the FGMOS a non-volatile memory cell that can store data. When the FG is charged with electrons, the threshold voltage increases to V2 (V2>V1, e.g., see SLC in Fig. 1b), and the channel becomes conductive only when a voltage greater than V2 is applied to the CG. Thus, depending on the voltage at which the channel conducts, we can read the bit stored within this FGMOS [26]. This is how data is read from the NAND flash cell. As shown in Fig. 1b, a single-level cell (SLC) stores a single bit, either 0 or 1, differentiated with one threshold voltage; a multi-level cell (MLC) stores two bits differentiated with three threshold voltages; and a triple-level cell (TLC) stores three bits differentiated with seven threshold voltages.
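The threshold-voltage counts quoted above follow a simple pattern. As a worked relation in our own notation (the symbol $b$, the number of bits stored per cell, is an assumption introduced here for illustration), a cell must hold $2^{b}$ distinguishable charge states, and separating them requires one fewer read threshold:

$$
N_{\text{states}} = 2^{b}, \qquad N_{\text{thresholds}} = 2^{b} - 1 .
$$

For $b = 1$ (SLC), $b = 2$ (MLC), and $b = 3$ (TLC), this gives 1, 3, and 7 thresholds, respectively, matching Fig. 1b. For a fixed voltage window, more thresholds leave a narrower margin between adjacent states.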

Tunneling is also used to release the electrons from the FG. This time, a high negative voltage is applied to the CG and a high positive voltage is applied to the source, which is the erase operation of the NAND flash. The tunneling used for both write and erase operations gradually deteriorates the thin oxide layer, allowing electrons to move into and out of the FG more freely. This is known as the retention loss or charge leakage of the NAND cell. Such retention loss leads to an increase in the raw bit error rate (RBER). Thus, methods such as read-retry and error-correcting code (ECC) bits are used to ensure the correctness of the data, at the cost of increased latency for read and write operations.

[Integrated Circuits (IC)] are key elements in modern electronics. A set of electronic components, e.g., resistors, transistors, and capacitors, is integrated into a small chip of semiconductor material. Thus, ICs are an order of magnitude smaller, higher performing, and more cost-efficient than discrete components. The fabrication process of ICs has two main stages: Front-end-of-line (FEOL), in which the IC components are formed directly on a semiconductor material such as silicon, and Back-end-of-line (BEOL), in which the components are interconnected with metal wiring.

Fig. 2. Testbed for SSD environmental vulnerability tests.

[Chip capacitors] are another key component in SSDs as almost all ICs use capacitors. The volatile memory, i.e., random access memory unit used in the SSD, is mainly based on capacitors. In a capacitor, electrodes are separated by a dielectric medium such as air, vacuum, paper, titanium, etc. The metallic electrodes hold the charge, and the electricity starts to flow once the plates are connected. The capacitance of the capacitor depends on the area of the metal plates, the distance between the plates, and the dielectric material.
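For the idealized parallel-plate case (an illustrative simplification; real chip capacitors are typically multilayer devices), the dependence described above is captured by the textbook relation

$$
C = \varepsilon_r \, \varepsilon_0 \, \frac{A}{d},
$$

where $A$ is the plate area, $d$ is the distance between the plates, $\varepsilon_0$ is the vacuum permittivity, and $\varepsilon_r$ is the relative permittivity of the dielectric. The symbols are our notation rather than the paper's. In this simplified model, anything that alters the effective permittivity of the dielectric changes the capacitance.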

## III. EXPERIMENTAL METHODOLOGY

This section describes our testbed setup, experiment sequences in different temperature and humidity levels, benchmark implementation, and performance metrics. Our experiments are designed to perform controlled tests on the SSD under different environmental conditions to capture performance impact accurately.

### A. Experiment Setup

To ensure that the temperature and humidity impacts are applied only to the SSD and are isolated from the other components of our host machine, we use a specially designed test chamber [27] to conduct the experiments. Our test chamber can maintain a steady-state temperature and humidity, with neither setting affected by the other or by external environmental conditions. The host machine and the test chamber are placed in an isolated room with HVAC to keep the room temperature and humidity constant during the experiments. Only the SSD is exposed to the temperature and humidity control, while the other components of the host machine are kept at constant room temperature and humidity. The full setup is illustrated in Fig. 2. The main components of the testbed are a host machine, the SATA extension cables, the test chamber, and the SSD unit under test. The SSD is connected to the host machine through SATA extension cables that pass through a sealed portal on the side of the test chamber. The host machine runs the I/O benchmark while the test chamber controls the environmental conditions to which the SSD is exposed. Temperature and humidity sensors are placed inside the chamber close to the SSD for chamber feedback control.

Table I shows the hardware and software specifications used in our experiments. The test chamber we purchased is from the AES temperature and humidity environmental chamber product series. Industry also uses such test chambers to perform product characterization and testing [27]. The test chamber can adjust the temperature from 5°C to 94°C and the relative humidity from 10% to 98%. It is equipped with a programmable automatic controller. After the temperature and humidity reach the control setpoint during our experiments, we wait ten extra minutes before starting the I/O benchmark to avoid fluctuations. One limitation of this test chamber is that reducing relative humidity at lower temperatures requires an additional air dryer. Our test chamber cannot operate below 50RH when the temperature is below 20°C. For this reason, we do not include characterization for the combination of low relative humidity and low temperature in our test sequences.

TABLE I TESTBED SPECIFICATIONS.

| Component        | Specs                                 |
|------------------|---------------------------------------|
| Test Chamber     | AES LH-1.5 [27]                       |
| Host Server      | Optiplex9020                          |
| Processor        | Intel(R) Core(TM) i7-4770 CPU, 3.4GHz |
| Cores, L3, DRAM  | 16 Cores, 8192K, 16GB                 |
| Operating System | Ubuntu 16.04 LTS (4.4.0-137-generic)  |
| SSD Capacity     | 120 GB                                |
| SSD Type         | SLC, MLC, and TLC                     |
| SATA Version     | SATA 3.2 6.0 Gb/s                     |

TABLE II EXPERIMENT SEQUENCES.

| Exp. ID | Sequence                                                                                                                           |
|---------|------------------------------------------------------------------------------------------------------------------------------------|
| 1       | 22.5°C, 50RH $\rightarrow$ (22.5-60)°C, 50RH $\rightarrow$ 60°C, 50RH $\rightarrow$ 60°C, (20-80)RH $\rightarrow$ 60°C, 50RH |
| 2       | 22.5°C, 50RH $\rightarrow$ 50°C, 50RH                                                                                              |
| 3       | 22.5°C, 50RH $\rightarrow$ 60°C, 50RH                                                                                              |
| 4       | 60°C, 80RH $\rightarrow$ 60°C, 50RH                                                                                                |
| 5       | 60°C, 20RH $\rightarrow$ 60°C, 50RH                                                                                                |
| 6       | 70°C, 80RH $\rightarrow$ 10°C, 80RH                                                                                                |
| 7       | 60°C, 50RH $\rightarrow$ 60°C, 80RH $\rightarrow$ 60°C, 50RH                                                                       |
| 8       | 22.5°C, 50RH $\rightarrow$ 50°C, 50RH $\rightarrow$ 22.5°C, 50RH                                                                   |

### B. Experiment Sequences

Table II shows our experiment sequences, which vary the temperature and humidity. In total, we list eight sequences. Each experiment at any particular environmental condition listed in Table II is six hours long. For each run of the experiment sequences in Table II, we used out-of-the-box SSDs to avoid post-impacts of previous experiments. Before starting each experiment sequence, we preconditioned the SSDs so that they reached a steady state. Preconditioning is the process of applying a workload to an SSD to move it from the initial fresh-out-of-the-box state to a state where the steady performance of the device can be reproduced while repeatedly running the same experiments. To stay within the trusted operational range of our SSDs, we conduct all experiments within the vendor-specified limits in the datasheet (e.g., 10°C to 70°C). To ensure statistical significance and reproducibility, we conducted each experiment sequence six times. Every one of the six repetitions of each experiment sequence in Table II uses a new SSD of each type (SLC, MLC, and TLC), and we present the average performance observed across them. We conduct experiments with only one SSD in the test chamber at a time for better accuracy of our results. We also performed experiments over other sequences, but we report only the results that were interesting and that we could conclude with high statistical significance. For example, we did not observe any impact of humidity change at room temperature, so we do not include that sequence in Table II.
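The averaging over six repetitions described above can be sketched as follows. This is a minimal illustration rather than the authors' analysis pipeline: the function name `summarize_repetitions` and the bandwidth readings are hypothetical, and reporting the sample standard deviation next to the mean is our assumption about how the spread across drives might be summarized.

```python
from statistics import mean, stdev


def summarize_repetitions(samples):
    """Return (mean, sample standard deviation) of repeated measurements."""
    if len(samples) < 2:
        raise ValueError("need at least two repetitions to estimate spread")
    return mean(samples), stdev(samples)


# Hypothetical bandwidth readings (MB/s), one per new out-of-the-box SSD.
readings = [100.0, 102.0, 98.0, 101.0, 99.0, 100.0]
avg, spread = summarize_repetitions(readings)
print(f"mean={avg:.1f} MB/s, stdev={spread:.2f} MB/s")
```

A small spread relative to the mean suggests that the six drives behave consistently under the same sequence.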

One of the biggest challenges of this study was not being able to reuse any SSD for multiple experiments. These experiments are therefore expensive and time-consuming, so designing a good test sequence is very important. First, to resemble the natural exposure to various temperatures and humidity that SSDs experience when deployed within electric vehicle systems, IoT devices, and distributed data centers, we expose SSDs to continuous change in both temperature and humidity. Specifically, Experiment ID 1 (Table II) simulates abrupt changes in environmental factors over time while SSDs are in use. The temperature is varied between 22.5°C and 60°C, and the humidity is varied between 20RH and 80RH. Across multiple systems, 60°C and 50RH is the most common condition we found while processing, and 22.5°C and 50RH is the most common condition in the idle state. We see a high humidity of 80RH only while datacenter water cooling systems are fully active at high temperatures. We found an interesting impact on the performance of the SSDs, and to understand it better, we conducted more controlled experiments that change only one environmental factor while keeping the other unchanged.

In the next five experiments, we capture the runtime effects of exposure to a change in temperature or humidity. We first benchmark the SSD for six hours at the initial state of temperature and humidity, then change the chamber conditions to the final state, at which we benchmark the SSD again. We compare the performance between the initial and final states. In Experiment IDs 2 and 3 (Table II), we increase the temperature from the room condition (22.5°C) to high (50°C and 60°C) at room humidity (50RH). We observed a high positive impact on the SSD bandwidth at 60°C. We also experimented with varying the humidity at room temperature but did not observe any impact of the humidity change. We then pick the best-performing temperature (i.e., 60°C) to observe the impacts of a decrease (Experiment ID 4 in Table II) and an increase (Experiment ID 5 in Table II) in humidity, each ending at room-level humidity. We found that decreasing humidity impacts the SSD performance positively. Next, to observe the effects of temperature decrements, we reduce the temperature across the vendor-specified range, i.e., from 70°C to 10°C at 80RH, in Experiment ID 6 (Table II). Note that although we observe the best performance at low humidity, we could not experiment with the temperature drop at a lower humidity level due to the test chamber's limitations. Our survey indicates that most test chambers have a similar limitation. Moreover, due to the inversely proportional relationship between temperature and humidity, this condition is the least likely to happen in real environments.
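The comparison between the initial and final states can be expressed as a relative change. The helper below is a sketch under our own naming (`percent_change` is hypothetical, as are the numbers); the paper reports such percentages but does not give the formula explicitly.

```python
def percent_change(initial, final):
    """Relative change from the initial to the final state, in percent.

    Positive values mean the metric grew; for latency, a negative value
    is an improvement, while for bandwidth a positive value is.
    """
    if initial == 0:
        raise ValueError("initial value must be non-zero")
    return (final - initial) / initial * 100.0


# Hypothetical values: tail latency drops from 200 us to 100 us.
latency_delta = percent_change(200.0, 100.0)    # -50.0
# Hypothetical values: bandwidth rises from 100 MB/s to 150 MB/s.
bandwidth_delta = percent_change(100.0, 150.0)  # 50.0
```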

Finally, we observe the post-impacts of humidity and temperature changes in Experiment IDs 7 and 8 (Table II). The initial and final states of these experiments are the same. We compare the performance of benchmarks before and after exposure to the change in humidity and temperature. In particular, in Experiment ID 7 (Table II), we run a six-hour experiment at room humidity and the best-performing temperature (i.e., 60°C). Then we expose the SSD to high humidity and finally set the chamber back to room humidity to run our second six-hour experiment. Similarly, in Experiment ID 8 (Table II), we measure the post-impacts of the increase in temperature. We conducted Experiment ID 8 (Table II) at room humidity rather than at the best-performing low humidity because of the chamber limitation explained above, which obstructs attaining low humidity.

Fig. 3. (a) Tail latencies can improve by up to 50% while running experiments on SSDs exposed to varying temperature and humidity (latency: lower is better); (b) SSDs can show higher average bandwidth at high temperature (50°C and above); (c) SSD tail latencies decrease when the humidity level decreases; and (d) tail latencies increase when the humidity level increases.

### C. I/O Benchmark Configuration

In this work, we use the popular open-source FIO (Flexible I/O) benchmark [28] to generate I/O workloads for the SSD. FIO is configured to use the "libaio" I/O engine with an I/O depth of 16, a 50:50 read/write ratio, and I/O sizes of 4KB to 1MB to better mimic the I/O requests issued by a real application stack. The I/O pattern is configured to be random, since random I/O is usually the bottleneck I/O type for meeting the service level agreements (SLAs) of latency-critical applications. Unless specifically mentioned, all the analyses presented in Section IV show the I/O performance for mixed workloads with both reads and writes.
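The configuration above might translate into an FIO invocation along the following lines. This is a sketch, not the authors' job file. The job name, the device path `/dev/sdX`, the use of `--direct=1`, the time-based six-hour run, and the choice of `bsrange` to express the 4KB to 1MB sizes are all our assumptions; only the I/O engine, queue depth, read/write mix, random pattern, and size bounds come from the text. The helper only builds the argument list and does not execute anything, because running a write workload against a raw device destroys its contents.

```python
def build_fio_command(device, runtime_s):
    """Build an fio argument list for a random 50:50 mixed workload."""
    return [
        "fio",
        "--name=ssd-env-test",       # hypothetical job name
        f"--filename={device}",      # device under test (assumed path)
        "--ioengine=libaio",         # from the paper
        "--iodepth=16",              # from the paper
        "--rw=randrw",               # random mixed reads and writes
        "--rwmixread=50",            # 50:50 read/write ratio
        "--bsrange=4k-1m",           # I/O sizes between 4KB and 1MB
        "--direct=1",                # assumption: bypass the page cache
        "--time_based",              # assumption: run for a fixed time
        f"--runtime={runtime_s}",
        "--percentile_list=90:95:99:99.9:99.99",
        "--output-format=json",
    ]


cmd = build_fio_command("/dev/sdX", 6 * 60 * 60)
print(" ".join(cmd))
```

Passing the list to `subprocess.run` would start the benchmark; keeping the builder separate from execution makes the configuration easy to inspect first.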

### D. Performance Metrics

Among other performance metrics, tail latency is a critical metric in many real applications such as cloud computing and autonomous vehicles. Depending on the application, the tail percentile target varies. In this paper, we cover a broad range of tail latency percentiles, namely the 90th, 95th, 99th, 99.9th, and 99.99th, to account for different scenarios. The other metric we characterize is the I/O bandwidth, measured in IOPS, which captures the SSD operation throughput. We found that the tail latency and the bandwidth are impacted the most compared to the other metrics, e.g., average latency and throughput. The tail latency and the bandwidth are also standard SSD performance metrics evaluated in other works [4], [29], [30]. As this study is conducted using real SSDs rather than an emulated SSD, and internal details of SSDs such as the flash translation layer are proprietary, we cannot directly instrument internal characteristics of SSDs such as the bit flip rate and read retries.
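To make the tail latency metric concrete, the sketch below computes a percentile from raw per-I/O latencies using the nearest-rank definition. The function name `tail_latency` and the sample data are hypothetical, and benchmark tools may use a different estimator (for example, histogram buckets), so this illustrates the concept rather than how the reported numbers were produced.

```python
import math


def tail_latency(latencies, percentile):
    """Nearest-rank percentile: the smallest sample that is greater than
    or equal to `percentile` percent of all samples."""
    if not latencies:
        raise ValueError("no latency samples")
    if not 0 < percentile <= 100:
        raise ValueError("percentile must be in (0, 100]")
    ordered = sorted(latencies)
    # Round before ceil to guard against floating-point noise.
    rank = math.ceil(round(percentile * len(ordered) / 100, 9))
    return ordered[rank - 1]


# Hypothetical latencies in microseconds: 1, 2, ..., 100.
samples = list(range(1, 101))
p90 = tail_latency(samples, 90)        # 90
p9999 = tail_latency(samples, 99.99)   # 100
```

High percentiles such as the 99.99th need many samples to be meaningful, because they are determined by a handful of the slowest requests.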

## IV. RESULTS AND ANALYSIS

In this section, we discuss the results and analysis from the experiments we performed on the SSDs. We observed that among SLC, MLC, and TLC SSD types, SLC has very minimal impacts. Hence, we only discuss the results for the MLC and TLC using the most impacted metrics. We begin by discussing how SSD performance is affected during runtime temperature and humidity change. Then, we discuss the post-effects of temperature and humidity on SSD performance. Finally, we discuss the long-term impact of temperature and humidity.

Counter-intuitively, we observed that the tail latencies improved by up to 50% after SSDs were exposed to abrupt temperature and humidity changes. To the best of our knowledge, no prior work studies the performance impact on SSDs exposed to temperature and humidity. So, in search of motivation, we begin our experiments by replicating abrupt environmental changes in temperature and humidity in Experiment ID 1 (Table II). As discussed in Section I, we see disaggregated evidence in the literature [15], [19], [21], [23], [24], [25] that various components used within SSDs, such as NAND flash cells, ICs, and capacitors, are individually impacted. Thus, we anticipate that the overall performance of SSDs is impacted as well. To verify this anticipation, we first measure the performance of the SSDs at the room condition, i.e., 22.5°C, 50RH. Then, while running the workload, we increase the temperature and humidity. Finally, we measure the performance at 60°C, 50RH. Experiment ID 1 (Table II) shows the stages of this sequence. Fig. 3a shows the percentage decrease in tail latency for TLC and MLC SSDs. This observation motivated us to study the impact of both temperature and humidity more systematically.

We found that runtime temperature changes mostly affect the TLC SSDs. To find the impact of temperature changes, we first analyze the performance of the SSDs at high temperatures and room humidity in Experiment IDs 2 and 3 (Table II). From Fig. 3b, we observe that TLC SSDs show 51% better average bandwidth at 50°C than at the room condition, and 67% better average bandwidth at 60°C. MLC SSDs showed only a small improvement in bandwidth at higher temperatures. Thus, the runtime effects of temperature changes appear mostly on TLC NAND flash devices. When the temperature increases, electrons can move more freely between the floating gate (FG) and the channel [19]. We assume that the higher cell electron flow due to the temperature increment makes it easier to distinguish the different threshold voltage levels (as discussed in Section II). Thus, TLC flash provides faster read/write operations. In MLC flash, on the other hand, the margins between the three threshold voltage levels are larger. Hence, this small increase in the flow of electrons has little effect on the MLC SSDs.

Next, we observe the impact of humidity changes on SSD performance. We found that SSD performance deteriorates at high humidity levels. We first analyze the impact of humidity decrements and increments at room temperature. However, we did not observe any performance changes. We then pick 60°C for varying the humidity level, as we observed a high bandwidth gain at this temperature in the previous observation. We perform Experiment IDs 4 and 5 (Table II) to observe the impact of humidity change at a high temperature of 60°C. In Experiment ID 4 (Table II), humidity decreases from 80RH to 50RH, and in Experiment ID 5 (Table II), humidity increases from 20RH to 50RH. In Fig. 3c, when humidity decreases, the 99.9th percentile tail latency decreases by 58% for TLC SSDs and by 45% for MLC SSDs. In Fig. 3d, the tail latency increases when the humidity increases. Although the TLC SSDs show very little performance degradation, the 99th percentile tail latency of the MLC SSDs can increase by up to 18%. We anticipate that this performance drop at high humidity is due to the impact of humidity on the SSD IC capacitance. Ref. [21] found that humidity severely hinders capacitor performance: all capacitors lose capacitance at an increasing slope, and their Equivalent Series Resistance (ESR) values climb at high humidity. Moreover, the capacitance of MLC flash cells is originally higher than that of TLC flash cells, so the impacts are immediately more evident for MLC SSDs. However, as on-chip DRAM cells also depend on capacitance, we anticipate that even TLC SSDs may be impacted upon prolonged exposure.

Fig. 5. Post-impacts to SSD tail latency after exposure to (a) high humidity (TLC and MLC) and (b) high temperature (TLC).

Finally, we simulate the conditions of a sudden temperature drop in case of a climate control system failure. **Temperature decrement shows a lower bandwidth for TLC SSDs.** Fig. 4 shows the percentage of bandwidth change while the temperature changes from 70°C to 10°C with relative humidity maintained at 80RH (Experiment ID 6 in Table II). Similar to the earlier findings, only TLC SSDs are impacted. This time, the bandwidth suffers a high degradation ranging from 35% to 65%. When the temperature decreases, the reduced mobility of electrons makes distinguishing the seven different threshold voltage levels more challenging for TLC SSDs.

Understanding the post-impact of exposure to any environmental condition is important, as it determines the lasting performance of the SSD even after it returns to normal environmental conditions. **SSD tail latencies showed negative post-impact when exposed to high humidity.** To observe the post-impacts of environmental changes, we conduct Experiment IDs 7 and 8 (Table II). We expose SSDs to high humidity (80RH) for six hours. Then, to analyze the post-impacts, we compare the performance of the SSDs over the six hours before the increase in humidity and the six hours after exposure, both at room humidity. As we can see in Fig. 5a, the 99.99th percentile tail latency of TLC and MLC SSDs can degrade by up to 75% and 10%, respectively. In our previous finding, we observed that high humidity degrades the runtime performance of the SSDs. From Fig. 5a, we find that the performance degradation is not limited to runtime: high humidity leaves a post-impact by damaging the ICs and capacitors. IC Back-End-of-Line (BEOL) components are prone to permanent damage from humidity penetration, as their sole protection is a moisture-crack barrier [20].

We observed a negative post-impact of high temperature on TLC SSDs. To explore the post-impact of temperature changes, we perform Experiment ID 8 (Table II), where we compare the performance at normal conditions, i.e., 22.5°C, 50RH, before and after exposing the SSD to 50°C, 50RH. As shown in Fig. 5b, we observe negative post-effects on TLC SSDs, with an increase in the tail latencies. MLC SSDs did not show any post-impact of the increase in temperature. In the NAND flash cell, the tunneling required for write and erase operations degrades the thin oxide layer over time, which causes electrons to leak from the FG. We anticipate that the acceleration of this retention loss at high temperature impacts the TLC flash cell permanently. Increased retention loss may lead to increased read retries and to the replacement of bad cells with new cells from the over-provisioned (OP) region, causing performance penalties [19]. As MLC flash is less sensitive to retention loss, MLC SSDs do not show a post-impact of high temperature after the short period of exposure. We did not observe any post-impacts of decreasing humidity or temperature. We also analyze the performance of read and write I/Os separately. We observed that temperature and humidity changes impact write I/Os more than read I/Os, which we attribute to the same reasons discussed above. We found that, on average, read I/O bandwidth can degrade by up to 62%, while write I/O bandwidth can degrade by up to 85%.

Finally, we continued our long-term experiments with the intent of running them until the SSD wears out by writing more data than is specified in the warranty sheet. To our surprise, some SSDs running under high humidity experienced fail-stop faults long before they surpassed the write endurance limit. We note that these failures were not observed in all SSDs, making it difficult to predict and proactively manage such failures. Also, no SSD kept under normal room conditions showed any such behavior. The SSD failures resulted in the total loss of all the data present on the SSDs. Upon further inspection of the logs collected while the SSD was still operational, we observe that the "media wear-out" increased at a 2x higher rate towards the end, despite the same workload. This may be because the damaged NAND cells are rapidly replaced by spare NAND cells from the over-provisioned (OP) region until no spare cells remain, at which stage the SSD fails. For temperature changes at room and lower humidity, we do not observe any SSD failure.
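The 2x rise in the wear-out rate noted above suggests a simple check that could be run over logged wear counters. The sketch below is purely illustrative: the log format of (hour, wear value) pairs, the function names, and the sample data are all hypothetical, and the paper does not describe how its logs were analyzed.

```python
def wear_rate(samples):
    """Average change of the wear counter per hour over (hour, value) pairs."""
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    if t1 == t0:
        raise ValueError("samples must span a non-zero time interval")
    return abs(v1 - v0) / (t1 - t0)


def wear_acceleration(samples, split):
    """Ratio of the late wear rate to the early wear rate.

    `split` is the index of the sample that separates the two windows.
    """
    early = wear_rate(samples[: split + 1])
    late = wear_rate(samples[split:])
    if early == 0:
        raise ValueError("no wear observed in the early window")
    return late / early


# Hypothetical log: the counter grows by 1 per hour, then by 2 per hour.
log = [(0, 0), (10, 10), (20, 20), (30, 40), (40, 60)]
ratio = wear_acceleration(log, split=2)  # 2.0
```

A ratio well above 1 under an unchanged workload would flag the kind of late acceleration described above, although the paper notes that such failures remain hard to predict.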

We summarize our observations in Fig. 6, with temperature on the x-axis and relative humidity on the y-axis. The intersection of the axes in the plot is set at the room condition, i.e., 22.5°C and 50RH. Due to the test chamber's operating constraint, we could not run any experiment in the lower-left quadrant, i.e., the low temperature and low humidity zone. All the experiment sequences from Table II are represented using white circles. The arrows indicate the order of the states in each experiment. For example, 2i denotes the initial state of Experiment ID 2 (Table II), i.e., 22.5°C, 50RH, and 2f denotes the final state of Experiment ID 2 (Table II), i.e., 50°C, 50RH. The black and white arrows denote the runtime and post-impact experiments, respectively. Based on the observed performance of the SSDs at different temperature and humidity levels, we colored the quadrants so that the worst performance is represented by red and the best performance by green. From Fig. 6, we can see that it is safe to operate at high temperature (60°C) and low humidity (20RH), as this gives higher runtime performance with minimal post-impacts. In comparison, high humidity at both low and high temperatures can severely deteriorate SSD performance and reliability.

Fig. 6. Summary of the impacts of temperature and humidity changes.

### V. CONCLUSION

This paper begins by posing a simple question for investigation: do temperature and humidity exposures hurt or benefit your SSDs? We conclude that humidity has a severe post-impact on the tail latency of SSDs, which persists even after the SSDs return to operating under room conditions. This can have profound implications for data center SLAs and for the use of SSDs in autonomous vehicles. The extent of the impact depends on the NAND flash type of the SSD. Additionally, our findings show that a small increase in temperature may be instantaneously beneficial to SSD performance, but it may also have some minimal post-impacts. In the future, we plan to deepen our understanding by designing models that capture the observed trends and simulate the performance behaviors, by exploring different real workloads, the NVMe interface, and long-term impacts, and by characterizing other environmental factors such as electromagnetic waves.

### ACKNOWLEDGMENTS

This work was partially supported by the Cyber Florida Collaborative Seed Award, Northeastern University, and National Science Foundation Awards CNS-2008324 and 1910601.

#### REFERENCES

- [1] AVTECH. (2021) Humidity In Your Data Center. [Online]. Available: https://avtech.com/articles/8644/humidity-in-your-data-center/

- [2] M. Platini, T. Ropars, B. Pelletier, and N. De Palma, "Cpu overheating characterization in hpc systems: a case study," in *FTXS*. IEEE, 2018.
- [3] UptimeInstitute. (2021) Extreme weather affects nearly half of data centers. [Online]. Available: https://journal.uptimeinstitute.com/extreme-weather-affects-nearly-half-of-data-centers/
- [4] S.-C. Lin, Y. Zhang, C.-H. Hsu, M. Skach, M. E. Haque, L. Tang, and J. Mars, "The architectural implications of autonomous driving: Constraints and acceleration," in ASPLOS, 2018.
- [5] M. Otey. (2021) AWS Details Frankfurt Data Center Outage Cause. [Online]. Available: https://petri.com/?p=658442
- [6] M. Foley. (2020) Microsoft's March 3 Azure East US outage: What went wrong (or right)? [Online]. Available: https://www.zdnet.com/article/microsofts-march-3-azure-east-us-outage-what-went-wrong-or-right/
- [7] TheRegister. (2018) Azure North Europe downed by the curse of the Irish – sunshine. [Online]. Available: https://reg.cx/2K4X
- [8] E. Xu, M. Zheng, F. Qin, Y. Xu, and J. Wu, "Lessons and actions: What we learned from 10k ssd-related storage system failures," in ATC. USENIX, 2019.
- [9] D.-Y. Lee, J. Lee, J. Hwang, and S.-H. Choa, "Effect of relative humidity and disk acceleration on tribocharge build-up at a slider–disk interface," *Tribology international*, vol. 40, no. 8, pp. 1253–1257, 2007.
- [10] S. Sankar, M. Shaw, K. Vaid, and S. Gurumurthi, "Datacenter scale evaluation of the impact of temperature on hard disk drive failures," ACM *Transactions on Storage (TOS)*, vol. 9, no. 2, pp. 1–24, 2013.
- [11] S. Halder and G. Sivakumar, "Embedded based remote monitoring station for live streaming of temperature and humidity," in *ICEECCOT*. IEEE, 2017.
- [12] S. Sachdev, J. Macwan, C. Patel, and N. Doshi, "Voice-controlled autonomous vehicle using iot," *Procedia Computer Science*, 2019.
- [13] C. Metz. (2012) Flash drives replace disks at amazon, facebook, dropbox. [Online]. Available: https://www.wired.com/2012/06/flash-data-centers
- [14] N. Agrawal, V. Prabhakaran, T. Wobber, J. D. Davis, M. S. Manasse, and R. Panigrahy, "Design tradeoffs for ssd performance." in *ATC*, 2008.
- [15] J. Meza, Q. Wu, S. Kumar, and O. Mutlu, "A large-scale study of flash memory failures in the field," in *SIGMETRICS*. ACM, 2015.
- [16] H. Zhang, E. Thompson, N. Ye, D. Nissim, S. Chi, and H. Takiar, "Ssd thermal throttling prediction using improved fast prediction model," in *ITherm*. IEEE, 2019.
- [17] Y. Wang, X. Dong, X. Zhang, and L. Wang, "Measurement and analysis of ssd reliability data based on accelerated endurance test," *Electronics*, 2019.
- [18] B. Schroeder, A. Merchant, and R. Lagisetty, "Reliability of nand-based ssds: What field studies tell us," *Proceedings of the IEEE*, 2017.
- [19] Y. Luo, S. Ghose, Y. Cai, E. F. Haratsch, and O. Mutlu, "Heatwatch: Improving 3d nand flash memory device reliability by exploiting self-recovery and temperature awareness," in *HPCA*. IEEE, 2018.
- [20] F. Stellari, C. Cabral, P. Song, and R. Laibowitz, "Humidity penetration impact on integrated circuit performance and reliability," in *IEDM*. IEEE, 2019.
- [21] H. Wang, P. D. Reigosa, and F. Blaabjerg, "A humidity-dependent lifetime derating factor for dc film capacitors," in *ECCE*. IEEE, 2015.
- [22] J. Bhimani, T. Patel, N. Mi, and D. Tiwari, "What does vibration do to your ssd?" in DAC, 2019.
- [23] N. Mielke, T. Marquart, N. Wu, J. Kessenich, H. Belgal, E. Schares, F. Trivedi, E. Goodness, and L. Nevill, "Bit error rate in nand flash memories," in *IEEE IRPS*, 2008.
- [24] W. Choi, M. Arjomand, M. Jung, and M. Kandemir, "Exploiting data longevity for enhancing the lifetime of flash-based storage class memory," *SIGMETRICS*, 2017.
- [25] M. Xu, C. Tan, and M. Li, "Extended arrhenius law of time-to-breakdown of ultrathin gate oxides," 2003.
- [26] S. Aritome, NAND flash memory technologies. Wiley, 2015.
- [27] A. E. Systems. (2021) Environmental Test Chamber LH Series. [Online]. Available: https://www.associatedenvironmentalsystems.com/products/lh-1-5
- [28] J. Axboe. (2021) Fio. [Online]. Available: https://fio.readthedocs.io/en/latest/fio_doc.html
- [29] H. S. Gunawi, R. O. Suminto, R. Sears, C. Golliher, S. Sundararaman, X. Lin, T. Emami, W. Sheng, N. Bidokhti, C. McCaffrey *et al.*, "Fail-slow at scale: Evidence of hardware performance faults in large production systems," *TOS*, 2018.
- [30] J. Bhimani, A. Maruf, N. Mi, R. Pandurangan, and V. Balakrishnan, "Auto-tuning parameters for emerging multi-stream flash-based storage drives through new i/o pattern generations," *IEEE TOC*, 2020.