Bharat Tank, Associate Director - IT & Operations, RICS School of Built Environment
Rahul Kumar is a seasoned IT professional. He was thrilled when he was offered a job at a multinational company that was entering the Indian market by establishing a manufacturing plant in a Special Economic Zone in the Delhi NCR. Armed with over 18 years of experience in managing IT infrastructure, Rahul was hired to set up a high-quality data centre that would be the core of the manufacturing plant's IT operations. The plant was to be fully automated and would run the company's legacy ERP application for local production.
It took almost four months, from infrastructure design through implementation and user training to production, for the data centre to go live. The manufacturing unit started production on schedule, and the company's customers began receiving quality products on time. The IT team grew in strength, and Rahul was happy to showcase this as yet another achievement.
The joy, however, was short-lived. Within a year of setting up the data centre, the IT team realized that the rate of hardware failure was increasing, resulting in frequent calls to OEMs (original equipment manufacturers) to replace faulty hardware. It began with one or two failures, but the failure rate rose steadily. Initially, Rahul and his team treated each failure as an isolated incident, without realizing that there was a larger problem.
Soon, hardware failure became a regular talking point in the company's weekly IT meetings. The IT team was puzzled about the cause. After all, the team had been maintaining the recommended temperature and humidity levels in the data centre and running hardware diagnostics at regular intervals. On the face of it, nothing was wrong with the data centre, and the cause of the failures remained a mystery. Meanwhile, the failures had begun to affect the company's business: as failure rates continued unchecked, OEMs started refusing to replace their hardware.
Overcoming the Dilemma
It was time to set things right. Rahul and his team decided to perform a thorough health check-up of the data centre and its environment. The final report found that the server room air was contaminated with corrosive gases such as sulphur dioxide and hydrogen sulphide, which corrode IT components containing copper and silver. Further investigation traced the source to open drains near the manufacturing plant. These drains carried Noida city's garbage and the untreated wastewater of manufacturing and dyeing units, and they released highly poisonous and corrosive gases into the surrounding air. Although the production unit was some distance from the drains, there was no way to stop the contaminated air from reaching the data centre.

The immediate need was to improve the data centre's air quality and either stop the corrosion of IT components or at least slow it down. The team assured the OEMs that an air quality control unit would be installed in the data centre to prevent further damage. It evaluated multiple industrial air purifier systems on the market and sought recommendations from existing users who had faced similar problems in their data centres. Once an air purifier system was installed in the data centre, air quality improved substantially. Within six months, hardware failures fell to one or two cases every six months to a year, compared with an average of 2–3 cases per quarter earlier.
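To put the before-and-after figures on a common scale, the short calculation below annualizes them, reading "one or two cases in six months or a year" as a range from one failure per year at best to two failures per six months at worst. This is simply arithmetic on the reported ranges, not additional data:

$$
\begin{aligned}
\text{Before: } & (2 \text{ to } 3)\ \tfrac{\text{failures}}{\text{quarter}} \times 4\ \tfrac{\text{quarters}}{\text{year}} = 8 \text{ to } 12\ \tfrac{\text{failures}}{\text{year}} \\
\text{After: } & 1\ \tfrac{\text{failure}}{\text{year}} \ \text{ to } \ 2\ \tfrac{\text{failures}}{6\ \text{months}} \times 2 = 4\ \tfrac{\text{failures}}{\text{year}} \\
\text{Reduction: } & 1 - \tfrac{4}{8} = 50\% \ \text{(worst case)} \quad \text{to} \quad 1 - \tfrac{1}{12} \approx 92\% \ \text{(best case)}
\end{aligned}
$$

Even under the least favourable reading, the failure rate roughly halved.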
One of the key factors in deciding the capacity of an air purifier is the size of the data centre, much as the tonnage of an air conditioner depends on the size of the room. Air purifiers need to run 24x7, and their filters must be cleaned and replaced regularly, at intervals that depend on how poor the air quality in the data centre is. The data centre's air quality should also be tested every quarter, or at least twice a year, to gauge how effective the purifier is. It is equally important to ensure that the data centre has no openings through which outside air can enter, as unfiltered air coming in reduces the purifier's effectiveness.
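The sizing principle above can be sketched as a small calculation. This is a minimal sketch under stated assumptions, not a vendor method: it assumes capacity is sized by air changes per hour (ACH), i.e. how many times per hour the full room volume should pass through the filters. The ACH target, the per-unit rated airflow and the room dimensions used below are hypothetical placeholders; the real values should come from the purifier vendor and an air quality assessment. The helper names `required_airflow_m3h` and `units_needed` are illustrative only.

```python
import math

# Conversion factor: 1 m^3/h is approximately 0.5886 cubic feet per minute (CFM).
M3_PER_HOUR_TO_CFM = 0.588578


def required_airflow_m3h(length_m: float, width_m: float, height_m: float, ach: float) -> float:
    """Airflow (m^3/h) needed to recirculate the room volume `ach` times per hour.

    Assumption: sizing by air changes per hour; the ACH target itself is hypothetical
    and should be taken from the purifier vendor's recommendation.
    """
    if min(length_m, width_m, height_m, ach) <= 0:
        raise ValueError("dimensions and ACH must be positive")
    volume_m3 = length_m * width_m * height_m
    return volume_m3 * ach


def units_needed(airflow_m3h: float, unit_capacity_m3h: float) -> int:
    """Smallest number of purifier units whose combined rated airflow covers the requirement."""
    if unit_capacity_m3h <= 0:
        raise ValueError("unit capacity must be positive")
    return math.ceil(airflow_m3h / unit_capacity_m3h)


if __name__ == "__main__":
    # Hypothetical server room: 10 m x 8 m x 3 m, with an assumed target of 6 ACH.
    airflow = required_airflow_m3h(10, 8, 3, ach=6)
    print(f"Required airflow: {airflow:.0f} m^3/h ({airflow * M3_PER_HOUR_TO_CFM:.0f} CFM)")
    # Hypothetical purifier rated at 500 m^3/h per unit.
    print("Units needed:", units_needed(airflow, unit_capacity_m3h=500))
```

Rounding up with `math.ceil` reflects the fact that a fraction of a unit cannot be installed; any spare capacity is headroom that helps while filters are being serviced.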
Air Quality around the Data Centre Matters
IT heads should pay attention to the quality of the air around the data centre. When deciding the budget for your company's IT infrastructure, set aside a small portion for testing the data centre's air quality. Frequent hardware failures distract the IT department from important matters and waste its time on avoidable issues. If the data centre's air quality is poor and you have no way to improve it, take your IT partner or vendor into confidence and explain the situation before placing any IT infrastructure order. Make the OEMs aware of the situation and obtain their confirmation in advance, so that they can help resolve any problems that arise. Document everything diligently, so that you have your IT partner's support if downtime occurs.
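Air quality tests of this kind are often done with copper and silver reactivity coupons, reported as film growth in ångströms per 30 days and mapped to the severity levels of the ISA-71.04 standard; ASHRAE's data centre guidance targets the mildest level (copper below 300 and silver below 200 Å per month). The sketch below shows how quarterly results could be classified. The thresholds are as commonly cited from ISA-71.04-2013 and should be verified against the standard before use. The function names and the readings are hypothetical and for illustration only.

```python
from enum import IntEnum


class Severity(IntEnum):
    """ISA-71.04 severity levels, ordered from least to most corrosive."""
    G1 = 1  # mild
    G2 = 2  # moderate
    G3 = 3  # harsh
    GX = 4  # severe


# Exclusive upper bounds in angstroms of film growth per 30 days, as commonly
# cited from ISA-71.04-2013; verify against the current standard before relying on them.
COPPER_LIMITS = [(300, Severity.G1), (1000, Severity.G2), (2000, Severity.G3)]
SILVER_LIMITS = [(200, Severity.G1), (1000, Severity.G2), (2000, Severity.G3)]


def _classify(rate: float, limits) -> Severity:
    """Return the first severity level whose upper bound exceeds the measured rate."""
    if rate < 0:
        raise ValueError("reactivity rate cannot be negative")
    for upper, level in limits:
        if rate < upper:
            return level
    return Severity.GX


def classify_room(copper_rate: float, silver_rate: float) -> Severity:
    """Overall severity is the worse of the copper and silver coupon results."""
    return max(_classify(copper_rate, COPPER_LIMITS), _classify(silver_rate, SILVER_LIMITS))


if __name__ == "__main__":
    # Hypothetical quarterly coupon readings (copper, silver) in angstroms per 30 days.
    readings = {"Q1": (1450, 900), "Q2": (420, 260), "Q3": (180, 150)}
    for quarter, (cu, ag) in readings.items():
        level = classify_room(cu, ag)
        status = "OK" if level == Severity.G1 else "needs attention"
        print(f"{quarter}: copper={cu}, silver={ag} -> {level.name} ({status})")
```

Taking the worse of the two metals is a conservative choice: a room is only as safe as its most corroded coupon suggests.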