Natural disasters? Man-made disasters? Take precautions so that an IT crisis is no longer terrifying

Nearly half a year later, Ms. Li in Beijing still cannot forget the shadow that the Bank of Communications outage in March cast over her. On the morning of March 21, Ms. Li wanted to make an online transfer through the Bank of Communications to enroll in a training class, but the online banking page was so slow to open that she missed the registration deadline. She later learned that the bank's systems had failed and that some outlets nationwide, including those in Shanghai, Guangzhou and other cities, had been unable to process transactions.

The IT system crisis at the Bank of Communications, caused by host and network link failures, brought a great deal of trouble to many customers like Ms. Li. Some even applied to cancel their cards with the bank, and the bank's reputation and customer confidence were badly hurt.

As more and more business operations run on increasingly complex IT systems, the IT crisis is gradually turning from an "imaginary enemy" into the center of a real storm: a momentary lapse of attention can deal a fatal blow to the business. The sources of IT crisis are varied: natural disasters such as earthquakes, floods and lightning; terrorist attacks; technical problems such as system crashes, data loss and equipment damage; and man-made causes such as viruses, misoperation and the departure of key IT staff. Without exception they leave IT departments and CIOs badly shaken, as if skating on thin ice. CIOs do not experience such events every day, but if they do not prepare for an IT crisis in advance, the consequences when problems do occur range from the IT department working overtime late into the night, to the CIO taking the blame and resigning, to the company ending up in a situation from which it cannot recover.

How can IT crises be avoided? How can the negative effects of an IT crisis be reduced to a minimum? These are major questions that CIOs ponder day and night.

Anticipating natural disasters

The earthquake in the Taiwan Strait earlier this year, which disrupted several submarine cables, was surely one of the natural disasters with the greatest recent impact on IT systems. As a direct result, Internet connections from mainland China toward Taiwan, North America, Europe, Southeast Asia and other directions were paralyzed on a large scale, causing serious economic losses to many enterprises.

A similar IT crisis has also occurred in India, yet the business of the Indian company Mphasis was not affected in the slightest. Mphasis is a leading Indian software services exporter. To reduce the operational risk of its business process outsourcing (BPO) work, the company not only implements continuity plans for its customers; its IT department also uses a Multiprotocol Label Switching (MPLS) network to connect its branches in other countries. For last-mile access, Mphasis uses redundancy protection: two different transmission media back each other up, so that when one has a problem the other line keeps the service running. The Mphasis network is a dual-line design with redundant ATM (Asynchronous Transfer Mode) rings, which means that in unforeseen circumstances the system can switch automatically to a line that has no fault. With these measures in place the company can easily handle man-made risks, and even in a natural disaster such as an earthquake the Mphasis CIO can rest easy.
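The redundancy idea in the Mphasis example, two lines backing each other up with automatic switching to the healthy one, can be sketched in a few lines. This is only an illustration under stated assumptions: the link names and the hard-coded health table are hypothetical, and a real network would do this in routing equipment rather than in a script.

```python
def choose_link(links, is_up):
    """Return the first link, in priority order, that passes the health check."""
    for link in links:
        if is_up(link):
            return link
    return None  # every link is down


# Hypothetical link names and a hard-coded health state, for illustration only.
status = {"line-primary": False, "line-backup": True}
active = choose_link(["line-primary", "line-backup"], lambda name: status[name])
print(active)  # line-backup
```

The design point is simply that the backup path must be independent of the primary one; two lines that share the same medium or route fail together.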

Besides earthquakes, floods often test a CIO's crisis management capabilities. On July 26, 2005, torrential rains hit Mumbai and flooded the basement power room of the data center of Mahindra & Mahindra, India's leading automobile manufacturer. The water had risen to the generator panel, fuel was running lower and lower, and the company's disaster recovery center had begun to take in water. At this moment of extreme crisis, the CIO, who had been following the flood closely, consulted the emergency response team and resolutely gave the order to shut down 200 servers. The shutdown was scheduled for 4:00 and lasted only 4 hours. In that time the water was cleared out of the power room, new fuel was found, and a dryer was used to dry the generator panel.

Faced with such unforeseen risks, CIOs cannot simply wait and deploy countermeasures in a rush after an IT crisis. They should foresee, as far as possible before a crisis, the problems that may occur in the network, hardware or data, and resolve them at the source to maintain business continuity. While preparing to deal with a crisis, IT departments must also constantly monitor for crisis signals, analyze and identify them properly, and take appropriate measures to reduce the impact of the crisis to a minimum.

Solving equipment problems level by level

Natural disasters are indeed frightening, but they are not very likely. For the CIO, the biggest threat probably comes from "man-made" IT crises. As more and more core business migrates to IT platforms, a problem with core IT equipment triggers not just an IT crisis but a business crisis, so it must be guarded against.

MNYL, a New York Life insurance company in India, has more than 100 branches. Its highly customized enterprise applications are hosted in a data center in Gurgaon, India. In March this year, the last month of the company's fiscal year, the core switch in the data center suddenly failed. Because the company's network has a centralized star topology, the core switch failure directly brought business across the country to a standstill.

For CIO Kumar, the silver lining in this misfortune was that a 24x7 replacement contract had been signed when the core switch was purchased. At about 1 am, MNYL contacted the switch vendor, and the replacement equipment arrived at around 4:00. Unexpectedly, a new problem emerged: the replacement was not the same model as the existing equipment. It had more slots than the existing switch, so it could not be mounted in the existing rack, and the backed-up settings of the original switch were of little use.

CIO Kumar said: "We installed this switch temporarily on a rack, then moved all the cables from the old switch to the replacement and configured it. By 11 o'clock, 90% of operations had been restored."

MNYL's experience in preventing core equipment problems is worth learning from. Spare equipment should be ready for all key equipment, and backups of core equipment and its settings are essential. Support contracts should be drawn up carefully to cover, as far as possible, any unforeseen circumstances; matters such as equipment replacement, turnaround time and resolution time should all be specified in the contract with the equipment manufacturer. There is one more detail CIOs should keep in mind: detailed documentation and structured cabling are particularly important in disaster recovery. Details that play such a large role must never be neglected.

Besides core IT equipment, minor components can sometimes lead to a major crisis. In the IT field, any part, whether or not it falls within the scope of the IT systems, may fail and escalate into an IT crisis. Those in charge of enterprise information systems must be ready to face any unforeseen event.

Agrawal, chief IT director of India's state-owned refiner Bharat Petroleum, experienced an IT crisis caused by a small junction box. Late one evening in April 2006, Agrawal received a telephone call from maintenance staff at the corporate data center, who said the data center was facing a power outage. Within 30 minutes Agrawal and his team rushed to the data center and found that backup power was being depleted rapidly and that the data center had already been forced to shut down 10 to 12 relatively unimportant systems. When the UPS had about 45 minutes of power left, Agrawal decided to shut down all systems. The servers completed a normal shutdown with about 20 minutes of UPS power remaining.

It was later found that a small junction box was at fault. The data center was fed by power lines from two different grids, but both passed through the same junction box. The junction box was therefore a single point of failure in the design: if one of the power suppliers had a problem, the junction box could fail. The data center could then switch immediately to backup power, but that could last only 90 minutes. After the failure, Agrawal arranged for more power maintenance staff to be on duty at the data center and added closed-circuit television to monitor it. Physical inspections, previously carried out only every 3-4 hours, are now done once an hour, and their scope is no longer limited to computer technology but also covers areas such as power and cabling.

To respond to IT crises caused by ordinary equipment, the CIO's most basic approach is to establish an IT crisis management plan together with its sub-plans. The IT crisis management plan, or Crisis Management Plan (CMP), includes a clear definition of the IT crisis management personnel and their roles, responsibilities and authority; identification of the types of IT crisis and the response procedures; and identification of the required resources. The sub-plans, including the IT emergency response plan, the business continuity plan and the IT disaster recovery plan, support the main IT crisis management plan.
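The staged shutdown in the Bharat Petroleum incident (shed unimportant systems first, then shut everything down while the UPS can still carry a clean shutdown) is the kind of rule a response procedure can write down in advance. The sketch below is a minimal illustration, not the company's actual procedure: the `System` type, the system names and the shutdown times are hypothetical, and only the 45-minute threshold is taken from the account above.

```python
from dataclasses import dataclass


@dataclass
class System:
    name: str
    critical: bool
    shutdown_minutes: float  # assumed time needed for a clean shutdown


def plan_shutdown(systems, ups_minutes_left, full_shutdown_threshold=45.0):
    """Return the names of the systems to shut down now.

    Above the threshold only non-critical systems are shed to save power;
    at or below it, everything is shut down while a clean shutdown is
    still possible.
    """
    if ups_minutes_left <= full_shutdown_threshold:
        return [s.name for s in systems]
    return [s.name for s in systems if not s.critical]


def has_margin(systems, ups_minutes_left):
    """True if the slowest system can still shut down cleanly on UPS power."""
    slowest = max(s.shutdown_minutes for s in systems)
    return slowest < ups_minutes_left


# Hypothetical inventory, for illustration only.
systems = [System("erp", True, 25.0), System("test-lab", False, 5.0)]
print(plan_shutdown(systems, 80.0))  # ['test-lab']
print(plan_shutdown(systems, 45.0))  # ['erp', 'test-lab']
print(has_margin(systems, 45.0))     # True
```

Writing the threshold down beforehand means the decision does not have to be improvised while the batteries drain.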

Operational errors lead to big trouble

Besides IT equipment, man-made causes also often lead to IT crises. Among these, hacker attacks and the spread of viruses do the greatest harm. If a core system is damaged by a virus, IT departments may wish to learn from the approach of India's Pantaloon Retail department store: first, set up several clean, safe computers temporarily and install the key enterprise applications on them, so that users can reach enterprise systems such as ERP and normal operations are ensured. Beyond the temporary solution for dealing with the virus, IT departments must deploy strong network security safeguards.

While the virus is being wiped out, IT departments need to set up an emergency help desk. After the virus has been removed, the help desk should apply a unified configuration to desktop computers so that they can be managed easily from the server side. IT departments can use server-level network management tools to isolate one or several infected computers easily and effectively. At the same time, IT departments need to develop appropriate policies to limit user access to USB devices and the Internet, in order to reduce the channels through which viruses enter.

Human misoperation is another problem CIOs have to face. Sometimes the mistake is extremely foolish, yet the consequences are extremely serious. A network administrator at a domestic company, while setting up RAID on a new IBM DS4800, mistakenly switched the KVM to the production system, connected the new DS4800 disk array and the production system's original DS4300 disk array through the same hub, ran a demo on both at the same time, and synchronized the clocks. As a result all the volume groups went down, production stopped, and 3.5 billion of transaction data went missing. After many setbacks, and under the guidance of IBM's second-tier engineers, the 3.5 billion of transaction data was recovered.

The CIO needs to formulate a strict workflow, assign different permissions to each IT staff member, and require employees to work within their permissions and in accordance with the workflow. IT departments should also carry out regular training to communicate the latest business developments and technical updates to each employee. In addition, a specific staff member should be assigned to back up logs daily and to perform full backups regularly, so that daily work can be traced and, for every operation, it is possible to find out who performed it, what was done, and when.
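The two controls described above, per-person permissions and a log that records who did what and when, can be sketched together. This is a minimal illustration under stated assumptions: the staff names, action names, permission table and the `perform` and `who_touched` helpers are all hypothetical, and a production system would rely on the platform's own access control and audit facilities.

```python
from datetime import datetime, timezone

# Hypothetical permission table: which actions each staff member may perform.
PERMISSIONS = {
    "alice": {"configure_storage", "restart_service"},
    "bob": {"restart_service"},
}


def perform(log, operator, action, target, when=None):
    """Check the operator's permission, then append an audit entry."""
    if action not in PERMISSIONS.get(operator, set()):
        raise PermissionError(f"{operator} may not {action}")
    entry = {
        "time": (when or datetime.now(timezone.utc)).isoformat(),
        "operator": operator,
        "action": action,
        "target": target,
    }
    log.append(entry)
    return entry


def who_touched(log, target):
    """Return (time, operator, action) for every logged operation on target."""
    return [(e["time"], e["operator"], e["action"])
            for e in log if e["target"] == target]
```

For example, with the table above, `perform(log, "bob", "configure_storage", "array-1")` raises `PermissionError`, while the same call for `"alice"` succeeds and leaves an entry that `who_touched(log, "array-1")` can later retrieve.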

General IT Crisis Solution

CIOs can draw up a variety of solutions for different types of IT crisis. IT crisis management can teach CIOs to formulate a complete set of early-warning processes and crisis solutions.

① Establish an IT crisis early-warning system. An early-warning mechanism should not be generic: CIOs should prepare different early-warning plans for the different types of IT crisis. A CIO can draw up an IT risk assessment table that lists the possible crises in detail, assesses their level, and arranges them by likelihood from most to least likely. For example, for disaster backup against equipment failure or human error, the CIO should develop a detailed disaster recovery plan based on business needs, covering the backup interval, the backup type, and whether backups are local or remote.

② Form an IT crisis management team. Its main role is to predict comprehensively and clearly the crises the business may face, to develop the strategy and steps for dealing with them, to give IT staff crisis training, and to handle a crisis fully and rapidly when it arrives.

③ Determine the level of an IT crisis. IT crises in different states call for different approaches. Without predetermined crisis levels, crisis management will be thrown into great confusion and inconvenience. IT departments need to define crisis levels and develop a matching handling method for each; only then, when a crisis comes, can they "meet soldiers with generals and floods with earthworks", that is, have a countermeasure ready for each threat.

④ Establish IT crisis management procedures and implementation details. These processes are not part of normal business operations, but once an IT crisis occurs they should start promptly, run effectively, and play an important role in handling the crisis. In this way, once a crisis occurs, departments and employees know what to do, without having to rely on one key person's quick wits to turn the tide.

⑤ Rehearse with IT crisis simulations. An IT crisis exercise assesses whether the early-warning system is implemented effectively. Regular simulation training not only improves the IT crisis team's ability to respond rapidly and strengthens awareness of crisis management; it also checks whether the contingency plans that have been drawn up are sound and feasible. Deficiencies in the early-warning preparations can then be identified and corrected.
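The risk assessment table in step ① and the crisis levels in step ③ can be sketched as a small data structure. This is only an illustration: the `Risk` fields, the example likelihoods, the 1-to-5 impact scale and the level thresholds are all assumptions made for the example, not figures from any real assessment.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    name: str
    likelihood: float  # assumed estimate between 0 and 1
    impact: int        # assumed scale: 1 (minor) to 5 (business stops)


def crisis_level(risk):
    """Map impact to a named level; the thresholds are illustrative."""
    if risk.impact >= 4:
        return "major"
    if risk.impact >= 2:
        return "moderate"
    return "minor"


def risk_table(risks):
    """Return rows sorted from most likely to least likely."""
    ordered = sorted(risks, key=lambda r: r.likelihood, reverse=True)
    return [(r.name, r.likelihood, crisis_level(r)) for r in ordered]


# Hypothetical entries, for illustration only.
risks = [
    Risk("earthquake", 0.01, 5),
    Risk("core switch failure", 0.10, 5),
    Risk("operator error", 0.30, 3),
]
for row in risk_table(risks):
    print(row)
# ('operator error', 0.3, 'moderate')
# ('core switch failure', 0.1, 'major')
# ('earthquake', 0.01, 'major')
```

Keeping likelihood and level as separate columns matters: an unlikely event such as an earthquake can still belong to the highest crisis level.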

IT crises are very hard to guard against completely. Dealing with a crisis that has already occurred, and minimizing its losses and impact, therefore also needs rules to follow.

A crisis is most destructive at the moment it strikes, so the first step for IT departments is to contain it, for example with a temporary solution like the one used by Pantaloon Retail, taking control of the situation in the shortest possible time and keeping losses to a minimum. The second step is to prevent the crisis from spreading.

① Immediately activate the IT crisis management team and analyze the situation comprehensively. What caused the crisis, and are the causes internal or external? What is the current state of the crisis and how is it developing? At this stage speed is the key, because a crisis waits for no one. After an IT crisis occurs, respond in the shortest possible time with appropriate measures, and set work priorities according to the circumstances so that losses are kept to a minimum.

② Immediately formulate a response plan that targets the cause of the IT crisis: whether to switch network lines, quickly bring in backup equipment, start a temporary solution, or recover data manually. Once the response is decided, the CIO must define the rights and responsibilities of the departments and personnel involved and allocate staff effectively, so that every task has someone in charge of it and everyone can quickly find their place when a crisis arrives.

③ Have a budget for IT crisis management. IT crisis management must be grounded in the company's own human, material and financial resources, and not only in the type of IT crisis; otherwise it will be like the moon in the water or a flower in the mirror, with no practical significance.

To test the effectiveness of crisis management, CIOs can ask two sets of questions. First, if a crisis occurs outside office hours, what internal communication system does the company have? For example, if trouble strikes on a Sunday, how long does it take to get the message to every responsible person? Second, what emergency response plans does the company have for each type of IT crisis? When was each plan last updated? Has it ever been used, confirming that it is effective? Is it consistent with the company's other response plans?

As the old saying goes: "He who gives no thought to the future will find trouble close at hand." Since IT crises are unavoidable, only by guarding against them before they occur can an IT crisis be turned around in short order. With a little more sense of crisis in ordinary times, and with several sets of strategies prepared for the various possible IT crises, IT departments will be far calmer when a crisis does arrive.

