Helicopter safety reports published

July 17, 2014

The UK House of Commons Transport Committee has published its report into offshore helicopter safety, available here. The report acts as a companion to the technical investigation conducted by the Air Accidents Investigation Branch (here), focusing on the May 2012 Aberdeen and August 2013 Shetland Super Puma helicopter incidents, in which both helicopters ditched into the sea after coolant failures.

The Committee's report notes:

  • problems with the safety briefings provided to passengers, with some passengers choosing not to use the emergency breathing system based on what they were told during the pre-flight briefing, and
  • a ‘culture of bullying’, where staff concerns over the safety of the helicopters were ignored (although no evidence was found to suggest the Super Pumas are less safe than other helicopters).

The Committee has asked the Civil Aviation Authority for a further report on why more helicopter incidents are reported in Norway than in the UK. It also notes that the impact of commercial pressure on helicopter safety has not been examined in enough detail, because commercial sensitivities make it difficult to scrutinise the contractual obligations placed on helicopter providers.


Case study: Liverpool rail accident, October 2011

July 22, 2013

In October 2011, teenager Georgia Varley was killed by a train she had left 30 seconds earlier, when she was struck by the departing train and fell into the gap between the train and the platform. The investigation concluded that the guard had dispatched the train whilst she was leaning against it.

On Liverpool's Merseyrail network, train guards are required to check that the area along the side of the train is clear of passengers before dispatching the train from each station. The authorised procedure includes closing all passenger doors, leaving the train to check visually that the platform edge is clear of passengers, re-entering the train, closing the guard door and then sending the ‘ready to start’ code to the driver. The process from the guard’s door closing to the train departing can take up to 12 seconds.

Guards frequently used unauthorised methods during train dispatch, including sending the ‘ready to start’ code before the guard door was closed, and briefly opening and closing the passenger doors to prompt passengers to enter or leave the train. These methods could reduce the time needed to get the train moving by 6 seconds, and this appears to have happened in this case.

This case study raises a number of important human factors and process safety issues that could be applicable to other industries:

Looked but failed to see?

Moments before the train set off, the guard warned Georgia to move away from the train; however, it is unknown whether he had seen her before sending the ‘ready to start’ code. She had left the train late, walked over to the station wall and then walked back to the train, leaning against the window as the doors were closing for a second time. The guard was prosecuted for gross negligence, as the court took the view that he had seen her and given the signal to start anyway, perhaps because he had expected her to move – a gross violation of operating procedures. However, the report notes that it is possible the guard did not see her before sending the ‘ready to start’ code. His attention may have been on the crowd of people exiting the train, it may have been on the control panel used to send the ‘ready to start’ code, or he may have ‘looked but failed to see’ her, a common phenomenon in routine, repetitive tasks.

Non-compliance with procedures

After the accident, full compliance with authorised dispatch procedures resulted in significant delays, which only abated once common but previously unauthorised methods of dispatch were authorised. Although the report did not find evidence that the guard or driver was under time pressure, and time pressure does not appear to have been a causal factor, it does raise the question of whether the use of unauthorised dispatch methods was a result of time pressure, and whether time pressure poses a risk of similar incidents in future.

It may also be worth noting that if the authorised procedures had been followed, the time from closing the guard door to the train departing would have been around 12 seconds. Georgia had been leaning against the train for at least 3 seconds before the ‘ready to start’ code was sent, and for 11 seconds before the guard warned her to move (which happened after the code had been sent). The report notes that those 12 seconds provide ample time for a situation such as this to arise without the guard noticing:

“The guard could have followed Merseyrail’s published procedure to dispatch the train when the passenger doors first closed, as they were unobstructed and the platform adjacent to the train was clear. Under the Merseyrail procedure the guard would board and then wait until his door closed before sending the ‘ready to start’ code. He would not have been able to see the young person approach and come into contact with the train because of his narrow field of view through the door’s fixed window and, if all other events remain unchanged, the outcome would have been unchanged.”

The operating procedures in this case may therefore not have been ideal from either a business or a safety perspective.
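To put those timings side by side, here is a rough sketch (in Python, purely for illustration – the figures are those quoted above, and the alignment of events is an assumption, not a reconstruction from the report):

```python
# Illustrative timeline only. Figures are those quoted in the paragraphs above;
# the exact alignment of events is an assumption for illustration.

# Times in seconds, measured from the moment Georgia began leaning against the train.
LEAN_START = 0            # she leans against the train as the doors close a second time
CODE_SENT = 3             # 'ready to start' code sent at least 3 s after she began leaning
WARNING_GIVEN = 11        # guard warns her to move, 11 s after she began leaning
AUTHORISED_DISPATCH = 12  # approx. time from guard's door closing to departure under
                          # the authorised procedure

print(f"Contact with train before code sent:  >= {CODE_SENT - LEAN_START} s")
print(f"Contact with train before warning:       {WARNING_GIVEN - LEAN_START} s")
print(f"Authorised dispatch 'blind' window:     ~{AUTHORISED_DISPATCH} s")
```

Even under full compliance, the roughly 12-second window in which the guard cannot see the platform edge is long enough to cover the whole period Georgia was in contact with the train – which is essentially the point made in the report extract above.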

Ergonomics and design

If the guard’s door was closed before he sent the ‘ready to start’ code, he would not have been able to see along the platform edge anyway. Conversely, if the control panel had been tactile, the guard could have sent the ‘ready to start’ code by touch whilst still looking along the platform edge. Whilst these features are now standard, they were not at the time the train was built. The report notes other methods of improving visibility along the platform, such as mirrors, monitors and a different placement of controls.

Mitigation

Once the ‘ready to start’ code had been sent, there was no way of stopping the train quickly enough to have changed the outcome of the incident. Both available methods (sending a stop code, or pulling the emergency handle) took time, and relied on the driver reacting (in some cases the driver might decide to carry on to the next station).

One of the most effective improvements that could be made is to make it much harder for passengers to fall into the gap between the platform and the train, through platform screen doors, reducing the size of the gap, or other safety features. However, these are likely to be costly improvements to make.

Final thoughts

Case studies such as this show that, particularly when operating legacy equipment and designs which are hard to change, the human element is often (over?)relied on to control hazards.

Consider the risk that human failure (whether an error or a violation) poses to your organisation: is it the last link in the chain?

Is it possible to operate safely and efficiently by sticking to the procedures, or do the procedures need changing?

If a total system redesign is not possible, can the design be retroactively improved as a short-term solution to mitigate human failure?


How to avoid mistakes

April 2, 2013

Episode 3 of the latest series of BBC’s Horizon, How to avoid mistakes in surgery, provides a fascinating insight into attempts to reduce human failure in the medical and aviation industries and the fire service.

In particular, it focuses on crew resource management-type initiatives – a topic of a current EI Human and Organisational Factors Committee project – and successes in preventing human error using simple techniques, such as checklists.

Click the link to watch (availability may be limited depending on your country).

Do you have any successes to report, as good as the improvements mentioned in the programme, from using checklists or other simple techniques?


Anhydrous hydrofluoric acid (AHF) release in South Korea, 5 dead, 18 injured

October 29, 2012

Five workers were killed and 18 others injured in an AHF release on 27 September 2012 at Hube Global, based 200 kilometres southeast of Seoul, South Korea. CCTV footage of the event has been made available.

3000 people downwind of the incident also received emergency care for nausea and other symptoms, and the incident has reportedly affected crops and livestock and caused an estimated $15.9 million in lost production for nearby businesses. The affected area has been classed as a special disaster zone, eligible to receive central government funding for clean-up operations.

During the incident, two workers on top of a tank lorry carrying out a transfer, two workers at ground level repairing a pump, and one officer in a nearby office building died when AHF was released from a valve on top of the lorry. Whilst the cause of the release is the subject of an enquiry, early reports suggest a human factors cause – the workers may have mistakenly fully opened the transfer valve. Some issues apparent from the CCTV footage and worth considering include:

  • Neither of the workers on top of the lorry was wearing chemical protective clothing or self-contained breathing apparatus (SCBA).
  • There did not appear to be any fall protection in place.

There may also have been issues concerning a lack of emergency response equipment/systems to mitigate the leak, and emergency responders not being aware of the appropriate treatment for AHF exposure.

An important question to ask is whether a safety critical task analysis or a quantitative human reliability analysis was conducted for this task. These types of analyses may have identified these risks and helped put safeguards in place (for example, changing the valve design or requiring adequate PPE).
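For readers unfamiliar with quantitative human reliability analysis, the sketch below shows the general shape of a HEART-style calculation in Python. The task type, error-producing conditions and numbers are hypothetical placeholders for illustration only – they are not taken from the Hube Global investigation or from the published HEART tables.

```python
# Hypothetical HEART-style human error probability (HEP) calculation.
# All values below are illustrative placeholders, not figures from the actual
# HEART tables or from the Hube Global inquiry.

nominal_hep = 0.003  # assumed nominal unreliability for a routine valve-operation task

# Each error-producing condition (EPC): (max multiplier, assessed proportion, description)
epcs = [
    (10, 0.4, "time pressure during the transfer"),
    (5, 0.6, "unfamiliar or ambiguous valve arrangement"),
    (3, 0.2, "PPE restricting movement and visibility"),
]

hep = nominal_hep
for max_effect, proportion, description in epcs:
    factor = (max_effect - 1) * proportion + 1  # standard HEART weighting of an EPC
    hep *= factor
    print(f"{description}: x{factor:.2f}")

print(f"Assessed HEP for the task: {hep:.4f}")
```

Even a rough calculation like this forces the analyst to list the conditions that make error more likely – and hence the safeguards (valve design, PPE, supervision) that would reduce them.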


Non-technical skills: lessons from Air France Flight 447

September 4, 2012

Popular Mechanics examines Air France Flight 447, which was flying from Rio de Janeiro to Paris on 1 June 2009 when it crashed into the Atlantic Ocean, killing all 228 passengers and crew.

The incident happened after a build-up of ice on the pitot tubes caused the autopilot to disengage completely. Two equally-ranked first officers took manual control of the plane and tried to increase altitude. However, in doing so the plane began to sound an audible stall alarm (which occurred 75 times during the course of the incident), which was seemingly ignored. After they had gained altitude and speed, one of the pilots tried climbing again – reverting to rules for gaining altitude during take-off or aborted landings, which are not applicable at higher altitudes – triggering the stall alarm again. When the pilots tried to take remedial action to increase speed, they made opposing inputs on their respective side-sticks, one pushing the nose down (the correct action) and one pulling the nose up – the plane’s dual side-stick controls were not linked (i.e. there was no feedback between the two), making competing inputs possible. By the time the captain came back from his rest break it was too late to take corrective action; he only realised in the last few seconds that one of the pilots was still pulling back on his side-stick.
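To see why unlinked controls matter, here is a toy sketch in Python. It is not Airbus’s actual control law – it simply assumes, for illustration, that the two pilots’ inputs are added together and clipped, with no feedback between the sticks:

```python
# Toy model of unlinked (non-back-driven) side-sticks. The combination rule below
# (a clipped sum of both inputs) is an illustrative assumption, not the certified
# Airbus control law.

def combined_pitch_command(left_input: float, right_input: float) -> float:
    """Inputs range from -1.0 (full nose-down) to +1.0 (full nose-up)."""
    total = left_input + right_input
    return max(-1.0, min(1.0, total))  # clip to the achievable command range

# One pilot pushes the nose down (correct stall recovery), the other pulls up.
print(combined_pitch_command(-1.0, +1.0))  # 0.0 - the inputs cancel each other out
```

With mechanically linked yokes each pilot immediately feels the other’s opposing input; with unlinked side-sticks the conflict can go unnoticed, as it did here until the final seconds.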

A number of human factors issues can be identified, many of which can be classed as non-technical skills (skills supplemental to technical skills, but vital to effective working in high hazard industries):

  • Poor communication between the first officers, resulting in counterproductive actions.
  • Lack of leadership – the captain was not present for much of the incident, and the two first officers were of equal rank.  It was not clear who was in charge.
  • Poor decision making – in a highly stressful situation the pilots reverted to ‘rule-based’ thinking and were unable to understand why the plane was not responding.
  • Lack of situation awareness – the captain entered the situation late and so did not have adequate situation awareness to make a timely decision.  It was also suggested in subsequent BEA recommendations that the stall alarm was unclear, consisting of an aural alarm with ambiguous visual cues.  If the stall alarm had coincided with an instruction (e.g. ‘nose down’) it may have helped the pilots understand what they needed to do.

Comparisons with the energy sector

There are clear parallels between this incident and those typical of the energy sector, e.g. within control rooms. Indeed, these sorts of non-technical skill issues are common to major incidents in many sectors, including the energy sector.

For example, during the Longford incident (1998), a lack of communication meant it was unclear what the production manager’s intentions were as he tried to resolve an issue with a heat exchanger. This led to an instruction to open valve TC3 being misheard as PC3 – had the operator understood the manager’s intentions, it is possible this error might have been avoided.

During the Macondo incident (2010) it was reported that an anomalous pressure reading was explained away as a ‘bladder effect’ by the drill crew, an explanation accepted by the well site leaders. Later on, a decision to jettison mud overboard from the riser was not made – had it been, “the consequences of the accident may have been reduced” (Source: Deepwater Horizon: Accident Investigation Report). Both of these actions (or inactions) may suggest problems with situation awareness and leadership.

Crew resource management training

Crew resource management (CRM) training has been mandatory for aircrews since the early 1990s. CRM teaches non-technical skills to enable air crews to work together safely and effectively (focusing on teamwork, leadership, communication, situation awareness, etc.). It is being used more in the marine sector, and the Health and Safety Executive explored its use in the offshore oil and gas sector in 2003. Interestingly, Air France Flight 447 is an example of when it was not enough – however, the Hudson River ditching is an example where non-technical skills and CRM training resulted in a successful outcome (watch the 60 Minutes interview with Captain Sullenberger discussing this incident).

Recognising the significance of CRM/non-technical skills to major incidents in the energy sector, the EI Human and Organisational Factors Committee (HOFCOM) is producing guidance on the implementation of CRM-type training in the energy sector. It will cover the non-technical skills CRM-type training should include, an assessment of the impact of CRM-type training, the practicalities of implementing CRM-type training (including selecting the non-technical skills needed in the organisation), and the integration of CRM skills into the safety management system. This will allow managers in the energy and related process industries to determine if, why and how they should implement CRM-type training in their own organisations. For more information visit the EI human factors website.


Rule- and/or risk-based approaches to fatigue management: new EU rules on pilot working hours

June 12, 2012

The European Aviation Safety Agency (EASA) is drafting new rules on pilot and cabin crew working hours, aiming to set a Europe-wide standard for managing aircrew fatigue. However, the British Airline Pilots’ Association (BALPA) has noted a number of issues with the new rules, including airport stand-by time not counting towards hours worked. Under one scenario a pilot could be required to land a plane 22 hours after having woken up in the early morning, although this scenario has been labelled an ‘extreme’ case by the UK Civil Aviation Authority (UK CAA).

Currently, not all countries have rules on the maximum number of hours pilots are allowed to fly, which the EASA rules would define. In the UK, flight duty hours are currently limited to 13.25 hours, which the new proposals would extend. BALPA has released a video (watch it here) explaining its position on the new rules, which it sees as driven by commercial reasons rather than safety. Yet, whilst the UK CAA acknowledges that in some cases the proposals will see flight hours extended, it notes that they will also require airlines to manage fatigue with greater vigilance, suggesting the new rules are part of a growing trend towards the use of fatigue management systems.

Fatigue management systems are processes put in place to monitor and manage fatigue dynamically, by watching for warning signs – e.g. monitoring overtime and encouraging staff to report when they are fatigued – and by arranging the workload in a way that minimises the risk of human failure due to fatigue.

According to the International Air Transport Association (IATA) website, crew flight duty and flight time limits are being re-examined in light of new, more scientific approaches to managing the risk of crew fatigue. Crew fatigue has typically been controlled by a simple set of prescriptive rules concerning flight time and duty limitations, but these rules can sometimes lead to situations in which crew are given rest periods when they are unlikely to be able to sleep, such as when circadian rhythms have been disrupted by time zone changes.

“It has been demonstrated that the timing of the break is more important than the duration of the break itself. A prescriptive approach, based only on daily time limits, cannot take into account the complex interaction of factors that are linked to hours of work and rest periods.

In other words, prescriptive rules are not the total solution. With a well-managed fatigue risk management system, flight duty time and schedule of operation will be optimised, and this enhances efficiency.”

In 2011 IATA, the International Civil Aviation Organization (ICAO) and the International Federation of Airline Pilots’ Associations (IFALPA) jointly published The Fatigue Risk Management Systems (FRMS) Implementation Guide for Operators. This guide provides insight into the methodology and framework for implementing an effective fatigue management programme and the science supporting it, and aims to move the airline industry away from a prescriptive approach to fatigue management towards a risk-based one.

Energy sector experience

The use of fatigue management systems has become more common in the energy and allied process industries in recent years.  API RP 755, published in 2010 by the American Petroleum Institute, outlines recommended practice for implementing such a system, and the EI is currently updating its publication Improving alertness through effective fatigue management to include good practice on implementing fatigue management systems as well.

Whilst API RP 755 still sets limits for shift length, these limits are fairly relaxed, allowing maximum unscheduled shift lengths of up to 18 hours and recommending a minimum of only 8 hours downtime between shifts of 14-16 hours. These recommendations are somewhat controversial: for example, research presented in the 2006 edition of EI Improving alertness through effective fatigue management shows that the risk of incidents increases significantly after 12 hours on the job (although this publication is currently being updated with the latest research into fatigue management), and HSG256, also published in 2006, recommends that shifts longer than 12 hours should be avoided. As previously reported on HOF Blog, there is also an increasing trend in some industries to account for commuting time when setting shift lengths and patterns, recognising that work and non-work activities both cause fatigue.

However, API RP 755 provides a set of minimum standards which can always be surpassed. The shift lengths in API RP 755 also assume that an effective fatigue management system is in place, which should catch fatigue before it becomes an issue. Those implementing API RP 755 should also be aware of local regulations – for example, under the EU Working Time Directive 2003/88/EC, companies must provide 11 hours downtime between shifts.
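As a rough illustration of how prescriptive limits and a fatigue management system might sit together, the sketch below (in Python) checks a proposed shift against the figures quoted above – the 12-hour guidance in HSG256, the 18-hour API RP 755 maximum for unscheduled shifts, the 8 hours downtime recommended after a 14-16 hour shift, and the 11 hours rest required by the EU Working Time Directive. The function and its thresholds are hypothetical examples, not text from any of these publications:

```python
# Hypothetical roster check using the figures quoted in the text above.
# This is an illustrative sketch, not an implementation of API RP 755,
# HSG256 or the EU Working Time Directive.

def check_shift(shift_hours: float, rest_hours: float) -> list:
    """Return warnings for a proposed shift length and the rest period that follows it."""
    warnings = []
    if shift_hours > 12:
        warnings.append("Over 12 hours: HSG256 recommends avoiding shifts this long, "
                        "and incident risk rises significantly beyond 12 hours.")
    if shift_hours > 18:
        warnings.append("Over 18 hours: exceeds even the API RP 755 maximum for unscheduled shifts.")
    if 14 <= shift_hours <= 16 and rest_hours < 8:
        warnings.append("Less than the 8 hours downtime API RP 755 recommends after a 14-16 hour shift.")
    if rest_hours < 11:
        warnings.append("Less than the 11 hours rest required under EU Working Time Directive 2003/88/EC.")
    return warnings

for warning in check_shift(shift_hours=15, rest_hours=9):
    print(warning)
```

Even a crude check like this highlights the tension: a roster that satisfies the API RP 755 minima can still fall foul of the 12-hour guidance or local regulation, which is partly why the question below matters.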

One without the other?

Risk-based approaches to fatigue management are gaining support in both the energy and aviation sectors, and in some cases we are seeing a move away from strict limits on working hours.  These moves are perhaps not without controversy, yet there seems to be evidence to support both methodologies.

It may be worth asking whether they need to be mutually exclusive: is it possible to implement an effective fatigue management system without having cautious prescribed limits on working hours?


Dial 911… I mean 919! Area code system leads to emergency service misdials

May 21, 2012

Here’s an interesting case study about the large consequences of a seemingly small historical oversight in the US system for assigning telephone area codes.

In Raleigh, North Carolina, the area code is 919, which is similar to the emergency services telephone number, 911. Until recently, dialling the area code within Raleigh was optional, which limited the risk of misdialling the emergency services. However, Raleigh is now large enough that dialling the area code is mandatory, which has caused an influx of misdialled calls to the emergency services.

Misdials can be verified in a number of ways – at the time of the call, by the operator calling back if the caller hung up, or by sending out police officers to investigate a hang-up. The problem has become so bad that officers are being sent out to investigate hang-ups every 7.5 minutes on average.

The majority of misdials are made by the elderly, who are less used to having to dial the area code, and by businesses, which often need to dial ‘9’ to get an outside line. Changing the area code is not really considered an option, as it is felt it would be too complicated. The Director of Emergency Communications has implored citizens to ‘dial carefully’ – though this is unlikely to be an effective solution.

It’s a fascinating problem that highlights how a lack of human factors foresight can go on to cause major operational issues.

Is it worth remembering this case study within industry, particularly when designing communications systems, controls, procedures, etc., so as to future-proof them?