Failure Modes and Effects Analysis (FMEA): The Heart of an Equipment Maintenance Plan

By Michael W. Blanchard, CRE, PE, Life Cycle Engineering

The primary purpose of an equipment maintenance plan (EMP) in a manufacturing facility is to minimize the impact of unplanned events on safety, the environment, and business profitability. The reliability tool best serving as a vehicle to achieve and sustain EMP goals is the failure modes and effects analysis (FMEA). The optimum long-term cost of ownership is typically a result of an effectively facilitated and thoroughly implemented FMEA.

Initial Groundwork

The first step in laying the groundwork for an FMEA-based reliability improvement effort is to identify candidate equipment. The preferred method is criticality analysis, a tool that evaluates how equipment failures affect organizational performance so that plant assets can be ranked systematically for work prioritization, material classification, preventive maintenance / predictive maintenance (PM/PdM) development, and reliability improvement initiatives.

The criticality analysis is a team effort that requires cross-functional input from the Operations, Maintenance, Engineering, and Materials Management groups, along with representation from the Environmental Health and Safety (EH&S) organization. This team will produce a prioritized list of equipment for EMP development.

Next, ensure that design criteria, existing maintenance tasks, operating strategies and past experiences are available for input to the subsequent FMEA. These are typically found in:

Equipment Files & Drawings
Failure Reporting and Corrective Action System (FRACAS)
Safety Event Tracking
Asset Utilization Database
Computerized Maintenance Management System (CMMS)
Reliability Near-Miss Tracking
Process Database

After the candidate equipment is identified and front-end information is gathered, develop an FMEA project charter that clearly defines the following:

Problem & Goal Statements
Value Proposition
Scope & Boundaries
Team Members (Roles & Responsibilities)
Deliverables
Project Timeline

Conduct the FMEA

FMEAs are not developed in a vacuum. They are typically conducted by a diverse team whose members bring different views of, and expertise in, the equipment and processes under investigation. Be sure to include front-line operators and maintenance specialists on the team, and include the process owner as an ad-hoc member.

The first step in conducting the FMEA is to build a functional block diagram (FBD), which describes each component and its function and shows how the components interact. The FBD shows major components as blocks connected by lines that indicate the relationships between them, establishing a structure around which the FMEA can be developed. The FBD should always be included with the FMEA.

Next, calculate the baseline Overall Equipment Effectiveness (OEE) and the associated financial impact for the equipment targeted for improvement. Three years of historical data are ideal, but as little as one year can suffice. Update the FMEA project charter with the baseline OEE and the target OEE, including the value proposition. OEE is calculated from three factors:

OEE = Availability x Performance x Quality
- Availability = Operating Time ÷ Planned Production Time
- Performance = (Total Pieces ÷ Operating Time) ÷ Ideal Run Rate
- Quality = Good Pieces ÷ Total Pieces
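The three factors above can be combined in a few lines of Python. This is a minimal sketch, not a standard tool: the function name, parameter names, and the sample figures are assumptions chosen for illustration, and time and run-rate units only need to be consistent with each other (minutes and pieces per minute here).

```python
def oee(planned_production_time, operating_time,
        total_pieces, good_pieces, ideal_run_rate):
    """Return (availability, performance, quality, oee) as fractions.

    Times must share one unit, and ideal_run_rate must be expressed in
    pieces per that same unit of time.
    """
    availability = operating_time / planned_production_time
    performance = (total_pieces / operating_time) / ideal_run_rate
    quality = good_pieces / total_pieces
    return availability, performance, quality, availability * performance * quality


# Hypothetical baseline period (illustrative numbers, not plant data):
# 480 min planned, 384 min actually running, 320 pieces made, 304 good,
# ideal run rate of 1 piece per minute.
availability, performance, quality, baseline_oee = oee(
    planned_production_time=480,
    operating_time=384,
    total_pieces=320,
    good_pieces=304,
    ideal_run_rate=1.0,
)
print(f"Availability {availability:.1%}, Performance {performance:.1%}, "
      f"Quality {quality:.1%}, OEE {baseline_oee:.1%}")
```

Because the operating-time and total-pieces terms cancel, the product reduces to good pieces divided by the pieces that could ideally have been made in the planned production time, which is a quick way to sanity-check a baseline.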

FMEA Phase 1 Analysis – Definition and Identification

Once the team has identified the focus equipment’s functions and measured baseline reliability, the team can proceed to Phase 1 of the FMEA analysis. The elements of Phase 1 analysis are defined in terms of equipment function and functional failure, as detailed in the FBD, along with each component’s failure modes, root causes, effects of failure and current-state controls.

There are many types of FMEA and different versions, but we’ll use the pump system FMEA shown in Figure 1 for illustration:

Equipment Function – List the functions of the equipment being studied
Functional Failure – List the situations in which a function would be considered lost. Most functions will have more than one loss condition
Component – A grouping of parts into some identifiable package that will perform at least one significant function, typically an item identified in the FBD
Potential Failure Mode(s) – The manner by which a possible failure is observed; it generally describes the way the failure occurs or its observable characteristics
Potential Effect(s) of Failure – Describe what will happen if the failure mode occurs
Potential Cause(s) of Failure – Try to anticipate the cause of the failure mode described
Current Controls – What we are doing now (the current state) to prevent, mitigate, or detect the cause listed in the previous column
Current Process Frequency – How frequently the current process controls are performed
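For teams that keep the worksheet in a script or database rather than a spreadsheet, the Phase 1 columns listed above map naturally onto one record per failure mode. The sketch below assumes that layout; the class name, field names, and the sample pump entry are hypothetical and are not taken from Figure 1.

```python
from dataclasses import dataclass, fields


@dataclass
class FmeaPhase1Row:
    """One worksheet line: a single failure mode of a single component."""
    equipment_function: str
    functional_failure: str
    component: str
    failure_mode: str
    effect: str
    cause: str
    current_controls: str
    control_frequency: str


# Hypothetical entry for a pump system (illustrative only).
row = FmeaPhase1Row(
    equipment_function="Transfer process water at the required flow",
    functional_failure="Delivers less than the required flow",
    component="Mechanical seal",
    failure_mode="External leakage",
    effect="Loss of containment; pump shut down for repair",
    cause="Seal face wear from dry running",
    current_controls="Operator visual inspection on rounds",
    control_frequency="Once per shift",
)

# Print the record as a labeled list, one column per line.
for f in fields(row):
    print(f"{f.name}: {getattr(row, f.name)}")
```

A component with several failure modes, or a failure mode with several causes, would simply appear as several rows.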

Figure 1. Example FMEA Phase 1 Analysis

The next step in Phase 1 analysis is to identify potential failure modes and their effects, root causes, and detection processes. Brainstorm all possible failure modes, including both those that have already occurred and rare problems. Then, for each failure mode listed, associate all possible causes. Ask “why” until the root cause is revealed. Review all potential causes of failure, and identify actions already taking place to eliminate them. Also identify how the causes of failure are currently detected, along with the intervention tasks, and their frequencies, that reduce the severity of the effects on the production process.

This step typically involves some form of condition monitoring or alarm system to alert the operator in the early stages of failure. Potential failure modes, root causes, failure effects and detectability can be further explored using a variety of supplementary reliability tools:

Brainstorming – Explores potential failure modes, causes and their effects
5-Why Analysis – Drills down to root causes
Fishbone Diagram – Analyzes cause and effect relationships
Data Mining – Quantitatively measures the effects of failure

FMEA Phase 2 Analysis – Quantifying, Prioritizing and Mitigating Risk

To begin Phase 2 analysis, the team quantifies the risk of each failure mode under the current control process. Risk is measured using a risk priority number (RPN) that is the product of severity, likelihood of occurrence, and detectability factors. Assigning RPNs to failure modes helps the team prioritize areas to focus on and can also help in assessing opportunities for improvement.

For every failure mode identified (see Figure 2), the team should answer the following questions and assign the appropriate score:

RPN = SEV x OCC x DET
- Severity (SEV) – If this failure mode occurs, what impact would the failure have on EH&S, Capacity, or Cost? Assign a score between 1 and 10, with 1 meaning “no impact” and 10 meaning “extreme impact”.
- Likelihood of Occurrence (OCC) – How likely is it that this failure mode will occur? Assign a score between 1 and 10, with 1 meaning “very unlikely to occur” and 10 meaning “very likely to occur”.
- Likelihood of Detection (DET) – If this failure mode occurs, how likely is it that the failure will be detected? Assign a score between 1 and 10, with 1 meaning “very likely to be detected” and 10 meaning “very unlikely to be detected”.
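The RPN arithmetic is simple enough to script, which helps when a worksheet holds dozens of failure modes. The following is a minimal sketch: the function name is an assumption, and the range check only enforces the 1 to 10 scale described above.

```python
def rpn(sev, occ, det):
    """Risk priority number: SEV x OCC x DET, each scored from 1 to 10."""
    for name, score in (("SEV", sev), ("OCC", occ), ("DET", det)):
        if not 1 <= score <= 10:
            raise ValueError(f"{name} must be between 1 and 10, got {score!r}")
    return sev * occ * det


# Hypothetical scores for one failure mode (illustrative only):
# serious impact (8), occasional occurrence (4), hard to detect (6).
print(rpn(8, 4, 6))  # 192
```

With this scale the RPN ranges from 1 to 1,000. Note that DET runs in the opposite direction from what its name suggests: a failure that is easy to detect gets a low score and therefore lowers the RPN.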

There is no RPN value above which it is mandatory to take action or below which the failure mode is exempt from action. However, start with the top 20% of RPNs and prioritize using the following guidelines:

Severity (SEV) is given the most weight when assessing risk.
The combination of Severity and Occurrence (SEV x OCC) is considered next.
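One way to apply these guidelines is sketched below. It reflects a single interpretation, stated here as an assumption: shortlist the top 20% of failure modes by RPN, then order the shortlist by severity first and by SEV x OCC second. The class, function, and failure-mode data are all hypothetical.

```python
import math
from typing import NamedTuple


class FailureMode(NamedTuple):
    name: str
    sev: int
    occ: int
    det: int

    @property
    def rpn(self):
        return self.sev * self.occ * self.det


def prioritize(modes, top_percent=20):
    """Shortlist the top share of RPNs, then rank by SEV, then SEV x OCC."""
    by_rpn = sorted(modes, key=lambda m: m.rpn, reverse=True)
    # Round up so a short list still yields at least one failure mode.
    keep = max(1, math.ceil(len(by_rpn) * top_percent / 100))
    shortlist = by_rpn[:keep]
    return sorted(shortlist, key=lambda m: (m.sev, m.sev * m.occ), reverse=True)


# Hypothetical pump-system failure modes (illustrative scores only).
modes = [
    FailureMode("Seal leak", 7, 5, 6),
    FailureMode("Bearing seizure", 9, 3, 7),
    FailureMode("Coupling misalignment", 5, 4, 4),
    FailureMode("Impeller erosion", 6, 3, 5),
    FailureMode("Motor winding fault", 8, 2, 3),
    FailureMode("Suction strainer plugged", 4, 6, 2),
    FailureMode("Cavitation", 6, 4, 4),
    FailureMode("Base bolt loosening", 3, 3, 5),
    FailureMode("Gasket weep", 2, 5, 3),
    FailureMode("Lubricant contamination", 7, 3, 6),
]

for m in prioritize(modes):
    print(m.name, m.rpn)
```

In this sample the seal leak has the highest RPN (210), but the bearing seizure (RPN 189) is listed first because its severity score is higher, which is the effect of giving severity the most weight.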

Figure 2. Example FMEA Phase 2 Analysis

The next step in Phase 2 analysis is to minimize risk by using the team’s expertise to brainstorm ways of reducing the severity or likelihood of occurrence of the failure, or of improving its detectability. Include the process owner in developing the improvements, as this will prove invaluable when negotiating roadblocks in their implementation.

Define risk mitigation tasks and their respective frequencies for the top 20% of RPNs, and prioritize the implementation of those tasks that provide maximum value, either by detecting a failure early on its potential failure (PF) curve (see Figure 3) or by preventing the failure from occurring in the first place through re-design efforts.

Figure 3. Potential Failure (PF) Curve

Potential mitigation tasks, frequencies, their potential value, and ownership can be further explored using a variety of supplementary reliability tools:

Brainstorming – Explores potential risk reduction tasks
Cost/Benefit Analysis – Assists the team to select optimum solutions
Potential Failure Curves – Maps failure development
RACI Chart – Aligns roles and responsibilities

Ownership of the selected tasks is assigned to the appropriate functions, including detailed responsibilities and timing. New RPNs are calculated using the projected severity, likelihood of occurrence, and detectability scores and added to the FMEA.
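Comparing the current and projected RPNs side by side makes the expected benefit of each task visible. The sketch below assumes scores are held as (SEV, OCC, DET) tuples; the function name and the sample scores are hypothetical.

```python
def rpn_change(current, projected):
    """Return (current RPN, projected RPN, fractional reduction).

    Each argument is a (SEV, OCC, DET) tuple scored from 1 to 10.
    """
    before = current[0] * current[1] * current[2]
    after = projected[0] * projected[1] * projected[2]
    return before, after, (before - after) / before


# Hypothetical seal-leak failure mode: adding vibration monitoring and
# flush-flow checks is projected to lower OCC from 5 to 3 and DET from
# 6 to 2, while severity stays at 7.
before, after, reduction = rpn_change(current=(7, 5, 6), projected=(7, 3, 2))
print(f"RPN {before} -> {after} ({reduction:.0%} reduction)")
```

In this example the severity score is unchanged because the mitigation tasks affect how often the failure occurs and how early it is found, not what happens when it does occur.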

Implementing Solutions

When poorly implemented, even the best solutions are doomed to fail, so don’t treat this phase of the project lightly. Below are several key actions necessary to effectively implement solutions:

Gain support of the process owner
Obtain agreement from the person being assigned the action items
Clearly define tasks including ownership and delivery dates
Follow the Management of Change (MOC) process
Input action items into your company’s tracking system
Monitor the effectiveness of action item implementation
Update the FMEA

One year after full implementation, recalculate OEE and estimate the value delivered by the EMP, and then promptly communicate the results to key stakeholders.

The EMP and FMEA are living documents and require periodic reviews. Whenever a failure occurs, the FMEA should be updated with any new failure modes or root causes. If the failure mode was previously identified, the mitigation strategy should be re-evaluated. The documents should also be proactively reviewed annually as part of your company’s Document Control process. Use this powerful strategy – do not let it collect dust!

Further information

Criticality Analysis: Single Point Lesson: Criticality Analysis; Life Cycle Engineering
Project Charter: Meaning, Importance and its Elements; Management Study Guide
Failure Mode Effect Analysis: FMEA from Theory to Execution, 2nd Edition; D.H. Stamatis

Michael Blanchard is a Reliability Engineering Subject Matter Expert with Life Cycle Engineering (LCE). He has more than 25 years’ experience as a reliability leader in a variety of industries. Mike is a licensed Professional Engineer, a Certified Reliability Engineer, and a Certified Lean-Six Sigma Master Black Belt. You can reach Mike at mblanchard@LCE.com.
